CN108989705B - Video production method and device of virtual image and terminal - Google Patents

Info

Publication number: CN108989705B
Application number: CN201811013341.7A
Authority: CN (China)
Prior art keywords: video, avatar, virtual image, script, label
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN108989705A
Inventor: 傅宇韬
Original and current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Filing history: application CN201811013341.7A filed by Beijing Baidu Netcom Science and Technology Co Ltd; published as CN108989705A; granted and published as CN108989705B

Classifications

    • H04N 5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N 5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N 5/265 Mixing
    • H04N 21/4312 Generation of visual interfaces for content selection or interaction involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations

Abstract

The invention provides a video production method, apparatus, and terminal for an avatar. The method comprises the following steps: selecting a corresponding video type according to the avatar specified by the user; acquiring video playing content corresponding to the video type; generating a video script after adding at least one label to the video playing content; and generating a video of the avatar according to the video script. The cost of making a video is low and the time invested is small. No other specialist personnel are needed during production, so a single person can complete the whole process. The user can independently design the avatar and the video playing content, and add the expressions, actions, and other elements they require, which increases the freedom of design in avatar video production.

Description

Video production method and device of virtual image and terminal
Technical Field
The invention relates to the field of computer technology, and in particular to a method, apparatus, and terminal for producing avatar videos.
Background
Statistically, as many as 3.5 million people regularly watch two-dimensional (anime-style) avatar content. Beyond popular platforms such as Bilibili and AcFun (Anime Comic Fun), two-dimensional short video content is also growing explosively. At present, there are three main ways of creating short video content around two-dimensional avatars: the first is production based on the characters and content of existing well-known animation companies; the second is a two-dimensional avatar and content originally created by a team and promoted on its own platform; the third is secondary creation, in which an individual user takes an existing two-dimensional character and, through some relatively simple operations, completes short video content for a particular scenario.
Among these three ways of making two-dimensional avatar short videos, the first and second are produced in the traditional animation workflow: the technical threshold is high, creation takes a long time, and the content is hard to extend. The cost for a user to learn video production is high; each short video is produced much like traditional animation and requires people with the relevant skills to design and adjust over a long period, so no threshold-free creation platform comparable to Douyin (TikTok) has formed around these characters. Learning video production demands a large investment of labor that a single user cannot supply alone; it requires division of labor among specialists and rounds of testing, so ordinary users find participation very difficult. The resulting two-dimensional videos are fully bespoke: a user cannot freely design an arbitrary character, arbitrary text content, and assorted actions. Nor can such characters avoid manual review against excessive violence, pornography, or terror-related content, even though users want a relatively clean short-video production platform. The third way is realized in two production modes: in the first, a user personally edits video and produces effects from their original content with video production software, but while these videos are novel in conception, their production quality is poor; in the second, a real person records while role-playing a two-dimensional character, which is an artificially simulated effect and does not belong to two-dimensional avatar production at all.
Disclosure of Invention
The embodiment of the invention provides a video production method, a video production device and a video production terminal of an avatar, and at least solves the technical problems in the prior art.
In a first aspect, an embodiment of the present invention provides a method for making a video of an avatar, including:
selecting a corresponding video type according to the virtual image designated by the user;
acquiring video playing content corresponding to the video type;
generating a video script after adding at least one label to the video playing content;
and generating the video of the virtual image according to the video script.
With reference to the first aspect, in a first implementation manner of the first aspect, the generating a video script after adding at least one tag to the video playing content includes:
calibrating an adding position in the video playing content;
and inserting the label into the adding position to generate a video script, wherein the label comprises an action label and an expression label.
With reference to the first implementation manner of the first aspect, in a second implementation manner of the first aspect, after the generating the video script, the method further includes:
judging whether to select the corresponding avatar dubbing according to the avatar and the video type;
and if so, synthesizing the video voice corresponding to the video playing content according to the selected virtual image dubbing.
With reference to the second implementation manner of the first aspect, in a third implementation manner of the first aspect, after synthesizing the video voice corresponding to the video playing content according to the selected avatar dubbing, the method further includes:
generating corresponding mouth shape information according to the video playing content;
analyzing the expression labels and the action labels in the video script to generate expression parameters and action parameters;
fusing the mouth shape information, the expression parameters and the action parameters to generate an animation file;
calling animation materials corresponding to the animation files in a material library by using a fusion algorithm to generate virtual image animations;
and generating the video of the virtual image according to the virtual image animation and the video voice.
With reference to the first aspect, in a fourth implementation manner of the first aspect, the selecting a corresponding video type according to an avatar specified by a user includes:
acquiring the virtual image specified by a user;
and judging whether the virtual image meets preset image conditions or not, and if so, selecting the corresponding video type.
In a second aspect, the present invention provides an avatar video production apparatus, comprising:
the video type selection module is used for selecting a corresponding video type according to the virtual image designated by the user;
the playing content acquisition module is used for acquiring video playing content corresponding to the video type;
the video script generation module is used for generating a video script after adding at least one label in the video playing content;
and the virtual image video generation module is used for controlling the virtual image, according to the video script, to perform the content corresponding to each label, so as to generate the video of the virtual image.
With reference to the second aspect, in a first implementation manner of the second aspect, the video script generation module includes:
the position calibration unit is used for calibrating an adding position in the video playing content;
and the label adding unit is used for inserting the label into the adding position and generating a video script, wherein the label comprises an action label and an expression label.
With reference to the second aspect, in a second implementation manner of the second aspect, the apparatus further comprises:
the dubbing selection module is used for judging whether to select the dubbing of the corresponding avatar according to the avatar and the video type;
and the voice synthesis module is used for synthesizing the video voice corresponding to the video playing content according to the selected avatar dubbing.
With reference to the second aspect, in a third implementation manner of the second aspect, the apparatus further comprises:
the mouth shape information generating module is used for generating corresponding mouth shape information according to the video playing content;
the script analysis module is used for analyzing the expression labels and the action labels in the video script to generate expression parameters and action parameters;
the mouth shape information fusion module is used for fusing the mouth shape information, the expression parameters and the action parameters to generate an animation file;
and the animation generation module is used for calling animation materials corresponding to the animation files in a material library by utilizing a fusion algorithm to generate virtual image animation.
With reference to the third implementation manner of the second aspect, in a fourth implementation manner of the second aspect, the avatar video generation module is further configured to generate a video of the avatar according to the avatar animation and the video speech.
With reference to the second aspect, in a fifth implementation manner of the second aspect, the video type selection module includes:
an avatar acquisition unit for acquiring the avatar specified by a user;
the virtual image auditing unit is used for judging whether the virtual image meets the preset image condition or not;
and the video type selection unit is used for selecting the corresponding video type.
In a third aspect, an embodiment of the present invention provides a video production terminal of an avatar, including a processor and a memory, the memory being used for storing a program that supports the terminal in executing the video production method of the avatar in the first aspect, and the processor being configured to execute the program stored in the memory. The terminal may also include a communication interface for communicating with other devices or a communication network.
The functions can be realized by hardware, and the functions can also be realized by executing corresponding software by hardware. The hardware or software includes one or more modules corresponding to the above-described functions.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing computer software instructions for an avatar video production apparatus, including a program for executing the video production method of the avatar in the first aspect.
One of the above technical solutions has the following advantages or beneficial effects: in this scheme, a user's diverse demands for an avatar can be met either by specifying a pre-stored avatar or by specifying and uploading a favorite avatar. The user selects a video type according to their own needs and obtains the video playing content corresponding to the selected type. Different video playing contents allow users to design different video scripts. After at least one label, for example a label for an expression or an action, is added to the video playing content, a video script is generated, and the video of the avatar is generated from that script. With the avatar video production method provided by this scheme, the user's cost of producing a video is lower and the time invested is less. No other specialist personnel are needed during production, so a single person can complete the whole process. The user can independently design the avatar and the video playing content, and add the expressions, actions, and other elements they require, which increases the freedom of design in avatar video production.
The foregoing summary is provided for the purpose of description only and is not intended to be limiting in any way. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features of the present invention will be readily apparent by reference to the drawings and following detailed description.
Drawings
In the drawings, like reference numerals refer to the same or similar parts or elements throughout the several views unless otherwise specified. The figures are not necessarily to scale. It is appreciated that these drawings depict only some embodiments in accordance with the disclosure and are therefore not to be considered limiting of its scope.
Fig. 1 is a flowchart of a video production method of an avatar according to an embodiment of the present invention;
FIG. 2 is a schematic view of a video production method of an avatar according to an embodiment of the present invention;
fig. 3 is a schematic diagram illustrating a method for synthesizing a video by a video script according to an embodiment of the present invention;
FIG. 4 is a schematic view of a video production apparatus of an avatar according to an embodiment of the present invention;
FIG. 5 is a schematic view of another video production apparatus of an avatar according to an embodiment of the present invention;
FIG. 6 is a schematic view of yet another video production apparatus of an avatar according to an embodiment of the present invention;
fig. 7 is a schematic view of a video production terminal of an avatar according to an embodiment of the present invention.
Detailed Description
In the following, only certain exemplary embodiments are briefly described. As those skilled in the art will recognize, the described embodiments may be modified in various different ways, all without departing from the spirit or scope of the present invention. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.
Example one
In a specific embodiment, as shown in fig. 1 and 2, there is provided a video production method of an avatar, including:
step S100: and selecting the corresponding video type according to the virtual image specified by the user.
First, the avatar specified by the user is acquired. The user-specified avatar can be of several kinds. One is a pre-stored avatar: the pre-stored avatars include both 2D and 3D types available for the user to select. Another is an avatar uploaded by the user: the user clicks an upload button to submit a picture containing the avatar in a particular format and resolution, in which the avatar should be a front view of the upper body. Acquiring either a pre-stored avatar specified by the user or an avatar the user specifies and uploads satisfies the user's diverse demands for the avatar.
After the avatar is obtained, it is judged whether the avatar meets preset image conditions: for example, whether the uploaded avatar picture meets the required picture format, whether it is a two-dimensional avatar, whether the avatar is complete, and whether it involves pornographic, violent, or other prohibited content. If the preset image conditions are met, the corresponding video type is selected; if not, the upload is rejected. A sketch of such a pre-check follows.
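The following minimal sketch illustrates the kind of pre-check described above. The allowed formats, the resolution threshold, and the function name are assumptions made for illustration; the patent itself does not fix these values.

    from PIL import Image

    ALLOWED_FORMATS = {"PNG", "JPEG"}   # assumed upload formats
    MIN_WIDTH, MIN_HEIGHT = 512, 512    # assumed minimum resolution

    def avatar_meets_conditions(path: str) -> bool:
        """Return True if the uploaded avatar picture passes the preset
        image conditions; otherwise the upload is rejected."""
        try:
            img = Image.open(path)
        except OSError:
            return False                # not a readable image at all
        if img.format not in ALLOWED_FORMATS:
            return False                # picture format not supported
        if img.width < MIN_WIDTH or img.height < MIN_HEIGHT:
            return False                # resolution too low
        # Whether the picture is a two-dimensional avatar, shows a complete
        # front view of the upper body, and is free of prohibited content
        # would need dedicated classifiers; those checks are omitted here.
        return True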
Step S200: and acquiring video playing content corresponding to the video type.
The video types may include a song category, a dance category, and a speech category, among others. Different video types correspond to different video playing contents. For example, the video playing content corresponding to the song category may include various song libraries and music compositions; the video playing content corresponding to the dance category can comprise various dance works; the video playing content corresponding to the speaking category may include a speaking text edited by the user as required, or a pre-stored speaking text selected by the user. The user can set different video scripts for different video playing contents.
Step S300: and after at least one label is added in the video playing content, generating a video script.
The selected video playing content can take text form, which makes it convenient to add labels to it. Labels may include expression labels, action labels, background pictures, various props, and so on. Each label is added to the video playing content according to the user's needs to form a video script, which may itself be in a text format.
For example, after a song is selected, playback can be paused at any time and a label placed at the paused position. Playback then continues, and action labels, expression labels, and props can be added to the song, the background can be replaced, and so on. In addition, an added label can be clicked to cancel it. Once the labels in the song have been adjusted, a song-class video script is formed.
For another example, after a dance work is selected, its built-in movements also exist in the form of preset action labels, which makes the work convenient for the user to modify. The user can add, delete, and change these preset action labels so that expressions, actions, and other content become richer. More action labels, expression labels, and the like can be added to the dance video playing content to form the adjusted dance video script.
For another example, after selecting a pre-stored speaking text or editing one (both pre-stored and user-edited speaking texts are subject to certain word-count and duration limits), expression labels, action labels, and so on are added to the text, for example immediately before and after a given phrase. Dragging and adjusting each label among the characters forms the speech video script, as in the sketch below.
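As a sketch of one possible way to hold such a script in memory, the structure below records the playing content together with the labels and their adding positions. The data layout and all names are assumptions for illustration; the patent only requires that each label carry its kind and adding position.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Label:
        kind: str       # "expression", "action", "prop", or "background"
        name: str       # e.g. "smile", "wave right hand"
        position: int   # character offset (texts) or millisecond offset (songs, dances)

    @dataclass
    class VideoScript:
        video_type: str            # "song", "dance", or "speech"
        playing_content: str       # lyrics, dance work id, or speaking text
        labels: List[Label] = field(default_factory=list)

        def add_label(self, label: Label) -> None:
            self.labels.append(label)

        def cancel_label(self, label: Label) -> None:
            # clicking an added label cancels it
            self.labels.remove(label)

    # Example: a speech script with an expression and an action at the start.
    script = VideoScript("speech", "Hi everyone ...")
    script.add_label(Label("expression", "smile", position=0))
    script.add_label(Label("action", "wave right hand", position=0))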
Step S400: and generating the video of the virtual image according to the video script.
While the video playing content runs, the expression animation corresponding to each expression label is called from the expression material library and the action animation corresponding to each action label is called from the action material library. Because a dance video already carries a song or musical background, the resulting action animation is matched against that original background to form the dance video. Singing videos and speech videos additionally need to be combined with the corresponding voice to form the final video.
The avatar's video can also be optimized and adjusted. Besides avatar-related labels, common video editing functions such as cutting, merging, playback speed adjustment, lighting adjustment, and color adjustment can be applied for final optimization. Once production and optimization are finished on the platform, the video can be uploaded and shared on an avatar video website for other users to follow, like, favorite, and comment on.
The video production method of the avatar provided by this embodiment makes the user's cost of producing a video lower and the time invested less. No other specialist personnel are needed during production, so a single person can complete the whole process. The user can independently design the avatar and the video playing content, and add the expressions, actions, and other elements they require, which increases the freedom of design in avatar video production.
In the above embodiment, after adding at least one tag to the video playing content, generating a video script includes:
marking an adding position in video playing content;
and inserting a label into the adding position to generate the video script, wherein the label comprises an action label and an expression label.
The adding position is marked in the video playing content; for example, the positions before and after "Hi everyone" can be designated as adding positions for expression labels and action labels. The video playing content before adjustment may be "Hi everyone ...". After labels such as "smile", "open mouth", and "wave right hand" are selected and dragged before or after the text, the video script "[B smile + open mouth]{B wave right hand}Hi[E smile + open mouth] everyone{E wave right hand}" is obtained. Here "smile" and "open mouth" are expression labels and "wave right hand" is an action label. The beginning and end of a label are marked with the letters B and E respectively, and B and E appear in pairs. Of course, other types of marks may also be used, and they all fall within the protection scope of this embodiment.
It should be noted that if the same expression label or action label appears twice, the two occurrences are automatically matched: by default the first marks the start of the action or expression and the second marks its end.
Of course, the above is only one specific implementation of generating a video script; other generation methods may also be used, and they all fall within the scope of this embodiment.
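Assuming the bracket conventions of the example above ([ ] for expression labels, { } for action labels, B for begin and E for end), a small parser could recover the plain playing content and the labelled spans as sketched below. The function and variable names are ours, not the patent's.

    import re

    # One mark: [ or {, a B/E flag, then label names joined by "+".
    MARK = re.compile(r"[\[{]([BE])\s*([^\]}]+)[\]}]")

    def parse_script(script):
        """Return (plain text, list of (label name, start, end) spans)."""
        spans = []          # completed (name, start, end) spans over the text
        open_marks = {}     # label name -> start offset of its B mark
        text_parts = []
        cursor = 0          # plain-text characters emitted so far
        last = 0
        for m in MARK.finditer(script):
            text_parts.append(script[last:m.start()])
            cursor += m.start() - last
            last = m.end()
            flag = m.group(1)
            names = [n.strip() for n in m.group(2).split("+")]
            for name in names:
                if flag == "B":
                    open_marks[name] = cursor          # label begins here
                else:
                    spans.append((name, open_marks.pop(name), cursor))
        text_parts.append(script[last:])
        return "".join(text_parts), spans

    text, spans = parse_script(
        "[B smile + open mouth]{B wave right hand}Hi"
        "[E smile + open mouth] everyone{E wave right hand}")
    # text == "Hi everyone"; spans cover "smile", "open mouth", "wave right hand"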
On the basis of the above embodiment, after generating the video script, the method further includes:
judging whether to select the corresponding avatar dubbing according to the avatar and the video type;
if yes, synthesizing the video voice corresponding to the video playing content according to the selected virtual image dubbing.
After the video script is finished, the voice of the speaker in the video can be selected according to the avatar, with different voice types matched to different avatars. For example, dozens of different voices may be provided, covering male and female as well as old and young, with types including sweet, steady, high, and low. During selection the user can audition each voice through a sample. The video types that can select avatar dubbing include song videos and speech videos; after the type is chosen, the range of speaker voices offered for song videos is narrowed appropriately, because not every dubbing voice is suitable for singing, while speech videos select avatar dubbing as normal.
It is worth noting that not all video types can select avatar dubbing; dance-type videos, for example, cannot. A small decision sketch follows.
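A compact sketch of this dubbing decision, under the assumption that each pre-stored voice is marked as singable or not; the voice catalogue here is a made-up placeholder.

    # Assumed voice catalogue: name -> whether the voice suits singing.
    VOICES = {
        "sweet female": True,
        "steady male": False,
        "young male": True,
    }

    def candidate_voices(video_type: str) -> list:
        """Voices offered to the user for a given video type."""
        if video_type == "dance":
            return []                               # dance videos take no dubbing
        if video_type == "song":
            # narrowed range: only voices suitable for singing
            return [name for name, singable in VOICES.items() if singable]
        return list(VOICES)                         # speech videos: all voices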
On the basis of the above embodiment, after synthesizing the video voice corresponding to the video playing content according to the selected avatar dubbing, the method further includes:
generating corresponding mouth shape information according to the video playing content;
analyzing the expression labels and the action labels in the video script to generate expression parameters and action parameters;
fusing the mouth shape information, the expression parameters and the action parameters to generate an animation file;
calling animation materials corresponding to the animation files in a material library by using a fusion algorithm to generate virtual image animations;
and generating the video of the virtual image according to the virtual image animation and the video voice.
For singing videos and speech videos, generating the video script alone does not directly yield the avatar's video; the character's mouth shapes must also be fused in so that the avatar appears more lifelike. As shown in fig. 3, the specific process of fusing mouth shape information to obtain the avatar's video is as follows. First, the produced video script is parsed to obtain the video playing content and the animation markup language; the action labels describing the avatar's body movements and the avatar's expression labels in the video script are defined as the animation markup language. Second, the actions and expressions in the animation markup language are parsed to obtain action parameters and expression parameters, where an action parameter controls the avatar to perform an action and an expression parameter controls the avatar to perform an expression. Corresponding mouth shape information is generated from the video playing content and fused with the action and expression parameters to generate an animation file. Then a fusion algorithm calls the animation materials corresponding to the animation file from the material library to generate the avatar animation; the animation materials include actions, expressions, and so on. Finally, the avatar's video is generated from the avatar animation and the video voice.
To link the avatar's actions and expressions more smoothly and express them more accurately, each action label can be bound to the corresponding body features of the avatar and each expression label to its facial features, so that the video script drives the body features to perform the actions and the facial features to perform the expressions. A sketch of the overall pipeline follows.
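Putting the stages of fig. 3 together, the sketch below traces the data flow only: every stage function is a stub standing in for a real text-to-speech engine, lip-sync analyzer, and renderer, the expression vocabulary is assumed, and parse_script is the function from the earlier parser sketch.

    EXPRESSION_NAMES = {"smile", "open mouth"}   # assumed expression vocabulary

    def synthesize_speech(text, voice):
        """Stand-in for a real text-to-speech engine (returns audio bytes)."""
        return b""

    def mouth_shapes_for(text):
        """Stand-in for lip-sync analysis: one mouth shape per character."""
        return ["closed"] * len(text)

    def make_avatar_video(script_text, voice):
        # 1. Parse the video script into playing content + animation markup
        #    (parse_script is the parser sketched earlier).
        playing_content, spans = parse_script(script_text)

        # 2. Split the markup into expression parameters and action parameters.
        expression_params = [s for s in spans if s[0] in EXPRESSION_NAMES]
        action_params = [s for s in spans if s[0] not in EXPRESSION_NAMES]

        # 3. Synthesize the video voice, then derive mouth shape information
        #    from the playing content so the lips track the words.
        video_speech = synthesize_speech(playing_content, voice)
        mouth_info = mouth_shapes_for(playing_content)

        # 4. Fuse mouth shapes, expressions, and actions into an animation
        #    file; a real system would now pull matching materials from the
        #    material library and pair the rendered animation with the speech.
        animation_file = {"mouth": mouth_info,
                          "expressions": expression_params,
                          "actions": action_params}
        return animation_file, video_speech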
Example two
In a specific embodiment, as shown in fig. 4, there is provided a video production apparatus of an avatar, including:
a video type selection module 10, for selecting the corresponding video type according to the virtual image designated by the user;
a playing content obtaining module 20, configured to obtain video playing content corresponding to a video type;
the video script generating module 30 is configured to add at least one tag to video playing content to generate a video script;
and the virtual image video generation module 40 is used for generating the video of the virtual image according to the video script.
On the basis of the above embodiment, the video script generation module 30 includes:
the position calibration unit is used for calibrating the adding position in the video playing content;
and the label adding unit is used for inserting a label into the adding position and generating the video script, wherein the label comprises an action label and an expression label.
In addition to the above embodiments, as shown in fig. 5, the video creation apparatus for an avatar further includes:
a dubbing selecting module 50 for judging whether to select the dubbing of the corresponding avatar according to the avatar and the video type;
and a voice synthesis module 60, configured to synthesize a video voice corresponding to the video playing content according to the selected avatar dubbing.
In addition to the above embodiments, as shown in fig. 6, the video creation apparatus for an avatar further includes:
a mouth shape information generating module 70, configured to generate corresponding mouth shape information according to the video playing content;
the script analysis module 80 is configured to analyze the expression tags and the action tags in the video script to generate expression parameters and action parameters;
the mouth shape information fusion module 90 is used for fusing the mouth shape information, the expression parameters and the action parameters to generate an animation file;
and the animation generation module 100 is configured to call an animation material corresponding to the animation file in the material library by using a fusion algorithm, so as to generate the virtual image animation.
On the basis of the above embodiment, the video type selection module 10 includes:
an avatar acquisition unit for acquiring an avatar designated by a user;
the virtual image auditing unit is used for judging whether the virtual image meets the preset image conditions or not;
and the video type selection unit is used for selecting the corresponding video type.
Example three
An embodiment of the present invention provides a video production terminal of an avatar, as shown in fig. 7, including:
a memory 400 and a processor 500, the memory 400 having stored therein a computer program operable on the processor 500. The processor 500 implements the video production method of the avatar in the above-described embodiment when executing the computer program. The number of the memory 400 and the processor 500 may be one or more.
A communication interface 600 for the memory 400 and the processor 500 to communicate with the outside.
Memory 400 may comprise high-speed RAM memory and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
If the memory 400, the processor 500, and the communication interface 600 are implemented independently, the memory 400, the processor 500, and the communication interface 600 may be connected to each other through a bus and perform communication with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 7, but this is not intended to represent only one bus or type of bus.
Optionally, in a specific implementation, if the memory 400, the processor 500, and the communication interface 600 are integrated on a single chip, the memory 400, the processor 500, and the communication interface 600 may complete communication with each other through an internal interface.
Example four
A computer-readable storage medium storing a computer program which, when executed by a processor, implements a video production method of an avatar according to any one of embodiments included herein.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may also be stored in a computer readable storage medium. The storage medium may be a read-only memory, a magnetic or optical disk, or the like.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive various changes or substitutions within the technical scope of the present invention, and these should be covered by the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (7)

1. A method for video production of an avatar, comprising:
selecting a corresponding video type according to the virtual image designated by the user;
acquiring video playing content corresponding to the video type;
generating a video script after adding at least one label to the video playing content;
generating a video of the avatar according to the video script; wherein,
generating a video script after adding at least one label to the video playing content includes:
calibrating an adding position in the video playing content;
inserting the label into the adding position to generate a video script, wherein the label comprises an action label and an expression label; after the video script is generated, the method further comprises the following steps:
judging whether to select the corresponding avatar dubbing according to the avatar and the video type;
if yes, synthesizing video voice corresponding to the video playing content according to the selected virtual image dubbing;
generating corresponding mouth shape information according to the video playing content;
analyzing the expression labels and the action labels in the video script to generate expression parameters and action parameters;
fusing the mouth shape information, the expression parameters and the action parameters to generate an animation file;
calling animation materials corresponding to the animation files in a material library by using a fusion algorithm to generate virtual image animations;
and generating the video of the virtual image according to the virtual image animation and the video voice.
2. The method of claim 1, wherein selecting the corresponding video type according to the avatar specified by the user comprises:
acquiring the virtual image specified by a user;
and judging whether the virtual image meets preset image conditions or not, and if so, selecting the corresponding video type.
3. An apparatus for producing a video of an avatar, comprising:
the video type selection module is used for selecting a corresponding video type according to the virtual image designated by the user;
the playing content acquisition module is used for acquiring video playing content corresponding to the video type;
the video script generation module is used for generating a video script after adding at least one label in the video playing content;
the virtual image video generation module is used for generating a video of the virtual image according to the video script; wherein,
the video script generation module comprises:
the position calibration unit is used for calibrating an adding position in the video playing content;
the label adding unit is used for inserting the label into the adding position and generating a video script, wherein the label comprises an action label and an expression label; the device further comprises:
the dubbing selection module is used for judging whether to select the dubbing of the corresponding avatar according to the avatar and the video type;
the voice synthesis module is used for synthesizing video voice corresponding to the video playing content according to the selected avatar dubbing;
the mouth shape information generating module is used for generating corresponding mouth shape information according to the video playing content;
the script analysis module is used for analyzing the expression labels and the action labels in the video script to generate expression parameters and action parameters;
the mouth shape information fusion module is used for fusing the mouth shape information, the expression parameters and the action parameters to generate an animation file;
and the animation generation module is used for calling animation materials corresponding to the animation files in a material library by utilizing a fusion algorithm to generate virtual image animation.
4. The apparatus of claim 3, wherein said avatar video generation module is further for generating a video of said avatar based on said avatar animation and said video speech.
5. The apparatus of claim 3, wherein the video type selection module comprises:
an avatar acquisition unit for acquiring the avatar specified by a user;
the virtual image auditing unit is used for judging whether the virtual image meets the preset image condition or not;
and the video type selection unit is used for selecting the corresponding video type.
6. A video production terminal for an avatar, comprising:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-2.
7. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-2.
CN201811013341.7A 2018-08-31 2018-08-31 Video production method and device of virtual image and terminal Active CN108989705B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811013341.7A CN108989705B (en) 2018-08-31 2018-08-31 Video production method and device of virtual image and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811013341.7A CN108989705B (en) 2018-08-31 2018-08-31 Video production method and device of virtual image and terminal

Publications (2)

Publication Number Publication Date
CN108989705A CN108989705A (en) 2018-12-11
CN108989705B true CN108989705B (en) 2020-05-22

Family

ID=64547587

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811013341.7A Active CN108989705B (en) 2018-08-31 2018-08-31 Video production method and device of virtual image and terminal

Country Status (1)

Country Link
CN (1) CN108989705B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109872724A (en) * 2019-03-29 2019-06-11 广州虎牙信息科技有限公司 Virtual image control method, virtual image control device and electronic equipment
CN110162661A (en) * 2019-04-11 2019-08-23 北京百度网讯科技有限公司 Information broadcasts video generation method and generating means
CN111145282B (en) * 2019-12-12 2023-12-05 科大讯飞股份有限公司 Avatar composition method, apparatus, electronic device, and storage medium
CN111145777A (en) * 2019-12-31 2020-05-12 苏州思必驰信息科技有限公司 Virtual image display method and device, electronic equipment and storage medium
CN111696182A (en) * 2020-05-06 2020-09-22 广东康云科技有限公司 Virtual anchor generation system, method and storage medium
CN114793286A (en) * 2021-01-25 2022-07-26 上海哔哩哔哩科技有限公司 Video editing method and system based on virtual image
CN113727039B (en) * 2021-07-29 2022-12-27 北京达佳互联信息技术有限公司 Video generation method and device, electronic equipment and storage medium
CN116684663A (en) * 2022-02-23 2023-09-01 华为云计算技术有限公司 Video generation method and device
CN116704085B (en) * 2023-08-08 2023-11-24 安徽淘云科技股份有限公司 Avatar generation method, apparatus, electronic device, and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090153552A1 (en) * 2007-11-20 2009-06-18 Big Stage Entertainment, Inc. Systems and methods for generating individualized 3d head models
CN101504774A (en) * 2009-03-06 2009-08-12 暨南大学 Animation design engine based on virtual reality
CN106204698A (en) * 2015-05-06 2016-12-07 北京蓝犀时空科技有限公司 Virtual image for independent assortment creation generates and uses the method and system of expression
CN107170030A (en) * 2017-05-31 2017-09-15 珠海金山网络游戏科技有限公司 A kind of virtual newscaster's live broadcasting method and system
CN107172476B (en) * 2017-06-09 2019-12-10 创视未来科技(深圳)有限公司 System for recording video resume by interactive script and implementation method

Also Published As

Publication number Publication date
CN108989705A (en) 2018-12-11

Similar Documents

Publication Publication Date Title
CN108989705B (en) Video production method and device of virtual image and terminal
CN109118562A (en) Explanation video creating method, device and the terminal of virtual image
CN106504304B (en) A kind of method and device of animation compound
US8937620B1 (en) System and methods for generation and control of story animation
CN101639943B (en) Method and apparatus for producing animation
CN112598785B (en) Method, device and equipment for generating three-dimensional model of virtual image and storage medium
US20020007276A1 (en) Virtual representatives for use as communications tools
US20070233494A1 (en) Method and system for generating sound effects interactively
CN112188266A (en) Video generation method and device and electronic equipment
CN106294612A (en) A kind of information processing method and equipment
JP2018078402A (en) Content production device, and content production system with sound
KR20080010564A (en) System for multimedia naration using 3d virtual agent and method thereof
CN114638232A (en) Method and device for converting text into video, electronic equipment and storage medium
CN113590247B (en) Text creation method and computer program product
CN108765529A (en) Video generation method and device
JP7177175B2 (en) Creating rich content from text content
CN107122493A (en) song playing method and device
CN112567761A (en) Information processing apparatus, information processing system, information processing method, and program
KR102313203B1 (en) Artificial intelligence content creation system and method
CN110347379B (en) Processing method, device and storage medium for combined crowdsourcing questions
JP7049173B2 (en) Sign language CG translation editing equipment and programs
CN110286910A (en) File transplantation method, device, equipment and storage medium
KR102613143B1 (en) Text-to-video production methods, devices, facilities and media
KR102281298B1 (en) System and method for video synthesis based on artificial intelligence
JP5082971B2 (en) A speech synthesizer and a reading system using the same.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant