CN112102449B - Virtual character generation method, display method, device, equipment and medium - Google Patents

Virtual character generation method, display method, device, equipment and medium

Info

Publication number
CN112102449B
CN112102449B (application CN202010965379.5A)
Authority
CN
China
Prior art keywords
frame
video
action
preset
virtual character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010965379.5A
Other languages
Chinese (zh)
Other versions
CN112102449A (en)
Inventor
胡天舒
姚锟
马明明
洪智滨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010965379.5A
Publication of CN112102449A
Application granted
Publication of CN112102449B
Legal status: Active
Anticipated expiration


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00 - Animation
    • G06T 13/80 - 2D [Two Dimensional] animation, e.g. using sprites
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 - Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/74 - Browsing; Visualisation therefor
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 - Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/78 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/783 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F 16/7834 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using audio features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Library & Information Science (AREA)
  • Human Computer Interaction (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The application discloses a virtual character generation method, a display method, corresponding devices and equipment, and a storage medium, and relates to the field of artificial intelligence, in particular to computer vision and image processing. The virtual character generation method comprises the following steps: creating a first animated video and a second animated video of the virtual character, the virtual character appearing in a silence state in the first animated video and performing a plurality of actions in the second animated video; dividing the second animated video into a plurality of action videos in one-to-one correspondence with the actions, the action videos being respectively associated with a plurality of voice instructions; generating, for each action video, a start transition frame and an end transition frame of the action video relative to a preset frame; and generating presentation data of the virtual character, the presentation data comprising the first animated video, the plurality of action videos, the association of the action videos with the voice instructions, and the start and end transition frames of each action video relative to the preset frame.

Description

Virtual character generation method, display method, device, equipment and medium
Technical Field
The present application relates to the field of artificial intelligence, in particular to computer vision and image processing, and more specifically to a method of generating a virtual character, a method of displaying a virtual character, a device for generating a virtual character, a device for displaying a virtual character, an electronic apparatus, and a computer-readable storage medium.
Background
In the information age, more and more entities are being virtualized, and the virtualization of people plays an important role. Conversational virtual characters with human-machine interaction capability are widely applied in many fields to meet current market demands. During human-machine interaction, the computer can drive the virtual character's appearance to change while it generates speech, so that the character produces an effect similar to a real person. However, most conversational virtual characters with human-machine interaction capability can only change their lip shape along with the speech; they cannot make various actions along with the speech.
Disclosure of Invention
Provided are a virtual character generation method, a virtual character display method, corresponding apparatuses, electronic devices, and storage media.
According to a first aspect, there is provided a method of generating a virtual character, comprising:
creating a first animated video and a second animated video of a virtual character, the virtual character appearing in a silence state in the first animated video and performing a plurality of actions in the second animated video;
dividing the second animated video into a plurality of action videos in one-to-one correspondence with the actions, the action videos being respectively associated with a plurality of voice instructions;
generating, for each action video, a start transition frame and an end transition frame of the action video relative to a preset frame of the first animated video; and
generating presentation data of the virtual character, the presentation data comprising the first animated video, the plurality of action videos, the association of the action videos with the voice instructions, and the start and end transition frames of each action video relative to the preset frame.
According to a second aspect, there is provided a method of displaying a virtual character, comprising:
acquiring presentation data of a virtual character, the presentation data comprising a first animated video of the virtual character, a plurality of action videos of the virtual character, the association of the action videos with a plurality of voice instructions, and a start transition frame and an end transition frame of each action video relative to a preset frame in the first animated video, the voice instructions being in one-to-one correspondence with a plurality of voices; and
playing the first animated video;
wherein, in response to one of the plurality of voice instructions occurring at a preset frame of the first animated video:
determining the action video associated with that voice instruction;
sequentially playing the preset frame, the start transition frame of the associated action video relative to the preset frame, the associated action video, the end transition frame of the associated action video relative to the preset frame, and the preset frame; and
continuing to play the first animated video from the preset frame.
According to a third aspect, there is provided a virtual character generating apparatus including:
a video creation module, configured to create a first animated video and a second animated video of a virtual character, the virtual character appearing in a silence state in the first animated video and performing a plurality of actions in the second animated video;
a video segmentation module, configured to divide the second animated video into a plurality of action videos in one-to-one correspondence with the actions, the action videos being respectively associated with a plurality of voice instructions;
a transition frame generation module, configured to generate, for each action video, a start transition frame and an end transition frame of the action video relative to a preset frame of the first animated video; and
a presentation data generation module, configured to generate presentation data of the virtual character, the presentation data comprising the first animated video, the plurality of action videos, the association of the action videos with the voice instructions, and the start and end transition frames of each action video relative to the preset frame.
According to a fourth aspect, there is provided a virtual character display apparatus comprising:
an acquisition module, configured to acquire presentation data of a virtual character, the presentation data comprising a first animated video of the virtual character, a plurality of action videos of the virtual character, the association of the action videos with a plurality of voice instructions, and a start transition frame and an end transition frame of each action video relative to a preset frame in the first animated video, the voice instructions being in one-to-one correspondence with a plurality of voices; and
a play module, configured to play the first animated video, wherein, in response to one of the plurality of voice instructions occurring at a preset frame of the first animated video: the action video associated with that voice instruction is determined; the preset frame, the start transition frame of the associated action video relative to the preset frame, the associated action video, the end transition frame of the associated action video relative to the preset frame, and the preset frame are played in sequence; and playback of the first animated video then continues from the preset frame.
According to a fifth aspect, there is provided an electronic device comprising:
At least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of generating a virtual character.
According to a sixth aspect, there is provided an electronic device comprising:
At least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores instructions executable by the at least one processor to enable the at least one processor to perform the virtual character display method described above.
According to a seventh aspect, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the above-described virtual character generation method.
According to an eighth aspect, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the above-described virtual character display method.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
According to a ninth aspect, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the above method.
Drawings
The drawings are included to provide a better understanding of the present application and are not to be construed as limiting the application. Wherein:
Fig. 1 illustrates a flowchart of a method of generating a virtual character according to an embodiment of the present application.
Fig. 2 illustrates a flowchart of one example of a method of creating a first animated video and a second animated video of a virtual character in accordance with an embodiment of the application.
FIG. 3 illustrates a flowchart of one example of generating a start transition frame and an end transition frame according to an embodiment of the present application.
Fig. 4 shows a flowchart of another example of generating a start transition frame and an end transition frame according to an embodiment of the application.
Fig. 5 shows a flowchart of a method of displaying a virtual character according to an embodiment of the present application.
Fig. 6 shows a schematic diagram of a first animated video and a second animated video of a virtual character according to an embodiment of the application.
Fig. 7 shows a schematic diagram of a method of dividing the second animated video into a plurality of action videos according to an embodiment of the present application.
FIG. 8 illustrates a schematic diagram of one example method of generating a start transition frame and an end transition frame in accordance with an embodiment of the application.
Fig. 9 shows a schematic diagram of another example method of generating a start transition frame and an end transition frame according to an embodiment of the application.
Fig. 10 is a schematic diagram showing an example of a method of displaying a virtual character according to an embodiment of the present application.
Fig. 11 is a schematic diagram showing another example of a method of displaying a virtual character according to an embodiment of the present application.
Fig. 12 shows a schematic block diagram of a virtual character generating apparatus according to an embodiment of the present application.
Fig. 13 shows a schematic block diagram of a virtual character presentation apparatus according to an embodiment of the present application.
Fig. 14 shows a block diagram of an electronic device according to an embodiment of the application.
Detailed Description
Exemplary embodiments of the present application will now be described with reference to the accompanying drawings, which include various details of the embodiments of the present application to facilitate understanding and are to be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the application. Likewise, descriptions of well-known functions and constructions are omitted from the following description for clarity and conciseness.
Fig. 1 shows a flowchart of a method 100 of generating a virtual character according to an embodiment of the present application.
In step S110, a first animated video and a second animated video of a virtual character are created. The virtual character appears in a silence state in the first animated video and performs a plurality of actions in the second animated video. In some embodiments, the first and second animated videos may each be 2D animated videos, which makes it possible to construct the presentation data of the virtual character in a simple manner so that the character can respond more quickly to voice instructions during human-machine interaction.
In step S120, the second animated video is divided into a plurality of action videos in one-to-one correspondence with the plurality of actions, and the action videos may be respectively associated with a plurality of voice instructions. For example, in the second animated video, the frame at which each action starts (also referred to as the start frame of the action) may be marked as the start frame of that action's video, and the frame at which the action ends (also referred to as the end frame of the action) may be marked as the end frame of that action's video. By marking the action videos in this way, they can be extracted in a simple manner. Of course, the way of dividing the action videos is not limited to this; in some embodiments, the start frame and end frame of each action are used as the start frame and end frame of that action's video, and the second animated video is divided into a plurality of video files in one-to-one correspondence with the plurality of action videos.
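To make the segmentation of step S120 concrete, the following is a minimal sketch assuming the frames of the second animated video are held in a list and each action has already been marked with its start and end frame indices; all names (Frame, ActionClip, split_actions) are illustrative assumptions, not terminology from the patent.

```python
# Minimal sketch of step S120: slicing the second animated video into
# per-action clips from marked (start, end) frame indices.
# Frame, ActionClip and split_actions are illustrative names only.
from dataclasses import dataclass
from typing import Dict, List, Tuple

Frame = bytes  # placeholder for decoded frame data


@dataclass
class ActionClip:
    name: str            # e.g. "wave", "nod"
    frames: List[Frame]  # frames F'_start .. F'_end of the second animated video


def split_actions(second_video: List[Frame],
                  marks: Dict[str, Tuple[int, int]]) -> Dict[str, ActionClip]:
    """marks maps an action name to its (start_frame, end_frame) indices."""
    clips = {}
    for name, (start, end) in marks.items():
        clips[name] = ActionClip(name=name, frames=second_video[start:end + 1])
    return clips


# Example use: split_actions(frames, {"wave": (10, 100), "nod": (200, 250)})
```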
In step S130, for each action video, a start transition frame and an end transition frame of the action video relative to a preset frame of the first animated video are generated. In some embodiments, there may be multiple preset frames, which allows the desired action video to be inserted at multiple different times during the playing of the first animated video, so that the virtual character in the silence state can be triggered to make a corresponding action at multiple moments. In some embodiments, the interval between any two adjacent preset frames may be equal. The moments at which the virtual character can be triggered to act are then evenly distributed throughout the first animated video, which improves the responsiveness of the virtual character to voice instructions.
In step S140, presentation data of the virtual character is generated. The presentation data includes the first animated video, the plurality of action videos, the association of the action videos with the voice instructions, and the start transition frame and end transition frame of each action video relative to the preset frame.
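For illustration, the presentation data described in step S140 could be organized as a simple record like the one below; the field names are assumptions made here for clarity, since the patent only specifies what the data must contain, not how it is laid out.

```python
# Illustrative layout of the presentation data from step S140.
# All field names are assumptions; only the contents listed in S140 come from the text.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Frame = bytes  # placeholder for decoded frame data


@dataclass
class TransitionFrames:
    start: List[Frame]  # frames inserted between the preset frame and the action's start frame
    end: List[Frame]    # frames inserted between the action's end frame and the preset frame


@dataclass
class PresentationData:
    first_video: List[Frame]               # silence-state animation F1..FN
    action_videos: Dict[str, List[Frame]]  # action video name -> its frames
    voice_to_action: Dict[str, str]        # voice instruction id -> action video name
    # (action video name, preset frame index) -> its start/end transition frames
    transitions: Dict[Tuple[str, int], TransitionFrames] = field(default_factory=dict)
```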
According to the embodiments of the present application, preset frames are arranged in the first animated video, the second animated video is divided into a plurality of action videos in one-to-one correspondence with a plurality of voice instructions, and transition frames relative to the preset frames are generated. The virtual character thereby gains the ability to make a corresponding action in response to a voice instruction occurring at a preset frame. On the one hand, the character's actions are no longer limited to lip changes and need not follow a preset order; on the other hand, the character's actions and its silence state can transition smoothly, improving the display effect.
Fig. 2 illustrates a flowchart of one example of a method of creating a first animated video and a second animated video of a virtual character in accordance with an embodiment of the application.
In step S2101, a first video of a real person is continuously recorded over a first time period, the real person appearing in a silence state in the first video.
In step S2102, a second video of the real person is continuously recorded over a second time period, the real person appearing to perform a plurality of actions in the second video.
In step S2103, the first video is converted into animated form to obtain the first animated video, so that the real person in the first video becomes the virtual character in the first animated video.
In step S2104, the second video is converted into animated form to obtain the second animated video, so that the real person in the second video becomes the virtual character in the second animated video.
In the process of recording the second video of the real person, the real person may return to the silence state after finishing each action and then perform the next action. In this way, the virtual character appears in the second animated video to perform a plurality of actions in sequence, returning to the silence state after performing one action and starting the next action from the silence state. Since the plurality of actions are subsequently divided into independent videos that are extracted and played independently during presentation, there is no need to restrict the order in which the actions are performed when creating the second animated video, and the real person may perform the actions in any desired order.
The embodiment of the present application achieves a single continuous-take recording effect by continuously recording the first video and the second video of the real person over the first time period and the second time period respectively, so that the first animated video and the second animated video are each continuous animations rather than being split into multiple fragments. This reduces the difficulty of the subsequent frame-insertion operations. Because the character in the second video returns to the silence state after performing one action and before performing another, the beginning and end of each action are substantially consistent with the silence state of the first animated video, which further reduces the difficulty of the subsequent frame-insertion operations.
FIG. 3 illustrates a flowchart of one example of generating a start transition frame and an end transition frame according to an embodiment of the present application. In the example of fig. 3, it is assumed that the above-described preset frame is the i-th frame, where i is an integer greater than or equal to 1. The process of generating the start transition frame and the end transition frame of an action video relative to the preset frame in the first animated video may include the following steps S3301 and S3302.
In step S3301, a frame is inserted between the i-th frame of the first animated video and the start frame of the action video to obtain a start transition frame of the action video relative to the i-th frame.
In step S3302, a frame is inserted between the end frame of the action video and the i-th frame of the first animated video to obtain an end transition frame of the action video relative to the i-th frame.
According to this embodiment of the present application, a start transition frame and an end transition frame corresponding to the silence state of the first animated video are generated for each action during the generation of the virtual character, so that in the subsequent display process a smooth transition between the character's actions and its silence state can be achieved in a simple manner, improving the display effect.
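The patent leaves the frame-insertion technique open; one simple way to realize steps S3301 and S3302 would be a linear cross-fade between the preset frame and the action video's boundary frame, sketched below with NumPy as an assumption rather than the method prescribed by the application.

```python
# Assumed realization of steps S3301/S3302: linear cross-fade between the
# preset frame F_i and an action video's boundary frame. The patent does not
# mandate this particular interpolation; it is shown only for illustration.
import numpy as np


def cross_fade(frame_a: np.ndarray, frame_b: np.ndarray, n: int) -> list:
    """Return n intermediate frames blending frame_a into frame_b."""
    out = []
    for k in range(1, n + 1):
        alpha = k / (n + 1)
        blended = (1.0 - alpha) * frame_a.astype(np.float32) + alpha * frame_b.astype(np.float32)
        out.append(blended.astype(frame_a.dtype))
    return out


# Start transition frames: from the preset frame F_i to the action's first frame.
#   start_transition = cross_fade(frame_i, action_frames[0], n=2)
# End transition frames: from the action's last frame back to F_i (or to F_{i+k}).
#   end_transition = cross_fade(action_frames[-1], frame_i, n=2)
```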
Fig. 4 shows a flowchart of another example of generating a start transition frame and an end transition frame according to an embodiment of the application. In the example of fig. 4, the preset frame includes a start preset frame and an end preset frame, where the start preset frame is the i-th frame and the end preset frame is the (i+k)-th frame, i being an integer greater than or equal to 1 and k being an integer greater than 1. The process of generating the start transition frame and the end transition frame of an action video relative to the preset frame in the first animated video may include the following steps S4301 and S4302.
In step S4301, a frame is inserted between the i-th frame of the first animated video and the start frame of the action video to obtain a start transition frame of the action video relative to the i-th frame.
In step S4302, a frame is inserted between the end frame of the action video and the (i+k)-th frame of the first animated video to obtain an end transition frame of the action video relative to the (i+k)-th frame. When there are multiple preset frames, each preset frame includes a start preset frame and an end preset frame. The preset frames may be equally spaced; for example, the interval between the start preset frames of any two adjacent preset frames is equal.
According to this embodiment of the present application, because each preset frame includes a start preset frame and an end preset frame, during the subsequent display process each action starts from one silence-state frame and ends at a later silence-state frame. The virtual character therefore does not have to return to exactly the same silence-state frame it was in before performing the action, which makes the display effect closer to that of a real person in a natural state.
Fig. 5 illustrates a flowchart of a method 500 of displaying a virtual character according to an embodiment of the application. The presentation method of fig. 5 is applicable to a virtual character generated by the method of any of the embodiments described above.
In step S510, presentation data of the virtual character is acquired. The presentation data includes the first animated video of the virtual character, the plurality of action videos of the virtual character, the association of the action videos with the plurality of voice instructions, and the start transition frame and end transition frame of each action video relative to a preset frame in the first animated video, where the voice instructions are in one-to-one correspondence with a plurality of voices.
In step S520, the first animated video is played.
In step S530, it is determined whether one of the plurality of voice instructions occurs at a preset frame of the first animated video; if so, step S540 is performed, otherwise the process returns to step S520.
Here, it should be noted that the occurrence of a voice instruction at the "preset frame" need not be limited to exactly the duration of the preset frame or the exact moment the preset frame arrives; it may refer to a range of time around the preset frame. For example, assuming the preset frame is the i-th frame, the occurrence of a voice instruction at the "preset frame" may mean that a voice instruction occurs anywhere in the range from the (i-5)-th frame to the (i+5)-th frame. If a voice instruction occurs within that range, step S540 is performed.
In step S540, the action video associated with the voice instruction is determined.
In step S550, the preset frame, the start transition frame of the associated action video relative to the preset frame, the associated action video, the end transition frame of the associated action video relative to the preset frame, and the preset frame are played in sequence; the process then returns to step S520 to continue playing the first animated video from the preset frame.
In this step, when the preset frame is the i-th frame, the i-th frame of the first animated video, the start transition frame of the associated action video relative to the i-th frame, the associated action video, the end transition frame of the associated action video relative to the i-th frame, and the i-th frame of the first animated video may be played in sequence; the process then returns to step S520 to continue playing the first animated video from the i-th frame, i.e., the (i+1)-th frame, the (i+2)-th frame, and so on.
In this step, when the preset frame includes a start preset frame and an end preset frame, with the start preset frame being the i-th frame and the end preset frame being the (i+k)-th frame, the i-th frame of the first animated video, the start transition frame of the associated action video relative to the i-th frame, the associated action video, the end transition frame of the associated action video relative to the (i+k)-th frame, and the (i+k)-th frame of the first animated video may be played in sequence; the process then returns to step S520 to continue playing the first animated video from the (i+k)-th frame, i.e., the (i+k+1)-th frame, the (i+k+2)-th frame, and so on.
According to the embodiment of the application, the initial transition frame and the end transition frame of the action video are respectively played before and after the action video, so that smooth transition between the action and the silence state of the virtual character can be realized, and the display effect is improved.
When there are a plurality of preset frames, step S530 may determine whether one of the voice instructions occurs at any one of the preset frames of the first animated video; if so, step S540 is performed, otherwise the process returns to step S530 to continue the determination.
In some embodiments, during the execution of step S550 described above, the voice associated with the voice instruction may also be played, so that the virtual character makes the action associated with that voice while the voice is played; for example, the character waves its hand while the voice "hello" is played, and nods while the voice "thank you" is played. In this way, the virtual character can make actions consistent with what it is saying, which makes its presentation more vivid and intuitive.
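Putting steps S510 to S550 together, a playback controller could look roughly like the sketch below, where the data argument is assumed to follow the PresentationData layout sketched after step S140. The plus/minus 5-frame trigger window, the command queue, and the play_frame/play_voice stubs are assumptions used for illustration; the patent does not define such an API.

```python
# Rough sketch of the display loop (steps S510-S550). The trigger window,
# queue-based command delivery and the two stubs are illustrative assumptions.
from queue import Empty, Queue
from typing import Optional


def play_frame(frame) -> None:
    """Render one frame; stubbed here (e.g. hand off to a video sink)."""


def play_voice(instruction_id: str) -> None:
    """Play the speech associated with the voice instruction; stubbed here."""


def poll_command(q: Queue) -> Optional[str]:
    try:
        return q.get_nowait()
    except Empty:
        return None


def nearest_preset(i: int, preset_frames, window: int) -> Optional[int]:
    for p in preset_frames:
        if abs(i - p) <= window:
            return p
    return None


def run_presentation(data, preset_frames, commands: Queue, window: int = 5) -> None:
    i = 0
    n = len(data.first_video)
    while True:  # loop the silence-state video indefinitely
        play_frame(data.first_video[i])
        cmd = poll_command(commands)
        preset = nearest_preset(i, preset_frames, window)
        if cmd is not None and preset is not None:
            action = data.voice_to_action[cmd]
            trans = data.transitions[(action, preset)]
            play_frame(data.first_video[preset])       # the preset frame
            for f in trans.start:                      # start transition frames
                play_frame(f)
            play_voice(cmd)                            # trigger the associated speech (would normally run concurrently)
            for f in data.action_videos[action]:       # the action video itself
                play_frame(f)
            for f in trans.end:                        # end transition frames
                play_frame(f)
            play_frame(data.first_video[preset])       # back to the preset frame
            i = preset                                 # resume from the preset frame
        i = (i + 1) % n
```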
An example of a virtual character generating method according to an embodiment of the present application will be described below with reference to fig. 6 to 9.
Fig. 6 shows a schematic diagram of a first animated video and a second animated video of a virtual character according to an embodiment of the application. As shown in fig. 6, the first animated video includes N frames F1, F2, …, FN and the second animated video includes M frames F'1, F'2, …, F'M. The first animated video and the second animated video may both be 2D animated videos. The virtual character appears in the first animated video in a silence state, such as a silent standing state, in which the character can behave similarly to a real person standing silently, e.g., blinking or swaying slightly. The virtual character appears in the second animated video to perform a plurality of actions, such as nodding, waving, bowing, spreading the hands, and the like.
Fig. 7 shows a schematic diagram of a method of dividing the second animated video into a plurality of action videos according to an embodiment of the present application.
As shown in fig. 7, the second animated video includes M frames F'1, F'2, …, F'M. Assuming that the virtual character performs a hand-waving action in frames F'10 to F'100 and a nodding action in frames F'200 to F'250, the frame F'10 at the beginning of the hand-waving action may be marked as the start frame of the hand-waving action video Act1, and the frame F'100 at the end of the hand-waving action may be marked as the end frame of Act1; similarly, the frame F'200 at the beginning of the nodding action may be marked as the start frame of the nodding action video Act2, and the frame F'250 at the end of the nodding action may be marked as the end frame of Act2. The start and end frames may be marked manually or by a computer through action recognition. In this way, the second animated video is divided into a plurality of action videos Act1 and Act2.
Each action video may be associated with a corresponding voice instruction. For example, as shown in Table 1 below, the action video Act1 of the hand-waving action may be associated with a voice instruction A1 for the voice "hello", and the nodding action video Act2 may be associated with a voice instruction A2 for the voice "thank you".
TABLE 1

Voice instruction | Action video
A1 ("hello") | Act1 (hand wave)
A2 ("thank you") | Act2 (nod)
Here, a voice instruction may be an instruction for making the virtual character speak, or it may be a voice from the user. For example, when the user says "hello" to the virtual character, the computer recognizes this user behavior and generates an instruction to play the voice "hello", which causes the audio player to play the voice "hello" so that the virtual character appears to say "hello" as well. The computer's play instruction for the voice "hello" can serve as the voice instruction A1 for the voice "hello", and the voice "hello" coming from the user can likewise serve as the voice instruction A1 for the voice "hello".
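The association in Table 1, together with the recognition step just described, could be held as two small lookups; a minimal sketch follows, in which the utterance normalization and all names are assumptions for illustration.

```python
# Illustrative handling of Table 1: a recognized user utterance is mapped to a
# voice instruction id, which in turn selects the action video. The dictionaries
# and the normalization step are assumptions, not part of the patent.
from typing import Optional

VOICE_TO_ACTION = {
    "A1": "Act1",  # voice "hello"     -> hand-wave action video
    "A2": "Act2",  # voice "thank you" -> nodding action video
}

UTTERANCE_TO_INSTRUCTION = {
    "hello": "A1",
    "thank you": "A2",
}


def instruction_for(utterance: str) -> Optional[str]:
    """Normalize a recognized utterance into a voice instruction id, if any."""
    return UTTERANCE_TO_INSTRUCTION.get(utterance.strip().lower())


# Example: instruction_for("Hello") == "A1"; VOICE_TO_ACTION["A1"] == "Act1"
```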
FIG. 8 illustrates a schematic diagram of one example method of generating a start transition frame and an end transition frame in accordance with an embodiment of the application. As shown by the dark box in fig. 8, the i-th frame Fi of the first animated video may be taken as a preset frame.
For frame Fi, a start transition frame and an end transition frame of action video Act1 relative to frame Fi are generated. For example, as shown in fig. 8, a frame-insertion operation is performed between the start frame of action video Act1 (e.g., F'10 in fig. 7 above) and frame Fi to generate at least one inserted frame Fi_11 and Fi_12, and the inserted frames Fi_11 and Fi_12 may be regarded as the start transition frame (hereinafter collectively denoted Fi_1) of action video Act1 relative to frame Fi. The frame-insertion operation may be based on one or more frames from each side; for example, the inserted frames Fi_11 and Fi_12 may be generated based on Fi in the first animated video and the first frame of action video Act1 (F'10), or based on Fi-1 and Fi in the first animated video and the first two frames of the action video (F'10 and F'11). In a similar manner, a frame-insertion operation is performed between the end frame of action video Act1 (e.g., F'100 in fig. 7 above) and frame Fi to generate at least one inserted frame F1_i1 and F1_i2, and the inserted frames F1_i1 and F1_i2 may be regarded as the end transition frame (hereinafter collectively denoted F1_i) of action video Act1 relative to frame Fi.
For frame Fi, a start transition frame and an end transition frame of action video Act2 relative to frame Fi are also generated. For example, as shown in fig. 8, a frame-insertion operation is performed between the start frame of action video Act2 (e.g., F'200 in fig. 7 above) and frame Fi to generate at least one inserted frame Fi_21 and Fi_22 as the start transition frame (hereinafter collectively denoted Fi_2) of action video Act2 relative to frame Fi; a frame-insertion operation is performed between the end frame of action video Act2 (e.g., F'250 in fig. 7 above) and frame Fi to generate at least one inserted frame F2_i1 and F2_i2 as the end transition frame (hereinafter collectively denoted F2_i) of action video Act2 relative to frame Fi.
In some embodiments, a plurality of preset frames may be set in the first animated video, for example one preset frame every 10 frames, i.e., the preset frames may include Fi, Fi+10, Fi+20, and so on. For Fi+10, as shown in fig. 8, a start transition frame F(i+10)_1 and an end transition frame F1_(i+10) of action video Act1 relative to frame Fi+10, and a start transition frame F(i+10)_2 and an end transition frame F2_(i+10) of action video Act2 relative to frame Fi+10, and so on, may be generated in a similar manner.
Fig. 9 shows a schematic diagram of another example method of generating a start transition frame and an end transition frame according to an embodiment of the application. Fig. 9 is similar to fig. 8, except that each preset frame of the first animated video may include a start preset frame and an end preset frame; for example, frames Fi and Fi+1 are the start and end preset frames of the first preset frame, and frames Fi+10 and Fi+11 are the start and end preset frames of the second preset frame.
For frames Fi and Fi+1, a start transition frame Fi_1 of action video Act1 relative to frame Fi and an end transition frame F1_(i+1) of action video Act1 relative to frame Fi+1 may be generated. In a similar manner, a start transition frame Fi_2 of action video Act2 relative to frame Fi and an end transition frame F2_(i+1) relative to frame Fi+1 are generated.
For frames Fi+10 and Fi+11, a start transition frame F(i+10)_1 of action video Act1 relative to frame Fi+10 and an end transition frame F1_(i+11) of action video Act1 relative to frame Fi+11 may be generated. In a similar manner, a start transition frame F(i+10)_2 of action video Act2 relative to frame Fi+10 and an end transition frame F2_(i+11) relative to frame Fi+11 are generated.
The first animated video (as shown in any one of fig. 6 to 9), the action videos Act1 and Act2 and their association with the voice instructions A1 and A2 (as shown in fig. 7, for example), and the start transition frame and end transition frame of each action video Act1 and Act2 relative to the preset frame generated in the above manner (as shown in fig. 8 or 9) may be stored to obtain the presentation data of the virtual character. The presentation of the virtual character may then be implemented based on this presentation data.
In the embodiment of fig. 9, the frame immediately after the start preset frame is taken as the end preset frame; for example, the start and end preset frames of the first preset frame are Fi and Fi+1, respectively, and the start and end preset frames of the second preset frame are Fi+10 and Fi+11, respectively. However, the embodiment of the present application is not limited to this, and the interval k between the start preset frame and the end preset frame may be set as needed.
An example of a virtual character display method according to an embodiment of the present application will be described below with reference to fig. 10 and 11.
Fig. 10 is a schematic diagram showing an example of a method of displaying a virtual character according to an embodiment of the present application. In fig. 10, the display method is described in conjunction with the presentation data of the virtual character described above with reference to fig. 8.
The first animated video is played beginning at frame F1.
As shown by the arrow in fig. 10, when the i-th frame Fi is being played, the user says "hello" to the virtual character, for example, so the computer generates a voice instruction A1 for playing the voice "hello", which triggers the computer to make the virtual character wave its hand. Then, according to the voice instruction A1, the action video Act1 of the hand-waving action associated with A1 is determined by looking up the lookup table shown in fig. 7. The start transition frame Fi_1 and the end transition frame F1_i of action video Act1 relative to frame Fi of the first animated video are acquired. Then, frame Fi of the first animated video, the start transition frame Fi_1, the action video Act1, the end transition frame F1_i, and frame Fi of the first animated video are played in sequence. In some embodiments, during the sequential playing of Fi, Fi_1, Act1, F1_i, and Fi, for example while action video Act1 is playing, the voice "hello" associated with voice instruction A1 may be played. In this way, the following display effect can be achieved: the user says "hello" to the virtual character in its silence state, and the character starts a hand-waving action from the silence state while answering "hello", then returns to the silence state. Next, frames Fi+1, Fi+2, and so on continue to play, waiting for the next voice instruction that can trigger an action of the virtual character.
As shown by the arrow in fig. 10, when frame Fi+10 is played, the virtual character is triggered to make a nodding action, for example because the user compliments the virtual character, causing the computer to generate a voice instruction A2 for playing the voice "thank you". Then, according to the voice instruction A2, the action video Act2 of the nodding action associated with A2 is determined by looking up the lookup table shown in fig. 7. The start transition frame F(i+10)_2 and the end transition frame F2_(i+10) of action video Act2 relative to frame Fi+10 of the first animated video are acquired. Then, frame Fi+10 of the first animated video, the start transition frame F(i+10)_2, the action video Act2, the end transition frame F2_(i+10), and frame Fi+10 of the first animated video are played in sequence. During this time, for example while action video Act2 is playing, the voice "thank you" associated with voice instruction A2 may be played. In this way, the following display effect can be achieved: the user compliments the virtual character in its silence state, and the character starts a nodding action from the silence state while answering "thank you", then returns to the silence state. Next, frames Fi+11, Fi+12, and so on continue to play, waiting for the next voice instruction that can trigger an action of the virtual character, until the first animated video finishes playing.
The first animated video may be played in a loop. When it is played again, the system waits for voice instructions that can trigger an action in the same manner as described above, and controls the virtual character to make the corresponding action in response to each voice instruction.
Fig. 11 is a schematic diagram showing another example of a method of displaying a virtual character according to an embodiment of the present application. In fig. 11, the display method is described in conjunction with the presentation data of the virtual character described above with reference to fig. 9. The method of fig. 11 is similar to the method of fig. 10, except that each preset frame includes two frames, one serving as a start preset frame and the other as an end preset frame, and each action has a start transition frame relative to the start preset frame and an end transition frame relative to the end preset frame.
While the first animated video is playing, as shown by the arrow in fig. 11, if the computer generates a voice instruction A1 for playing the voice "hello" when frame Fi is reached, the action video Act1 of the hand-waving action associated with A1 is determined. The start transition frame Fi_1 of action video Act1 relative to frame Fi and its end transition frame F1_(i+1) relative to frame Fi+1 are acquired. Then, frame Fi of the first animated video, the start transition frame Fi_1, the action video Act1, the end transition frame F1_(i+1), and frame Fi+1 of the first animated video are played in sequence, and the voice "hello" is played during this time. Through this process, the user can say "hello" to the virtual character, and the character starts a hand-waving action from the silence state while answering "hello", then returns to the silence state. Next, frames Fi+2, Fi+3, and so on continue to play, waiting for the next voice instruction that can trigger an action of the virtual character.
In a similar manner, when frame Fi+10 is played, in response to the voice instruction A2 for the voice "thank you", the action video Act2 of the nodding action associated with A2 and its start transition frame F(i+10)_2 and end transition frame F2_(i+11) are determined. Then, Fi+10, F(i+10)_2, Act2, F2_(i+11), and Fi+11 are played in sequence, and the voice "thank you" is played during this time. Because the user has complimented the virtual character, this process likewise achieves the effect of the character making a nodding action while answering "thank you". Next, frames Fi+12, Fi+13, and so on continue to play, waiting for the next voice instruction that can trigger an action of the virtual character.
"Playing to" a certain frame in the above-described embodiments is not limited to exactly the beginning of that frame or the duration of that frame; it may be extended to a range of several frames before and after the frame. For example, the occurrence of a voice instruction when playing to frame Fi may be understood as follows: if a voice instruction occurs during the play period of frames Fi-5 to Fi+5, it may be determined that a voice instruction occurred when playing to frame Fi.
Although specific voice instructions and specific actions are described above as examples for the generation and presentation of virtual characters in embodiments of the present application, embodiments of the present application are not limited thereto. The number and type of voice commands and their corresponding actions may be set as desired, as the application is not limited in this regard.
The virtual character of the embodiments of the present application is suitable for various application scenarios, including but not limited to giving explanations in place of service personnel in public places such as hospitals, subways, and shopping malls, and answering players' questions or guiding the progress of a game. Taking a subway as an example, assume an electronic device in the form of, for example, a billboard or a robot is installed at a subway entrance; it may include a memory, a processor, a display, an audio player, an audio collector, and the like. In the standby state, the virtual character in its silence state is shown on the display of the electronic device. When a user says "hello" to the virtual character, the audio collector on the electronic device receives the voice, and the processor controls the virtual character to make a nodding action on the display while also controlling the audio player to play the voice "hello". Through this process, an interaction between the virtual character and the user is completed. Of course, the content and manner of interaction are not limited to this; for example, the user may ask the virtual character for directions, and the character may make a gesture indicating the direction while answering.
Fig. 12 shows a schematic block diagram of a virtual character generation apparatus according to an embodiment of the present application. As shown in fig. 12, the virtual character generation apparatus 1200 includes a video creation module 1210, a video segmentation module 1220, a transition frame generation module 1230, and a presentation data generation module 1240.
The video creation module 1210 is configured to create a first animated video and a second animated video of a virtual character, the virtual character appearing in a silence state in the first animated video and performing a plurality of actions in the second animated video.
The video segmentation module 1220 is configured to divide the second animated video into a plurality of action videos in one-to-one correspondence with the plurality of actions, the action videos being respectively associated with a plurality of voice instructions.
The transition frame generation module 1230 is configured to generate, for each action video, a start transition frame and an end transition frame of the action video relative to a preset frame of the first animated video.
The presentation data generating module 1240 is configured to generate presentation data of the virtual character, where the presentation data includes the first animated video, the plurality of action videos, an association of the plurality of action videos with the plurality of voice commands, and a start transition frame and an end transition frame of each action video with respect to a preset frame.
Fig. 13 shows a schematic block diagram of a virtual character display apparatus according to an embodiment of the present application. As shown in fig. 13, the virtual character display apparatus 1300 includes an acquisition module 1310 and a play module 1320.
The obtaining module 1310 is configured to obtain presentation data of a virtual character, where the presentation data includes a first animation video of the virtual character, a plurality of action videos of the virtual character, an association of the plurality of action videos with a plurality of voice commands, and a start transition frame and an end transition frame of each action video relative to a preset frame in the first animation video, where the plurality of voice commands are in one-to-one correspondence with the plurality of voices.
The play module 1320 is configured to play the first animated video, wherein, in response to one of the plurality of voice instructions occurring at a preset frame of the first animated video: the action video associated with that voice instruction is determined; the preset frame, the start transition frame of the associated action video relative to the preset frame, the associated action video, the end transition frame of the associated action video relative to the preset frame, and the preset frame are played in sequence; and playback of the first animated video then continues from the preset frame.
According to embodiments of the present application, the present application also provides an electronic device, a readable storage medium and a computer program product. The virtual character generating method and the virtual character displaying method in the embodiments described above may be executed by the same electronic device, or may be executed by a plurality of electronic devices, respectively. The virtual character generating method and the virtual character displaying method in the above embodiments may be stored in the same readable storage medium, or may be stored in a plurality of readable storage media, respectively. The computer program product comprises a computer program which, when executed by a processor, can implement the method of any of the embodiments described above.
As shown in fig. 14, a block diagram of an electronic device 1400 in accordance with an embodiment of the present application. Electronic device 1400 is intended to represent various forms of digital computers, such as laptops, desktops, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic device 1400 may also represent various forms of mobile apparatuses, such as personal digital processing, cellular telephones, smart phones, wearable devices, and other similar computing apparatuses. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the applications described and/or claimed herein.
As shown in fig. 14, the electronic device 1400 includes: one or more processors 1401, memory 1402, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 1401 is illustrated in fig. 14.
Memory 1402 is a non-transitory computer-readable storage medium provided by the present application. In some embodiments, the memory 1402 stores instructions executable by the at least one processor 1401 to cause the at least one processor 1401 to perform at least one of the virtual character generating method and the presentation method provided by the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to execute at least one of the virtual character generation method and the presentation method provided by the present application.
The memory 1402 serves as a non-transitory computer readable storage medium, and may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to a method of generating a virtual character in an embodiment of the present application (e.g., the video creation module 1210, the video segmentation module 1220, the transition frame generation module 1230, and the presentation data generation module 1240 shown in fig. 12), and/or program instructions/modules corresponding to a method of presenting a virtual character in an embodiment of the present application (e.g., the acquisition module 1310 and the play module 1320 shown in fig. 13). The processor 1401 executes various functional applications of the server and data processing, that is, at least one of the generation method and the presentation method of the virtual character in the above-described method embodiment, by executing a non-transitory software program, instructions, and modules stored in the memory 1402.
Memory 1402 may include a program storage area and a data storage area; the program storage area may store an operating system and at least one application program required for functions, and the data storage area may store data created through use of the electronic device according to an embodiment of the present application, and the like. Further, memory 1402 can include high-speed random access memory, and can also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory 1402 optionally includes memory located remotely from processor 1401, which may be connected to the electronic device of embodiments of the present application via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device 1400 of the embodiment of the present application may further include: an input device 1403 and an output device 1404. The processor 1401, memory 1402, input device 1403, and output device 1404 may be connected by a bus or otherwise, for example in fig. 14.
Input device 1403 may receive input numeric or character information and generate key signal inputs related to user settings and function control of electronic device 1400; examples include a touch screen, a keypad, a mouse, a trackpad, a touchpad, a pointing stick, one or more mouse buttons, a trackball, a joystick, and the like. The output device 1404 may include a display device, auxiliary lighting devices (e.g., LEDs), haptic feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, and that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also referred to as programs, software, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical solutions of the embodiments of the present application, preset frames are arranged in the first animation video, the second animation video is divided into a plurality of action videos in one-to-one correspondence with a plurality of voice instructions, and transition frames of each action video relative to the preset frames are generated, so that the virtual character gains the ability to perform a corresponding action whenever a voice instruction occurs at a preset frame. On the one hand, the actions of the virtual character are no longer limited to lip changes and need not be performed in a preset order; on the other hand, the virtual character can transition smoothly between an action and the silence state, which improves the presentation effect.
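To make the above scheme concrete, the following minimal Python sketch shows one way the presentation data described here could be organized in memory; all class and field names are illustrative assumptions rather than structures defined by the present application.

```python
from dataclasses import dataclass
from typing import Dict, List

Frame = object  # placeholder for a decoded image frame (e.g., a numpy array)

@dataclass
class ActionVideo:
    frames: List[Frame]            # action clip cut from the second animation video
    start_transition: List[Frame]  # inserted frames: preset frame -> action start frame
    end_transition: List[Frame]    # inserted frames: action end frame -> preset frame

@dataclass
class PresentationData:
    first_video: List[Frame]         # silence-state animation video
    preset_frames: List[int]         # indices of the preset frames within first_video
    actions: Dict[str, ActionVideo]  # voice instruction -> associated action video
```

With such a structure, a player only needs the first video, the preset-frame indices, and the per-instruction clips to realize the behaviour described above.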
It should be appreciated that steps may be reordered, added, or deleted in the various flows shown above. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present application can be achieved; no limitation is imposed herein.
The above embodiments do not limit the scope of the present application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application should be included in the scope of the present application.

Claims (20)

1. A method of generating a virtual character, comprising:
creating a first animated video and a second animated video of a virtual character, the virtual character appearing as a silence state in the first animated video, the virtual character appearing as performing a plurality of actions in the second animated video;
dividing the second animation video into a plurality of action videos corresponding to the actions one by one, wherein the action videos are respectively associated with a plurality of voice instructions;
generating a start transition frame and an end transition frame of the motion video relative to a preset frame of the first animation video for each motion video; and
generating presentation data of the virtual character, wherein the presentation data comprises the first animation video, the plurality of action videos, associations between the plurality of action videos and the plurality of voice instructions, and a start transition frame and an end transition frame of each action video relative to the preset frame.
2. The method of claim 1, wherein the preset frame comprises an i-th frame, wherein i is an integer greater than or equal to 1, and generating a start transition frame and an end transition frame of the action video relative to the preset frame in the first animated video comprises:
Inserting frames between an ith frame of the first animation video and a starting frame of the action video to obtain a starting transition frame of the action video relative to the ith frame;
and inserting frames between the end frame of the action video and the ith frame of the first animation video to obtain an end transition frame of the action video relative to the ith frame.
3. The method of claim 1, wherein the preset frames comprise a start preset frame and an end preset frame, wherein the start preset frame is an i-th frame and the end preset frame is an i+k-th frame, wherein i is an integer greater than or equal to 1 and k is an integer greater than 1, the generating the start transition frame and the end transition frame of the action video relative to the preset frames in the first animation video comprising:
Inserting frames between an ith frame of the first animation video and a starting frame of the action video to obtain a starting transition frame of the action video relative to the ith frame;
and inserting frames between the end frame of the action video and the (i+k)-th frame of the first animation video to obtain an end transition frame of the action video relative to the (i+k)-th frame.
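Claims 2 and 3 obtain the transition frames by inserting frames between a preset frame of the first animation video and the starting (or end) frame of an action video, without prescribing how the inserted frames are computed. One simple possibility is a linear cross-fade between the two boundary frames, sketched below under the assumption that frames are numpy image arrays; the function name and parameters are illustrative only.

```python
from typing import List

import numpy as np

def insert_transition_frames(frame_a: np.ndarray, frame_b: np.ndarray, n: int) -> List[np.ndarray]:
    """Insert n intermediate frames between frame_a and frame_b by cross-fading.

    Called once with (preset frame, action start frame) to build a start
    transition, and once with (action end frame, preset frame) to build an
    end transition.
    """
    out = []
    for t in range(1, n + 1):
        alpha = t / (n + 1)  # fades from frame_a (alpha near 0) to frame_b (alpha near 1)
        blended = (1.0 - alpha) * frame_a.astype(np.float32) + alpha * frame_b.astype(np.float32)
        out.append(blended.astype(frame_a.dtype))
    return out
```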
4. The method of claim 1, wherein there are a plurality of the preset frames.
5. The method of claim 4, wherein the spacing between any two adjacent preset frames of the plurality of preset frames is the same.
6. The method of claim 1, wherein dividing the second animated video into a plurality of action videos that correspond one-to-one to the plurality of actions comprises: for each of the plurality of actions,
marking, in the second animation video, the starting frame of the action as the starting frame of the action video of the action, and marking the ending frame of the action as the ending frame of the action video of the action.
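Under claim 6, dividing the second animation video then amounts to cutting out the frames between each action's marked starting and ending frame. A minimal sketch follows, assuming the marks are kept as (start_index, end_index) pairs; the function name and data layout are assumptions, not part of the claimed method.

```python
from typing import List, Tuple

def split_action_videos(second_video: list, marks: List[Tuple[int, int]]) -> List[list]:
    """Cut the second animation video (a list of frames) into one clip per action.

    `marks` holds one (start_index, end_index) pair per action, recorded when the
    starting and ending frames of that action were marked.
    """
    return [second_video[start:end + 1] for start, end in marks]
```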
7. The method of any of claims 1-6, wherein the creating the first and second animated videos of the virtual character comprises:
Continuously recording a first video of a real person in a first time period, wherein the real person is in a silence state in the first video;
continuously recording a second video of the real person in a second time period, the real person appearing in the second video as performing the plurality of actions;
converting the first video into an animated form to obtain the first animation video, such that the real person in the first video is converted into the virtual character in the first animation video; and
converting the second video into an animated form to obtain the second animation video, such that the real person in the second video is converted into the virtual character in the second animation video.
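Claim 7 does not fix the technique that converts the recorded footage into animated form. Purely as a stand-in, the sketch below applies OpenCV's stylization filter to every frame; an actual implementation would more likely use a dedicated animation or virtual-character generation model.

```python
import cv2

def video_to_animation(src_path: str, dst_path: str) -> None:
    """Write a stylized (cartoon-like) copy of a recorded video.

    cv2.stylization is only an illustrative substitute for the unspecified
    real-person-to-virtual-character conversion.
    """
    reader = cv2.VideoCapture(src_path)
    fps = reader.get(cv2.CAP_PROP_FPS)
    width = int(reader.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(reader.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (width, height))
    while True:
        ok, frame = reader.read()
        if not ok:
            break
        writer.write(cv2.stylization(frame, sigma_s=60, sigma_r=0.45))
    reader.release()
    writer.release()
```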
8. The method of any of claims 1-6, wherein the virtual character appears in the second animated video as performing the plurality of actions sequentially, wherein after one of the plurality of actions is performed, the silence state is restored, and another one of the plurality of actions is performed from the silence state.
9. The method of any of claims 1-6, wherein the first and second animated videos are both 2D animated videos.
10. A method of displaying a virtual character, comprising:
acquiring display data of a virtual character, wherein the display data comprises a first animation video of the virtual character, a plurality of action videos of the virtual character, associations between the plurality of action videos and a plurality of voice instructions, and a start transition frame and an end transition frame of each action video relative to a preset frame in the first animation video, wherein the plurality of voice instructions are in one-to-one correspondence with a plurality of voices; and
Playing the first animation video;
wherein in response to the occurrence of one of the plurality of voice instructions at a preset frame of the first animated video:
Determining an action video associated with the one voice instruction;
sequentially playing the preset frame, a start transition frame of the action video associated with the one voice instruction relative to the preset frame, the action video associated with the one voice instruction, an end transition frame of the action video associated with the one voice instruction relative to the preset frame, and the preset frame; and
continuing to play the first animation video from the preset frame.
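Assuming the PresentationData structure sketched earlier, the playback behaviour of claim 10 could be approximated by the loop below; pending_instruction and show are assumed hooks for voice-instruction recognition and rendering, and frame pacing, audio, and the variant with a separate end preset frame are omitted.

```python
from typing import Callable, Optional

def play(data: "PresentationData",
         pending_instruction: Callable[[], Optional[str]],
         show: Callable[[object], None]) -> None:
    """Loop over the first (silence-state) video and branch into an action
    whenever a voice instruction occurs at a preset frame."""
    i = 0
    while True:
        show(data.first_video[i])
        if i in data.preset_frames:
            instruction = pending_instruction()  # returns None if nothing was recognized
            if instruction in data.actions:
                action = data.actions[instruction]
                # preset frame -> start transition -> action -> end transition -> preset frame
                for frame in action.start_transition + action.frames + action.end_transition:
                    show(frame)
                show(data.first_video[i])
        i = (i + 1) % len(data.first_video)  # continue the first video from the preset frame
```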
11. The method of claim 10, wherein the preset frame comprises an i-th frame, wherein i is an integer greater than or equal to 1, and playing the preset frame, the start transition frame of the action video associated with the one voice instruction relative to the preset frame, the action video associated with the one voice instruction, the end transition frame of the action video associated with the one voice instruction relative to the preset frame, and the preset frame in sequence comprises:
playing an ith frame of the first animation video;
playing a start transition frame of the action video associated with the one voice instruction relative to the ith frame;
playing the action video associated with the one voice instruction;
playing an end transition frame of the action video associated with the one voice instruction relative to the ith frame; and
And playing the ith frame of the first animation video.
12. The method of claim 10, wherein the preset frames comprise a start preset frame and an end preset frame, the start preset frame is an i-th frame and the end preset frame is an i+k-th frame, wherein i is an integer greater than or equal to 1 and k is an integer greater than 1, playing the preset frame, the start transition frame of the action video associated with the one voice instruction relative to the preset frame, the action video associated with the one voice instruction, the end transition frame of the action video associated with the one voice instruction relative to the preset frame, and the preset frame sequentially comprises:
playing an ith frame of the first animation video;
playing a start transition frame of the action video associated with the one voice instruction relative to the ith frame;
playing the action video associated with the one voice instruction;
playing an ending transition frame of an action video associated with the one voice instruction relative to an i+k frame of the first animated video; and
Playing the i+k frame of the first animated video,
Wherein continuing to play the first animated video from the preset frame comprises continuing to play the first animated video from an i+k frame of the first animated video.
13. The method of any of claims 10 to 12, wherein a voice associated with the one voice instruction is also played during the sequential playing of the preset frame, the start transition frame of the action video associated with the one voice instruction relative to the preset frame, the action video associated with the one voice instruction, the end transition frame of the action video associated with the one voice instruction relative to the preset frame, and the preset frame.
14. The method of any of claims 10-12, wherein there are a plurality of the preset frames, and wherein the determining of the action video associated with the one voice instruction is performed in response to the one voice instruction of the plurality of voice instructions occurring at any one of the plurality of preset frames of the first animated video.
15. A virtual character generating apparatus comprising:
the video creation module is used for creating a first animation video and a second animation video of a virtual character, wherein the virtual character is displayed in a silence state in the first animation video, and the virtual character is displayed in the second animation video to execute a plurality of actions;
The video segmentation module is used for dividing the second animation video into a plurality of action videos corresponding to the actions one by one, and the action videos are respectively associated with a plurality of voice instructions;
the transition frame generation module is used for generating a start transition frame and an end transition frame of the action video relative to a preset frame of the first animation video aiming at each action video; and
and the presentation data generation module is used for generating presentation data of the virtual character, wherein the presentation data comprises the first animation video, the plurality of action videos, associations between the plurality of action videos and the plurality of voice instructions, and a start transition frame and an end transition frame of each action video relative to the preset frame.
16. A virtual character display device, comprising:
an acquisition module, used for acquiring display data of a virtual character, wherein the display data comprises a first animation video of the virtual character, a plurality of action videos of the virtual character, associations between the plurality of action videos and a plurality of voice instructions, and a start transition frame and an end transition frame of each action video relative to a preset frame in the first animation video, wherein the plurality of voice instructions are in one-to-one correspondence with a plurality of voices; and
a play module for playing the first animated video, wherein in response to the occurrence of one of the plurality of voice instructions at a preset frame of the first animated video: determining an action video associated with the one voice instruction, sequentially playing the preset frame, a start transition frame of the action video associated with the one voice instruction relative to the preset frame, the action video associated with the one voice instruction, an end transition frame of the action video associated with the one voice instruction relative to the preset frame, and the preset frame, and continuing to play the first animation video from the preset frame.
17. An electronic device, comprising:
At least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 10.
18. An electronic device, comprising:
At least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 11 to 14.
19. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1 to 10.
20. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 11 to 14.
CN202010965379.5A 2020-09-14 2020-09-14 Virtual character generation method, virtual character display device, virtual character display equipment and virtual character display medium Active CN112102449B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010965379.5A CN112102449B (en) 2020-09-14 2020-09-14 Virtual character generation method, virtual character display device, virtual character display equipment and virtual character display medium

Publications (2)

Publication Number Publication Date
CN112102449A CN112102449A (en) 2020-12-18
CN112102449B true CN112102449B (en) 2024-05-03

Family

ID=73758637

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010965379.5A Active CN112102449B (en) 2020-09-14 2020-09-14 Virtual character generation method, virtual character display device, virtual character display equipment and virtual character display medium

Country Status (1)

Country Link
CN (1) CN112102449B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112528936B (en) * 2020-12-22 2024-02-06 Beijing Baidu Netcom Science and Technology Co Ltd Video sequence arrangement method, device, electronic equipment and storage medium
CN113490054A (en) * 2021-07-01 2021-10-08 NetEase (Hangzhou) Network Co Ltd Virtual role control method, device, equipment and storage medium
CN114760425A (en) * 2022-03-21 2022-07-15 Jingdong Technology Information Technology Co Ltd Digital human generation method, device, computer equipment and storage medium
CN116708919A (en) * 2022-06-30 2023-09-05 Beijing Shengshu Technology Co Ltd Video processing method for synthesizing virtual image, related device and storage medium
CN116708899B (en) * 2022-06-30 2024-01-23 Beijing Shengshu Technology Co Ltd Video processing method, device and storage medium applied to virtual image synthesis
CN115665507B (en) * 2022-12-26 2023-03-21 Haima Cloud (Tianjin) Information Technology Co Ltd Method, apparatus, medium, and device for generating video stream data including avatar

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108769745A (en) * 2018-06-29 2018-11-06 Baidu Online Network Technology (Beijing) Co Ltd Video broadcasting method and device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5111409A (en) * 1989-07-21 1992-05-05 Elon Gasper Authoring and use systems for sound synchronized animation
WO2014153689A1 (en) * 2013-03-29 2014-10-02 Intel Corporation Avatar animation, social networking and touch screen applications
CN110574385A (en) * 2017-06-21 2019-12-13 Google LLC Dynamic customized gap transition video for video streaming services
CN110809797A (en) * 2017-10-03 2020-02-18 Google LLC Micro video system, format and generation method
CN110381266A (en) * 2019-07-31 2019-10-25 Baidu Online Network Technology (Beijing) Co Ltd A kind of video generation method, device and terminal

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Video summarization method based on animation features; Tan Jie; Wu Lingda; Ying Long; Application Research of Computers (10); full text *

Also Published As

Publication number Publication date
CN112102449A (en) 2020-12-18

Similar Documents

Publication Publication Date Title
CN112102449B (en) Virtual character generation method, virtual character display device, virtual character display equipment and virtual character display medium
CN110933487B (en) Method, device and equipment for generating click video and storage medium
CN112131988B (en) Method, apparatus, device and computer storage medium for determining virtual character lip shape
JP2021192222A (en) Video image interactive method and apparatus, electronic device, computer readable storage medium, and computer program
CN112259072A (en) Voice conversion method and device and electronic equipment
US20210201886A1 (en) Method and device for dialogue with virtual object, client end, and storage medium
CN112102448B (en) Virtual object image display method, device, electronic equipment and storage medium
CN111225236B (en) Method and device for generating video cover, electronic equipment and computer-readable storage medium
CN112527115B (en) User image generation method, related device and computer program product
CN112825013A (en) Control method and device of terminal equipment
US20180143741A1 (en) Intelligent graphical feature generation for user content
CN112528936B (en) Video sequence arrangement method, device, electronic equipment and storage medium
WO2023284316A1 (en) Video editing method and apparatus, and electronic device and readable storage medium
CN111369645B (en) Expression information display method, device, equipment and medium
JP7393388B2 (en) Face editing method, device, electronic device and readable storage medium
CN111669647B (en) Real-time video processing method, device and equipment and storage medium
CN112988100A (en) Video playing method and device
CN112634413A (en) Method, apparatus, device and storage medium for generating model and generating 3D animation
CN113327311B (en) Virtual character-based display method, device, equipment and storage medium
CN111736799A (en) Voice interaction method, device, equipment and medium based on man-machine interaction
JP2023070068A (en) Video stitching method, apparatus, electronic device, and storage medium
CN113038237B (en) Live broadcast information processing method, device, equipment and storage medium
CN114065783A (en) Text translation method, device, electronic equipment and medium
CN112102447A (en) Image processing method, device, equipment and storage medium
CN113542802A (en) Video transition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant