WO2023231568A1 - Video editing method, apparatus, computer device, storage medium and product - Google Patents

Video editing method, apparatus, computer device, storage medium and product

Info

Publication number
WO2023231568A1
Authority
WO
WIPO (PCT)
Prior art keywords
character
text
video
storyboard
interface
Prior art date
Application number
PCT/CN2023/086471
Other languages
English (en)
French (fr)
Inventor
苏鹏程
王文帅
谢梅华
Original Assignee
腾讯科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2023231568A1


Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
            • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
              • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
                • G06F3/0484 Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
                  • G06F3/04845 Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, for image manipulation, e.g. dragging, rotation, expansion or change of colour
                • G06F3/0487 Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
                  • G06F3/0488 Interaction techniques based on graphical user interfaces [GUI] using a touch-screen or digitiser, e.g. input of commands through traced gestures
                    • G06F3/04883 Interaction techniques based on graphical user interfaces [GUI] using a touch-screen or digitiser for inputting data by handwriting, e.g. gesture or text
        • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T13/00 Animation
    • H ELECTRICITY
      • H04 ELECTRIC COMMUNICATION TECHNIQUE
        • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
          • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
            • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
              • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
                • H04N21/4302 Content synchronisation processes, e.g. decoder synchronisation
            • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
              • H04N21/81 Monomedia components thereof
                • H04N21/8106 Monomedia components thereof involving special audio data, e.g. different tracks for different languages
                • H04N21/816 Monomedia components thereof involving special video data, e.g. 3D video

Definitions

  • The present application relates to the field of computer technology, and specifically to a video editing method, a video editing device, a computer device, a computer-readable storage medium and a computer program product.
  • Embodiments of the present application provide a video editing method, device, equipment and computer-readable storage medium, which can conveniently generate animated videos, thereby improving the efficiency of video editing.
  • Embodiments of the present application provide a video editing method, which is executed by a computer device.
  • The method includes:
  • generating an animated video, where the animated video includes the target character, and setting line audio corresponding to the text for the target character in the animated video;
  • Embodiments of the present application provide a video editing device, which includes:
  • a display unit, used to display the video editing interface;
  • a processing unit, configured to determine the target character and the input text in the video editing interface, wherein the text is presented in the form of text lines in the video editing interface, and the video editing interface supports editing of the text within the text lines;
  • the processing unit is also configured to generate an animated video, where the animated video includes the target character, and to set line audio corresponding to the text for the target character in the animated video;
  • Embodiments of the present application provide a computer device.
  • The computer device includes a memory and a processor.
  • The memory stores a computer program.
  • When the computer program is executed by the processor, it causes the processor to execute the above video editing method.
  • Embodiments of the present application provide a computer-readable storage medium.
  • The computer-readable storage medium stores a computer program.
  • When the computer program is read and executed by a processor of a computer device, it causes the computer device to execute the above video editing method.
  • Embodiments of the present application provide a computer program product or computer program.
  • The computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the above video editing method.
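  • As a rough illustration only, the following TypeScript sketch shows how the three operations claimed above (displaying the video editing interface, determining the target character and the input text, and generating the animated video with line audio set for the character) could be wired together. All type and function names (EditorState, LineAudio, synthesizeLineAudio, and so on) are hypothetical, and the TTS call is stubbed; this is not the claimed implementation.

    // Minimal sketch of the claimed editing flow; all types and names are illustrative.
    interface Character { id: string; name: string; }

    interface LineAudio { characterId: string; text: string; audioUrl: string; }

    interface AnimatedVideo { characters: Character[]; lineAudios: LineAudio[]; }

    interface EditorState {
      characters: Character[];          // characters determined in the editing interface
      textLines: Map<string, string[]>; // characterId -> editable text lines
    }

    // S201: "display" the video editing interface (here: create its backing state).
    function displayVideoEditingInterface(): EditorState {
      return { characters: [], textLines: new Map() };
    }

    // S202: determine the target character and the input text (kept as editable text lines).
    function addCharacterWithText(state: EditorState, character: Character, text: string): void {
      state.characters.push(character);
      const lines = state.textLines.get(character.id) ?? [];
      lines.push(text);
      state.textLines.set(character.id, lines);
    }

    // Stand-in for the server-side TTS conversion described later in the application.
    function synthesizeLineAudio(characterId: string, text: string): LineAudio {
      return { characterId, text, audioUrl: `https://example.invalid/tts/${characterId}.mp3` };
    }

    // S203: generate the animated video and set line audio corresponding to each text.
    function generateAnimatedVideo(state: EditorState): AnimatedVideo {
      const lineAudios: LineAudio[] = [];
      for (const character of state.characters) {
        for (const text of state.textLines.get(character.id) ?? []) {
          lineAudios.push(synthesizeLineAudio(character.id, text));
        }
      }
      return { characters: [...state.characters], lineAudios };
    }

    // Usage: select a character, type a line, generate the video.
    const editor = displayVideoEditingInterface();
    addCharacterWithText(editor, { id: "c1", name: "Potter" }, "Are you awake? Potter");
    console.log(generateAnimatedVideo(editor));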
  • Figure 1 is a schematic structural diagram of a video editing system provided by an embodiment of the present application.
  • Figure 2 is a schematic flowchart of a video editing method provided by an embodiment of the present application.
  • Figure 3a is a schematic diagram of a scene displaying a video editing interface provided by an embodiment of the present application.
  • Figure 3b is a schematic diagram of an interface for determining a target role provided by an embodiment of the present application.
  • Figure 3c is a schematic diagram of another interface for determining a target role provided by an embodiment of the present application.
  • Figure 3d is a schematic diagram of another interface for determining a target role provided by an embodiment of the present application.
  • Figure 3e is a schematic diagram of an interface for editing historical videos provided by an embodiment of the present application.
  • Figure 3f is a schematic diagram of an interface for setting a background provided by an embodiment of the present application.
  • Figure 4 is a schematic diagram of an interface for setting gesture data provided by an embodiment of the present application.
  • Figure 5a is a schematic diagram of a video editing interface provided by an embodiment of the present application.
  • Figure 5b is a schematic diagram of an interface for playing animated videos provided in the embodiment of the application.
  • Figure 5c is a schematic diagram of another interface for playing animated videos provided by an embodiment of the present application.
  • Figure 5d is a schematic diagram of a script interface for setting output order provided by an embodiment of the present application.
  • Figure 6a is a schematic diagram of an interface for storyboard sequencing provided by an embodiment of the present application.
  • Figure 6b is a schematic diagram of an interface for storyboard editing provided by an embodiment of the present application.
  • Figure 6c is a schematic diagram of a dynamically modified interface provided by an embodiment of the present application.
  • Figure 7a is a schematic diagram of an interface for role switching provided by an embodiment of the present application.
  • Figure 7b is a schematic diagram of a role management interface provided by an embodiment of the present application.
  • Figure 8a is a schematic diagram of an interface for exporting animated videos provided by an embodiment of the present application.
  • Figure 8b is a schematic diagram of an interface for sharing animated videos provided by an embodiment of the present application.
  • Figure 9a is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
  • Figure 9b is a schematic flow chart of one-way data driving provided by the embodiment of the present application.
  • Figure 9c is a schematic structural diagram of core data provided by an embodiment of the present application.
  • Figure 10a is a schematic flow chart of an operation case provided by the embodiment of the present application.
  • Figure 10b is a schematic flow chart of another operation case provided by the embodiment of the present application.
  • Figure 10c is a schematic flow chart of another operation case provided by the embodiment of the present application.
  • Figure 10d is a schematic flow chart of another operation case provided by the embodiment of the present application.
  • Figure 10e is a schematic flow chart of another operation case provided by the embodiment of the present application.
  • Figure 10f is a schematic flow chart of another operation case provided by the embodiment of the present application.
  • Figure 10g is a schematic flow chart of another operation case provided by the embodiment of the present application.
  • Figure 11a is a schematic flow chart of a text editing operation provided by an embodiment of the present application.
  • Figure 11b is a schematic flowchart of another text editing operation provided by an embodiment of the present application.
  • Figure 11c is a schematic flow chart of another text editing operation provided by an embodiment of the present application.
  • Figure 12 is a technical architecture diagram of a script display area provided by an embodiment of the present application.
  • Figure 13a is a schematic flow chart of timeline editing provided by an embodiment of the present application.
  • Figure 13b is a schematic flow chart of another timeline editing provided by an embodiment of the present application.
  • Figure 14a is a schematic diagram of the architecture of material management provided by an embodiment of the present application.
  • Figure 14b is a schematic structural diagram of a material business model provided by an embodiment of the present application.
  • Figure 14c is an abstract schematic diagram of a material business model provided by an embodiment of the present application.
  • Figure 15 is a schematic structural diagram of a video editing device provided by an embodiment of the present application.
  • Figure 16 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • This application mainly involves using text-to-speech (TTS) technology, which belongs to natural language processing (NLP) technology, to convert the line text collected by a terminal device (running a client, which can be a video client, for example) into speech information, so that when the edited animated video is played, the voice information corresponding to the line text can be played for users to watch and listen to. That is to say, this application can realize the conversion between text information and voice information through natural language processing technology.
  • The embodiment of the present application provides a video editing solution that can display a video editing interface; then, the target character and the input text can be determined in the video editing interface, where the input text is presented in the form of text lines in the video editing interface, and editing the text within the text lines is supported; next, an animated video can be generated that contains the target character, and the target character will output the line audio corresponding to the text in the animated video. During the playback of the animated video, when the scene containing the target character is played, the line audio corresponding to the text is played synchronously. It can be seen that in this application, an animated video can be generated simply by selecting a character and inputting text.
  • The input text is presented in the video editing interface in the form of text lines, and the text can be edited in the form of a document.
  • The operation is simple and convenient; and an association between the character and the text can be established automatically, so that when the target character is displayed, the target character reads the line audio corresponding to the text.
  • This application can automatically associate characters with lines, thereby improving the efficiency of video editing.
  • FIG. 1 is a schematic structural diagram of a video editing system provided by an embodiment of the present application.
  • the video editing system may include at least a terminal device 1001 and a server 1002.
  • The terminal device 1001 in the video editing system shown in Figure 1 may include but is not limited to smartphones, tablets, laptops, desktop computers, smart speakers, smart TVs, smart watches, vehicle-mounted terminals, smart wearable devices, etc., and is often equipped with a display device.
  • The display device can be a monitor, a display screen, a touch screen, etc.
  • The touch screen can be a touch-control screen, a touch panel, etc.
  • The server 1002 in the video editing system shown in Figure 1 can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.
  • the terminal device 1001 runs a client, such as a video client, a browser client, an information flow client, a game client, etc.
  • a video client is taken as an example for description.
  • The terminal device 1001 can display a user interface (UI) to the user in the video client.
  • The UI interface is a Flutter interface.
  • Flutter is a mobile UI framework that can quickly build high-quality native user interfaces on the operating system.
  • the Flutter page can be a video editing interface, for example, and can be used to display animated videos.
  • the server 1002 may be used to provide the terminal device 1001 with video materials required in the video editing process (such as character identification, background, facial gestures (expressions), body movements, and other information).
  • the terminal device 1001 can display a video editing interface.
  • the target character may be determined by triggering the character addition entry set in the video editing interface, or may be any character selected from the historical videos displayed in the video editing interface.
  • the terminal device 1001 may send a role information acquisition request to the server 1002.
  • The server 1002 responds to the role information acquisition request sent by the terminal device 1001, obtains the configuration information of the target role (such as the target role's identification, name, etc.), and sends the configuration information of the target role to the terminal device 1001.
  • The terminal device 1001 receives the configuration information of the target character sent from the server 1002, and displays the target character in the video editing interface.
  • the terminal device 1001 may send a data conversion request to the server 1002, where the data conversion request is used to convert the text into audio of lines.
  • In response to the data conversion request sent by the terminal device 1001, the server 1002 converts the text into the corresponding line audio through TTS technology.
  • The line audio can be, for example, an mp3 (MPEG-1 Audio Layer-3, a high-performance sound compression encoding format) file, and the server sends the line audio (such as the mp3 file) corresponding to the text to the terminal device 1001.
  • the terminal device 1001 can generate an animation video.
  • the animated video can also be played in the video editing interface.
  • the mp3 file can be loaded to drive the presentation of the target character.
  • the audio of the lines corresponding to the text is played synchronously, that is, the target character is controlled to read the audio of the lines corresponding to the text synchronously.
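  • A minimal sketch of the two exchanges described above (the role information acquisition request and the data conversion request that returns line audio such as an mp3 file), assuming a plain HTTP transport. The endpoint paths and payload fields are invented for illustration and are not the actual protocol.

    // Hypothetical client-side calls for the two requests described above.
    interface RoleConfig { id: string; name: string; }

    async function requestRoleConfig(serverUrl: string, roleId: string): Promise<RoleConfig> {
      const resp = await fetch(`${serverUrl}/roles/${roleId}`);   // role information acquisition request
      return (await resp.json()) as RoleConfig;
    }

    async function requestLineAudio(serverUrl: string, roleId: string, text: string): Promise<Blob> {
      // Data conversion request: the server converts text into line audio (e.g. an mp3 file) via TTS.
      const resp = await fetch(`${serverUrl}/tts`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ roleId, text }),
      });
      return await resp.blob();                                   // mp3 bytes to be loaded by the player
    }

    // Usage: fetch the character's configuration, then the line audio for its text.
    async function prepareLine(serverUrl: string, roleId: string, text: string) {
      const role = await requestRoleConfig(serverUrl, roleId);
      const mp3 = await requestLineAudio(serverUrl, roleId, text);
      console.log(`fetched ${mp3.size} bytes of line audio for ${role.name}`);
    }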
  • FIG. 2 is a schematic flow chart of a video editing method provided by an embodiment of the present application.
  • the video editing method can be executed by a computer device, and the computer device can be the terminal device 1001 in the video editing system shown in FIG. 1 .
  • the video editing method may include the following steps S201 to S203:
  • the video editing interface may be an interface running in a video client for editing videos.
  • the video editing interface can be an interface for creating a new video; for another example, the video editing interface can also be an interface for modifying and updating historical videos.
  • the video editing interface may be shown as interface S10 shown in Figure 2 .
  • the video editing interface S10 may include multiple functional items, such as the scene adding function item 101, the text adding function item 102, the storyboard adding function item 103, and so on.
  • the scene adding function item 101 is used to add a background image
  • the text adding function item 102 is used to select a character and add corresponding text to the character
  • The storyboard adding function item 103 is used to add a new storyboard, where at least one storyboard can constitute an animated video.
  • the video editing interface S10 may also include a character adding portal through which a new character can be added.
  • the video editing interface can be entered through the creation homepage of the video client.
  • Figure 3a is a schematic diagram of a scene displaying the video editing interface provided by an embodiment of the present application.
  • interface S301 is the creation homepage of the video client.
  • the creation homepage S301 is provided with a video editing portal 3011.
  • the video editing interface S302 will be displayed.
  • S202 Determine the target character and the input text in the video editing interface, where the text is presented in the form of text lines in the video editing interface, and the video editing interface supports editing the text in the text lines.
  • A target character can be selected in the video editing interface S20.
  • The target character can be 201, and then text can be input for the target character 201 in the video editing interface S20.
  • The so-called text line refers to arranging the input text in the form of one line.
  • The input text includes N characters, where N is a positive integer; these N characters are arranged in sequence from left to right, thus forming a text line. For example, the text input by the user in the video editing interface S20 is "Are you awake? Potter", so the corresponding text line is presented in the video editing interface S30 (as shown at 310).
  • A text line can also be set to have a maximum number of characters. If the input text is 20 characters and the maximum number of characters corresponding to a text line is 10 characters, then the entered text will be displayed as two text lines in the video editing interface.
  • The text input in the video editing interface is presented in the form of text lines. If the text needs to be edited (for example, by adding, deleting, or modifying content), the corresponding editing operation can be performed directly in the corresponding text line displayed in the video editing interface.
  • This text interaction method is simple and convenient to operate.
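  • The wrapping rule described above (N characters arranged from left to right, with an optional maximum number of characters per text line, so that 20 characters with a 10-character limit become two text lines) can be sketched as follows; the per-character split is an assumption.

    // Split input text into text lines of at most maxChars characters each,
    // preserving the left-to-right character order described above.
    function toTextLines(text: string, maxChars: number): string[] {
      const chars = Array.from(text);          // handles multi-byte characters one by one
      const lines: string[] = [];
      for (let i = 0; i < chars.length; i += maxChars) {
        lines.push(chars.slice(i, i + maxChars).join(""));
      }
      return lines;
    }

    // 20 characters with a 10-character limit -> two text lines.
    console.log(toTextLines("abcdefghijklmnopqrst", 10)); // ["abcdefghij", "klmnopqrst"]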
  • the target character is added to the video editing interface.
  • determining the target character in the video editing interface may include: when a character addition event is detected, triggering the addition of the target character in the video editing interface.
  • The character adding event is generated by triggering the character adding entrance; or the character adding event is generated after a character adding gesture is detected.
  • The character adding gesture includes any of the following: a single-click gesture, a double-click gesture, a hover gesture, and a preset gesture.
  • The character adding entrance is set in the video editing interface.
  • Triggering the addition of a target character in the video editing interface may include: in response to the character adding entrance being triggered, outputting a character selection panel, with at least one character identifier to be selected displayed in the character selection panel; and in response to a selection operation on the target character identifier, displaying the target character corresponding to the target character identifier in the video editing interface.
  • the entrance for adding a character can be a first-level entrance or a second-level entrance.
  • the so-called primary portal refers to the portal that can be directly displayed in the video editing interface.
  • the so-called secondary portal refers to the portal that is not directly displayed in the video editing interface, that is, the portal needs to be triggered by triggering other portals or interfaces before it can be displayed.
  • Figure 3b is a schematic diagram of an interface for determining a target role provided by an embodiment of the present application.
  • the video editing interface S302 is provided with a role adding portal 3021.
  • a role selection panel can be output.
  • the role selection panel can be a separate interface independent of the video editing interface.
  • Alternatively, the role selection panel can be in the same interface as the video editing interface.
  • the role selection panel may be a window 3031 in the interface S303. Then, the role corresponding to any role identification (for example, 3032) may be selected as the target role in the role selection panel 3031.
  • Figure 3c is a schematic diagram of another interface for determining a target role provided by an embodiment of the present application.
  • the video editing interface S302 is provided with an operation area 3022.
  • the user can draw character adding gestures in the operation area 3022. For example, drawing an "S" shaped gesture can trigger the output of the character selection panel 3031.
  • the target character is selected from historical videos.
  • The video editing interface displays multiple historical videos to be edited, and any historical video includes at least one character. Determining the target character in the video editing interface includes: selecting any character from a historical video as the target character.
  • the video editing interface S301 includes a historical video display area 3012.
  • The historical video display area 3012 displays multiple historical videos to be edited (such as work 1, work 2, and work 3). After work 1 is selected, at least one character included in work 1 (for example, character 1, character 2, and character 3) can be displayed in interface S305. For example, character 1 displayed in interface S305 may be selected as the target character.
  • The video editing interface includes a preview area. During the process of selecting the target character, the currently selected character is displayed in the preview area each time, and the character displayed in the preview area is replaced as the selection operation switches; when the target character is selected, the target character is displayed in the preview area. As shown in Figure 3c, the preview area 3033 included in the video editing interface can display the character selected each time for the user to preview.
  • Figure 3e is a schematic diagram of an interface for editing historical videos provided by an embodiment of the present application.
  • multiple historical videos (work 1, work 2, and work 3) are displayed in the video editing interface S301.
  • the menu bar for work 1 can be output, such as in the video editing interface S307.
  • the menu bar 3071 displays multiple function items such as copying function items, renaming function items, and deleting function items.
  • the copy function item can be used to copy work 1
  • the rename function item can be used to change the name of work 1
  • the delete function item can be used to delete work 1.
  • Figure 3f is a schematic diagram of an interface for setting a background provided by an embodiment of the present application.
  • the scene selection panel can be output, as shown at 3091 in the video editing page S309.
  • the scene selection panel 3091 displays at least one scene screen to be recommended for the user to freely select.
  • the scene selection panel 3091 may also include different types of scene images, such as solid color type, indoor type, outdoor type, etc.
  • the materials corresponding to the multiple character identifiers displayed on the character selection panel and the materials corresponding to the multiple scene images displayed on the scene selection panel mentioned above may be provided by a third-party platform.
  • Through the open panel design of materials, more third-party designers and developers will be able to participate in material creation in the future, so that there will be an inexhaustible variety of scenes and characters for video creators to use.
  • FIG 4 is a schematic diagram of an interface for setting gesture data provided by an embodiment of the present application.
  • an attribute editing area 4011 for the target character is displayed in the video editing interface S401.
  • The attribute editing area 4011 may include at least one attribute editing item, such as a text editing item 4012, an expression editing item 4013, and an action editing item 4014.
  • the text editing item 4012 can be clicked, and then the keyboard will pop up, and the user can input text through the keyboard.
  • the input text can be: "Are you happy today?" Then, you can also click the expression editing item 4013 to display the expression candidate column 4021.
  • Each expression can be used to control the target character to present a corresponding facial posture in the animated video, such as a facial expression.
  • Facial postures can include: happy, sad, crying, laughing, etc.
  • The action editing item 4014 can also be clicked to display the action candidate column 4031.
  • The action candidate column 4031 displays multiple actions, and each action can be used to control the target character to present a corresponding body movement in the animated video.
  • Body movements may include: lying down, waving, spinning, jumping, etc.
  • the target characters included in the animated video may refer to people, animals, objects, etc.
  • the types of the target characters may include but are not limited to: cartoons, animations, real people, etc.
  • After selecting a target character, the corresponding text can be entered for the target character. At the same time, posture data can be set for the target character through the expression function item and the action function item, to control the posture presented by the target character in the animated video.
  • The posture presented may include any one or more of facial postures and body movements.
  • In this way, the target character also shows rich facial postures and body movements while expressing the text content, improving the interest and expressiveness of the animated video.
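  • To make the relationship between a character, its text and its posture data (facial postures and body movements) concrete, one possible data shape for a script entry such as "[Open eyes] Are you awake? Potter" is sketched below. The field names are illustrative and not prescribed by the application.

    // One possible shape for the script data attached to a character in a storyboard.
    type FacialPosture = "happy" | "sad" | "crying" | "laughing" | "open eyes";
    type BodyMovement = "lying down" | "waving" | "spinning" | "jumping";

    interface PostureData {
      expression?: FacialPosture;  // set via the expression editing item
      action?: BodyMovement;       // set via the action editing item
    }

    interface ScriptEntry {
      characterId: string;
      text: string;                // becomes line audio via TTS
      postures: PostureData[];     // postures the character presents around the line
    }

    // Example corresponding to the script display area content "[Open eyes] Are you awake? Potter".
    const entry: ScriptEntry = {
      characterId: "potter",
      text: "Are you awake? Potter",
      postures: [{ expression: "open eyes" }],
    };
    console.log(entry);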
  • S203 Generate an animated video.
  • the animated video contains the target character, and audio lines corresponding to the text are set for the target character in the animated video.
  • During the playback of the animated video, when the scene containing the target character is played, the line audio corresponding to the text is played synchronously.
  • the generated animation video can be displayed in the video editing interface S30.
  • The line audio corresponding to the text is set for the target character in the animated video, so that during the playback of the animated video, when the scene containing the target character is played, the line audio is output for the target character, that is, the line audio corresponding to the text is played synchronously.
  • The visual effect achieved is that the target character is reading the lines corresponding to the line audio.
  • the animated video may include at least one character.
  • The specific process of inputting text and setting the corresponding posture data for each character can refer to the process described for the target character in the above step S202, and the embodiments of this application will not describe it again here.
  • the target character is also set with posture data.
  • the video editing interface includes a preview area and a script display area.
  • the preview area can be used to display the target character
  • the script display area can be used to display text and posture data.
  • Figure 5a is a schematic diagram of a video editing interface provided by an embodiment of the present application.
  • the video editing interface S501 may include a preview area 5011 and a script display area 5012.
  • the target character can be displayed in the preview area 5011, and the script display area 5012 can be used to display text and gesture data set for the target character.
  • The script display area 5012 may display: [Open eyes] (posture data) Are you awake? Potter (text).
  • the animation video is played, and when the scene containing the target character is played, the line audio is played. Specifically, during the playback of the line audio, the text is highlighted in the screen containing the target character.
  • highlighting includes any one or more of the following: enlarging font display, changing font color display, and displaying according to preset fonts.
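  • One way to realize the highlighting described above is to map the playback progress of the line audio to a number of characters to emphasize (by enlarging the font, changing its color, or using a preset font). The proportional mapping below is an assumption; the application only requires that the text be highlighted while the line audio plays.

    // Given playback progress (0..1) of the line audio, decide how many characters
    // of the text to highlight; the proportional mapping is an assumption.
    function highlightedPrefix(text: string, progress: number): { shown: string; rest: string } {
      const chars = Array.from(text);
      const count = Math.max(0, Math.min(chars.length, Math.round(chars.length * progress)));
      return { shown: chars.slice(0, count).join(""), rest: chars.slice(count).join("") };
    }

    // Halfway through the line audio, half of the text is emphasized
    // (e.g. rendered enlarged or in a different color by the UI layer).
    console.log(highlightedPrefix("Are you awake? Potter", 0.5));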
  • FIG. 5b is a schematic diagram of an interface for playing animated videos provided in an embodiment of the application.
  • the video editing interface S502 is also provided with a playback control 5023. Click the playback control 5023 to play the animated video in the video editing interface S503.
  • the video editing interface S503 may also be provided with a full-screen control 5031. If the user clicks the full-screen control 5031, the animation video can be switched to full-screen playback in the video editing interface, as shown in interface S504.
  • the so-called full-screen playback means that the content of the animated video is displayed in the video editing interface in full screen. In this way, the maximum preview effect can be achieved, making it convenient for the creator to check the creative details of all animated videos.
  • Figure 5c is a schematic diagram of another interface for playing animated videos provided by an embodiment of the present application.
  • a prompt window 5051 can be output.
  • Prompt text can be displayed in the prompt window 5061.
  • the prompt text can be: "The current video has not been loaded yet, please wait."
  • The prompt window 5061 is also provided with an exit control 5062 and a confirmation control 5063. If the user clicks the exit control 5062, playing the animated video can be given up; if the user clicks the confirmation control 5063, the user can continue to wait for the animated video to finish loading before previewing the animated video.
  • the target character is also set with posture data.
  • Play the animated video, and when it plays to the screen containing the target character, play the line audio and control the posture presented by the target character.
  • Playing the line audio and controlling the posture of the target character includes any of the following:
  • the line audio is played first, and after the line audio is played, the target character is controlled to present the posture; or, during the process of displaying the picture containing the target character, the target character is controlled to present the posture first, and then the line audio is played; or, at any time during the playback of the line audio, the target character is controlled to present the posture.
  • For example, the target character is controlled to first display the "open eyes" facial posture, and then the line audio "Are you awake? Potter" is played; or, during the display of the picture containing the target character, the target character is controlled to first show the body movement of "putting down the chopsticks", and then the line audio "I'm full" is played, and so on.
  • richer picture content can be presented, improving the fun and expressiveness of animated videos.
  • the target character is also set with posture data. If the number of texts is greater than 1, and one text corresponds to one line audio, then the number of line audios of the target character in the animated video is greater than 1, and the number of posture data is greater than 1, then the number of postures that the target character needs to present in the animated video is greater than 1.
  • The computer device is also used to perform the following operations: set a first output order between the texts, and play the line audio corresponding to each text in the animated video sequentially according to the first output order; set a second output order between the pieces of posture data, and control the target character in the animated video to present the postures corresponding to each piece of posture data sequentially according to the second output order; and set an associated output order between any text and at least one piece of posture data, and, during the playback of the line audio corresponding to that text in the animated video, control the target character to present the corresponding at least one posture according to the associated output order.
  • the first output sequence, the second output sequence and the associated output sequence all support dynamic adjustment.
  • The first output order between the texts can be: the order of output from left to right;
  • the second output order between the pieces of posture data can be: for adjacent facial postures and body movements, the posture data corresponding to the facial posture and the posture data corresponding to the body movement are output sequentially;
  • the associated output order can be: output the text first, then output the posture data, and so on.
  • Figure 5d is a schematic diagram of a script interface for setting the output sequence provided by an embodiment of the present application.
  • the schematic diagram of the script interface shows the text and gesture data set by the user for character A and character B.
  • the set text and gesture data can be expressed as the following three sentences of script data respectively:
  • Second sentence of script data, character B: [expression B1] [action B1] [expression B2], where [expression B1] [action B1] [expression B2] are adjacent facial postures and body movements, so they are output in sequence;
  • the corresponding screen content can be presented according to the script data mentioned above.
  • Character A reads the line audio corresponding to [Text A1], and then reads the line audio corresponding to [Text A2].
  • While character A reads the line audio corresponding to [Text A2], the serial performance begins, and the facial posture corresponding to [expression A3] is presented.
  • the first output sequence, the second output sequence and the associated output sequence all support dynamic adjustment.
  • the output sequence of the association between any text and at least one gesture data can be adjusted to: output the gesture data first, and then output the text. If the posture corresponding to the posture data includes facial posture and body movements, then the body movements can be output while the facial posture is output; or the facial posture is output first, and then the body movements are output; or the body movements are output first, and then the facial posture is output.
  • In this way, the user can control the target character to present the corresponding postures according to the set first output order, second output order and associated output order; and as the first output order, the second output order and the associated output order are adjusted, the screen content displayed in the generated animated video also differs, which can improve the fun of video editing and improve the user experience.
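  • The first output order (between texts), the second output order (between pieces of posture data) and the associated output order (between a text and its posture data) can be thought of as rules for flattening script items into one ordered playback plan. The sketch below assumes a "text first, then its postures" associated order and sequential orders elsewhere; it is only one illustrative realization, since all three orders support adjustment.

    // Flatten a character's texts and posture data into one ordered playback plan.
    type PlaybackStep =
      | { kind: "lineAudio"; character: string; text: string }
      | { kind: "posture"; character: string; posture: string };

    interface SentenceScript {
      character: string;    // the character this sentence belongs to
      text?: string;        // optional: a sentence may contain only posture data
      postures: string[];   // e.g. ["expression B1", "action B1", "expression B2"]
    }

    function buildPlaybackPlan(sentences: SentenceScript[]): PlaybackStep[] {
      const plan: PlaybackStep[] = [];
      for (const s of sentences) {                 // first output order: sentence by sentence
        if (s.text !== undefined) {
          plan.push({ kind: "lineAudio", character: s.character, text: s.text });   // associated order: text first
        }
        for (const p of s.postures) {              // second output order: postures in sequence
          plan.push({ kind: "posture", character: s.character, posture: p });
        }
      }
      return plan;
    }

    // Example loosely following the script data above.
    console.log(buildPlaybackPlan([
      { character: "A", text: "Text A1", postures: [] },
      { character: "B", postures: ["expression B1", "action B1", "expression B2"] },
      { character: "A", text: "Text A2", postures: ["expression A3"] },
    ]));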
  • the storyboard supports being sequenced.
  • the animated video includes a first storyboard and a second storyboard, and the playback order of the first storyboard is prior to the playback order of the second storyboard.
  • Display the storyboard sequencing interface, which includes the first storyboard and the second storyboard, and the first storyboard and the second storyboard are displayed in the storyboard sequencing interface in playback order. Change the arrangement position of the first storyboard and/or the second storyboard in the storyboard sequencing interface; adjust the playback order of the first storyboard or the second storyboard according to the changed arrangement position.
  • Figure 6a is a schematic diagram of an interface for storyboard sequencing provided by an embodiment of the present application.
  • an animated video is displayed in the video editing interface S601.
  • the animated video may include storyboard 1, storyboard 2 and storyboard 3, and the playback order of storyboard 1 is earlier than the playback order of storyboard 2.
  • The playback order of storyboard 2 is earlier than the playback order of storyboard 3.
  • The video editing interface S601 is provided with a storyboard sequencing control 6011. When the storyboard sequencing control 6011 is selected (for example, by single click, double click, or long press), the storyboard sequencing interface S602 is displayed.
  • The storyboard sequencing interface S602 displays multiple storyboards (storyboard 1, storyboard 2, and storyboard 3) included in the animated video, and storyboard 1, storyboard 2, and storyboard 3 are displayed sequentially in the storyboard sequencing interface in accordance with the playback order.
  • The user can drag a storyboard in the storyboard sequencing interface S602 to change the arrangement position of the dragged storyboard. For example, storyboard 1 can be dragged to the display position of storyboard 2, and storyboard 2 is then automatically displayed at the display position of storyboard 1.
  • the storyboard sequencing interface S603 may be displayed.
  • the playback order of Storyboard 1 and Storyboard 2 in the video editing interface is also adjusted accordingly. As shown in the video editing interface S604, the playback order of storyboard 2 is prior to the playback order of storyboard 1. In this way, the present application can quickly and conveniently sequence multiple storyboards included in the animated video, thereby adjusting the playback order of each storyboard in the animated video.
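  • Dragging storyboard 1 onto the position of storyboard 2, as described above, amounts to moving an element within an ordered list and letting the playback order follow the new arrangement. A minimal sketch with invented names:

    // Move a storyboard from one position to another and renumber the playback order.
    interface Storyboard { id: string; playbackIndex: number; }

    function moveStoryboard(boards: Storyboard[], from: number, to: number): Storyboard[] {
      const reordered = [...boards];
      const [dragged] = reordered.splice(from, 1);   // remove the dragged storyboard
      reordered.splice(to, 0, dragged);              // insert it at the drop position
      return reordered.map((b, i) => ({ ...b, playbackIndex: i }));  // playback order follows layout
    }

    // Dragging storyboard 1 (index 0) to the position of storyboard 2 (index 1).
    const boards = [{ id: "storyboard 1", playbackIndex: 0 },
                    { id: "storyboard 2", playbackIndex: 1 },
                    { id: "storyboard 3", playbackIndex: 2 }];
    console.log(moveStoryboard(boards, 0, 1));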
  • the storyboard supports editing.
  • the animated video contains at least one storyboard, and any storyboard supports editing. Editing includes any one or more of copying, deleting, and dynamic modification.
  • the target storyboard contains multiple video frames
  • the timeline editing panel displays the screen content of each video frame and the script data corresponding to each character in each video frame.
  • the script data includes any one or more of the following: text corresponding to the audio lines of each character, and posture data corresponding to each character.
  • Figure 6b is a schematic diagram of a storyboard editing interface provided by an embodiment of the present application.
  • The video editing interface S605 displays multiple storyboards, such as storyboard 1 and storyboard 2, and any storyboard corresponds to an editing control.
  • When the editing control 6051 for the target storyboard (for example, storyboard 1) is triggered, the storyboard editing menu bar 6061 corresponding to storyboard 1 can be displayed.
  • The storyboard editing menu bar 6061 displays an adjust-storyboard function item, a copy-storyboard function item, and a delete-storyboard function item.
  • the adjust storyboard function item is used to support the dynamic modification of the storyboard
  • the copy storyboard function item is used to support the storyboard being copied
  • the delete storyboard function item is used to support the storyboard deletion.
  • the timeline editing panel 6071 corresponding to the target storyboard (storyboard 1) can be displayed.
  • The timeline editing panel 6071 displays the picture content corresponding to each video frame included in the storyboard (for example, one video frame every 1 second), and the script data (for example, text and posture data) corresponding to character A and character B respectively.
  • The text corresponding to character A can be: "Hi~ Hurry, run!"
  • The posture data of character A can include: "happy" and "frown", as well as "waving" and "stomping legs".
  • The text corresponding to character B can be: "I can't climb, I'm too tired".
  • The posture data of character B can include: "wry smile" and "frown", as well as "lying down" and "stomping legs".
  • dynamic modification includes any of the following: position adjustment of script data corresponding to any role, time adjustment of script data corresponding to any role, and alignment processing of script data between different roles.
  • Figure 6c is a schematic diagram of a dynamically modified interface provided by an embodiment of the present application.
  • The posture data "frown" of character A can be deleted, and the time bar corresponding to the posture data "happy" (for example, the time period 0s-1s) can be dragged along the timeline 6081 to extend to the end of the entire storyboard 1 (for example, the time period 0s-4s), so that character A, which originally maintained a happy facial posture during the time period 0s-1s of storyboard 1, is modified to maintain a happy facial posture during the time period 0s-4s.
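  • The dynamic modification above (deleting the "frown" posture data and stretching the "happy" time bar from 0s-1s to 0s-4s) can be modelled as editing the time range attached to a piece of script data on the storyboard timeline. The field names below are assumptions:

    // Script data placed on a storyboard timeline, with a start/end time in seconds.
    interface TimedScriptItem { characterId: string; label: string; startSec: number; endSec: number; }

    // Remove an item (e.g. character A's "frown") from the timeline.
    function deleteItem(items: TimedScriptItem[], label: string): TimedScriptItem[] {
      return items.filter((it) => it.label !== label);
    }

    // Drag the end of an item's time bar (e.g. extend "happy" to the end of the storyboard).
    function setItemEnd(items: TimedScriptItem[], label: string, newEndSec: number): TimedScriptItem[] {
      return items.map((it) => (it.label === label ? { ...it, endSec: newEndSec } : it));
    }

    let timeline: TimedScriptItem[] = [
      { characterId: "A", label: "happy", startSec: 0, endSec: 1 },
      { characterId: "A", label: "frown", startSec: 1, endSec: 2 },
    ];
    timeline = deleteItem(timeline, "frown");
    timeline = setItemEnd(timeline, "happy", 4);   // happy now covers 0s-4s of storyboard 1
    console.log(timeline);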
  • The character corresponding to the text supports being switched. Specifically, a character switching operation is received, where the character switching operation is used to switch the target character corresponding to the text to a reference character; in response to the character switching operation, the target character corresponding to the text is replaced with the reference character.
  • The reference character is selected in the character selection panel that is displayed after the identifier of the target character is triggered; or the reference character is selected in a shortcut selector.
  • The shortcut selector is used to display N character identifiers whose selection frequency has reached a preset selection frequency within a preset time period, where N is a positive integer.
  • Figure 7a is a schematic diagram of an interface for role switching provided by an embodiment of the present application.
  • Text corresponding to each character included in the animated video is displayed in the video editing interface S701, for example, the text 7011 corresponding to the target character: "Are you awake? Potter"; then, when the text 7011 is triggered (for example, by single click, double click, or long press), the text editing box can be displayed.
  • The first shortcut entrance can be the target character's identifier 7022.
  • When the identifier 7022 is triggered, the character selection panel can be evoked, and the target character can then be replaced with the corresponding reference character by selecting it in the character selection panel. Then, in the animated video, the text "Are you awake? Potter" will be read by the reference character instead of the target character.
  • the second shortcut entrance may be a character shortcut selector 7021, wherein the character shortcut selector 7021 displays multiple character identifiers that have reached a preset selection frequency within a preset time period.
  • For example, the character shortcut selector 7021 displays the three most frequently selected character identifiers in the last week, and the user can then directly select the reference character in the character shortcut selector 7021 to replace the target character. In this way, the character corresponding to the text can be switched quickly during the process of editing the text.
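  • Switching the character bound to a line of text, whether through the character selection panel or through the shortcut selector of frequently used characters, reduces to re-pointing the text at another character identifier. In the sketch below, the shortcut selector's ranking by selection frequency is simplified to a counter; all names are illustrative.

    // A text line is bound to the character that will read its line audio.
    interface BoundText { text: string; characterId: string; }

    // Replace the target character of a text line with a reference character.
    function switchCharacter(line: BoundText, referenceCharacterId: string): BoundText {
      return { ...line, characterId: referenceCharacterId };
    }

    // A simplified shortcut selector: the most frequently selected character identifiers.
    function shortcutCandidates(selectionCounts: Map<string, number>, topN: number): string[] {
      return [...selectionCounts.entries()]
        .sort((a, b) => b[1] - a[1])
        .slice(0, topN)
        .map(([id]) => id);
    }

    const line: BoundText = { text: "Are you awake? Potter", characterId: "target-character" };
    const counts = new Map<string, number>([["character-2", 9], ["character-5", 4], ["character-3", 7]]);
    console.log(shortcutCandidates(counts, 3));                 // e.g. the three most used characters
    console.log(switchCharacter(line, "character-2"));          // the reference character now reads the line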
  • characters in animated videos are supported to be managed.
  • the video editing interface is provided with a character management control.
  • the character management interface is output.
  • The character management interface displays all characters included in the animated video and management items for each character; each character in the animated video is managed according to the management items, where the management items include a character replacement item, and the management includes character replacement; or the management items include a timbre replacement item, and the management includes changing the timbre of the character's line audio.
  • Figure 7b is a schematic diagram of a role management interface provided by an embodiment of the present application.
  • the video editing interface S704 is provided with a role management control 7041.
  • a role management interface S705 is displayed.
  • the role management interface S705 displays multiple roles, such as role 1, Role 2, Role 3, Role 4, Role 5, and Role 6.
  • Each role can correspond to a role replacement item and a timbre replacement item.
  • character 1 may correspond to a character replacement item 7051 and a timbre replacement item 7052.
  • the timbre selection panel 7061 can be output.
  • One or more timbre identifiers are displayed in the timbre selection panel 7061, such as "Crayon Shin-chan", "SpongeBob SquarePants", "Peppa Pig", "Li Taibai", "Hu Daji", "Angela" and so on.
  • the specified timbre can be selected for character 1 through the timbre selection panel 7061, so that when the animation video is played, the target character reads the corresponding line audio according to the specified timbre.
  • In this way, the target character can be changed from one type of timbre (such as a little boy's timbre) to another type of timbre (such as a funny uncle's timbre) when reading the line audio, thereby enriching the animated video, making it more interesting and enhancing the user experience.
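  • Changing the timbre of a character's line audio in the character management interface can be modelled as swapping a voice identifier that is later passed along with the text to the TTS conversion. The timbre names come from the panel description above; everything else in the sketch is an assumption.

    // Per-character voice (timbre) setting used when requesting TTS line audio.
    interface ManagedCharacter { id: string; name: string; timbreId: string; }

    function replaceTimbre(character: ManagedCharacter, newTimbreId: string): ManagedCharacter {
      return { ...character, timbreId: newTimbreId };
    }

    // Hypothetical TTS request body that carries the selected timbre along with the text.
    function ttsRequestBody(character: ManagedCharacter, text: string): string {
      return JSON.stringify({ characterId: character.id, timbreId: character.timbreId, text });
    }

    let character1: ManagedCharacter = { id: "char-1", name: "Character 1", timbreId: "little-boy" };
    character1 = replaceTimbre(character1, "funny-uncle");   // the same lines are now read in a new timbre
    console.log(ttsRequestBody(character1, "Are you awake? Potter"));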
  • the generated animation video supports being exported.
  • the video export operation includes any one or more of the following: saving to the terminal device, publishing to the creator's homepage of the animated video, and sharing to a social conversation.
  • Figure 8a is a schematic diagram of an interface for exporting animated videos provided by an embodiment of the present application.
  • the video editing interface S801 is provided with an export control 8011.
  • the animated video can be played in the preview area 8022 of the video editing interface S802, and the loading status of the animated video can be displayed.
  • the text "Video composition is 25% in progress" is displayed, which means that 25% of the animated video has been loaded, and the remaining 75% has not yet been loaded.
  • the video editing interface S802 can also be provided with a switch control 8021. If the switch control 8021 is clicked, the animated video can be saved to the terminal device synchronously after the animation video is exported.
  • After the switch control 8021 is turned on, the export control 8031 can be clicked to complete the formal export of the animated video, and the exported (published) animated video 8041 can be displayed on the creator's homepage S804.
  • the number of works can also be updated in the creator's homepage S804, for example, increasing the number of works by 1 (for example, the number of works can change from 15 to 16), which means the release of an animation video is completed.
  • the generated animation video supports being shared.
  • Figure 8b is a schematic diagram of an interface for sharing animated videos provided by an embodiment of the present application.
  • the video editing interface S805 is also provided with a sharing control 8051. If the sharing control 8051 is clicked, the video sharing interface S806 can be output.
  • The video sharing interface S806 can display multiple social sessions, such as social session 1, social session 2, and social session 3.
  • the so-called social conversations may include individual conversations and group chat conversations.
  • a separate session refers to a social session in which two social users participate, and is used for information exchange between the two social users.
  • a group chat session refers to a social session in which multiple (more than two) social users participate, and is used for information exchange among the multiple social users.
  • the user can select at least one social session for video sharing in the video sharing interface S806. For example, if social session 1 is selected (the social session 1 may be a group chat session), the social session interface S807 of the social session 1 may be displayed, and the video link of the animated video may be displayed in the social session interface S807.
  • the video link can be displayed in the dialog box 8071 in the social conversation interface S807.
  • the video link may include but is not limited to: a website address, an icon, etc., and any social user in social session 1 can trigger the video link to play the animated video in the social session interface S807.
  • the core is to design a core data layer.
  • the core data layer stores core data used to drive the generation of animated videos.
  • The core data may include but is not limited to: the text input for the target character, the posture data set for the target character, and so on.
  • The UI interface can include but is not limited to: the UI interface corresponding to the preview area, the UI interface corresponding to the script display area, the storyboard sequencing interface, and the timeline editing panel.
  • The embodiments of this application are based on a one-way data driving method to ensure that the UI interface of the preview area and the UI interface of the script display area are updated synchronously in real time.
  • FIG. 9a is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
  • the synchronous updates of the UI interface in the preview area and the UI interface in the script display area are driven by the way the Data Store (data storage layer) manages the stored core data.
  • each UI interface will have core data corresponding to the UI interface, and then by calculating the core data of each UI interface, the UIState value (interface state value) corresponding to each UI interface can be obtained.
  • Different UI interfaces can then refresh themselves based on their corresponding UIState values.
  • the structure of the terminal device may include but is not limited to the following modules: preview area, data conversion layer, data storage layer, interface layer, basic component data manager, and basic component observer. Next, each of the above modules will be introduced respectively:
  • Preview area: can be used to preview animated videos. Specifically, it can be used to preview the target character included in the animated video. In one possible implementation, if the target character is also set with posture data, the facial postures or body movements of the target character, etc., can also be previewed in the preview area.
  • Data conversion layer: can be used to obtain new data from timeline text interface controls (such as the keyboard) and timeline icon interface controls (such as the scene selection panel and the character selection panel), and can perform data conversion based on the obtained new data to obtain scene interface data and character interface data.
  • the scene interface data can be generated after the user selects the scene screen in the scene selection panel;
  • The character interface data can include: the target character selected by the user in the character selection panel (such as the identifier of the target character), the timbre selected for the target character (such as the funny uncle timbre), and the posture data set for the target character (such as data corresponding to facial postures and/or body movements).
  • Data storage layer: used to receive the data converted by the data conversion layer, such as the scene interface data and character interface data mentioned above.
  • These data can be obtained from the corresponding UI interfaces respectively.
  • For example, the character interface data can be obtained from the UI interface corresponding to the preview area, and the calculation processor can then calculate, based on the character interface data, the UIState value of the UI interface corresponding to the preview area, in order to update the UI interface corresponding to the preview area.
  • Interface layer: used to obtain (subscribe to) the corresponding interface data (such as scene interface data and character interface data) from the data storage layer, and update and display the timeline editing panel based on the obtained interface data; or, when a corresponding operation (such as dragging the timeline) is performed in the video editing interface, a callback instruction is sent to the data storage layer to notify the data storage layer to update the core data based on the corresponding operation.
  • Basic component observer: used to observe or monitor the data-addition and data-conversion processes, and to feed the monitoring results back to the data conversion layer.
  • Basic component data manager: used to receive the scene interface data, character interface data, etc. sent by the data conversion layer, so that the preview area displays the animated video based on these data (scene interface data and character interface data).
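  • To make the role of the data conversion layer concrete, the following sketch shows a selection made in the character selection panel being converted into character interface data before it is handed to the data storage layer. The names (CharacterInterfaceData, convertCharacterSelection) and the example identifiers are assumptions, not part of the embodiment.

```typescript
// Sketch of the data conversion layer: panel selections become character interface data.
interface CharacterInterfaceData {
  characterId: string;                                  // identifier of the selected target character
  timbreId?: string;                                    // timbre chosen for the character, if any
  posture?: { expression?: string; action?: string };   // facial posture / body movement
}

function convertCharacterSelection(
  selectedCharacterId: string,
  selectedTimbreId?: string,
  expression?: string,
  action?: string,
): CharacterInterfaceData {
  return {
    characterId: selectedCharacterId,
    timbreId: selectedTimbreId,
    posture: expression || action ? { expression, action } : undefined,
  };
}

// Example: the user picks a character, the "funny uncle" timbre and a "happy" expression.
const data = convertCharacterSelection("porter", "funny-uncle", "happy");
```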
  • In this way, a core data layer is designed on the terminal device side, and one-way data driving ensures the synchronous update of multiple UI interfaces. For example, if the user adds a body movement for the target character in the script display area, the target character can be synchronously controlled in the preview area to show that body movement; and if the user adds a facial gesture (such as a happy expression) to the target character in the script display area, the target character can be synchronously controlled in the preview area to show a happy expression.
  • Figure 9b is a schematic flow chart of one-way data driving provided by an embodiment of the present application.
  • data one-way driving mainly involves the interaction between the core data layer (Data Store) in the terminal device, the script display area, the renderer (the processor of the preview area), and the server.
  • The above interaction process can include the following steps:
  • In the character-determination flow, a character information acquisition request is sent to the server.
  • The character information acquisition request is used to request the character list configuration information (which may include, for example, at least one character identifier, name, and other information).
  • The server (i.e., the backend) pulls the character list configuration information and returns it to the terminal device.
  • The character list configuration information is displayed on the character selection panel, including at least one character identifier and the name corresponding to each character.
  • The user clicks a target character identifier in the character selection panel, and a request for the target character's configuration information (i.e., the character material) is sent to the server.
  • The server responds to this request, obtains (pulls) the configuration information of the target character, that is, the character's material, and sends it to the terminal device.
  • The terminal device receives the configuration information of the target character and notifies the script display area that the target character has been determined; specifically, after the material has been pulled back, the character is marked as selected.
  • In one embodiment, the Spine animation and other related files of the target character can be downloaded based on the character list configuration information, and the target character's selection box can be displayed in the character selection panel.
  • the script display area can send a data modification request to the core data layer.
  • the core data layer responds to the data modification request sent by the script display area, updates the core data stored in the core data layer, and obtains the updated core data.
  • the updated core data flows in one direction, thus driving the recalculation of various interface state values (such as the UIState value of the preview area and the UIState value of the script display area). Specifically, all registered processors can be traversed to calculate the interface state value.
  • the so-called one-way flow means that the core data layer can send updated core data to the script display area and preview area.
  • the script display area uses the corresponding processor (such as the script list processor) to calculate its own UIState value based on the updated core data.
  • the preview area uses the corresponding processor (such as a renderer) to calculate Diff data (difference data) based on the updated core data.
  • The difference data is calculated from the core data before the update and the core data after the update.
  • After the UIState value corresponding to the script display area and the UIState value corresponding to the preview area have been calculated, each UI interface is driven to refresh itself based on its calculated UIState value.
  • the script display area refreshes the UI interface based on the UIState value corresponding to the script display area.
  • the preview area refreshes the UI interface based on the UIState value corresponding to the preview area, that is, the target character can be displayed in the preview area.
  • the renderer adds a new character node based on the diff data, and displays the currently newly added character in the renderer.
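  • The one-way flow in these steps can be summarized by the following sketch. The names (CoreDataLayer, register, modify) are illustrative assumptions; the point is that a single modification entry point recomputes every registered processor, and the renderer works only from diff data.

```typescript
// Sketch of the one-way flow after the target character is selected.
interface CoreData { characterIds: string[] }

type Processor = (before: CoreData, after: CoreData) => void;

class CoreDataLayer {
  private core: CoreData = { characterIds: [] };
  private processors: Processor[] = [];

  register(processor: Processor): void {
    this.processors.push(processor);
  }

  // Handles a data modification request from the script display area.
  modify(mutate: (core: CoreData) => CoreData): void {
    const before = this.core;
    this.core = mutate(before);
    // One-way flow: traverse every registered processor with the updated data.
    this.processors.forEach((p) => p(before, this.core));
  }
}

const store = new CoreDataLayer();

// Script list processor: recomputes its UIState from the updated core data.
store.register((_before, after) => {
  console.log("script UIState rows:", after.characterIds.length);
});

// Renderer: computes diff data and adds a node for each newly added character.
store.register((before, after) => {
  const added = after.characterIds.filter((id) => !before.characterIds.includes(id));
  added.forEach((id) => console.log("render new character node:", id));
});

store.modify((core) => ({ characterIds: [...core.characterIds, "porter"] }));
```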
  • In the text-input flow, the user taps the character's assist-box bubble; for example, refer to the attribute editing area 4011 in interface S401 in Figure 4.
  • the user can click the text editing item 4012 in the attribute editing area 4011.
  • After the text editing operation is detected in the script display area, a data modification request can be sent to the core data layer to request that the core data be updated again.
  • the core data layer triggers loading and updates the core data, and obtains the updated core data again.
  • the updated core data flows in one direction, thus driving the recalculation of various interface state values (such as the UIState value of the preview area and the UIState value of the script display area).
  • the script display area uses the corresponding processor (such as the script list processor) to calculate its own UIState value based on the updated core data.
  • the script display area displays the entered text (for example, "Are you awake? Porter" shown in interface S501 in Figure 5a).
  • The server responds to the text conversion request, interacts with the TTS service, uses TTS technology to convert the text input by the user into line audio (such as an mp3 file), and sends the corresponding line-audio mp3 file and the text to the renderer.
  • The renderer calculates diff data (difference data) based on the core data updated again, and waits for the server's TTS conversion to complete; after it finishes, the renderer can receive the line-audio mp3 file and the text sent by the server.
  • the renderer renders the line component based on the received text and the corresponding line audio mp3 file, and displays the text-rendered interface in the preview area in the renderer. For example, as shown in interface S502 in Figure 5b, when the user subsequently triggers the playback of the animated video, the target character can be presented to the user in area 5021, and the target character can be controlled through the playback control 5023 to read the audio of the lines corresponding to the text.
  • In this way, each UI interface works from the updated core data in the core data layer, and one-way data driving ensures the synchronous update of each UI interface (such as the UI interface corresponding to the script display area and the UI interface corresponding to the preview area).
  • In addition, the server uses non-blocking conversion for the TTS text-to-speech operation; so-called non-blocking conversion does not interrupt the user's video editing process. For example, while the user is still inputting text, the corresponding line audio (mp3 file) is pulled from the server asynchronously, and the renderer is notified to load it so that the user can preview it in real time in the preview area.
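  • A minimal sketch of the non-blocking conversion is shown below; the /tts/convert endpoint and the Renderer interface are hypothetical names chosen for illustration, not the embodiment's actual API. The editing flow continues while the line audio is fetched asynchronously.

```typescript
// Sketch of non-blocking TTS: editing is not interrupted while the audio is fetched.
interface Renderer {
  loadLineAudio(text: string, audio: ArrayBuffer): void;
}

async function requestTtsAudio(text: string): Promise<ArrayBuffer> {
  const response = await fetch("/tts/convert", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
  return response.arrayBuffer();          // e.g. the bytes of an mp3 file
}

function onTextCommitted(text: string, renderer: Renderer): void {
  // Fire-and-forget: the editing flow is not blocked by the conversion.
  requestTtsAudio(text)
    .then((audio) => renderer.loadLineAudio(text, audio))
    .catch((err) => console.warn("TTS conversion failed, will retry on playback", err));
}
```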
  • Figure 9c is a schematic structural diagram of core data provided by an embodiment of the present application.
  • As shown in Figure 9c, the structure of the core data can be split hierarchically according to the product form; specifically, it can be split into multiple levels such as Project (project), StoryBoard (storyboard), Statement (statement/text), and Component (component).
  • The project level is defined with respect to the animated video as a whole, e.g. adding a storyboard or deleting a storyboard; the project data corresponding to the project can include a project status, which is used to reflect the status of the animated video.
  • the status of the animated video can be playing state or paused state, etc.
  • At the storyboard level, a character can be added under a storyboard, and a piece of text can be added or deleted; the storyboard data corresponding to a storyboard can include, but is not limited to: the character list (the multiple character identifiers displayed in the character selection panel), the timbre list (the multiple timbre identifiers displayed in the timbre selection panel), and so on.
  • For a statement under a storyboard (a "statement" may also be referred to as a "text" in later embodiments), the statement data can include: character statements (the text set for the target character), background statements (which may be default text), sound-effect statements, and so on.
  • The above-mentioned components may include, but are not limited to: character line components (used to control the target character reading the line audio corresponding to the text), character components (used to display the target character), character expression components (used to control the target character showing facial gestures), music components (used to control the target character speaking with the set timbre), and so on.
  • After the core data has been split into these levels, the data at each level can accurately express the user's operations in different dimensions, for example, adding a character (which may be called the target character) or adding a statement within a storyboard; or, as another example, setting posture data for the target character within a storyboard (posture data corresponding to facial postures and body movements). In this way, more refined levels can support more polished creative results for the animated video.
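  • The hierarchical split can be expressed, for example, as the following TypeScript shapes. The field names are assumptions chosen to mirror the levels described above.

```typescript
// Sketch of the hierarchical core data split: Project -> StoryBoard -> Statement -> Component.
type ProjectStatus = "playing" | "paused";

interface Component {
  kind: "characterLine" | "character" | "expression" | "music";
  payload: Record<string, unknown>;      // e.g. audio URL, expression id, timbre id
}

interface Statement {
  type: "character" | "background" | "soundEffect";
  characterId?: string;                  // present for character statements
  text?: string;                         // the text line entered by the user
  components: Component[];
}

interface StoryBoard {
  characterIds: string[];                // character list of this storyboard
  timbreIds: string[];                   // timbre list of this storyboard
  statements: Statement[];
}

interface Project {
  status: ProjectStatus;                 // reflects the status of the animated video
  storyboards: StoryBoard[];
}
```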
  • Based on this core data structure, the embodiment of the present application mainly adopts a doubly linked list data management solution to handle text editing operations that users perform frequently (such as modifying, adding, or deleting text), order adjustment operations (such as adjusting the order of at least one storyboard), and insertion operations (such as adding a character under a storyboard, inserting a piece of text, or setting posture data for a character), and so on.
  • A doubly linked list is a two-way linked list in which each data node includes two pointers (a start pointer and an end pointer); these two pointers can also be referred to as the head and tail pointers.
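  • A minimal doubly linked list sketch for text nodes is given below; the class names and fields are illustrative rather than the embodiment's own.

```typescript
// Minimal doubly linked list of text nodes with head and tail pointers.
class TextNode {
  prev: TextNode | null = null;          // "start"/head-direction pointer
  next: TextNode | null = null;          // "end"/tail-direction pointer
  constructor(public characterId: string, public text: string, public setTime: number) {}
}

class TextList {
  head: TextNode | null = null;
  tail: TextNode | null = null;

  // Append a node at the tail, e.g. when a new line of text is added to a storyboard.
  append(node: TextNode): void {
    if (!this.tail) {
      this.head = this.tail = node;
      return;
    }
    node.prev = this.tail;
    this.tail.next = node;
    this.tail = node;
  }

  // Unlink a node in O(1), e.g. when a line of text is deleted.
  remove(node: TextNode): void {
    if (node.prev) node.prev.next = node.next; else this.head = node.next;
    if (node.next) node.next.prev = node.prev; else this.tail = node.prev;
    node.prev = node.next = null;
  }
}
```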
  • Figure 10a is a schematic flowchart of an operation case provided by an embodiment of the present application. As shown in Figure 10a, this case mainly includes the following steps:
  • Figure 10b is a schematic flow chart of another operation case provided by the embodiment of the present application. As shown in Figure 10b, this case mainly includes the following steps:
  • Figure 10c is a schematic flow chart of another operation case provided by the embodiment of the present application. As shown in Figure 10c, this case mainly includes the following steps:
  • Figure 10d is a schematic flow chart of another operation case provided by the embodiment of the present application. As shown in Figure 10d, this case mainly includes the following steps:
  • Figure 10e is a schematic flow chart of another operation case provided by the embodiment of the present application. As shown in Figure 10e, this case mainly includes the following steps:
  • Figure 10f is a schematic flow chart of another operation case provided by the embodiment of the present application. As shown in Figure 10f, this case mainly includes the following steps:
  • Figure 10g is a schematic flow chart of another operation case provided by the embodiment of the present application. As shown in Figure 10g, this case mainly includes the following steps:
  • FIG. 11a is a schematic flow chart of a text editing operation provided by an embodiment of the present application.
  • the text editing operation may specifically include the following steps S11-S16:
  • FIG. 11b is a schematic flow chart of another text editing operation provided by an embodiment of the present application.
  • the text editing operation may specifically include the following steps S21-S27:
  • Figure 11c is a schematic flow chart of another text editing operation provided by an embodiment of the present application.
  • the text editing operation may specifically include the following steps S31-S38:
  • The character text linked list is reordered according to the set time of each text. For example, if the set time of text A is 10:00 and the set time of text B is 10:01, then text A can be arranged before text B in the text doubly linked list.
  • In summary, while text is edited, the main-timeline doubly linked list and each character's own doubly linked list are maintained simultaneously.
  • The main-timeline doubly linked list is relied upon to obtain an ordered list of all texts in the animated video, arranged from top to bottom in chronological order.
  • In addition, relying on a character's doubly linked list, the texts belonging to that particular character can be obtained quickly. This multi-dimensional data linked list approach makes it easy to meet the business requirement of displaying text in different business scenarios.
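  • The ordering rule can be sketched as follows, assuming node and list shapes with head/tail pointers as described above; insertByTime is a hypothetical helper, not part of the embodiment.

```typescript
// Sketch of inserting a text node into a character's doubly linked list by set time.
interface LinkedTextNode {
  prev: LinkedTextNode | null;
  next: LinkedTextNode | null;
  setTime: number;                        // when the text was set, e.g. a millisecond timestamp
  text: string;
}

interface LinkedTextList { head: LinkedTextNode | null; tail: LinkedTextNode | null }

function insertByTime(list: LinkedTextList, node: LinkedTextNode): void {
  let cursor = list.head;
  // Walk forward until the first node whose set time is later than the new node's.
  while (cursor && cursor.setTime <= node.setTime) cursor = cursor.next;
  if (!cursor) {
    // Latest set time: the node becomes the new tail.
    node.prev = list.tail;
    if (list.tail) list.tail.next = node; else list.head = node;
    list.tail = node;
    return;
  }
  // Splice the node in immediately before the cursor.
  node.prev = cursor.prev;
  node.next = cursor;
  if (cursor.prev) cursor.prev.next = node; else list.head = node;
  cursor.prev = node;
}
```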
  • the above-mentioned script display interface refers to the interface corresponding to the script display area (script display area 5012 shown in Figure 5a), and the script display interface supports editing.
  • Figure 12 is a technical architecture diagram of a script display area provided by an embodiment of the present application.
  • The technical architecture diagram mainly includes: a storyboard list interface layer (which displays a storyboard list containing at least one storyboard; the script display area 5012 shown in Figure 5a displays storyboard 1, storyboard 2, and storyboard 3) and a data layer (which may include a business data layer, a data processing layer, and a public data layer).
  • The storyboard list interface layer and the data layer correspond one-to-one: the storyboard list view corresponds one-to-one with the storyboard data list, an empty container corresponds one-to-one with storyboard data, the script list view corresponds one-to-one with the script data list, and a script view unit corresponds one-to-one with script data. In this way, a tree structure is formed along the dependency chain:
  • the storyboard list view is composed of storyboard view units
  • the storyboard view unit includes two types of containers, one is an empty container, the so-called empty container refers to the container corresponding to the initialized empty storyboard data; the other is a container containing the script list view;
  • the script list view is a view that supports scrolling and is composed of individual script view units
  • Script view units can be divided into multiple types, specifically related to the type of script data they are bound to, such as character scripts, narration scripts, and music scripts.
  • the data layer is divided into three parts:
  • a. Business data layer: strongly related to the storyboard list interface layer, it provides the data source for the script display interface. Here, being strongly related to the storyboard list interface layer means that the business data layer contains many data attributes of that interface layer, such as the data corresponding to highlighted text;
  • b. Data processing layer: used to handle the conversion of data between the business data layer and the public data layer;
  • c. Public data layer: used to store the core data, as well as the data shared among the multiple modules (the business data layer, the data processing layer, and the public data layer).
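  • As a sketch of the data processing layer's conversion step, the following assumes simple shapes for the public-layer statements and the business-layer rows (including a highlighted attribute); all names are illustrative.

```typescript
// Sketch of converting public-layer core data into business-layer view data.
interface PublicStatement { characterId: string; text: string }
interface PublicLayer { statements: PublicStatement[] }

interface BusinessRow {
  characterId: string;
  text: string;
  highlighted: boolean;                   // interface-layer attribute, e.g. the line being read
}

function toBusinessData(publicLayer: PublicLayer, highlightedIndex: number): BusinessRow[] {
  return publicLayer.statements.map((s, i) => ({
    characterId: s.characterId,
    text: s.text,
    highlighted: i === highlightedIndex,
  }));
}
```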
  • the timeline editing panel mainly involves the following contents:
  • Displaying the timeline panel list: for example, the text and posture data of each character are displayed in the timeline editing panel and presented to the user in the form of a list.
  • As shown in Figure 6c, the timeline editing panel displays the text and posture data of character A and the text and posture data of character B, and these data can be presented in the timeline editing panel in the form of a list.
  • UI operations on the timeline editing panel: these can specifically include add, delete, and modify operations performed on the timeline editing panel.
  • For example, in the timeline editing panel shown in interface S608 in Figure 6c, the text corresponding to character B, "I can't climb, I'm too tired", is modified to the text "Okay, let's work together" shown in interface S609; as another example, the posture data "frown" corresponding to character A is deleted in the timeline editing panel shown in interface S608 in Figure 6c, and so on.
  • The timeline (timeline 6081 shown in Figure 6c) shows the total duration of the current storyboard; for example, the total duration of storyboard 1 is 4 seconds.
  • Linkage notification between the timeline and the bottom bar (horizontal bar): for example, if the background music of storyboard 1 is stretched from the 1s-2s range of the timeline to the 1s-4s range, the horizontal bar that displays the background music is adjusted adaptively as well, so that the timeline and the background-music bar are adjusted in linkage.
  • The storyboard data (which can include the data determined after the detection and callback described above) is updated in real time, and the renderer is notified for data linkage. Specifically, after it is detected that a corresponding operation has been performed on the timeline editing panel of the current storyboard, the renderer is notified to update the screen content of the preview area in real time, so that after the user performs an editing operation in the timeline editing panel, the corresponding screen content is updated synchronously in the preview area, achieving a data-linkage effect.
  • Figure 13a is a schematic flow chart of timeline editing provided by an embodiment of the present application.
  • As shown in Figure 13a, when an editing operation (addition, modification, deletion, etc.) is detected in the timeline editing panel (i.e., the timeline editing area), an internal event of the timeline UI can be generated.
  • This internal timeline-UI event is notified to the middleware through the event creator, so that the middleware sends the internal timeline-UI event to the calculation processor.
  • The calculation processor responds to the internal timeline-UI event, calculates the state value corresponding to that event, and writes the calculated state value into the store; after the new state value has been written, state-notification management can call the calculated state value (including the timeline state value TimelineViewState) back to the timeline editing area (TimelineViewArea), thereby driving the timeline editing panel to update its state.
  • Figure 13b is a schematic flow chart of another timeline editing provided by an embodiment of the present application.
  • As shown in Figure 13b, after the user performs an editing operation in the timeline editing panel, a click event can be generated.
  • The click event can be sent by the business party, as a selected-edit event, to the timeline middleware for forwarding, so that the timeline calculation processor and the menu calculation processor compute the timeline state and the menu state respectively.
  • The state computed by each calculation processor is then used to notify the corresponding UI interface to change. Specifically, after the timeline calculation processor computes the timeline state, it notifies the timeline view to be modified (the UI interface corresponding to the timeline editing panel is updated); and after the menu calculation processor computes the menu state, it notifies the menu view to be modified.
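  • The event flow can be sketched as follows, with assumed names (EditEvent, StateStore, TimelineMiddleware); the timeline and menu calculation processors are modelled as functions registered with the middleware purely for illustration.

```typescript
// Sketch of the timeline editing flow: a click event is forwarded by the middleware to
// registered calculation processors, whose computed states are written to a store that
// then notifies the corresponding views to update.
interface EditEvent { kind: "add" | "modify" | "delete"; target: string }

type CalcProcessor = (event: EditEvent) => { key: string; state: unknown };

class StateStore {
  private listeners: ((key: string, state: unknown) => void)[] = [];
  onChange(l: (key: string, state: unknown) => void): void { this.listeners.push(l); }
  write(key: string, state: unknown): void {
    this.listeners.forEach((fn) => fn(key, state)); // state-notification management
  }
}

class TimelineMiddleware {
  private processors: CalcProcessor[] = [];
  constructor(private store: StateStore) {}
  register(p: CalcProcessor): void { this.processors.push(p); }
  dispatch(event: EditEvent): void {
    this.processors.forEach((p) => {
      const { key, state } = p(event);
      this.store.write(key, state);
    });
  }
}

const stateStore = new StateStore();
stateStore.onChange((key, state) => console.log("refresh view:", key, state));

const middleware = new TimelineMiddleware(stateStore);
middleware.register((e) => ({ key: "TimelineViewState", state: { lastEdit: e.kind, target: e.target } }));
middleware.register((e) => ({ key: "MenuViewState", state: { visible: e.kind === "modify" } }));
middleware.dispatch({ kind: "modify", target: "characterB-text" });
```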
  • the process of performing video editing mainly involves "determining the target character and inputting text in the video editing interface", where the determined target character, input text and other related materials need to be obtained from the server side. Therefore, the embodiment of this application proposes an architecture diagram of material management based on the management of related materials.
  • Figure 14a is a schematic diagram of an architecture of material management provided by an embodiment of the present application.
  • the user (client object) on the terminal device side can perform a video editing operation in the video editing interface displayed by the video client (for example, StoryVerse APP).
  • The video editing operation may specifically include: determining the target character, entering a piece of text, setting posture data for the target character, and so on.
  • Based on the client object's video editing operation, the video client can pull the scene classification list, scene list, character classification list, character list, expression list, and action list, and send these data to the video access layer, so that the video access layer forwards them to the material module (such as the material module dao layer) for material management.
  • In addition, the management object can send scene materials (used to provide one or more scene images), skeleton materials (used to provide the Spine animation files corresponding to one or more characters), character materials (used to provide one or more characters), expression materials (used to provide one or more facial gestures), action materials (used to provide one or more body movements), timbre types (used to provide one or more timbres), and so on, to the material management end for material management.
  • Finally, the material module dao layer can perform unified material management over the related materials involved on the material management end and in the material module, store them locally in a localcache, and write them into the material management library.
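  • A possible caching shape for the pulled material lists is sketched below; the /material/list endpoint and the MaterialItem fields are assumptions, not the embodiment's actual interface.

```typescript
// Sketch of pulling a material list once and serving later requests from a local cache.
interface MaterialItem { id: string; name: string; url: string }

const localCache = new Map<string, MaterialItem[]>();

async function fetchMaterialList(kind: "scene" | "character" | "expression" | "action")
  : Promise<MaterialItem[]> {
  const cached = localCache.get(kind);
  if (cached) return cached;                       // serve from the local cache

  const response = await fetch(`/material/list?kind=${kind}`);
  const items: MaterialItem[] = await response.json();
  localCache.set(kind, items);                     // write through to the cache
  return items;
}
```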
  • Figure 14b is a schematic structural diagram of a material business model provided by an embodiment of the present application.
  • In order to let the target character conveniently change roles and switch actions, the embodiment of the present application can adopt a "plug-and-pull" idea.
  • The so-called plug-and-pull idea means that, based on the basic skeleton corresponding to the target character (which can be provided by the skeleton material), corresponding costume accessories can be attached to the skeleton, or a new bone can be added; these newly added bones and costume accessories can likewise be "removed" from the current skeleton.
  • In other words, the plug-and-pull idea mainly uses splicing to make it convenient to add and delete bones and costume accessories.
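  • The plug-and-pull splicing can be sketched as follows; Skeleton, plug and pull are illustrative names for the idea of attaching accessories to, and detaching them from, a base skeleton.

```typescript
// Sketch of plug-and-pull: attachments are spliced onto a base skeleton and can be removed
// again without touching the rest of the skeleton.
interface Attachment { id: string; slot: string }    // e.g. a costume accessory on a bone slot

class Skeleton {
  private attachments = new Map<string, Attachment>();
  constructor(public baseId: string) {}

  plug(attachment: Attachment): void {               // add a bone/costume accessory
    this.attachments.set(attachment.id, attachment);
  }

  pull(attachmentId: string): void {                 // "remove" it from the current skeleton
    this.attachments.delete(attachmentId);
  }

  list(): Attachment[] {
    return [...this.attachments.values()];
  }
}

const base = new Skeleton("target-character-base");
base.plug({ id: "hat-01", slot: "head" });
base.pull("hat-01");                                 // role/outfit switches reuse the same base
```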
  • Figure 14c is an abstract schematic diagram of a material business model provided by an embodiment of the present application.
  • the specific processes involved may include the following:
  • resource types may include but are not limited to: characters, skeletons, actions (body movements), expressions (facial gestures), music, and timbres.
  • Each resource type supports further classification: for example, the scene type supports classification into daily and indoor; as another example, the character type supports classification into Bear Haunting and Douluo Continent. Each resource type supports a tree structure during the classification process.
  • Card: belongs to a resource classification and represents a specific resource instance.
  • For example, resource instances corresponding to character resources can include: Meow, Bear, Crayon Shin-chan, etc.; as another example, resource instances corresponding to expression resources can include: cry, laugh. Cards also need to support a tree structure.
  • Skeleton: for example, it can include character materials, action materials, expression materials, etc., where the action materials provide the resources used when the target character presents body movements, and the expression materials provide the resources used when the target character presents facial postures and expressions.
  • Material: the specific material resources that the video client needs to use; the material resources can be, for example, images or compressed packages (such as zip or rar packages).
  • the embodiment of the present application does not specifically limit the data format of the material resources.
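  • The tree-structured classification of resource types and cards could be represented, for example, as below; the node fields and the "series-A"/"series-B" sub-classifications are placeholders, not classifications used by the embodiment.

```typescript
// Sketch of a tree-structured resource classification with cards as concrete instances.
interface ResourceNode {
  name: string;                           // e.g. "character", "daily", "indoor"
  children: ResourceNode[];               // sub-classifications, forming a tree
  cards: string[];                        // concrete resource instances under this node
}

const characterType: ResourceNode = {
  name: "character",
  cards: [],
  children: [
    { name: "series-A", children: [], cards: ["Meow", "Bear"] },
    { name: "series-B", children: [], cards: ["Crayon Shin-chan"] },
  ],
};

// Walking the tree collects every card under a resource type.
function collectCards(node: ResourceNode): string[] {
  return [...node.cards, ...node.children.flatMap(collectCards)];
}
```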
  • FIG. 15 is a schematic structural diagram of a video editing device provided by an embodiment of the present application.
  • the video editing device 1500 may be a computer program (including program code) running in a computer device.
  • For example, the video editing device 1500 may be application software; the video editing device 1500 may be used to perform the corresponding steps in the methods provided by the embodiments of the present application.
  • the video editing device 1500 may include:
  • Display unit 1501 used to display the video editing interface
  • The processing unit 1502 is configured to determine the target character and the input text in the video editing interface, where the text is presented in the form of text lines in the video editing interface, and the video editing interface supports editing of the text in the text lines;
  • The processing unit 1502 is also used to generate an animated video that contains the target character, and to set the line audio corresponding to the text for the target character in the animated video, wherein, during playback of the animated video, when a screen containing the target character is played, the line audio corresponding to the text is played synchronously.
  • For the operations performed by the processing unit 1502, reference may be made to the above method embodiments; details are not repeated here.
  • FIG. 16 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • the computer device 1600 is used to perform the steps performed by the terminal device or the server in the foregoing method embodiments.
  • the computer device 1600 includes: at least one processor 1610; at least one input device 1620, at least one output device 1630 and a memory 1640.
  • the above-mentioned processor 1610, input device 1620, output device 1630 and memory 1640 are connected through a bus 1650.
  • the memory 1640 is used to store computer programs, which include program instructions.
  • the processor 1610 is used to call the program instructions stored in the memory 1640 to perform various operations described in the above embodiments.
  • the embodiment of the present application also provides a computer storage medium, and a computer program is stored in the computer storage medium, and the computer program includes program instructions.
  • When a processor executes the above program instructions, it can perform the methods in the corresponding embodiments described above, so the details are not repeated here.
  • program instructions may be deployed on one computer device, or executed on multiple computer devices located at one location, or on multiple computer devices distributed across multiple locations and interconnected by a communications network.
  • A computer program product includes a computer program. After a processor of a computer device reads the computer program from the computer program product, the processor can execute the computer program, so that the computer device can perform the methods in the corresponding embodiments described above; the details are therefore not repeated here.
  • the above programs can be stored in computer-readable storage media.
  • When the programs are executed, they may include the processes of the above method embodiments.
  • the above-mentioned storage media can be a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM) or a random access memory (Random Access Memory, RAM), etc.

Abstract

本申请提出一种视频编辑方法、装置、计算机设备、存储介质及产品,该视频编辑方法包括:显示视频编辑界面;在所述视频编辑界面中确定目标角色及输入的文本,其中,所述文本在所述视频编辑界面中以文本行的方式呈现,且所述视频编辑界面支持对所述文本行中的文本进行编辑;及,生成动画视频,所述动画视频包含所述目标角色,且在所述动画视频中为所述目标角色设置所述文本对应的台词音频;其中,在所述动画视频的播放过程中,当播放至包含所述目标角色的画面时,同步播放所述文本对应的台词音频。

Description

视频编辑方法、装置、计算机设备、存储介质及产品
本申请要求于2022年5月30日提交中国专利局、申请号为202210603765.9、申请名称为“视频编辑方法、装置、计算机设备、存储介质及产品”的中国专利申请的优先权。
技术领域
本申请涉及计算机技术领域,具体涉及一种视频编辑方法、一种视频编辑装置、一种计算机设备、一种计算机可读存储介质及一种计算机程序产品。
发明背景
随着计算机技术的不断发展,各式各样的视频遍布于人们日常生活中的方方面面。因此,视频的创作与编辑也成为视频领域的一个热门研究话题。目前,视频编辑时通常需要在时间轴上将角色与对应的台词文本进行手动对齐,操作过于繁琐,导致视频编辑的效率较低。因此,如何提高视频编辑的效率是当前亟待解决的一个技术问题。
发明内容
本申请实施例提供了一种视频编辑方法、装置、设备及计算机可读存储介质,可以便捷的生成动画视频,从而提高视频编辑的效率。
一方面,本申请实施例提供了一种视频编辑方法,由计算机设备执行,该方法包括:
显示视频编辑界面;
在所述视频编辑界面中确定目标角色及输入的文本,其中,所述文本在所述视频编辑界面中以文本行的方式呈现,且所述视频编辑界面支持对所述文本行中的文本进行编辑;及,
生成动画视频,所述动画视频包含所述目标角色,且在所述动画视频中为所述目标角色设置所述文本对应的台词音频;
其中,在所述动画视频的播放过程中,当播放至包含所述目标角色的画面时,同步播放所述文本对应的台词音频。
另一方面,本申请实施例提供了一种视频编辑装置,该装置包括:
显示单元,用于显示视频编辑界面;
处理单元,用于在所述视频编辑界面中确定目标角色及输入的文本,其中,所述文本在所述视频编辑界面中以文本行的方式呈现,且所述视频编辑界面支持对所述文本行中的文本进行编辑;
所述处理单元,还用于生成动画视频,所述动画视频包含所述目标角色,且在所述动画视频中为所述目标角色设置所述文本对应的台词音频;
其中,在所述动画视频的播放过程中,当播放至包含所述目标角色的画面时,同步播放所述文本对应的台词音频。
另一方面,本申请实施例提供一种计算机设备,该计算机设备包括存储器和处理器,存储器存储有计算机程序,计算机程序被处理器执行时,使得处理器执行上述的视频编辑方法。
另一方面,本申请实施例提供一种计算机可读存储介质,该计算机可读存储介质存储有计算机程序,该计算机程序被计算机设备的处理器读取并执行时,使得计算机设备执行上述的视频编辑方法。
另一方面,本申请实施例提供了一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机指令,该计算机指令存储在计算机可读存储介质中。计算机设备的处理器从计算机可读存储介质读取该计算机指令,处理器执行该计算机指令,使得该计算机设备执行上述的视频编辑方法。
附图简要说明
图1是本申请实施例提供的一种视频编辑系统的结构示意图;
图2是本申请实施例提供的一种视频编辑方法的流程示意图;
图3a是本申请实施例提供的一种显示视频编辑界面的场景示意图;
图3b是本申请实施例提供的一种确定目标角色的界面示意图;
图3c是本申请实施例提供的另一种确定目标角色的界面示意图;
图3d是本申请实施例提供的另一种确定目标角色的界面示意图;
图3e是本申请实施例提供的一种编辑历史视频的界面示意图;
图3f是本申请实施例提供的一种设置背景的界面示意图;
图4是本申请实施例提供的一种设置姿态数据的界面示意图;
图5a是本申请实施例提供的一种视频编辑界面的示意图;
图5b是申请实施例提供的一种播放动画视频的界面示意图;
图5c是本申请实施例提供的另一种播放动画视频的界面示意图;
图5d是本申请实施例提供的一种设置输出顺序的脚本界面示意图;
图6a是本申请实施例提供的一种分镜调序的界面示意图;
图6b是本申请实施例提供的一种分镜编辑的界面示意图;
图6c是本申请实施例提供的一种动态修改的界面示意图;
图7a是本申请实施例提供的一种角色切换的界面示意图;
图7b是本申请实施例提供的一种角色管理的界面示意图;
图8a是本申请实施例提供的一种导出动画视频的界面示意图;
图8b是本申请实施例提供的一种分享动画视频的界面示意图;
图9a是本申请实施例提供的一种终端设备的结构示意图;
图9b是本申请实施例提供的数据单向驱动的流程示意图;
图9c是本申请实施例提供的一种核心数据的结构示意图;
图10a是本申请实施例提供的一种操作案例的流程示意图;
图10b是本申请实施例提供的另一种操作案例的流程示意图;
图10c是本申请实施例提供的另一种操作案例的流程示意图;
图10d是本申请实施例提供的另一种操作案例的流程示意图;
图10e是本申请实施例提供的另一种操作案例的流程示意图;
图10f是本申请实施例提供的另一种操作案例的流程示意图;
图10g是本申请实施例提供的另一种操作案例的流程示意图;
图11a是本申请实施例提供的一种文本编辑操作的流程示意图;
图11b是本申请实施例提供的另一种文本编辑操作的流程示意图;
图11c是本申请实施例提供的另一种文本编辑操作的流程示意图;
图12是本申请实施例提供的一种脚本显示区域的技术架构图;
图13a是本申请实施例提供的一种时间轴编辑的流程示意图;
图13b是本申请实施例提供的另一种时间轴编辑的流程示意图;
图14a是本申请实施例提供的一种素材管理的架构示意图;
图14b是本申请实施例提供的一种素材业务模型的结构示意图;
图14c是本申请实施例提供的一种素材业务模型的抽象原理图;
图15是本申请实施例提供的一种视频编辑装置的结构示意图;
图16是本申请实施例提供的一种计算机设备的结构示意图。
实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
本申请主要涉及通过自然语言处理(Nature Language processing,NLP)技术所包括的文字转语音(Text to Speech,TTS)技术,将终端设备(运行有客户端,客户端例如具体可以为视频客户端)所采集到的台词文本转换为语音信息,以便在播放编辑好的动画视频时,语音播放台词文本 对应的语音信息以供用户观看和收听。即本申请可以通过自然语言处理技术,实现文本信息和语音信息之间的转换。
本申请实施例提供了一种视频编辑方案,可以显示视频编辑界面;然后,在视频编辑界面中可以确定目标角色及输入文本,其中,输入的文本在视频编辑界面中以文本行的方式呈现,且支持对文本行中的文本进行编辑;接下来,即可生成动画视频,该动画视频包含目标角色,且目标角色在动画视频中会输出文本对应的台词音频。其中,在动画视频的播放过程中,当播放至包含目标角色的画面时,同步播放文本对应的台词音频。可见,本申请中通过选择一个角色以及输入一个文本,即可生成一段动画视频,另外,输入的文本是按照文本行的方式呈现在视频编辑界面中,可以按照文档的形式对文本进行编辑操作,操作简单、便捷;并且,可以自动的将角色与文本之间建立关联,从而可以在显示目标角色的画面时使得目标角色朗读文本对应的台词音频,相比于手动进行角色与台词的对齐操作,本申请可自动将角色与台词之间进行关联,从而可以提高视频编辑的效率。
接下来,结合上述的视频编辑方法对本申请提供的视频编辑系统进行相关介绍。请参见图1,图1是本申请实施例提供的一种视频编辑系统的结构示意图。如图1所示,该视频编辑系统至少可以包括终端设备1001和服务器1002。
其中,图1所示的视频编辑系统中的终端设备1001可以包括但不限于智能手机、平板电脑、笔记本电脑、台式计算机、智能音箱、智能电视、智能手表、车载终端、智能可穿戴设备等等,往往配置有显示装置,显示装置可以为显示器、显示屏、触摸屏等等,触摸屏可以为触控屏、触控面板等等。
其中,图1所示的视频编辑系统中的服务器1002可以是独立的物理服务器,也可以是多个物理服务器构成的服务器集群或者分布式系统,还可以是提供云服务、云数据库、云计算、云函数、云存储、网络服务、云通信、中间件服务、域名服务、安全服务、CDN(Content Delivery Network,内容分发网络)、以及大数据和人工智能平台等基础云计算服务的云服务器。
在一种可能的实现方式中,终端设备1001中运行有客户端,如视频客户端、浏览器客户端、信息流客户端、游戏客户端等。在本申请各实施例中,以视频客户端为例进行说明。终端设备1001可以在视频客户端中向用户展示用户接口(User Interface,UI)界面,例如,该UI界面为Flutter界面,Flutter是一种移动的UI界面框架,可以在操作系统上快速构建高质量的原生用户界面。该Flutter页面例如可以为视频编辑界面,可用于显示动画视频。服务器1002可用于为终端设备1001提供在视频编辑过程中所需的视频素材(例如角色的标识、背景、面部姿态(表情)、肢体动作等信息)。
接下来,结合本申请上述提及的视频编辑方案,对终端设备1001和服务器1002之间的交互过程进行相关说明:
1)首先,终端设备1001可以显示视频编辑界面。
2)接下来,用户可以在视频编辑界面中确定目标角色。其中,目标角色的确定方式可以是通过触发视频编辑界面中设置的角色添加入口后添加的,也可以是在视频编辑界面中所显示的历史视频中所选中的任一角色。当确定目标角色之后,终端设备1001可以向服务器1002发送角色信息获取请求。
3)服务器1002响应于终端设备1001发送的角色信息获取请求,获取目标角色的配置信息(例如目标角色的标识、名称等),并将目标角色的配置信息发送给终端设备1001。
4)终端设备1001接收来自服务器1002发送的目标角色的配置信息(例如目标角色的标识、名称等),并在视频编辑界面中显示目标角色。
5)进一步地,用户还可以在视频编辑界面中输入文本。当输入文本之后,终端设备1001可以向服务器1002发送数据转换请求,该数据转换请求用于将文本转换为台词音频。
6)服务器1002响应于终端设备1001发送的数据转换请求,通过TTS技术,将文本转换为对应的台词音频,台词音频例如可以为mp3(MPEG-1 AudioLayer-3,一种高性能的声音压缩编码格式)文件,并向终端设备1001发送文本对应的台词音频(例如mp3文件)。
7)终端设备1001在接收到服务器1002返回的台词音频(例如mp3文件)之后,可以生成动画视频。在一种可能的实现方式中,还可以在视频编辑界面中播放该动画视频,其中,在动画视频的播放过程中,当显示目标角色的画面时,可以对mp3文件进行加载以驱动在呈现目标角色的画面时,同步播放文本对应的台词音频,即,控制目标角色同步朗读文本对应的台词音频。
接下来,结合附图对本申请实施例提供的视频编辑方法进行详细描述。请参见图2,图2是本申请实施例提供的一种视频编辑方法的流程示意图。本实施例中,该视频编辑方法可由计算机设备执行,该计算机设备可以是图1所示的视频编辑系统中的终端设备1001。如图2所示,该视频编辑方法可包括以下步骤S201~S203:
S201:显示视频编辑界面。
本申请实施例中,视频编辑界面可以是运行在视频客户端中用于编辑视频的界面。例如,视频编辑界面可以是用于新建一个视频的界面;又如,视频编辑界面也可以是用于对历史视频进行修改、更新的界面。
举例来说,视频编辑界面可以如图2所示的界面S10所示。其中,该视频编辑界面S10中可以包括多个功能项,例如场景添加功能项101、文本添加功能项102、分镜添加功能项103等等。其中,场景添加功能项101用于添加背景图;文本添加功能项102用于选择一个角色,并为该角色添加相应的文本;分镜添加功能项103用于添加一个新的分镜,其中,至少一个分镜可以构成一个动画视频。可选的,视频编辑界面S10中还可以包括角色添加入口,通过该角色添加入口可以添加一个新的角色。
在一种可能的实现方式中,可以通过视频客户端的创作主页进入至视频编辑界面,请参见图3a,图3a是本申请实施例提供的一种显示视频编辑界面的场景示意图。如图3a所示,界面S301是视频客户端的创作主页,该创作主页S301中设有视频编辑入口3011,当用户点击该视频编辑入口3011后,即可显示视频编辑界面S302。
S202:在视频编辑界面中确定目标角色及输入的文本,其中,文本在视频编辑界面中以文本行的方式呈现,且视频编辑界面支持对文本行中的文本进行编辑。
举例来说,如图2中视频编辑界面S20所示,可以在该视频编辑界面S20中选择一个目标角色,例如目标角色可以为201,然后可以在该视频编辑界面S20中为该目标角色201输入相应的文本。其中,所谓文本行是指将所输入的文本按照一行一行的形式进行排列,例如所输入的文本包括N个文字,N为正整数,那么,这N个文字从左至右依次排列,从而构成一个文本行,例如,用户在视频编辑界面S20中所输入的文本为“醒了吗?波特”,从而在视频编辑界面S30中呈现相应的文本行(如310所示)。其中,在视频编辑界面中,一个文本行可以还可以设置有最大文字数量,若所输入的文本为20个文字,一个文本行对应的最大文字数量为10个文字,那么该输入的文本在视频编辑界面中显示为两个文本行。
通过上述方式,在视频编辑界面中输入的文本按照文本行的方式进行呈现,若需要对文本进行编辑(例如增加、删除、修改等操作),也可直接在该视频编辑界面中所呈现的对应文本行中执行相应的编辑操作,这种文本交互方式,操作简单且方便。
接下来,分别对如何确定目标角色和输入文本等相关过程进行详细说明。
首先,对如何确定目标角色的相关过程进行详细说明:
在一种可能的实现方式中,目标角色是被添加至视频编辑界面中的。具体来说,在视频编辑界面中确定目标角色,可以包括:当检测到角色添加事件时,触发在视频编辑界面中添加目标角色。其中,角色添加事件是通过触发角色添加入口后生成的;或者,角色添加事件是在检测到角色添加手势后生成的,角色添加手势包括:单击手势、双击手势、悬浮手势、预设手势中的任一种。
具体来说,角色添加入口被设置于视频编辑界面中。当检测到角色添加事件时,触发在视频编辑界面中添加目标角色,可以包括:响应于角色添加入口被触发,输出角色选择面板,角色选择面板中显示有至少一个待选择的角色标识;响应于针对目标角色标识的选择操作,在视频编辑界面中显示目标角色标识对应的目标角色。其中,角色添加入口可以为一级入口,也可以为二级 入口,所谓一级入口是指可以直接显示于视频编辑界面中的入口,所谓二级入口是指不直接显示于视频编辑界面中的入口,即需要通过触发其它入口或者界面才可显示的入口。
举例来说,请参见图3b,图3b是本申请实施例提供的一种确定目标角色的界面示意图。如图3b所示,视频编辑界面S302中设置有角色添加入口3021,点击该角色添加入口3021后可以输出角色选择面板,角色选择面板可以是独立于视频编辑界面的一个单独的界面,角色选择面板也可以是与视频编辑界面处于同一界面。例如,角色选择面板可以为界面S303中的一个窗口3031,那么,可以在该角色选择面板3031中选择任一个角色标识(例如3032)对应的角色作为目标角色。
又如,请参见图3c,图3c是本申请实施例提供的另一种确定目标角色的界面示意图。如图3c所示,视频编辑界面S302中设置有操作区域3022,用户可以在该操作区域3022绘制角色添加手势,例如绘制一个“S”型的手势,即可触发输出角色选择面板3031。
在另一种可能的实现方式中,目标角色是从历史视频中选择的。具体来说,视频编辑界面中显示有多个待编辑的历史视频,任一历史视频中包括至少一个角色。在视频编辑界面中确定目标角色,包括:从历史视频中选择任一角色确定为目标角色。
举例来说,请参见图3d,图3d是本申请实施例提供的另一种确定目标角色的界面示意图。如图3d所示,视频编辑界面S301中包括历史视频显示区域3012,该历史视频显示区域3012中显示有多个待编辑的历史视频(例如作品1、作品2、作品3),若选择作品1后,可以在界面S305中显示该作品1所包括的至少一个角色(例如角色1、角色2、角色3)。例如,可以选择界面S305中所显示的角色1作为目标角色。
在一种可能的实现方式中,视频编辑界面包括预览区域。在选择目标角色的过程中,每次被选择的角色均显示于预览区域中,且预览区域中显示的角色随着选择操作的切换而进行替换;当目标角色被选中时,在预览区域中呈现目标角色。如图3c所示,视频编辑界面包括的预览区域3033中可以显示每次所选择的角色以供用户预览。
其中,视频编辑界面中所显示的任一历史视频支持被编辑。请参见图3e,图3e是本申请实施例提供的一种编辑历史视频的界面示意图。如图3e所示,视频编辑界面S301中显示有多个历史视频(作品1、作品2、作品3),当点击作品1后,可以输出针对作品1的菜单栏,例如视频编辑界面S307中的3071所示,该菜单栏3071中显示有复制功能项、重命名功能项、删除功能项等多个功能项。其中,复制功能项可用于对作品1进行复制操作,重命名功能项可用于对作品1的名称进行更改,删除功能项可用于对作品1进行删除操作。
另外,在选择目标角色之前,还可以为所选择的目标角色设置相应的背景。请参见图3f,图3f是本申请实施例提供的一种设置背景的界面示意图。如图3f所示,通过点击视频编辑界面S308中的场景添加功能项,即可输出场景选择面板,如视频编辑页面S309中的3091所示。其中,该场景选择面板3091中显示有至少一个待推荐的场景画面以供用户进行自由选择。进一步地,场景选择面板3091中还可以包括不同类型的场景画面,例如纯色类型、室内类型、户外类型等等。
需要说明的是,上述所提及的角色选择面板所显示的多个角色标识对应的素材、场景选择面板所显示的多个场景画面对应的素材,可以是由第三方平台所提供。通过素材的开放面板设计,也使得未来更多的第三方设计开发者参与到素材创作中来,从而可以有取之不尽、各式各样的场景画面和角色供创作视频的创作者使用。
然后,对如何为目标角色输入相应的文本、以及设置姿态数据的相关过程进行详细说明:
在选择目标角色之后,可以为该目标角色输入相应的文本、以及设置姿态数据。请参见图4,图4是本申请实施例提供的一种设置姿态数据的界面示意图。如图4所示,在视频编辑界面S401中显示有针对目标角色的属性编辑区域4011,该属性编辑区域4011中可以包括至少一个属性编辑项,例如文本编辑项4012、表情编辑项4013、以及动作编辑项4014。其中,可以点击文本编辑项4012,然后弹出键盘,用户可以通过键盘输入文本,例如输入的文本可以为:“今天你开心吗”。然后,也可以点击表情编辑项4013,显示表情候选栏4021,该表情候选栏4021中显示有多个表情,每个表情可以用于控制目标角色在动画视频中呈现相对应的面部姿态,例如面部姿态可以包括:开心、难过、大哭、大笑等。另外,也可以点击动作编辑项4014,显示动作候选栏4031,该表情候选栏4021中显示有多个动作,每个动作可以用于控制目标角色在动画视频中呈现相对应的 肢体动作,例如肢体动作可以包括:躺下、挥手、旋转、跳跃等。
需要说明的是,本申请实施例中动画视频所包括的目标角色可以是指人物、动物、以及物品等,且目标角色的类型可以包括但不限于:卡通、动漫、真实人物等等。
通过这种方式,当选择一个目标角色后,可以为该目标角色输入对应的文本,与此同时,可以通过表情功能项和动作功能项为目标角色设置姿态数据,以控制目标角色在动画视频中所呈现的姿态,该姿态可以包括面部姿态及肢体动作中的任一种或多种。从而使得目标角色在表达文本内容的过程中也表现出丰富的面部姿态及肢体动作,提升动画视频的趣味性、表现力。
S203:生成动画视频,动画视频包含目标角色,且在动画视频中为目标角色设置文本对应的台词音频,其中,在动画视频的播放过程中,当播放至包含目标角色的画面时,同步播放文本对应的台词音频。
具体来说,生成的动画视频可以在视频编辑界面S30中显示。其中,在动画视频中为目标角色设置文本对应的台词音频,使得在动画视频的播放过程中,当播放至包含目标角色的画面时,为目标角色输出台词音频,也就是同步播放文本对应的台词音频,所达到的视觉效果是,目标角色在朗读该台词音频对应的文本。可以理解的是,动画视频中可以包括至少一个角色,针对每一个角色而言,为其输入文本,以及设置相应的姿态数据的具体过程,均可参考上述步骤S202中为目标角色所设置的过程,本申请实施例在此不再赘述。
在一种可能的实现方式中,目标角色还设置有姿态数据。视频编辑界面包括预览区域和脚本显示区域。预览区域可以用于显示目标角色,脚本显示区域可以用于显示文本及姿态数据。
举例来说,请参见图5a,图5a是本申请实施例提供的一种视频编辑界面的示意图。如图5a所示,该视频编辑界面S501中可以包括预览区域5011和脚本显示区域5012。其中,目标角色可以显示于预览区域5011中,脚本显示区域5012可以用于显示为目标角色所设置的文本及姿态数据。例如,脚本显示区域5012可以显示有:[睁眼](姿态数据)醒了吗?波特(文本)。
接下来,对所生成的动画视频的播放过程进行详细说明:
在一种可能的实现方式中,播放动画视频,并当播放至包含目标角色的画面时,播放台词音频。具体来说,在台词音频的播放过程中,在包含目标角色的画面中突出显示文本。其中,突出显示包括以下任一种或多种:放大字体显示、改变字体颜色显示、按照预设字体显示。
具体来说,可以在视频编辑界面中播放动画视频,请参见图5b,图5b是申请实施例提供的一种播放动画视频的界面示意图。如图5b所示,该视频编辑界面S502中还设置有播放控件5023,点击该播放控件5023,即可在视频编辑界面S503中播放该动画视频。进一步地,视频编辑界面S503中还可以设置有全屏控件5031,若用户点击该全屏控件5031,即可切换为在视频编辑界面中全屏播放该动画视频,如图界面S504所示。其中,所谓全屏播放是指该动画视频的画面内容全屏显示于视频编辑界面中,通过这种方式,可以达到最大化的预览效果,方便创作者检查所有动画视频中的创作细节。
在一种可能的实现方式中,当点击播放动画视频后,若该动画视频尚未加载完成,则可以输出提示窗口。请参见图5c,图5c是本申请实施例提供的另一种播放动画视频的界面示意图。如图5c所示,当用户点击视频编辑界面S505中的播放控件之后,若该动画视频尚未加载完成,则可以输出提示窗口5051。该提示窗口5061中可以显示有提示文本,例如提示文本可以为:“当前视频尚未加载完成,请等待...”该提示窗口5061中还设置有退出控件5062和确定控件5063,若用户点击退出控件5062即可放弃播放该动画视频,若用户点击确定控件5063,即可继续等待动画视频加载完毕后,方可预览该动画视频。
在一种可能的实现方式中,目标角色还设置有姿态数据。播放动画视频,并当播放至包含目标角色的画面时,播放台词音频,并控制目标角色呈现姿态。其中,当播放至包含目标角色的画面时,播放台词音频,并控制目标角色呈现姿态,包括以下任一种:在显示包含目标角色的画面的过程中,先播放台词音频,并在播完台词音频后,控制目标角色呈现姿态;或者,在显示包含目标角色的画面的过程中,控制目标角色先呈现姿态,再播放台词音频;或者,在台词音频的播放过程中的任意时刻,控制目标角色呈现姿态。
举例来说,如图5b所示,在显示包含目标角色的画面过程中,控制目标角色先呈现“睁眼” 的面部姿态,再播放台词音频“醒了吗?波特”;或者,在显示包含目标角色的画面过程中,控制目标角色先呈现“放下筷子”的肢体动作,再播放台词音频“吃饱了吗?”,等等。通过这种方式,可以呈现出较为丰富的画面内容,提升趣味性和动画视频的表现力。
在一种可能的实现方式中,目标角色还设置有姿态数据。若文本的数量大于1,一个文本对应一个台词音频,则目标角色在动画视频中的台词音频数量大于1,姿态数据的数量大于1,则目标角色在动画视频中需要呈现的姿态数量大于1。计算机设备还用于执行以下操作:设置各个文本之间的第一输出顺序,并在动画视频中按照第一输出顺序依次播放各文本对应的台词音频;设置各个姿态数据之间的第二输出顺序,并在动画视频中按照第二输出顺序控制目标角色依次呈现各姿态数据对应的姿态;设置任一文本与至少一个姿态数据之间的关联输出顺序,并在动画视频中控制播放任一文本对应的台词音频的过程中,按照关联输出顺序控制目标角色呈现至少一个相应姿态。其中,第一输出顺序、第二输出顺序和关联输出顺序均支持动态调整。
其中,各个文本之间的第一输出顺序可以为:从左至右依次输出的顺序;各个姿态数据之间的第二输出顺序可以为:针对相邻的面部姿态和肢体动作,依次输出面部姿态对应的姿态数据和肢体动作对应的姿态数据;关联输出顺序可以为:先输出文本、再输出姿态数据等等。
举例来说,请参见图5d,图5d是本申请实施例提供的一种设置输出顺序的脚本界面示意图。如图5d所示,该脚本界面示意图中显示有用户为A角色和B角色所设置的文本及姿态数据,所设置的文本及姿态数据可分别表示为如下三句脚本数据:
第一句脚本数据:A角色:文本A1[表情A1][动作A1][表情A2]文本A2[表情A3],其中,[表情A1][动作A1][表情A2],属于相邻的面部姿态和肢体动作,因此,依次输出;
第二句脚本数据:B角色:[表情B1][动作B1][表情B2],其中,[表情B1][动作B1][表情B2],属于相邻的面部姿态和肢体动作,因此,依次输出;
第三句脚本数据:A角色:文本A3。
那么,根据上述所设置的第一输出顺序、第二输出顺序和关联输出顺序,在播放动画视频的过程中,可以按照上述所提及的脚本数据呈现相应的画面内容。首先,执行第一句脚本数据:A角色朗读[文本A1]对应的台词音频,然后朗读[文本A2]对应的台词音频,A角色在朗读[文本A2]对应的台词音频时,开始串行表现[表情A1]对应的面部姿态、[动作A1]对应的肢体动作、以及[表情A2]对应的面部姿态;由于[表情A3]后面没有紧跟的文本,因此A角色在朗读完[文本A2]对应的台词音频后,表现[表情A3]对应的面部姿态。然后,执行第二句脚本数据:B角色串行表现[表情B1]对应的面部姿态、[动作B1]对应的肢体动作、[表情B2]对应的面部姿态。最后,执行第三句脚本数据:A角色朗读[文本A3]对应的台词音频。
可以理解的是,第一输出顺序、第二输出顺序和关联输出顺序均支持动态调整。具体来说,任一文本与至少一个姿态数据之间的关联输出顺序可以调整为:先输出姿态数据、再输出文本。若姿态数据对应的姿态包括面部姿态和肢体动作,那么可以在输出面部姿态的同时输出肢体动作;或者,先输出面部姿态,再输出肢体动作;又或者,先输出肢体动作,再输出面部姿态。
通过这种方式,用户在编辑视频的过程中,通过确定目标角色、为目标角色设置相应的文本和姿态数据后,即可按照所设置的第一输出顺序、第二输出顺序和关联输出顺序控制目标角色呈现相应的姿态,且随着第一输出顺序、第二输出顺序和关联输出顺序的调整,其生成的动画视频所显示的画面内容也不相同,从而可以提高视频编辑的趣味性,提升用户体验感。
接下来,对分镜调序及分镜编辑等相关过程进行详细说明:
在一种可能的实现方式中,分镜支持被调序。动画视频包括第一分镜和第二分镜,第一分镜的播放顺序先于第二分镜的播放顺序。显示分镜调序界面,分镜调序界面中包括第一分镜和第二分镜,且第一分镜和第二分镜按照播放顺序的先后顺序排列显示于分镜调序界面中。在分镜调序界面中变更第一分镜和/或第二分镜的排列位置;根据变更后的排列位置,调整所述第一分镜或所述第二分镜的播放顺序。
举例来说,请参见图6a,图6a是本申请实施例提供的一种分镜调序的界面示意图。如图6a所示,视频编辑界面S601中显示有动画视频,该动画视频可以包括分镜1、分镜2和分镜3,且分镜1的播放顺序早于分镜2的播放顺序,分镜2的播放顺序早于分镜3的播放顺序。该视频编 辑界面S601中设有分镜调序控件6011,当分镜调序控件6011被选择(例如单击、双击或长按等操作)时,显示分镜调序界面S602,该分镜调序界面S602中显示有动画视频所包括的多个分镜(分镜1、分镜2和分镜3),并且分镜1、分镜2和分镜3,按照播放顺序,先后排列显示于该分镜调序界面S602中。用户可以在分镜调序界面S602中拖拽分镜以变更被拖拽的分镜的排列位置,例如可以将分镜1拖拽至分镜2的显示位置处,那么分镜2即可自动显示于分镜1的显示位置处。当分镜1和分镜2的排列位置发生变化后,可以显示分镜调序界面S603。
进一步地,分镜1和分镜2的排列位置发生变化后,分镜1和分镜2在视频编辑界面中的播放顺序也被适应调整。如视频编辑界面S604所示,分镜2的播放顺序先于分镜1的播放顺序。通过这种方式,本申请可以快速、便捷地对动画视频所包括的多个分镜进行调序,从而调整各个分镜在动画视频中的播放顺序。
在一种可能的实现方式中,分镜支持被编辑。动画视频包含至少一个分镜,任一分镜均支持被编辑,编辑包括复制、删除、动态修改中的任一种或多种。响应于对目标分镜的动态修改操作,显示目标分镜对应的时间轴编辑面板;在目标分镜对应的时间轴编辑面板上,对目标分镜涉及的画面内容进行动态修改,并基于动态修改更新动画视频。
具体来说,目标分镜包含多个视频帧,时间轴编辑面板中显示有每个视频帧的画面内容、以及每个视频帧中各角色分别对应的脚本数据。脚本数据包括以下任一项或多项:各角色的台词音频对应的文本,以及各角色对应的姿态数据。
举例来说,请参见图6b,图6b是本申请实施例提供的一种分镜编辑的界面示意图。如图6b所示,视频编辑界面S605中显示有多个分镜,例如分镜1、分镜2,任一个分镜对应一个编辑控件。当针对目标分镜(例如分镜1)的编辑控件6051被选择时,可以显示分镜1对应的分镜编辑菜单栏6061,该分镜编辑菜单栏6061中显示有调整分镜功能项、复制分镜功能项、删除分镜功能项。其中,调整分镜功能项用于支持分镜被动态修改、复制分镜功能项用于支持分镜被复制、删除分镜功能项用于支持分镜被删除。若用户点击调整分镜功能项,则可以显示目标分镜(分镜1)对应的时间轴编辑面板6071。其中,该时间轴编辑面板6071中显示有该分镜所包括的每个视频帧(例如每1s为一个视频帧)对应的画面内容、以及A角色和B角色分别对应的脚本数据(例如文本和姿态数据)。其中,A角色对应的文本可以为:“Hi~快,跑起来!”,姿态数据可以包括:“开心”和“皱眉头”、以及“挥手”和“跺腿”;B角色对应的文本可以为:“爬不动啦,太累了”,姿态数据可以包括:“苦笑”和“皱眉头”、以及“躺下”和“跺腿”。
进一步地,动态修改包括以下任一项:对任一角色对应的脚本数据进行位置调整、对任一角色对应的脚本数据进行时间调整、对不同角色之间的脚本数据进行对齐处理。
举例来说,请参见图6c,图6c是本申请实施例提供的一种动态修改的界面示意图。如图6c所示,在视频编辑界面S608中,例如,可以将A角色的姿态数据“皱眉头”删除,并将姿态数据“开心”对应的时间条(例如0s-1s这一时间段)拉长至整个分镜1的时间轴6081直至结束(例如0s-4s这一时间段),从而保证A角色在分镜1中由原来的在0s-1s这一时间段内保持开心的面部姿态,并修改为在0s-4s这一时间段内均保持开心的面部姿态。又如,可以将B角色的文本“跑不动啦,太累了”修改为“好的,一起加油”,以及将姿态数据“皱眉头”删除,并将姿态数据“开心”对应的时间条(例如0s-1s这一时间段)拉长至整个分镜1的时间轴直至结束(例如0s-4s这一时间段),从而可以保证A角色和B角色在分镜1的画面内容中始终一起呈现开心的面部姿态。通过这种方式,可以通过分镜级别的时间轴编辑面板,精细地调节动画视频中在呈现各个角色时所对应的面部姿态、肢体动作、台词音频等,从而满足视频创作者更高的视频创作需求。
接下来,对角色切换及角色管理等相关过程进行详细说明:
在一种可能的实现方式中,文本对应的角色支持被切换。具体来说,接收角色切换操作,角色切换操作用于将文本对应的目标角色切换为参考角色;响应于角色切换操作,将文本对应的目标角色替换为参考角色;其中,参考角色是在触发目标角色的标识后所显示的角色选择面板中选择的;或者,参考角色是在快捷选择器中选择的,快捷选择器用于显示预设时间段内达到预设选择频次的多个角色标识。可以理解的是,若在预设时间段内达到预设选择频次的角色标识较多,可以进一步从达到预设选择频次的多个角色标识中,确定出选择频次排名前N个的角色标识,显 示于快捷选择器中,N为正整数。
举例来说,在编辑文本的过程中,可以对当前文本所对应的角色进行快捷切换。请参见图7a,图7a是本申请实施例提供的一种角色切换的界面示意图。如图7a所示,在视频编辑界面S701中显示有动画视频所包括的各个角色对应的文本,例如目标角色对应的文本7011:“醒了吗?波特”,然后当点击(例如单击、双击、或者长按等操作)该文本7011时,可以显示文本编辑框。该文本编辑中设有两个快捷入口,其中,第一个快捷入口可以为目标角色的标识7022,若点击该目标角色的标识7022,即可唤起角色选择面板,然后通过在角色选择面板中选择相应的参考角色后以替换目标角色。那么,在动画视频中,该文本:“醒了吗?波特”将由参考角色替换目标角色进行朗读。
另外,第二个快捷入口可以为角色快捷选择器7021,其中,该角色快捷选择器7021中显示有预设时间段内达到预设选择频次的多个角色标识,例如角色快捷选择器7021显示有最近一个星期内选择频次排名前三的三个角色标识,然后用户可以直接在该角色快捷选择器7021中选择第二角色以替换第一角色。通过这种方式,可以在编辑文本的过程中,快捷的对该文本对应的角色进行切换。
在一种可能的实现方式中,动画视频中的角色支持被管理。具体来说,视频编辑界面中设置有角色管理控件,当角色管理控件被选中时,输出角色管理界面,角色管理界面中显示有动画视频中包含的所有角色及针对每个角色的管理项;根据管理项对动画视频中的各个角色进行管理,其中,管理项包括角色替换项,管理包括角色替换;或者,管理项包括音色更换项,管理包括更改角色的台词音频的音色。
举例来说,请参见图7b,图7b是本申请实施例提供的一种角色管理的界面示意图。如图7b所示,视频编辑界面S704中设置有角色管理控件7041,响应于该角色管理控件7041被触发,显示角色管理界面S705,该角色管理界面S705中显示有多个角色,例如角色1、角色2、角色3、角色4、角色5、角色6,其中,每个角色可以对应一个角色替换项和一个音色更换项。例如,角色1可以对应一个角色替换项7051和一个音色更换项7052。例如,若用户点击角色替换项7051可以输出角色选择面板,然后在该角色选择面板中可以选择一个角色2,以此实现将动画视频中角色1替换为角色2,那么,后续在动画视频中所有呈现角色1的画面内容均对应被替换为呈现角色2的画面内容。
又如,若用户点击音色更换项7052可以输出音色选择面板7061,该音色选择面板7061中显示有一种或多种音色标识,例如“蜡笔小新”、“海绵宝宝”、“小猪佩奇”、“李太白”、“狐妲己”、“安琪拉”等等。通过音色选择面板7061可以为角色1选择指定的音色,以使在播放动画视频时,使得目标角色按照所指定的音色朗读对应的台词音频。通过这种方式,可以基于音色更换项使得目标角色在朗读台词音频时由一种类型的音色(例如小男孩音色)变更为另一种类型的音色(搞笑诙谐的大叔音色),从而丰富动画视频的趣味性、提升用户体验感。
最后,对编辑好的动画视频的导出以及分享过程进行详细说明:
在一种可能的实现方式中,生成的动画视频支持被导出。对所述动画视频进行视频导出操作,所述视频导出操作包括以下任一种或多种:保存至所述终端设备中、发布至所述动画视频的创作者主页中、分享至社交会话中。
请参见图8a,图8a是本申请实施例提供的一种导出动画视频的界面示意图。如图8a所示,视频编辑界面S801中设置有导出控件8011,当用户点击该导出控件8011后,可以在视频编辑界面S802中的预览区域8022播放该动画视频,并可以显示动画视频的加载状态,例如,显示文字“视频合成中25%“,其表示该动画视频已被加载25%,剩余75%还未加载完毕。同时,视频编辑界面S802中还可以设置有开关控件8021,若点击该开关控件8021,则可以在动画视频导出完成后,同步将该动画视频保存至终端设备中。
接下来,在点击该开关控件8021后,可以再次点击导出控件8031,即可完成动画视频的正式导出,并可以在创作者的主页S804中,显示已导出(发布)的动画视频8041。同时,还可以在创作者的主页S804中更新作品数量,例如将作品数量增加1(例如作品数量可以由15变为16),即代表完成一个动画视频的发布。
在一种可能的实现方式中,生成的动画视频支持被分享。请参见图8b,图8b是本申请实施例提供的一种分享动画视频的界面示意图。如图8b所示,视频编辑界面是S805中还设置有分享控件8051,若点击该分享控件8051,即可输出视频分享界面S806,该视频分享界面S806可以显示有多个社交会话,例如社交会话1、社交会话2、以及社交会话3。
其中,所谓社交会话可以包括单独会话和群聊会话。单独会话是指两个社交用户参与的社交会话,用于在该两个社交用户之间进行信息交流。群聊会话是指多个(大于两个)社交用户参与的社交会话,用于在该多个社交用户之间进行信息交流。例如,用户可以在该视频分享界面S806中选择至少一个社交会话进行视频分享。例如,若选择社交会话1(该社交会话1可以为群聊会话),则可以显示社交会话1的社交会话界面S807,并在该社交会话界面S807中显示动画视频的视频链接。例如,该动画视频是由用户3分享的,那么在社交会话界面S807中的对话框8071中,可以显示该视频链接。其中,该视频链接可以包括但不限于:网址、图标等,社交会话1中的任一社交用户可以通过触发该视频链接,从而在社交会话界面中S807播放该动画视频。
接下来,对本申请所涉及的相关后台技术进行详细说明。其中,将结合附图分别对上述所提及的视频编辑系统中的终端设备侧和服务器侧所涉及的具体技术进行相应说明:
一、终端设备侧所涉及的相关技术:
本申请实施例中,针对终端设备侧而言,其核心是设计了核心数据层,该核心数据层中存储有用于驱动生成动画视频的核心数据,核心数据可以包括但不限于:为目标角色输入的文本、以及为目标角色所设置的姿态数据等。
另外,本申请可以基于核心数据层的数据单向驱动方式,来保证各个UI界面的同步实时更新,例如UI界面可以包括但不限于:预览区域对应的UI界面、脚本显示区域对应的UI界面、分镜调序界面、时间轴编辑面板。
接下来,以预览区域对应的UI界面和脚本显示区域对应的UI界面为例,本申请实施例将基于数据单项驱动的方式,来保证预览区域的UI界面与脚本显示区域的UI界面的同步实时更新。
(1)视频编辑界面中的预览区域的UI界面与脚本显示区域的UI界面的同步更新:
请参见图9a,图9a是本申请实施例提供的一种终端设备的结构示意图。如图9a所示,预览区域的UI界面与脚本显示区域的UI界面的同步更新,均是通过Data Store(数据存储层)对所存储的核心数据进行管理的方式来进行驱动的。其中,每个UI界面都会有对应UI界面的核心数据,然后通过计算各自UI界面的核心数据可以得到各个UI界面对应的UIState值(界面状态值),最后,不同的UI界面可以根据自身所对应的UIState值进行界面的刷新变化。
具体来说,如图9a所示,终端设备的结构可以包括但不限于以下模块:预览区域、数据转换层、数据存储层、界面层、基础组件数据管理器、以及基础组件观察器。接下来,分别对上述各个模块进行相应介绍:
预览区域:可以用于预览动画视频,具体可以用于预览动画视频所包括的目标角色,在一种可能的实现方式中,若目标角色还设置有姿态数据,那么在该预览区域中还可以预览目标角色所呈现的面部姿态或肢体动作等等。
数据转换层:可用于从时间轴文本界面控件(例如键盘)和时间轴图标界面控件(例如场景选择面板、角色选择面板等)中获取新增数据,并可基于所获取的新增数据进行数据转换,从而得到场景界面数据、角色界面数据。其中,场景界面数据可以是用户在场景选择面板中选择了场景画面后生成的;角色界面数据可以包括:用户在角色选择面板中所选择的目标角色(例如目标角色的标识)、为目标角色所选择的音色(例如搞笑诙谐的大叔音色)、以及为目标角色所设置的姿态数据(例如面部姿态和/或肢体动作所对应的数据)。
数据存储层:用于接收数据转换层所转换后的数据,例如上述所提及的场景界面数据和角色界面数据。
当然,这些数据(场景界面数据和角色界面数据)可以分别由对应的UI界面获取,例如角色界面数据可以由预览区域对应的UI界面获取,然后计算处理器可以基于该角色界面数据计算得到预览区域对应的UI界面的UIState值,以便对预览区域对应的UI界面进行更新。
界面层:用于从数据存储层获取(订阅)相应的界面数据(例如场景界面数据和角色界面数据),并基于获取到的界面数据更新显示时间轴编辑面板;或者,当在视频编辑界面中检测到相应操作(例如拖拽时间轴)时,向数据存储层发送回调指令,以通知数据存储层基于相应的操作更新核心数据。
基础组件观察器:用于观察或监测数据新增以及数据转换过程,并将监测结果反馈给数据转换层。
基础组件数据管理器:用于接收数据转换层所发送的场景界面数据、角色界面数据等,以使预览区域基于这些数据(场景界面数据、角色界面数据)显示动画视频。
通过上述方式,在终端设备侧设计核心数据层,基于数据单向驱动来保证多处UI界面的同步更新,例如用户在脚本显示区域为目标角色添加一个肢体动作,即可在预览区域同步控制目标角色呈现该肢体动作;又如用户在脚本显示区域为目标角色添加一个面部姿态(例如一个开心的表情),即可在预览区域同步控制目标角色表现出开心的表情。
(2)接下来,将结合本申请上述实施例所提及的确定角色和输入文本这两个过程,对数据单向驱动的详细过程进一步说明:
请参见图9b,图9b是本申请实施例提供的数据单向驱动的流程示意图。如图9b所示,数据单向驱动主要涉及终端设备中的核心数据层(Data Store)、脚本显示区域、渲染器(预览区域的处理器)、以及服务器之间的交互,上述交互流程可以包括以下步骤:
其中,确定角色的相关过程如下:
1、用户可以在终端设备的脚本显示区域中打开角色选择面板,也称为角色浮层页。响应于用户的操作,向服务器发送角色信息获取请求,该角色信息获取请求用于请求获取角色列表配置信息(例如可以包括至少一个角色标识、名称等信息)。
2、服务器(即后台)拉取角色列表配置信息。
3、在脚本显示区域中展示服务器下发的角色列表。具体地,在角色选择面板显示角色列表配置信息,包括至少一个角色标识、以及每个角色对应的名称。
4、用户从角色选择面板中点击目标角色标识,并向服务器发送针对目标角色的配置信息(即角色素材)的获取请求。具体地,用户从角色浮层页点击某个角色,并请求从后台拉取角色素材。
5、服务器响应于针对目标角色的配置信息的获取请求,获取(拉取)目标角色的配置信息,即该角色的素材,并发送至终端设备。
6、终端设备接收目标角色的配置信息,并在脚本显示区域通知已确定目标角色。具体地,素材拉取回来后,通知选中该角色。在一实施例中,可以根据角色列表配置信息,下载目标角色的Spine(骨骼)动画等相关文件,在角色选择面板中展示该目标角色的选中框。
7、当用户在角色选择面板中点击目标角色标识并成功选中该目标角色后,脚本显示区域可以向核心数据层发送数据修改请求。核心数据层响应脚本显示区域发送的数据修改请求,对核心数据层所存储的核心数据进行更新,得到更新后的核心数据。
8、更新后的核心数据单向流动,从而驱动重新计算各种界面状态值(例如预览区域的UIState值、以及脚本显示区域的UIState值)。具体地,可以遍历所有注册处理器计算界面状态值。所谓单向流动是指核心数据层可以将更新后的核心数据发送至脚本显示区域和预览区域。
9、脚本显示区域基于更新后的核心数据,利用相应的处理器(如脚本列表处理器)来计算自身的UIState值。
10、预览区域基于更新后的核心数据,利用相应的处理器(例如渲染器)计算Diff数据(差异数据),其中,该差异数据是基于更新前的核心数据以及更新后的核心数据计算得到的。
11、脚本显示区域对应的UIState值和预览区域对应的UIState值计算完成后,驱动各UI界面基于计算完的UIState值进行UI界面的刷新。
12、脚本显示区域基于脚本显示区域对应的UIState值,刷新UI界面。
13、预览区域基于预览区域对应的UIState值,刷新UI界面,即可以在预览区域显示目标角色。具体地,渲染器根据diff数据,新增一个角色节点,在渲染器展示当前新增角色。
另外,输入文本的相关过程如下:
14、用户点击角色辅助框气泡,例如请参见图4中的界面S401中的属性编辑区域4011,用户可以点击属性编辑区域4011中的文本编辑项4012。
15、拉取键盘(如图4中属性编辑区域4011所示),执行文本编辑操作,例如具体可以为输入一段文本(如台词),点击完成。
16、当在脚本显示区域中检测到文本编辑操作后,可以向核心数据层发送数据修改请求,用于请求核心数据层再次更新核心数据。
17、核心数据层触发加载并更新核心数据,得到再次更新后的核心数据。再次更新后的核心数据单向流动,从而驱动重新计算各种界面状态值(例如预览区域的UIState值、以及脚本显示区域的UIState值)。
另外,在加载并更新核心数据的过程中被资源中间件拦截,异步向服务器发送文本转换请求,以使服务器执行步骤20。
18、脚本显示区域基于再次更新后的核心数据,利用相应的处理器(如脚本列表处理器)来计算自身的UIState值。
19、脚本显示区域显示已输入的文本(例如图5a界面S501中所示的“醒了吗?波特”)。
20、服务器响应文本转换请求,和TTS服务交互,利用TTS技术,将用户输入的文本转换为台词音频(例如mp3文件),并将文本对应的台词音频mp3文件、以及文本发送给渲染器。
21、渲染器基于再次更新后的核心数据计算Diff数据(差异数据),并等待服务器TTS转换结束。在结束后,可以接收服务器发送的台词音频mp3文件、以及文本。
22、渲染器基于接收到的文本及对应的台词音频mp3文件,渲染台词组件,并且在渲染器中的预览区域显示文本渲染后的界面。例如图5b中界面S502所示,在后续用户触发播放动画视频时,可以在区域5021中向用户呈现目标角色、以及通过播放控件5023控制目标角色朗读文本对应的台词音频。
通过这种方式,各个UI界面基于核心数据层中的更新后的核心数据,利用数据单向驱动的方式,保证各个UI界面(例如脚本显示区域对应的UI界面和预览区域对应的UI界面)的同步更新。另外,在服务器进行TTS文本转语音操作的过程中,采用了非阻塞式转换,所谓非阻塞式转换可以做到不打断用户编辑视频的过程,例如在用户输入文本的同时,异步向服务器拉取对应的台词音频(mp3文件),并且通知渲染器加载后以供用户在预览区域实时预览。
(3)核心数据的具体结构:
可以理解的是,本申请主要基于上述所提及的核心数据来驱动生成动画视频,因此对核心数据的具体结构进行详细说明。请参见图9c,图9c是本申请实施例提供的一种核心数据的结构示意图。如图9c所示,本申请实施例中,根据产品形态可将核心数据的结构进行层级拆分,具体可以拆分为:Project(项目)、StoryBoard(分镜)、Statement(语句/文本)、Component(组件)等多个层级。
具体来说,上述所提及的项目可以是针对动画视频而言,添加一个分镜、删除一个分镜等;该项目对应的项目数据可以包括项目状态,所谓项目状态可用于反映动画视频的状态,例如动画视频的状态可以为播放状态或者暂停状态等等。
另外,针对分镜层级而言,可以在该分镜下添加一个角色、添加一段文本、删除一段文本等;该分镜对应的分镜数据可以包括但不限于:角色列表(角色选择面板中所显示的多个角色标识)、音色列表(音色选择面板中所显示的多个音色标识)等等。接下来,针对分镜下的语句(需要说明的是,本申请后续实施例中“语句”又可称为“文本”)层级而言,该语句对应的语句数据可以包括:角色语句(为目标角色所设置的文本)、背景语句(可以是默认的文本)、音效语句等等。最后,上述所提及的组件可以包括但不限于:角色台词组件(用于控制目标角色朗读文本对应的台词音频)、角色组件(用于显示目标角色)、角色表情组件(用于控制目标角色呈现面部姿态)、音乐组件(用于控制目标角色按照所设置的音色进行发声)、等等。
可以理解的是,通过对核心数据进行多个层级拆分后,就可以通过每个层级对应的数据来精准表达用户不同维度的操作,例如,在分镜内添加一个角色(可称为目标角色)、添加一条语句;又如,对分镜内的目标角色设置姿态数据(面部姿态对应的、肢体动作对应的姿态数据)等等, 从而可以基于更精细的层级来适应动画视频更完美的创作效果。
(4)双链表数据管理方案:
结合图9c所示的核心数据的结构示意图,本申请实施例主要采用了双向链表数据管理方案,来适应于用户频繁执行的文本编辑操作(例如修改文本、增加文本或删除文本),顺序调整操作(例如调整至少一个分镜的顺序),和插入操作(例如在一个分镜下添加一个角色、或插入一段文本、为角色设置姿态数据)等等。
其中,所谓双链表又称双向链表,双向链表数据中的每个数据结点包括两个指针(开始指针和结束指针),这两个指针又可简称为首尾指针。接下来,结合附图10a-图10g分别对应的不同case(案例)对本申请实施例提供的双向链表数据管理方案进行详细说明:
Case1:在分镜1下为角色1添加文本A。
请参见图10a,图10a是本申请实施例提供的一种操作案例的流程示意图。如图10a所示,该案例主要包括以下几个步骤:
1.更新分镜1文本双向链表,增加文本A;
2.更新分镜1角色双向链表,增加角色1节点;
3.更新分镜1角色1节点的双向链表,增加文本A。
Case2:在分镜1下为角色1添加文本B。
请参见图10b,图10b是本申请实施例提供的另一种操作案例的流程示意图。如图10b所示,该案例主要包括以下几个步骤:
1.更新分镜1文本双向链表,增加文本B;
2.更新分镜1角色1节点的双向链表,增加文本B。
Case3:在分镜1下为角色2添加文本C。
请参见图10c,图10c是本申请实施例提供的另一种操作案例的流程示意图。如图10c所示,该案例主要包括以下几个步骤:
1.更新分镜1文本双向链表,增加文本C;
2.更新分镜1角色双向链表,增加角色2节点;
3.更新分镜1角色2节点的文本双向链表,增加文本C。
Case4:增加分镜2,以及在分镜2下为角色1添加文本D。
请参见图10d,图10d是本申请实施例提供的另一种操作案例的流程示意图。如图10d所示,该案例主要包括以下几个步骤:
1.更新分镜双向链表,增加分镜2;
2.更新分镜2文本双向链表,增加文本D;
3.更新分镜2角色双向链表,增加角色1节点;
4.更新分镜2角色1节点的文本双向链表,增加文本D;
5.查找分镜1下的文本双向链表,链接文本C和文本D;
6.查找分镜1下角色1的文本双向链表,链接文本B和文本D。
Case5:交换分镜1下的文本C和文本A的位置。
请参见图10e,图10e是本申请实施例提供的另一种操作案例的流程示意图。如图10e所示,该案例主要包括以下几个步骤:
1.更新分镜1下的文本双向链表,交换文本A和文本C;
2.查找分镜2下的文本双向链表,链接文本A和文本D,断开文本C;
3.更新分镜1下角色1的双向链表,交换文本B和文本A;
4.更新分镜1下的角色双向链表,交换角色2和角色1的位置。
Case6:增加分镜3,以及在分镜3下添加角色2,并为角色2添加文本E。
请参见图10f,图10f是本申请实施例提供的另一种操作案例的流程示意图。如图10f所示,该案例主要包括以下几个步骤:
1.更新分镜双向链表,增加分镜3;
2.更新分镜3文本双向链表,增加文本E;
3.更新分镜3角色双向链表,增加角色2节点;
4.更新分镜3下的角色2节点的文本双向链表,增加文本E;
5.查找分镜2的文本双向链表,链接文本D和文本E;
6.查找分镜1的角色2的文本双向链表,链接文本C和文本E。
Case7:调整分镜1和分镜2的顺序。
请参见图10g,图10g是本申请实施例提供的另一种操作案例的流程示意图。如图10g所示,该案例主要包括以下几个步骤:
1.更新分镜双向链表,调整分镜1和分镜2的顺序;
2.更新分镜1文本双向链表,将文本A和文本E链接,以及将文本C和文本D链接;
3.更新分镜2文本双向链表,将文本D和文本C链接;
4.查找分镜1的角色1的文本双向链表,将文本D和文本B链接。
(5)文本编辑操作流程:
接下来,结合图11a-图11c对本申请实施例针对文本所涉及的编辑操作进行详细说明:
①在文本B后插入文本A。
请参见图11a,图11a是本申请实施例提供的一种文本编辑操作的流程示意图。该文本编辑操作可以具体包括如下步骤S11-S16:
S11、在主时间轴双向链表中将文本A插入文本B后,并更新时间。
S12、确定是否需要更新分镜的首尾指针。若不需要,则执行S13;若需要,则执行S16。
S13、判断角色文本列表中是否包括文本中的角色。若包括,则执行S14;若不包括,则执行S15。
S14、将文本插入到该角色的文本双向链表中。
S15、创建该角色的角色文本双向链表,并加入到角色列表中。
S16、更新分镜的结束指针。
②删除文本A。
请参见图11b,图11b是本申请实施例提供的另一种文本编辑操作的流程示意图。该文本编辑操作可以具体包括如下步骤S21-S27:
S21、将文本A在主时间轴双向链表中删除,并更新时间。
S22、确定是否需要更新分镜的首尾指针。若不需要,则执行S23;若需要,则执行S27。
S23、将文本A从角色文本双向链表中删除。
S24、判断该角色是否没有文本。若是,则执行S26;若否,则执行S25。
S25、将角色文本列表重新排序。
S26、将该角色从角色文本列表中移除。
S27、更新分镜的结束指针。
③调整文本A和文本B的顺序。
请参见图11c,图11c是本申请实施例提供的另一种文本编辑操作的流程示意图。该文本编辑操作可以具体包括如下步骤S31-S38:
S31、在主时间轴双向链表中调整文本A和文本B的位置,更新时间。
S32、判断是否需要更新分镜的首尾指针。若需要,则执行S38;若不需要,则执行S33。
S33、判断文本A和文本B是否对应于同一角色。若是,则执行S34;若不是,则执行S35。
S34、调整文本A和文本B在角色文本链表中的位置。
S35、将文本A和文本B分别从各自的角色文本链表中移除。
S36、将文本A和文本B,在各自的角色文本链表中,根据设置时间,找到合适的位置插入。
S37、角色文本链表中根据文本的设置时间重新排序。例如,文本A的设置时间为10:00,文本B的设置时间为10:01,那么,在文本双向链表中文本A可以排列于文本B之前。
S38、更新分镜的首尾指针。
综上可知,可以看到在对文本进行编辑的过程中,同步维护主时间轴双向链表和各自角色双向链表。其中,依赖主时间轴双向链表在动画视频中获取按照时间顺序自上到下的所有文本的排 序列表。并且,依赖角色双向链表,可以快速获取到某个角色自身对应的文本,这种多维度数据链表的方式便于满足在不同业务场景中展示文本的业务诉求。
(6)脚本显示界面:
其中,上述所提及的脚本显示界面是指脚本显示区域(如图5a所示的脚本显示区域5012)对应的界面,且脚本显示界面支持被编辑。请参见图12,图12是本申请实施例提供的一种脚本显示区域的技术架构图,其中,该技术架构图主要包括:分镜列表界面层(显示有分镜列表,分镜列表包括至少一个分镜,如图5a所示的脚本显示区域5012中显示有分镜1、分镜2、以及分镜3)和数据层(可包括业务数据层、数据处理层、以及公共数据层)。
其中,分镜列表界面层和数据层一一对应,其中包括:分镜列表视图与分镜数据列表一一对应,空容器与分镜数据一一对应,脚本列表视图与脚本数据列表一一对应,脚本视图单元与脚本数据一一对应。这样,根据依赖链形成了树形结构:
a.分镜列表视图由分镜视图单元组成;
b.分镜视图单元包括两种容器,一种为空容器,所谓空容器是指初始化空分镜数据对应的容器;另一种是包含脚本列表视图的容器;
c.脚本列表视图是一种支持滚动的视图,由单条脚本视图单元组成;
d.脚本视图单元可以分为多个种类,具体与它所绑定的脚本数据种类有关,例如角色脚本、旁白脚本、音乐脚本。
其中,数据层分为三个部分:
a.业务数据层:和分镜列表界面层强相关,为脚本显示界面提供数据来源。其中,所谓和分镜列表界面层强相关,是指业务数据层中包含了很多分镜列表界面层的数据属性,比如突出显示文本时所对应的数据;
b.数据处理层:用于处理业务数据层和公共数据层数据相互转换的过程;
c.公共数据层:用于存储核心数据,以及多个模块(业务数据层、数据处理层和公共数据层)之间共享的数据。
(7)时间轴编辑面板:
本申请实施例中,针对时间轴编辑面板,主要涉及以下几项内容:
①对时间轴编辑面板所显示的脚本数据(例如图6c所示的各角色的台词音频对应的文本,以及各角色对应的姿态数据)进行数据转换和数据封装。
②显示时间轴面板(panel)列表,例如在时间轴编辑面板中显示各个角色的文本及姿态数据,并将每个角色的文本及姿态数据以列表的形式呈现给用户。如图6c所示,时间轴编辑面板中显示有A角色的文本和姿态数据、以及B角色的文本和姿态数据,并且上述数据可以以列表的形式呈现在时间轴编辑面板中。
③时间轴编辑面板上的UI操作,具体可以包括在时间轴编辑面板上执行的增加、删除、修改操作。例如,在图6c中界面S608所示的时间轴编辑面板中对B角色对应的文本“爬不动啦,太累了”修改为界面S609所示的文本“好的,一起加油”;又如,将图6c中界面S608所示的时间轴编辑面板中对A角色对应的姿态数据“皱眉头”进行删除等等。
④对时间轴编辑面板上执行的上述操作(增加、删除、修改等操作)进行检测及回调(所谓回调是指返回检测到的结果),以使渲染器基于回调后的数据实时更新相应的UI界面(即预览区域所呈现的界面)。
⑤时间轴(如图6c所示的时间轴6081,显示有当前分镜对应的总时长,例如分镜1的总时长为4s)和底部bar(横条)通知联调。举例来说,如图6c所示,若用户将分镜1的背景音乐(鞭炮声)对应从时间轴的1s-2s位置处拖长至1s-4s位置处,那么在显示背景音乐的横条对应也适应性调整,从而达到时间轴和背景音乐的横条之间的联动调整。
⑥分镜数据(可以包括上述检测及回调后确定的数据)实时更新,通知渲染器数据联调。具体来说,在检测到针对当前分镜对应的时间轴编辑面板上执行了相应的操作后,通知渲染器实时更新预览区域中的画面内容,从而用户在时间轴编辑面板中执行了编辑操作后,可同步在预览区域中更新相应的画面内容,达到数据联动的效果。
接下来,结合附图13a-图13b进一步对本申请实施例所涉及的在时间轴编辑面板上执行的编辑操作进行说明。请参见图13a,图13a是本申请实施例提供的一种时间轴编辑的流程示意图。如图13a所示,当在时间轴编辑面板(也即时间轴编辑区域)中检测到编辑操作(增加、修改、删除等操作)时,可以生成时间轴UI内部事件,并将该时间轴UI内部事件经过事件creator通知给中间件,以使中间件将该时间轴UI内部事件发送至计算处理器,计算处理器响应于时间轴UI内部事件,从而计算得到该时间轴UI内部事件对应的状态值,并将计算得到的状态值写入至于存储器(store)中。接下来,存储器中写入了新的状态值后可以进行状态通知管理,以回调计算得到的状态值(包括时间轴状态值TimelineViewState)至时间轴编辑区域(TimelineViewArea),从而驱动时间轴编辑面板进行状态更新。
请参见图13b,图13b是本申请实施例提供的另一种时间轴编辑的流程示意图。如图13b所示,用户在时间轴编辑面板中执行编辑操作后可以生成点击事件,该点击事件可以经业务方作为选中编辑事件发送至时间轴中间件进行转发,以使时间轴计算处理器和菜单计算处理器分别计算时间轴状态及菜单状态,最后,各计算处理器计算得到的状态用于通知修改对应的UI界面发生变化。具体来说,时间轴处理器计算时间轴状态后,通知修改时间轴视图(时间轴编辑面板对应的UI界面发生更新),以及菜单计算处理器计算菜单状态后,通知修改菜单视图。
二、服务器侧所涉及的相关技术:
(1)素材管理架构:
本申请实施例中,在执行视频编辑的过程中主要涉及“在视频编辑界面中确定目标角色及输入文本”,其中,所确定的目标角色以及输入的文本等相关素材均需要从服务器侧获取,因此本申请实施例基于对相关素材的管理,提出了一种素材管理的架构图。
请参见图14a,图14a是本申请实施例提供的一种素材管理的架构示意图。如图14a所示,终端设备侧的用户(客户端对象)可以在视频客户端(例如,StoryVerse APP)所显示的视频编辑界面中执行视频编辑操作,例如视频编辑操作具体可以包括:确定目标角色、输入一段文本、为目标角色设置姿态数据等等。视频客户端可以基于客户端对象的视频编辑操作,拉取场景分类列表、场景列表、角色分类列表、角色列表、表情列表、以及动作列表,并将上述数据发送至视频接入层,以使视频接入层将上述数据发送给素材模块(如素材模块dao层)进行素材管理。
另外,管理对象可以将场景素材(用于提供一种或多种场景画面)、骨骼素材(用于提供一种或多种角色对应的Spine动画文件)、角色素材(用于提供一种或多种角色)、表情素材(用于提供一种或多种面部姿态)、动作素材(用于提供一种或多种肢体动作)、以及音色类型(用于提供一种或多种音色)等发送至素材管理端进行素材管理。最后,素材模块dao层可以对素材管理端和素材模块所涉及的相关素材进行统一的素材管理,并进行本地存储localcache,写入至素材管理库中。
(2)素材业务模型的抽象处理:
请参见图14b,图14b是本申请实施例提供的一种素材业务模型的结构示意图。如图14b所示,为了使得动画视频中的目标角色可以方便的进行角色替换、面部姿态的切换以及肢体动作的切换等(例如,将动画视频中的目标角色切换为参考角色;又如,将目标角色的面部姿态由“开心”切换为“难过”;还如,将目标角色的肢体动作由“旋转”切换为“躺下”,等等),本申请实施例可以采用“插拔思想”让目标角色进行角色的替换、以及动作的切换。
所谓插拔思想是指可以基于目标角色对应的基础骨骼(可由骨骼素材提供),在该骨骼上增加相应的装扮附件,或者,可以新增一个骨骼。这些新增的骨骼和装扮附件同样可以从当前骨骼中进行"摘除",也就是说,插拔思想主要是采用拼接的方式方便骨骼、装扮附件的新增和删除。
基于图14b所示的素材业务模型的“插拔思想”,可以对素材业务模型进行进一步抽象,并结合图14c,图14c是本申请实施例提供的一种素材业务模型的抽象原理图。其涉及的具体流程可以包括如下:
1.对资源类型进行抽象分类,从而用于区分各类素材。例如资源类型可以包括但不限于:角色、骨骼、动作(肢体动作)、表情(面部姿态)、音乐、音色。
2.每种资源类型支持进一步分类:例如,场景类型支持分类为日常和室内;又如,角色类型支持分类为熊出没、斗罗大陆。其中,各资源类型在分类过程中支持树形结构。
3.卡片:归属于资源分类下,表示为具体的资源实例,例如:角色资源对应的资源实例可以包括:喵呜、熊大、蜡笔小新等;又如,表情资源对应的资源实例可以包括:哭、笑。其中,卡片也需要支持树形结构。
4.骨骼:例如可以包含角色素材、动作素材、表情素材等,其中,动作素材用于为目标角色呈现肢体动作时提供资源,表情素材用于为目标角色呈现姿态表情时提供资源。
5.素材:视频客户端需要使用的具体素材资源,素材资源例如可以为:图片、压缩包(例如zip包、rar包)等,本申请实施例对素材资源的数据格式并不作具体限定。
可以看出,本申请实施例中,通过对资源类型进行抽象分类的方式,若新增一类素材,则只需新增一个资源类型即可达到复用素材业务模型的效果,从而节约了设计成本。
请参见图15,图15是本申请实施例提供的一种视频编辑装置的结构示意图。视频编辑装置1500可以是运行于计算机设备中的一个计算机程序(包括程序代码),例如该视频编辑装置1500为一个应用软件;该视频编辑装置1500可以用于执行本申请实施例提供的方法中的相应步骤。该视频编辑装置1500可包括:
显示单元1501,用于显示视频编辑界面;
处理单元1502,用于在所述视频编辑界面中确定目标角色及输入的文本,其中,所述文本在所述视频编辑界面中以文本行的方式呈现,且所述视频编辑界面支持对所述文本行中的文本进行编辑;
处理单元1502,还用于生成动画视频,所述动画视频包含所述目标角色,且在所述动画视频中为所述目标角色设置所述文本对应的台词音频,其中,在动画视频的播放过程中,当播放至包含目标角色的画面时,同步播放文本对应的台词音频。
其中,处理单元1502所作的操作,可参照上述方法实施例所述,在此不再赘述。
请参见图16,图16是本申请实施例提供的一种计算机设备的结构示意图。该计算机设备1600用于执行前述方法实施例中终端设备或服务器所执行的步骤,该计算机设备1600包括:至少一个处理器1610;至少一个输入设备1620,至少一个输出设备1630和存储器1640。上述处理器1610、输入设备1620、输出设备1630和存储器1640通过总线1650连接。存储器1640用于存储计算机程序,所述计算机程序包括程序指令,处理器1610用于调用存储器1640存储的程序指令,执行上述实施例中所述的各种操作。
此外,这里需要指出的是:本申请实施例还提供了一种计算机存储介质,且计算机存储介质中存储有计算机程序,且该计算机程序包括程序指令,当处理器执行上述程序指令时,能够执行前文所对应实施例中的方法,因此,这里将不再进行赘述。对于本申请所涉及的计算机存储介质实施例中未披露的技术细节,请参照本申请方法实施例的描述。作为示例,程序指令可以被部署在一个计算机设备上,或者在位于一个地点的多个计算机设备上执行,又或者,在分布在多个地点且通过通信网络互连的多个计算机设备上执行。
根据本申请的一个方面,提供了一种计算机程序产品,该计算机程序产品包括计算机程序,计算机设备的处理器从计算机程序产品读取到该计算机程序后,该处理器可以执行该计算机程序,使得该计算机设备可以执行前文所对应实施例中的方法,因此,这里将不再进行赘述。
特别需要说明的是,在本申请的上述具体实施方式中,涉及到对象信息(例如对象的标识、昵称)等相关的数据,当本申请以上实施例运用到具体产品或技术中时,需要获得对象许可或者同意,且相关数据的收集、使用和处理需要遵守相关国家和地区的相关法律法规和标准。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,上述程序可存储于计算机可读取存储介质中,该程序在执行时,可包括如上述各方法的实施例的流程。其中,上述存储介质可为磁碟、光盘、只读存储记忆体(Read-Only Memory,ROM)或随机存储记忆体(Random Access Memory,RAM)等。
以上所揭露的仅为本申请较佳实施例而已,当然不能以此来限定本申请之权利范围,因此依本申请权利要求所作的等同变化,仍属本申请所涵盖的范围。

Claims (23)

  1. 一种视频编辑方法,由计算机设备执行,包括:
    显示视频编辑界面;
    在所述视频编辑界面中确定目标角色及输入的文本,其中,所述文本在所述视频编辑界面中以文本行的方式呈现,且所述视频编辑界面支持对所述文本行中的文本进行编辑;及,
    生成动画视频,所述动画视频包含所述目标角色,且在所述动画视频中为所述目标角色设置所述文本对应的台词音频;
    其中,在所述动画视频的播放过程中,当播放至包含所述目标角色的画面时,同步播放所述文本对应的台词音频。
  2. 如权利要求1所述的方法,还包括:
    在所述台词音频的播放过程中,在所述包含所述目标角色的画面中突出显示所述文本;
    其中,所述突出显示包括以下任一种或多种:放大字体显示、改变字体颜色显示、按照预设字体显示。
  3. 如权利要求1所述的方法,还包括:
    在所述视频编辑界面中为所述目标角色设置姿态数据,所述姿态数据用于控制所述目标角色在所述动画视频中呈现的姿态,所述姿态包括以下一种或多种:面部姿态及肢体动作。
  4. 如权利要求3所述的方法,还包括:
    播放所述动画视频,并当播放至包含所述目标角色的画面时,播放所述台词音频,并控制所述目标角色呈现所述姿态。
  5. 如权利要求4所述的方法,其中,所述当播放至包含所述目标角色的画面时,播放所述台词音频,并控制所述目标角色呈现所述姿态,包括以下任一种:
    在显示包含所述目标角色的画面的过程中,先播放所述台词音频,并在播完所述台词音频后,控制所述目标角色呈现所述姿态;或者,
    在显示包含所述目标角色的画面的过程中,控制所述目标角色先呈现所述姿态,再播放所述台词音频;或者,
    在所述台词音频的播放过程中的任意时刻,控制所述目标角色呈现所述姿态。
  6. The method of claim 3, wherein if the number of texts is greater than one, with one text corresponding to one piece of line audio, the number of pieces of line audio of the target character in the animated video is greater than one; and if the number of pieces of pose data is greater than one, the number of poses that the target character needs to present in the animated video is greater than one; the method further comprising:
    setting a first output order among the texts, and playing the line audio corresponding to each text in sequence in the animated video according to the first output order;
    setting a second output order among the pieces of pose data, and controlling the target character in the animated video to present, in sequence, the poses corresponding to the pieces of pose data according to the second output order; and
    setting an associated output order between any text and at least one piece of pose data, and, while the line audio corresponding to the any text is being played in the animated video, controlling the target character to present at least one corresponding pose according to the associated output order;
    wherein the first output order, the second output order, and the associated output order all support dynamic adjustment.
  7. The method of claim 1, wherein the animated video comprises a first storyboard and a second storyboard; the method further comprising:
    displaying a storyboard reordering interface, wherein the storyboard reordering interface contains the first storyboard and the second storyboard, and the first storyboard and the second storyboard are arranged and displayed in the storyboard reordering interface according to their playback order;
    changing the arrangement position of the first storyboard and/or the second storyboard in the storyboard reordering interface; and
    adjusting the playback order of the first storyboard or the second storyboard according to the changed arrangement position.
  8. The method of claim 1, wherein the animated video contains at least one storyboard, any storyboard supports being edited, and the editing comprises any one or more of copying, deleting, and dynamic modification; the method further comprising:
    in response to a dynamic modification operation on a target storyboard, displaying a timeline editing panel corresponding to the target storyboard; and
    modifying, on the timeline editing panel corresponding to the target storyboard, the frame content involved in the target storyboard, and updating the animated video based on the modification.
  9. The method of claim 8, wherein the target storyboard contains multiple video frames, and the timeline editing panel displays the frame content of each video frame and the script data respectively corresponding to each character in each video frame; the script data comprises any one or more of the following: the text corresponding to each character's line audio, and the pose data corresponding to each character; and
    the modification comprises any one of the following: adjusting the position of the script data corresponding to any character, adjusting the timing of the script data corresponding to any character, and aligning the script data of different characters.
  10. The method of claim 1, further comprising:
    receiving a character switching operation, wherein the character switching operation is used to switch the target character corresponding to the text to a reference character; and
    in response to the character switching operation, replacing the target character corresponding to the text with the reference character;
    wherein the reference character is selected from a character selection panel displayed after the identifier of the target character is triggered; or the reference character is selected from a quick selector, and the quick selector is used to display multiple character identifiers that have reached a preset selection frequency within a preset time period.
  11. The method of claim 1, wherein a character management control is further provided in the video editing interface; the method further comprising:
    when the character management control is selected, outputting a character management interface, wherein the character management interface displays all characters contained in the animated video and a management item for each character; and
    managing each character in the animated video according to the management item;
    wherein the management item comprises a character replacement item and the management comprises character replacement; or the management item comprises a timbre change item and the management comprises changing the timbre of a character's line audio.
  12. The method of claim 1, wherein the determining a target character in the video editing interface comprises:
    when a character addition event is detected, triggering addition of the target character in the video editing interface;
    wherein the character addition event is generated after a character addition entry is triggered;
    or the character addition event is generated after a character addition gesture is detected, the character addition gesture comprising any one of: a single-tap gesture, a double-tap gesture, a hover gesture, and a preset gesture.
  13. The method of claim 12, wherein the character addition entry is provided in the video editing interface; and the triggering addition of the target character in the video editing interface when a character addition event is detected comprises:
    in response to the character addition entry being triggered, outputting a character selection panel, wherein the character selection panel displays at least one character identifier to be selected; and
    in response to a selection operation on a target character identifier, displaying the target character corresponding to the target character identifier in the video editing interface.
  14. The method of claim 1, wherein the video editing interface displays multiple historical videos to be edited, and any historical video comprises at least one character;
    the determining a target character in the video editing interface comprising:
    selecting any character from the historical videos and determining the selected character as the target character.
  15. The method of claim 13 or 14, wherein the video editing interface comprises a preview area; the method further comprising:
    in the process of selecting the target character, displaying each selected character in the preview area, wherein the character displayed in the preview area is replaced as the selection operation switches; and
    when the target character is selected, presenting the target character in the preview area.
  16. The method of claim 15, wherein the target character is further provided with pose data; the video editing interface further comprises a script display area; and the script display area is used to display the text and the pose data set for the target character.
  17. The method of claim 1, wherein the video editing interface is displayed in a terminal device; the method further comprising:
    performing a video export operation on the animated video, the video export operation comprising any one or more of the following: saving the animated video to the terminal device, publishing the animated video to the creator's homepage, and sharing the animated video to a social conversation.
  18. A video editing apparatus, comprising:
    a display unit, configured to display a video editing interface;
    a processing unit, configured to determine a target character and input text in the video editing interface, wherein the text is presented in the video editing interface in the form of text lines, and the video editing interface supports editing the text in the text lines;
    the processing unit being further configured to generate an animated video, wherein the animated video contains the target character, and line audio corresponding to the text is set for the target character in the animated video;
    wherein, during playback of the animated video, when a frame containing the target character is played, the line audio corresponding to the text is played synchronously.
  19. The apparatus of claim 18, wherein the processing unit is further configured to set pose data for the target character in the video editing interface, the pose data being used to control the pose presented by the target character in the animated video, and the pose comprising one or more of the following: a facial pose and a body movement.
  20. The apparatus of claim 18, wherein the animated video comprises a first storyboard and a second storyboard; and the processing unit is further configured to: display a storyboard reordering interface, wherein the storyboard reordering interface contains the first storyboard and the second storyboard, and the first storyboard and the second storyboard are arranged and displayed in the storyboard reordering interface according to their playback order; change the arrangement position of the first storyboard and/or the second storyboard in the storyboard reordering interface; and adjust the playback order of the first storyboard or the second storyboard according to the changed arrangement position.
  21. A computer device, comprising a memory and a processor;
    the memory storing one or more computer programs; and
    the processor being configured to load the one or more computer programs to implement the video editing method of any one of claims 1 to 17.
  22. A computer-readable storage medium, storing a computer program, wherein the computer program is adapted to be loaded by a processor to perform the video editing method of any one of claims 1 to 17.
  23. A computer program product, comprising a computer program, wherein the computer program is adapted to be loaded by a processor to perform the video editing method of any one of claims 1 to 17.
PCT/CN2023/086471 2022-05-30 2023-04-06 Video editing method and apparatus, computer device, storage medium and product WO2023231568A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210603765.9A CN115129212A (zh) 2022-05-30 2022-05-30 Video editing method and apparatus, computer device, storage medium and product
CN202210603765.9 2022-05-30

Publications (1)

Publication Number Publication Date
WO2023231568A1 true WO2023231568A1 (zh) 2023-12-07

Family

ID=83378226

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/086471 WO2023231568A1 (zh) 2022-05-30 2023-04-06 Video editing method and apparatus, computer device, storage medium and product

Country Status (2)

Country Link
CN (1) CN115129212A (zh)
WO (1) WO2023231568A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115129212A (zh) * 2022-05-30 2022-09-30 腾讯科技(深圳)有限公司 Video editing method and apparatus, computer device, storage medium and product
CN116882370B (zh) * 2023-07-10 2024-04-26 广州开得联智能科技有限公司 Content processing method and apparatus, electronic device and storage medium
CN117173293B (zh) * 2023-11-03 2024-01-26 武汉方拓数字科技有限公司 Unity-based animation curve serialization method

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8831948B2 (en) * 2008-06-06 2014-09-09 At&T Intellectual Property I, L.P. System and method for synthetically generated speech describing media content
CN104703056B (zh) * 2013-12-04 2019-04-12 腾讯科技(北京)有限公司 Video playback method, apparatus and system
CN110880198A (zh) * 2018-09-06 2020-03-13 百度在线网络技术(北京)有限公司 Animation generation method and apparatus
CN111601174A (zh) * 2020-04-26 2020-08-28 维沃移动通信有限公司 Subtitle adding method and apparatus
CN113192161B (zh) * 2021-04-22 2022-10-18 清华珠三角研究院 Virtual human image video generation method, system, apparatus and storage medium
CN113923515A (zh) * 2021-09-29 2022-01-11 马上消费金融股份有限公司 Video production method and apparatus, electronic device and storage medium
CN115115753A (zh) * 2022-05-30 2022-09-27 腾讯科技(深圳)有限公司 Animated video processing method and apparatus, device and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107770626A (zh) * 2017-11-06 2018-03-06 腾讯科技(深圳)有限公司 Video material processing method, video synthesis method, apparatus and storage medium
CN110163939A (zh) * 2019-05-28 2019-08-23 上海米哈游网络科技股份有限公司 Method, apparatus, device and storage medium for generating expressions of a three-dimensional animated character
CN110781346A (zh) * 2019-09-06 2020-02-11 天脉聚源(杭州)传媒科技有限公司 Avatar-based news production method, system, apparatus and storage medium
CN114363691A (zh) * 2021-04-22 2022-04-15 南京亿铭科技有限公司 Speech subtitle synthesis method, apparatus, computer device and storage medium
CN115129212A (zh) * 2022-05-30 2022-09-30 腾讯科技(深圳)有限公司 Video editing method and apparatus, computer device, storage medium and product

Also Published As

Publication number Publication date
CN115129212A (zh) 2022-09-30

Similar Documents

Publication Publication Date Title
WO2023231568A1 (zh) Video editing method and apparatus, computer device, storage medium and product
TWI776066B (zh) Picture generation method, apparatus, terminal, server and storage medium
CN111294663B (zh) Bullet comment processing method and apparatus, electronic device and computer-readable storage medium
US20240107127A1 (en) Video display method and apparatus, video processing method, apparatus, and system, device, and medium
WO2022042593A1 (zh) 字幕编辑方法、装置和电子设备
US11263397B1 (en) Management of presentation content including interjecting live feeds into presentation content
US20210392403A1 (en) Smart Television And Server
US20220198403A1 (en) Method and device for interacting meeting minute, apparatus and medium
CN111193960B (zh) Video processing method and apparatus, electronic device and computer-readable storage medium
WO2023240943A1 (zh) Method for generating a digital human, model training method, apparatus, device and medium
US20180143741A1 (en) Intelligent graphical feature generation for user content
WO2020220773A1 (zh) Method and apparatus for displaying picture preview information, electronic device and computer-readable storage medium
JP7240505B2 (ja) Voice packet recommendation method, apparatus, electronic device and program
US20140282000A1 (en) Animated character conversation generator
WO2024037557A1 (zh) Special effect prop processing method and apparatus, electronic device and storage medium
WO2023134568A1 (zh) Display method and apparatus, electronic device and storage medium
WO2023284498A1 (zh) Video playback method, apparatus and storage medium
US11782984B2 (en) Styling a query response based on a subject identified in the query
KR20120000595A (ko) Method and system for providing an online multimedia content production tool running on multiple platforms
CN102419708B (zh) J2ME game running method and apparatus
CN112689177B (zh) Method for implementing fast interaction and display device
CN113518160A (zh) Video generation method, apparatus, device and storage medium
CN111625740A (zh) Image display method, image display apparatus and electronic device
WO2022262560A1 (zh) Image display method, apparatus, device and storage medium
US20240121485A1 (en) Method, apparatus, device, medium and program product for obtaining text material

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23814764

Country of ref document: EP

Kind code of ref document: A1