CN117156199A - Digital human short video production platform and production method thereof


Info

Publication number: CN117156199A
Application number: CN202211311642.4A
Authority: CN
Prior art keywords: target, video, digital, template, sound
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 司马华鹏, 刘杰, 杜萍萍
Current Assignee: Nanjing Silicon Intelligence Technology Co Ltd
Original Assignee: Nanjing Silicon Intelligence Technology Co Ltd
Application filed by Nanjing Silicon Intelligence Technology Co Ltd
Publication of CN117156199A


Classifications

    • H04N 21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N 21/2187: Live feed
    • H04N 21/435: Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
    • H04N 21/439: Processing of audio elementary streams
    • H04N 21/47205: End-user interface for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
    • H04N 21/8106: Monomedia components involving special audio data, e.g. different tracks for different languages
    • H04N 21/816: Monomedia components involving special video data, e.g. 3D video
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Processing Or Creating Images (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

An embodiment of the application provides a digital human short video production platform and a production method. The platform is provided with a model unit, a template unit, a sound unit, a script unit and a management unit. The model unit is configured to: receive a digital human model selection instruction sent by a user; and, in response to the instruction, display a static photo and a dynamic video of the target digital human model. The template unit is configured to: receive a digital human template selection and editing instruction sent by the user; and, in response, obtain the target template. The sound unit is configured to: receive a sound selection instruction sent by the user; and, in response, play the target sound selected by the user. The script unit is configured to: receive a script selection instruction sent by the user; and, in response, generate the target script. The management unit is configured to: receive a management instruction sent by the user; and, in response, produce and generate the target digital human video.

Description

Digital human short video production platform and production method thereof
The present application claims priority to Chinese patent application No. 202210571898.2, entitled "A digital human video generation service method based on a digital human video platform", filed with the Chinese Patent Office on May 23, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The application relates to the technical field of artificial intelligence, and in particular to a digital human short video production platform and a production method thereof.
Background
With the development of artificial intelligence technology and the popularization of metaverse concepts, virtual figures such as virtual anchors, virtual idols and virtual staff have become important components of internet activity. A virtual figure is realized by a digital person/virtual person (hereinafter collectively referred to as a digital human). Specifically, a digital human is composed of a preset figure, a voice and a scene, and can be driven to display different actions or expressions according to the speech or content input by the user, so as to generate a digital human video.
In general, when a user needs a digital human video generated, the user must contact a company that provides avatar customization services (hereinafter collectively referred to as a customizing company) and inform it of the requirements, such as preferences for the digital human's appearance and voice, the specific application scenario, and the content of the video. The customizing company then creates the corresponding digital human in the background and generates the corresponding digital human video. Making the digital human involves selecting the corresponding appearance and voice, training the digital human, and so on; generating the video involves driving the digital human, editing the video, and so on.
In the related art, the cooperation between the user and the customizing company resembles a conventional bespoke service: the user states requirements to the service party, and the service party delivers the corresponding service or product. On the one hand, from the user's perspective, the need to actively or passively contact a customizing company raises the threshold for generating digital human videos, and the repeated communication required during production increases labor and time costs and lowers the efficiency of video generation. On the other hand, from the customizing company's perspective, the digital human video generation service can only be promoted through traditional marketing, so marketing costs are high, promotion is inefficient, and it is difficult to bring the service to ordinary consumers; moreover, during production the company must continually adjust how and what it generates according to user requirements, which further makes the cost and efficiency of digital human video generation far from ideal.
Against the background of the rapid popularization of metaverse concepts, the demand for digital humans in different application scenarios grows daily; owing to the problems above, the traditional digital human video generation service model can no longer match these service demands.
For this series of problems in the related art, namely that users cannot quickly generate digital human videos according to their own needs, that customizing companies cannot efficiently promote their digital human video generation services, and that the cost and efficiency of digital human video generation cannot be controlled, no effective solution has been provided.
Disclosure of Invention
The application provides a digital human short video production platform and a production method thereof, which at least solve the technical problems in the related art that users cannot quickly produce digital human videos according to their own needs, that customizing companies cannot efficiently promote their digital human video generation services, and that the cost and efficiency of digital human video generation cannot be controlled.
The application provides a digital human short video production platform provided with a model unit, a template unit, a sound unit, a script unit and a management unit. The model unit is configured to: receive a digital human model selection instruction sent by a user, the instruction designating the target digital human model; and, in response to the instruction, display a static photo and a dynamic video of the target digital human model. The template unit is configured to: receive a digital human template selection and editing instruction sent by the user, the instruction allowing the user to edit a preset template to generate a target template; and, in response, obtain the target template, where templates are generated according to different application requirements under different preset scenes, and a preset scene comprises text, images and audio. The sound unit is configured to: receive a sound selection instruction sent by the user, the instruction making the target digital human model speak with the target sound; and, in response, play the target sound selected by the user. The script unit is configured to: receive a script selection instruction sent by the user, the instruction requesting a target script whose text content corresponds to the target theme; and, in response, generate the target script. The management unit is configured to: receive a management instruction sent by the user, the instruction generating a target digital human short video by editing the target digital human model, target template, target sound and target script; and, in response, produce and generate the target digital human short video.
In one implementation, the platform is further provided with a live unit configured to: receive a live-broadcast instruction sent by a user, the instruction making the target digital human model broadcast live video under the corresponding target template, target sound and target script; and, in response, carry out the live video broadcast based on the target digital human model.
In one implementation, the model unit acquires the target digital human model, the template unit acquires the target template, the sound unit acquires the target sound, and the script unit acquires the target script; the management unit manages the target digital human model, target template, target sound and target script in a unified manner; the management unit generates the target digital human short video from the target digital human model, target sound and target script on the basis of the target template; and the management unit stores the generated target digital human short video and outputs it.
In one implementation, the target templates include complete templates with specific formats and blank templates fully customizable by the user.
In one implementation, generating the target digital human short video with the management unit further includes: receiving, with the management unit, a material video input by the user, the material video being a video the user has shot or downloaded; acquiring, with the management unit, the audio content or video content corresponding to the material video; when the material video is used for its audio content, driving the target digital human model with the management unit to act according to that audio content so as to generate the target digital human short video; and when the material video is used for its video content, driving the target digital human model with the management unit to act according to that video content so as to generate the target digital human short video.
In one implementation, driving the target digital human model to act according to the video content includes: extracting the video content corresponding to the material video; and migrating the character's actions in that content onto the target digital human model by means of motion transfer, so that the model performs the corresponding actions and the target digital human short video corresponding to the material video is generated.
In one implementation, the target digital human short video is generated on the basis of a pre-trained neural network model, specifically comprising: acquiring the target sound with the sound unit, the target sound being material audio input by the user or audio generated by the sound unit; and having the neural network model output the corresponding actions according to the target sound, driving the target digital human model to generate the target digital human short video.
In one implementation, driving the target digital human model according to the target sound includes: acquiring the text content edited by the user, or the content of the target script selected by the user through the script unit; and generating, with the target sound, audio corresponding to that edited text or script content so as to drive the target digital human model to perform the corresponding actions.
In one implementation, generating the target digital human short video based on the pre-trained neural network model further comprises: extracting the target digital human model, target template, target sound and target script; training the pre-trained neural network model with training samples; and driving the target digital human model with the trained neural network model to generate the target digital human short video.
In one implementation, the preset scene in the target template belongs either to a non-professional field or to a professional field; when it belongs to a non-professional field, the template unit trains the pre-trained neural network model with preset general training samples; when it belongs to a professional field, the pre-trained neural network model is trained with professional training samples.
According to the technical scheme, the digital human short video production platform provided by the application has the following technical effects:
1. with the platform, a user can produce and generate digital human videos according to his or her own needs without going through a service party, which markedly improves the efficiency of digital human video production from the user's perspective and better supports personalized processing;
2. through the platform, the service party can not only promote the digital human video generation service efficiently, but also markedly reduce the labor and time costs of the generation service.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a schematic diagram of the model unit page in the digital human short video production platform provided by the application;
FIG. 2 is a schematic diagram of the template unit page in the platform;
FIG. 3 is a schematic diagram of the sound unit page in the platform;
FIG. 4 is a schematic diagram of the script unit page in the platform;
FIG. 5 is a flow chart of a method for producing a digital human short video.
Detailed Description
The application will be described in detail hereinafter with reference to the drawings in conjunction with embodiments. It should be noted that, without conflict, the embodiments of the present application and features in the embodiments may be combined with each other.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.
As shown in FIG. 1, an embodiment of the application provides a digital human short video production platform provided with a model unit 001, a template unit 002, a sound unit 003, a script unit 004 and a management unit 005, so as to solve the series of problems in the related art that users cannot quickly produce digital human videos according to their own needs, that customizing companies cannot efficiently promote their digital human video generation services, and that the cost and efficiency of digital human video generation cannot be controlled. The platform resembles a mall whose commodity is digital human video: users freely select and purchase the digital humans they need and complete the video generation service by themselves. A minimal structural sketch of these five units follows.
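The five-unit composition described above can be pictured in Python as below. This is a minimal structural sketch assuming a simple selection/editing interface; every class, method and field name here is a hypothetical illustration, not the platform's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class ModelUnit:
    """Shows digital human models; answers selection instructions (unit 001)."""
    models: dict = field(default_factory=dict)   # model_id -> (photo, demo_video)

    def select(self, model_id: str):
        # Respond to a model-selection instruction with the model's
        # static photo and dynamic demo video for preview.
        return self.models.get(model_id, model_id)

@dataclass
class TemplateUnit:
    """Preset scenes the user edits into a target template (unit 002)."""
    templates: dict = field(default_factory=dict)

    def edit(self, template_id: str, **changes):
        tpl = dict(self.templates.get(template_id, {}))
        tpl.update(changes)                       # edit text/images/audio in the scene
        return tpl

@dataclass
class SoundUnit:
    """Voices with different timbres; previews the target sound (unit 003)."""
    voices: dict = field(default_factory=dict)

    def select(self, voice_id: str):
        return self.voices.get(voice_id, voice_id)

@dataclass
class ScriptUnit:
    """Scripts: the text content of a given theme (unit 004)."""
    scripts: dict = field(default_factory=dict)

    def select(self, theme: str):
        return self.scripts.get(theme, f"script about {theme}")

class ManagementUnit:
    """Manages the four chosen elements in a unified way and produces the video (unit 005)."""
    def produce(self, model, template, sound, script) -> bytes:
        # Real synthesis would call the platform's rendering backend;
        # here the combination is just serialized as a placeholder.
        return repr((model, template, sound, script)).encode()
```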
The platform can be presented in any form; the application does not limit the specific presentation, for example a Web page, a PC or mobile application, a WeChat applet, and so on; preferably, it is presented through a Web page or an application. For example, when the platform is presented through a Web page, the user enters the login URL to reach the login page, then logs in to the digital human video platform with the provided account, password and verification code in order to produce digital human videos.
Specifically, the digital human short video production platform according to the application is composed of five parts: the model unit 001, the template unit 002, the sound unit 003, the script unit 004 and the management unit 005, as follows.
The model unit 001 is configured to: receive a digital human model selection instruction sent by a user, the instruction designating the target digital human model; and, in response to the instruction, display a static photo and a dynamic video of the target digital human model.
The model unit 001 displays a variety of digital human figures; the user freely selects a figure as the digital human model, i.e. the digital human that appears in the video. When the user selects a model, its static photo and dynamic video are displayed, so that the user can intuitively see the model's specific features before choosing it. As shown in FIG. 1, the user is free to select different digital human models.
In some embodiments, the model unit 001 may also include a model market. The market displays the information and profile of every model, and clicking a scene shows an example video of the model in that scene. The user selects a suitable model and purchases it, i.e. obtains the right to use it, so that the model can be used for video production.
The template unit 002 is configured to: receive a digital human template selection and editing instruction sent by the user, the instruction allowing the user to edit a preset template to generate a target template; and, in response, obtain the target template, where templates are generated according to different application requirements under different preset scenes, and a preset scene comprises text, images and audio.
In some embodiments, the target templates include complete templates with specific formats and blank templates defined by the user. A complete template has a specific format and comes in many types; a blank template is edited entirely by the user.
Specifically, a template is a scene preset according to different application requirements; within a scene, text, images, audio and so on can be further edited (much like a PPT template), forming the background page and the content to be displayed in the digital human video. The same template may be composed of several different scenes. The user freely selects among templates and edits them to form the scenes, text, images and audio that appear while the video plays. FIG. 2 is a schematic diagram of the template unit page; as shown in FIG. 2, the user is free to select different templates. A minimal data-structure sketch of this template/scene relationship follows.
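A minimal sketch of the template/scene structure just described, assuming a template is a named list of scenes and each scene carries editable text, images and audio; all field names are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional
import copy

@dataclass
class Scene:
    title: str = ""
    subtitle: str = ""
    text: str = ""                                   # narration text for this scene
    images: List[str] = field(default_factory=list)  # background / picture-in-picture
    audio: Optional[str] = None                      # uploaded or synthesized audio

@dataclass
class Template:
    name: str
    scenes: List[Scene] = field(default_factory=list)

    def add_scene(self, scene: Scene) -> None:       # the editor's "add" button
        self.scenes.append(scene)

    def copy_scene(self, index: int) -> None:        # the editor's "copy" button
        self.scenes.insert(index + 1, copy.deepcopy(self.scenes[index]))

# A blank template is simply one with no preset scenes:
blank = Template(name="blank")
blank.add_scene(Scene(title="Opening", text="Hello, welcome to today's video."))
```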
In some embodiments, templates are further divided into VIP templates and general templates. Specifically, the user enters the purchase page by clicking the template-purchase button in the template unit; after buying a template, the user can use it to create works. Templates are purchased with the platform's diamond currency; if the balance is insufficient, the user clicks the "go to recharge" button and recharges the account.
For example, in one specific scenario, the procedure for creating a video through a template may be as follows. Using the APP corresponding to the platform, click the "use" button of an AI script template to enter the video editing page. Clicking the "modify work name" button in the page changes the work's name. Add a scene title and subtitle, upload a recorded image, and delete the scene, add a scene or copy the current one through the buttons above the scene. Selecting PPT/Word import adds the PPT or Word content to the narration in one click; text can also be typed directly in the middle editing area, and pictures or videos uploaded manually. Once the text is set, the voice is configured next: click "upload recording" to upload local audio, or, if there is none, click "produce audio" to go to the sound market and synthesize one. Once the audio is set, click the composition button to synthesize the video. If the user needs to leave for other pages during production, the save button at the upper right of the page should be clicked in time to keep the current settings.
Furthermore, the user can enter the video editing page by clicking the "use" button of a general template or by creating a new blank template. Specifically: clicking the "modify work name" button changes the work's name; clicking "recharge" buys diamonds to recharge video-synthesis duration; clicking "switch" toggles between landscape and portrait display; clicking the template switches to a different video template; clicking the model selects a different model, and once a model is chosen, clicking its layer allows resizing, repositioning and similar operations; clicking "add" creates a blank scene, clicking "copy" on a selected scene duplicates it, and clicking "delete" removes the current scene. A PPT or Word document can also be uploaded as a video scene or picture-in-picture; one-click synthesis converts the PPT/Word into video quickly and efficiently, and PPT and Word scenes automatically recognize the narration content they contain.
After the scenes are set, the narration comes next. The content to be narrated is entered in the narration column and the voice is configured: AI synthesis or an uploaded recording can be chosen, and timbre, speed, volume and intonation keep their defaults or are adjusted manually as needed. "Apply to global" applies the adjusted timbre to all scenes of the video; "My" selects audio files the user has made in the sound market. After previewing the speech, placing the cursor where a pause is needed and clicking "insert pause" adds the pause.
Then the subtitles are set; they can be added manually or left off, and are split automatically at punctuation marks. In the page's material library, "My materials" uploads local pictures, backgrounds, videos, music and so on to be inserted into the video; materials provided by the system (backgrounds, pictures, music, videos) can replace the video background in one click or be inserted; text entered in the text tool is inserted into the video; a subtitle style and subtitle spacing are selected; and the dwell time of each scene is set. Selecting a layer in the video display area allows it to be dragged freely, resized, repositioned, viewed or deleted. Once every scene is set, clicking the "composition" button synthesizes the video. A small sketch of the automatic punctuation-based subtitle split follows.
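The automatic subtitle split at punctuation marks can be sketched with the standard library; the exact punctuation set the platform uses is an assumption.

```python
import re

def split_subtitles(text):
    # Split on common Chinese/Western sentence punctuation and drop empty pieces.
    parts = re.split(r"[。！？；，!?;,]\s*", text)
    return [p.strip() for p in parts if p.strip()]

print(split_subtitles("今天晴。气温20度，微风！适合出行。"))
# -> ['今天晴', '气温20度', '微风', '适合出行']
```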
The sound unit 003 is configured to: receive a sound selection instruction sent by the user, the instruction making the target digital human model speak with the target sound; and, in response, play the target sound selected by the user.
The sound unit contains a variety of voices with different timbres and tones, for example male, female and elderly voices. The user freely selects among them; when a voice is selected it can be previewed, and once the user settles on a voice, subsequent digital humans speak with that voice's timbre and tone. FIG. 3 is a schematic diagram of the sound unit 003 page; as shown in FIG. 3, the user is free to select different voices.
In some embodiments, the sound unit 003 further comprises a sound market with voice models of different timbres and styles; clicking a voice model's avatar previews its timbre. The market is classified by voice type into children's, female and male voices, and by timbre into categories such as sweet, Chinese-English bilingual white-collar and general-purpose voices. Each voice model has a fitting nickname and a short introduction, for example "Susu: gentle and approachable, fresh and lively; representative works: tutorials and epidemic-prevention announcements". Clicking a model opens its detail page, where the user can read the detailed description, listen to the timbre's representative works, choose a suitable package to purchase, and then use the model to produce audio.
The script unit 004 is configured to: receive a script selection instruction sent by the user, the instruction requesting a target script whose text content corresponds to the target theme; and, in response, generate the target script.
The script unit 004 displays a variety of scripts, a script being the text content of a given theme, and the user freely selects among them. A script serves as the basis of video production and becomes the main narration content in the subsequent video. FIG. 4 is a schematic diagram of the script unit 004 page; as shown, the user is free to select different scripts.
In some embodiments, the script unit 004 further includes a script market containing scripts of various types, such as albums, recommendations, epidemic prevention and control, and mental-health care. Each type contains several scripts, for which the name and author are shown. To search for a script, its name, type and so on can be entered in the page's search field. The script market provides rich script material and albums; clicking a script of interest opens its detail page, where its full content can be viewed, purchased, and used as the narration content for video production and output.
The management unit 005 is configured to: receive a management instruction sent by the user, the instruction generating a target digital human short video by editing the target digital human model, target template, target sound and target script; and, in response, produce and generate the target digital human short video.
In the management unit 005, after the user has selected and purchased the relevant elements in the model unit 001, the sound unit 003 and the script unit 004, those elements can be managed there in a unified manner, and the user can create and generate the digital human video by himself.
In some embodiments, a workbench is further provided, including a "My models" page that displays all models under the user's account, of the two types "general" and "custom". Clicking "details" shows the model profile and example videos of its scenes. After choosing a model and an application scene, clicking the "use" button enters the video editing page; editing proceeds as with the general or blank templates described above. On that page: clicking "modify work name" changes the work's name; clicking "recharge" tops up video-synthesis duration; clicking "save" keeps the current settings; clicking "basic information" shows the current model's basic information; clicking the model or the application switches to other models and application scenes under the account; clicking the scene switches to other scenes of the current model; clicking "upload audio" (or dragging an audio file straight into the content area) supplies the audio for synthesis, and if there is no audio file, clicking "produce audio" goes to the sound market; "voiced script" refers to the audio of a purchased voiced script; "My" selects audio already made in the sound market. Once the audio is uploaded, clicking the play button checks it, and after confirming it is correct, clicking the confirm-composition button synthesizes the video.
Further, the workbench includes a "My templates" page that displays the templates the user has purchased, which can be used directly for video production. If there is none, clicking the "go to purchase" button opens the template market. For creating a video with a template, refer to the template-based procedure described above.
Further, the workbench includes a "My voice models" page displaying all voice models the user has purchased. Selecting a voice model and clicking the use button enters the audio production page. There: clicking "model" switches to other voice models; clicking "script" pulls in the content of a purchased script for audio production, and if there is none, clicking "purchase" opens the script market; clicking "text" selects a text upload (PDF, Word, TXT and other formats, or an uploaded recording), or text can be typed directly in the text box; after the text is set, clicking "modify work name" changes the work's name; clicking "upload cover" sets a cover for the work, or the default cover is used; clicking "listen" previews the audio; placing the cursor where a pause is needed and clicking "insert pause" adds a pause of the default 0.5 s, 1 s or 2 s, or a custom duration; for mispronounced words such as heteronyms, clicking the "pronunciation check" button lets the user select the word and pick the correct or a custom pronunciation with a right click; the voice settings column adjusts speed, intonation and volume manually, or the defaults are kept. Once everything is set, clicking the composition button synthesizes the audio.
The workbench also includes a "My scripts" page displaying the scripts the user has purchased; if there is none, the user can go to the script market to buy one, and clicking "new script" opens the script editing page to write one.
Based on the application scenario above, an embodiment of the application further provides a method for producing a digital human short video, as shown in FIG. 5, comprising the following steps:
S1, acquiring the target digital human model with the model unit 001, the target template with the template unit 002, the target sound with the sound unit 003 and the target script with the script unit 004;
S2, managing the target digital human model, target template, target sound and target script in a unified manner with the management unit 005;
S3, generating the target digital human short video with the management unit 005 from the target digital human model, target sound and target script on the basis of the target template;
S4, storing the generated target digital human short video with the management unit 005 and outputting it. A runnable sketch of this S1-S4 pipeline is given below.
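The S1-S4 pipeline, sketched under the assumption that the units expose the simple select/edit/produce interface shown earlier; the element identifiers and the stub classes are illustrative only.

```python
def produce_short_video(model_unit, template_unit, sound_unit, script_unit,
                        management_unit, out_path="target_video.mp4"):
    # S1: acquire the four target elements from their units
    model = model_unit.select("model-001")
    template = template_unit.edit("weather-forecast")
    sound = sound_unit.select("voice-female-01")
    script = script_unit.select("daily-weather")
    # S2 + S3: unified management, then generation on the basis of the template
    video_bytes = management_unit.produce(model, template, sound, script)
    # S4: store the generated video and output its location
    with open(out_path, "wb") as f:
        f.write(video_bytes)
    return out_path

# Trivial stand-ins so the pipeline runs end to end:
class _Stub:
    def select(self, _id): return _id
    def edit(self, _id): return _id
    def produce(self, *parts): return "|".join(parts).encode()

print(produce_short_video(_Stub(), _Stub(), _Stub(), _Stub(), _Stub()))
```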
In a concrete implementation scenario, the platform receives a selection operation input by the user and obtains the corresponding selection information. The user can select any of the model unit 001, template unit 002, sound unit 003, script unit 004 and other material units on the platform, select and purchase the corresponding digital human model, template, sound or script within the chosen unit, and have the selected elements displayed in a unified manner in the management unit. The platform then receives an editing operation input by the user and edits the selected information accordingly to obtain the target information; the editing operation edits and manages the digital human model, template, sound or script, for example editing text and images within a template. The target information is the model, template, sound or script information after editing. The user manages and edits these elements in the management unit 005 so as to produce the digital human video. The target information is input into a trained neural network model, which outputs a digital human video containing it: the edited model, template, sound and script go in, and the digital human video comes out. The user completes the production in the management unit, and the video is stored and output accordingly, completing the generation of the digital human video.
Illustratively, the user selects and purchases the corresponding digital human models, templates, sounds and scripts in the model unit 001, template unit 002, sound unit 003 and script unit 004, and the selected elements are displayed in a unified manner in the management unit 005; the user manages and edits them there to produce the digital human video, and once production is complete the video is stored and output, completing its generation. In the units above, the user generates the digital human video directly on the platform through a PC or mobile terminal. In actual use, however, some users want the generation operation simplified as much as possible; for this purpose, on the basis of S1-S4 above, the platform offers an embodiment that generates the digital human video from a video sent by the user, as follows.
in some embodiments, the generating, by using the management unit 005, a target digital short video according to the target digital mannequin, the target sound, and the target table based on the target template, further includes: receiving a material video input by a user by adopting the management unit 005, wherein the material video is a video which is self-shot or downloaded by the user; acquiring audio content or video content corresponding to the material video by adopting the management unit 005; when the material video corresponds to the audio content, driving the target digital mannequin to act by adopting the management unit 005 according to the audio content so as to generate a target digital short video; when the material video corresponds to video content, the management unit 005 is adopted to drive the target digital mannequin to act according to the video content so as to generate a target digital short video.
In some embodiments, driving the target digital human model to act according to the video content includes: extracting the video content corresponding to the material video; and migrating the character's actions in that content onto the target digital human model by motion transfer, so that the model performs the corresponding actions and the target digital human short video corresponding to the material video is generated.
In the embodiment above, the user sends a video to the platform; it may be shot through the platform's own APP or shot/downloaded through a third-party APP. The platform generates the digital human video from the content of this video. Specifically, if the user only needs a digital human video based on the video's audio content, the platform directly drives the digital human model from the audio track to perform the corresponding actions and generate the video. If the user's need is based on the video content itself, such as a dance, the platform uses a pre-trained model to transfer the character's motions in the original video onto the digital human model, so that the model performs the corresponding actions and a digital human video matching the original is generated. A sketch of this branch follows.
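A sketch of this audio-versus-video branch; extract_audio, extract_poses and the two driving helpers are hypothetical stand-ins rather than the platform's real functions.

```python
def extract_audio(path):                  # stand-in: demux the material's audio track
    return f"audio-of-{path}"

def extract_poses(path):                  # stand-in: per-frame skeleton keypoints
    return [f"pose-{i}" for i in range(3)]

def drive_with_audio(model, audio):       # stand-in: neural net maps audio -> actions
    return f"video[{model} driven by {audio}]"

def transfer_motion(model, poses):        # stand-in: retarget motions onto the model
    return f"video[{model} performing {len(poses)} transferred poses]"

def generate_from_material(material_path, use, model):
    if use == "audio":                    # user only needs the material's audio
        return drive_with_audio(model, extract_audio(material_path))
    if use == "video":                    # user needs the motions, e.g. a dance
        return transfer_motion(model, extract_poses(material_path))
    raise ValueError("use must be 'audio' or 'video'")

print(generate_from_material("dance.mp4", "video", "model-001"))
```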
In some embodiments, the target digital human short video is generated on the basis of a pre-trained neural network model, specifically comprising: acquiring the target sound with the sound unit 003, the target sound being material audio input by the user or audio generated by the sound unit; and having the neural network model output the corresponding actions according to the target sound, driving the target digital human model to generate the target digital human short video.
In some embodiments, generating the target digital human short video based on the pre-trained neural network model further comprises: extracting the target digital human model, target template, target sound and target script; training the pre-trained neural network model with training samples; and driving the target digital human model with the trained neural network model to generate the target digital human short video.
In some embodiments, driving the target digital human model according to the target sound includes: acquiring the text content edited by the user, or the content of the target script selected by the user through the script unit 004; and generating, with the target sound, audio corresponding to that edited text or script content so as to drive the target digital human model to perform the corresponding actions.
It should be noted that the digital human video generation service realized by the platform rests on a pre-trained neural network model that determines the digital human's actions from audio: once the user inputs the corresponding audio, the digital human model forms the various actions according to the network's output, thereby generating the digital human video.
In the process above, the input audio may come from audio the user provides, or the sound unit 003 may generate it: during video production, the sound unit synthesizes audio, in the voice the user selected, for the text the user has edited or for the text of the chosen script, so that the digital human can be driven to perform the corresponding actions. A sketch of this text-to-audio driving path follows.
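A sketch of the text-to-audio driving path, with synthesize and drive_with_audio as hypothetical stand-ins for the sound unit's TTS and the neural driving model.

```python
def synthesize(text, voice):              # stand-in for the sound unit's TTS
    return f"{voice}<{text}>"

def drive_with_audio(model, audio):       # stand-in for the neural driving model
    return f"video[{model} <- {audio}]"

def text_to_driven_video(text, voice, model):
    audio = synthesize(text, voice)       # user-edited text or script text -> audio
    return drive_with_audio(model, audio) # audio -> actions -> short video

print(text_to_driven_video("今天多云转晴", "voice-female-01", "model-001"))
```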
In some embodiments, the preset scene in the target template belongs either to a non-professional field or to a professional field; when it belongs to a non-professional field, the template unit 002 trains the pre-trained neural network model with preset general training samples; when it belongs to a professional field, the pre-trained neural network model is trained with professional training samples.
The neural network model can be trained by the service party in advance with training samples, so that the user does not need to perform any training of his own during video production and can directly use the digital human model, template, sound, script and so on to generate the video. For some specific fields, however, because of special application scenarios or special user requirements, a model trained only on general samples cannot drive the digital human's actions ideally. For example, a user may want to generate astronomy-themed popular-science videos, but the audio or text of that field contains many rare terms that general training samples do not fully cover; or a user may want the digital human's emotional expression to be richer than a generally trained model can support. For this, the platform also provides a self-training function: while selecting the digital human model, template, sound and script, the user can input special training samples of the specific field and have the background neural network model trained in a personalized way, so that during video generation the model's actions fit the characteristics and requirements of the application scenario more closely. A sketch of the sample-selection logic follows.
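The sample-selection logic can be sketched as below; the dataset names and the user_samples parameter are illustrative assumptions, not real corpora.

```python
from typing import Optional

GENERAL_SAMPLES = ["general_corpus"]                       # assumed dataset names
DOMAIN_SAMPLES = {"astronomy": ["astronomy_terms_corpus"]}

def pick_training_samples(scene_domain: Optional[str], user_samples=None):
    if user_samples:                    # user self-training with special samples
        return user_samples
    if scene_domain in DOMAIN_SAMPLES:  # professional field
        return DOMAIN_SAMPLES[scene_domain]
    return GENERAL_SAMPLES              # non-professional field

print(pick_training_samples("astronomy"))  # ['astronomy_terms_corpus']
print(pick_training_samples(None))         # ['general_corpus']
```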
The process of digital human video production and generation described above is illustrated below through a number of examples:
In one example, a user needs to generate a digital human video on the theme of a weather forecast; the production process is as follows.
The user selects a weather forecast template from the available templates; the template comprises several scenes, each corresponding to a different video page. Under the weather forecast template, the user further selects a corresponding digital human model.
After choosing the digital human model, the user edits the weather, constellation, blessing, closing line and so on into the content preset under the weather forecast template. Once this text is edited, the sound unit 003 converts it into audio with the voice the user selected, and that audio, through the trained neural network model, drives the digital human model to perform the corresponding actions; a digital human video in which the model broadcasts the weather forecast is thus produced. A fill-in-the-blanks sketch of such a template follows.
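The weather-forecast template can be pictured as a fill-in-the-blanks text; the preset wording and slot names below are invented for illustration, and the resulting narration is what the sound unit would synthesize into driving audio.

```python
# Preset wording with user-editable slots (all content here is illustrative):
WEATHER_TEMPLATE = ("大家好！今天{city}{weather}，气温{temp}。"
                    "{constellation}的朋友今日运势不错。{blessing}{ending}")

narration = WEATHER_TEMPLATE.format(
    city="南京", weather="晴", temp="15~22℃",
    constellation="双鱼座", blessing="祝大家心想事成。", ending="明天见！")
print(narration)
# This narration text would then be handed to the sound unit for synthesis.
```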
Beyond the embodiments above, the digital human short video production platform provided by the application can realize not only the digital human video generation service but also a digital-human-based live broadcast service; that is, a live unit 006 is added to the platform's composition to realize live broadcasting based on digital humans.
In some embodiments, the platform is further provided with the live unit 006, configured to: receive a live-broadcast instruction sent by the user, the instruction making the target digital human model broadcast live video under the corresponding target template, target sound and target script; and, in response, carry out the live video broadcast based on the target digital human model.
The live unit 006 supports two different live modes, digital human live broadcasting and face-swap live broadcasting. Digital human live broadcasting means the digital human anchor chosen by the user broadcasts directly; face-swap live broadcasting means the live host's face is replaced by the face of the corresponding digital human model during the broadcast, and the swap is not limited to the face: any part of the body can be swapped. The two modes are realized as follows.
In some embodiments, the live mode includes digital human live broadcasting: the user selects and purchases the corresponding digital human model, template, sound and script in the model unit 001, template unit 002, sound unit 003 and script unit 004, and sends the selected elements to the live unit 006, where they are displayed in a unified manner. The digital human model provided by the model unit 001 serves as the digital human anchor; the template provided by the template unit 002 contains the live scene and the materials the broadcast may involve; and the script provided by the script unit 004 is the content about the products introduced during the broadcast, which may be supplied or edited by the user. The user manages and edits the model, template, sound, script and so on in the live unit 006 to prepare the digital human broadcast. In this embodiment, the user selects a corresponding template and, on that basis, edits the pictures, videos and other materials used during the broadcast; with the template and materials complete, the introductions of the products presented during the broadcast can be edited into the template. Once this is done, the user can broadcast with the digital human through the platform.
Furthermore, digital human live broadcasting covers two cases: the user still performs the corresponding live work in the background, such as introducing and showing products and interacting with consumers, while the foreground digital human is driven to perform those actions through motion transfer and audio driving; or the user does no live work in the background, and the digital human introduces the products according to preset content and interacts with consumers according to preset dialogue settings.
During a digital human live broadcast, the user typically has to step in when consumers raise content outside the preset dialogue range. The present application therefore provides a text-based interaction operation: when the user confirms in the background that an interaction with a consumer is needed, the user enters the interaction text through the digital human short video production platform, the sound unit of the platform converts the text into audio, and the digital human anchor is driven to speak the reply to the consumer while performing the corresponding actions. User effort is thereby further reduced. This text-to-audio-to-action path is sketched below.
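In the following sketch, synthesize is a placeholder for the sound unit's text-to-speech step (its real model is not described in this application), and the timing assumption of roughly 80 ms per character is purely illustrative.

```python
import numpy as np

def synthesize(text: str, voice_id: str, sr: int = 16000) -> np.ndarray:
    """Placeholder for the sound unit's TTS: returns silent PCM of a plausible length.
    A real system would call the trained text-to-speech model selected by voice_id."""
    duration_s = max(1.0, 0.08 * len(text))  # assumption: ~80 ms of speech per character
    return np.zeros(int(sr * duration_s), dtype=np.float32)

def interact(reply_text: str, voice_id: str) -> None:
    audio = synthesize(reply_text, voice_id)
    # The synthesized audio would then feed the same audio-to-action path used for
    # normal speech; here we only report what would be streamed into the live room.
    print(f"streaming {len(audio)} samples of synthesized reply: {reply_text!r}")

interact("Thanks for the question - the product ships in three colors.", "voice-003")
```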
In some embodiments, the live broadcast mode further includes face-swapping live broadcast. Its workflow follows that of digital human live broadcast, with one difference: during the broadcast, the face of the live host is replaced with the face of the digital human model, so that the host broadcasts under the corresponding template with the digital human model's appearance. As noted above, the application is not limited to swapping faces; any part of the body may be swapped. A per-frame sketch of the replacement step follows.
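In this sketch, detect_face, render_model_face, and the paste-style compositing are placeholders: a production system would use trained detection/landmark models and seamless blending, none of which this application details.

```python
import numpy as np

def detect_face(frame: np.ndarray) -> tuple:
    """Placeholder face detector: returns a fixed bounding box (x, y, w, h).
    A real system would run a trained detector and landmark model on the frame."""
    return (80, 40, 96, 96)

def render_model_face(w: int, h: int) -> np.ndarray:
    """Placeholder for the digital human model's rendered face patch."""
    return np.full((h, w, 3), 128, dtype=np.uint8)

def swap_face(frame: np.ndarray) -> np.ndarray:
    x, y, w, h = detect_face(frame)
    out = frame.copy()
    out[y:y + h, x:x + w] = render_model_face(w, h)  # real systems blend, not paste
    return out

camera_frame = np.zeros((360, 640, 3), dtype=np.uint8)  # stand-in for one live frame
print(swap_face(camera_frame).mean())
```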
The present application provides a digital human short video production platform and a production method thereof, with which a user can obtain both digital human video generation and digital human live broadcast services. On this basis, the application further provides schemes such as action migration and text-based interaction through the platform to improve the user's interaction experience and make the interaction process more convenient.
It should be noted that the present application may be applied to any service-providing method involving an information carrier, including the digital human short video production platform and the service products offered by related service providers.
Reference throughout this specification to "an embodiment," "some embodiments," "one embodiment," or the like means that a particular feature, component, or characteristic described in connection with that embodiment is included in at least one embodiment; thus, such phrases appearing throughout the specification do not necessarily all refer to the same embodiment. Furthermore, the particular features, components, or characteristics may be combined in any suitable manner in one or more embodiments: a feature, component, or characteristic shown or described in connection with one embodiment may be combined, in whole or in part, with those of one or more other embodiments without limitation. Such modifications and variations are intended to fall within the scope of the present application.
The detailed description provided above gives only a few examples under the general inventive concept and does not limit the scope of protection of the present application. For a person skilled in the art, any other embodiment extended from the solution of the application without inventive effort, and any modification or adaptation made without departing from the principles of the application, falls within its scope of protection.

Claims (10)

1. A digital human short video production platform, characterized in that it is provided with a model unit, a template unit, a sound unit, a script unit, and a management unit, wherein:
the model unit is configured to: receive a digital human model selection instruction sent by a user, wherein the digital human model selection instruction is used for designating the target digital human model corresponding to a target digital human;
in response to the digital human model selection instruction, display a static photo and a dynamic video of the target digital human model;
the template unit is configured to: receive a digital human template selection and editing instruction sent by the user, wherein the instruction is used for the user to edit a preset template so as to generate a target template;
in response to the digital human template selection and editing instruction, obtain the target template, wherein the target template is generated according to different application requirements under different preset scenes, and the preset scenes comprise text, images, and audio;
the sound unit is configured to: receive a sound selection instruction sent by the user, wherein the sound selection instruction is used for making the target digital human model speak with a target sound;
in response to the sound selection instruction, play the target sound selected by the user;
the script unit is configured to: receive a script selection instruction sent by the user, wherein the script selection instruction is used for generating a target script whose text content corresponds to a target theme;
in response to the script selection instruction, generate the target script, wherein the target script is a script whose text content corresponds to the target theme;
the management unit is configured to: receive a management instruction sent by the user, wherein the management instruction is used for producing a target digital human short video by editing the target digital human model, the target template, the target sound, and the target script;
and, in response to the management instruction, produce the target digital human short video.
2. The platform of claim 1, further provided with a live broadcast unit configured to: receive a live broadcast instruction sent by the user, wherein the live broadcast instruction is used for making the target digital human model perform a live video broadcast under the corresponding target template, target sound, and target script;
and, in response to the live broadcast instruction, carry out the live video broadcast based on the target digital human model.
3. A method for producing a digital human short video, comprising:
obtaining a target digital human model with a model unit, a target template with a template unit, a target sound with a sound unit, and a target script with a script unit;
managing the target digital human model, the target template, the target sound, and the target script together with a management unit;
generating, with the management unit and based on the target template, a target digital human short video from the target digital human model, the target sound, and the target script;
and storing the generated target digital human short video with the management unit and outputting the target digital human short video.
4. The method of claim 3, wherein the target template comprises a complete template in a specific format and a blank template defined by the user.
5. The method of claim 3, wherein generating, with the management unit and based on the target template, the target digital human short video from the target digital human model, the target sound, and the target script further comprises:
receiving, with the management unit, a material video input by the user, wherein the material video is a video shot or downloaded by the user;
acquiring, with the management unit, the audio content or the video content corresponding to the material video;
when the material video corresponds to audio content, driving, with the management unit, the target digital human model to act according to the audio content so as to generate the target digital human short video;
and when the material video corresponds to video content, driving, with the management unit, the target digital human model to act according to the video content so as to generate the target digital human short video.
6. The method of claim 5, wherein driving the target digital human model to act according to the video content so as to generate the target digital human short video comprises:
extracting the video content corresponding to the material video;
and migrating the character actions in the video content onto the target digital human model by means of action migration, so that the target digital human model performs the corresponding actions and the target digital human short video corresponding to the material video is generated.
7. The method of claim 3, wherein the target digital human short video is generated based on a pre-trained neural network model, specifically comprising:
acquiring the target sound with the sound unit, wherein the target sound is material audio input by the user or audio correspondingly generated by the sound unit;
and outputting, by the neural network model and according to the target sound, corresponding actions so as to drive the target digital human model and generate the target digital human short video.
8. The method of claim 7, wherein driving the target digital human model according to the target sound comprises:
acquiring text content edited by the user or target script content selected by the user through the script unit;
and generating, with the target sound, audio corresponding to the edited text content or the target script content so as to drive the target digital human model to perform the corresponding actions.
9. The method of claim 7, wherein generating the target digital human short video based on the pre-trained neural network model further comprises:
extracting the target digital human model, the target template, the target sound, and the target script;
training the pre-trained neural network model with training samples;
and driving the target digital human model with the trained neural network model so as to generate the target digital human short video.
10. The method of claim 9, wherein the preset scenes within the target template include non-professional fields and professional fields;
when the preset scene in the target template is a non-professional field, the template unit trains the pre-trained neural network model with preset general training samples;
and when the preset scene in the target template is a professional field, the pre-trained neural network model is trained with professional training samples.
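As a minimal illustration of the sample-selection rule in claim 10 (outside the claims themselves; the corpus names are hypothetical), the choice between general and professional training samples could be expressed as:

```python
GENERAL_SAMPLES = ["general_corpus_a", "general_corpus_b"]           # hypothetical sample sets
PROFESSIONAL_SAMPLES = {"medical": ["medical_corpus"], "legal": ["legal_corpus"]}

def pick_training_samples(scene_domain: str):
    """Claim-10-style selection: general samples for non-professional scenes,
    domain-specific samples otherwise (all corpus names are illustrative)."""
    if scene_domain == "non-professional":
        return GENERAL_SAMPLES
    return PROFESSIONAL_SAMPLES.get(scene_domain, GENERAL_SAMPLES)

print(pick_training_samples("medical"))
```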
CN202211311642.4A 2022-05-23 2022-10-25 Digital short-man video production platform and production method thereof Pending CN117156199A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210571898 2022-05-23
CN2022105718982 2022-05-23

Publications (1)

Publication Number Publication Date
CN117156199A true CN117156199A (en) 2023-12-01

Family

ID=88802611

Family Applications (3)

Application Number Title Priority Date Filing Date
CN202211311642.4A Pending CN117156199A (en) 2022-05-23 2022-10-25 Digital short-man video production platform and production method thereof
CN202211312102.8A Pending CN117119207A (en) 2022-05-23 2022-10-25 Digital man live broadcasting method
CN202211698480.4A Pending CN117119123A (en) 2022-05-23 2022-12-28 Method and system for generating digital human video based on video material


Country Status (1)

Country Link
CN (3) CN117156199A (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117376596B (en) * 2023-12-08 2024-04-26 江西拓世智能科技股份有限公司 Live broadcast method, device and storage medium based on intelligent digital human model

Also Published As

Publication number Publication date
CN117119207A (en) 2023-11-24
CN117119123A (en) 2023-11-24


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination