CN114339423A - Short video generation method and device, computing equipment and computer readable storage medium - Google Patents

Short video generation method and device, computing equipment and computer readable storage medium

Info

Publication number
CN114339423A
CN114339423A
Authority
CN
China
Prior art keywords
video
initial
character
key
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111597387.XA
Other languages
Chinese (zh)
Other versions
CN114339423B (en)
Inventor
季焕文
陶杰
于芹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Migu Cultural Technology Co Ltd
China Mobile Communications Group Co Ltd
Original Assignee
Migu Cultural Technology Co Ltd
China Mobile Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Migu Cultural Technology Co Ltd, China Mobile Communications Group Co Ltd filed Critical Migu Cultural Technology Co Ltd
Priority to CN202111597387.XA priority Critical patent/CN114339423B/en
Publication of CN114339423A publication Critical patent/CN114339423A/en
Application granted granted Critical
Publication of CN114339423B publication Critical patent/CN114339423B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The embodiment of the invention relates to the technical field of video processing, and discloses a short video generation method comprising the following steps: determining a plurality of key video frames according to peak data of the heat of a target video; determining a plurality of initial video segments according to the character data and the audio data in each key video frame; generating a character script video of each character from an initial short video formed from the initial video segments; and generating a target short video based on the character script videos. In this way, the embodiment of the invention improves the user experience of the short video synthesis process.

Description

Short video generation method and device, computing equipment and computer readable storage medium
Technical Field
The embodiment of the invention relates to the technical field of video processing, in particular to a short video generation method, a short video generation device, a computing device and a computer readable storage medium.
Background
For short video generation, existing technical solutions generally rely on operation and production staff: actors record short videos, which are then uploaded for users to watch.
In the process of implementing the embodiment of the invention, the inventor found that conventional short video generation requires support from operations, actors and other parties, and suffers from low synthesis efficiency, few play modes and poor user experience.
Disclosure of Invention
In view of the foregoing problems, embodiments of the present invention provide a method, an apparatus, a computing device, and a computer-readable storage medium for generating a short video, which are used to solve the problem in the prior art that a user experience of short video generation is poor.
According to an aspect of an embodiment of the present invention, there is provided a short video generation method, including:
determining a plurality of key video frames according to peak data of the heat of the target video;
determining a plurality of initial video segments according to the character data and the audio data in each key video frame;
generating role script videos of all roles from the initial short video;
and generating a target short video based on the character script video.
In an alternative, the heat is a heat curve; the determining a plurality of key video frames according to the peak data of the heat of the target video comprises the following steps: determining key video frames before and after each peak value in the heat curve; determining whether a person exists in the key video frames before and after the peak value according to the character data; if no person exists, marking the key video frames before and after the peak value as an initial frame or a termination frame respectively; if a person exists, determining whether voice exists in the audio data corresponding to the key video frame; if no voice exists, marking the key video frames before and after the peak value as an initial frame or a termination frame respectively; and determining a plurality of initial video segments according to the initial frame and the termination frame.
In an alternative manner, the determining key video frames before and after each peak in the heat curve includes: and if the key video frames before and after the peak value have characters and voices, continuously searching the video frames near the key video frames forwards or backwards to serve as the key video frames.
In an optional manner, the generating of the character scenario videos of the respective characters from the initial short video includes: screening at least one initial video segment from the plurality of initial video segments to form an initial short video according to historical film watching information of a user; and identifying the voiceprints of all roles in the initial short video, and generating the role script videos of all the roles from the initial short video according to the voiceprints.
In an alternative mode, the screening out at least one initial video segment from the plurality of initial video segments as an initial short video according to the historical viewing information of the user includes: determining the emotion degree of a user corresponding to each initial video segment according to the historical film watching information; and screening out at least one initial video segment from the plurality of initial video segments according to the emotion degree to serve as an initial short video.
In an optional manner, the identifying a voiceprint of each character in the initial short video and generating a character scenario video of each character from the initial short video according to the voiceprint includes: decoding the initial short video to obtain a decoded initial short video; performing voiceprint recognition on the decoded initial short video to obtain the audio track of each role in the initial short video; and generating the role script video corresponding to each role in the initial short video according to the audio track.
In an optional manner, the generating a target short video based on the character scenario video includes: receiving a processing request of a user for the character script video, wherein the processing request comprises a sound processing request and/or a picture character processing request input by the user; performing audio processing on the character script video according to the sound processing request; and/or performing picture processing on the character script video according to the picture character processing request.
According to still another aspect of an embodiment of the present invention, there is provided a short video generating apparatus, including:
the first determining module is used for determining a plurality of key video frames according to the peak data of the heat degree of the target video;
the second determining module is used for determining a plurality of initial video segments according to the character data and the audio data in each key video frame;
the generating module is used for generating role script videos of all roles from the initial short video;
and the synthesis module is used for generating a target short video based on the role script video.
According to another aspect of embodiments of the present invention, there is provided a computing device including:
the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus;
the memory is used for storing at least one executable instruction which causes the processor to execute the operation of the short video generation method.
According to yet another aspect of the embodiments of the present invention, there is provided a computer-readable storage medium having at least one executable instruction stored therein, which when executed on a computing device, causes the computing device to perform the operations of the short video generation method.
According to the embodiment of the invention, a plurality of key video frames are determined according to peak data of the heat degree of a target video; according to the character data and the audio data in each key video frame, a plurality of initial video segments are determined, the character script videos of each character are generated from the initial short videos, the target short videos are generated based on the character script videos, the short video generation efficiency can be effectively improved, and the user experience is effectively improved.
The foregoing description is only an overview of the technical solutions of the embodiments of the present invention, and the embodiments of the present invention can be implemented according to the content of the description in order to make the technical means of the embodiments of the present invention more clearly understood, and the detailed description of the present invention is provided below in order to make the foregoing and other objects, features, and advantages of the embodiments of the present invention more clearly understandable.
Drawings
The drawings are only for purposes of illustrating embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
fig. 1 is a flowchart illustrating a short video generation method according to an embodiment of the present invention;
fig. 2 is a schematic diagram illustrating a heat curve in a short video generation method according to an embodiment of the present invention;
fig. 3 is a schematic diagram illustrating the identification of each role in the short video generation method according to the embodiment of the present invention;
fig. 4 is a schematic diagram illustrating an application scenario of short video generation in the short video generation method according to the embodiment of the present invention;
fig. 5 is a schematic diagram illustrating picture processing performed on a character script video in a short video generation method according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a short video generation apparatus provided in an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a computing device provided by an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein.
Fig. 1 illustrates a flow diagram of a short video generation method, performed by a computing device, provided by an embodiment of the invention. The computing device may be a computer device, a terminal device, an intelligent device, a video playing device, and the like, and the embodiment of the present invention is not particularly limited. As shown in fig. 1, the method comprises the steps of:
step 110: and determining a plurality of key video frames according to the peak data of the heat of the target video.
In the embodiment of the invention, the initial video segments are determined from the target video according to the heat, the character data and the audio data of the target video. Before determining the plurality of initial video segments from the target video, the method includes: segmenting the target video to obtain a plurality of video slices; and determining the heat corresponding to each video slice, wherein the heat is a heat curve. Specifically, the video is sliced according to the GOP (Group of Pictures, i.e., the distance between two I frames; the reference period refers to the distance between two P frames) to obtain a plurality of video slices. Audio is collected and denoised while the user is watching; a video slice whose average audio level is greater than a preset decibel threshold is taken as a midpoint slice, and its I frame is marked as a midpoint α. (An I frame is coded in intraframe mode, i.e., it uses only the spatial correlation within a single image rather than temporal correlation, and uses intraframe compression without motion compensation; being independent of other frames, it serves as a random-access entry point and a decoding reference frame, is mainly used for receiver initialization, channel acquisition and program switching or insertion, has a relatively low compression ratio, and appears periodically in the image sequence at a frequency selectable by the encoder.) The preset decibel threshold can be set according to the specific scene; in one embodiment of the present invention, the preset decibel threshold may be 60. Starting from the slice containing the current α, the preceding (following) slices are examined forward (backward); when a slice with an audio level below 40 dB is reached, the slice examined just before it is marked as a pre-start (pre-end) point, and the number of slices found with an audio level of 40 dB or more is recorded as n. If n is less than a first number threshold, the candidate does not meet the requirement; otherwise, the slices are added to a video recommendation candidate queue. A heat curve is then generated according to the video recommendation candidate queue, where the heat reflects how much users appreciate the current segment and, indirectly, how exciting it is. The first number threshold may be 3. In the embodiment of the present invention, as shown in fig. 2, the heat curve may be a bullet screen curve graph: the number of bullet screen comments in each video slice is counted in advance, and the curve is drawn with the timestamp of the video slice on the horizontal axis and the number of bullet screen comments on the vertical axis.
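For illustration only, the segment-screening logic described above can be summarized in the following Python sketch. It reflects the stated thresholds (a 60 dB midpoint threshold, a 40 dB boundary level, a first number threshold of 3) but the helper functions average_decibel() and bullet_count() and the slice attributes are assumptions made for the example, not part of the disclosed implementation.

```python
# Illustrative sketch of the candidate-queue and heat-curve logic described above.
# average_decibel() and bullet_count() are assumed helpers, not defined by the patent.

MIDPOINT_DB = 60      # preset decibel threshold for marking a midpoint alpha
BOUNDARY_DB = 40      # below this level, the neighbouring slice ends the expansion
MIN_SLICES = 3        # "first number threshold"

def build_candidate_queue(slices):
    """slices: GOP-aligned video slices in playback order."""
    queue = []
    for i, s in enumerate(slices):
        if average_decibel(s) < MIDPOINT_DB:
            continue                      # not loud enough to be a midpoint alpha
        # expand backwards and forwards until a quiet slice (< BOUNDARY_DB) is met
        lo = i
        while lo > 0 and average_decibel(slices[lo - 1]) >= BOUNDARY_DB:
            lo -= 1
        hi = i
        while hi < len(slices) - 1 and average_decibel(slices[hi + 1]) >= BOUNDARY_DB:
            hi += 1
        n = hi - lo + 1                   # number of slices at or above BOUNDARY_DB
        if n >= MIN_SLICES:
            queue.append(slices[lo:hi + 1])
    return queue

def heat_curve(slices):
    """Bullet screen curve of Fig. 2: (timestamp, bullet count) per slice."""
    return [(s.timestamp, bullet_count(s)) for s in slices]
```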
Step 120: and determining a plurality of initial video segments according to the character data and the audio data in each key video frame.
In the embodiment of the invention, after the heat curve is obtained, the video frames before and after each peak value in the heat curve are examined to determine a starting frame and an ending frame, and a plurality of initial video segments are determined according to the starting frame and the ending frame. Specifically, each peak value in the heat curve is determined, and the video frame corresponding to the peak value is taken as a midpoint; a starting frame β is searched forward and an ending frame θ is searched backward from the midpoint. The points on the heat curve where the tangent slope drops suddenly (a trigger value can be set) are taken as the boundary signals, and once such a point is determined, the corresponding video frames are marked as the pre-start point β and the pre-end point θ. Then key video frames before and after each peak value in the heat curve are determined; whether a person exists in the key video frames before and after the peak value is determined; if a person exists, whether voice exists in the audio data corresponding to the key video frame is determined; and if no voice exists, the key video frames before and after the peak value are marked as the initial frame or the termination frame respectively. Specifically, the key frame β1 closest to the pre-start point β is taken in the forward direction (if the current β is already a key frame, β1 is β itself), the video frames and audio frames are separated by decoding, and the key video frame is detected first: if there is no person, the key video frame β1 is marked as the start frame. If a person exists, the audio frame corresponding to the key video frame is further detected; if the audio frame contains no voice, the key video frame β1 is marked as the initial frame; otherwise the next key video frame β2 is searched forward, and so on, until the initial frame βn is found. In the same way, the end frame θn is found by a backward search, and an initial video segment is generated based on the start frame βn and the end frame θn. The process then continues with the second highest peak (if the second highest peak is already covered by a generated short video, the next highest peak is used instead) until a preset second number of initial video segments has been generated. The embodiment of the present invention does not limit the specific value of the preset second number; in one embodiment of the present invention, the preset second number may be 5. In this way, a plurality of initial video segments are obtained according to the starting frames and ending frames before and after the peak values.
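The start/end-frame search around a heat-curve peak can be sketched as follows. The detectors has_person() and has_voice() and the key-frame data structure are assumptions for the example; this is a minimal illustration of the search described above, not the disclosed implementation.

```python
# Sketch of the boundary search around a peak: key_frames is a time-ordered list of
# (video_frame, audio_frame) pairs obtained by decoding; has_person() and has_voice()
# are assumed person/voice detectors.

def find_boundary(key_frames, start_index, step):
    """step = -1 searches forward in time order reversed (earlier) for the start frame beta_n,
       step = +1 searches backward (later) for the end frame theta_n."""
    i = start_index
    while 0 <= i < len(key_frames):
        video_frame, audio_frame = key_frames[i]
        if not has_person(video_frame):
            return i                      # no person: mark this key frame as the boundary
        if not has_voice(audio_frame):
            return i                      # person present but no speech: also a boundary
        i += step                         # person speaking: keep searching
    return max(0, min(i, len(key_frames) - 1))

def initial_segment(key_frames, peak_index):
    start = find_boundary(key_frames, peak_index, step=-1)   # beta_n
    end = find_boundary(key_frames, peak_index, step=+1)     # theta_n
    return start, end
```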
Step 130: and generating character script videos of all the characters from the initial short video.
In the embodiment of the invention, after a plurality of initial video segments are obtained, at least one initial video segment is screened out from the plurality of initial video segments as an initial short video according to historical film watching information of a user.
The historical viewing information of the user may include the user's historical behavior s, viewing record t, favorites preference p, celebrity ranking o, and facial expression change f during viewing (when the user shows emotional fluctuation at a certain moment, such as laughing or crying, it indicates that the user is fully immersed in the drama).
In the embodiment of the invention, according to the historical viewing information, the emotion degree of the user corresponding to each initial video segment is determined; and at least one initial video segment is screened out from the plurality of initial video segments according to the emotion degree to serve as the initial short video. Specifically, the user's historical behavior s, viewing record t, favorites preference p, celebrity ranking o and facial expression change f during viewing are collected for weight matching.
The emotion weight refers to analyzing the user's emotion for the current video segment: expression prediction is performed through image classification to infer the user's state g, such as happy, calm or depressed; the current user audio is analyzed by sampling p audio frames and taking their decibel values db to reflect the intensity of the current emotion; the emotion state k is obtained as k = (db1 + db2 + db3 + ... + dbp)/(p × 10), the weight ratio reaching its maximum when k is close to 6; and the weight ratio y actually corresponding to the emotion state k is derived from the formula y = -(5/24)k² + (25/12)k.
The emotion degrees corresponding to the plurality of initial video segments are then compared. For each initial video segment, key frames are extracted and their number is recorded as N, and a weighted value is calculated as (s × 0.1 + t × 0.1 + p × 0.2 + o × 0.2 + f × 0.4 × y)/N. According to the weighted values, the top-N initial video segments are determined and used as initial short videos, which are fed back to the user for playing and creation. N may be 2.
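The weighting described above can be expressed in a small Python sketch. The factor scores s, t, p, o, f are taken as pre-computed numbers (how they are scored is not specified here), and the sketch is only an illustration of the formulas given above.

```python
# Numerical sketch of the emotion weighting. The factor names follow the text;
# their scoring is an assumption for the example.

def emotion_state(decibels):
    """k = (db1 + db2 + ... + dbp) / (p * 10), from p sampled audio frames."""
    p = len(decibels)
    return sum(decibels) / (p * 10)

def weight_ratio(k):
    """y = -(5/24) * k**2 + (25/12) * k, the weight ratio for emotion state k."""
    return -(5 / 24) * k ** 2 + (25 / 12) * k

def weighted_value(s, t, p, o, f, y, n_key_frames):
    """(s*0.1 + t*0.1 + p*0.2 + o*0.2 + f*0.4*y) / N."""
    return (s * 0.1 + t * 0.1 + p * 0.2 + o * 0.2 + f * 0.4 * y) / n_key_frames

def pick_initial_short_videos(scored_segments, top_n=2):
    """scored_segments: list of (segment, weighted_value); keep the top_n by weight."""
    ranked = sorted(scored_segments, key=lambda item: item[1], reverse=True)
    return [segment for segment, _ in ranked[:top_n]]
```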
And then, identifying the voiceprints of all roles in the initial short video, and generating role script videos of all roles from the initial short video according to the voiceprints.
In the embodiment of the invention, the character script videos are generated by decoding the initial short video to obtain a decoded initial short video, performing voiceprint recognition on the decoded initial short video to obtain the audio track of each character in the initial short video, and generating the character script video corresponding to each character according to the audio tracks. Specifically, the edited initial short video is decoded, the characters are separated according to voiceprint recognition, and the audio track corresponding to each character is removed in turn, generating a plurality of character script videos. The produced character script videos are uploaded to the c-end management platform and released to the test environment by default, and an operator can release them to the online environment depending on the effect. A user can then dub a video according to the subtitles, and the recording is synthesized into the audio track of the video, generating an entertaining script video. As shown in fig. 3, three characters a, b and c appear in turn from left to right, but voiceprint analysis of the current highlight shows that the dialogue is mainly between a and c, while b has a low proportion (its voiceprint accounts for no more than 20% of the current dialogue); therefore two script videos are generated for this segment, namely the version with character a's lines removed and the version with character c's lines removed.
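A minimal sketch of this per-character step is shown below. The helpers diarize(), mute_spans() and mux() are assumptions standing in for voiceprint segmentation, audio editing and muxing; the 20% cut-off follows the Fig. 3 example.

```python
# Sketch of character script video generation: for each main character, remove that
# character's lines so the user can dub them. diarize(), mute_spans() and mux() are
# assumed helpers, not part of the patent text.

MIN_DIALOGUE_SHARE = 0.2   # characters at or below this share get no script video

def character_script_videos(video_track, audio_track):
    spans_by_character = diarize(audio_track)          # voiceprint-based segmentation
    total = sum(end - start
                for spans in spans_by_character.values()
                for start, end in spans)

    scripts = {}
    for character, spans in spans_by_character.items():
        share = sum(end - start for start, end in spans) / total
        if share <= MIN_DIALOGUE_SHARE:
            continue                                   # minor character (e.g. b in Fig. 3)
        muted = mute_spans(audio_track, spans)         # remove this character's lines
        scripts[character] = mux(video_track, muted)   # user dubs the muted lines later
    return scripts
```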
Step 140: and generating a target short video based on the character script video.
In the embodiment of the invention, a processing request of a user for the role script video is received, and the role script video is processed according to the processing request to obtain the target short video.
In the embodiment of the present invention, as shown in fig. 4, in the c-end management platform, a user may select one of the character script videos, and enter short video processing, where recording/pausing, deletion of a previous segment, face acquisition, and the like may be clicked.
Wherein the processing request comprises a sound processing request and/or an image character processing request input by a user. Therefore, processing the character scenario video according to the processing request specifically includes: performing audio processing on the character script video according to the sound processing request; and/or performing picture processing on the character script video according to the picture character processing request.
For the sound processing request, the user's voice can be recorded and synthesized into the video, and unsatisfactory recordings can be deleted segment by segment and re-recorded. For picture character processing, if face acquisition is enabled, the image captured by the camera is displayed picture-in-picture, and either expression acquisition or whole-face acquisition can be selected. Expression acquisition means that the facial expression currently shown by the user is automatically synthesized onto the selected character through AI; as shown in fig. 5, after acquisition the face of the character in the original video is replaced, that is, the character's face in the original picture is replaced by the user's face. Whole-face acquisition replaces the character's face with the whole face detected by the camera; if the face is not detected at some point, AI fusion is performed between the previously acquired user face and the character's expression in the video to complete the replacement. In the embodiment of the invention, a multi-friend script can also be provided so that friends can act opposite each other, giving a stronger sense of immersion. After shooting, the video can be synthesized and released. (An AI score for the recorded dubbing can also be provided, for example by comparison with the speech passages of the original video.)
In an embodiment of the present invention, the sound processing request may include exchanging character voices, novelty dubbing, ghost-style (looped) videos, and the like. For exchanging character voices, for example, four characters a, b, c and d correspond to four voiceprints a1, b1, c1 and d1; after the characters and voiceprints are permuted, the combinations a-a1, b-b1, c-c1 and d-d1 are removed, and among the remaining C(4,1) × C(3,1) combinations the exchange is made preferentially according to character gender, that is, characters are paired on the principle that same-gender voices are exchanged first. For novelty dubbing, a character is searched for according to the user's viewing history, favorites, ranking and so on, and the corresponding character voiceprint is extracted to replace the original dubbing (if this fails, big-data matching is used, or a currently popular voice, such as the voice of a cartoon character, an alien voice and the like, is selected at random), generating a novelty video. For the ghost-style video, based on the clipped video and the known midpoint α key frame, n frames before and after the key frame are taken and interpolated forward and backward, so that the most exciting part is looped, generating a ghost-style video. In a live sports event, for example a football match, when a goal occurs, a quick goal video is popped up, intercepting the 30 s before and the 10 s after the goal; the user can upload a portrait photo, and after the portrait is cut out it replaces the face of the player who scored; the other people in the video can be automatically replaced with cartoon avatars, a piece of commentary can be configured, and the synthesized video can be shared for fun.
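The voice-exchange pairing can be illustrated with the following Python sketch: identity pairings are excluded and same-gender swaps are preferred. The character names and gender lookup are illustrative assumptions, not taken from the patent text.

```python
# Sketch of the character voice exchange: drop a-a1, b-b1, c-c1, d-d1 and prefer
# same-gender pairings among the remaining combinations.

from itertools import permutations

def voice_swaps(characters, gender):
    """characters: e.g. ['a', 'b', 'c', 'd']; gender: dict character -> 'M'/'F'.
       Returns a mapping character -> voiceprint owner with no one keeping their own
       voice, scored so that same-gender exchanges are preferred."""
    best, best_score = None, -1
    for perm in permutations(characters):
        if any(c == v for c, v in zip(characters, perm)):
            continue                                   # exclude identity pairings
        score = sum(gender[c] == gender[v] for c, v in zip(characters, perm))
        if score > best_score:
            best, best_score = dict(zip(characters, perm)), score
    return best

# Example (assumed genders): voice_swaps(['a', 'b', 'c', 'd'],
#                                        {'a': 'M', 'b': 'F', 'c': 'M', 'd': 'F'})
# pairs a with c's voiceprint and b with d's voiceprint, i.e. same-gender exchange first.
```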
According to the embodiment of the invention, a plurality of key video frames are determined according to peak data of the heat degree of a target video; according to the character data and the audio data in each key video frame, a plurality of initial video segments are determined, the character script videos of each character are generated from the initial short videos, the target short videos are generated based on the character script videos, the short video generation efficiency can be effectively improved, and the user experience is effectively improved.
Fig. 6 is a schematic structural diagram illustrating a short video generation apparatus according to an embodiment of the present invention. As shown in fig. 6, the apparatus 300 includes:
the first determining module 310 is configured to determine a plurality of key video frames according to the peak data of the heat of the target video;
a second determining module 320, configured to determine a plurality of initial video segments according to the character data and the audio data in each key video frame;
a generating module 330, configured to generate a character scenario video of each character from the initial short video;
and the synthesizing module 340 is used for generating a target short video based on the character script video.
In an optional manner, before determining a plurality of key video frames according to peak data of heat of a target video, the method includes: the method comprises the steps of segmenting a target video to obtain a plurality of video segments; and determining heat data corresponding to each video fragment.
In an alternative, the heat is a heat curve; the determining a plurality of key video frames according to peak data of the heat of the target video comprises: determining a starting frame and an ending frame according to the video frames before and after each peak value in the heat curve; and determining a plurality of initial video segments according to the starting frame and the ending frame.
In an alternative manner, determining a start frame and an end frame according to video frames before and after each peak in the heat curve includes: determining key video frames before and after each peak value in the heat curve; determining whether people exist in the key video frames before and after the peak value; if the person exists, determining whether the voice exists in the audio data corresponding to the key video frame; and if the voice exists, marking the key video frames before and after the peak value as the initial frame or the termination frame respectively.
In an alternative mode, the screening out at least one initial video segment from the plurality of initial video segments as an initial short video according to the historical viewing information of the user includes: determining the emotion degree of a user corresponding to each initial video segment according to the historical film watching information; and screening out at least one initial video segment from the plurality of initial video segments according to the emotion degree to serve as an initial short video.
In an alternative manner, the determining key video frames before and after each peak in the heat curve includes: and if the key video frames before and after the peak value have characters and voices, taking the key video frames as the initial frames or the termination frames.
In an optional manner, the identifying a voiceprint of each character in the initial short video and generating a character scenario video of each character from the initial short video according to the voiceprint includes: decoding the initial short video to obtain a decoded initial short video; performing voiceprint recognition on the decoded initial short video to obtain the audio track of each role in the initial short video; and generating the role script video corresponding to each role in the initial short video according to the audio track.
In an alternative mode, the processing request comprises a sound processing request and/or a picture character processing request input by a user; the receiving a processing request of a user for the role script video, and processing the role script video according to the processing request to obtain a target short video, includes: performing audio processing on the character script video according to the sound processing request; and/or performing picture processing on the character script video according to the picture character processing request.
The specific working steps of the short video generating device of the embodiment of the present invention are substantially the same as those of the method embodiment described above, and are not described herein again.
According to the embodiment of the invention, a plurality of key video frames are determined according to peak data of the heat degree of a target video; according to the character data and the audio data in each key video frame, a plurality of initial video segments are determined, the character script videos of each character are generated from the initial short videos, the target short videos are generated based on the character script videos, the short video generation efficiency can be effectively improved, and the user experience is effectively improved.
Fig. 7 is a schematic structural diagram of a computing device according to an embodiment of the present invention, and a specific embodiment of the present invention does not limit a specific implementation of the computing device.
As shown in fig. 7, the computing device may include: a processor (processor)402, a Communications Interface 404, a memory 406, and a Communications bus 408.
Wherein: the processor 402, communication interface 404, and memory 406 communicate with each other via a communication bus 408. A communication interface 404 for communicating with network elements of other devices, such as clients or other servers. The processor 402 is configured to execute the program 410, and may specifically perform the relevant steps in the embodiment of the short video generation method described above.
In particular, program 410 may include program code comprising computer-executable instructions.
The processor 402 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The computing device includes one or more processors, which may be the same type of processor, such as one or more CPUs; or may be different types of processors, such as one or more CPUs and one or more ASICs.
The memory 406 is used for storing a program 410. The memory 406 may comprise a high-speed RAM memory, and may also include a non-volatile memory, such as at least one disk memory.
The program 410 may be specifically invoked by the processor 402 to cause the computing device to perform the following operations:
determining a plurality of key video frames according to peak data of the heat of the target video;
determining a plurality of initial video segments according to the character data and the audio data in each key video frame;
generating role script videos of all roles from the initial short video;
and generating a target short video based on the character script video.
In an optional manner, before the determining a plurality of initial video segments from the target video according to the popularity data, the character data, and the audio data of the target video, the method includes: the method comprises the steps of segmenting a target video to obtain a plurality of video segments; and determining heat data corresponding to each video fragment.
In an alternative, the heat is a heat curve; the determining a plurality of key video frames according to peak data of the heat of the target video comprises: determining key video frames before and after each peak value in the heat curve; determining a plurality of initial video segments according to the character data and/or the audio data in the key video frames, including: determining whether people exist in the key video frames before and after the peak value according to the people data; if no person exists, marking the key video frames before and after the peak value as an initial frame or the termination frame respectively; if the person exists, determining whether the voice exists in the audio data corresponding to the key video frame; if no voice exists, marking the key video frames before and after the peak value as an initial frame or the termination frame respectively; determining a plurality of initial video segments according to the starting frame and the ending frame.
In an alternative manner, the determining key video frames before and after each peak in the heat curve includes: and if the key video frames before and after the peak value have characters and voices, continuously searching the video frames near the key video frames forwards or backwards to serve as the key video frames.
In an optional manner, the generating of the character scenario videos of the respective characters from the initial short video includes: screening at least one initial video segment from the plurality of initial video segments to form an initial short video according to historical film watching information of a user; and identifying the voiceprints of all roles in the initial short video, and generating the role script videos of all the roles from the initial short video according to the voiceprints.
In an alternative mode, the screening out at least one initial video segment from the plurality of initial video segments as an initial short video according to the historical viewing information of the user includes: determining the emotion degree of a user corresponding to each initial video segment according to the historical film watching information; and screening out at least one initial video segment from the plurality of initial video segments according to the emotion degree to serve as an initial short video.
In an optional manner, the identifying a voiceprint of each character in the initial short video and generating a character scenario video of each character from the initial short video according to the voiceprint includes: decoding the initial short video to obtain a decoded initial short video; performing voiceprint recognition on the decoded initial short video to obtain the audio track of each role in the initial short video; and generating the role script video corresponding to each role in the initial short video according to the audio track.
In an alternative mode, the processing request comprises a sound processing request and/or a picture character processing request input by a user; the receiving a processing request of a user for the role script video, and processing the role script video according to the processing request to obtain a target short video, includes: performing audio processing on the character script video according to the sound processing request; and/or performing picture processing on the character script video according to the picture character processing request.
The specific working steps of the computing device according to the embodiment of the present invention are substantially the same as those of the method embodiment described above, and are not described herein again.
According to the embodiment of the invention, a plurality of key video frames are determined according to peak data of the heat degree of a target video; according to the character data and the audio data in each key video frame, a plurality of initial video segments are determined, the character script videos of each character are generated from the initial short videos, the target short videos are generated based on the character script videos, the short video generation efficiency can be effectively improved, and the user experience is effectively improved.
An embodiment of the present invention provides a computer-readable storage medium, where the storage medium stores at least one executable instruction, and when the executable instruction is executed on a computing device, the computing device is enabled to execute a short video generation method in any method embodiment described above.
The executable instructions may be specifically configured to cause the computing device to:
determining a plurality of key video frames according to peak data of the heat of the target video;
determining a plurality of initial video segments according to the character data and the audio data in each key video frame;
generating role script videos of all roles from the initial short video;
and generating a target short video based on the character script video.
In an optional manner, before determining a plurality of key video frames according to peak data of heat of a target video, the method includes: the method comprises the steps of segmenting a target video to obtain a plurality of video segments; and determining heat data corresponding to each video fragment.
In an alternative, the heat is a heat curve; the determining a plurality of key video frames according to peak data of the heat of the target video comprises: determining key video frames before and after each peak value in the heat curve; determining whether people exist in the key video frames before and after the peak value according to the people data; if no person exists, marking the key video frames before and after the peak value as an initial frame or the termination frame respectively; if the person exists, determining whether the voice exists in the audio data corresponding to the key video frame; if no voice exists, marking the key video frames before and after the peak value as an initial frame or the termination frame respectively; determining a plurality of initial video segments according to the starting frame and the ending frame.
In an alternative manner, the determining key video frames before and after each peak in the heat curve includes: and if the key video frames before and after the peak value have characters and voices, continuously searching the video frames near the key video frames forwards or backwards to serve as the key video frames.
In an optional manner, the generating of the character scenario videos of the respective characters from the initial short video includes: screening at least one initial video segment from the plurality of initial video segments to form an initial short video according to historical film watching information of a user; and identifying the voiceprints of all roles in the initial short video, and generating the role script videos of all the roles from the initial short video according to the voiceprints. In an alternative mode, the screening out at least one initial video segment from the plurality of initial video segments as an initial short video according to the historical viewing information of the user includes: determining the emotion degree of a user corresponding to each initial video segment according to the historical film watching information; and screening out at least one initial video segment from the plurality of initial video segments according to the emotion degree to serve as an initial short video.
In an optional manner, the identifying a voiceprint of each character in the initial short video and generating a character scenario video of each character from the initial short video according to the voiceprint includes: decoding the initial short video to obtain a decoded initial short video; performing voiceprint recognition on the decoded initial short video to obtain the audio track of each role in the initial short video; and generating the role script video corresponding to each role in the initial short video according to the audio track.
In an alternative mode, the processing request comprises a sound processing request and/or a picture character processing request input by a user; the receiving a processing request of a user for the role script video, and processing the role script video according to the processing request to obtain a target short video, includes: performing audio processing on the character script video according to the sound processing request; and/or performing picture processing on the character script video according to the picture character processing request.
The specific working steps of the computing device according to the embodiment of the present invention are substantially the same as those of the method embodiment described above, and are not described herein again.
According to the embodiment of the invention, a plurality of key video frames are determined according to peak data of the heat degree of a target video; according to the character data and the audio data in each key video frame, a plurality of initial video segments are determined, the character script videos of each character are generated from the initial short videos, the target short videos are generated based on the character script videos, the short video generation efficiency can be effectively improved, and the user experience is effectively improved.
The embodiment of the invention provides a short video generation device, which is used for executing the short video generation method.
Embodiments of the present invention provide a computer program that can be invoked by a processor to cause a computing device to perform the short video generation method in any of the above method embodiments.
Embodiments of the present invention provide a computer program product comprising a computer program stored on a computer-readable storage medium, the computer program comprising program instructions that, when run on a computer, cause the computer to perform the short video generation method of any of the above-mentioned method embodiments.
The algorithms or displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. In addition, embodiments of the present invention are not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the embodiments of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the invention and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names. The steps in the above embodiments should not be construed as limiting the order of execution unless specified otherwise.

Claims (10)

1. A method of short video generation, the method comprising:
determining a plurality of key video frames according to peak data of the heat of the target video;
determining a plurality of initial video segments according to the character data and the audio data in each key video frame;
generating role script videos of all roles from the initial short video;
and generating a target short video based on the character script video.
2. The method of claim 1, wherein the heat is a heat curve; the determining a plurality of key video frames according to the peak data of the heat of the target video comprises the following steps:
determining key video frames before and after each peak value in the heat curve;
determining whether people exist in the key video frames before and after the peak value according to the people data;
if no person exists, marking the key video frames before and after the peak value as an initial frame or the termination frame respectively;
if the person exists, determining whether the voice exists in the audio data corresponding to the key video frame;
if no voice exists, marking the key video frames before and after the peak value as an initial frame or the termination frame respectively;
determining a plurality of initial video segments according to the starting frame and the ending frame.
3. The method of claim 2, wherein determining key video frames before and after each peak in the heat curve comprises:
and if the key video frames before and after the peak value have characters and voices, continuously searching the video frames near the key video frames forwards or backwards to serve as the key video frames.
4. The method of claim 1, wherein the generating of the character transcript video of each character from the initial short video comprises:
screening at least one initial video segment from the plurality of initial video segments to form an initial short video according to historical film watching information of a user;
and identifying the voiceprints of all roles in the initial short video, and generating the role script videos of all the roles from the initial short video according to the voiceprints.
5. The method according to claim 4, wherein the selecting at least one initial video segment from the plurality of initial video segments as the initial short video according to the historical viewing information of the user comprises:
determining the emotion degree of a user corresponding to each initial video segment according to the historical film watching information;
and screening out at least one initial video segment from the plurality of initial video segments according to the emotion degree to serve as an initial short video.
6. The method of claim 4, wherein the identifying voiceprints of respective characters from the initial short video, and the generating a character scenario video of respective characters from the initial short video from the voiceprints comprises:
decoding the initial short video to obtain a decoded initial short video;
performing voiceprint recognition on the decoded initial short video to obtain the audio track of each role in the initial short video;
and generating the role script video corresponding to each role in the initial short video according to the audio track.
7. The method of claim 1, wherein generating the target short video based on the character transcript video comprises:
receiving a processing request of a user for the character script video, wherein the processing request comprises a sound processing request and/or a picture character processing request input by the user; performing audio processing on the character script video according to the sound processing request; and/or
And carrying out picture processing on the role script video according to the picture character processing request.
8. An apparatus for short video generation, the apparatus comprising:
the first determining module is used for determining a plurality of key video frames according to the peak data of the heat degree of the target video;
the second determining module is used for determining a plurality of initial video segments according to the character data and the audio data in each key video frame;
the generating module is used for generating role script videos of all roles from the initial short video;
and the synthesis module is used for generating a target short video based on the role script video.
9. A computing device, comprising: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus;
the memory is configured to store at least one executable instruction that causes the processor to perform the operations of the short video generation method of any of claims 1-7.
10. A computer-readable storage medium having stored therein at least one executable instruction that, when executed on a computing device, causes the computing device to perform operations of the short video generation method of any of claims 1-7.
CN202111597387.XA 2021-12-24 2021-12-24 Short video generation method, device, computing equipment and computer readable storage medium Active CN114339423B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111597387.XA CN114339423B (en) 2021-12-24 2021-12-24 Short video generation method, device, computing equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111597387.XA CN114339423B (en) 2021-12-24 2021-12-24 Short video generation method, device, computing equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN114339423A true CN114339423A (en) 2022-04-12
CN114339423B CN114339423B (en) 2024-08-27

Family

ID=81012380

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111597387.XA Active CN114339423B (en) 2021-12-24 2021-12-24 Short video generation method, device, computing equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN114339423B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115348459A (en) * 2022-08-16 2022-11-15 支付宝(杭州)信息技术有限公司 Short video processing method and device
CN116744063A (en) * 2023-08-15 2023-09-12 四川中电启明星信息技术有限公司 Short video push system integrating social attribute information

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107067450A (en) * 2017-04-21 2017-08-18 福建中金在线信息科技有限公司 The preparation method and device of a kind of video
CN108337532A (en) * 2018-02-13 2018-07-27 腾讯科技(深圳)有限公司 Perform mask method, video broadcasting method, the apparatus and system of segment
CN108597521A (en) * 2018-05-04 2018-09-28 徐涌 Audio role divides interactive system, method, terminal and the medium with identification word
CN110753263A (en) * 2019-10-29 2020-02-04 腾讯科技(深圳)有限公司 Video dubbing method, device, terminal and storage medium
CN111339865A (en) * 2020-02-17 2020-06-26 杭州慧川智能科技有限公司 Method for synthesizing video MV (music video) by music based on self-supervision learning
CN111935503A (en) * 2020-06-28 2020-11-13 百度在线网络技术(北京)有限公司 Short video generation method and device, electronic equipment and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107067450A (en) * 2017-04-21 2017-08-18 福建中金在线信息科技有限公司 The preparation method and device of a kind of video
CN108337532A (en) * 2018-02-13 2018-07-27 腾讯科技(深圳)有限公司 Perform mask method, video broadcasting method, the apparatus and system of segment
CN108597521A (en) * 2018-05-04 2018-09-28 徐涌 Audio role divides interactive system, method, terminal and the medium with identification word
CN110753263A (en) * 2019-10-29 2020-02-04 腾讯科技(深圳)有限公司 Video dubbing method, device, terminal and storage medium
CN111339865A (en) * 2020-02-17 2020-06-26 杭州慧川智能科技有限公司 Method for synthesizing video MV (music video) by music based on self-supervision learning
CN111935503A (en) * 2020-06-28 2020-11-13 百度在线网络技术(北京)有限公司 Short video generation method and device, electronic equipment and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115348459A (en) * 2022-08-16 2022-11-15 支付宝(杭州)信息技术有限公司 Short video processing method and device
CN116744063A (en) * 2023-08-15 2023-09-12 四川中电启明星信息技术有限公司 Short video push system integrating social attribute information
CN116744063B (en) * 2023-08-15 2023-11-03 四川中电启明星信息技术有限公司 Short video push system integrating social attribute information

Also Published As

Publication number Publication date
CN114339423B (en) 2024-08-27

Similar Documents

Publication Publication Date Title
CN107707931B (en) Method and device for generating interpretation data according to video data, method and device for synthesizing data and electronic equipment
CN109922373B (en) Video processing method, device and storage medium
CN109145784B (en) Method and apparatus for processing video
WO2019157977A1 (en) Method for labeling performance segment, video playing method and device, and terminal
CN107615766B (en) System and method for creating and distributing multimedia content
Truong et al. Video abstraction: A systematic review and classification
JP5038607B2 (en) Smart media content thumbnail extraction system and method
CN106060578B (en) Generate the method and system of video data
US11138462B2 (en) Scene and shot detection and characterization
CN109862388A (en) Generation method, device, server and the storage medium of the live video collection of choice specimens
CN111683209A (en) Mixed-cut video generation method and device, electronic equipment and computer-readable storage medium
EP1081960A1 (en) Signal processing method and video/voice processing device
US20020051081A1 (en) Special reproduction control information describing method, special reproduction control information creating apparatus and method therefor, and video reproduction apparatus and method therefor
US11853357B2 (en) Method and system for dynamically analyzing, modifying, and distributing digital images and video
CN105872717A (en) Video processing method and system, video player and cloud server
CN114339423A (en) Short video generation method and device, computing equipment and computer readable storage medium
KR20160122253A (en) Browsing videos via a segment list
CN106658030A (en) Method and device for playing composite video comprising single-path audio and multipath videos
CN112287771A (en) Method, apparatus, server and medium for detecting video event
CN114339451B (en) Video editing method, device, computing equipment and storage medium
EP3876543A1 (en) Video playback method and apparatus
US10924637B2 (en) Playback method, playback device and computer-readable storage medium
CN114339399A (en) Multimedia file editing method and device and computing equipment
US20210144358A1 (en) Information-processing apparatus, method of processing information, and program
CN117319765A (en) Video processing method, device, computing equipment and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant