CN114339423A - Short video generation method and device, computing equipment and computer readable storage medium - Google Patents

Short video generation method and device, computing equipment and computer readable storage medium

Info

Publication number
CN114339423A
CN114339423A
Authority
CN
China
Prior art keywords
video
initial
character
key
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111597387.XA
Other languages
Chinese (zh)
Other versions
CN114339423B (en)
Inventor
季焕文
陶杰
于芹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Migu Cultural Technology Co Ltd
China Mobile Communications Group Co Ltd
Original Assignee
Migu Cultural Technology Co Ltd
China Mobile Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Migu Cultural Technology Co Ltd, China Mobile Communications Group Co Ltd filed Critical Migu Cultural Technology Co Ltd
Priority to CN202111597387.XA priority Critical patent/CN114339423B/en
Publication of CN114339423A publication Critical patent/CN114339423A/en
Application granted granted Critical
Publication of CN114339423B publication Critical patent/CN114339423B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The embodiment of the invention relates to the technical field of video processing, and discloses a short video generation method comprising the following steps: determining a plurality of key video frames according to peak data of the heat of a target video; determining a plurality of initial video segments according to the character data and the audio data in each key video frame; generating a character script video of each character from an initial short video formed from the initial video segments; and generating a target short video based on the character script videos. In this way, the embodiment of the invention improves the user experience of the short video synthesis process.

Description

Short video generation method and device, computing equipment and computer readable storage medium
Technical Field
The embodiment of the invention relates to the technical field of video processing, in particular to a short video generation method, a short video generation device, a computing device and a computer readable storage medium.
Background
For short video generation, existing technical solutions generally rely on operation and production staff: actors record short videos, which are then uploaded for users to watch.
In the process of implementing the embodiment of the invention, the inventor found that conventional short video generation requires support from operations, actors and other parties, and suffers from low synthesis efficiency, few play modes and poor user experience.
Disclosure of Invention
In view of the foregoing problems, embodiments of the present invention provide a method, an apparatus, a computing device, and a computer-readable storage medium for generating a short video, which are used to solve the problem in the prior art that a user experience of short video generation is poor.
According to an aspect of an embodiment of the present invention, there is provided a short video generation method, including:
determining a plurality of key video frames according to peak data of the heat of the target video;
determining a plurality of initial video segments according to the character data and the audio data in each key video frame;
generating role script videos of all roles from the initial short video;
and generating a target short video based on the character script video.
In an alternative, the heat is a heat curve; the determining a plurality of key video frames according to the peak data of the heat of the target video comprises the following steps: determining key video frames before and after each peak value in the heat curve; determining whether a person exists in the key video frames before and after the peak value according to the character data; if no person exists, marking the key video frames before and after the peak value as an initial frame or a termination frame respectively; if a person exists, determining whether voice exists in the audio data corresponding to the key video frame; if no voice exists, marking the key video frames before and after the peak value as an initial frame or a termination frame respectively; and determining a plurality of initial video segments according to the initial frame and the termination frame.
In an alternative manner, the determining key video frames before and after each peak in the heat curve includes: and if the key video frames before and after the peak value have characters and voices, continuously searching the video frames near the key video frames forwards or backwards to serve as the key video frames.
In an optional manner, the generating of the character scenario videos of the respective characters from the initial short video includes: screening at least one initial video segment from the plurality of initial video segments to form an initial short video according to historical film watching information of a user; and identifying the voiceprints of all roles in the initial short video, and generating the role script videos of all the roles from the initial short video according to the voiceprints.
In an alternative mode, the screening out at least one initial video segment from the plurality of initial video segments as an initial short video according to the historical viewing information of the user includes: determining the emotion degree of a user corresponding to each initial video segment according to the historical film watching information; and screening out at least one initial video segment from the plurality of initial video segments according to the emotion degree to serve as an initial short video.
In an optional manner, the identifying a voiceprint of each character in the initial short video and generating a character scenario video of each character from the initial short video according to the voiceprint includes: decoding the initial short video to obtain a decoded initial short video; performing voiceprint recognition on the decoded initial short video to obtain the audio track of each role in the initial short video; and generating the role script video corresponding to each role in the initial short video according to the audio track.
In an optional manner, the generating a target short video based on the character scenario video includes: receiving a processing request of a user for the character script video, wherein the processing request comprises a sound processing request and/or a picture character processing request input by the user; performing audio processing on the character script video according to the sound processing request; and/or performing picture processing on the character script video according to the picture character processing request.
According to still another aspect of an embodiment of the present invention, there is provided a short video generating apparatus, including:
the first determining module is used for determining a plurality of key video frames according to the peak data of the heat degree of the target video;
the second determining module is used for determining a plurality of initial video segments according to the character data and the audio data in each key video frame;
the generating module is used for generating role script videos of all roles from the initial short video;
and the synthesis module is used for generating a target short video based on the role script video.
According to another aspect of embodiments of the present invention, there is provided a computing device including:
the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus;
the memory is used for storing at least one executable instruction which causes the processor to execute the operation of the short video generation method.
According to yet another aspect of the embodiments of the present invention, there is provided a computer-readable storage medium having at least one executable instruction stored therein, which when executed on a computing device, causes the computing device to perform the operations of the short video generation method.
According to the embodiment of the invention, a plurality of key video frames are determined according to peak data of the heat degree of a target video; according to the character data and the audio data in each key video frame, a plurality of initial video segments are determined, the character script videos of each character are generated from the initial short videos, the target short videos are generated based on the character script videos, the short video generation efficiency can be effectively improved, and the user experience is effectively improved.
The foregoing description is only an overview of the technical solutions of the embodiments of the present invention, and the embodiments of the present invention can be implemented according to the content of the description in order to make the technical means of the embodiments of the present invention more clearly understood, and the detailed description of the present invention is provided below in order to make the foregoing and other objects, features, and advantages of the embodiments of the present invention more clearly understandable.
Drawings
The drawings are only for purposes of illustrating embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
fig. 1 is a flowchart illustrating a short video generation method according to an embodiment of the present invention;
fig. 2 is a schematic diagram illustrating a heat curve in a short video generation method according to an embodiment of the present invention;
fig. 3 is a schematic diagram illustrating the identification of each role in the short video generation method according to the embodiment of the present invention;
fig. 4 is a schematic diagram illustrating an application scenario of short video generation in the short video generation method according to the embodiment of the present invention;
fig. 5 is a schematic diagram illustrating picture processing performed on a character script video in a short video generation method according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a short video generation apparatus provided in an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a computing device provided by an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein.
Fig. 1 illustrates a flow diagram of a short video generation method, performed by a computing device, provided by an embodiment of the invention. The computing device may be a computer device, a terminal device, an intelligent device, a video playing device, and the like, and the embodiment of the present invention is not particularly limited. As shown in fig. 1, the method comprises the steps of:
step 110: and determining a plurality of key video frames according to the peak data of the heat of the target video.
In the embodiment of the invention, the initial video segments are determined from the target video according to the heat, the character data and the audio data of the target video. Before determining the plurality of initial video segments from the target video, the method includes: segmenting the target video to obtain a plurality of video slices; and determining the heat corresponding to each video slice, wherein the heat is a heat curve. Specifically, the video is sliced according to the GOP (Group of Pictures, i.e., the distance between two I frames; the reference period refers to the distance between two P frames) to obtain a plurality of video slices. Audio is collected and denoised while the user is watching; a video slice whose average audio level is greater than a preset decibel threshold is taken as a midpoint slice, and its I frame is marked as a midpoint α. (An I frame is coded in intraframe mode, i.e., it uses only the spatial correlation within a single image rather than temporal correlation, and uses intraframe compression without motion compensation; being independent of other frames, it serves as a random-access entry point and a decoding reference frame, is mainly used for receiver initialization, channel acquisition and program switching or insertion, has a relatively low compression ratio, and appears periodically in the image sequence at a frequency selectable by the encoder.) The preset decibel threshold can be set according to the specific scene; in one embodiment of the present invention, the preset decibel threshold may be 60. Starting from the slice containing the current α, the preceding (following) slices are examined forward (backward); when a slice with an audio level below 40 dB is reached, the slice examined just before it is marked as a pre-start (pre-end) point, and the number of slices found with an audio level of 40 dB or more is recorded as n. If n is less than a first number threshold, the candidate does not meet the requirement; otherwise, the slices are added to a video recommendation candidate queue. A heat curve is then generated according to the video recommendation candidate queue, where the heat reflects how much users appreciate the current segment and, indirectly, how exciting it is. The first number threshold may be 3. In the embodiment of the present invention, as shown in fig. 2, the heat curve may be a bullet screen curve graph: the number of bullet screen comments in each video slice is counted in advance, and the curve is drawn with the timestamp of the video slice on the horizontal axis and the number of bullet screen comments on the vertical axis.
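For illustration only, the segment-screening logic described above can be summarized in the following Python sketch. It reflects the stated thresholds (a 60 dB midpoint threshold, a 40 dB boundary level, a first number threshold of 3) but the helper functions average_decibel() and bullet_count() and the slice attributes are assumptions made for the example, not part of the disclosed implementation.

```python
# Illustrative sketch of the candidate-queue and heat-curve logic described above.
# average_decibel() and bullet_count() are assumed helpers, not defined by the patent.

MIDPOINT_DB = 60      # preset decibel threshold for marking a midpoint alpha
BOUNDARY_DB = 40      # below this level, the neighbouring slice ends the expansion
MIN_SLICES = 3        # "first number threshold"

def build_candidate_queue(slices):
    """slices: GOP-aligned video slices in playback order."""
    queue = []
    for i, s in enumerate(slices):
        if average_decibel(s) < MIDPOINT_DB:
            continue                      # not loud enough to be a midpoint alpha
        # expand backwards and forwards until a quiet slice (< BOUNDARY_DB) is met
        lo = i
        while lo > 0 and average_decibel(slices[lo - 1]) >= BOUNDARY_DB:
            lo -= 1
        hi = i
        while hi < len(slices) - 1 and average_decibel(slices[hi + 1]) >= BOUNDARY_DB:
            hi += 1
        n = hi - lo + 1                   # number of slices at or above BOUNDARY_DB
        if n >= MIN_SLICES:
            queue.append(slices[lo:hi + 1])
    return queue

def heat_curve(slices):
    """Bullet screen curve of Fig. 2: (timestamp, bullet count) per slice."""
    return [(s.timestamp, bullet_count(s)) for s in slices]
```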
Step 120: and determining a plurality of initial video segments according to the character data and the audio data in each key video frame.
In the embodiment of the invention, after the heat curve is obtained, the video frames before and after each peak value in the heat curve are examined to determine a starting frame and an ending frame, and a plurality of initial video segments are determined according to the starting frame and the ending frame. Specifically, each peak value in the heat curve is determined, and the video frame corresponding to the peak value is taken as a midpoint; a starting frame β is searched forward and an ending frame θ is searched backward from the midpoint. The points on the heat curve where the tangent slope drops suddenly (a trigger value can be set) are taken as the boundary signals, and once such a point is determined, the corresponding video frames are marked as the pre-start point β and the pre-end point θ. Then key video frames before and after each peak value in the heat curve are determined; whether a person exists in the key video frames before and after the peak value is determined; if a person exists, whether voice exists in the audio data corresponding to the key video frame is determined; and if no voice exists, the key video frames before and after the peak value are marked as the initial frame or the termination frame respectively. Specifically, the key frame β1 closest to the pre-start point β is taken in the forward direction (if the current β is already a key frame, β1 is β itself), the video frames and audio frames are separated by decoding, and the key video frame is detected first: if there is no person, the key video frame β1 is marked as the start frame. If a person exists, the audio frame corresponding to the key video frame is further detected; if the audio frame contains no voice, the key video frame β1 is marked as the initial frame; otherwise the next key video frame β2 is searched forward, and so on, until the initial frame βn is found. In the same way, the end frame θn is found by a backward search, and an initial video segment is generated based on the start frame βn and the end frame θn. The process then continues with the second highest peak (if the second highest peak is already covered by a generated short video, the next highest peak is used instead) until a preset second number of initial video segments has been generated. The embodiment of the present invention does not limit the specific value of the preset second number; in one embodiment of the present invention, the preset second number may be 5. In this way, a plurality of initial video segments are obtained according to the starting frames and ending frames before and after the peak values.
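The start/end-frame search around a heat-curve peak can be sketched as follows. The detectors has_person() and has_voice() and the key-frame data structure are assumptions for the example; this is a minimal illustration of the search described above, not the disclosed implementation.

```python
# Sketch of the boundary search around a peak: key_frames is a time-ordered list of
# (video_frame, audio_frame) pairs obtained by decoding; has_person() and has_voice()
# are assumed person/voice detectors.

def find_boundary(key_frames, start_index, step):
    """step = -1 searches forward in time order reversed (earlier) for the start frame beta_n,
       step = +1 searches backward (later) for the end frame theta_n."""
    i = start_index
    while 0 <= i < len(key_frames):
        video_frame, audio_frame = key_frames[i]
        if not has_person(video_frame):
            return i                      # no person: mark this key frame as the boundary
        if not has_voice(audio_frame):
            return i                      # person present but no speech: also a boundary
        i += step                         # person speaking: keep searching
    return max(0, min(i, len(key_frames) - 1))

def initial_segment(key_frames, peak_index):
    start = find_boundary(key_frames, peak_index, step=-1)   # beta_n
    end = find_boundary(key_frames, peak_index, step=+1)     # theta_n
    return start, end
```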
Step 130: and generating character script videos of all the characters from the initial short video.
In the embodiment of the invention, after a plurality of initial video segments are obtained, at least one initial video segment is screened out from the plurality of initial video segments as an initial short video according to historical film watching information of a user.
The historical viewing information of the user may include the user's historical behavior s, viewing record t, favorites preference p, celebrity ranking o, and facial expression change f during viewing (when the user shows emotional fluctuation at a certain moment, such as laughing or crying, it indicates that the user is fully immersed in the drama).
In the embodiment of the invention, according to the historical viewing information, the emotion degree of the user corresponding to each initial video segment is determined; and at least one initial video segment is screened out from the plurality of initial video segments according to the emotion degree to serve as the initial short video. Specifically, the user's historical behavior s, viewing record t, favorites preference p, celebrity ranking o and facial expression change f during viewing are collected for weight matching.
The emotion weight refers to analyzing the user's emotion for the current video segment: expression prediction is performed through image classification to infer the user's state g, such as happy, calm or depressed; the current user audio is analyzed by sampling p audio frames and taking their decibel values db to reflect the intensity of the current emotion; the emotion state k is obtained as k = (db1 + db2 + db3 + ... + dbp)/(p × 10), the weight ratio reaching its maximum when k is close to 6; and the weight ratio y actually corresponding to the emotion state k is derived from the formula y = -(5/24)k² + (25/12)k.
The emotion degrees corresponding to the plurality of initial video segments are then compared. For each initial video segment, key frames are extracted and their number is recorded as N, and a weighted value is calculated as (s × 0.1 + t × 0.1 + p × 0.2 + o × 0.2 + f × 0.4 × y)/N. According to the weighted values, the top-N initial video segments are determined and used as initial short videos, which are fed back to the user for playing and creation. N may be 2.
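The weighting described above can be expressed in a small Python sketch. The factor scores s, t, p, o, f are taken as pre-computed numbers (how they are scored is not specified here), and the sketch is only an illustration of the formulas given above.

```python
# Numerical sketch of the emotion weighting. The factor names follow the text;
# their scoring is an assumption for the example.

def emotion_state(decibels):
    """k = (db1 + db2 + ... + dbp) / (p * 10), from p sampled audio frames."""
    p = len(decibels)
    return sum(decibels) / (p * 10)

def weight_ratio(k):
    """y = -(5/24) * k**2 + (25/12) * k, the weight ratio for emotion state k."""
    return -(5 / 24) * k ** 2 + (25 / 12) * k

def weighted_value(s, t, p, o, f, y, n_key_frames):
    """(s*0.1 + t*0.1 + p*0.2 + o*0.2 + f*0.4*y) / N."""
    return (s * 0.1 + t * 0.1 + p * 0.2 + o * 0.2 + f * 0.4 * y) / n_key_frames

def pick_initial_short_videos(scored_segments, top_n=2):
    """scored_segments: list of (segment, weighted_value); keep the top_n by weight."""
    ranked = sorted(scored_segments, key=lambda item: item[1], reverse=True)
    return [segment for segment, _ in ranked[:top_n]]
```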
And then, identifying the voiceprints of all roles in the initial short video, and generating role script videos of all roles from the initial short video according to the voiceprints.
In the embodiment of the invention, the character script videos are generated by decoding the initial short video to obtain a decoded initial short video, performing voiceprint recognition on the decoded initial short video to obtain the audio track of each character in the initial short video, and generating the character script video corresponding to each character according to the audio tracks. Specifically, the edited initial short video is decoded, the characters are separated according to voiceprint recognition, and the audio track corresponding to each character is removed in turn, generating a plurality of character script videos. The produced character script videos are uploaded to the c-end management platform and released to the test environment by default, and an operator can release them to the online environment depending on the effect. A user can then dub a video according to the subtitles, and the recording is synthesized into the audio track of the video, generating an entertaining script video. As shown in fig. 3, three characters a, b and c appear in turn from left to right, but voiceprint analysis of the current highlight shows that the dialogue is mainly between a and c, while b has a low proportion (its voiceprint accounts for no more than 20% of the current dialogue); therefore two script videos are generated for this segment, namely the version with character a's lines removed and the version with character c's lines removed.
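A minimal sketch of this per-character step is shown below. The helpers diarize(), mute_spans() and mux() are assumptions standing in for voiceprint segmentation, audio editing and muxing; the 20% cut-off follows the Fig. 3 example.

```python
# Sketch of character script video generation: for each main character, remove that
# character's lines so the user can dub them. diarize(), mute_spans() and mux() are
# assumed helpers, not part of the patent text.

MIN_DIALOGUE_SHARE = 0.2   # characters at or below this share get no script video

def character_script_videos(video_track, audio_track):
    spans_by_character = diarize(audio_track)          # voiceprint-based segmentation
    total = sum(end - start
                for spans in spans_by_character.values()
                for start, end in spans)

    scripts = {}
    for character, spans in spans_by_character.items():
        share = sum(end - start for start, end in spans) / total
        if share <= MIN_DIALOGUE_SHARE:
            continue                                   # minor character (e.g. b in Fig. 3)
        muted = mute_spans(audio_track, spans)         # remove this character's lines
        scripts[character] = mux(video_track, muted)   # user dubs the muted lines later
    return scripts
```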
Step 140: and generating a target short video based on the character script video.
In the embodiment of the invention, a processing request of a user for the role script video is received, and the role script video is processed according to the processing request to obtain the target short video.
In the embodiment of the present invention, as shown in fig. 4, in the c-end management platform, a user may select one of the character script videos, and enter short video processing, where recording/pausing, deletion of a previous segment, face acquisition, and the like may be clicked.
Wherein the processing request comprises a sound processing request and/or an image character processing request input by a user. Therefore, processing the character scenario video according to the processing request specifically includes: performing audio processing on the character script video according to the sound processing request; and/or performing picture processing on the character script video according to the picture character processing request.
For the sound processing request, the user's voice can be recorded and synthesized into the video, and unsatisfactory recordings can be deleted segment by segment and re-recorded. For picture character processing, if face acquisition is enabled, the image captured by the camera is displayed picture-in-picture, and either expression acquisition or whole-face acquisition can be selected. Expression acquisition means that the facial expression currently shown by the user is automatically synthesized onto the selected character through AI; as shown in fig. 5, after acquisition the face of the character in the original video is replaced, that is, the character's face in the original picture is replaced by the user's face. Whole-face acquisition replaces the character's face with the whole face detected by the camera; if the face is not detected at some point, AI fusion is performed between the previously acquired user face and the character's expression in the video to complete the replacement. In the embodiment of the invention, a multi-friend script can also be provided so that friends can act opposite each other, giving a stronger sense of immersion. After shooting, the video can be synthesized and released. (An AI score for the recorded dubbing can also be provided, for example by comparison with the speech passages of the original video.)
In an embodiment of the present invention, the sound processing request may include exchanging character voices, novelty dubbing, ghost-style (looped) videos, and the like. For exchanging character voices, for example, four characters a, b, c and d correspond to four voiceprints a1, b1, c1 and d1; after the characters and voiceprints are permuted, the combinations a-a1, b-b1, c-c1 and d-d1 are removed, and among the remaining C(4,1) × C(3,1) combinations the exchange is made preferentially according to character gender, that is, characters are paired on the principle that same-gender voices are exchanged first. For novelty dubbing, a character is searched for according to the user's viewing history, favorites, ranking and so on, and the corresponding character voiceprint is extracted to replace the original dubbing (if this fails, big-data matching is used, or a currently popular voice, such as the voice of a cartoon character, an alien voice and the like, is selected at random), generating a novelty video. For the ghost-style video, based on the clipped video and the known midpoint α key frame, n frames before and after the key frame are taken and interpolated forward and backward, so that the most exciting part is looped, generating a ghost-style video. In a live sports event, for example a football match, when a goal occurs, a quick goal video is popped up, intercepting the 30 s before and the 10 s after the goal; the user can upload a portrait photo, and after the portrait is cut out it replaces the face of the player who scored; the other people in the video can be automatically replaced with cartoon avatars, a piece of commentary can be configured, and the synthesized video can be shared for fun.
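The voice-exchange pairing can be illustrated with the following Python sketch: identity pairings are excluded and same-gender swaps are preferred. The character names and gender lookup are illustrative assumptions, not taken from the patent text.

```python
# Sketch of the character voice exchange: drop a-a1, b-b1, c-c1, d-d1 and prefer
# same-gender pairings among the remaining combinations.

from itertools import permutations

def voice_swaps(characters, gender):
    """characters: e.g. ['a', 'b', 'c', 'd']; gender: dict character -> 'M'/'F'.
       Returns a mapping character -> voiceprint owner with no one keeping their own
       voice, scored so that same-gender exchanges are preferred."""
    best, best_score = None, -1
    for perm in permutations(characters):
        if any(c == v for c, v in zip(characters, perm)):
            continue                                   # exclude identity pairings
        score = sum(gender[c] == gender[v] for c, v in zip(characters, perm))
        if score > best_score:
            best, best_score = dict(zip(characters, perm)), score
    return best

# Example (assumed genders): voice_swaps(['a', 'b', 'c', 'd'],
#                                        {'a': 'M', 'b': 'F', 'c': 'M', 'd': 'F'})
# pairs a with c's voiceprint and b with d's voiceprint, i.e. same-gender exchange first.
```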
According to the embodiment of the invention, a plurality of key video frames are determined according to peak data of the heat degree of a target video; according to the character data and the audio data in each key video frame, a plurality of initial video segments are determined, the character script videos of each character are generated from the initial short videos, the target short videos are generated based on the character script videos, the short video generation efficiency can be effectively improved, and the user experience is effectively improved.
Fig. 6 is a schematic structural diagram illustrating a short video generation apparatus according to an embodiment of the present invention. As shown in fig. 6, the apparatus 300 includes:
the first determining module 310 is configured to determine a plurality of key video frames according to the peak data of the heat of the target video;
a second determining module 320, configured to determine a plurality of initial video segments according to the character data and the audio data in each key video frame;
a generating module 330, configured to generate a character scenario video of each character from the initial short video;
and the synthesizing module 340 is used for generating a target short video based on the character script video.
In an optional manner, before determining a plurality of key video frames according to peak data of heat of a target video, the method includes: the method comprises the steps of segmenting a target video to obtain a plurality of video segments; and determining heat data corresponding to each video fragment.
In an alternative, the heat is a heat curve; the determining a plurality of key video frames according to peak data of the heat of the target video comprises: determining a starting frame and an ending frame according to the video frames before and after each peak value in the heat curve; and determining a plurality of initial video segments according to the starting frame and the ending frame.
In an alternative manner, determining a start frame and an end frame according to video frames before and after each peak in the heat curve includes: determining key video frames before and after each peak value in the heat curve; determining whether people exist in the key video frames before and after the peak value; if the person exists, determining whether the voice exists in the audio data corresponding to the key video frame; and if the voice exists, marking the key video frames before and after the peak value as the initial frame or the termination frame respectively.
In an alternative mode, the screening out at least one initial video segment from the plurality of initial video segments as an initial short video according to the historical viewing information of the user includes: determining the emotion degree of a user corresponding to each initial video segment according to the historical film watching information; and screening out at least one initial video segment from the plurality of initial video segments according to the emotion degree to serve as an initial short video.
In an alternative manner, the determining key video frames before and after each peak in the heat curve includes: and if the key video frames before and after the peak value have characters and voices, taking the key video frames as the initial frames or the termination frames.
In an optional manner, the identifying a voiceprint of each character in the initial short video and generating a character scenario video of each character from the initial short video according to the voiceprint includes: decoding the initial short video to obtain a decoded initial short video; performing voiceprint recognition on the decoded initial short video to obtain the audio track of each role in the initial short video; and generating the role script video corresponding to each role in the initial short video according to the audio track.
In an alternative mode, the processing request comprises a sound processing request and/or a picture character processing request input by a user; the receiving a processing request of a user for the role script video, and processing the role script video according to the processing request to obtain a target short video, includes: performing audio processing on the character script video according to the sound processing request; and/or performing picture processing on the character script video according to the picture character processing request.
The specific working steps of the short video generating device of the embodiment of the present invention are substantially the same as those of the method embodiment described above, and are not described herein again.
According to the embodiment of the invention, a plurality of key video frames are determined according to peak data of the heat degree of a target video; according to the character data and the audio data in each key video frame, a plurality of initial video segments are determined, the character script videos of each character are generated from the initial short videos, the target short videos are generated based on the character script videos, the short video generation efficiency can be effectively improved, and the user experience is effectively improved.
Fig. 7 is a schematic structural diagram of a computing device according to an embodiment of the present invention, and a specific embodiment of the present invention does not limit a specific implementation of the computing device.
As shown in fig. 7, the computing device may include: a processor (processor)402, a Communications Interface 404, a memory 406, and a Communications bus 408.
Wherein: the processor 402, communication interface 404, and memory 406 communicate with each other via a communication bus 408. A communication interface 404 for communicating with network elements of other devices, such as clients or other servers. The processor 402 is configured to execute the program 410, and may specifically perform the relevant steps in the embodiment of the short video generation method described above.
In particular, program 410 may include program code comprising computer-executable instructions.
The processor 402 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The computing device includes one or more processors, which may be the same type of processor, such as one or more CPUs; or may be different types of processors, such as one or more CPUs and one or more ASICs.
The memory 406 is used for storing a program 410. The memory 406 may comprise a high-speed RAM memory, and may also include a non-volatile memory, such as at least one disk memory.
The program 410 may be specifically invoked by the processor 402 to cause the computing device to perform the following operations:
determining a plurality of key video frames according to peak data of the heat of the target video;
determining a plurality of initial video segments according to the character data and the audio data in each key video frame;
generating role script videos of all roles from the initial short video;
and generating a target short video based on the character script video.
In an optional manner, before the determining a plurality of initial video segments from the target video according to the popularity data, the character data, and the audio data of the target video, the method includes: the method comprises the steps of segmenting a target video to obtain a plurality of video segments; and determining heat data corresponding to each video fragment.
In an alternative, the heat is a heat curve; the determining a plurality of key video frames according to peak data of the heat of the target video comprises: determining key video frames before and after each peak value in the heat curve; determining a plurality of initial video segments according to the character data and/or the audio data in the key video frames, including: determining whether people exist in the key video frames before and after the peak value according to the people data; if no person exists, marking the key video frames before and after the peak value as an initial frame or the termination frame respectively; if the person exists, determining whether the voice exists in the audio data corresponding to the key video frame; if no voice exists, marking the key video frames before and after the peak value as an initial frame or the termination frame respectively; determining a plurality of initial video segments according to the starting frame and the ending frame.
In an alternative manner, the determining key video frames before and after each peak in the heat curve includes: and if the key video frames before and after the peak value have characters and voices, continuously searching the video frames near the key video frames forwards or backwards to serve as the key video frames.
In an optional manner, the generating of the character scenario videos of the respective characters from the initial short video includes: screening at least one initial video segment from the plurality of initial video segments to form an initial short video according to historical film watching information of a user; and identifying the voiceprints of all roles in the initial short video, and generating the role script videos of all the roles from the initial short video according to the voiceprints.
In an alternative mode, the screening out at least one initial video segment from the plurality of initial video segments as an initial short video according to the historical viewing information of the user includes: determining the emotion degree of a user corresponding to each initial video segment according to the historical film watching information; and screening out at least one initial video segment from the plurality of initial video segments according to the emotion degree to serve as an initial short video.
In an optional manner, the identifying a voiceprint of each character in the initial short video and generating a character scenario video of each character from the initial short video according to the voiceprint includes: decoding the initial short video to obtain a decoded initial short video; performing voiceprint recognition on the decoded initial short video to obtain the audio track of each role in the initial short video; and generating the role script video corresponding to each role in the initial short video according to the audio track.
In an alternative mode, the processing request comprises a sound processing request and/or a picture character processing request input by a user; the receiving a processing request of a user for the role script video, and processing the role script video according to the processing request to obtain a target short video, includes: performing audio processing on the character script video according to the sound processing request; and/or performing picture processing on the character script video according to the picture character processing request.
The specific working steps of the computing device according to the embodiment of the present invention are substantially the same as those of the method embodiment described above, and are not described herein again.
According to the embodiment of the invention, a plurality of key video frames are determined according to peak data of the heat degree of a target video; according to the character data and the audio data in each key video frame, a plurality of initial video segments are determined, the character script videos of each character are generated from the initial short videos, the target short videos are generated based on the character script videos, the short video generation efficiency can be effectively improved, and the user experience is effectively improved.
An embodiment of the present invention provides a computer-readable storage medium, where the storage medium stores at least one executable instruction, and when the executable instruction is executed on a computing device, the computing device is enabled to execute a short video generation method in any method embodiment described above.
The executable instructions may be specifically configured to cause the computing device to:
determining a plurality of key video frames according to peak data of the heat of the target video;
determining a plurality of initial video segments according to the character data and the audio data in each key video frame;
generating role script videos of all roles from the initial short video;
and generating a target short video based on the character script video.
In an optional manner, before determining a plurality of key video frames according to peak data of heat of a target video, the method includes: the method comprises the steps of segmenting a target video to obtain a plurality of video segments; and determining heat data corresponding to each video fragment.
In an alternative, the heat is a heat curve; the determining a plurality of key video frames according to peak data of the heat of the target video comprises: determining key video frames before and after each peak value in the heat curve; determining whether people exist in the key video frames before and after the peak value according to the people data; if no person exists, marking the key video frames before and after the peak value as an initial frame or the termination frame respectively; if the person exists, determining whether the voice exists in the audio data corresponding to the key video frame; if no voice exists, marking the key video frames before and after the peak value as an initial frame or the termination frame respectively; determining a plurality of initial video segments according to the starting frame and the ending frame.
In an alternative manner, the determining key video frames before and after each peak in the heat curve includes: and if the key video frames before and after the peak value have characters and voices, continuously searching the video frames near the key video frames forwards or backwards to serve as the key video frames.
In an optional manner, the generating of the character scenario videos of the respective characters from the initial short video includes: screening at least one initial video segment from the plurality of initial video segments to form an initial short video according to historical film watching information of a user; and identifying the voiceprints of all roles in the initial short video, and generating the role script videos of all the roles from the initial short video according to the voiceprints. In an alternative mode, the screening out at least one initial video segment from the plurality of initial video segments as an initial short video according to the historical viewing information of the user includes: determining the emotion degree of a user corresponding to each initial video segment according to the historical film watching information; and screening out at least one initial video segment from the plurality of initial video segments according to the emotion degree to serve as an initial short video.
In an optional manner, the identifying a voiceprint of each character in the initial short video and generating a character scenario video of each character from the initial short video according to the voiceprint includes: decoding the initial short video to obtain a decoded initial short video; performing voiceprint recognition on the decoded initial short video to obtain the audio track of each role in the initial short video; and generating the role script video corresponding to each role in the initial short video according to the audio track.
In an alternative mode, the processing request comprises a sound processing request and/or a picture character processing request input by a user; the receiving a processing request of a user for the role script video, and processing the role script video according to the processing request to obtain a target short video, includes: performing audio processing on the character script video according to the sound processing request; and/or performing picture processing on the character script video according to the picture character processing request.
The specific working steps of the computing device according to the embodiment of the present invention are substantially the same as those of the method embodiment described above, and are not described herein again.
According to the embodiment of the invention, a plurality of key video frames are determined according to peak data of the heat degree of a target video; according to the character data and the audio data in each key video frame, a plurality of initial video segments are determined, the character script videos of each character are generated from the initial short videos, the target short videos are generated based on the character script videos, the short video generation efficiency can be effectively improved, and the user experience is effectively improved.
The embodiment of the invention provides a short video generation device, which is used for executing the short video generation method.
Embodiments of the present invention provide a computer program that can be invoked by a processor to cause a computing device to perform the short video generation method in any of the above method embodiments.
Embodiments of the present invention provide a computer program product comprising a computer program stored on a computer-readable storage medium, the computer program comprising program instructions that, when run on a computer, cause the computer to perform the short video generation method of any of the above-mentioned method embodiments.
The algorithms or displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. In addition, embodiments of the present invention are not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the embodiments of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the invention and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names. The steps in the above embodiments should not be construed as limiting the order of execution unless specified otherwise.

Claims (10)

1. A method of short video generation, the method comprising:
determining a plurality of key video frames according to peak data of the heat of the target video;
determining a plurality of initial video segments according to the character data and the audio data in each key video frame;
generating role script videos of all roles from the initial short video;
and generating a target short video based on the character script video.
2. The method of claim 1, wherein the heat is a heat curve; the determining a plurality of key video frames according to the peak data of the heat of the target video comprises the following steps:
determining key video frames before and after each peak value in the heat curve;
determining whether people exist in the key video frames before and after the peak value according to the people data;
if no person exists, marking the key video frames before and after the peak value as an initial frame or the termination frame respectively;
if the person exists, determining whether the voice exists in the audio data corresponding to the key video frame;
if no voice exists, marking the key video frames before and after the peak value as an initial frame or the termination frame respectively;
determining a plurality of initial video segments according to the starting frame and the ending frame.
3. The method of claim 2, wherein determining key video frames before and after each peak in the heat curve comprises:
and if the key video frames before and after the peak value have characters and voices, continuously searching the video frames near the key video frames forwards or backwards to serve as the key video frames.
4. The method of claim 1, wherein the generating of the character transcript video of each character from the initial short video comprises:
screening at least one initial video segment from the plurality of initial video segments to form an initial short video according to historical film watching information of a user;
and identifying the voiceprints of all roles in the initial short video, and generating the role script videos of all the roles from the initial short video according to the voiceprints.
5. The method according to claim 4, wherein the selecting at least one initial video segment from the plurality of initial video segments as the initial short video according to the historical viewing information of the user comprises:
determining the emotion degree of a user corresponding to each initial video segment according to the historical film watching information;
and screening out at least one initial video segment from the plurality of initial video segments according to the emotion degree to serve as an initial short video.
6. The method of claim 4, wherein the identifying voiceprints of respective characters from the initial short video, and the generating a character scenario video of respective characters from the initial short video from the voiceprints comprises:
decoding the initial short video to obtain a decoded initial short video;
performing voiceprint recognition on the decoded initial short video to obtain the audio track of each role in the initial short video;
and generating the role script video corresponding to each role in the initial short video according to the audio track.
7. The method of claim 1, wherein generating the target short video based on the character transcript video comprises:
receiving a processing request of a user for the character script video, wherein the processing request comprises a sound processing request and/or a picture character processing request input by the user; performing audio processing on the character script video according to the sound processing request; and/or
And carrying out picture processing on the role script video according to the picture character processing request.
8. An apparatus for short video generation, the apparatus comprising:
the first determining module is used for determining a plurality of key video frames according to the peak data of the heat degree of the target video;
the second determining module is used for determining a plurality of initial video segments according to the character data and the audio data in each key video frame;
the generating module is used for generating role script videos of all roles from the initial short video;
and the synthesis module is used for generating a target short video based on the role script video.
9. A computing device, comprising: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus;
the memory is configured to store at least one executable instruction that causes the processor to perform the operations of the short video generation method of any of claims 1-7.
10. A computer-readable storage medium having stored therein at least one executable instruction that, when executed on a computing device, causes the computing device to perform operations of the short video generation method of any of claims 1-7.
CN202111597387.XA 2021-12-24 2021-12-24 Short video generation method, device, computing equipment and computer readable storage medium Active CN114339423B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111597387.XA CN114339423B (en) 2021-12-24 2021-12-24 Short video generation method, device, computing equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111597387.XA CN114339423B (en) 2021-12-24 2021-12-24 Short video generation method, device, computing equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN114339423A true CN114339423A (en) 2022-04-12
CN114339423B CN114339423B (en) 2024-08-27

Family

ID=81012380

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111597387.XA Active CN114339423B (en) 2021-12-24 2021-12-24 Short video generation method, device, computing equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN114339423B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115348459A (en) * 2022-08-16 2022-11-15 支付宝(杭州)信息技术有限公司 Short video processing method and device
CN116744063A (en) * 2023-08-15 2023-09-12 四川中电启明星信息技术有限公司 Short video push system integrating social attribute information

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107067450A (en) * 2017-04-21 2017-08-18 福建中金在线信息科技有限公司 The preparation method and device of a kind of video
CN108337532A (en) * 2018-02-13 2018-07-27 腾讯科技(深圳)有限公司 Perform mask method, video broadcasting method, the apparatus and system of segment
CN108597521A (en) * 2018-05-04 2018-09-28 徐涌 Audio role divides interactive system, method, terminal and the medium with identification word
CN110753263A (en) * 2019-10-29 2020-02-04 腾讯科技(深圳)有限公司 Video dubbing method, device, terminal and storage medium
CN111339865A (en) * 2020-02-17 2020-06-26 杭州慧川智能科技有限公司 Method for synthesizing video MV (music video) by music based on self-supervision learning
CN111935503A (en) * 2020-06-28 2020-11-13 百度在线网络技术(北京)有限公司 Short video generation method and device, electronic equipment and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107067450A (en) * 2017-04-21 2017-08-18 福建中金在线信息科技有限公司 The preparation method and device of a kind of video
CN108337532A (en) * 2018-02-13 2018-07-27 腾讯科技(深圳)有限公司 Perform mask method, video broadcasting method, the apparatus and system of segment
CN108597521A (en) * 2018-05-04 2018-09-28 徐涌 Audio role divides interactive system, method, terminal and the medium with identification word
CN110753263A (en) * 2019-10-29 2020-02-04 腾讯科技(深圳)有限公司 Video dubbing method, device, terminal and storage medium
CN111339865A (en) * 2020-02-17 2020-06-26 杭州慧川智能科技有限公司 Method for synthesizing video MV (music video) by music based on self-supervision learning
CN111935503A (en) * 2020-06-28 2020-11-13 百度在线网络技术(北京)有限公司 Short video generation method and device, electronic equipment and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115348459A (en) * 2022-08-16 2022-11-15 支付宝(杭州)信息技术有限公司 Short video processing method and device
CN116744063A (en) * 2023-08-15 2023-09-12 四川中电启明星信息技术有限公司 Short video push system integrating social attribute information
CN116744063B (en) * 2023-08-15 2023-11-03 四川中电启明星信息技术有限公司 Short video push system integrating social attribute information

Also Published As

Publication number Publication date
CN114339423B (en) 2024-08-27

Similar Documents

Publication Publication Date Title
CN107707931B (en) Method and device for generating interpretation data according to video data, method and device for synthesizing data and electronic equipment
CN109922373B (en) Video processing method, device and storage medium
CN109145784B (en) Method and apparatus for processing video
WO2019157977A1 (en) Method for labeling performance segment, video playing method and device, and terminal
CN107615766B (en) System and method for creating and distributing multimedia content
Truong et al. Video abstraction: A systematic review and classification
JP5038607B2 (en) Smart media content thumbnail extraction system and method
CN106060578B (en) Generate the method and system of video data
US11138462B2 (en) Scene and shot detection and characterization
CN109862388A (en) Generation method, device, server and the storage medium of the live video collection of choice specimens
CN111683209A (en) Mixed-cut video generation method and device, electronic equipment and computer-readable storage medium
EP1081960A1 (en) Signal processing method and video/voice processing device
US20020051081A1 (en) Special reproduction control information describing method, special reproduction control information creating apparatus and method therefor, and video reproduction apparatus and method therefor
US11853357B2 (en) Method and system for dynamically analyzing, modifying, and distributing digital images and video
CN105872717A (en) Video processing method and system, video player and cloud server
CN114339423A (en) Short video generation method and device, computing equipment and computer readable storage medium
KR20160122253A (en) Browsing videos via a segment list
CN106658030A (en) Method and device for playing composite video comprising single-path audio and multipath videos
CN112287771A (en) Method, apparatus, server and medium for detecting video event
CN114339451B (en) Video editing method, device, computing equipment and storage medium
EP3876543A1 (en) Video playback method and apparatus
US10924637B2 (en) Playback method, playback device and computer-readable storage medium
CN114339399A (en) Multimedia file editing method and device and computing equipment
US20210144358A1 (en) Information-processing apparatus, method of processing information, and program
CN117319765A (en) Video processing method, device, computing equipment and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant