CN114449308A - Automatic video clipping method and device and cloud clipping server

Automatic video clipping method and device and cloud clipping server

Info

Publication number
CN114449308A
CN114449308A
Authority
CN
China
Prior art keywords: video, frames, group, audio, video frames
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202111618817.1A
Other languages
Chinese (zh)
Inventor
吴启琦 (Wu Qiqi)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Zhongtuo Internet Information Technology Co., Ltd.
Original Assignee
Suzhou Zhongtuo Internet Information Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Zhongtuo Internet Information Technology Co., Ltd.
Priority to CN202111618817.1A
Publication of CN114449308A
Legal status: Withdrawn

Classifications

    • H04N21/23424 Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • H04N21/242 Synchronization processes, e.g. processing of PCR [Program Clock References]
    • H04N21/43072 Synchronising the rendering of multiple content streams or additional data on the same device
    • H04N21/44016 Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • H04N21/47205 End-user interface for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
    • H04N21/8547 Content authoring involving timestamps for synchronizing content


Abstract

An embodiment of the present disclosure provides an automatic video clipping method and device and a cloud clipping server. The method comprises the following steps: an automatic video clipping device retrieves first source video data and second source video data; a first group of video frames and a second group of video frames are obtained, where the audio frame corresponding to at least one video frame of the first group in the first source video data is the same as an audio frame corresponding to the second group in the second source video data; the device matches the audio frames corresponding to the first group of video frames against the audio frames corresponding to the second group and outputs the matched audio range timestamp; quality parameters of the first and second groups of video frames are obtained based on the audio range timestamp; and the first and second groups of video frames are spliced based on the quality parameters to obtain a final video. In this way, the editing of multi-view video is completed automatically, the editing workload is reduced, and the user experience is improved.

Description

Automatic video clipping method and device and cloud clipping server
Technical Field
The present disclosure relates to the technical field of video editing, and in particular to an automatic video clipping method and device and a cloud clipping server.
Background
In the field of video editing, multiple cameras are often set up to shoot a target simultaneously so that the subject can be presented dynamically from multiple angles. Although the pictures of the source videos recorded by the individual cameras differ, their audio data should in theory be synchronized. In a typical editing workflow, the video segments recorded simultaneously by the cameras are stacked on a video track, divided into several segments according to the editor's preference, and then manually spliced together to obtain the finished video. This manual screening of video segments makes editing laborious, and manually overlaying the segments can also put sound and picture out of sync, further reducing the quality of the finished video.
Disclosure of Invention
To overcome at least the above disadvantages of the prior art, an object of the present disclosure is to provide an automatic video clipping method and device and a cloud clipping server.
In a first aspect, the present disclosure provides an automatic video clipping method for an automatic video clipping device that holds first source video data and second source video data, the method comprising:
the automatic video clipping device retrieves the first source video data and the second source video data;
the device obtains a first group of video frames from the first source video data and a second group of video frames from the second source video data, where the audio frame corresponding to at least one video frame of the first group is the same as an audio frame corresponding to the second group;
the device matches the audio frames corresponding to the first group of video frames against the audio frames corresponding to the second group and outputs the matched audio range timestamp;
obtaining quality parameters of the first and second groups of video frames based on the audio range timestamp;
and splicing the first and second groups of video frames based on the quality parameters to obtain a final video.
In a second aspect, the present disclosure provides an automatic video clipping device, comprising:
an acquisition unit, configured to retrieve the first source video data and the second source video data;
an extraction unit, configured to obtain a first group of video frames from the first source video data and a second group of video frames from the second source video data, where the audio frame corresponding to at least one video frame of the first group is the same as an audio frame corresponding to the second group;
a matching unit, configured to match the audio frames corresponding to the first group of video frames against the audio frames corresponding to the second group and output the matched audio range timestamp;
an analysis unit, configured to obtain quality parameters of the first and second groups of video frames based on the audio range timestamp;
and a combination unit, configured to splice the first and second groups of video frames based on the quality parameters to obtain a final video.
In a third aspect, an embodiment of the present disclosure provides a computer-readable storage medium storing instructions that, when executed, cause a computer to perform the automatic video clipping method of the first aspect or any possible design of the first aspect.
In a fourth aspect, an embodiment of the present disclosure further provides a cloud clipping server comprising a processor, a machine-readable storage medium, and a network interface connected through a bus system. The network interface is configured to be communicatively connected to at least one clipping terminal, the machine-readable storage medium is configured to store a program, instructions, or code, and the processor is configured to execute the program, instructions, or code in the machine-readable storage medium to perform the automatic video clipping method of the first aspect or any possible design of the first aspect.
Based on any one of the above aspects, the present disclosure provides an automatic video clipping method and device and a cloud clipping server. The method comprises the following steps: an automatic video clipping device retrieves the first source video data and the second source video data; a first group of video frames and a second group of video frames are obtained, where the audio frame corresponding to at least one video frame of the first group is the same as an audio frame corresponding to the second group; the device matches the audio frames corresponding to the first group of video frames against those corresponding to the second group and outputs the matched audio range timestamp; quality parameters of the first and second groups of video frames are obtained based on the audio range timestamp; and the first and second groups of video frames are spliced based on the quality parameters to obtain a final video. In this way, multi-view source videos are synchronized based on audio features and the user does not need to screen the source videos manually, so the editing of multi-view source video is completed automatically, the editing workload is reduced, and the user experience is improved.
Drawings
To illustrate the technical solutions of the embodiments of the present disclosure more clearly, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present disclosure and should therefore not be regarded as limiting the scope; those skilled in the art can derive other related drawings from them without inventive effort.
Fig. 1 is a schematic view of an application scenario of an automatic video editing system provided in an embodiment of the present disclosure;
fig. 2 is a schematic flowchart of an automatic video editing method provided by an embodiment of the present disclosure;
FIG. 3 is a functional block diagram of an automatic video editing apparatus according to an embodiment of the disclosure;
fig. 4 is a block diagram schematically illustrating a structure of a cloud clipping server for implementing the above-mentioned video automatic clipping method according to an embodiment of the present disclosure.
Detailed Description
The present disclosure is described in detail below with reference to the drawings; the specific operations described in the method embodiments can also be applied to the device embodiments or the system embodiments.
Fig. 1 is an interaction diagram of a video automatic clipping system 10 provided by an embodiment of the present disclosure. The video automatic clipping system 10 may include a cloud clipping server 100 and a clip terminal 200 communicatively connected to it. The video automatic clipping system 10 shown in fig. 1 is only one possible example; in other possible embodiments, the system may include only some of the components shown in fig. 1 or may include other components.
In this embodiment, the clip terminal 200 may comprise a mobile device, a tablet computer, a laptop computer, or the like, or any combination thereof. In some embodiments, the mobile device may include a smart home device, a wearable device, a smart mobile device, a virtual reality device, an augmented reality device, or the like, or any combination thereof. In some embodiments, the smart home device may include a control device of a smart electrical appliance, a smart monitoring device, a smart television, a smart camera, or the like, or any combination thereof. In some embodiments, the wearable device may include a smart bracelet, a smart shoelace, smart glasses, a smart helmet, a smart watch, smart clothing, a smart backpack, a smart accessory, or the like, or any combination thereof. In some embodiments, the smart mobile device may include a smartphone, a personal digital assistant, a gaming device, or the like, or any combination thereof. In some embodiments, the virtual reality device and/or the augmented reality device may include a virtual reality helmet, virtual reality glasses, a virtual reality patch, an augmented reality helmet, augmented reality glasses, an augmented reality patch, or the like, or any combination thereof. For example, the virtual reality device and/or the augmented reality device may include various virtual reality products.
In this embodiment, the cloud clipping server 100 and the clip terminal 200 in the video automatic clipping system 10 may cooperatively perform the automatic video clipping method described in the following method embodiments; refer to the method embodiments below for the specific steps performed by the cloud clipping server 100 and the clip terminal 200.
To solve the technical problem described in the background above, fig. 2 is a flowchart of the automatic video clipping method provided in an embodiment of the present disclosure. The method may be executed by the cloud clipping server 100 shown in fig. 1; the steps are described in detail below.
Step S110, the automatic video clipping device retrieves the first source video data and the second source video data;
Step S120, the device obtains a first group of video frames from the first source video data and a second group of video frames from the second source video data, where the audio frame corresponding to at least one video frame of the first group is the same as an audio frame corresponding to the second group;
Step S130, the device matches the audio frames corresponding to the first group of video frames against those corresponding to the second group, and outputs the matched audio range timestamp;
Step S140, quality parameters of the first and second groups of video frames are acquired based on the audio range timestamp;
Step S150, the first and second groups of video frames are spliced based on the quality parameters to obtain a final video. An end-to-end sketch of these five steps is given below.
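The following minimal Python sketch shows how steps S110 to S150 fit together. Every helper it calls (load_source, extract_frames, match_audio_ranges, measure_quality, splice) is a hypothetical placeholder for an operation the text describes, not an API defined by this disclosure; the matching and selection steps are sketched concretely later in this section.

```python
# Hedged end-to-end sketch of steps S110-S150. All helpers here
# (load_source, extract_frames, match_audio_ranges, measure_quality,
# splice) are hypothetical placeholders, not names from the patent.

def auto_clip(path_a: str, path_b: str):
    # S110: retrieve the two source videos
    src_a, src_b = load_source(path_a), load_source(path_b)

    # S120: obtain the two groups of video frames and their audio
    frames_a, audio_a = extract_frames(src_a)
    frames_b, audio_b = extract_frames(src_b)

    # S130: match the two audio tracks, output the matched range timestamp
    audio_range = match_audio_ranges(audio_a, audio_b)

    # S140: score both groups of frames inside the matched range
    q_a = measure_quality(frames_a, audio_range)
    q_b = measure_quality(frames_b, audio_range)

    # S150: fill the matched range with the better group and splice
    best = frames_a if q_a >= q_b else frames_b
    return splice(best, audio_range)
```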
In one possible embodiment, step S110 further includes:
step S111, the video automatic clipping device acquires environmental parameter information;
step S112, the video automatic clipping device determines the type of a source video according to the acquired environment parameter information;
step S113, the video automatic clipping device calls the first source video data and the second source video data according to the determined source video type, wherein the source video type of the first source video data is different from that of the second source video data.
In one possible embodiment, step S112 further includes:
step S1121, analyzing the source video shooting parameters based on the environment parameter information, wherein the source video shooting parameters are focal length and/or exposure;
step S1122, confirming the source video types of the first source video data and the second source video data based on the source video shooting parameters.
In one possible embodiment, step S130 further includes:
step S131, acquiring the audio frames and the time stamp information thereof corresponding to the first group of video frames and the audio frames and the time stamp information thereof corresponding to the second group of video frames;
step S132, matching the audio frames corresponding to the first group of video frames and the audio frames corresponding to the second group of video frames, and obtaining the audio range timestamp based on the timestamp information, where the audio range timestamp is a timestamp range in which the audio characteristics of the audio frames corresponding to the first group of video frames and the audio frames corresponding to the second group of video frames are the same.
In one possible embodiment, step S132 further includes:
step S1321, analyzing the waveform of the audio frame corresponding to the first group of video frames and the waveform of the audio frame corresponding to the second group of video frames;
and step S1322, when the waveforms of the two overlap, acquiring and recording the timestamp information.
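A standard way to find where two recordings of the same sound coincide is cross-correlation, sketched below with NumPy. The disclosure only requires that the waveforms be compared for coincidence; choosing cross-correlation, and returning the overlap as a start/end pair in seconds, are assumptions of this example.

```python
import numpy as np

# Sketch of steps S1321-S1322: locate the timestamp range over which two
# audio waveforms coincide. Cross-correlation is our chosen comparison;
# the patent does not name a specific technique.

def matched_audio_range(wave_a: np.ndarray, wave_b: np.ndarray,
                        sample_rate: int):
    # Offset of wave_b relative to wave_a at the correlation peak
    corr = np.correlate(wave_a, wave_b, mode="full")
    lag = int(np.argmax(corr)) - (len(wave_b) - 1)

    # Overlapping sample range expressed on wave_a's timeline
    start = max(0, lag)
    end = min(len(wave_a), lag + len(wave_b))
    return start / sample_rate, end / sample_rate  # (start_s, end_s)
```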
In one possible embodiment, step S140 further includes:
step S141, if the first group of video frames and the second group of video frames are both within the audio range timestamp, determining quality parameters of the first group of video frames and the second group of video frames, wherein the quality parameters comprise video resolution and/or frame rate;
in step S142, the best source video data is selected for filling the audio range time stamp.
In one possible embodiment, step S142 further includes:
step S1421, if the quality parameter of the first group of video frames is greater than or equal to the quality parameter of the second group of video frames, using the first group of video frames to fill the audio range timestamp;
step S1422, if the quality parameter of the first set of video frames is less than the quality parameter of the second set of video frames, using the second set of video frames to fill the audio range timestamp.
Fig. 3 is a schematic diagram of the functional modules of an automatic video clipping device 300 provided in an embodiment of the present disclosure. In this embodiment, the device 300 may be divided into functional modules according to the method embodiment executed by the cloud clipping server 100; that is, the following functional modules of the device 300 may be used to execute the method embodiments described above. The automatic video clipping device 300 may include an acquisition unit 310, an extraction unit 320, a matching unit 330, an analysis unit 340, and a combination unit 350, whose functions are described in detail below.
The acquisition unit 310 may be configured to perform step S110, that is, to retrieve the first source video data and the second source video data.
The extraction unit 320 may be configured to perform step S120, that is, to obtain a first group of video frames from the first source video data and a second group of video frames from the second source video data, where the audio frame corresponding to at least one video frame of the first group is the same as an audio frame corresponding to the second group.
The matching unit 330 may be configured to perform step S130, that is, to match the audio frames corresponding to the first group of video frames against those corresponding to the second group and output the matched audio range timestamp.
The analysis unit 340 may be configured to perform step S140, that is, to obtain the quality parameters of the first and second groups of video frames based on the audio range timestamp.
The combining unit 350 may be configured to perform the step S150 described above, that is, to splice the first group of video frames and the second group of video frames based on the quality parameter to obtain a final video.
It should be noted that the above division into modules is only a logical division; in an actual implementation the modules may be wholly or partially integrated into one physical entity or kept physically separate. The modules may be implemented as software called by a processing element, entirely as hardware, or partly as each. For example, the acquisition unit 310 may be a separately arranged processing element, may be integrated into a chip of the device, or may be stored in the device's memory as program code that a processing element of the device calls to execute its function; the other modules are implemented similarly. In addition, all or some of the modules may be integrated together or implemented independently. The processing element described here may be an integrated circuit with signal processing capability, and each step of the above method, or each of the above modules, may be implemented by integrated logic circuits of hardware in a processor element or by instructions in the form of software.
For example, the above modules may be one or more integrated circuits configured to implement the above methods, such as one or more application-specific integrated circuits (ASICs), one or more digital signal processors (DSPs), or one or more field-programmable gate arrays (FPGAs). For another example, when one of the above modules is implemented by a processing element scheduling program code, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor that can call program code. As another example, these modules may be integrated together and implemented as a system-on-a-chip (SoC).
Fig. 4 illustrates the hardware structure of the cloud clipping server 100 provided in an embodiment of the present disclosure. As shown in fig. 4, the cloud clipping server 100 may include a processor 110, a machine-readable storage medium 120, a bus 130, and a transceiver 140.
In a specific implementation, at least one processor 110 executes computer-executable instructions stored in the machine-readable storage medium 120 (for example, those of the automatic video clipping device 300 shown in fig. 3), so that the processor 110 can execute the automatic video clipping method of the above method embodiment. The processor 110, the machine-readable storage medium 120, and the transceiver 140 are connected through the bus 130, and the processor 110 may control the transceiving actions of the transceiver 140 so as to exchange data with the clip terminal 200.
For the specific implementation of the processor 110, refer to the method embodiments executed by the cloud clipping server 100 above; the implementation principles and technical effects are similar and are not repeated here.
In the embodiment shown in fig. 4, it should be understood that the processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or the like. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the present disclosure may be embodied directly in a hardware processor or in a combination of hardware and software modules within the processor.
The machine-readable storage medium 120 may comprise high-speed RAM and may also include non-volatile memory (NVM), such as at least one disk memory.
The bus 130 may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like, and may be divided into an address bus, a data bus, and a control bus. For ease of illustration, the figures of the present application show a single bus, but this does not mean there is only one bus or one type of bus.
In addition, an embodiment of the present disclosure further provides a readable storage medium storing computer-executable instructions that, when executed by a processor, implement the automatic video clipping method above.
The readable storage medium may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk, and may be any available medium accessible by a general-purpose or special-purpose computer.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present disclosure, not to limit them. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and such modifications or substitutions do not depart from the scope of the corresponding technical solutions of the embodiments of the present disclosure.

Claims (10)

1. An automatic video clipping method for an automatic video clipping device including first source video data and second source video data, comprising:
the automatic video clipping device retrieves the first source video data and the second source video data;
the device obtains a first group of video frames from the first source video data and a second group of video frames from the second source video data, wherein the audio frame corresponding to at least one video frame of the first group is the same as an audio frame corresponding to the second group;
the device matches the audio frames corresponding to the first group of video frames with the audio frames corresponding to the second group of video frames and outputs a matched audio range timestamp;
obtaining quality parameters of the first group of video frames and the second group of video frames based on the audio range timestamp;
and splicing the first group of video frames and the second group of video frames based on the quality parameters to obtain a final video.
2. The method of claim 1, wherein the automatic video clipping device retrieving the first source video data and the second source video data comprises:
the automatic video clipping device acquires environment parameter information;
the device determines a source video type according to the acquired environment parameter information;
and the device retrieves the first source video data and the second source video data according to the determined source video type, wherein the source video type of the first source video data is different from that of the second source video data.
3. The method of claim 2, wherein the automatic video clipping device determining a source video type according to the acquired environment parameter information comprises:
analyzing source video shooting parameters based on the environment parameter information, wherein the source video shooting parameters are focal length and/or exposure;
and confirming the source video types of the first source video data and the second source video data based on the source video shooting parameters.
4. The method of claim 1, wherein the automatic video clipping device matching the audio frames corresponding to the first group of video frames with the audio frames corresponding to the second group of video frames and outputting a matched audio range timestamp comprises:
acquiring the audio frames corresponding to the first group of video frames and their timestamp information, and the audio frames corresponding to the second group of video frames and their timestamp information;
and matching the audio frames corresponding to the first group of video frames with those corresponding to the second group, and obtaining the audio range timestamp based on the timestamp information, wherein the audio range timestamp is the timestamp range over which the audio features of the two sets of corresponding audio frames are the same.
5. The method of claim 4, wherein the audio feature is a waveform of the audio frames, and wherein matching the audio frames corresponding to the first group of video frames with the audio frames corresponding to the second group of video frames comprises:
analyzing the waveform of the audio frames corresponding to the first group of video frames and the waveform of the audio frames corresponding to the second group of video frames;
and when the two waveforms coincide, acquiring and recording the timestamp information.
6. The method of claim 1, wherein obtaining the quality parameters of the first group of video frames and the second group of video frames based on the audio range timestamp comprises:
if the first group of video frames and the second group of video frames are both within the audio range timestamp, determining the quality parameters of the first group of video frames and the second group of video frames, wherein the quality parameters comprise video resolution and/or frame rate;
and selecting the best source video data for filling the audio range timestamp.
7. The method of claim 6, wherein selecting the best source video data for filling the audio range timestamp comprises:
if the quality parameter of the first group of video frames is greater than or equal to that of the second group of video frames, filling the audio range timestamp with the first group of video frames;
and if the quality parameter of the first group of video frames is less than that of the second group of video frames, filling the audio range timestamp with the second group of video frames.
8. An automatic video clipping device, comprising:
an acquisition unit, configured to retrieve the first source video data and the second source video data;
an extraction unit, configured to obtain a first group of video frames from the first source video data and a second group of video frames from the second source video data, wherein the audio frame corresponding to at least one video frame of the first group is the same as an audio frame corresponding to the second group;
a matching unit, configured to match the audio frames corresponding to the first group of video frames with the audio frames corresponding to the second group of video frames and output a matched audio range timestamp;
an analysis unit, configured to obtain quality parameters of the first group of video frames and the second group of video frames based on the audio range timestamp;
and a combination unit, configured to splice the first group of video frames and the second group of video frames based on the quality parameters to obtain a final video.
9. A computer-readable storage medium storing instructions/executable code which, when executed by a processor of an automatic video clipping device, cause the device to implement the method of any one of claims 1-7.
10. A cloud clipping server, comprising a processor, a machine-readable storage medium, and a network interface, wherein the machine-readable storage medium, the network interface, and the processor are connected through a bus system, the network interface is configured to be communicatively connected to at least one clipping terminal, the machine-readable storage medium is configured to store a program, instructions, or code, and the processor is configured to execute the program, instructions, or code in the machine-readable storage medium to perform the automatic video clipping method of any one of claims 1-7.
CN202111618817.1A 2021-12-28 2021-12-28 Automatic video clipping method and device and cloud clipping server (Withdrawn; published as CN114449308A)

Priority Applications (1)

Application Number: CN202111618817.1A; Priority Date: 2021-12-28; Filing Date: 2021-12-28; Title: Automatic video clipping method and device and cloud clipping server

Publications (1)

Publication Number Publication Date
CN114449308A 2022-05-06

Family

ID=81365137

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111618817.1A Withdrawn CN114449308A (en) 2021-12-28 2021-12-28 Automatic video clipping method and device and cloud clipping server

Country Status (1)

Country Link
CN (1) CN114449308A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115150660A (en) * 2022-06-09 2022-10-04 深圳市大头兄弟科技有限公司 Video editing method based on subtitles and related equipment
CN115150660B (en) * 2022-06-09 2024-05-10 深圳市闪剪智能科技有限公司 Video editing method based on subtitles and related equipment

Similar Documents

Publication Publication Date Title
CN108737738B (en) Panoramic camera and exposure method and device thereof
US11151700B2 (en) Image processing method, terminal, and non-transitory computer-readable storage medium
CN107155067B (en) Camera control method and device, terminal and storage medium
CN106651797B (en) Method and device for determining effective area of signal lamp
US9030571B2 (en) Abstract camera pipeline for uniform cross-device control of image capture and processing
CN107909569B (en) Screen-patterned detection method, screen-patterned detection device and electronic equipment
CN105701762B (en) Picture processing method and electronic equipment
CN110784659B (en) Exposure control method and device and storage medium
CN110968391A (en) Screenshot method, screenshot device, terminal equipment and storage medium
WO2022105027A1 (en) Image recognition method and system, electronic device, and storage medium
CN113271428A (en) Video conference user authentication method, device and system
CN113411498A (en) Image shooting method, mobile terminal and storage medium
CN114449308A (en) Automatic video clipping method and device and cloud clipping server
CN111553320A (en) Feature extraction method for protecting personal data privacy, model training method and hardware
CN113747104A (en) Method and device for displaying document in video conference and cloud server
CN109461201B (en) Image processing method and device, electronic equipment and computer readable storage medium
CN111918127A (en) Video clipping method and device, computer readable storage medium and camera
US10282633B2 (en) Cross-asset media analysis and processing
CN111385460A (en) Image processing method and device
US10992901B2 (en) Method, apparatus, device and storage medium for controlling video playback speed
CN113794846A (en) Video cloud clipping method and device and cloud clipping server
US20160323490A1 (en) Extensible, automatically-selected computational photography scenarios
CN110166768B (en) Shooting method and device
CN114363661A (en) Video clipping method and device and cloud clipping server
CN113286171A (en) Video cover determination method and device and cloud server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20220506)