CN114222156A - Video editing method, video editing device, computer equipment and storage medium - Google Patents

Video editing method, video editing device, computer equipment and storage medium

Info

Publication number
CN114222156A
CN114222156A (application CN202111525505.6A)
Authority
CN
China
Prior art keywords
video
audio
clipping
real
cloud server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111525505.6A
Other languages
Chinese (zh)
Inventor
杨颖凡
邹伟力
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp filed Critical China Construction Bank Corp
Priority to CN202111525505.6A
Publication of CN114222156A
Legal status: Pending (current)

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21: Server components or server architectures
    • H04N21/218: Source of audio or video content, e.g. local disk arrays
    • H04N21/2187: Live feed
    • H04N21/23: Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233: Processing of audio elementary streams
    • H04N21/2335: Processing of audio elementary streams involving reformatting operations of audio signals, e.g. by converting from one coding standard to another
    • H04N21/234: Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/23424: Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • H04N21/2343: Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234309: Reformatting by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4 or from Quicktime to Realvideo
    • H04N21/234381: Reformatting by altering the temporal resolution, e.g. decreasing the frame rate by frame skipping
    • H04N21/25: Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/266: Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel
    • H04N21/2662: Controlling the complexity of the video stream, e.g. by scaling the resolution or bitrate of the video stream based on the client capabilities

Abstract

The present application relates to a video clipping method, apparatus, computer device, storage medium and computer program product. The method comprises the following steps: acquiring a real-time video stream; transmitting the real-time video stream to a cloud server; receiving audio and video data, the audio and video data being obtained by the cloud server encoding the real-time video stream; in response to a clipping operation, acquiring the clipping materials and audio and video segments corresponding to the clipping operation and sending them to the cloud server; and receiving a target clip video, the target clip video being obtained by the cloud server generating a synthesis queue according to the clipping materials and the audio and video segments, clipping and synthesizing the audio and video data according to the synthesis queue, and pushing the result to the terminal. Because the real-time video stream shot by the user is obtained directly, transmitted to the cloud server, and encoded and clipped by the cloud server, the method improves the real-time performance of video transmission and clipping, and thereby improves video clipping efficiency.

Description

Video editing method, video editing device, computer equipment and storage medium
Technical Field
The present application relates to the technical field of cloud computing resource scheduling, and in particular, to a video editing method, apparatus, computer device, storage medium, and computer program product.
Background
With the arrival of the 5G information age, technological innovation is reshaping information transmission and the interaction experience. Short videos are scene-oriented video clips, typically tens of seconds to a few minutes long, that have become hugely popular in recent years. Short-video clipping enriches the way short videos are presented by trimming the footage, splicing segments, and adding stickers, text, background music, filters, transition animations, and the like through a clipping tool platform.
Traditional video processing software is feature-heavy, places high demands on device performance, and has a complex clipping process with cumbersome interaction; real-time interaction is impossible during clipping, so video clipping efficiency is low.
Disclosure of Invention
In view of the above, it is necessary to provide a video clipping method, apparatus, computer device, computer readable storage medium and computer program product capable of improving video clipping efficiency.
In a first aspect, the present application provides a video clipping method. The method comprises the following steps:
acquiring a real-time video stream;
transmitting the real-time video stream to a cloud server;
receiving audio and video data, wherein the audio and video data are obtained by encoding a real-time video stream by a cloud server;
responding to the clipping operation, acquiring clipping materials and audio and video segments corresponding to the clipping operation, and sending the clipping materials and the audio and video segments to the cloud server;
and receiving a target clip video, the target clip video being obtained by the cloud server generating a synthesis queue according to the clipping material and the audio and video segments and clipping and synthesizing the audio and video data according to the synthesis queue.
In one embodiment, transmitting the real-time video stream to the cloud server comprises: and transmitting the real-time video stream to the cloud server through the real-time transmission service thread.
In one embodiment, before transmitting the real-time video stream to the cloud server, the method further includes: acquiring the video code rate of the real-time video stream; and when the video code rate exceeds the preset transmission code rate, actively dropping frames of the real-time video stream.
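The bitrate check and active frame dropping described in this embodiment can be sketched as follows. The patent does not specify a concrete dropping algorithm, so the proportional scheduler below, the frame fields, and the rule of never dropping key frames are illustrative assumptions:

```python
def plan_frame_drops(frames, measured_kbps, max_kbps):
    """Decide which frames to keep when the measured video bitrate
    exceeds the preset transmission bitrate. Key frames are always
    kept; non-key frames are kept in proportion to max/measured."""
    if measured_kbps <= max_kbps:
        return frames                 # bitrate within budget: drop nothing
    keep_ratio = max_kbps / measured_kbps
    kept, budget = [], 0.0
    for f in frames:
        if f["key"]:
            kept.append(f)            # never drop a key frame
            continue
        budget += keep_ratio          # accumulate fractional keep credit
        if budget >= 1.0:
            budget -= 1.0
            kept.append(f)
    return kept
```

A real client would feed this from periodic bitrate measurements on the outgoing stream rather than a static list.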
In a second aspect, the present application also provides a video clipping method. The method comprises the following steps:
receiving a real-time video stream transmitted by a terminal;
coding the real-time video stream to obtain and push audio and video data;
receiving a clipping material and an audio and video segment, wherein the clipping material and the audio and video segment are obtained by the terminal, in response to a clipping operation, acquiring the clipping material and the audio and video segment corresponding to the clipping operation;
generating a synthesis queue according to the clipping material and the audio and video clips;
and editing and synthesizing the audio and video data according to the synthesis queue to obtain and push the target edited video to the terminal.
In one embodiment, clipping and synthesizing the audio and video data according to the synthesis queue to obtain and push the target clip video to the terminal includes: reading the synthesis queue entries in order along the time chain; clipping and synthesizing the audio and video segments corresponding to each entry to obtain clipped segments; and assembling the clipped segments to obtain the target clip video and push it to the terminal.
In one embodiment, clipping and synthesizing the audio and video segment corresponding to the synthesis queue to obtain a clipped segment includes: taking the first key frame in the audio and video segment as the starting point, and clipping and synthesizing the audio and video segment according to the clipping materials of the synthesis queue through a clipping and synthesis service thread to obtain the clipped segment.
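A minimal sketch of the two embodiments above: queue entries are read in time-chain order, and each clip start is snapped to the latest key frame at or before the requested time. The entry field names (`material_id`, `start`, `end`) and the snapping rule's details are illustrative, not fixed by the patent:

```python
def snap_to_key_frame(start, key_frame_times):
    """Clip boundaries must begin on a key frame: pick the latest
    key frame at or before the requested start time."""
    candidates = [t for t in key_frame_times if t <= start]
    return max(candidates) if candidates else key_frame_times[0]

def read_composition_queue(queue, key_frame_times):
    """Read synthesis-queue entries in time-chain order and emit
    clip jobs with key-frame-aligned start times."""
    jobs = []
    for entry in sorted(queue, key=lambda e: e["start"]):
        jobs.append({
            "material_id": entry["material_id"],
            "start": snap_to_key_frame(entry["start"], key_frame_times),
            "end": entry["end"],
        })
    return jobs
```

Assembling the resulting jobs in list order then corresponds to the "assembling the clipped segments" step.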
In a third aspect, the present application also provides a video clipping device. The device includes:
the video stream acquisition module is used for acquiring a real-time video stream;
the terminal transmission module is used for transmitting the real-time video stream to the cloud server;
the terminal receiving module is used for receiving audio and video data, and the audio and video data are obtained by encoding a real-time video stream by the cloud server;
the terminal response module is used for responding to the clipping operation, acquiring clipping materials and audio and video segments corresponding to the clipping operation, and sending the clipping materials and the audio and video segments to the cloud server;
and the target clip video receiving module is used for receiving a target clip video, the target clip video being obtained by the cloud server generating a synthesis queue according to the clipping material and the audio and video segments and clipping and synthesizing the audio and video data according to the synthesis queue.
In one embodiment, the terminal transmission module is further configured to transmit the real-time video stream to the cloud server through the real-time transmission service thread.
In one embodiment, the video editing device further comprises an active frame dropping module, configured to obtain a video bitrate of the real-time video stream; and when the video code rate exceeds the preset transmission code rate, actively dropping frames of the real-time video stream.
In a fourth aspect, the present application further provides a video editing apparatus. The device includes:
the cloud receiving module is used for receiving the real-time video stream transmitted by the terminal;
the encoding module is used for encoding the real-time video stream to obtain and push audio and video data;
the clipping receiving module is used for receiving clipping materials and audio and video segments, wherein the clipping materials and the audio and video segments are obtained by the terminal, in response to a clipping operation, acquiring the clipping materials and the audio and video segments corresponding to the clipping operation;
the synthesis queue generating module is used for generating a synthesis queue according to the clipping material and the audio and video clips;
and the editing and synthesizing module is used for editing and synthesizing the audio and video data according to the synthesis queue to obtain and push the target editing video to the terminal.
In one embodiment, the clipping and combining module is further configured to read the combining queues in sequence according to the time chain; editing and synthesizing the audio and video segments corresponding to the synthesis queue according to the synthesis queue to obtain edited segments; and assembling the clip segments to obtain and push the target clip video to the terminal.
In one embodiment, the clipping and synthesizing module is further configured to clip and synthesize the audio and video segment according to the clipping material of the synthesis queue through the clipping and synthesizing service thread with a first key frame in the audio and video segment as a starting point, so as to obtain the clipped segment.
In a fifth aspect, the present application further provides a computer device. The computer device comprises a memory storing a computer program and a processor that implements the following steps when executing the computer program:
acquiring a real-time video stream;
transmitting the real-time video stream to a cloud server;
receiving audio and video data, wherein the audio and video data are obtained by encoding a real-time video stream by a cloud server;
responding to the clipping operation, acquiring clipping materials and audio and video segments corresponding to the clipping operation, and sending the clipping materials and the audio and video segments to the cloud server;
and receiving a target clip video, the target clip video being obtained by the cloud server generating a synthesis queue according to the clipping material and the audio and video segments and clipping and synthesizing the audio and video data according to the synthesis queue.
In a sixth aspect, the present application further provides a computer device. The computer device comprises a memory storing a computer program and a processor that implements the following steps when executing the computer program:
receiving a real-time video stream transmitted by a terminal;
coding the real-time video stream to obtain and push audio and video data;
receiving a clipping material and an audio and video segment, wherein the clipping material and the audio and video segment are obtained by the terminal, in response to a clipping operation, acquiring the clipping material and the audio and video segment corresponding to the clipping operation;
generating a synthesis queue according to the clipping material and the audio and video clips;
and editing and synthesizing the audio and video data according to the synthesis queue to obtain and push the target edited video to the terminal.
In a seventh aspect, the present application further provides a computer-readable storage medium. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of:
acquiring a real-time video stream;
transmitting the real-time video stream to a cloud server;
receiving audio and video data, wherein the audio and video data are obtained by encoding a real-time video stream by a cloud server;
responding to the clipping operation, acquiring clipping materials and audio and video segments corresponding to the clipping operation, and sending the clipping materials and the audio and video segments to the cloud server;
and receiving a target clip video, the target clip video being obtained by the cloud server generating a synthesis queue according to the clipping material and the audio and video segments and clipping and synthesizing the audio and video data according to the synthesis queue.
In an eighth aspect, the present application further provides a computer-readable storage medium. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of:
receiving a real-time video stream transmitted by a terminal;
coding the real-time video stream to obtain and push audio and video data;
receiving a clipping material and an audio and video segment, wherein the clipping material and the audio and video segment are obtained by the terminal, in response to a clipping operation, acquiring the clipping material and the audio and video segment corresponding to the clipping operation;
generating a synthesis queue according to the clipping material and the audio and video clips;
and editing and synthesizing the audio and video data according to the synthesis queue to obtain and push the target edited video to the terminal.
In a ninth aspect, the present application further provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, performs the steps of:
acquiring a real-time video stream;
transmitting the real-time video stream to a cloud server;
receiving audio and video data, wherein the audio and video data are obtained by encoding a real-time video stream by a cloud server;
responding to the clipping operation, acquiring clipping materials and audio and video segments corresponding to the clipping operation, and sending the clipping materials and the audio and video segments to the cloud server;
and receiving a target clip video, the target clip video being obtained by the cloud server generating a synthesis queue according to the clipping material and the audio and video segments and clipping and synthesizing the audio and video data according to the synthesis queue.
In a tenth aspect, the present application further provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, performs the steps of:
receiving a real-time video stream transmitted by a terminal;
coding the real-time video stream to obtain and push audio and video data;
receiving a clipping material and an audio and video segment, wherein the clipping material and the audio and video segment are obtained by the terminal, in response to a clipping operation, acquiring the clipping material and the audio and video segment corresponding to the clipping operation;
generating a synthesis queue according to the clipping material and the audio and video clips;
and editing and synthesizing the audio and video data according to the synthesis queue to obtain and push the target edited video to the terminal.
The video editing method, the video editing device, the computer equipment, the storage medium and the computer program product acquire a real-time video stream; transmit the real-time video stream to a cloud server; receive audio and video data, the audio and video data being obtained by the cloud server encoding the real-time video stream; in response to a clipping operation, acquire the clipping materials and audio and video segments corresponding to the clipping operation and send them to the cloud server; and receive a target clip video, the target clip video being obtained by the cloud server generating a synthesis queue according to the clipping materials and the audio and video segments and clipping and synthesizing the audio and video data according to the synthesis queue. Because the real-time video stream shot by the user is obtained directly by the terminal, transmitted to the cloud server, and encoded and clipped by the cloud server, dependence on device performance is reduced, the real-time performance of video transmission and clipping is improved, and video clipping efficiency is further improved.
Drawings
FIG. 1 is a diagram of an application environment of a video clipping method in one embodiment;
FIG. 2 is a flow diagram that illustrates a method for video editing in one embodiment;
FIG. 3 is a flowchart illustrating a video clipping method according to another embodiment;
FIG. 4 is a schematic flow chart of real-time video stream acquisition and encoding according to one embodiment;
FIG. 5 is a schematic diagram of the composition of a video frame in one embodiment;
FIG. 6 is a schematic diagram of a clip composition process in one embodiment;
FIG. 7 is a schematic diagram of a core frame cell in one embodiment;
FIG. 8 is a block diagram showing the structure of a video clip apparatus according to one embodiment;
FIG. 9 is a block diagram showing the construction of a video clipping device according to another embodiment;
FIG. 10 is a diagram showing an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Today, text-and-image transmission is gradually losing its advantage; short-video transmission captures users' attention more effectively. For small-scale scenarios such as short-video marketing and user retention, requirements such as high-precision short-video clipping and adding stickers, background music, filters, and transition animations can be met without integrating a short-video SDK into the terminal.
Traditional video clipping methods either place high demands on device performance, rely on large-scale software that is difficult to master, or cannot meet frame-level video processing precision. Meanwhile, with the boom of short-video applications in the mobile era in recent years, short-video apps such as Douyin and Kuaishou have become emerging mobile applications with user bases on the order of hundreds of millions. However, developing such applications requires integrating a third-party short-video clip SDK, which is costly and must be integrated across multiple terminals.
By contrast, the cloud-side clipping approach of the present application is lightweight, flexible, cross-platform, high-precision, and multifunctional.
In the traditional cloud-clipping mode, the shot video must first be uploaded to a cloud platform and a clipping instruction then sent to the platform; this approach is insufficiently real-time and the interaction is complex, with delays introduced by uploading, interactive synthesis, downloading, and so on. The present application uses WebRTC communication technology to eliminate the upload step, which addresses the complexity and efficiency problems of traditional cloud clipping.
Cloud clipping means hosting the video clipping platform tool on a cloud server, where the server performs clipping operations on the video such as trimming its length, splicing segments, and adding stickers, text, background music, filters, transition animations, and the like.
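Server-side clipping operations like those listed are commonly realized with FFmpeg. A hypothetical sketch of assembling such an invocation (the `-ss`, `-t`, `-i`, and `-vf` options are real ffmpeg flags; the wrapper function and its parameters are illustrative and not part of the patent):

```python
def build_clip_command(src, dst, start_s, duration_s, filters=()):
    """Assemble an ffmpeg command line that trims a segment out of
    the source and applies optional video filters to it."""
    cmd = ["ffmpeg", "-ss", str(start_s), "-t", str(duration_s), "-i", src]
    if filters:
        # e.g. "transpose=1" rotates 90 degrees clockwise
        cmd += ["-vf", ",".join(filters)]
    cmd.append(dst)
    return cmd
```

The server would run one such command per clip job, then concatenate the outputs into the target clip video.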
WebRTC, short for Web Real-Time Communication, is an API that enables web browsers to hold real-time voice or video conversations. It was open-sourced on 1 June 2011 and, with the support of Google, Mozilla, and Opera, incorporated into the recommendation track of the World Wide Web Consortium (W3C). WebRTC allows web applications or sites to establish peer-to-peer connections between browsers to transmit video and/or audio streams, or any other data, without intermediaries. The standards WebRTC encompasses make it possible for users to create peer-to-peer data sharing and teleconferencing without installing any plug-ins or third-party software.
The video clipping method provided by the embodiments of the application can be applied in the application environment shown in fig. 1. The terminal 102 communicates with the cloud server 104 through a network. A data storage system may store the data that the cloud server 104 needs to process; it may be integrated on the cloud server 104 or located on the cloud or another network server. The terminal obtains a real-time video stream and transmits it to the cloud server 104; the cloud server 104 encodes the real-time video stream to obtain audio and video data; the terminal, in response to a user's clipping operation, obtains the clipping materials and audio and video segments corresponding to the operation and sends them to the cloud server 104; and the cloud server generates a synthesis queue according to the clipping materials and the audio and video segments, clips and synthesizes the audio and video data according to the synthesis queue, and sends the result to the terminal 102. The terminal 102 may be, but is not limited to, any of various personal computers, notebook computers, smart phones, tablet computers, internet-of-things devices, and portable wearable devices; the internet-of-things devices may be smart speakers, smart televisions, smart air conditioners, smart in-vehicle devices, and the like, and the portable wearable devices may be smart watches, smart bracelets, head-mounted devices, and the like. The cloud server 104 may be implemented as a stand-alone server or as a server cluster composed of multiple servers.
In one embodiment, as shown in fig. 2, a video clipping method is provided, which is exemplified by the method applied to the terminal in fig. 1, and includes the following steps:
step 202, a real-time video stream is obtained.
Specifically, the terminal acquires a real-time video stream captured by the terminal camera through a video stream acquisition interface. Further, the terminal acquires the real-time video stream captured by the terminal camera through the getUserMedia API (application programming interface) of WebRTC.
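The browser call is `navigator.mediaDevices.getUserMedia(constraints)`, whose constraints argument follows the MediaTrackConstraints shape. The builder below models that object as a plain Python dict for illustration; the concrete resolution and frame-rate values are assumptions, not taken from the patent:

```python
def build_capture_constraints(width, height, frame_rate, with_audio=True):
    """Build the constraints object passed to getUserMedia in the
    browser client. Field names ("ideal", "frameRate", ...) follow
    the MediaTrackConstraints specification."""
    return {
        "audio": with_audio,
        "video": {
            "width": {"ideal": width},
            "height": {"ideal": height},
            "frameRate": {"ideal": frame_rate},
        },
    }
```

Using "ideal" (rather than "exact") lets the browser fall back gracefully when the camera cannot deliver the requested mode.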
And step 204, transmitting the real-time video stream to a cloud server.
Specifically, the terminal transmits the real-time video stream collected by the camera to the cloud server through the bidirectional data transmission channel. The bidirectional data transmission channel is a transmission channel established by the terminal and the cloud server through a UDP protocol.
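If encoded data is sent over the bidirectional data channel as messages rather than as a media track, large frames are typically split into bounded chunks first. A sketch of that chunking; the 16 KiB ceiling is a common interoperability guideline for WebRTC data-channel messages, not a figure from the patent:

```python
def chunk_for_data_channel(payload: bytes, max_chunk: int = 16 * 1024):
    """Split an encoded frame into chunks small enough to send as
    individual data-channel messages; the receiver reassembles
    them by concatenation in order."""
    return [payload[i:i + max_chunk]
            for i in range(0, len(payload), max_chunk)]
```

Ordered, reliable delivery (the data-channel default) guarantees the receiver can rebuild each frame by joining chunks in arrival order.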
And step 206, receiving audio and video data.
The audio and video data are obtained by encoding the real-time video stream by the cloud server.
Specifically, the terminal receives audio and video data transmitted by the cloud server through the bidirectional data transmission channel, and pushes key frames in the audio and video data to a display interface for displaying, so that a user can preview the audio and video in real time.
And step 208, responding to the clipping operation, acquiring the clipping material and the audio and video segments corresponding to the clipping operation, and sending the clipping material and the audio and video segments to the cloud server.
The clip material library provides a variety of clipping materials and constitutes the rich-content functionality of cloud clipping: filter and rotation functions such as those provided by FFmpeg, as well as GIF animations, light-effect animations, background music, and so on. Each material comprises a material id, a material type, a synthesis operation action instruction, and the like. The synthesis queue is subsequently parsed entry by entry and the corresponding synthesis method is called.
Specifically, in response to the user's clipping operation on the audio and video data in the terminal display interface, the terminal obtains the clipping materials, synthesis operation action instructions, and corresponding audio and video segments for that operation, and transmits them to the cloud server through the bidirectional data transmission channel.
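The parse-and-dispatch behavior described for the material library, where each queue entry carries a material type and the matching synthesis method is called per entry, might look like the sketch below. The handler registry and entry fields are hypothetical names, not from the patent:

```python
def apply_materials(queue, handlers):
    """Walk the synthesis queue and dispatch each entry to the
    synthesis handler registered for its material type
    (e.g. filter, sticker, background music)."""
    results = []
    for entry in queue:
        handler = handlers.get(entry["type"])
        if handler is None:
            raise ValueError(f"no synthesis handler for {entry['type']!r}")
        results.append(handler(entry))
    return results
```

New material types can then be supported by registering a handler, without touching the queue-walking code.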
Step 210, receive the target clip video.
The target clipping video is obtained by the cloud server generating a synthesis queue according to the clipping material and the audio and video segments and clipping and synthesizing the audio and video data according to the synthesis queue.
Specifically, after the audio and video clip is completed by the cloud server, the target clip video is transmitted to the terminal through the bidirectional data transmission channel, and the terminal pushes the target clip video to the display interface after receiving the target clip video, so that a user can browse the target clip video.
In the video clipping method, apparatus, computer device, storage medium, and computer program product, a real-time video stream is acquired and transmitted to a cloud server; audio and video data, obtained by the cloud server encoding the real-time video stream, are received; in response to a clipping operation, the clipping material and audio and video segments corresponding to the operation are acquired and sent to the cloud server; and the target clip video is received, the cloud server having generated a synthesis queue from the clipping material and the audio and video segments and having clipped and synthesized the audio and video data according to that queue. Because the terminal directly captures the real-time video stream shot by the user and transmits it to the cloud server, which performs the encoding and clipping, dependence on device performance is reduced, the real-time performance of video transmission and editing is improved, and video editing efficiency is further improved.
In an alternative embodiment, transmitting the real-time video stream to the cloud server comprises: transmitting the real-time video stream to the cloud server through a real-time transmission service thread.
The real-time transmission service thread carries out transmission service based on a WebRTC communication service protocol.
Specifically, after acquiring the real-time video stream, the terminal establishes a connection with the cloud server through the RTCPeerConnection method to obtain a bidirectional data transmission channel (RTCDataChannel) and downlink information, where the downlink information is the handshake information exchanged when the terminal establishes the connection with the cloud server, and transmits the real-time video stream to the cloud server based on the RTP and RTCP protocols over the bidirectional data transmission channel according to the downlink information.
The RTP and RTCP protocols are data packet formats transmitted by WebRTC on the internet, are defined by WebRTC specifications, and include message formats, transmission rules, and the like. The RTP protocol is a basic protocol for streaming media transmission on the Internet, and specifies a standard data packet format for transmitting audio and video on the Internet. The RTP protocol only guarantees the transmission of real-time data, and the RTCP protocol is responsible for the transmission quality guarantee of streaming media, and provides services such as flow control and congestion control. During the RTP session, the terminal and the cloud server periodically send RTCP messages to each other. The message contains statistical information such as data sending and receiving of the terminal and the cloud server, and the terminal and the cloud server can dynamically control the streaming media transmission quality.
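For example, the RTCP statistics exchanged periodically include the receiver-report "fraction lost" field, which RFC 3550 defines as an 8-bit fixed-point ratio of packets lost to packets expected during the reporting interval. A minimal sketch of that computation (the function name is an assumption):

```python
def rtcp_fraction_lost(expected_interval: int, received_interval: int) -> int:
    """Fraction of packets lost in the last reporting interval, as an
    8-bit fixed-point number per RFC 3550 -- one of the statistics the
    terminal and cloud server exchange to control streaming quality."""
    lost = expected_interval - received_interval
    if expected_interval == 0 or lost <= 0:
        return 0
    return (lost << 8) // expected_interval

# 5 packets lost out of 100 expected in the interval
print(rtcp_fraction_lost(100, 95))  # → 12  (≈ 5% of 256)
```

From numbers like this, both ends can dynamically adjust the streaming media transmission quality, as described above.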
In an optional embodiment, before transmitting the real-time video stream to the cloud server, the method further includes: acquiring the video code rate of the real-time video stream; and when the video code rate exceeds the preset transmission code rate, actively dropping frames of the real-time video stream.
The video code rate is the number of data bits transmitted per unit time, generally measured in kbps (kilobits per second). Intuitively, a higher code rate means more data is sampled per unit time, so the precision is higher and the processed file is closer to the original file.
Specifically, frame dropping logic is designed in a WebRTC communication service protocol to track the video code rate of the current real-time video stream, and when the current video code rate exceeds the preset transmission code rate of an encoder, the terminal actively drops frames of the real-time video stream.
When the accumulated code rate of the real-time video stream transmission exceeds the preset accumulated transmission code rate of the encoder, the terminal actively drops frames of the real-time video stream.
BufferFullnessSkip=Σ(CodedBitsPerFrame–TargetBitsPerFrame)
The BufferFullnessSkip frame-dropping logic checks each coded frame: TargetBitsPerFrame denotes the preset code rate per coded frame, CodedBitsPerFrame denotes the current transmission code rate of each frame, and when the accumulated code rate of the real-time video stream transmission exceeds the preset accumulated transmission code rate (i.e., the buffer is full), the terminal actively drops frames of the real-time video stream.
When the maximum code rate of the real-time video stream transmission exceeds the preset maximum transmission code rate of the encoder, the terminal actively drops frames of the real-time video stream.
BufferMaxBRFullness=Σ(CodedBitsPerFrame–MaxBitsPerFrame)
The BufferMaxBRFullness frame-dropping logic checks each video frame; MaxBitsPerFrame denotes the preset maximum transmission code rate.
The coded audio and video data comprises three frame types: I frames, P frames, and B frames. Active frame dropping targets the P frames in the real-time video stream; I frames are never dropped. An I frame is a key frame that can be understood as a standalone image; a P frame yields a full picture only by reference to the preceding I or P frame; similarly, a B frame is formed by reference to both the preceding I or P frame and the following P frame.
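A minimal sketch of the active frame-dropping rule above, tracking the buffer fullness Σ(CodedBitsPerFrame − TargetBitsPerFrame) per frame; the helper name, the capacity threshold, and the per-frame overflow check are simplifying assumptions, and only P frames are ever dropped, never I frames:

```python
def drop_frames(frames, target_bits_per_frame, buffer_capacity):
    """Active frame dropping sketch: a frame whose coded bits would
    overflow the virtual buffer is dropped, but only if it is a P frame.
    I (key) frames are always kept, matching the rule described above."""
    fullness = 0
    kept = []
    for frame_type, coded_bits in frames:
        overflow = fullness + coded_bits - target_bits_per_frame > buffer_capacity
        if overflow and frame_type == "P":
            continue  # actively dropped: its bits never enter the buffer
        fullness += coded_bits - target_bits_per_frame
        kept.append(frame_type)
    return kept

kept = drop_frames(
    [("I", 300), ("P", 100), ("P", 400), ("P", 100)],
    target_bits_per_frame=200,
    buffer_capacity=100,
)
# kept == ["I", "P", "P"]: the oversized middle P frame is dropped
```

The real logic runs inside the WebRTC communication service and checks both the accumulated and the maximum code-rate budgets; this sketch collapses them into a single capacity check for clarity.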
Active frame dropping can be performed not only during real-time video stream transmission, but also during encoding, during the cloud server's processing of the real-time video stream, in the decoder, and so on. It can occur while the video stream is being captured and transmitted, and also in later stages such as sending the clip back after composition.
In this embodiment, actively dropping P frames removes non-essential frames without affecting quality, with a low error rate and little packet loss, thereby improving transmission efficiency and the transmission success rate, as well as the composition efficiency and the precision of the editing result.
In one embodiment, as shown in fig. 3, a video clipping method is provided, which is exemplified by the method applied to the cloud server in fig. 1, and includes the following steps:
step 302, receiving a real-time video stream transmitted by a terminal.
Specifically, the cloud server receives a real-time video stream transmitted by the terminal through a bidirectional data transmission channel established with the terminal.
Step 304, coding the real-time video stream to obtain and push audio and video data.
Specifically, as shown in fig. 4, the cloud server performs VP8 encoding on the video in the real-time video stream and Opus encoding on the audio, obtaining encoded audio and video data. The encoded audio and video data comprises three frame types: I frames, P frames, and B frames. As shown in fig. 5, a GOP (Group of Pictures) is a group of consecutive pictures consisting of an I frame followed by a number of P frames or B frames. The video GOP strategy affects coding quality: each group is a segment of the encoded image data stream that starts at one I frame and ends just before the next I frame.
I frame: an intra-coded frame, a complete picture in itself. P frame: a forward-predictive coded frame; it is an incomplete frame generated by reference to a preceding I or P frame. B frame: a bidirectional predictive-interpolation coded frame, generated by reference to both the preceding and the following image frames; a B frame depends on the nearest preceding I or P frame and the nearest following P frame. An audio-video decoder can decode an I frame directly, whereas P and B frames can only be decoded relative to the I or P frames they reference.
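The GOP structure described above can be sketched as a grouping of the coded frame sequence in which each group opens at an I frame and runs until the next one (a hypothetical helper for illustration only):

```python
def split_into_gops(frame_types):
    """Split a coded frame-type sequence into GOPs: each group starts
    at an I frame and ends just before the next I frame, as described
    above. Any frames before the first I frame fall into group one."""
    gops = []
    for t in frame_types:
        if t == "I" or not gops:
            gops.append([])  # an I frame opens a new group
        gops[-1].append(t)
    return gops

print(split_into_gops(list("IPBPIPP")))
# → [['I', 'P', 'B', 'P'], ['I', 'P', 'P']]
```

This is also why the clip instructions later anchor on key I frames: each GOP is independently decodable from its opening I frame.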
Step 306, receiving the clip material and the audio-video clip.
The terminal responds to the clipping operation and obtains clipping materials and audio and video segments corresponding to the clipping operation.
Specifically, the cloud server receives the editing material, the synthesis operation action instruction and the audio and video clip sent by the terminal through the bidirectional data transmission channel.
Step 308, generating a synthesis queue according to the clip material and the audio and video clips.
The composition processing queue is a series of instruction set methods issued to the clip composition service thread.
Specifically, each instruction takes a key I frame as its starting point, and its processing duration is n I-frame intervals, i.e., the length of an audio/video segment. Each instruction header specifies the type of clip material to process, and the instruction payload is the processing object, such as a video frame, an audio track, a transition filter, or a sticker animation. Through the clip composition service thread, the cloud server generates a synthesis queue for each audio/video segment from that segment's clip material and composite operation action instructions.
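As a hedged sketch of this step (field and function names are assumptions, not the patented format), each synthesis-queue instruction can be modeled with a header naming the clip-material type, a payload holding the processing object, and an I-frame-anchored start point and duration:

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class ComposeInstruction:
    header: str       # clip-material type this instruction processes
    payload: Any      # processing object: video frame, audio track,
                      # transition filter, sticker animation, ...
    start_pts: int    # key I frame the instruction starts from
    n_intervals: int  # processing length in I-frame intervals

def build_composition_queue(segment_start_pts, n_intervals, materials):
    """Generate one synthesis queue for an audio/video segment: one
    instruction per clip material, each anchored at the segment's
    first key frame, as in the step described above."""
    return [
        ComposeInstruction(m["type"], m["payload"], segment_start_pts, n_intervals)
        for m in materials
    ]

queue = build_composition_queue(
    segment_start_pts=9000,
    n_intervals=3,
    materials=[{"type": "filter", "payload": "grayscale"},
               {"type": "sticker", "payload": "fireworks.gif"}],
)
```

The clip composition service thread would then consume such a queue instruction by instruction, dispatching on each header.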
Step 310, editing and synthesizing the audio and video data according to the synthesis queue to obtain and push the target clip video to the terminal.
Specifically, the cloud server clips each composition queue through a clipping composition service thread, synthesizes the clipped composition queues, and obtains and pushes a target clip video to the terminal.
In an optional embodiment, as shown in fig. 6, the editing and synthesizing the audio and video data according to the synthesis queue to obtain and push the target clip video to the terminal includes: reading the synthesis queues in sequence according to the time chain; editing and synthesizing the audio and video segments corresponding to the synthesis queue according to the synthesis queue to obtain edited segments; and assembling the clip segments to obtain and push the target clip video to the terminal.
Specifically, the time chain of the clip is obtained from the PTS (presentation timestamp, the timestamp used for rendering) of each audio/video segment, and the synthesis queues are read in sequence along the time chain; the audio/video segments corresponding to each synthesis queue are clipped and synthesized by the clip composition service thread to obtain clipped segments; and the clipped segments are assembled along the time chain to obtain the target clip video, which is pushed to the terminal.
Clipped segments (i.e., video frames) are presented according to their PTS timestamps, while the DTS (decoding timestamp) is used for video decoding.
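The time-chain ordering above amounts to sorting the segments (and the synthesis queues attached to them) by PTS before assembly; a toy sketch, with dictionary keys chosen purely for illustration:

```python
def build_time_chain(segments):
    """Order segments along the clip's time chain by PTS (presentation
    timestamp). DTS would govern decode order; presentation and final
    assembly follow PTS, as noted above."""
    return sorted(segments, key=lambda s: s["pts"])

chain = build_time_chain([
    {"pts": 6000, "queue": "q3"},
    {"pts": 0,    "queue": "q1"},
    {"pts": 3000, "queue": "q2"},
])
print([s["queue"] for s in chain])  # → ['q1', 'q2', 'q3']
```

Reading the synthesis queues in this order is what lets the clipped segments be concatenated into a temporally consistent target clip video.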
The terminal can also respond continuously to the user's clipping operations: a buffer queue is generated from the new clip material and audio/video segments, the audio and video data are clipped and synthesized according to the buffer queue, and the target clip video is obtained and pushed to the terminal.
In this embodiment, the user's dynamic adjustments to the audio and video are stored as a buffered intermediate state, so the target clip video can be clipped and generated directly from the buffer queue, improving cloud clipping efficiency.
In an optional embodiment, performing clipping synthesis on the audio/video segment corresponding to the composition queue according to the composition queue, and obtaining a clipped segment includes: and taking the first key frame in the audio and video segment as a starting point, and editing and synthesizing the audio and video segment according to the editing materials of the synthesis queue through an editing and synthesizing service thread to obtain an editing segment.
Specifically, with the first key frame in the audio/video segment as the starting point, by using a clipping composition service thread (i.e., composition service in fig. 6), clipping composition is performed on each frame in the audio/video segment according to the clipping material in the composition queue, so that each frame queue segment is encoded into a clipping segment by FFMpeg.
Furthermore, FFMpeg is the core technology for clip composition. The clip composition service thread reads, along the time chain, the several queues (i.e., synthesis queues) of video content, mixed audio content, and selected clip-material content that need to be merged after processing, and then merges them: the video and audio content are encoded with FFMpeg, the encoded video and audio (the clipped segments) are combined into a single media file (the target clip video), and the merged target clip video is finally output, in mp4 format or any other format, which is not limited herein.
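As an illustration of the FFMpeg-based merge, the composition service might assemble a remux command such as the following; the file names, the helper, and the exact flag set are assumptions for the sketch, not taken from the patent:

```python
def ffmpeg_mux_command(video_path, audio_path, out_path="target_clip.mp4"):
    """Assemble (but do not run) an ffmpeg command line that muxes an
    already-encoded video stream and a mixed audio track into one MP4,
    a simplified stand-in for the merge step described above."""
    return [
        "ffmpeg", "-y",
        "-i", video_path,  # clipped, encoded video content
        "-i", audio_path,  # audio content after mixing
        "-c", "copy",      # streams are already encoded; just remux
        out_path,
    ]

cmd = ffmpeg_mux_command("clips.h264", "mix.aac")
# The composition thread would then invoke it, e.g.:
# subprocess.run(cmd, check=True)
```

In the described system the output container is not limited to mp4, so `out_path` could equally name another format.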
In this embodiment, video clipping and composition are performed on the cloud server side, effectively shifting the performance pressure of editing operations from the device to the cloud; the efficient communication service also reduces system processing delay. The real-time video stream, clip materials, audio and video segments, and target clip video are transmitted over multiple threads via the WebRTC transport protocol, which can in particular be implemented with WebRTC's MessageQueue.
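The multi-threaded transmission over per-stream message queues can be sketched with Python's standard queue and threading modules; this is a plain-Python stand-in for the WebRTC MessageQueue mentioned above, not its actual API:

```python
import queue
import threading

def transfer_worker(inbox: "queue.Queue", sent: list) -> None:
    """One transmission worker: drains its message queue until it sees
    the sentinel None, recording what it forwarded."""
    while True:
        item = inbox.get()
        if item is None:
            break  # sentinel: shut this worker down
        sent.append(item)

inbox, sent = queue.Queue(), []
worker = threading.Thread(target=transfer_worker, args=(inbox, sent))
worker.start()

# The four payload kinds named above, enqueued for transmission:
for msg in ["real-time-video-stream", "clip-material", "av-segment", "target-clip"]:
    inbox.put(msg)
inbox.put(None)
worker.join()
```

Running one such worker per stream gives the concurrent transmission the embodiment describes, with each queue decoupling producers from the transport thread.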
For example, to facilitate understanding of the video clipping method provided by the present application, the video clipping method provided by the embodiment of the present application is described in detail in a video clipping process in which a terminal interacts with two ends of a cloud server, and specifically includes:
(1) and the terminal acquires the real-time video stream and transmits the real-time video stream to the cloud server.
(2) The cloud server receives the real-time video stream transmitted by the terminal, encodes the real-time video stream, and obtains and pushes audio and video data.
(3) And the terminal receives audio and video data.
(4) And the terminal responds to the clipping operation, acquires the clipping material and the audio and video segments corresponding to the clipping operation, and sends the clipping material and the audio and video segments to the cloud server.
(5) The cloud server receives the clipping material and the audio and video clips; generating a synthesis queue according to the clipping material and the audio and video clips; reading the synthesis queues in sequence according to the time chain; taking a first key frame in the audio and video segments as a starting point, and editing and synthesizing the audio and video segments according to editing materials of a synthesis queue through an editing and synthesizing service thread to obtain editing segments; and assembling the clip segments to obtain and push the target clip video to the terminal.
(6) The terminal receives the target clip video.
As shown in fig. 7, the present application is divided into three core processing device units:
(1) front-end interaction layer (implemented by terminal): WebRTC audio-video collection, video frame preview (key frame preview of audio-video data), obtaining clipping operation (clipping interaction instruction) and the like;
(2) cloud clipping logical layer (implemented by cloud server): editing material management, timeline management (time stamp of synthesis queue), buffer intermediate state (buffer queue storage), synthesis queue and the like;
(3) cloud clip processing layer (implemented by cloud server): WebRTC communication service (real-time transport service thread), composition of clips (composition of clips service thread), clip output, etc.
It should be understood that, although the steps in the flowcharts of the above embodiments are displayed in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not strictly ordered and may be performed in other orders. Moreover, at least some of the steps may comprise multiple sub-steps or stages, which need not be completed at the same time but may be performed at different times, and whose execution order need not be sequential; they may instead be performed in turn with, or alternately with, other steps or with sub-steps or stages of other steps.
Based on the same inventive concept, the embodiment of the present application further provides a video clipping device for implementing the video clipping method mentioned above. The implementation scheme for solving the problem provided by the apparatus is similar to the implementation scheme described in the above method, so specific limitations in one or more embodiments of the video editing apparatus provided below can be referred to the limitations of the video editing method in the foregoing, and are not described herein again.
In one embodiment, as shown in fig. 8, there is provided a video clipping device applied to a terminal, including: a video stream acquisition module 802, a terminal transmission module 804, a terminal reception module 806, a terminal response module 808, and a target clip video reception module 810, wherein:
a video stream acquiring module 802, configured to acquire a real-time video stream;
and a terminal transmission module 804, configured to transmit the real-time video stream to a cloud server.
And the terminal receiving module 806 is configured to receive audio and video data, where the audio and video data is obtained by encoding a real-time video stream by a cloud server.
And the terminal response module 808 is configured to respond to the clipping operation, acquire the clipping material and the audio/video segments corresponding to the clipping operation, and send the clipping material and the audio/video segments to the cloud server.
And the target clip video receiving module 810 is configured to receive a target clip video, where the target clip video is obtained by the cloud server generating a synthesis queue according to the clip material and the audio/video segments and clipping and synthesizing the audio/video data according to the synthesis queue.
In an optional embodiment, the terminal transmission module 804 is further configured to transmit the real-time video stream to the cloud server through a real-time transmission service thread.
In an optional embodiment, the video editing apparatus further includes an active frame dropping module, configured to obtain a video bitrate of the real-time video stream; and when the video code rate exceeds the preset transmission code rate, actively dropping frames of the real-time video stream.
In one embodiment, as shown in fig. 9, there is provided a video clip apparatus applied to a cloud server, including: a cloud receiving module 902, an encoding module 904, a clip receiving module 906, a composition queue generating module 908, and a clip composition module 910, wherein:
a cloud receiving module 902, configured to receive a real-time video stream transmitted by a terminal;
the encoding module 904 is configured to encode the real-time video stream to obtain and push audio and video data;
the editing receiving module 906 is configured to receive an editing material and an audio/video segment, where the editing material and the audio/video segment are obtained by the terminal responding to an editing operation and acquiring the editing material and the audio/video segment corresponding to that operation;
a composition queue generating module 908, configured to generate a composition queue according to the clip material and the audio/video segments;
and the clipping and synthesizing module 910 is configured to clip and synthesize the audio and video data according to the synthesis queue, so as to obtain and push the target clipped video to the terminal.
In one embodiment, the clip composition module 910 is further configured to read the composition queues in sequence according to a time chain; editing and synthesizing the audio and video segments corresponding to the synthesis queue according to the synthesis queue to obtain edited segments; and assembling the clip segments to obtain and push the target clip video to the terminal.
In one embodiment, the clipping and synthesizing module 910 is further configured to clip and synthesize the audio and video segment according to the clipping material of the synthesis queue by using a first key frame in the audio and video segment as a starting point through a clipping and synthesizing service thread, so as to obtain a clipped segment.
The various modules in the video clipping device described above may be implemented in whole or in part by software, hardware, and combinations thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 10. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a video clipping method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the architecture shown in fig. 10 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
acquiring a real-time video stream;
transmitting the real-time video stream to a cloud server;
receiving audio and video data, wherein the audio and video data are obtained by encoding a real-time video stream by a cloud server;
responding to the clipping operation, acquiring clipping materials and audio and video segments corresponding to the clipping operation, and sending the clipping materials and the audio and video segments to the cloud server;
and receiving a target clipping video, generating a synthesis queue by the cloud server according to the clipping material and the audio and video segments, and clipping and synthesizing the audio and video data according to the synthesis queue to obtain the target clipping video.
In one embodiment, the processor, when executing the computer program, further performs the steps of: transmitting the real-time video stream to the cloud server includes: transmitting the real-time video stream to the cloud server through the real-time transmission service thread.
In one embodiment, the processor, when executing the computer program, further performs the steps of: before transmitting the real-time video stream to the cloud server, the method further includes: acquiring the video code rate of the real-time video stream; and when the video code rate exceeds the preset transmission code rate, actively dropping frames of the real-time video stream.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
receiving a real-time video stream transmitted by a terminal;
coding the real-time video stream to obtain and push audio and video data;
receiving a clipping material and an audio and video segment, wherein the clipping material and the audio and video segment are obtained by the terminal responding to a clipping operation and acquiring the clipping material and the audio and video segment corresponding to that operation;
generating a synthesis queue according to the clipping material and the audio and video clips;
and editing and synthesizing the audio and video data according to the synthesis queue to obtain and push the target edited video to the terminal.
In one embodiment, the processor, when executing the computer program, further performs the steps of: the method for editing and synthesizing the audio and video data according to the synthesis queue to obtain and push the target clip video to the terminal comprises the following steps: reading the synthesis queues in sequence according to the time chain; editing and synthesizing the audio and video segments corresponding to the synthesis queue according to the synthesis queue to obtain edited segments; and assembling the clip segments to obtain and push the target clip video to the terminal.
In one embodiment, the processor, when executing the computer program, further performs the steps of: clipping and synthesizing the audio and video segments corresponding to the synthesis queue according to the synthesis queue to obtain clipped segments comprises: taking the first key frame in the audio and video segment as a starting point, and clipping and synthesizing the audio and video segment according to the clipping materials of the synthesis queue through a clipping composition service thread to obtain a clipped segment.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
acquiring a real-time video stream;
transmitting the real-time video stream to a cloud server;
receiving audio and video data, wherein the audio and video data are obtained by encoding a real-time video stream by a cloud server;
responding to the clipping operation, acquiring clipping materials and audio and video segments corresponding to the clipping operation, and sending the clipping materials and the audio and video segments to the cloud server;
and receiving a target clipping video, generating a synthesis queue by the cloud server according to the clipping material and the audio and video segments, and clipping and synthesizing the audio and video data according to the synthesis queue to obtain the target clipping video.
In one embodiment, the computer program when executed by the processor further performs the steps of: transmitting the real-time video stream to the cloud server includes: transmitting the real-time video stream to the cloud server through the real-time transmission service thread.
In one embodiment, the computer program when executed by the processor further performs the steps of: before transmitting the real-time video stream to the cloud server, the method further includes: acquiring the video code rate of the real-time video stream; and when the video code rate exceeds the preset transmission code rate, actively dropping frames of the real-time video stream.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
receiving a real-time video stream transmitted by a terminal;
coding the real-time video stream to obtain and push audio and video data;
receiving a clipping material and an audio and video segment, wherein the clipping material and the audio and video segment are obtained by the terminal responding to a clipping operation and acquiring the clipping material and the audio and video segment corresponding to that operation;
generating a synthesis queue according to the clipping material and the audio and video clips;
and editing and synthesizing the audio and video data according to the synthesis queue to obtain and push the target edited video to the terminal.
In one embodiment, the computer program when executed by the processor further performs the steps of: the method for editing and synthesizing the audio and video data according to the synthesis queue to obtain and push the target clip video to the terminal comprises the following steps: reading the synthesis queues in sequence according to the time chain; editing and synthesizing the audio and video segments corresponding to the synthesis queue according to the synthesis queue to obtain edited segments; and assembling the clip segments to obtain and push the target clip video to the terminal.
In one embodiment, the computer program when executed by the processor further performs the steps of: clipping and synthesizing the audio and video segments corresponding to the synthesis queue according to the synthesis queue to obtain clipped segments comprises: taking the first key frame in the audio and video segment as a starting point, and clipping and synthesizing the audio and video segment according to the clipping materials of the synthesis queue through a clipping composition service thread to obtain a clipped segment.
In one embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, performs the steps of:
acquiring a real-time video stream;
transmitting the real-time video stream to a cloud server;
receiving audio and video data, wherein the audio and video data are obtained by encoding a real-time video stream by a cloud server;
responding to the clipping operation, acquiring clipping materials and audio and video segments corresponding to the clipping operation, and sending the clipping materials and the audio and video segments to the cloud server;
and receiving a target clipping video, generating a synthesis queue by the cloud server according to the clipping material and the audio and video segments, and clipping and synthesizing the audio and video data according to the synthesis queue to obtain the target clipping video.
In one embodiment, the computer program when executed by the processor further performs the steps of: transmitting the real-time video stream to the cloud server includes: transmitting the real-time video stream to the cloud server through the real-time transmission service thread.
In one embodiment, the computer program when executed by the processor further performs the steps of: before transmitting the real-time video stream to the cloud server, the method further includes: acquiring the video code rate of the real-time video stream; and when the video code rate exceeds the preset transmission code rate, actively dropping frames of the real-time video stream.
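The active frame-dropping step above can be sketched as a bitrate-budget filter: when the measured code rate of a one-second window exceeds the preset transmission code rate, non-key frames are discarded first so the stream stays decodable. The function name and the `(is_key_frame, size_in_bytes)` frame model are hypothetical, not the patent's API.

```python
def drop_frames(frames, max_bitrate_bps):
    """Return the subset of frames that fits the bitrate budget.

    frames: list of (is_key_frame, size_in_bytes) tuples for one second.
    Key frames are kept preferentially; non-key frames are dropped first.
    """
    total_bits = sum(size * 8 for _, size in frames)
    if total_bits <= max_bitrate_bps:
        return list(frames)  # under budget: transmit everything
    kept, bits = [], 0
    # Visit key frames first (negating the flag sorts True before False).
    for is_key, size in sorted(frames, key=lambda f: not f[0]):
        if bits + size * 8 > max_bitrate_bps and not is_key:
            continue  # actively drop this non-key frame
        kept.append((is_key, size))
        bits += size * 8
    return kept
```

A production encoder would keep presentation order and spread drops evenly; the sketch only shows the budget check and the key-frame preference.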
In one embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, performs the steps of:
receiving a real-time video stream transmitted by a terminal;
encoding the real-time video stream to obtain and push audio and video data;
receiving a clipping material and an audio and video segment, wherein the clipping material and the audio and video segment are acquired by the terminal in response to a clipping operation;
generating a synthesis queue according to the clipping material and the audio and video clips;
and editing and synthesizing the audio and video data according to the synthesis queue to obtain and push the target edited video to the terminal.
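The cloud-side steps above can be sketched end to end: build a synthesis queue from the received clipping materials and audio/video segments, read it in sequence along the time chain, clip each entry, and assemble the clipped pieces into the target clip video. All names and data shapes here are assumptions for illustration; the string joins stand in for real clipping and container-level concatenation.

```python
from dataclasses import dataclass, field

@dataclass(order=True)
class QueueEntry:
    start_time: float                     # position on the time chain
    segment: str = field(compare=False)   # A/V segment identifier
    material: str = field(compare=False)  # clipping material (subtitle, sticker, ...)

def build_composition_queue(materials, segments):
    """Pair each clipping material with its (time, segment) and sort by time."""
    entries = [QueueEntry(t, seg, mat) for (t, seg), mat in zip(segments, materials)]
    return sorted(entries)  # ordered along the time chain

def compose_target_video(queue):
    """Read the queue in sequence, clip each entry, and assemble the result."""
    clipped = [f"{e.segment}+{e.material}" for e in queue]  # stand-in for clipping
    return "|".join(clipped)  # stand-in for concatenating clipped segments
```

Ordering by `start_time` alone is what makes the queue a "time chain": entries can arrive out of order from the terminal and still compose deterministically.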
In one embodiment, the computer program, when executed by the processor, further performs the following steps. Editing and synthesizing the audio and video data according to the synthesis queue to obtain the target clip video and push it to the terminal comprises: reading the synthesis queue in sequence along the time chain; editing and synthesizing the audio and video segments corresponding to the synthesis queue to obtain clipped segments; and assembling the clipped segments to obtain the target clip video and push it to the terminal.
In one embodiment, the computer program, when executed by the processor, further performs the following steps. Editing and synthesizing the audio and video segments corresponding to the synthesis queue to obtain clipped segments comprises: taking the first key frame in the audio and video segment as a starting point, and editing and synthesizing the audio and video segment according to the clipping materials of the synthesis queue through an editing and synthesizing service thread to obtain a clipped segment.
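Using the first key frame as the clip starting point reflects how compressed video works: decoding can only begin at a key frame (for example an H.264 IDR frame), so a clip is aligned to the first key frame at or after the requested start. A minimal sketch under that assumption, with hypothetical names and a `(timestamp, is_key_frame)` frame model:

```python
def first_keyframe_index(frames, start):
    """Return the index of the first key frame at or after `start`.

    frames: list of (timestamp, is_key_frame) tuples in presentation order.
    """
    for i, (ts, is_key) in enumerate(frames):
        if ts >= start and is_key:
            return i
    raise ValueError("no key frame at or after requested start")

def clip_segment(frames, start, end):
    """Clip [start, end) but begin at a decodable key frame."""
    begin = first_keyframe_index(frames, start)
    return [f for f in frames[begin:] if f[0] < end]
```

Snapping forward to a key frame trades a slightly later in-point for a clip that plays back without re-encoding the leading frames.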
It should be noted that the video data involved in the present application (including but not limited to data for clipping, stored data, and presented data) is data authorized by the user or fully authorized by all parties.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetoresistive Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. Volatile memory may include Random Access Memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases referred to in the various embodiments provided herein may include at least one of relational and non-relational databases. Non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided herein may be general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, data processing logic devices based on quantum computing, and the like, without limitation.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, the combination should be considered to be within the scope of the present specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (15)

1. A method of video clipping, the method comprising:
acquiring a real-time video stream;
transmitting the real-time video stream to a cloud server;
receiving audio and video data, wherein the audio and video data are obtained by the cloud server encoding the real-time video stream;
responding to a clipping operation, acquiring clipping materials and audio and video segments corresponding to the clipping operation, and sending the clipping materials and the audio and video segments to the cloud server;
and receiving a target clip video, wherein the target clip video is obtained by the cloud server generating a synthesis queue according to the clipping material and the audio and video segments and editing and synthesizing the audio and video data according to the synthesis queue.
2. The method of claim 1, wherein the transmitting the real-time video stream to a cloud server comprises:
and transmitting the real-time video stream to a cloud server through a real-time transmission service thread.
3. The method of claim 1, wherein before the transmitting the real-time video stream to the cloud server, the method further comprises:
acquiring the video code rate of the real-time video stream;
and when the video code rate exceeds a preset transmission code rate, actively dropping frames from the real-time video stream.
4. A method of video clipping, the method comprising:
receiving a real-time video stream transmitted by a terminal;
encoding the real-time video stream to obtain and push audio and video data;
receiving a clipping material and an audio and video segment, wherein the clipping material and the audio and video segment are acquired by the terminal in response to a clipping operation;
generating a synthesis queue according to the clipping material and the audio and video clips;
and editing and synthesizing the audio and video data according to the synthesis queue to obtain and push a target clip video to a terminal.
5. The method according to claim 4, wherein the editing and synthesizing the audio and video data according to the synthesis queue to obtain and push a target clip video to a terminal comprises:
reading the synthesis queue in sequence along the time chain;
editing and synthesizing the audio and video segments corresponding to the synthesis queue to obtain clipped segments;
and assembling the clipped segments to obtain and push a target clip video.
6. The method according to claim 5, wherein the performing clipping synthesis on the audio/video segments corresponding to the synthesis queue according to the synthesis queue to obtain clipped segments comprises:
and taking the first key frame in the audio and video segment as a starting point, and editing and synthesizing the audio and video segment according to the editing materials of the synthesis queue through an editing and synthesizing service thread to obtain an editing segment.
7. A video clipping apparatus, characterized in that the apparatus comprises:
the video stream acquisition module is used for acquiring a real-time video stream;
the terminal transmission module is used for transmitting the real-time video stream to a cloud server;
the terminal receiving module is used for receiving audio and video data, wherein the audio and video data are obtained by the cloud server encoding the real-time video stream;
the terminal response module is used for responding to a clipping operation, acquiring clipping materials and audio and video segments corresponding to the clipping operation, and sending the clipping materials and the audio and video segments to the cloud server;
and the target clip video receiving module is used for receiving a target clip video, wherein the target clip video is obtained by the cloud server generating a synthesis queue according to the clipping material and the audio and video segments and editing and synthesizing the audio and video data according to the synthesis queue.
8. The apparatus of claim 7, wherein the terminal transmission module is further configured to transmit the real-time video stream to a cloud server through a real-time transmission service thread.
9. The apparatus of claim 7, wherein the video clipping apparatus further comprises a frame dropping module configured to:
acquiring the video code rate of the real-time video stream;
and when the video code rate exceeds a preset transmission code rate, actively dropping frames from the real-time video stream.
10. A video clipping apparatus, characterized in that the apparatus comprises:
the cloud receiving module is used for receiving the real-time video stream transmitted by the terminal;
the encoding module is used for encoding the real-time video stream to obtain and push audio and video data;
the receiving module is used for receiving a clipping material and an audio and video segment, wherein the clipping material and the audio and video segment are acquired by the terminal in response to a clipping operation;
the synthesis queue generating module is used for generating a synthesis queue according to the clipping material and the audio and video clips;
and the editing and synthesizing module is used for editing and synthesizing the audio and video data according to the synthesis queue to obtain and push the target clip video to the terminal.
11. The apparatus of claim 10, wherein the editing and synthesizing module is further configured to read the synthesis queue in sequence along the time chain;
editing and synthesizing the audio and video segments corresponding to the synthesis queue to obtain clipped segments;
and assembling the clipped segments to obtain and push the target clip video to the terminal.
12. The apparatus of claim 11, wherein the editing and synthesizing module is further configured to take a first key frame in the audio and video segment as a starting point and edit and synthesize the audio and video segment according to the clipping material of the synthesis queue through an editing and synthesizing service thread, so as to obtain a clipped segment.
13. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 6.
14. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6.
15. A computer program product comprising a computer program, characterized in that the computer program realizes the steps of the method of any one of claims 1 to 6 when executed by a processor.
CN202111525505.6A 2021-12-14 2021-12-14 Video editing method, video editing device, computer equipment and storage medium Pending CN114222156A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111525505.6A CN114222156A (en) 2021-12-14 2021-12-14 Video editing method, video editing device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114222156A true CN114222156A (en) 2022-03-22

Family

ID=80701653

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111525505.6A Pending CN114222156A (en) 2021-12-14 2021-12-14 Video editing method, video editing device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114222156A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016054835A1 (en) * 2014-10-08 2016-04-14 陈刚 Multimedia cloud intelligent system for transportation vehicle
CN106804002A (en) * 2017-02-14 2017-06-06 北京时间股份有限公司 A kind of processing system for video and method
US20170163929A1 (en) * 2015-12-04 2017-06-08 Livestream LLC Video stream encoding system with live crop editing and recording
CN111918128A (en) * 2020-07-23 2020-11-10 上海网达软件股份有限公司 Cloud editing method, device, equipment and storage medium
CN112383790A (en) * 2020-11-12 2021-02-19 咪咕视讯科技有限公司 Live broadcast screen recording method and device, electronic equipment and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114697710A (en) * 2022-04-22 2022-07-01 卡莱特云科技股份有限公司 Server-based material preview method, device, system, equipment and medium
CN114697710B (en) * 2022-04-22 2023-08-18 卡莱特云科技股份有限公司 Material preview method, device, system, equipment and medium based on server
CN115174942A (en) * 2022-07-08 2022-10-11 叠境数字科技(上海)有限公司 Free visual angle switching method and interactive free visual angle playing system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination