CN115225941A - Video processing method, device, equipment and storage medium - Google Patents

Video processing method, device, equipment and storage medium

Info

Publication number
CN115225941A
CN115225941A (application CN202110413773.2A)
Authority
CN
China
Prior art keywords
video
audio
frame sequence
image frame
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110413773.2A
Other languages
Chinese (zh)
Inventor
周雄峰 (Zhou Xiongfeng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202110413773.2A priority Critical patent/CN115225941A/en
Publication of CN115225941A publication Critical patent/CN115225941A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/4302 Content synchronisation processes, e.g. decoder synchronisation
    • H04N 21/4307 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N 21/44016 Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/85 Assembly of content; Generation of multimedia applications
    • H04N 21/854 Content authoring
    • H04N 21/8547 Content authoring involving timestamps for synchronizing content

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The embodiments of this application disclose a video processing method, apparatus, device, and storage medium, applicable to fields such as computing and cloud computing. The method comprises the following steps: determining a clipping time interval of a video to be processed; determining a target image frame sequence according to the clipping time interval and the timestamp of each image frame in the image frame sequence of the video to be processed, and determining a target audio frame sequence according to the clipping time interval and the timestamp of each audio frame in the audio frame sequence of the video to be processed; determining, according to the timestamps of the image frames in the target image frame sequence and the timestamps of the audio frames in the target audio frame sequence, the playing position of each of these frames in the clipped video corresponding to the video to be processed; and obtaining the clipped video according to the target audio frame sequence, the target image frame sequence, and the playing positions. With this method and apparatus, the video to be processed can be clipped into a clipped video whose audio and picture are synchronized, so the applicability is high.

Description

Video processing method, device, equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a video processing method, apparatus, device, and storage medium.
Background
With the continuous development of computer and multimedia technology, people increasingly watch and edit videos on terminal devices such as computers, mobile phones, and tablets. Video editing includes clipping, i.e., cutting a long video down to a shorter one for purposes such as short-video sharing or capturing video highlights.
In existing video clipping schemes, a start time point and an end time point for clipping are usually determined first, and the video is then clipped directly between those two points. Because the original video may already exhibit audio-picture desynchronization, a clipped video obtained this way still suffers from the same problem, so the applicability of such schemes is poor.
Disclosure of Invention
The embodiments of this application provide a video processing method, apparatus, device, and storage medium that can clip a video to be processed into a clipped video whose audio and picture are synchronized, and therefore have high applicability.
In one aspect, an embodiment of the present application provides a video processing method, where the method includes:
determining a clipping time interval of a video to be processed;
determining a target image frame sequence according to the clipping time interval and the timestamp of each image frame in the image frame sequence of the video to be processed, and determining a target audio frame sequence according to the clipping time interval and the timestamp of each audio frame in the audio frame sequence of the video to be processed;
determining, according to the timestamp of each image frame in the target image frame sequence and the timestamp of each audio frame in the target audio frame sequence, the playing position of each of these image frames and audio frames in the clipped video corresponding to the video to be processed;
and obtaining the clipped video according to the target audio frame sequence, the target image frame sequence, and the playing positions corresponding to the image frames and audio frames.
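As a non-authoritative illustration, the four steps above can be sketched in Python; the frame representation, function name, and in-memory assembly are assumptions for exposition, not taken from the patent:

```python
def clip_video(image_frames, audio_frames, start, end):
    """Sketch of the claimed steps; frames are (timestamp_seconds, payload) tuples."""
    # Step 2: keep only frames whose timestamps fall inside the clipping interval,
    # applying the same interval to image frames and audio frames
    target_images = [f for f in image_frames if start <= f[0] <= end]
    target_audio = [f for f in audio_frames if start <= f[0] <= end]

    # Step 3: play position in the clipped video = original timestamp - interval start,
    # computed identically for both streams so sound and picture stay aligned
    def position(frame):
        return frame[0] - start

    # Step 4: assemble the clipped video as (play_position, payload) lists
    clipped_images = [(position(f), f[1]) for f in target_images]
    clipped_audio = [(position(f), f[1]) for f in target_audio]
    return clipped_images, clipped_audio
```

Because the same clipping interval and the same position rule are applied to both sequences, an audio frame and an image frame that shared a timestamp in the source still share a play position in the clip.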
In another aspect, an embodiment of the present application provides a video processing apparatus, including:
a clipping time interval determining module, configured to determine the clipping time interval of a video to be processed;
a frame sequence determining module, configured to determine a target image frame sequence according to the clipping time interval and the timestamp of each image frame in the image frame sequence of the video to be processed, and to determine a target audio frame sequence according to the clipping time interval and the timestamp of each audio frame in the audio frame sequence of the video to be processed;
a playing position determining module, configured to determine, according to the timestamp of each image frame in the target image frame sequence and the timestamp of each audio frame in the target audio frame sequence, the playing position of each of these image frames and audio frames in the clipped video corresponding to the video to be processed;
and a clipped video determining module, configured to obtain the clipped video according to the target audio frame sequence, the target image frame sequence, and the playing positions corresponding to the image frames and audio frames.
In another aspect, an embodiment of the present application provides an electronic device, including a processor and a memory, where the processor and the memory are connected to each other;
the memory is used for storing computer programs;
the processor is configured to execute the video processing method provided by the embodiment of the application when the computer program is called.
In another aspect, an embodiment of the present application provides a computer-readable storage medium, where a computer program is stored, and the computer program is executed by a processor to implement a video processing method provided by an embodiment of the present application.
In another aspect, the embodiments of this application provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. A processor of an electronic device reads the computer instructions from the computer-readable storage medium and executes them, causing the electronic device to perform the video processing method provided by the embodiments of this application.
In the embodiments of this application, the target image frame sequence is determined from the clipping time interval and the timestamps of the image frames in the image frame sequence of the video to be processed, and the target audio frame sequence is determined from the clipping time interval and the timestamps of the audio frames in the audio frame sequence of the video to be processed. Both target sequences therefore correspond to the same clipping time interval, which prevents the sound segment corresponding to the target audio frame sequence from mismatching the video pictures corresponding to the target image frame sequence. Furthermore, once the playing positions of the image frames and audio frames in the clipped video have been determined, each frame can be placed at its correct position in the clipped video, which further avoids audio-picture desynchronization, improves the quality of the clipped video, and gives the method high applicability.
Drawings
To illustrate the technical solutions in the embodiments of this application more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of this application; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic view of a scene of a video processing method according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a video processing method according to an embodiment of the present application;
FIG. 3a is a schematic diagram of a scene for determining a sequence of target image frames according to an embodiment of the present application;
FIG. 3b is a schematic diagram of another scenario for determining a sequence of target image frames according to an embodiment of the present application;
FIG. 4a is a schematic diagram of a scene of a playing sequence of image frames and audio frames provided by an embodiment of the present application;
FIG. 4b is a schematic diagram of another scene of a playing sequence of image frames and audio frames provided by an embodiment of the present application;
FIG. 4c is a schematic diagram of another scene of a playing sequence of image frames and audio frames according to an embodiment of the present application;
FIG. 5a is a schematic diagram of a scenario of time stamp synchronization provided by an embodiment of the present application;
fig. 5b is a schematic diagram of another scenario of time stamp synchronization provided in an embodiment of the present application;
FIG. 6a is a schematic diagram of a scene of processing an audio frame according to an embodiment of the present application;
FIG. 6b is a schematic diagram of another scenario of processing an audio frame according to an embodiment of the present application;
fig. 7 is a schematic view of a scene for determining a playing position according to an embodiment of the present application;
fig. 8a is a schematic view of an application scenario of a video processing method according to an embodiment of the present application;
fig. 8b is a schematic view of another application scenario of the video processing method provided in the embodiment of the present application;
fig. 9 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of this application are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of this application.
Referring to fig. 1, fig. 1 is a schematic diagram of a scene of a video processing method according to an embodiment of this application. Fig. 1 shows a to-be-processed video 100 with a duration of 3 minutes, whose clipping time interval 200 runs from 1 min 20 s to 2 min 20 s. The content of the to-be-processed video 100 consists mainly of images and audio, i.e., of video pictures and video sound.
The duration of the clipping time interval 200 equals the playing duration of the clipped video 500 obtained after the to-be-processed video 100 is processed.
The picture of the to-be-processed video 100 is formed by continuously playing the image frames in its image frame sequence 301. The image frames are arranged in the sequence 301 in timestamp order, and the time interval between every two adjacent image frames in the sequence 301 is the playing interval of the image content of those two frames.
The sound of the to-be-processed video 100 is formed by continuously playing the audio frames in its audio frame sequence 302. The audio frames are arranged in the sequence 302 in timestamp order, and the time interval between every two adjacent audio frames in the sequence 302 is the playing interval of the sound content of those two frames.
Further, the target image frame sequence 401 is determined from the clipping time interval 200 and the timestamp of each image frame in the image frame sequence 301 of the to-be-processed video 100. The picture of the clipped video 500 corresponding to the to-be-processed video 100 is formed by continuously playing the image frames in the target image frame sequence 401, which are a subset of the image frame sequence 301.
Likewise, the target audio frame sequence 402 is determined from the clipping time interval 200 and the timestamp of each audio frame in the audio frame sequence 302 of the to-be-processed video 100. The sound of the clipped video 500 is formed by continuously playing the audio frames in the target audio frame sequence 402, which are a subset of the audio frame sequence 302.
Based on the timestamp of each image frame in the target image frame sequence 401 and of each audio frame in the target audio frame sequence 402, the playing position of each of these frames in the clipped video 500 is determined; the clipped video 500 corresponding to the to-be-processed video 100 can then be obtained from the playing positions of the image frames and audio frames, the target image frame sequence 401, and the target audio frame sequence 402. The clipped video 500 obtained this way is a partial segment of the to-be-processed video 100.
Referring to fig. 2, fig. 2 is a schematic flow chart of a video processing method according to an embodiment of the present application. As shown in fig. 2, a video processing method provided in an embodiment of the present application may include the following steps:
and S21, determining a cutting time interval of the video to be processed.
In some feasible embodiments, the video to be processed may be a video recorded in real time (e.g., a game session captured by screen-recording software, or footage recorded by a camera), a video downloaded over a network, or a video imported from fixed storage, as determined by the requirements of the actual application scenario; this is not limited here.
The fixed storage may be a hard disk, device memory, cloud storage, a database, a blockchain, or the like, as the application requires; this is not limited here.
A blockchain is a novel application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and cryptographic algorithms. It is essentially a decentralized database: a chain of blocks generated and linked using cryptography, and each block here can store video in various formats from various domains.
In some possible embodiments, the clipping time interval of the video to be processed may be determined by a clipping operation of the user.
Specifically, in response to a clipping operation triggered by the user on the video to be processed, the start time point and end time point corresponding to that operation are determined, and the time interval between them is taken as the clipping time interval of the video to be processed.
The clipping operation may be triggered after the user inputs a start and an end time point, by the user selecting the start and end time points on the video to be clipped, or by a voice instruction from the user that contains the start and end time points; the exact form depends on the actual application scenario and is not limited here.
For example, if a clipping operation on the video to be processed is detected and the start and end time points determined from it are 1 min 01 s and 1 min 50 s, the clipping time interval of the video to be processed is the interval from 1 min 01 s to 1 min 50 s.
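As a small illustration of turning user-entered time points into a clipping interval (the "M:SS" string format and the helper name are assumptions, not specified by the patent):

```python
def crop_interval(start_str, end_str):
    """Convert 'M:SS'-style start and end time points into a (start, end)
    clipping interval expressed in seconds."""
    def to_seconds(s):
        minutes, seconds = s.split(":")
        return int(minutes) * 60 + int(seconds)

    start, end = to_seconds(start_str), to_seconds(end_str)
    if end <= start:
        raise ValueError("end time point must come after the start time point")
    return start, end
```

With the example above, `crop_interval("1:01", "1:50")` yields the interval from second 61 to second 110.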
In some possible embodiments, the clipping time interval of the video to be processed may be determined from the video pictures themselves.
Specifically, if the video to be processed needs to be clipped to obtain a specific video segment within it, the start and end time points of that segment can be determined, and the interval between them taken as the clipping time interval of the video to be processed.
For instance, the start time point of the specific segment may be the moment when the video picture that occurs most frequently in the video to be processed is first played, and the end time point the moment when that picture is last played.
Alternatively, the start time point may be the moment of the first video picture whose subtitles contain a specific keyword, and the end time point the moment of the last such picture.
Or the start time point may be the moment of the first video picture whose similarity to a specific image exceeds a similarity threshold, and the end time point the moment of the last such picture.
The number of occurrences of a video picture in the video to be processed, the subtitles it contains, and its similarity to a specific image can all be determined through cloud computing and machine-learning techniques, as the actual application scenario requires; this is not limited here. Cloud computing is a product of the development and fusion of traditional computing and network technologies, such as grid computing, distributed computing, parallel computing, utility computing, network storage, virtualization, and load balancing, and it can improve the efficiency of these computations in the embodiments of this application.
S22: determine a target image frame sequence according to the clipping time interval and the timestamp of each image frame in the image frame sequence of the video to be processed, and determine a target audio frame sequence according to the clipping time interval and the timestamp of each audio frame in the audio frame sequence of the video to be processed.
In some possible embodiments, the image frame sequence of the video to be processed is the frame sequence formed by the image frames that constitute its video pictures. The image frames are arranged in timestamp order; the timestamp of an image frame is the playing time assigned to it when it was written into the video file. For example, if an image frame was written to be played at the 3rd second, its timestamp is 3 seconds.
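For a constant-frame-rate stream, this timestamp assignment can be illustrated as follows (a simplification assumed for exposition; real containers store timestamps in a time base rather than raw seconds):

```python
def image_timestamp(frame_index, fps=25):
    """Playing time, in seconds, assigned to a frame when it is written to the
    file, assuming a constant frame rate of `fps` frames per second."""
    return frame_index / fps
```

Under this assumption, the frame meant to play at the 3rd second of a 25 fps video is frame 75, since `image_timestamp(75)` is 3.0 seconds.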
In some possible embodiments, the target image frame sequence is the frame sequence formed by the image frames of the clipped video, and its image frames are a subset of the image frame sequence of the video to be processed. To determine it, the start and end time points of the target image frame sequence are found from the clipping time interval of the video to be processed and the timestamps of the image frames in the image frame sequence; the subsequence of the image frame sequence lying in the interval between those two points is then taken as the target image frame sequence.
Specifically, the timestamp, among the timestamps of the image frames in the image frame sequence of the video to be processed, that matches the start time point of the clipping time interval is taken as the start time point of the target image frame sequence; likewise, the timestamp that matches the end time point of the clipping time interval is taken as the end time point of the target image frame sequence.
As an example, if the clipping time interval of the video to be processed is 10 s to 20 s, i.e., the segment from the 10th to the 20th second needs to be clipped, the timestamp matching the 10th second is selected from the timestamps of the image frames and taken as the start time point of the target image frame sequence, and the timestamp matching the 20th second is taken as its end time point.
The timestamp matching the start (or end) time point of the clipping time interval may simply be the timestamp of the image frame that falls exactly on that point. For example, the timestamp of the image frame at the 10th second of the image frame sequence is taken as the start time point of the target image frame sequence (i.e., the 10th second), and the timestamp of the image frame at the 20th second as its end time point (i.e., the 20th second).
If the start time point of the clipping time interval falls between the timestamps of two image frames in the image frame sequence of the video to be processed, the matching timestamp may be the one closest to the start time point of the clipping time interval; likewise, if the end time point falls between the timestamps of two image frames, the matching timestamp may be the one closest to the end time point of the clipping time interval.
Referring to fig. 3a, fig. 3a is a schematic diagram of a scene for determining a target image frame sequence according to an embodiment of this application. Fig. 3a shows some of the image frames in the image frame sequence of the video to be processed, with timestamps 6.0365 s, 7.092042 s, 7.134708 s, 11.937376 s, and 12.234322 s. If the start time point of the clipping time interval is 6.5 s, it lies between the timestamps 6.0365 s and 7.092042 s; since 6.0365 s is closer to 6.5 s than 7.092042 s is, the timestamp 6.0365 s is taken as the start time point of the target image frame sequence.
Similarly, if the end time point of the clipping time interval is 12 s, it lies between the timestamps 11.937376 s and 12.234322 s; since 11.937376 s is closer to 12 s than 12.234322 s is, the timestamp 11.937376 s is taken as the end time point of the target image frame sequence.
Given the start and end time points of the target image frame sequence, the subsequence of the image frame sequence covering the interval from 6.0365 s to 11.937376 s can be determined and taken as the target image frame sequence.
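The nearest-timestamp matching used in this example can be sketched as follows (a minimal illustration; the function name is an assumption):

```python
def nearest_timestamp(timestamps, point):
    """Return the frame timestamp closest to a clipping-interval boundary."""
    return min(timestamps, key=lambda ts: abs(ts - point))

# Timestamps from the fig. 3a example
stamps = [6.0365, 7.092042, 7.134708, 11.937376, 12.234322]
start = nearest_timestamp(stamps, 6.5)   # matches 6.0365 s, as in the example
end = nearest_timestamp(stamps, 12.0)    # matches 11.937376 s, as in the example
target = [ts for ts in stamps if start <= ts <= end]
```

The resulting `target` list contains exactly the frames of the 6.0365 s to 11.937376 s interval described above.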
Alternatively, for most videos, the image frame sequence contains both image key frames and image non-key frames. An image key frame contains a complete picture and can be played directly. A non-key frame does not contain a complete picture; a complete picture must first be reconstructed algorithmically from the adjacent frames before it can be played. Therefore, so that the beginning and end of the final clipped video contain complete video content, the image key frames in the image frame sequence of the video to be processed can be identified, and the target image frame sequence determined from the clipping time interval and the timestamps of those key frames.
The method for determining the time stamp matched with the starting time point of the cutting time interval in the time stamps of all the image frames in the image frame sequence of the video to be processed as the starting time point of the target image frame sequence comprises the following steps:
and determining the timestamp matched with the starting time point of the cutting time interval in the timestamps of all image key frames in the video to be processed as the starting time point of the target image frame sequence.
That is, the timestamp matching the start time point of the cropping time interval may be a timestamp matching the start time point of the cropping time interval among the timestamps of the respective image key frames in the image frame sequence. For example, it may be the timestamp corresponding exactly to the start time point of the cropping time interval, or the timestamp closest to the start time point of the cropping time interval, among the timestamps of the respective image key frames.
Determining a timestamp matched with the end time point of the cutting time interval in the timestamps of the image frames as the end time point of the target image frame sequence, wherein the method comprises the following steps of:
and determining the time stamp matched with the ending time point of the cutting time interval in the time stamps of the image key frames as the ending time point of the target image frame sequence.
That is, the timestamp matching the end time point of the cropping time interval may be a timestamp matching the end time point of the cropping time interval among the timestamps of the respective image key frames in the image frame sequence. Such as a timestamp corresponding to the ending time point of the cropping time interval among the timestamps of the respective image key frames in the image frame sequence, or a timestamp closest to the ending time point of the cropping time interval among the timestamps of the respective image key frames.
Referring to fig. 3b, fig. 3b is a schematic view of another scenario for determining a target image frame sequence according to an embodiment of the present application. Fig. 3b shows some of the image frames in the image frame sequence of the video to be processed, such as non-key image frames with time stamps of 6.0365s, 7.134708s and 11.937376s, and key image frames with time stamps of 7.092042s and 12.234322s. In the case where the start time point of the cropping time interval of the video to be processed is 6.5s, the start time point falls between the time stamp 6.0365s and the time stamp 7.092042s. Since 7.092042s is the key-frame time stamp closest to the start time point 6.5s, the time stamp 7.092042s is determined as the start time point of the target image frame sequence.
Similarly, when the end time point of the cropping time interval of the video to be processed is 12s, the end time point falls between the time stamp 11.937376s and the time stamp 12.234322s. Since 12.234322s is the key-frame time stamp closest to the end time point 12s, the time stamp 12.234322s is determined as the end time point of the target image frame sequence.
Based on the start time point and the end time point of the target image frame sequence, an image frame sequence corresponding to a time interval of 7.092042s to 12.234322s in the image frame sequence may be determined, and the image frame sequence of the time interval may be determined as the target image frame sequence.
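The key-frame variant can be sketched in the same way, restricting the candidate timestamps to key frames only. This is a minimal illustration, assuming frames are represented as (timestamp, is_key) tuples; the helper name and data layout are not from the patent.

```python
from bisect import bisect_left

def nearest_key_timestamp(frames, point):
    """Snap a clipping boundary to the nearest *key* frame timestamp.

    frames: list of (timestamp, is_key) tuples sorted by timestamp.
    Only key frames are candidates, so the clipped video starts and
    ends on a frame that can be decoded without its neighbours.
    """
    keys = [ts for ts, is_key in frames if is_key]
    i = bisect_left(keys, point)
    if i == 0:
        return keys[0]
    if i == len(keys):
        return keys[-1]
    before, after = keys[i - 1], keys[i]
    return before if point - before <= after - point else after

# Fig. 3b example: only 7.092042s and 12.234322s are key frames.
frames = [(6.0365, False), (7.092042, True), (7.134708, False),
          (11.937376, False), (12.234322, True)]
start = nearest_key_timestamp(frames, 6.5)   # snaps forward to 7.092042
end = nearest_key_timestamp(frames, 12.0)    # snaps to 12.234322
```

Compared with the Fig. 3a case, the boundaries move to 7.092042s and 12.234322s, matching the key-frame selection described above.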
In some possible embodiments, the audio frame sequence of the video to be processed is a frame sequence formed by the audio frames constituting the sound of the video to be processed. The target audio frame sequence corresponding to the video to be processed is a frame sequence formed by the audio frames corresponding to the clipped video, and each audio frame in the target audio frame sequence is part of the audio frame sequence of the video to be processed. The audio frames in the audio frame sequence of the video to be processed are arranged in order of their time stamps, where the time stamp of an audio frame is the playing time assigned to it when it is written into the video file. For example, if an audio frame written into a video file should be played at the 10th second, the time stamp of the audio frame is 10 seconds.
When the video to be processed is generated, all image frames of the video to be processed are written into a video file according to a playing sequence to obtain an image frame sequence, all audio frames of the video to be processed are written into the video file according to the playing sequence to obtain an audio frame sequence, and time stamps of all the image frames and all the audio frames correspond to the same time axis. Under the condition that all the image frames and all the audio frames are normally written into the video file, all the image frames in the image frame sequence and all the audio frames in the audio frame sequence of the video to be processed are played according to the sequence of the timestamps, for example, all the image frames in the image frame sequence are played through the video track, and all the audio frames in the audio frame sequence are played through the audio track synchronously, so that the effect of playing sound and picture synchronously is achieved.
Because of the persistence of human vision and hearing, a picture seen or a sound heard does not vanish instantly. Therefore, even if an audio frame and its corresponding image frame are not played strictly in time stamp order, as long as the time difference between their time stamps is small (for example, less than 0.1s), the playing effect of sound-picture synchronization can still be achieved. Referring to fig. 4a, fig. 4a is a schematic view of a scene of a playing sequence of image frames and audio frames according to an embodiment of the present application. Fig. 4a shows the playing sequence of the image frames and audio frames of a video relative to the same time axis: the image frames are played in order of their time stamps, and the audio frames are played in order of their time stamps. Although the audio frame with a time stamp of 0.075s is played after the image frame with a time stamp of 0.1s, the time difference between the two is so small that the sound-picture synchronization effect is not affected during actual playing.
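The perceptual tolerance described above can be expressed as a simple check. The function name and the 0.1s threshold constant are illustrative; the threshold value itself is the example figure given in the text.

```python
SYNC_THRESHOLD_S = 0.1  # rough perceptual tolerance, per the example above

def is_av_synchronized(image_ts, audio_ts, threshold=SYNC_THRESHOLD_S):
    """True if an audio frame and its image frame are close enough in
    time that a viewer still perceives them as synchronized."""
    return abs(image_ts - audio_ts) < threshold

# Fig. 4a example: the audio frame at 0.075s plays next to the image
# frame at 0.1s; the 0.025s gap is imperceptible.
ok = is_av_synchronized(0.1, 0.075)
```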
In the process of actually writing the image frames and audio frames of the video to be processed into the video file, the audio frames may be written into the video file before the image frames, so that the sound is played before the video picture when the video to be processed is played. For example, while the audio of the video to be processed plays normally, an image frame that should be played at the 0th second is played at the 3rd second due to the writing error. Referring to fig. 4b, fig. 4b is a schematic view of another scene of the playing sequence of the image frames and the audio frames provided in the embodiment of the present application. In the case where the audio frames of the video to be processed are written into the video file before the image frames, although the audio frames are arranged in time stamp order to obtain the audio frame sequence and the image frames are arranged in time stamp order to obtain the image frame sequence, the actual playing time of each image frame in the image frame sequence is later than the playing time corresponding to its time stamp, so that sound and picture are played out of sync when the video to be processed is played.
Likewise, in the process of actually writing the image frames and audio frames of the video to be processed into the video file, the image frames may be written into the video file before the audio frames, so that the video picture is played before the sound when the video to be processed is played. For example, while the video picture of the video to be processed plays normally, an audio frame that should be played at the 0th second is played at the 3rd second due to the writing error. Referring to fig. 4c, fig. 4c is a schematic diagram of another scene of a playing sequence of image frames and audio frames provided by the embodiment of the present application. In the case where the image frames of the video to be processed are written into the video file before the audio frames, although the audio frames are arranged in time stamp order to obtain the audio frame sequence and the image frames are arranged in time stamp order to obtain the image frame sequence, the actual playing time of each audio frame in the audio frame sequence is later than the playing time corresponding to its time stamp, so that sound and picture are played out of sync when the video to be processed is played.
Based on the above problem, when determining the target audio frame sequence, if the start time point and the end time point corresponding to the target image frame sequence corresponding to the video to be processed are respectively used as the start time point and the end time point corresponding to the target audio frame sequence corresponding to the video to be processed, the sound clip corresponding to the obtained target audio frame sequence and the video picture corresponding to the obtained target image frame sequence may have a non-correspondence condition. Therefore, when determining the target audio frame sequence, the start time point and the end time point of the target audio frame sequence may be determined based on the clipping time interval of the video to be processed and the time stamp of each audio frame in the audio frame sequence, and the target audio frame sequence may be determined from the audio frame sequence corresponding to the video to be processed according to the start time point and the end time point of the target audio frame sequence, that is, the frame sequence of the time interval corresponding to the start time point and the end time point of the target audio frame sequence in the audio frame sequence of the video to be processed is determined as the target audio frame sequence.
Specifically, the timestamp matched with the start time point of the cutting time interval in the timestamps of each audio frame in the audio frame sequence of the video to be processed is determined as the start time point of the target audio frame sequence. And determining the time stamp matched with the end time point of the cutting time interval in the time stamps of all the audio frames in the audio frame sequence of the video to be processed as the end time point of the target audio frame sequence.
As an example, if the clipping time interval of the video to be processed is from the 0th second to the 20th second, that is, the sound segment of the video to be processed between the 0th second and the 20th second needs to be clipped, a timestamp matching the 0th second may be determined from the timestamps of the audio frames in the audio frame sequence of the video to be processed and taken as the start time point of the target audio frame sequence, and a timestamp matching the 20th second may be determined from those timestamps and taken as the end time point of the target audio frame sequence.
The timestamp matched with the start time point of the cutting time interval may be a timestamp of an audio frame corresponding to the start time point of the cutting time interval, and the timestamp matched with the end time point of the cutting time interval may be a timestamp of an audio frame corresponding to the end time point of the cutting time interval. If the time stamp of the audio frame corresponding to the 0 th second in the sequence of audio frames of the video to be processed is determined as the starting time point of the target sequence of audio frames (i.e. the 0 th second), the time stamp of the audio frame corresponding to the 20 th second in the sequence of audio frames of the video to be processed is determined as the ending time point of the target sequence of audio frames (i.e. the 20 th second).
If the start time point of the clipping time interval falls between the time stamps of two audio frames in the audio frame sequence of the video to be processed, the time stamp matching the start time point of the clipping time interval may be the time stamp closest to the start time point of the clipping time interval; if the end time point of the clipping time interval falls between the time stamps of two audio frames in the audio frame sequence of the video to be processed, the time stamp matching the end time point of the clipping time interval may be the time stamp closest to the end time point of the clipping time interval.
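As a sketch of this nearest-timestamp rule applied to audio frames: the `snap` and `target_audio_sequence` helpers and the ~23 ms audio frame spacing are illustrative assumptions, not the patent's implementation; the 0s-20s interval follows the example above.

```python
from bisect import bisect_left, bisect_right

def snap(sorted_ts, point):
    """Nearest timestamp in sorted_ts to a clipping-interval boundary."""
    i = bisect_left(sorted_ts, point)
    if i == 0:
        return sorted_ts[0]
    if i == len(sorted_ts):
        return sorted_ts[-1]
    return min(sorted_ts[i - 1], sorted_ts[i], key=lambda t: abs(t - point))

def target_audio_sequence(audio_ts, crop_start, crop_end):
    """Audio timestamps between the snapped start and end boundaries,
    inclusive; audio_ts must be sorted ascending."""
    start, end = snap(audio_ts, crop_start), snap(audio_ts, crop_end)
    return audio_ts[bisect_left(audio_ts, start):bisect_right(audio_ts, end)]

# Hypothetical audio timestamps at ~23 ms spacing covering the
# 0 s-20 s clipping interval from the example above.
audio_ts = [round(0.023 * n, 6) for n in range(900)]
clip = target_audio_sequence(audio_ts, 0.0, 20.0)
```

Because the selection keys only on the audio frames' own timestamps, the same slice is obtained even when the audio frames were written into the file earlier or later than the image frames.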
Based on this, since the time stamp of the audio frame of the video to be processed can indicate that the corresponding audio frame actually corresponds to the playing time of the video to be processed, even if the actual playing time of the audio frame of the video to be processed is earlier than the playing time corresponding to the time stamp thereof, or the actual playing time thereof is later than the playing time corresponding to the time stamp thereof, the sound clip corresponding to the target audio frame sequence determined based on the time stamp of each audio frame is the sound clip actually corresponding to the cropping time interval of the video to be processed.
Alternatively, for most videos, just as with the video frames, the audio frame sequence contains both audio key frames and audio non-key frames. An audio key frame contains complete audio and can be played directly. An audio non-key frame does not contain complete audio; the complete audio must first be restored by an algorithm from the adjacent audio frames before and after it before it can be played. Based on this, in order to give the beginning and the end of the finally obtained clipped video complete audio segments, the audio key frames in the audio frame sequence of the video to be processed can be determined, and the target audio frame sequence can then be determined based on the clipping time interval and the time stamps of the audio key frames in the video to be processed.
Determining a timestamp matched with the starting time point of the cutting time interval in the timestamps of all audio frames in the audio frame sequence of the video to be processed as the starting time point of the target audio frame sequence, wherein the method comprises the following steps of:
and determining the timestamp matched with the starting time point of the cutting time interval in the timestamps of all the audio key frames in the video to be processed as the starting time point of the target audio frame sequence.
That is, the timestamp matching the start time point of the clipping time interval may be a timestamp matching the start time point of the clipping time interval among the timestamps of the audio key frames in the sequence of audio frames, such as the timestamp corresponding exactly to the start time point of the clipping time interval, or the timestamp closest to it, among the timestamps of the audio key frames.
Determining a timestamp matched with the end time point of the cutting time interval in the timestamps of the audio frames as the end time point of the target audio frame sequence, wherein the method comprises the following steps of:
and determining the time stamp matched with the ending time point of the cutting time interval in the time stamps of the audio key frames as the ending time point of the target audio frame sequence.
That is, the timestamp matching the end time point of the clipping time interval may be a timestamp matching the end time point of the clipping time interval among the timestamps of the audio key frames in the sequence of audio frames, such as the timestamp corresponding exactly to the end time point of the clipping time interval, or the timestamp closest to it, among the timestamps of the audio key frames.
And S23, determining the playing positions of the image frames in the target image frame sequence and the audio frames in the target audio sequence in the cut video corresponding to the video to be processed according to the time stamps of the image frames in the target image frame sequence and the time stamps of the audio frames in the target audio sequence.
In practical application, after a video to be processed is cut to obtain a target image frame sequence of a cut video, time stamp synchronization processing is performed on audio frames corresponding to each image frame in the target image frame sequence, so that sound and video pictures in the finally obtained cut video are consistent as much as possible.
Referring to fig. 5a, fig. 5a is a schematic view of a scenario of time stamp synchronization provided in the embodiment of the present application. Fig. 5a shows the playing sequence of each image frame and the corresponding audio frame in the determined target image frame sequence corresponding to the video to be processed, that is, when the video to be processed is played, the corresponding audio frame or image frame is played at the time indicated by the timestamp corresponding to each frame. If the time stamps of the audio frames corresponding to the target image frame sequence are synchronized after the target image frame sequence is determined, for example, the time stamp of the audio frame with the time stamp of 7.092042s is synchronized with the time stamp of 6.160000s of the image frame immediately before the target image frame sequence, and the time stamp of the audio frame with the time stamp of 7.134708s is synchronized with the time stamp of 6.126667s of the image frame immediately before the target image frame sequence, when the image frames in the target image frame sequence are played based on the video track and the corresponding audio frames are played based on the audio track synchronously, the audio frame with the time stamp of 7.092042s and the audio frame with the time stamp of 7.134708s are played in advance, so that the playing time of the sound segment corresponding to the target image frame sequence is advanced.
Referring to fig. 5b, fig. 5b is a schematic diagram of another scenario of time stamp synchronization provided in the embodiment of the present application. Fig. 5b shows the playing sequence of each image frame and each corresponding audio frame in the determined target image frame sequence corresponding to the video to be processed, that is, when the video to be processed is played, the corresponding audio frame or image frame is played at the time indicated by the timestamp corresponding to each frame. If the time stamps of the corresponding audio frames are synchronized after the target image frame sequence is determined, for example, the time stamp of the audio frame with the time stamp of 8.662333s is synchronized with the time stamp of 12.076667s of the image frame immediately before the target image frame sequence, the audio frame with the earlier playing time corresponding to the time stamp is delayed to be played.
In practical applications, if the video to be processed is clipped to obtain the target image frame sequence of the clipped video without performing time stamp synchronization processing on the audio frames corresponding to the image frames in the target image frame sequence, then when the audio and video of the video to be processed are out of sync, no sound may be played at the beginning of the clipped video, or the sound may be lost in the latter half of the clipped video. That is, the obtained clipped video still exhibits sound-picture asynchronization when played.
Referring to fig. 6a, fig. 6a is a schematic view of a scene of processing an audio frame according to an embodiment of the present application. If the target image frame sequence obtained in fig. 6a is image frames corresponding to 6s to 12s of the video to be processed, if the time stamp synchronization is not performed on the corresponding audio frames after the target image frame sequence is obtained, there are no other audio frames before the audio frame with the time stamp of 7.092042s, which results in a situation where the sound is lost when the final cropped video starts to be played.
Referring to fig. 6b, fig. 6b is a schematic view of another scenario of processing an audio frame according to an embodiment of the present application. If the target image frame sequence obtained in fig. 6b is image frames corresponding to 12s to 18s of the video to be processed, and if the time stamp synchronization is not performed on the corresponding audio frames after the target image frame sequence is obtained, the audio frames with the time stamps smaller than 12s in the audio frames corresponding to the target image frame sequence are discarded when the clip video is obtained, but other audio frames after the audio frame with the time stamp of 14.678333s and before 18s cannot be determined, so that the final clip video has no sound to be played after the audio frame with the time stamp of 14.678333s is played for a while.
Based on this, even though the target audio frame sequence is determined based on the time stamp of each audio frame in the video to be processed and the cropping time interval, when sound and picture are out of sync in the video to be processed, the finally obtained cropped video may still suffer from sound loss or sound-picture asynchronization, regardless of whether time stamp synchronization is performed on the audio frames in the target audio frame sequence. Therefore, after the target audio frame sequence and the target image frame sequence are obtained, the playing positions of the audio frames in the target audio frame sequence and the image frames in the target image frame sequence in the cropped video need to be re-determined, so that the final cropped video plays the sound segment corresponding to the target audio frame sequence and the sound and video pictures of the cropped video are played synchronously.
In some possible embodiments, the playing position of each image frame and each audio frame corresponding to the cropped video corresponding to the video to be processed may be determined according to the time stamp of each image frame in the target image frame sequence and the time stamp of each audio frame in the target audio frame sequence. And the playing positions of the image frames and the audio frames corresponding to the cut video are the playing time corresponding to the image frames and the audio frames when the cut video is played.
Since the time stamp of each image frame in the target image frame sequence may represent the playing time of the corresponding image frame in the video to be processed, and the time stamp of each audio frame in the target audio frame sequence may represent the playing time of the corresponding audio frame with respect to the video to be processed (e.g., a certain image frame should be played at the 10 th second of the video to be processed), the playing position of each image frame and each audio frame corresponding to the cropped video may be determined based on the time stamp of each image frame in the target image frame sequence and the time stamp of each audio frame in the target audio frame sequence.
Specifically, the chronological order of the timestamps of the image frames in the target image frame sequence and the audio frames in the target audio frame sequence may be taken as the playing positions of those frames in the clipped video corresponding to the video to be processed. That is, the chronological order of the timestamps of the image frames and audio frames determines the playing position of each frame in the clipped video.
In some possible embodiments, when determining the playing positions, in the clipped video, of the image frames in the target image frame sequence and the audio frames in the target audio frame sequence, one image frame may be read from the target image frame sequence and one audio frame may be read from the target audio frame sequence, and their timestamps compared so that the frame with the smaller timestamp is placed earlier, yielding an initial frame sequence. Then a further image frame or audio frame is read from the target image frame sequence or the target audio frame sequence, its timestamp is compared with the timestamps of the frames in the frame sequence obtained so far, and it is placed into the frame sequence in timestamp order. These steps are repeated until all image frames in the target image frame sequence and all audio frames in the target audio frame sequence have been read and sorted, and the playing position of each frame in the clipped video can then be determined based on the final frame sequence.
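Since both input sequences are already sorted by timestamp, the read-and-compare loop described above is equivalent to a standard two-way merge. The sketch below is illustrative (the function name and the (timestamp, kind) tuple representation are assumptions); the sample values loosely follow the Fig. 7 example.

```python
def interleave_by_timestamp(image_frames, audio_frames):
    """Merge the sorted target image and audio frame sequences into a
    single play order: each frame's position in the clipped video is
    given by the chronological order of its timestamp."""
    merged, i, j = [], 0, 0
    while i < len(image_frames) and j < len(audio_frames):
        # Place whichever pending frame has the smaller timestamp first.
        if image_frames[i][0] <= audio_frames[j][0]:
            merged.append(image_frames[i]); i += 1
        else:
            merged.append(audio_frames[j]); j += 1
    merged.extend(image_frames[i:])  # drain whichever sequence remains
    merged.extend(audio_frames[j:])
    return merged

# Frames as (timestamp, kind); values loosely follow the Fig. 7 example.
images = [(6.0, "image"), (6.0365, "image"), (6.093333, "image")]
audios = [(7.092042, "audio")]
play_order = interleave_by_timestamp(images, audios)
```

The merge runs in linear time, which matters when a long clip contains tens of thousands of frames.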
And S24, obtaining a cutting video according to the target audio frame sequence, the target image frame sequence and the playing positions corresponding to the image frames in the target image frame sequence and the audio frames in the target audio sequence.
In some possible embodiments, each image frame and each audio frame are synchronously written into the video file to obtain the cropped video according to the target audio frame sequence, the target image frame sequence and the playing positions corresponding to each image frame and each audio frame. That is, each audio frame in the target audio frame sequence and each image frame in the target image frame sequence are synchronously written into the video file according to the playing position corresponding to the cutting video respectively to obtain the cutting video. Therefore, when the cut video is played, the playing positions corresponding to the image frames and the audio frames can be determined based on the video file of the cut video, and the audio frames and the image frames are synchronously played according to the audio frames and the playing positions corresponding to the image frames.
Referring to fig. 7, fig. 7 is a schematic view of a scene for determining a playing position according to an embodiment of the present application. As shown in fig. 7, if the clipping time interval is 6s to 12s, the target image frame sequence determined based on the clipping time interval and the time stamp of each image frame in the image frame sequence of the video to be processed is as shown in fig. 7, and the target audio frame sequence determined based on the clipping time interval and the time stamp of each audio frame in the audio frame sequence of the video to be processed is likewise as shown in fig. 7. Based on the order of the time stamps of the image frames and audio frames in fig. 7, the frames may be sorted to obtain the playing position of each image frame and audio frame in the clipped video. That is, when the clipped video is played, the image frame with the time stamp of 6.0s, the image frame with the time stamp of 6.0365s, the image frame with the time stamp of 6.093333s, the audio frame with the time stamp of 7.092042s, and so on, are played in sequence, and when the image frame with the time stamp of 12.030000s has been played, playback of the clipped video ends.
In some possible embodiments, after determining the playing positions of the image frames in the target image frame sequence and the audio frames in the target audio frame sequence in the clipped video, the time differences between the time stamps of the frames may be increased, while keeping their relative order unchanged, so that the resulting clipped video plays in slow motion (e.g., at 0.5x speed). Alternatively, the time differences between the time stamps may be reduced, so that the resulting clipped video plays fast (e.g., at 2x speed).
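The speed change above amounts to rescaling each timestamp's offset from the clip start. The sketch below is a minimal illustration under that assumption; the function name and the sample clip values are hypothetical.

```python
def rescale_timestamps(frames, speed):
    """Rescale frame timestamps relative to the clip start so the
    clipped video plays `speed` times faster (speed=2.0) or slower
    (speed=0.5). Scaling is monotonic, so the relative order of the
    frames is preserved; only the gaps between timestamps change.

    frames: (timestamp, kind) tuples sorted by timestamp.
    """
    if not frames:
        return []
    t0 = frames[0][0]
    return [(t0 + (ts - t0) / speed, kind) for ts, kind in frames]

# Halving the gaps gives 2x playback of a hypothetical 6 s-12 s clip:
# the gaps shrink from 1 s / 5 s to 0.5 s / 2.5 s.
clip = [(6.0, "image"), (7.0, "audio"), (12.0, "image")]
fast = rescale_timestamps(clip, 2.0)
```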
In some possible embodiments, after the cropped video is derived from the sequence of target audio frames and the sequence of target image frames, the cropped video may be played based on the playing order of the image frames and the audio frames corresponding to the cropped video in response to the playing operation for the cropped video.
Based on the video processing method provided by the embodiment of the present application, a video recorded in real time (such as a game picture) can be clipped to obtain a clipped video, and videos obtained in various other ways (such as movies, TV series, and the like) can also be clipped, further improving the user experience. As shown in fig. 8a, fig. 8a is a schematic view of an application scenario of the video processing method according to the embodiment of the present application. Fig. 8a shows an interface displayed after a game match is completed. The game background records the user's game picture in real time while the user plays the match, and after the match ends, the recorded game picture can be clipped based on the video processing method provided in the embodiment of the present application to obtain the user's highlight moments in the match. The post-match interface includes a playing interface for the clipped video of the user's highlight moments and plays the clipped video for the user. As shown in fig. 8b, fig. 8b is a schematic view of another application scenario of the video processing method according to the embodiment of the present application. In fig. 8b, based on the video processing method provided by the embodiment of the present application, the video of a game match can be clipped so that players can review the tactics of the match in a more targeted way based on the clipped video.
In the embodiment of the application, the target image frame sequence is determined by cutting the time interval and the time stamp of each image frame in the image frame sequence of the video to be processed, and the target audio frame sequence is determined by cutting the time interval and the time stamp of each audio frame in the audio frame sequence of the video to be processed, so that the target image frame sequence and the target audio frame sequence simultaneously correspond to the cutting time interval, and the situation that a sound segment corresponding to the target audio frame sequence does not correspond to a video picture corresponding to the target image frame sequence is avoided. Furthermore, after the playing positions of the image frames of the target image frame sequence and the audio frames of the target audio sequence in the cut video corresponding to the video to be processed are determined, the image frames of the target image frame sequence and the audio frames of the target audio sequence can be located at the corresponding positions in the cut video, the condition that the sound and the picture of the cut video are not synchronous is further avoided, the video quality of the cut video is improved, and the applicability is high. Furthermore, based on the method provided by the embodiment of the application, the playing speed of the cut video can be changed while the video is cut, the user experience is improved, and the applicability is high.
Referring to fig. 9, fig. 9 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present application. The video processing apparatus 1 provided in the embodiment of the present application includes:
a clipping time interval determining module 11, configured to determine a clipping time interval of a video to be processed;
a frame sequence determining module 12, configured to determine a target image frame sequence according to the clipping time interval and the timestamp of each image frame in the image frame sequence of the video to be processed, and determine a target audio frame sequence according to the clipping time interval and the timestamp of each audio frame in the audio frame sequence of the video to be processed;
a playing position determining module 13, configured to determine, according to the timestamp of each image frame in the target image frame sequence and the timestamp of each audio frame in the target audio sequence, a playing position of each image frame in the target image frame sequence and a playing position of each audio frame in the target audio sequence in a clipped video corresponding to the video to be processed;
and a clipping video determining module 14, configured to obtain the clipping video according to the target audio frame sequence, the target image frame sequence, and playing positions corresponding to each image frame in the target image frame sequence and each audio frame in the target audio sequence.
In some possible embodiments, the frame sequence determining module 12 is configured to:
determining a timestamp matched with the starting time point of the cutting time interval in the timestamps of all the image frames in the image frame sequence of the video to be processed as the starting time point of the target image frame sequence;
determining a timestamp matched with the ending time point of the cutting time interval in the timestamps of the image frames as the ending time point of the target image frame sequence;
and determining the target image frame sequence from the image frame sequences according to the starting time point and the ending time point of the target image frame sequence.
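The matching of a timestamp to an interval endpoint described above could, for illustration, be realized as a nearest-timestamp search. The nearest-match policy and the `(timestamp, frame_data)` tuple shape are assumptions of this sketch, not requirements of the embodiment.

```python
def match_timestamp(timestamps, time_point):
    """Return the timestamp closest to time_point; one plausible
    reading of a timestamp 'matched with' an interval endpoint."""
    return min(timestamps, key=lambda ts: abs(ts - time_point))

def target_image_sequence(frames, interval):
    """frames: list of (timestamp, frame_data) sorted by timestamp.
    Picks the start/end timestamps matched to the clipping time
    interval, then keeps every frame between them inclusive."""
    timestamps = [ts for ts, _ in frames]
    start_ts = match_timestamp(timestamps, interval[0])
    end_ts = match_timestamp(timestamps, interval[1])
    return [f for f in frames if start_ts <= f[0] <= end_ts]
```

The same helper applies unchanged to the audio frame sequence, since only timestamps are consulted.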
In some possible embodiments, the frame sequence determining module 12 is configured to:
determining an image key frame in each image frame in the image frame sequence;
determining a timestamp matched with the starting time point of the cutting time interval in the timestamps of the key frames of the images in the video to be processed as the starting time point of the target image frame sequence;
and determining a timestamp matched with the end time point of the cutting time interval in the timestamps of the image key frames as the end time point of the target image frame sequence.
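One practical motivation for anchoring the sequence on image key frames is that decoding must begin at a key frame. A common policy, which is an assumption here since the embodiment only requires a "matched" timestamp, is to snap the start point to the latest key frame at or before it:

```python
def snap_start_to_keyframe(frames, start_point):
    """frames: list of (timestamp, is_keyframe) sorted by timestamp.
    Returns the timestamp of the latest key frame at or before
    start_point, falling back to the first key frame; starting the
    target sequence there keeps the cut video decodable."""
    key_ts = [ts for ts, is_key in frames if is_key]
    earlier = [ts for ts in key_ts if ts <= start_point]
    return max(earlier) if earlier else key_ts[0]
```

A symmetric rule (the earliest key frame at or after the end point) would keep the tail of the clip decodable as well.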
In some possible embodiments, the frame sequence determining module 12 is configured to:
determining a timestamp matched with the starting time point of the cutting time interval in the timestamps of all audio frames in the audio frame sequence of the video to be processed as the starting time point of the target audio frame sequence;
determining a timestamp matched with the ending time point of the cutting time interval in the timestamps of the audio frames as the ending time point of the target audio frame sequence;
and determining the target audio frame sequence from the audio frame sequence according to the starting time point and the ending time point of the target audio frame sequence.
In some possible embodiments, the frame sequence determining module 12 is configured to:
determining audio key frames in each audio frame in the audio frame sequence;
determining a timestamp matched with the starting time point of the cutting time interval in the timestamps of all the audio key frames in the video to be processed as the starting time point of a target audio frame sequence;
and determining the time stamp matched with the ending time point of the cutting time interval in the time stamps of the audio key frames as the ending time point of the target audio frame sequence.
In some possible embodiments, the play position determining module 13 is configured to:
and determining the sequence of the time stamps of each image frame of the target image frame sequence and each audio frame of the target audio sequence as the playing position of each image frame in the target image frame sequence and each audio frame in the target audio sequence in the cut video corresponding to the video to be processed.
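The ordering rule just stated can be pictured as a simple timestamp merge. The tie-breaking between an image frame and an audio frame with equal timestamps is an arbitrary choice in this sketch:

```python
def play_order(image_ts, audio_ts):
    """Tag each timestamp with its stream, then sort the union by
    timestamp: a frame's rank in the sorted result is its play
    position in the cut video. Python's sort is stable, so image
    frames precede audio frames on equal timestamps here."""
    tagged = [(ts, "image") for ts in image_ts] + [(ts, "audio") for ts in audio_ts]
    return sorted(tagged, key=lambda pair: pair[0])
```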
In some possible embodiments, the clipping video determining module 14 is further configured to:
and responding to the playing operation of the cut video, and playing the cut video according to the playing positions corresponding to each image frame and each audio frame in the cut video.
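A toy playback loop consistent with this description might dispatch frames in play-position order to an image sink and an audio sink. The sink callbacks and the optional wall-clock pacing are assumptions of the sketch, not part of the embodiment:

```python
import time

def play_cut_video(frames, render_image, play_audio, realtime=False):
    """frames: list of (timestamp, kind, payload) already arranged by
    play position. Each frame goes to the sink matching its kind;
    with realtime=True the loop sleeps until the frame's timestamp
    is due relative to the moment playback started."""
    started = time.monotonic()
    for ts, kind, payload in frames:
        if realtime:
            delay = ts - (time.monotonic() - started)
            if delay > 0:
                time.sleep(delay)
        (render_image if kind == "image" else play_audio)(payload)
```

With `realtime=False` the loop simply preserves the play-position order, which is sufficient for, e.g., remuxing the clip to a file.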
The video processing apparatus may be a computer program (including program code) running in a computer device; for example, the video processing apparatus may be application software configured to execute the implementations provided in the steps of fig. 2, which may specifically refer to the implementations provided in those steps and are not described herein again.
In some possible embodiments, the video processing apparatus provided in this embodiment may be implemented by a combination of hardware and software. By way of example, the video processing apparatus may be a processor in the form of a hardware decoding processor programmed to execute the video processing method provided in this embodiment; for example, the processor in the form of a hardware decoding processor may be implemented by one or more Application-Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field-Programmable Gate Arrays (FPGAs), or other electronic components.
In some possible embodiments, the video processing apparatus provided in this embodiment of the present application may be implemented in software. The video processing apparatus shown in fig. 9 may be software in the form of programs and plug-ins, and includes a series of modules: a clipping time interval determining module 11, a frame sequence determining module 12, a playing position determining module 13, and a clipping video determining module 14. These modules are configured to implement the video processing method provided in the embodiment of the present application.
In the embodiment of the application, the target image frame sequence is determined according to the clipping time interval and the timestamp of each image frame in the image frame sequence of the video to be processed, and the target audio frame sequence is determined according to the clipping time interval and the timestamp of each audio frame in the audio frame sequence of the video to be processed. Both target sequences therefore correspond to the same clipping time interval, which avoids the situation where the sound segment corresponding to the target audio frame sequence does not match the video picture corresponding to the target image frame sequence. Furthermore, after the playing positions of the image frames of the target image frame sequence and the audio frames of the target audio sequence in the cut video are determined, each frame can be placed at its corresponding position in the cut video, which further avoids audio-video desynchronization, improves the video quality of the cut video, and offers high applicability.
Referring to fig. 10, fig. 10 is a schematic structural diagram of an electronic device provided in an embodiment of the present application. As shown in fig. 10, the electronic device 1000 in the present embodiment may include: a processor 1001, a network interface 1004, and a memory 1005; the electronic device 1000 may further include a user interface 1003 and at least one communication bus 1002. The communication bus 1002 is used to enable connective communication between these components. The user interface 1003 may include a display screen (Display) and a keyboard (Keyboard); optionally, the user interface 1003 may also include standard wired and wireless interfaces. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM or a non-volatile memory, such as at least one disk memory. Optionally, the memory 1005 may also be at least one storage device located remotely from the processor 1001. As shown in fig. 10, the memory 1005, as a computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.
In the electronic device 1000 shown in fig. 10, the network interface 1004 may provide a network communication function; the user interface 1003 is an interface for providing a user with input; and the processor 1001 may be configured to call a device control application stored in the memory 1005 to implement the video processing method provided by the embodiment of the present application.
It should be understood that in some possible embodiments, the processor 1001 may be a Central Processing Unit (CPU); the processor may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The memory may include both read-only memory and random access memory, and provides instructions and data to the processor. A portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information.
In a specific implementation, the electronic device 1000 may execute the implementation manners provided in the steps in fig. 2 through the built-in functional modules, which may specifically refer to the implementation manners provided in the steps, and are not described herein again.
An embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and the computer program is executed by a processor to implement the method provided in each step in fig. 2, which may specifically refer to the implementation manner provided in each step, and is not described herein again.
The computer-readable storage medium may be the video processing apparatus described above or an internal storage unit of the electronic device, such as a hard disk or a memory of the electronic device. The computer-readable storage medium may also be an external storage device of the electronic device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the electronic device. The computer-readable storage medium may further include a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), and the like. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the electronic device. The computer-readable storage medium is used for storing the computer program and other programs and data required by the electronic device, and may also be used to temporarily store data that has been output or is to be output.
Embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the electronic device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided by the steps of fig. 2.
The terms "first", "second", and the like in the claims and in the description and drawings of the present application are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or electronic device that comprises a list of steps or elements is not limited to only those steps or elements recited, but may alternatively include other steps or elements not expressly listed or inherent to such process, method, article, or electronic device. Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments. The term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Those of ordinary skill in the art will appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or a combination of both. To clearly illustrate this interchangeability of hardware and software, the components and steps of the various examples have been described above generally in terms of their functionality. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The above disclosure describes only preferred embodiments of the present application and is, of course, not intended to limit the scope of the claims of the present application; equivalent modifications made according to the claims of the present application therefore still fall within the scope of the present application.

Claims (10)

1. A method of video processing, the method comprising:
determining a clipping time interval of a video to be processed;
determining a target image frame sequence according to the clipping time interval and the timestamp of each image frame in the image frame sequence of the video to be processed, and determining a target audio frame sequence according to the clipping time interval and the timestamp of each audio frame in the audio frame sequence of the video to be processed;
determining the playing positions of the image frames in the target image frame sequence and the audio frames in the target audio sequence in the clipped video corresponding to the video to be processed according to the timestamps of the image frames in the target image frame sequence and the timestamps of the audio frames in the target audio sequence;
and obtaining the clipped video according to the target audio frame sequence, the target image frame sequence, and the playing positions corresponding to the image frames in the target image frame sequence and the audio frames in the target audio sequence.
2. The method of claim 1, wherein determining a target image frame sequence according to the clipping time interval and the timestamp of each image frame in the image frame sequence of the video to be processed comprises:
determining a timestamp matched with the starting time point of the clipping time interval among the timestamps of the image frames in the image frame sequence of the video to be processed as the starting time point of the target image frame sequence;
determining a timestamp matched with the ending time point of the clipping time interval among the timestamps of the image frames as the ending time point of the target image frame sequence;
and determining the target image frame sequence from the image frame sequence according to the starting time point and the ending time point of the target image frame sequence.
3. The method of claim 2, further comprising:
determining image key frames among the image frames in the image frame sequence;
wherein the determining, as the starting time point of the target image frame sequence, a timestamp matched with the starting time point of the clipping time interval among the timestamps of the image frames in the image frame sequence of the video to be processed comprises:
determining a timestamp matched with the starting time point of the clipping time interval among the timestamps of the image key frames in the video to be processed as the starting time point of the target image frame sequence;
and the determining, as the ending time point of the target image frame sequence, a timestamp matched with the ending time point of the clipping time interval among the timestamps of the image frames comprises:
determining the timestamp matched with the ending time point of the clipping time interval among the timestamps of the image key frames as the ending time point of the target image frame sequence.
4. The method of claim 1, wherein determining a target audio frame sequence according to the clipping time interval and the timestamp of each audio frame in the audio frame sequence of the video to be processed comprises:
determining a timestamp matched with the starting time point of the clipping time interval among the timestamps of the audio frames in the audio frame sequence of the video to be processed as the starting time point of the target audio frame sequence;
determining a timestamp matched with the ending time point of the clipping time interval among the timestamps of the audio frames as the ending time point of the target audio frame sequence;
and determining the target audio frame sequence from the audio frame sequence according to the starting time point and the ending time point of the target audio frame sequence.
5. The method of claim 4, further comprising:
determining audio key frames among the audio frames in the audio frame sequence;
wherein the determining, as the starting time point of the target audio frame sequence, a timestamp matched with the starting time point of the clipping time interval among the timestamps of the audio frames in the audio frame sequence of the video to be processed comprises:
determining a timestamp matched with the starting time point of the clipping time interval among the timestamps of the audio key frames in the video to be processed as the starting time point of the target audio frame sequence;
and the determining, as the ending time point of the target audio frame sequence, a timestamp matched with the ending time point of the clipping time interval among the timestamps of the audio frames comprises:
determining the timestamp matched with the ending time point of the clipping time interval among the timestamps of the audio key frames as the ending time point of the target audio frame sequence.
6. The method of claim 1, wherein determining the playing position of each image frame in the target image frame sequence and each audio frame in the target audio sequence in the clipped video corresponding to the video to be processed according to the timestamp of each image frame in the target image frame sequence and the timestamp of each audio frame in the target audio sequence comprises:
determining the timestamp order of the image frames of the target image frame sequence and the audio frames of the target audio sequence as the playing positions of the image frames in the target image frame sequence and the audio frames in the target audio sequence in the clipped video corresponding to the video to be processed.
7. The method of claim 1, further comprising:
in response to a playing operation on the clipped video, playing the clipped video according to the playing positions corresponding to the image frames and the audio frames in the clipped video.
8. A video processing apparatus, characterized in that the apparatus comprises:
a clipping time interval determining module, configured to determine a clipping time interval of a video to be processed;
a frame sequence determining module, configured to determine a target image frame sequence according to the clipping time interval and the timestamp of each image frame in the image frame sequence of the video to be processed, and determine a target audio frame sequence according to the clipping time interval and the timestamp of each audio frame in the audio frame sequence of the video to be processed;
a playing position determining module, configured to determine, according to the timestamp of each image frame in the target image frame sequence and the timestamp of each audio frame in the target audio sequence, the playing position of each image frame in the target image frame sequence and each audio frame in the target audio sequence in the clipped video corresponding to the video to be processed;
and a clipping video determining module, configured to obtain the clipped video according to the target audio frame sequence, the target image frame sequence, and the playing positions corresponding to the image frames in the target image frame sequence and the audio frames in the target audio sequence.
9. An electronic device comprising a processor and a memory, the processor and the memory being interconnected, wherein:
the memory is used for storing a computer program;
the processor is configured to perform the method of any one of claims 1 to 7 when the computer program is invoked.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the method of any one of claims 1 to 7.
CN202110413773.2A 2021-04-16 2021-04-16 Video processing method, device, equipment and storage medium Pending CN115225941A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110413773.2A CN115225941A (en) 2021-04-16 2021-04-16 Video processing method, device, equipment and storage medium


Publications (1)

Publication Number Publication Date
CN115225941A true CN115225941A (en) 2022-10-21

Family

ID=83604535


Similar Documents

Publication Publication Date Title
CN109168078B (en) Video definition switching method and device
US20170034263A1 (en) Synchronized Playback of Streamed Audio Content by Multiple Internet-Capable Portable Devices
CN107509100A (en) Audio and video synchronization method, system, computer installation and computer-readable recording medium
CN110992993B (en) Video editing method, video editing device, terminal and readable storage medium
JP6385447B2 (en) Video providing method and video providing system
CN111601136B (en) Video data processing method and device, computer equipment and storage medium
EP3361738A1 (en) Method and device for stitching multimedia files
WO2019114330A1 (en) Video playback method and apparatus, and terminal device
CN107566889A (en) Audio stream flow rate error processing method, device, computer installation and computer-readable recording medium
WO2020155964A1 (en) Audio/video switching method and apparatus, and computer device and readable storage medium
CN107333163A (en) A kind of method for processing video frequency and device, a kind of terminal and storage medium
CN104822008A (en) Video synchronizing method and device
CN109905749B (en) Video playing method and device, storage medium and electronic device
WO2021052130A1 (en) Video processing method, apparatus and device, and computer-readable storage medium
WO2023279960A1 (en) Action processing method and apparatus for virtual object, and storage medium
WO2017157135A1 (en) Media information processing method, media information processing device and storage medium
CN109672837A (en) Equipment of taking photo by plane real-time video method for recording, mobile terminal and computer storage medium
CN112104909A (en) Interactive video playing method and device, computer equipment and readable storage medium
WO2023082830A1 (en) Video editing method and apparatus, computer device, and storage medium
CN112422844A (en) Method, device and equipment for adding special effect in video and readable storage medium
CN105578224A (en) Multimedia data acquisition method, device, smart television and set-top box
CN108156498B (en) Audio and video synchronization method and device
CN116095380B (en) Vibration processing method and related equipment
CN112287771A (en) Method, apparatus, server and medium for detecting video event
CN115225941A (en) Video processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code
Ref country code: HK
Ref legal event code: DE
Ref document number: 40075608
Country of ref document: HK
SE01 Entry into force of request for substantive examination