CN114630148A

CN114630148A - Video processing method and device

Info

Publication number: CN114630148A
Application number: CN202011442524.8A
Authority: CN
Inventors: 汤然; 王一; 郑龙; 何钧
Original assignee: Shanghai Bilibili Technology Co Ltd
Current assignee: Shanghai Bilibili Technology Co Ltd
Priority date: 2020-12-11
Filing date: 2020-12-11
Publication date: 2022-06-14
Anticipated expiration: 2040-12-11
Also published as: CN114630148B

Abstract

The present specification provides a video processing method and apparatus, wherein the video processing method includes: determining the number of audio frames before each video frame of the initial video, and placing the number of audio frames before each video frame in the corresponding video frame to generate a video to be transcoded; sending the video to be transcoded to a preset video transcoding system for video transcoding, and receiving a transcoded video returned by the video transcoding system; and acquiring the number of audio frames placed in each video frame of the transcoded video, and determining the transcoding result of the video to be transcoded based on the number of audio frames placed in each video frame of the transcoded video.

Description

Video processing method and device

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a video processing method. The present specification also relates to a video processing apparatus, a computing device, and a computer-readable storage medium.

Background

At present, in the video industry, a user's original video is generally transcoded into videos of different definitions, so that viewers can select the definition at which to play the video. However, transcoding reprocesses the user's original video, and compatibility problems among the various packaging and encoding protocols can cause defects during transcoding. For example, the audio and picture of the transcoded video may be out of sync, which greatly degrades the user's viewing experience.

Disclosure of Invention

In view of this, the embodiments of the present disclosure provide a video processing method. The present specification also relates to a video processing apparatus, a computing device, and a computer-readable storage medium, so as to solve the technical defect in the prior art that audio and video asynchronization occurs in the transcoded video.

According to a first aspect of embodiments of the present specification, there is provided a video processing method including:

determining the number of audio frames before each video frame of the initial video, and placing the number of audio frames before each video frame in the corresponding video frame to generate a video to be transcoded;

sending the video to be transcoded to a preset video transcoding system for video transcoding, and receiving a transcoded video returned by the video transcoding system;

and acquiring the number of audio frames placed in each video frame of the transcoded video, and determining the transcoding result of the video to be transcoded based on the number of audio frames placed in each video frame of the transcoded video.

According to a second aspect of embodiments herein, there is provided a video processing apparatus comprising:

the video generation module is configured to determine the number of audio frames before each video frame of the initial video, place the number of audio frames before each video frame in the corresponding video frame, and generate a video to be transcoded;

the video transcoding module is configured to send the video to be transcoded to a preset video transcoding system for video transcoding and receive a transcoded video returned by the video transcoding system;

the transcoding result determining module is configured to obtain the number of audio frames placed in each video frame of the transcoded video, and determine the transcoding result of the video to be transcoded based on the number of audio frames placed in each video frame of the transcoded video.

According to a third aspect of embodiments herein, there is provided a computing device comprising a memory, a processor and computer instructions stored on the memory and executable on the processor, the processor implementing the steps of the video processing method when executing the instructions.

According to a fourth aspect of embodiments herein, there is provided a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of any of the video processing methods.

The video processing method provided by the specification comprises the steps of determining the number of audio frames before each video frame of an initial video, placing the number of audio frames before each video frame in the corresponding video frame, and generating a video to be transcoded; sending the video to be transcoded to a preset video transcoding system for video transcoding, and receiving a transcoded video returned by the video transcoding system; and acquiring the number of audio frames placed in each video frame of the transcoded video, and determining the transcoding result of the video to be transcoded based on the number of audio frames placed in each video frame of the transcoded video.

According to the embodiments of this specification, the number of audio frames before each video frame is placed in the corresponding video frame before transcoding. After transcoding, it is determined whether the actual number of audio frames before each video frame of the transcoded video is consistent with the number placed in that video frame. In this way, whether audio and picture are out of sync after transcoding can be determined quickly and accurately, so that the desynchronization can be handled subsequently.

Drawings

Fig. 1 is a flowchart of a video processing method according to an embodiment of the present disclosure;

Fig. 2 is a schematic view of a video frame of a video to be transcoded in a video processing method provided in an embodiment of the present specification;

Fig. 3 is a schematic diagram of a video frame of a transcoded video in a video processing method provided by an embodiment of the present specification;

Fig. 4 is a schematic diagram illustrating the number of audio frames per video frame of a transcoded video in a video processing method provided by an embodiment of the present specification;

Fig. 5 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present disclosure;

Fig. 6 is a block diagram of a computing device according to an embodiment of the present disclosure.

Detailed Description

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present description. This description may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein, as those skilled in the art will be able to make and use the present disclosure without departing from the spirit and scope of the present disclosure.

The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, etc. may be used herein in one or more embodiments to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first can also be referred to as a second and, similarly, a second can also be referred to as a first without departing from the scope of one or more embodiments of the present description. The word "if" as used herein may be interpreted as "at the time of", "when", or "in response to a determination", depending on the context.

First, the terms involved in one or more embodiments of the present specification are explained.

Transcoding: re-encoding audio and video.

Frame loss: video transcoding causes some video frames to be lost.

Frame dropping: video transcoding causes some video frames to be lost and replaced by copies of adjacent frames, so that the picture may appear unsmooth, as if stuck, during that period.

Video watermarking: adding text or pictures to the video picture, such as a station logo on the video.

ffmpeg: a set of open-source computer programs that can be used to record and convert digital audio and video and to turn them into streams.

AAC (Advanced Audio Coding): an MPEG-2 based audio coding technique.

At present, it is difficult for a machine to actively identify audio-video desynchronization in a transcoded video, and such desynchronization gives viewers a poor experience. A method is therefore needed that quickly identifies audio-video desynchronization in the transcoded video, so that it can be handled automatically and a better viewing experience can be provided to the user.

In this specification, a video processing method is provided, and this specification simultaneously relates to a video processing apparatus, a computing device, and a computer-readable storage medium, which are described in detail one by one in the following embodiments.

Referring to fig. 1, fig. 1 shows a flowchart of a video processing method according to an embodiment of the present specification, which specifically includes the following steps:

Step 102: determining the number of audio frames before each video frame of the initial video, and placing the number of audio frames before each video frame in the corresponding video frame to generate the video to be transcoded.

The video processing method provided by the embodiments of the application is applied to a scenario of detecting the audio-video quality of a video transcoding system. First, a special initial video is generated based on user requirements, and the number of audio frames before each video frame in the initial video is determined. Whether the video transcoding system has a transcoding quality problem is then determined by comparing the number of audio frames before each video frame in the initial video with the number of audio frames before each video frame in the transcoded video. If, for some video frame, the number of audio frames before it differs between the initial video and the transcoded video, the video transcoding system has a transcoding quality problem: the transcoding may have caused frame loss, frame dropping, or audio-video desynchronization, and the video transcoding system needs to be repaired.

An initial video may be understood to be any type of video of any length, such as a short video or a video of a television episode, a video of an entertainment program, etc.

In order to detect whether frame loss, frame dropping, or audio-video desynchronization exists in a video that the transcoding system transcodes for a user, the initial video needs to be checked; the type, duration, and other properties of the initial video are not limited in any way.

Specifically, the determining the number of audio frames before each video frame of the initial video includes:

acquiring the initial video, and determining the number of audio frames before each video frame of the initial video by a preset audio frame extraction tool.

The preset audio frame extraction tool may be understood as a computer program, such as the ffmpeg tool, that extracts audio frames in the course of video transcoding; the extracted audio frames can then be counted to determine their number.

In specific implementation, the server acquires, as the initial video, a video whose audio-video synchronization needs to be checked, and determines the number of audio frames before each video frame by using the preset audio frame extraction tool. For example, the number of audio frames before the first video frame in the initial video is 3, the number before the second video frame is 9, and the number before the third video frame is 13. It should be noted that the number of audio frames before each sequentially arranged video frame is a cumulative statistic, so the numbers increase from frame to frame. If the numbers in the initial video do not increase, for example 3 before the first video frame, 13 before the second, and 9 before the third, the preset audio frame extraction tool may be damaged; in that case the tool is reacquired and audio frame extraction is performed on the initial video again to determine the number of audio frames before each video frame.
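The cumulative count and the increase check described above can be sketched as follows. This is a minimal illustration only, assuming the presentation timestamps of the audio frames and video frames have already been extracted by some tool and that the audio timestamps are sorted; the function names `count_audio_frames_before` and `is_non_decreasing` are hypothetical, and treating equal consecutive counts as acceptable is likewise an assumption.

```python
from bisect import bisect_left
from typing import List, Sequence


def count_audio_frames_before(
    video_pts: Sequence[float], audio_pts: Sequence[float]
) -> List[int]:
    """For each video frame, count the audio frames with an earlier timestamp.

    Assumption: audio_pts is sorted in ascending order (seconds).
    """
    # bisect_left returns how many audio timestamps are strictly smaller than t.
    return [bisect_left(audio_pts, t) for t in video_pts]


def is_non_decreasing(counts: Sequence[int]) -> bool:
    """A cumulative statistic over frames in display order must never decrease."""
    return all(a <= b for a, b in zip(counts, counts[1:]))


if __name__ == "__main__":
    audio = [0.00, 0.02, 0.04, 0.06, 0.08, 0.10]
    video = [0.05, 0.09, 0.20]
    counts = count_audio_frames_before(video, audio)
    print(counts, is_non_decreasing(counts))
```

If `is_non_decreasing` returns False for the counts of the initial video, as for the sequence 3, 13, 9 in the example above, the extraction step would be repeated before any watermarking takes place.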

In the embodiments of the application, the audio frames before each video frame in the initial video are extracted by the audio frame extraction tool and counted, so that whether the initial video has been transcoded successfully, with audio and picture in sync, can subsequently be determined from these numbers.

In order to subsequently compare the audio frames before each video frame in the initial video with those before each video frame in the transcoded video and judge whether the numbers for corresponding video frames are the same, the number of audio frames before each video frame is recorded. Specifically, the placing the number of audio frames before each video frame in the corresponding video frame to generate a video to be transcoded includes:

placing the number of audio frames before each video frame at a preset position of the corresponding video frame in the form of a watermark to generate the video to be transcoded.

The preset position may be understood as a position, set in advance, at which the number of audio frames is placed in the video frame. For example, the number of audio frames may be placed at the upper left corner, the upper right corner, the middle, the lower left corner, or any other position of the whole video frame, which is not limited in this embodiment of the present application.

It should be noted that the manner of placing the number of audio frames at the preset position of the video frame may be in the form of a watermark, or may be in other recording manners, and is not limited herein.

In specific implementation, the number of audio frames before each video frame of the initial video is counted and placed, in the form of a watermark, at the preset position of the corresponding video frame. The resulting video, in which every video frame carries its watermarked number of audio frames, is used as the video to be transcoded for the subsequent video transcoding.
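As one possible illustration of the preset position, the sketch below converts a named position into the pixel coordinates at which a watermark label could be drawn. The function name `watermark_origin`, the position names, and the `margin` parameter are hypothetical choices made for this example and are not taken from the embodiment; the actual drawing of the label is left to whatever watermarking tool is used.

```python
from typing import Dict, Tuple


def watermark_origin(
    position: str,
    frame_w: int,
    frame_h: int,
    label_w: int,
    label_h: int,
    margin: int = 8,
) -> Tuple[int, int]:
    """Return the top-left pixel (x, y) at which to draw the count label.

    All names and the margin default are illustrative assumptions.
    """
    anchors: Dict[str, Tuple[int, int]] = {
        "upper_left": (margin, margin),
        "upper_right": (frame_w - label_w - margin, margin),
        "lower_left": (margin, frame_h - label_h - margin),
        "center": ((frame_w - label_w) // 2, (frame_h - label_h) // 2),
    }
    if position not in anchors:
        raise ValueError(f"unknown preset position: {position!r}")
    return anchors[position]


if __name__ == "__main__":
    # A 200x40 label on a 1920x1080 frame, placed at the upper left corner.
    print(watermark_origin("upper_left", 1920, 1080, 200, 40))
```

Using the same preset position before and after transcoding keeps the value easy to locate when the transcoded video frames are read back.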

Referring to fig. 2, fig. 2 is a schematic view of a video frame of a video to be transcoded in a video processing method provided in an embodiment of the present application.

Fig. 2 is a schematic diagram of one video frame in the video to be transcoded. The value representing the number of audio frames is placed at the upper left corner of the video frame; the value 3506 in the figure indicates that the number of audio frames before this video frame is 3506. It should be noted that every video frame in the video to be transcoded carries the number of audio frames before it, which facilitates the subsequent comparison with the number of audio frames before each video frame in the transcoded video.

In the embodiment of the application, the number of the audio frames before each video frame is placed in the video frame, and the formed video is used as the video to be transcoded, so that whether the sound and the picture of the video are synchronous or not can be quickly and accurately determined through the number of the audio frames placed on the video frame subsequently.

Step 104: sending the video to be transcoded to a preset video transcoding system for video transcoding, and receiving the transcoded video returned by the video transcoding system.

The preset video transcoding system includes any video transcoding system whose transcoding quality is to be tested, for example one that uses AAC technology, which is not limited in this specification.

Specifically, after an initial video comprising a plurality of video frames is generated, the initial video is sent to a specific video transcoding system needing transcoding quality testing for transcoding, and then a transcoded video generated after the initial video is transcoded and returned by the video transcoding system is received.

In specific implementation, the number of audio frames before each video frame in the initial video is recorded and placed at the preset position of each video frame, the resulting video to be transcoded is sent to the preset video transcoding system for video transcoding, and the transcoded video returned by the video transcoding system is received.

Step 106: acquiring the number of audio frames placed in each video frame of the transcoded video, and determining the transcoding result of the video to be transcoded based on the number of audio frames placed in each video frame of the transcoded video.

Specifically, after the transcoded video is obtained, the number of audio frames placed in each video frame of the transcoded video is obtained, and a transcoding result of the transcoded video is determined based on the number of audio frames placed in the transcoded video and the actual number of audio frames before each video frame in the transcoded video.

Because audio and video may fall out of sync while the initial video is being transcoded, the video frames and audio frames of the transcoded video may be out of order. The audio frames before each video frame of the transcoded video are therefore counted and compared with the number of audio frames before each video frame before transcoding, so as to determine the transcoding result of the transcoded video.

In specific implementation, the actual number of audio frames before each video frame in the transcoded video can be recorded, and the actual number of audio frames is placed at a preset position of each video frame in the transcoded video and compared with the number of audio frames in the video to be transcoded.

Referring to fig. 3, fig. 3 is a schematic diagram illustrating a video frame of a transcoded video in a video processing method according to an embodiment of the present application.

Fig. 3 is a schematic diagram of one video frame in the transcoded video. The current actual number of audio frames is placed at the upper left corner of the video frame; the value 3507 in the figure indicates that the current number of audio frames before this video frame is 3507. It should be noted that the actual number of audio frames before each video frame of the transcoded video is counted and placed at the preset position of that video frame, which facilitates the subsequent comparison with the number of audio frames in the video to be transcoded.

The number of audio frames for each video frame of the transcoded video is compared with the number that was placed at the upper left corner of the corresponding video frame of the video to be transcoded before transcoding, to determine whether the numbers of audio frames before the video frames differ and, further, whether the video exhibits audio-video desynchronization.

In order to record the comparison between the audio frames before each video frame before transcoding and those after transcoding, and thereby determine the transcoding result of the video to be transcoded, the determining a transcoding result of the video to be transcoded based on the number of audio frames placed in each video frame of the transcoded video specifically includes:

arranging the number of audio frames placed in each video frame of the video to be transcoded frame by frame to form a first number sequence, and arranging the number of audio frames placed in each video frame of the transcoded video frame by frame to form a second number sequence;

comparing the first number sequence with the second number sequence, and determining that the video to be transcoded is transcoded successfully under the condition that the first number sequence and the second number sequence match completely.

The first number sequence may be understood as a number sequence formed by arranging the number of audio frames placed in each video frame of the video to be transcoded frame by frame, for example, the number of audio frames placed in each video frame of the video to be transcoded is arranged frame by frame according to the arrangement sequence of all video frames in the video to be transcoded to form a first number sequence [ 3506, 3507, 3510 ].

At the same time, the number of audio frames placed in each video frame of the transcoded video is arranged frame by frame to form a second number sequence, and following the above example, the transcoded video still comprises three video frames arranged in sequence: video frame 1, video frame 2, and video frame 3, then the actual audio frames before each video frame in the transcoded video are arranged frame by frame according to the arrangement order of the video frames, forming a second number sequence [ 3506, 3507, 3510 ].

The first number sequence is compared with the second number sequence. If the two sequences match completely, it is determined that the initial video has been transcoded successfully. Successful transcoding indicates that the transcoding quality of the current video transcoding system is good: the video frames of the transcoded video are the same as those before transcoding, no frame loss or frame dropping has occurred, and audio and picture are in sync.
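The complete-match check on the two number sequences can be sketched as below. This is only an illustration under the assumption that both sequences are already available as lists of integers; the function name `sequences_fully_match` is hypothetical.

```python
from typing import Sequence


def sequences_fully_match(first: Sequence[int], second: Sequence[int]) -> bool:
    """Return True only if both sequences have the same length and every
    position holds the same number of audio frames."""
    return len(first) == len(second) and all(
        a == b for a, b in zip(first, second)
    )


if __name__ == "__main__":
    first_sequence = [3506, 3507, 3510]   # video to be transcoded
    second_sequence = [3506, 3507, 3510]  # transcoded video
    print(sequences_fully_match(first_sequence, second_sequence))
```

Checking the lengths first means that a lost or repeated video frame is reported as a mismatch even when the overlapping positions happen to agree.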

In the embodiments of the application, the counted number of audio frames before each video frame of the initial video is compared with the number of audio frames before each video frame of the transcoded video, and whether audio and picture are in sync in the transcoding result of the initial video can be judged quickly and accurately from the comparison result, which improves the user experience.

Specifically, referring to fig. 4, fig. 4 is a schematic diagram illustrating the number of audio frames of each video frame of the transcoded video in a video processing method according to an embodiment of the present application.

The transcoded video of fig. 4 is illustrated by taking three video frames as an example, and includes video frame 1, video frame 2 and video frame 3, where the number of audio frames placed in video frame 1 is 3506, the number of audio frames placed in video frame 2 is 3507, and the number of audio frames placed in video frame 3 is 3510.

The preset position of the audio frame number placed in the video frames is the upper left corner, and the audio frame number is arranged at the upper left corner of each video frame and used for displaying the audio frame number included in each video frame; that is, the upper left corner of video frame 1 is placed with the number of audio frames 3506 contained in video frame 1, the upper left corner of video frame 2 is placed with the number of audio frames 3507 contained in video frame 2, and the upper left corner of video frame 3 is placed with the number of audio frames 3510 contained in video frame 3.

Specifically, the determining that the transcoding of the video to be transcoded is successful includes:

acquiring the number of current audio frames before each video frame of the transcoded video;

and under the condition that the number of the current audio frames is the same as the number of the audio frames placed in each video frame of the transcoded video, determining that the video to be transcoded is transcoded successfully.

In specific implementation, after the number of audio frames placed in each video frame of the transcoded video is determined, the current number of audio frames before each video frame of the transcoded video is acquired and compared with the number placed at the upper left corner of that video frame. If the current number of audio frames before video frame 1 is 3506, before video frame 2 is 3507, and before video frame 3 is 3510, it can be determined that no frame loss, frame dropping, or audio-video desynchronization has occurred in video frame 1, video frame 2, and video frame 3.
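The per-frame comparison between the placed number and the current number can be sketched as follows. The record type `TranscodedFrame` and the function `out_of_sync_frames` are hypothetical names introduced for illustration; how the placed value is read back from the video frame (for example by recognizing the watermark) is outside this sketch and is assumed to have been done already.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class TranscodedFrame:
    index: int          # 1-based position of the video frame in the transcoded video
    placed_count: int   # number read from the preset position of the video frame
    current_count: int  # audio frames actually counted before this video frame


def out_of_sync_frames(frames: Iterable[TranscodedFrame]) -> List[int]:
    """Return the indices of video frames whose placed and current counts differ."""
    return [f.index for f in frames if f.placed_count != f.current_count]


if __name__ == "__main__":
    frames = [
        TranscodedFrame(1, 3506, 3506),
        TranscodedFrame(2, 3507, 3507),
        TranscodedFrame(3, 3510, 3510),
    ]
    # An empty list corresponds to "transcoded successfully" in the text above.
    print(out_of_sync_frames(frames))
```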

In the embodiments of the application, the number of audio frames is placed in each video frame of the transcoded video so that it can be determined more conveniently, and the current number of audio frames acquired before each video frame of the transcoded video can be compared quickly and accurately with the number placed in that video frame.

In addition, after the comparing the first number sequence with the second number sequence, the method further includes:

under the condition that the first number sequence does not match the second number sequence, determining the video frames corresponding to the unmatched numbers in the first number sequence and the second number sequence, and taking those video frames as verification video frames;

and under the condition that the verification video frame meets the preset transcoding condition, determining that the transcoding of the video to be transcoded is successful.

The preset transcoding condition can be understood as a transcoding video condition which is met after the video to be transcoded is transcoded.

Following the above example, suppose the transcoded video still includes three video frames, arranged in the order video frame 1, video frame 3, and video frame 2. The numbers of audio frames before each video frame in the transcoded video are arranged frame by frame according to the arrangement order of the video frames in the transcoded video, and the resulting second number sequence is [ 3506, 3510, 3507 ].

The first number sequence is compared with the second number sequence, and it is determined that they do not match. At this point it can be determined that transcoding of the initial video has failed, that is, the order of video frame 2 and video frame 3 is disordered after transcoding. This indicates that the transcoding quality of the current video transcoding system is poor, that the transcoded video exhibits audio-video desynchronization, and that the video transcoding system subsequently needs to be repaired. In this way, the problematic video frames can be located quickly from the unmatched numbers, and problem analysis can be based on those video frames when the video transcoding system is repaired. In addition, the comparison may start from the last video frame: if the current number of audio frames before the last video frame of the transcoded video does not match the number of audio frames placed in that video frame, it is determined that audio-video desynchronization may have occurred and that the video transcoding system subsequently needs to be repaired.
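Locating the problematic video frames from the unmatched numbers can be sketched as a position-by-position scan. This is a minimal illustration; the function name `find_verification_frames` and the tuple layout of its result are hypothetical. Positions are 1-based to follow the numbering of video frames in the text, and a missing value (when the sequences differ in length) is reported as `None`.

```python
from itertools import zip_longest
from typing import List, Optional, Sequence, Tuple

# (1-based position, value in first sequence, value in second sequence)
Mismatch = Tuple[int, Optional[int], Optional[int]]


def find_verification_frames(
    first: Sequence[int], second: Sequence[int]
) -> List[Mismatch]:
    """Return every position at which the two number sequences disagree."""
    mismatches: List[Mismatch] = []
    for pos, (a, b) in enumerate(zip_longest(first, second), start=1):
        if a != b:
            mismatches.append((pos, a, b))
    return mismatches


if __name__ == "__main__":
    first_sequence = [3506, 3507, 3510]
    second_sequence = [3506, 3510, 3507]
    for pos, before, after in find_verification_frames(first_sequence, second_sequence):
        print(f"position {pos}: placed {before}, found {after}")
```

For the example sequences the scan reports positions 2 and 3, which corresponds to video frame 2 and video frame 3 having exchanged places.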

In specific implementation, the video frames corresponding to the unmatched number in the first number sequence and the second number sequence are used as verification video frames, whether the verification video frames meet the preset transcoding condition is judged, and the successful transcoding of the video to be transcoded is determined under the condition that the verification video frames meet the transcoding condition.

For example, the video to be transcoded includes three sequentially arranged video frames: video frame 1, video frame 2, and video frame 3, and the transcoded video includes two sequentially arranged video frames: video frame 1 and video frame 3. Video frame 2 is then the lost video frame; it is taken as the verification video frame, and it is judged whether video frame 2 satisfies the preset transcoding condition. It should be noted that the preset transcoding condition may be that the video effect is better after video frame 2 is lost, in which case video frame 2 is a video frame that may be lost. Alternatively, the transcoded video includes four sequentially arranged video frames in which video frame 2 is transcoded repeatedly (that is, frame dropping occurs at video frame 2) in order to enhance the video effect, in which case the preset transcoding condition is also satisfied.
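The two cases in this example, a lost video frame and a repeated video frame, can be told apart by counting how often each placed number appears before and after transcoding. The sketch below is an illustration only; the function name `classify_verification_frames` is hypothetical, and it assumes that each video frame of the video to be transcoded carries a distinct placed number, so that the number identifies the video frame.

```python
from collections import Counter
from typing import Dict, List, Sequence


def classify_verification_frames(
    first: Sequence[int], second: Sequence[int]
) -> Dict[str, List[int]]:
    """Group placed numbers by what happened to their video frame.

    Assumption: placed numbers are unique within the video to be transcoded.
    """
    before = Counter(first)
    after = Counter(second)
    # A Counter returns 0 for a missing key, so absent numbers compare cleanly.
    lost = [n for n in before if after[n] < before[n]]
    repeated = [n for n in after if after[n] > before[n]]
    return {"lost": lost, "repeated": repeated}


if __name__ == "__main__":
    first_sequence = [3506, 3507, 3510]
    print(classify_verification_frames(first_sequence, [3506, 3510]))
    print(classify_verification_frames(first_sequence, [3506, 3507, 3507, 3510]))
```

Whether a lost or repeated video frame is acceptable is still decided by the preset transcoding condition; the sketch only identifies which video frames need that decision.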

In practical application, when the verification video frame satisfies the preset transcoding condition, the video to be transcoded can be determined to be transcoded successfully even if frame loss, frame dropping, or the like has occurred in the transcoded video. When the verification video frame does not satisfy the preset transcoding condition, that is, frame loss or frame dropping has occurred in the transcoded video, it can be determined that the transcoded video very probably exhibits audio-video desynchronization, and the number of audio frames before each video frame does not need to be recorded again for comparison.

In the embodiments of the application, the counted number of audio frames before each video frame of the initial video is compared with the number of audio frames before each video frame of the transcoded video, and whether the transcoding result of the initial video is correct can be judged quickly and accurately from the comparison result. When the transcoding result is not correct, whether the transcoded video meets the user's requirement is determined by judging whether the unmatched video frames satisfy the preset transcoding condition, and the transcoding system is then repaired in a targeted manner based on that determination.

Further, after the comparing of the first number sequence with the second number sequence, the method further comprises:

under the condition that the last value of the first number sequence does not match the last value of the second number sequence, determining, according to a preset judgment mode, the video frames corresponding to the unmatched numbers in the first number sequence and the second number sequence;

taking the video frames corresponding to the number of mismatches between the first number sequence and the second number sequence as verification video frames;

The preset determination manner may be understood as a manner of determining video frames corresponding to the number of mismatches between the first number sequence and the second number sequence, for example, a manner of performing determination by using a bisection method, and the preset determination manner is not limited in this specification.

Specifically, the server compares the last value in the first number sequence, formed from the video to be transcoded, with the last value in the second number sequence, formed from the transcoded video. If the last values of the two sequences do not match, it can be preliminarily judged that sound and picture are out of sync in the transcoded video. The video frames corresponding to the unmatched numbers in the first number sequence and the second number sequence are then determined according to the preset judgment mode and taken as verification video frames, and transcoding of the video to be transcoded is determined to be successful if the verification video frames meet the preset transcoding condition.

For example, the video to be transcoded includes five video frames: video frames 1, 2, 3, 4 and 5. The first number sequence formed from the video to be transcoded is [1024, 2504, 3406, 3456, 3510], and the second number sequence formed from the transcoded video is [1024, 2504, 3406, 3510, 3456]. Comparing the last value of the first number sequence, 3510, with the last value of the second number sequence, 3456, shows that the last values of the two sequences do not match, so it can be judged that sound and picture are out of sync in the transcoded video. The server may then use bisection: it first compares the value for video frame 3 in the first number sequence, 3406, with the value for video frame 3 in the second number sequence, 3406, and determines that the number of audio frames before video frame 3 matches. It then compares the value for video frame 4 in the first number sequence, 3456, with the value for video frame 4 in the second number sequence, 3510, and determines that video frame 4 is the video frame at which the number of audio frames does not match. Video frame 4 is taken as the verification video frame, and if it meets the preset transcoding condition, transcoding of the video to be transcoded is determined to be successful.
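The bisection step in this example can be sketched as follows. This is only an illustration, assuming that once the two sequences start to differ they keep differing up to the last entry (as they do in the example), which is what makes bisection valid; the function name `first_mismatch` is hypothetical.

```python
from typing import Optional, Sequence


def first_mismatch(first: Sequence[int], second: Sequence[int]) -> Optional[int]:
    """Return the 0-based index of the first mismatching entry, or None.

    Assumption: every entry after the first mismatch also mismatches.
    """
    n = min(len(first), len(second))
    if n == 0 or first[n - 1] == second[n - 1]:
        return None  # last values match, so bisection has nothing to locate
    lo, hi = 0, n - 1  # invariant: the entry at index hi mismatches
    while lo < hi:
        mid = (lo + hi) // 2
        if first[mid] == second[mid]:
            lo = mid + 1  # the mismatch starts to the right of mid
        else:
            hi = mid      # the mismatch starts at mid or to its left
    return lo


first_seq = [1024, 2504, 3406, 3456, 3510]
second_seq = [1024, 2504, 3406, 3510, 3456]
print(first_mismatch(first_seq, second_seq))  # 3, i.e. video frame 4
```

With these inputs the search probes video frame 3 (match) and then video frame 4 (mismatch), the same order as in the example.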

It should be noted that, in the above example, if the last values of the two sequences match, transcoding of the transcoded video may have succeeded or may have failed. In that case, it is further judged, frame by frame, whether the number of audio frames before each video frame matches, and transcoding is determined to be successful only if the number of audio frames before every video frame in the transcoded video is consistent with the number of audio frames placed in the corresponding video frame of the video to be transcoded.

In this embodiment of the specification, it is first judged whether the number of audio frames before the last video frame of the transcoded video matches that of the video to be transcoded, and the video frames corresponding to the unmatched numbers are then determined in the preset judgment mode. In this way, whether transcoding succeeded can be judged quickly, and the position at which sound and picture first go out of sync can be determined accurately, which facilitates subsequent processing of the transcoded video.

Further, whether transcoding is successful may still be determined by using the number of audio frames. Specifically, the determining that transcoding of the video to be transcoded is successful includes:

and determining that the transcoding of the video to be transcoded is successful under the condition that the number of the current audio frames is the same as the number of the audio frames placed in each video frame of the transcoded video.

In specific implementation, the current number of audio frames before each video frame of the transcoded video is obtained. It should be explained that audio frames may be lost or repeated while the transcoding system transcodes the video. If, in the transcoded video, the number of audio frames before a video frame is inconsistent with the number of audio frames before the corresponding video frame before transcoding, the sound and picture of the transcoded video are out of sync and the user's viewing experience is very poor. If the current number of audio frames is the same as the number of audio frames placed in each video frame of the transcoded video, transcoding of the video to be transcoded is determined to be successful.

For example, after the video to be transcoded is transcoded by the video transcoding system, a transcoded video is obtained, and a preset audio frame extraction tool determines that the current number of audio frames before a given video frame of the transcoded video is 3501. If the number of audio frames placed in that video frame of the transcoded video is 3506, it is determined that audio frames may have been lost in the transcoded video, and transcoding of the video to be transcoded is determined to have failed. If the number of audio frames placed in that video frame is 3501, it is the same as the current number of audio frames, and transcoding of the video to be transcoded is determined to be successful.
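The comparison in this example can be sketched as follows, assuming the counts placed in the video frames and the counts measured after transcoding have already been read into two equal-length lists. The function names are hypothetical, and how the counts are extracted from the video is outside the scope of the sketch.

```python
from typing import List, Sequence, Tuple


def mismatched_frames(placed: Sequence[int],
                      current: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Return (frame_number, placed_count, current_count) for each mismatch."""
    if len(placed) != len(current):
        raise ValueError("both sequences must describe the same video frames")
    return [(i + 1, p, c)
            for i, (p, c) in enumerate(zip(placed, current)) if p != c]


def transcoding_succeeded(placed: Sequence[int], current: Sequence[int]) -> bool:
    """Transcoding is treated as successful only if every frame matches."""
    return not mismatched_frames(placed, current)


print(transcoding_succeeded([3501], [3501]))  # True
print(mismatched_frames([3506], [3501]))      # [(1, 3506, 3501)]
```

Returning the mismatching frames rather than a bare boolean keeps the information needed later to estimate how many audio frames were lost.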

In the embodiment of the application, the number of the current audio frames before each video frame of the transcoded video is obtained and compared with the number of the audio frames placed in each video frame of the transcoded video, whether the transcoding result of the video to be transcoded is correct or not can be judged quickly and accurately according to the comparison result, and user experience is improved.

In order to better repair the video transcoding system, the audio frame loss degree of the transcoded video can be determined from the current number of audio frames before each video frame of the transcoded video and the number of audio frames placed in each video frame of the video to be transcoded. Specifically, after obtaining the current number of audio frames before each video frame of the transcoded video, the method further includes:

under the condition that the number of the current audio frames is different from the number of audio frames placed in each video frame of the transcoded video, acquiring the actual number of audio frames of the video frames corresponding to the unmatched number in the first number sequence and the second number sequence and the number of audio frames placed in the video frames;

and determining the audio frame loss degree of the transcoded video based on the difference value between the actual number of the audio frames and the number of the audio frames placed in the video frames and the audio frame duration of the transcoded video.

The audio frame loss degree indicates how severely the video transcoding system loses audio frames, and is used to judge the extent to which sound and picture of the transcoded video are out of sync. For example, a high loss rate or drop rate of audio frames in the transcoded video means that the audio frame loss degree is high.

Specifically, if the current number of audio frames before each video frame of the transcoded video differs from the number of audio frames placed in each video frame of the video to be transcoded, the actual number of audio frames of the video frames corresponding to the unmatched numbers in the first number sequence and the second number sequence, and the number of audio frames placed in those video frames, are obtained. The difference between the actual number of audio frames and the number of audio frames placed in the video frames is multiplied by the duration of each audio frame of the transcoded video to determine the audio frame loss degree of the transcoded video.

It should be noted that, if the audio sampling rate is not changed while the video is transcoded, the duration of each audio frame is fixed, as determined by the audio protocol standard; the audio protocol standard may be, for example, the AAC audio standard.
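To illustrate why the audio frame duration is fixed for a given sampling rate, the following minimal sketch computes it from the number of samples per frame. It assumes AAC-LC, where each audio frame carries 1024 samples per channel; other AAC profiles use a different frame length, and the function name is hypothetical.

```python
AAC_LC_SAMPLES_PER_FRAME = 1024  # assumption: AAC-LC; other profiles differ


def audio_frame_duration_ms(sample_rate_hz: int,
                            samples_per_frame: int = AAC_LC_SAMPLES_PER_FRAME) -> float:
    """Duration of one audio frame in milliseconds."""
    if sample_rate_hz <= 0:
        raise ValueError("sample rate must be positive")
    return samples_per_frame * 1000.0 / sample_rate_hz


print(round(audio_frame_duration_ms(48000), 2))  # 21.33
print(round(audio_frame_duration_ms(44100), 2))  # 23.22
```

Because the duration depends only on the frame length and the sampling rate, a difference in audio frame counts converts directly into a time offset.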

For example, if the number of audio frames before the third video frame before transcoding is 10, the actual number of audio frames before the third video frame after transcoding is 5, and the duration of each audio frame of the transcoded video is 2 ms, the audio frame loss degree of the transcoded video is determined to be (10 - 5) × 2 ms, that is, 10 ms.
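The arithmetic of this example can be written as a small helper. This is a sketch only; the function name is hypothetical, and the sign convention (a positive result means audio frames were lost, a negative result means extra or repeated audio frames) is an assumption that the specification does not state.

```python
def audio_frame_loss_ms(placed_count: int, actual_count: int,
                        frame_duration_ms: float) -> float:
    """Audio frame loss expressed as a time offset in milliseconds.

    Assumed convention: positive means audio frames were lost,
    negative means audio frames were added or repeated.
    """
    return (placed_count - actual_count) * frame_duration_ms


print(audio_frame_loss_ms(10, 5, 2.0))  # 10.0
```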

In this embodiment of the application, judging the audio frame loss degree of the transcoded video makes it possible to quickly determine how severely the video transcoding system loses audio frames during transcoding, which facilitates subsequent repair of the video transcoding system and provides a better video experience for users.

In summary, the number of audio frames before each video frame before transcoding is placed in the corresponding video frame, so as to determine whether the actual number of audio frames before each video frame in the transcoded video is consistent with the number of audio frames placed in that video frame. In this way, it can be determined quickly and accurately whether sound and picture are out of sync after the video is transcoded, and whether the video transcoding system has a transcoding quality problem can be determined automatically, which facilitates subsequent handling of the desynchronization.

Corresponding to the above method embodiment, this specification further provides an embodiment of a video processing apparatus, and Fig. 5 shows a schematic structural diagram of a video processing apparatus provided in an embodiment of this specification. As shown in Fig. 5, the apparatus includes:

the video generation module 502 is configured to determine the number of audio frames before each video frame of the initial video, place the number of audio frames before each video frame in the corresponding video frame, and generate a video to be transcoded;

the video transcoding module 504 is configured to send the video to be transcoded to a preset video transcoding system for video transcoding, and receive a transcoded video returned by the video transcoding system;

a transcoding result determining module 506, configured to obtain the number of audio frames placed in each video frame of the transcoded video, and determine a transcoding result of the video to be transcoded based on the number of audio frames placed in each video frame of the transcoded video.
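The division of work among the three modules can be sketched as follows. This is a minimal in-memory illustration under several assumptions: a video is modeled as a list of `VideoFrame` objects, the transcoding system is any callable supplied by the caller, and the counts measured after transcoding are passed in directly. All class and function names are hypothetical and do not reflect how the apparatus actually stores the counts in a video stream.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class VideoFrame:
    frame_id: int
    placed_audio_count: int  # count written into the frame before transcoding


def generate_video_to_transcode(audio_counts: Sequence[int]) -> List[VideoFrame]:
    """Role of module 502: place each audio frame count in its video frame."""
    return [VideoFrame(i + 1, count) for i, count in enumerate(audio_counts)]


def transcode(frames: List[VideoFrame],
              transcoder: Callable[[List[VideoFrame]], List[VideoFrame]]) -> List[VideoFrame]:
    """Role of module 504: send the video out and receive the transcoded video."""
    return transcoder(frames)


def determine_result(transcoded: List[VideoFrame],
                     current_counts: Sequence[int]) -> bool:
    """Role of module 506: compare placed counts with counts measured afterwards."""
    placed = [frame.placed_audio_count for frame in transcoded]
    return placed == list(current_counts)


# Usage with a stand-in transcoder that returns the frames unchanged.
video = generate_video_to_transcode([1024, 2504, 3406])
result = transcode(video, lambda frames: list(frames))
print(determine_result(result, [1024, 2504, 3406]))  # True
print(determine_result(result, [1024, 2504, 3400]))  # False
```

Passing the transcoder in as a callable keeps the sketch independent of any particular transcoding system.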

Optionally, the transcoding result determining module 506 is further configured to:

Optionally, the apparatus further includes:

an obtaining module configured to obtain a current number of audio frames before each video frame of the transcoded video;

Optionally, the apparatus further includes:

the acquisition module is configured to acquire the initial video and determine the number of audio frames before each video frame of the initial video through a preset audio frame extraction tool.

Optionally, the video generating module 502 is further configured to:

The video processing apparatus provided by this embodiment of the application places the number of audio frames before each video frame before transcoding in the corresponding video frame, so as to determine whether the actual number of audio frames before each video frame in the transcoded video is consistent with the number of audio frames placed in that video frame. In this way, whether sound and picture are out of sync after the video is transcoded can be determined quickly and accurately, which facilitates subsequent handling of the desynchronization.

The above is a schematic scheme of a video processing apparatus of the present embodiment. It should be noted that the technical solution of the video processing apparatus belongs to the same concept as the technical solution of the video processing method, and details that are not described in detail in the technical solution of the video processing apparatus can be referred to the description of the technical solution of the video processing method.

Fig. 6 illustrates a block diagram of a computing device 600 provided according to an embodiment of the present description. The components of the computing device 600 include, but are not limited to, a memory 610 and a processor 620. The processor 620 is coupled to the memory 610 via a bus 630 and a database 650 is used to store data.

Computing device 600 also includes access device 640, which enables computing device 600 to communicate via one or more networks 660. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. Access device 640 may include one or more of any type of network interface (e.g., a Network Interface Card (NIC)), whether wired or wireless, such as an IEEE 802.11 Wireless Local Area Network (WLAN) wireless interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.

In one embodiment of the present description, the above-described components of computing device 600, as well as other components not shown in FIG. 6, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 6 is for purposes of example only and is not limiting as to the scope of the present description. Those skilled in the art may add or replace other components as desired.

Computing device 600 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smartphone), wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 600 may also be a mobile or stationary server.

Wherein the processor 620 is configured to execute computer-executable instructions that when executed by the processor 620 implement the steps of the video processing method.

The above is an illustrative scheme of a computing device of the present embodiment. It should be noted that the technical solution of the computing device and the technical solution of the video processing method belong to the same concept, and details that are not described in detail in the technical solution of the computing device can be referred to the description of the technical solution of the video processing method.

An embodiment of the present specification further provides a computer readable storage medium storing computer instructions, which when executed by a processor implement the steps of the video processing method as described above.

The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solution of the above-mentioned video processing method, and details that are not described in detail in the technical solution of the storage medium can be referred to the description of the technical solution of the above-mentioned video processing method.

The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

The computer instructions comprise computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be appropriately expanded or restricted as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals.

It should be noted that, for the sake of simplicity, the foregoing method embodiments are described as a series of acts or combinations, but those skilled in the art should understand that the present disclosure is not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the present disclosure. Further, those skilled in the art should also appreciate that the embodiments described in this specification are preferred embodiments and that acts and modules referred to are not necessarily required for this description.

In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to the related descriptions of other embodiments.

The preferred embodiments of the present specification disclosed above are intended only to aid in the description of the specification. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the specification and its practical application, to thereby enable others skilled in the art to best understand the specification and its practical application. The specification is limited only by the claims and their full scope and equivalents.

Claims

1. A video processing method, comprising:

2. The video processing method according to claim 1, wherein the determining the transcoding result of the video to be transcoded based on the number of audio frames placed in each video frame of the transcoded video comprises:

3. The method of claim 2, wherein the determining that the transcoding of the video to be transcoded is successful comprises:

4. The method of claim 2, wherein after the comparing of the first number sequence with the second number sequence, the method further comprises:

and under the condition that the verification video frame meets the preset transcoding condition, determining that the video to be transcoded is transcoded successfully.

5. The method of claim 2, wherein after the comparing of the first number sequence with the second number sequence, the method further comprises:

6. The video processing method of claim 4, wherein the determining that the transcoding of the video to be transcoded is successful comprises:

7. The video processing method according to claim 3 or 6, wherein after obtaining the current number of audio frames before each video frame of the transcoded video, the method further comprises:

under the condition that the number of the current audio frames is not the same as the number of the audio frames placed in each video frame of the transcoded video, acquiring the actual number of the audio frames of the video frames corresponding to the unmatched number in the first number sequence and the second number sequence and the number of the audio frames placed in the video frames;

8. The video processing method of claim 1, wherein said determining the number of audio frames before each video frame of the initial video comprises:

9. The method according to claim 1 or 8, wherein the step of placing the number of audio frames before each video frame in the corresponding video frame to generate the video to be transcoded comprises:

10. A video processing apparatus, comprising:

the video generation module is configured to determine the number of audio frames before each video frame of the initial video, place the number of audio frames before each video frame in the corresponding video frame and generate a video to be transcoded;

the transcoding result determining module is configured to obtain the number of the audio frames placed in each video frame of the transcoded video, and determine the transcoding result of the video to be transcoded based on the number of the audio frames placed in each video frame of the transcoded video.

11. A computing device comprising a memory, a processor and computer instructions stored on the memory and executable on the processor, wherein the processor when executing the instructions implements the steps of the video processing method of any of claims 1 to 9.

12. A computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the video processing method of any one of claims 1 to 9.