CN114630148B

CN114630148B - Video processing method and device

Info

Publication number: CN114630148B
Application number: CN202011442524.8A
Authority: CN
Inventors: 汤然; 王一; 郑龙; 何钧
Original assignee: Shanghai Bilibili Technology Co Ltd
Current assignee: Shanghai Bilibili Technology Co Ltd
Priority date: 2020-12-11
Filing date: 2020-12-11
Publication date: 2023-11-14
Anticipated expiration: 2040-12-11
Also published as: CN114630148A

Abstract

The specification provides a video processing method and device, wherein the video processing method comprises the following steps: determining the number of audio frames before each video frame of an initial video, and placing the number of audio frames before each video frame in a corresponding video frame to generate a video to be transcoded; transmitting the video to be transcoded to a preset video transcoding system for video transcoding, and receiving transcoded video returned by the video transcoding system; and acquiring the number of the audio frames placed in each video frame of the transcoded video, and determining the transcoding result of the video to be transcoded based on the number of the audio frames placed in each video frame of the transcoded video.

Description

Video processing method and device

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a video processing method. The present description is also directed to a video processing apparatus, a computing device, and a computer-readable storage medium.

Background

At present, in the video industry, a user's original video is generally transcoded to obtain versions at different definitions, so that viewers can select a version according to their definition requirements. However, video transcoding involves reprocessing the user's original video, and because of compatibility problems among the various packaging (container) and coding protocols involved, problems can arise in the transcoded video, such as audio and video being out of synchronization, which greatly degrades the viewing experience.

Disclosure of Invention

In view of this, the embodiments of the present specification provide a video processing method. The present specification also relates to a video processing apparatus, a computing device, and a computer-readable storage medium, so as to solve the technical defect in the prior art that audio and video become asynchronous in transcoded video.

According to a first aspect of embodiments of the present specification, there is provided a video processing method, including:

determining the number of audio frames before each video frame of an initial video, and placing the number of audio frames before each video frame in a corresponding video frame to generate a video to be transcoded;

transmitting the video to be transcoded to a preset video transcoding system for video transcoding, and receiving transcoded video returned by the video transcoding system;

and acquiring the number of the audio frames placed in each video frame of the transcoded video, and determining the transcoding result of the video to be transcoded based on the number of the audio frames placed in each video frame of the transcoded video.

According to a second aspect of embodiments of the present specification, there is provided a video processing apparatus comprising:

the video generation module is configured to determine the number of audio frames before each video frame of the initial video, and place the number of audio frames before each video frame in the corresponding video frame to generate a video to be transcoded;

The video transcoding module is configured to send the video to be transcoded to a preset video transcoding system to perform video transcoding and receive transcoded video returned by the video transcoding system;

the transcoding result determining module is configured to acquire the number of audio frames placed in each video frame of the transcoded video, and determine the transcoding result of the video to be transcoded based on the number of audio frames placed in each video frame of the transcoded video.

According to a third aspect of embodiments of the present specification, there is provided a computing device comprising a memory, a processor and computer instructions stored on the memory and executable on the processor, the processor implementing the steps of the video processing method when executing the instructions.

According to a fourth aspect of embodiments of the present specification, there is provided a computer readable storage medium storing computer executable instructions which, when executed by a processor, implement the steps of any of the video processing methods.

The video processing method comprises the steps of determining the number of audio frames before each video frame of an initial video, and placing the number of audio frames before each video frame in a corresponding video frame to generate a video to be transcoded; transmitting the video to be transcoded to a preset video transcoding system for video transcoding, and receiving transcoded video returned by the video transcoding system; and acquiring the number of the audio frames placed in each video frame of the transcoded video, and determining the transcoding result of the video to be transcoded based on the number of the audio frames placed in each video frame of the transcoded video.

In the embodiments of the present specification, the number of audio frames before each video frame is placed in the corresponding video frame before transcoding. After transcoding, it is determined whether the actual number of audio frames before each video frame of the transcoded video is consistent with the number placed in that video frame. In this way, whether audio and video have become asynchronous after transcoding can be determined rapidly and accurately, which facilitates further handling of the asynchronization.

Drawings

FIG. 1 is a flow chart of a video processing method according to an embodiment of the present disclosure;

FIG. 2 is a schematic diagram of a video frame of a video to be transcoded in a video processing method according to an embodiment of the present disclosure;

FIG. 3 is a schematic diagram of a video frame of a transcoded video in a video processing method according to an embodiment of the present disclosure;

FIG. 4 is a schematic diagram of an audio frame number of each video frame of a transcoded video in a video processing method according to an embodiment of the present disclosure;

FIG. 5 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present disclosure;

FIG. 6 is a block diagram of a computing device according to one embodiment of the present disclosure.

Detailed Description

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present description. However, this description can be implemented in many forms other than those described herein, and those skilled in the art can make similar generalizations without departing from its spirit; therefore, this description is not limited by the specific implementations disclosed below.

The terminology used in the one or more embodiments of the specification is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the specification. As used in this specification, one or more embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any or all possible combinations of one or more of the associated listed items.

It should be understood that, although the terms first, second, etc. may be used in one or more embodiments of this specification to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first may also be referred to as a second, and similarly, a second may also be referred to as a first, without departing from the scope of one or more embodiments of the present description. The word "if" as used herein may be interpreted as "when", "upon", or "in response to determining", depending on the context.

First, terms related to one or more embodiments of the present specification will be explained.

Transcoding: re-encoding audio and video.

Frame loss: video transcoding causes some video pictures to be lost at the frame level.

Frame dropping: video transcoding causes some video pictures to be lost at the frame level and filled in with copies of adjacent frames, so that the video may appear unsmooth, as if stuck, during that period.

Video watermarking: adding characters or pictures to the video picture, such as a station logo on the video.

ffmpeg: an open-source computer program that can be used to record and convert digital audio and video and to turn them into streams.

AAC (Advanced Audio Coding): an MPEG-2-based audio coding technique.

At present, audio-video asynchronization after video transcoding is difficult for a machine to identify actively, and it gives users a poor viewing experience. It is therefore necessary to identify audio-video asynchronization in transcoded video quickly, so that the check can be automated by machine and users can be given a better video experience.

The present specification provides a video processing method, and also relates to a video processing apparatus, a computing device, and a computer-readable storage medium, which are described in detail one by one in the following embodiments.

Referring to fig. 1, fig. 1 shows a flowchart of a video processing method according to an embodiment of the present disclosure, which specifically includes the following steps:

Step 102: Determine the number of audio frames before each video frame of the initial video, and place the number of audio frames before each video frame in the corresponding video frame to generate the video to be transcoded.

The video processing method provided by the embodiment of the application is applied to audio-video quality detection for a video transcoding system. First, a dedicated initial video is generated based on user requirements, and the number of audio frames before each video frame in the initial video is determined. Whether the video transcoding system has a transcoding quality problem is then determined by comparing the number of audio frames before each video frame in the initial video before transcoding with the number of audio frames before each video frame in the transcoded video. For example, if the two numbers are the same for every video frame, the transcoding is successful. If the two numbers differ for some video frame, the video transcoding system has a transcoding quality problem: frame loss, frame dropping, audio-video asynchronization, or similar problems may have occurred during transcoding, and the video transcoding system needs to be repaired.

The initial video may be understood as any type of video of any length, such as a short video or a television episode video, an entertainment video, etc.

To detect whether frame loss, frame dropping, or audio-video asynchronization occurs when a user's video is transcoded in a transcoding system, an initial video is needed for the test; the application does not limit conditions such as the type or duration of the initial video.

Specifically, the determining the number of audio frames before each video frame of the initial video includes:

and acquiring the initial video, and determining the number of audio frames before each video frame of the initial video through a preset audio frame extraction tool.

The preset audio frame extraction tool may be understood as a computer program used to extract audio frames during video transcoding, such as the ffmpeg tool; it extracts the audio frames so that their number can be determined.

In implementation, the server obtains the video whose audio-video synchronization is to be checked, uses it as the initial video, and determines the number of audio frames before each video frame with a preset audio frame extraction tool. For example, the number of audio frames before the first video frame in the initial video is 3, the number before the second video frame is 9, and the number before the third video frame is 13. It should be noted that the number of audio frames before each video frame, taken in frame order, is a cumulative count and therefore increases from frame to frame. If the numbers obtained for the initial video do not increase, for example 3 before the first video frame, 13 before the second, and 9 before the third, the preset audio frame extraction tool may be faulty; the tool can be reacquired and audio frame extraction performed on the initial video again to determine the number of audio frames before each video frame.
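The cumulative count and the increasing-order sanity check described above can be sketched as follows. This is a minimal illustration, not the extraction tool itself: the function names, the millisecond timestamps, and the 10 ms audio frame spacing are assumptions chosen only so that the counts reproduce the 3, 9, 13 example.

```python
from bisect import bisect_left

def audio_counts_before(video_pts, audio_pts):
    """For each video frame, count the audio frames that start strictly before it."""
    audio_sorted = sorted(audio_pts)
    return [bisect_left(audio_sorted, t) for t in video_pts]

def is_non_decreasing(counts):
    """A cumulative count must never decrease from one video frame to the next."""
    return all(a <= b for a, b in zip(counts, counts[1:]))

# Hypothetical timestamps in milliseconds: one audio frame every 10 ms.
audio_pts = list(range(0, 200, 10))
video_pts = [25, 85, 125]

counts = audio_counts_before(video_pts, audio_pts)
print(counts)                         # [3, 9, 13]
print(is_non_decreasing(counts))      # True
print(is_non_decreasing([3, 13, 9]))  # False: the extraction result is suspect
```

A failed check here corresponds to the case in which the extraction tool is reacquired and the extraction is repeated.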

In the embodiment of the application, the audio frames before each video frame in the initial video are extracted with the audio frame extraction tool and counted, which makes it convenient to determine later, from the number of audio frames, whether the initial video is transcoded successfully and whether the audio and video of the video remain synchronized.

To make it possible to later compare the audio frames before each video frame in the initial video with those before each video frame in the transcoded video and judge whether the numbers for corresponding video frames are the same, the number of audio frames before each video frame is recorded. Specifically, placing the number of audio frames before each video frame in the corresponding video frame to generate the video to be transcoded includes:

and placing the number of audio frames before each video frame in a preset position of the corresponding video frame in a watermark manner to generate a video to be transcoded.

The preset position may be understood as a position where the number of audio frames is placed in the video frame in advance, for example, the number of audio frames is placed in any position of the upper left corner, the upper right corner, the middle, the lower left corner, etc. of the whole video frame, which is not limited in any way in the embodiment of the present application.

It should be noted that the number of audio frames may be placed at the preset position of the video frame as a watermark or by another recording manner; this is not limited in any way.

In implementation, the number of audio frames before each video frame in the initial video is counted and placed, as a watermark, at a preset position of the corresponding video frame; the resulting video, in which every video frame carries its watermarked audio frame number, is used as the video to be transcoded for the subsequent video transcoding.

Referring to fig. 2, fig. 2 is a schematic diagram of a video frame of a video to be transcoded in a video processing method according to an embodiment of the present application.

Fig. 2 shows one video frame of a video to be transcoded. The upper left corner of the video frame carries an audio frame number value, such as the value 3506 in the figure, which indicates that the number of audio frames before this video frame is 3506. It should be noted that every video frame in the video to be transcoded carries the number of audio frames before it, which facilitates the subsequent comparison with the number of audio frames before each video frame in the transcoded video.

In the embodiment of the application, the number of the audio frames before each video frame is placed in the video frame, so that the formed video is used as the video to be transcoded, and whether the audios and the videos of the video are synchronous or not can be rapidly and accurately determined through the number of the audio frames placed on the video frames.
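To make the idea of a number placed at a preset position concrete, the following toy sketch stamps a count into the upper left corner of a frame and reads it back. Everything here is an assumption for illustration: the frame is a plain grid of grayscale values, the 3x5 digit glyphs and the six-digit zero-padded field are invented, and a real system would render and recognize the watermark with its own video tooling.

```python
# Hypothetical 3x5 bitmap glyphs for the digits 0-9 ("1" = bright pixel).
GLYPHS = {
    "0": ("111", "101", "101", "101", "111"),
    "1": ("010", "110", "010", "010", "111"),
    "2": ("111", "001", "111", "100", "111"),
    "3": ("111", "001", "111", "001", "111"),
    "4": ("101", "101", "111", "001", "001"),
    "5": ("111", "100", "111", "001", "111"),
    "6": ("111", "100", "111", "101", "111"),
    "7": ("111", "001", "001", "001", "001"),
    "8": ("111", "101", "111", "101", "111"),
    "9": ("111", "101", "111", "001", "111"),
}
GLYPH_W, GLYPH_H = 3, 5
FIELD_DIGITS = 6      # fixed-width, zero-padded field (assumption)
PITCH = GLYPH_W + 1   # one blank column between digits

def stamp_count(frame, count, x0=0, y0=0):
    """Write the audio frame count into the frame at the preset position (x0, y0)."""
    text = str(count).zfill(FIELD_DIGITS)
    for i, ch in enumerate(text):
        for r, row in enumerate(GLYPHS[ch]):
            for c, bit in enumerate(row):
                frame[y0 + r][x0 + i * PITCH + c] = 255 if bit == "1" else 0

def read_count(frame, x0=0, y0=0, threshold=128):
    """Recover the stamped count by thresholding pixels and matching glyphs."""
    lookup = {rows: ch for ch, rows in GLYPHS.items()}
    digits = []
    for i in range(FIELD_DIGITS):
        rows = tuple(
            "".join("1" if frame[y0 + r][x0 + i * PITCH + c] >= threshold else "0"
                    for c in range(GLYPH_W))
            for r in range(GLYPH_H)
        )
        digits.append(lookup[rows])
    return int("".join(digits))

frame = [[0] * 32 for _ in range(8)]   # tiny 32x8 grayscale frame
stamp_count(frame, 3506)
print(read_count(frame))               # 3506
```

Thresholding on read-back is a design choice in this sketch: it lets the stamp survive mild brightness changes of the kind lossy re-encoding may introduce.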

Step 104: Send the video to be transcoded to a preset video transcoding system for video transcoding, and receive the transcoded video returned by the video transcoding system.

The preset video transcoding system includes any video transcoding system to be subjected to a transcoding quality test, for example, AAC technology, and the present specification does not limit the foregoing.

Specifically, after the video to be transcoded, which includes a plurality of video frames, is generated from the initial video, it is sent to the specific video transcoding system whose transcoding quality is to be tested; the transcoded video that the video transcoding system generates and returns is then received.

In implementation, the number of audio frames before each video frame in the initial video is recorded, the number of audio frames is placed at a preset position of each video frame, the formed video to be transcoded is sent to a preset video transcoding system to perform video transcoding, and transcoded video returned by the video transcoding system is received.

Step 106: Acquire the number of audio frames placed in each video frame of the transcoded video, and determine the transcoding result of the video to be transcoded based on the number of audio frames placed in each video frame of the transcoded video.

Specifically, after the transcoded video is obtained, the number of audio frames placed in each video frame of the transcoded video is obtained, and a transcoding result of the transcoded video is determined based on the number of audio frames placed in the transcoded video and the actual number of audio frames before each video frame in the transcoded video.

Because audio-video asynchronization can arise while the initial video is transcoded, the video frames and audio frames may be out of order after transcoding. The audio frames before each video frame are therefore counted at this point and compared with the number of audio frames before each video frame before transcoding, to determine the transcoding result of the transcoded video.

In implementation, the number of actual audio frames before each video frame in the transcoded video can be recorded and placed at a preset position of each video frame in the transcoded video, and compared with the number of audio frames in the video to be transcoded.

Referring to fig. 3, fig. 3 is a schematic diagram illustrating a video frame of a transcoded video in a video processing method according to an embodiment of the present application.

Fig. 3 shows one video frame of a transcoded video. The upper left corner of the video frame carries the current actual audio frame number, such as the value 3507 in the figure, which indicates that the current number of audio frames before this video frame is 3507. It should be noted that the actual number of audio frames before each video frame is counted in the transcoded video and placed at a preset position of that video frame, which facilitates the subsequent comparison with the number of audio frames in the video to be transcoded.

The number of audio frames for each video frame of the transcoded video is compared with the pre-transcoding number of audio frames of the video to be transcoded, which is placed at the upper left corner, to determine whether the numbers of audio frames before the video frames differ and hence whether the sound and picture of the video are asynchronous.

To record the comparison between the audio frames before each video frame before transcoding and those after transcoding, and thereby determine the transcoding result of the video to be transcoded, specifically, determining the transcoding result of the video to be transcoded based on the number of audio frames placed in each video frame of the transcoded video includes:

Arranging the number of audio frames placed in each video frame of the video to be transcoded frame by frame to form a first number sequence, and arranging the number of audio frames placed in each video frame of the transcoded video frame by frame to form a second number sequence;

comparing the first number sequence with the second number sequence, and determining that the video to be transcoded is transcoded successfully under the condition that the first number sequence completely matches the second number sequence.

The first number sequence may be understood as a number sequence formed by arranging the number of audio frames placed in each video frame of the video to be transcoded frame by frame, for example, the number of audio frames placed in each video frame of the video to be transcoded is arranged frame by frame according to the arrangement order of all video frames in the video to be transcoded, so as to form a first number sequence [ 3506, 3507, 3510 ].

Meanwhile, the number of audio frames placed in each video frame of the transcoded video is arranged frame by frame to form a second number sequence. Continuing the above example, the transcoded video still includes three video frames arranged in sequence: video frame 1, video frame 2, and video frame 3. The actual numbers of audio frames before each video frame in the transcoded video are arranged frame by frame in the order of the video frames, forming the second number sequence [ 3506, 3507, 3510 ].

Comparing the first number sequence with the second number sequence shows that they match completely, so it can be determined that the initial video is transcoded successfully. Successful transcoding of the initial video indicates that the transcoding quality of the current video transcoding system is good: the video frames in the transcoded video are the same as before transcoding, no frame loss or frame dropping has occurred, and audio and video are synchronized.
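The full-match test on the two number sequences can be sketched as below. The frame representation (a dictionary with a `stamped_audio_count` key) and the function names are assumptions made for the illustration; only the comparison itself comes from the description.

```python
def number_sequence(frames):
    """Arrange the stamped audio frame counts frame by frame, in frame order."""
    return [frame["stamped_audio_count"] for frame in frames]

def transcoding_succeeded(to_transcode, transcoded):
    """Success requires the two sequences to match completely, element by element."""
    return number_sequence(to_transcode) == number_sequence(transcoded)

# Hypothetical frames carrying the counts used in the example above.
before = [{"stamped_audio_count": n} for n in (3506, 3507, 3510)]
after = [{"stamped_audio_count": n} for n in (3506, 3507, 3510)]
swapped = [{"stamped_audio_count": n} for n in (3506, 3510, 3507)]

print(transcoding_succeeded(before, after))    # True
print(transcoding_succeeded(before, swapped))  # False
```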

In the embodiment of the application, the counted number of audio frames before each video frame of the initial video is compared with the number of audio frames before each video frame of the transcoded video, and whether audio and video are synchronized in the transcoding result of the initial video can be judged rapidly and accurately from the comparison result, so as to improve the user experience.

Specifically, referring to fig. 4, fig. 4 is a schematic diagram showing the number of audio frames of each video frame of a transcoded video in a video processing method according to an embodiment of the present application.

The transcoded video of fig. 4 is illustrated by taking three video frames as an example, including a video frame 1, a video frame 2, and a video frame 3, wherein the number of audio frames placed in the video frame 1 is 3506, the number of audio frames placed in the video frame 2 is 3507, and the number of audio frames placed in the video frame 3 is 3510.

The audio frame number is placed at the upper left corner of each video frame and displays the number of audio frames for that video frame; that is, the number 3506 for video frame 1 is placed at the upper left corner of video frame 1, the number 3507 for video frame 2 is placed at the upper left corner of video frame 2, and the number 3510 for video frame 3 is placed at the upper left corner of video frame 3.

Specifically, the determining that the video to be transcoded is transcoded successfully includes:

acquiring the current audio frame number before each video frame of the transcoded video;

and under the condition that the current audio frame number is the same as the number of audio frames placed in each video frame of the transcoded video, determining that the video to be transcoded is transcoded successfully.

In specific implementation, after the number of audio frames placed in each video frame of the transcoded video is determined, the current number of audio frames before each video frame of the transcoded video is acquired and compared with the number of audio frames placed at the upper left corner of that video frame. If the current number of audio frames before video frame 1 is 3506, before video frame 2 is 3507, and before video frame 3 is 3510, it is determined that no frame loss, frame dropping, or audio-video asynchronization has occurred in video frame 1, video frame 2, or video frame 3.
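The per-frame comparison between the stamped number and the current count in the transcoded video can be sketched as follows. The function name and the returned mapping of frame number to difference are assumptions for illustration; frame numbers are 1-based to match the text.

```python
def count_drift(stamped, current):
    """Return {frame_number: current - stamped} for every frame whose counts differ."""
    if len(stamped) != len(current):
        raise ValueError("both lists must describe the same video frames")
    return {
        i: cur - st
        for i, (st, cur) in enumerate(zip(stamped, current), start=1)
        if cur != st
    }

stamped = [3506, 3507, 3510]   # numbers read from the upper left corner
print(count_drift(stamped, [3506, 3507, 3510]))  # {} -> no asynchronization detected
print(count_drift(stamped, [3507, 3507, 3510]))  # {1: 1} -> one extra audio frame before frame 1
```

An empty result corresponds to the success case in the text; a non-empty result points at the frames to inspect.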

In the embodiment of the application, the number of audio frames is placed in each video frame of the transcoded video so that it can be determined more conveniently, and the acquired current number of audio frames before each video frame of the transcoded video can then be compared rapidly and accurately with the number placed in that video frame.

Furthermore, after comparing the first number sequence with the second number sequence, the method further includes:

under the condition that the first number sequence does not match the second number sequence, determining the video frames corresponding to the unmatched numbers in the first number sequence and the second number sequence, and taking those video frames as verification video frames;

and under the condition that the verification video frame meets the preset transcoding condition, determining that the video to be transcoded is transcoded successfully.

The preset transcoding condition may be understood as a preset condition that the transcoded video must satisfy after the video to be transcoded is transcoded.

Continuing the above example, suppose the transcoded video still includes three video frames arranged in sequence, and the numbers of audio frames before each video frame in the transcoded video, arranged frame by frame in the order of all video frames in the transcoded video, form the second number sequence [ 3506, 3510, 3507 ].

Comparing the first number sequence with the second number sequence shows that they do not match, so it can be determined that transcoding of the initial video has failed: the order of video frame 2 and video frame 3 is disordered after transcoding, and audio and video may be asynchronous in the transcoded video. This indicates that the transcoding quality of the current video transcoding system is poor and that the system needs to be repaired. In this way, the faulty video frames can be located quickly from the unmatched numbers, and problem analysis can be based on those frames when the video transcoding system is repaired. In addition, the comparison can start from the last video frame; if the current number of audio frames before the first video frame of the transcoded video does not match the number of audio frames placed in the corresponding video frame of the video to be transcoded, it is determined that audio and video may be asynchronous in the transcoded video, and the video transcoding system needs to be repaired.

In implementation, the video frames corresponding to the unmatched numbers in the first number sequence and the second number sequence are used as verification video frames, it is judged whether the verification video frames meet the preset transcoding condition, and the video to be transcoded is determined to be transcoded successfully under the condition that the verification video frames meet the transcoding condition.
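Selecting the verification video frames from the unmatched positions can be sketched as a position-by-position comparison. The function name is an assumption, and frame numbers are 1-based to match the text; a position present in only one sequence is treated as unmatched.

```python
from itertools import zip_longest

def verification_frames(first, second):
    """Return the 1-based frame numbers at which the two number sequences differ."""
    return [
        i
        for i, (a, b) in enumerate(zip_longest(first, second), start=1)
        if a != b
    ]

first = [3506, 3507, 3510]
print(verification_frames(first, [3506, 3507, 3510]))  # []
print(verification_frames(first, [3506, 3510, 3507]))  # [2, 3]
```

For the swapped example above, video frame 2 and video frame 3 are returned, which are the frames to check against the preset transcoding condition.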

For example, the video to be transcoded includes three video frames arranged in sequence: video frame 1, video frame 2, and video frame 3. If the transcoded video includes two video frames arranged in sequence, video frame 1 and video frame 3, then video frame 2 is a lost video frame; video frame 2 is taken as the verification video frame, and it is judged whether video frame 2 meets the preset transcoding condition. Here the preset transcoding condition may be that the video effect is better after video frame 2 is lost, that is, video frame 2 is a video frame that may be lost. Alternatively, if the transcoded video includes four video frames arranged in sequence, video frame 1, video frame 2, a repeated video frame 2, and video frame 3, then video frame 2 has been repeated in the transcoded video (i.e., frame dropping) in order to enhance the video effect, and the preset transcoding condition is likewise met.

In practical application, under the condition that the verification video frame meets the preset transcoding condition, the video to be transcoded can be determined to be transcoded successfully even if the transcoded video exhibits frame loss, frame dropping, or the like. Under the condition that the verification video frame does not meet the preset transcoding condition, that is, the transcoded video exhibits frame loss or frame dropping, it can be determined that audio and video may be asynchronous in the transcoded video, and there is no need to record and compare the number of audio frames before each video frame again.
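One possible reading of the preset transcoding condition is an allow-list of frames that may be lost or repeated. The sketch below takes that reading as an assumption; the source does not define the condition further. It also assumes that the stamped numbers in the first number sequence are distinct, so a number identifies a video frame. The names `droppable` and `repeatable` are hypothetical.

```python
from collections import Counter

def lost_and_repeated(first, second):
    """Classify stamped numbers as lost (absent) or repeated (present more than once)."""
    seen = Counter(second)
    lost = [n for n in first if seen[n] == 0]
    repeated = [n for n in first if seen[n] > 1]
    return lost, repeated

def meets_preset_condition(first, second, droppable=frozenset(), repeatable=frozenset()):
    """Accept the result only if every lost or repeated frame is explicitly allowed."""
    lost, repeated = lost_and_repeated(first, second)
    return set(lost) <= set(droppable) and set(repeated) <= set(repeatable)

first = [3506, 3507, 3510]
print(lost_and_repeated(first, [3506, 3510]))              # ([3507], [])
print(lost_and_repeated(first, [3506, 3507, 3507, 3510]))  # ([], [3507])
print(meets_preset_condition(first, [3506, 3510], droppable={3507}))  # True
print(meets_preset_condition(first, [3506, 3510]))                    # False
```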

In the embodiment of the application, the counted number of audio frames before each video frame of the initial video is compared with the counted number of audio frames before each video frame of the transcoded video, and whether the transcoding result of the initial video is correct can be judged rapidly and accurately from the comparison result. When the transcoding result is not correct, whether the transcoded video meets the user's requirement is determined by judging whether the unmatched video frames meet the preset transcoding condition, and the transcoding system is then repaired in a targeted manner based on that determination.

Further, after the comparing of the first number sequence with the second number sequence, the method further includes:

under the condition that the last value of the first number sequence does not match the last value of the second number sequence, determining the video frames corresponding to the unmatched numbers in the first number sequence and the second number sequence according to a preset judging mode;

taking the video frames corresponding to the unmatched numbers in the first number sequence and the second number sequence as verification video frames;

The preset judging mode may be understood as the manner of determining the video frames corresponding to the unmatched numbers in the first number sequence and the second number sequence, for example, bisection (binary search) or the like, which is not limited in this specification.

Specifically, the server compares the last value in the first number sequence, formed from the video to be transcoded, with the last value in the second number sequence, formed from the transcoded video. If the two last values do not match, it can be preliminarily judged that the transcoded video may have audio-video desynchronization. The video frames corresponding to the unmatched numbers in the first number sequence and the second number sequence can then be determined according to the preset judging mode and taken as verification video frames, and the video to be transcoded is determined to be transcoded successfully if the verification video frames satisfy the preset transcoding condition.

For example, suppose the video to be transcoded includes five video frames: video frame 1, video frame 2, video frame 3, video frame 4, and video frame 5. The first number sequence formed from them is [1024, 2504, 3406, 3456, 3510], and the second number sequence formed from the transcoded video is [1024, 2504, 3406, 3510, 3456]. Comparing the last value of the first number sequence, 3510, with the last value of the second number sequence, 3456, shows that they do not match, so it can be judged that the transcoded video may have audio-video desynchronization. The server may then use bisection: it compares the value 3406 for video frame 3 in the first number sequence with the value 3406 for video frame 3 in the second number sequence and determines that the numbers of audio frames match up to video frame 3; it then compares the value 3456 for video frame 4 in the first number sequence with the value 3510 for video frame 4 in the second number sequence and determines that video frame 4 is the video frame corresponding to the unmatched number. Video frame 4 is taken as the verification video frame, and if it satisfies the preset transcoding condition, the video to be transcoded is determined to be transcoded successfully.
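The bisection in this example can be sketched as below. The function name `first_mismatch` is hypothetical, and the sketch assumes that the two sequences have equal length and that, once the numbers diverge, they stay divergent through the last frame; neither assumption is stated in this description, which only names bisection as one possible judging mode.

```python
def first_mismatch(first_seq, second_seq):
    """Return the 0-based index of the first unmatched number, or None.

    None means the last values match, in which case bisection cannot
    conclude anything and a frame-by-frame comparison is still needed.
    """
    if len(first_seq) != len(second_seq):
        raise ValueError("sequences must have the same length")
    if not first_seq or first_seq[-1] == second_seq[-1]:
        return None
    # Invariant (under the stated assumption): index hi is unmatched,
    # and every index below lo is matched.
    lo, hi = 0, len(first_seq) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if first_seq[mid] == second_seq[mid]:
            lo = mid + 1
        else:
            hi = mid
    return lo


first = [1024, 2504, 3406, 3456, 3510]
second = [1024, 2504, 3406, 3510, 3456]
index = first_mismatch(first, second)
print(index + 1)  # 1-based video frame number of the verification frame
```

On the sequences above, the search probes video frame 3 (matched) and then video frame 4 (unmatched), the same two comparisons described in the example.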

It should be noted that if the last values of the two sequences in the above example match, the transcoding may nevertheless have either succeeded or failed. In that case, whether the number of audio frames before each video frame matches continues to be checked frame by frame; if the number of audio frames before every video frame in the transcoded video is consistent with the number of audio frames placed in the corresponding video frame of the video to be transcoded, the transcoding can be determined to be successful.

In the embodiment of the specification, whether the transcoding succeeded can be determined rapidly by judging whether the number of audio frames before the last video frame of the transcoded video matches that of the video to be transcoded, and the position at which the audio and video start to go out of sync can be located accurately by determining, in the preset judging mode, the video frame corresponding to the unmatched number, which facilitates subsequent processing of the transcoded video.

Whether the transcoding succeeded can further be determined using the number of audio frames. Specifically, the determining that the video to be transcoded is transcoded successfully includes:

In implementation, the current number of audio frames before each video frame of the transcoded video is obtained. It should be noted that audio frames may be lost or repeated while the video transcoding system transcodes; therefore, if the number of audio frames before a video frame in the transcoded video is inconsistent with the number of audio frames before the corresponding video frame before transcoding, the audio and video of the transcoded video are out of sync and the viewing experience is poor. If the current number of audio frames is the same as the number of audio frames placed in each video frame of the transcoded video, the video to be transcoded is determined to be transcoded successfully.

For example, after the video transcoding system transcodes the video to be transcoded, the transcoded video is obtained, and a preset audio frame extraction tool reports that the current number of audio frames before one video frame of the transcoded video is 3501. If the number of audio frames placed in that video frame is 3506, it is determined that audio frames may have been lost in the transcoded video, and the transcoding of the video to be transcoded is determined to have failed. If the number of audio frames placed in that video frame is 3501, which equals the current number of audio frames, the video to be transcoded is determined to be transcoded successfully.
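A minimal sketch of this comparison is given below. The function name `check_counts` and the list-based inputs are assumptions for illustration; how the placed numbers are read back from the video frames and how the audio frame extraction tool is invoked are outside the sketch.

```python
def check_counts(placed, current):
    """Compare placed vs. current audio frame counts, frame by frame.

    placed[i]  : number read back from video frame i of the transcoded video
    current[i] : audio frames actually found before video frame i
    Returns (success, mismatches), where mismatches is a list of
    (frame_index, placed_count, current_count) tuples.
    """
    if len(placed) != len(current):
        raise ValueError("one count per video frame is required in both lists")
    mismatches = [(i, p, c)
                  for i, (p, c) in enumerate(zip(placed, current))
                  if p != c]
    return not mismatches, mismatches


print(check_counts([3506], [3501]))  # placed 3506, found 3501: failure
print(check_counts([3501], [3501]))  # counts agree: success
```

Returning the mismatching frames together with both counts keeps the information needed for the later frame loss calculation.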

In the embodiment of the application, the current number of audio frames before each video frame of the transcoded video is acquired and compared with the number of audio frames placed in each video frame of the transcoded video, so that whether the transcoding result of the video to be transcoded is correct can be judged rapidly and accurately from the comparison result, which improves the user experience.

In order to better repair the video transcoding system, the audio frame loss degree of the transcoded video can be determined from the current number of audio frames before each video frame of the transcoded video and the number of audio frames placed in each video frame of the video to be transcoded. Specifically, after the current number of audio frames before each video frame of the transcoded video is obtained, the method further includes:

acquiring the actual audio frame number of the video frames corresponding to the unmatched number in the first number sequence and the second number sequence and the audio frame number placed in the video frames under the condition that the current audio frame number is different from the audio frame number placed in each video frame of the transcoded video;

and determining the audio frame loss degree of the transcoded video based on the difference value between the actual audio frame number and the audio frame number placed in the video frame and the audio frame duration of the transcoded video.

The audio frame loss degree measures how severely the audio and video of the transcoded video produced by the video transcoding system are out of sync. For example, if many audio frames are lost in the transcoded video, that is, the audio frame loss rate is high, the audio frame loss degree can be determined to be high.

Specifically, if the current number of audio frames before each video frame of the transcoded video differs from the number of audio frames placed in each video frame of the video to be transcoded, the actual number of audio frames and the placed number of audio frames are acquired for the video frames corresponding to the unmatched numbers in the first number sequence and the second number sequence. The difference between the actual number of audio frames and the placed number of audio frames is multiplied by the duration of one audio frame of the transcoded video to determine the audio frame loss degree of the transcoded video.

It should be noted that, provided the video transcoding system does not change the audio sampling rate during transcoding, the duration of each audio frame is fixed by the audio protocol standard, which may be, for example, the AAC audio protocol standard.

For example, if the number of audio frames before the third video frame before transcoding is 10, the actual number of audio frames before the third video frame after transcoding is 5, and the duration of each audio frame of the transcoded video is 2 ms, then the audio frame loss degree of the transcoded video is (10 − 5) × 2 ms, that is, 10 ms.
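The arithmetic can be sketched as follows. The function names are hypothetical. The default of 1024 samples per audio frame is the usual AAC-LC frame length and is an assumption added here, not a value given in this description; other profiles or codecs use different frame lengths, and the 2 ms duration in the example above is purely illustrative.

```python
def aac_frame_duration_ms(sample_rate_hz, samples_per_frame=1024):
    """Duration of one audio frame in milliseconds.

    samples_per_frame=1024 is assumed (typical for AAC-LC); the duration
    is constant only while the sampling rate is unchanged by transcoding.
    """
    return samples_per_frame * 1000.0 / sample_rate_hz


def audio_frame_loss_ms(placed_count, actual_count, frame_duration_ms):
    """Audio frame loss degree: count difference times frame duration."""
    return (placed_count - actual_count) * frame_duration_ms


# The example from the text: 10 frames placed, 5 found, 2 ms per frame.
print(audio_frame_loss_ms(10, 5, 2))
# The same loss of 5 frames at an assumed 48 kHz sampling rate.
print(audio_frame_loss_ms(10, 5, aac_frame_duration_ms(48000)))
```

A negative result would indicate that audio frames were repeated rather than lost, since the actual count then exceeds the placed count.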

In the embodiment of the application, determining the audio frame loss degree of the transcoded video makes it possible to determine quickly how many audio frames the video transcoding system loses during transcoding, so that the video transcoding system can subsequently be repaired and a better video experience can be provided for users.

In summary, the number of audio frames before each video frame before transcoding is placed in the corresponding video frame to determine whether the number of actual audio frames in each video frame in the transcoded video is consistent with the number of audio frames placed on each video frame, so that whether the phenomenon of asynchronous audio and video occurs after the video is transcoded can be rapidly and accurately determined, whether the transcoding quality problem exists in the video transcoding system is automatically judged, and further processing is conveniently carried out on the phenomenon of asynchronous audio and video.

Corresponding to the above method embodiments, the present disclosure further provides an embodiment of a video processing apparatus, and fig. 5 shows a schematic structural diagram of a video processing apparatus according to an embodiment of the present disclosure. As shown in fig. 5, the apparatus includes:

a video generating module 502 configured to determine the number of audio frames before each video frame of the initial video, and place the number of audio frames before each video frame in the corresponding video frame, to generate a video to be transcoded;

a video transcoding module 504 configured to send the video to be transcoded to a preset video transcoding system for video transcoding, and receive the transcoded video returned by the video transcoding system;

a transcoding result determining module 506 configured to obtain the number of audio frames placed in each video frame of the transcoded video, and determine the transcoding result of the video to be transcoded based on the number of audio frames placed in each video frame of the transcoded video.

Optionally, the transcoding result determining module 506 is further configured to:

Optionally, the apparatus further includes:

an acquisition module configured to acquire a current number of audio frames preceding each video frame of the transcoded video;

Optionally, the apparatus further includes:

the acquisition module is configured to acquire the initial video and determine the number of audio frames before each video frame of the initial video through a preset audio frame extraction tool.

Optionally, the video generating module 502 is further configured to:

According to the video processing apparatus provided by the embodiment of the application, the number of audio frames before each video frame is placed in the corresponding video frame before transcoding, so as to determine whether the actual number of audio frames before each video frame in the transcoded video is consistent with the number of audio frames placed in that video frame. In this way, whether audio-video desynchronization occurs after the video is transcoded can be determined rapidly and accurately, which facilitates further handling of the desynchronization.
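The division of work among modules 502, 504, and 506 can be sketched as three functions. Everything concrete here is an assumption for illustration: the `VideoFrame` type, the `placed_count` field standing in for the number placed in the frame, and the `transcoding_system` callable standing in for the preset video transcoding system. The result check shown is the complete-match comparison of the two number sequences.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class VideoFrame:
    audio_frames_before: int            # audio frames preceding this frame
    placed_count: Optional[int] = None  # number carried inside the frame


Video = List[VideoFrame]


def generate_video(initial: Video) -> Video:
    """Module 502: place each frame's audio frame count in that frame."""
    return [VideoFrame(f.audio_frames_before, f.audio_frames_before)
            for f in initial]


def transcode(video: Video,
              transcoding_system: Callable[[Video], Video]) -> Video:
    """Module 504: hand the video to the transcoding system, return result."""
    return transcoding_system(video)


def determine_result(to_transcode: Video, transcoded: Video) -> bool:
    """Module 506: success only if the two number sequences match fully."""
    first = [f.placed_count for f in to_transcode]
    second = [f.placed_count for f in transcoded]
    return first == second


initial = [VideoFrame(n) for n in (1024, 2504, 3406)]
to_transcode = generate_video(initial)
# Stand-in transcoding system that returns the frames unchanged.
transcoded = transcode(to_transcode, lambda v: list(v))
print(determine_result(to_transcode, transcoded))
```

A stand-in transcoding system that drops or reorders frames makes `determine_result` return `False`, at which point the verification-frame checks described for the method embodiments would apply.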

The above is a schematic solution of a video processing apparatus of the present embodiment. It should be noted that, the technical solution of the video processing apparatus and the technical solution of the video processing method belong to the same concept, and details of the technical solution of the video processing apparatus, which are not described in detail, can be referred to the description of the technical solution of the video processing method.

Fig. 6 illustrates a block diagram of a computing device 600 provided in accordance with an embodiment of the present specification. The components of computing device 600 include, but are not limited to, memory 610 and processor 620. The processor 620 is coupled to the memory 610 via a bus 630 and a database 650 is used to hold data.

Computing device 600 also includes access device 640, which enables computing device 600 to communicate via one or more networks 660. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. The access device 640 may include one or more of any type of network interface, wired or wireless (e.g., a Network Interface Card (NIC)), such as an IEEE 802.11 Wireless Local Area Network (WLAN) interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.

In one embodiment of the present description, the above-described components of computing device 600, as well as other components not shown in FIG. 6, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device shown in FIG. 6 is for exemplary purposes only and is not intended to limit the scope of the present description. Those skilled in the art may add or replace other components as desired.

Computing device 600 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smart phone), wearable computing device (e.g., smart watch, smart glasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 600 may also be a mobile or stationary server.

Wherein the processor 620 is configured to execute computer-executable instructions that when executed by the processor 620 implement the steps of the video processing method.

The foregoing is a schematic illustration of a computing device of this embodiment. It should be noted that, the technical solution of the computing device and the technical solution of the video processing method belong to the same concept, and details of the technical solution of the computing device, which are not described in detail, can be referred to the description of the technical solution of the video processing method.

An embodiment of the present disclosure also provides a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the steps of a video processing method as described above.

The above is an exemplary version of a computer-readable storage medium of the present embodiment. It should be noted that, the technical solution of the storage medium and the technical solution of the video processing method belong to the same concept, and details of the technical solution of the storage medium which are not described in detail can be referred to the description of the technical solution of the video processing method.

The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

The computer instructions include computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a jurisdiction; for example, in certain jurisdictions, in accordance with legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunication signals.

It should be noted that, for the sake of simplicity of description, the foregoing method embodiments are all expressed as a series of combinations of actions, but it should be understood by those skilled in the art that the present description is not limited by the order of actions described, as some steps may be performed in other order or simultaneously in accordance with the present description. Further, those skilled in the art will appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily all necessary in the specification.

In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.

The preferred embodiments of the present specification disclosed above are merely used to help clarify the present specification. Alternative embodiments are not intended to be exhaustive or to limit the invention to the precise form disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the disclosure and the practical application, to thereby enable others skilled in the art to best understand and utilize the disclosure. This specification is to be limited only by the claims and the full scope and equivalents thereof.

Claims

1. A video processing method, comprising:

determining the number of audio frames before each video frame of an initial video, and placing the number of audio frames before each video frame in a preset position of a corresponding video frame in a watermark manner to generate a video to be transcoded;

Acquiring the number of audio frames placed in each video frame of the transcoded video, arranging the number of audio frames placed in each video frame of the video to be transcoded frame by frame to form a first number sequence, and arranging the number of audio frames placed in each video frame of the transcoded video frame by frame to form a second number sequence;

2. The method of video processing according to claim 1, wherein said determining that the video to be transcoded was transcoded successfully comprises:

3. The video processing method according to claim 1, wherein after the comparing the first number of sequences with the second number of sequences, further comprising:

4. The video processing method according to claim 1, wherein after the comparing the first number of sequences with the second number of sequences, further comprising:

5. A video processing method according to claim 3, wherein said determining that the video to be transcoded was transcoded successfully comprises:

6. The video processing method according to claim 2 or 5, characterized in that after the obtaining of the current number of audio frames before each video frame of the transcoded video, further comprising:

7. The method of video processing according to claim 1, wherein said determining the number of audio frames preceding each video frame of the initial video comprises:

8. A video processing apparatus, comprising:

the video generation module is configured to determine the number of audio frames before each video frame of the initial video, and place the number of audio frames before each video frame in a preset position of the corresponding video frame in a watermark manner to generate a video to be transcoded;

The video transcoding module is configured to send the video to be transcoded to a preset video transcoding system for video transcoding, and receive transcoded video returned by the video transcoding system;

a transcoding result determining module configured to obtain a number of audio frames placed in each video frame of the transcoded video, arrange the number of audio frames placed in each video frame of the video to be transcoded frame by frame to form a first number of sequences, and arrange the number of audio frames placed in each video frame of the transcoded video frame by frame to form a second number of sequences; comparing the first number of sequences with the second number of sequences, and determining that the video to be transcoded is transcoded successfully under the condition that the first number of sequences is completely matched with the second number of sequences.

9. A computing device comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, wherein the processor, when executing the instructions, implements the steps of the video processing method of any one of claims 1 to 7.

10. A computer readable storage medium, characterized in that it stores computer instructions which, when executed by a processor, implement the steps of the video processing method of any of claims 1 to 7.