CN111684816A - Video decoding method, video decoding device, storage medium, and electronic apparatus


Info

Publication number: CN111684816A
Application number: CN201980009474.3A
Authority: CN (China)
Prior art keywords: frame, decoding, current, idr, frames
Legal status: Pending (assumed; not a legal conclusion)
Other languages: Chinese (zh)
Inventor: 陈欣
Assignee (original and current): SZ DJI Technology Co Ltd

Classifications

    • H — ELECTRICITY; H04 — ELECTRIC COMMUNICATION TECHNIQUE; H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION; H04N 21/00 — Selective content distribution, e.g. interactive television or video on demand [VOD]; H04N 21/40 — Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]
    • H04N 21/4402 — Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N 21/438 — Interfacing the downstream path of the transmission network originating from a server, e.g. retrieving MPEG packets from an IP network
    • H04N 21/472 — End-user interface for requesting content, additional data or services; end-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N 21/47217 — End-user interface for controlling playback functions for recorded or on-demand content, e.g. using progress bars, mode or play-point indicators or bookmarks

Abstract

The disclosure provides a video decoding method, a video decoding device, a computer-readable storage medium, and an electronic device, belonging to the field of computer technology. The method comprises the following steps: decoding a target video at a first speed; obtaining the position of the next IDR frame after the currently decoded frame; determining whether the next IDR frame is located before the current expected decoding position, where the current expected decoding position is the decoding position the target video is expected to have reached at the current time when decoding at a preset second speed; and performing seek decoding when the next IDR frame is located before the current expected decoding position. The method avoids the repeated decoding, or repeated decoding detection, that seek decoding can otherwise cause during video playback, improving efficiency and alleviating playback stuttering and frame loss.

Description

Video decoding method, video decoding device, storage medium, and electronic apparatus
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a video decoding method, a video decoding apparatus, a computer-readable storage medium, and an electronic device.
Background
A video file is generally a digital file formed by compressing digital image information with a specific encoding method. To play a video file, the playback tool must first decode it to restore the image information. The playback speed of a video is therefore limited by its decoding speed.
The decoding speed depends on the processing capability of the video playback tool itself and on the hardware of the electronic device. On most current electronic devices, the decoding speed of commonly used playback tools exceeds the normal playback speed of typical videos (i.e., 1x playback). However, in scenarios such as video preview, video editing, and user-initiated fast forward, the video often needs to be played at high speed, for example at 2x, 4x, or even higher. If the playback speed exceeds the upper limit of the decoding speed, playback may stutter or drop frames, degrading the viewing experience.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The present disclosure provides a video decoding method, a video decoding apparatus, a computer-readable storage medium, and an electronic device, so as to at least mitigate the stuttering and frame loss that occur in the prior art when video is played at high speed.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to a first aspect of the present disclosure, there is provided a video decoding method comprising: decoding a target video at a first speed; obtaining the position of the next IDR frame after the currently decoded frame; determining whether the next IDR frame is located before the current expected decoding position, where the current expected decoding position is the decoding position the target video is expected to have reached at the current time when decoding at a preset second speed; and performing seek decoding when the next IDR frame is located before the current expected decoding position.
According to a second aspect of the present disclosure, there is provided a decoding apparatus for video playback, comprising: a decoding module configured to decode a target video at a first speed; an obtaining module configured to obtain the position of the next IDR frame after the currently decoded frame; a judging module configured to judge whether the next IDR frame is located before the current expected decoding position, where the current expected decoding position is the decoding position the target video is expected to have reached at the current time when decoding at a preset second speed; and a seeking module configured to perform seek decoding when the next IDR frame is located before the current expected decoding position.
According to a third aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any one of the above.
According to a fourth aspect of the present disclosure, there is provided an electronic device comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the method of any one of the above via execution of the executable instructions.
Exemplary embodiments of the present disclosure have the following advantageous effects:
When the target video is decoded at a decoding speed lower than the expected playing speed, the actual decoding progress catches up with the expected playing progress through seek decoding. Before each seek, the method checks whether the seek condition is met, namely whether the next IDR frame after the currently decoded frame lies before the current expected decoding position. This guarantees that a seek never jumps back to a position before the currently decoded frame, avoiding repeated decoding or repeated-decoding detection, reducing system overhead, and improving processing efficiency. Furthermore, this exemplary embodiment ensures that each seek brings the actual decoding progress effectively closer to the expected playing progress, so the decoder can decode a sustained run of frames. This reduces the frequent frame dropping seen in the related art, alleviates playback stuttering and frame loss, and improves viewing fluency for the user.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.
FIG. 1 shows a schematic diagram of video frame decoding;
FIG. 2 shows a schematic diagram of high-speed playback of video;
FIG. 3 is a diagram illustrating seek decoding in the related art;
fig. 4 shows a flowchart of a video decoding method in the present exemplary embodiment;
fig. 5 shows a flowchart of another video decoding method in the present exemplary embodiment;
fig. 6 is a block diagram showing the structure of a video decoding apparatus in the present exemplary embodiment;
FIG. 7 illustrates a computer-readable storage medium for implementing the above-described method in the present exemplary embodiment;
fig. 8 shows an electronic device for implementing the above method in the present exemplary embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
In video compression there are two main techniques: intra-frame coding and inter-frame coding. Intra-frame coding compresses the information of a frame spatially and encodes it independently; inter-frame coding compresses a frame's information in the time dimension, referring to preceding or following frames during encoding. Both techniques are used in conventional H264 encoding, in which video frames come in three types: I frames, P frames, and B frames.
I frame: an intra-coded frame. It can be decoded into an image independently, without relying on any other frame.
P frame: a forward-predictive coded frame. A P frame is an inter-coded frame: it is generated by compression against preceding video frames in the time dimension, achieves a higher compression rate than an I frame, and also requires those preceding frames when decoding. For example, in the frame sequence shown in fig. 1, each P frame has exactly one forward reference frame, which is adjacent to it. It will be appreciated that a P frame's reference may also be multiple frames, or frames not adjacent to it.
B frame: a bidirectionally predictive, interpolation-coded frame. A B frame is also an inter-coded frame; unlike a P frame, it relies on both preceding and following frames for compression, achieves a higher compression rate than a P frame, and requires both the preceding and following reference frames to be decoded before it can be decoded.
Among I frames there is a special type, the IDR (Instantaneous Decoding Refresh) frame. P frames and B frames that follow a normal I frame may still reference frames located before that I frame; after an IDR frame, however, no frame may reference anything before the IDR frame. An IDR frame can thus be decoded independently, and the video can be re-decoded starting from an IDR frame without referring to any frame that precedes it.
For example, playing a video at 4x speed can be done by displaying 1 frame out of every 4: as shown in fig. 2, when a segment of H264 video is previewed at 4x speed, only the 1st, 5th, 9th, ... frames are displayed. When decoding, however, it is not possible to decode only 1 frame out of every 4, because decoding P frames, B frames, etc. requires reference to preceding frames: even if only the 5th frame needs to be displayed after the 1st, the 2nd, 3rd, and 4th frames must still be decoded; they are merely not displayed. Thus high-speed playback does not reduce the decoder's workload. For a 30 frames/second video (i.e., 33 ms per frame), 1x playback only requires that each frame be decoded within 33 ms, but 4x playback requires each frame to be decoded within 8.33 ms; otherwise decoding cannot keep up with the playback speed.
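As a rough illustration of the timing constraint just described (a hypothetical helper, not from the patent), the per-frame decoding budget at a given playback speed can be sketched as:

```python
def per_frame_budget_ms(fps, speed):
    """Milliseconds available to decode each frame at `speed`x playback.

    Every frame must still be decoded even at high playback speed,
    because P and B frames reference earlier frames, so the budget
    per frame shrinks in proportion to the speed factor.
    """
    return 1000.0 / (fps * speed)

# 30 fps video: ~33.3 ms per frame at 1x, but only ~8.3 ms at 4x.
budget_1x = per_frame_budget_ms(30, 1)
budget_4x = per_frame_budget_ms(30, 4)
```

A decoder that needs, say, 10 ms per frame therefore keeps up at 1x but falls behind at 4x.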
In order to solve the above problems, the related art generally adopts the following 3 schemes:
1. Transcode the video at the high speed in advance to generate a new video. Some video editing tools adopt this scheme. Its advantage is that the generated video can be edited and played smoothly at 1x speed, but the user must wait for transcoding: the longer the video, the longer the wait, and if the user is unsure which speed is suitable and wants to try several, multiple transcodings are required. It is therefore unsuitable for most application scenarios.
2. When the playback speed exceeds the decoding speed, decode and play at the decoder's maximum speed, ignoring the requested playback speed. For example, if the user wants to play a video at 4x but the decoder can only reach 3x, the video is played at 3x; as a result only the first 3/4 of the content is played in the expected time, and for longer videos even more content at the end is missed, so practicality is poor.
3. Adopt seek decoding. The video is first decoded and played at the decoding speed; since the decoding speed is slower than the expected playing speed, the actual playing progress falls behind the expected playing progress. Once the lag reaches a certain extent, the lagging portion is skipped: a seek is issued to the expected playing position, and decoding and playing continue from there, with this cycle repeating. The seek calls the seekTo() method. When the seek targets the position of a P frame, that position is passed to the decoder; as noted above, the decoder can only resume mid-stream at an IDR frame, which guarantees that subsequent frames remain decodable, so the position the decoder actually reaches is the IDR frame nearest to, and before, the target position. After seeking to that IDR frame, the decoder must decode sequentially up to the target position before playback resumes.
Of the 3 schemes above, scheme 3 is the most practical, but it still has problems. In the seek process shown in fig. 3, when seeking from the current decoding position X to the expected playing position Y, position Y can be any type of frame; if it is a P frame or B frame (B frames not shown), its decoding must reference earlier frames, so the actually reachable position Z is the IDR frame nearest to, and before, Y. In practice Z may lie before X, so the seek actually jumps decoding back to an earlier position, causing repeated decoding, low efficiency, and playback stuttering. The following example makes the problem concrete:
Suppose the video to be played has a frame rate of 30 frames/second with one IDR frame every 4 seconds, i.e., the IDR frames are at frames 1, 121, 241, ...; it is to be played at 120 frames/second, i.e., 4x the normal playback speed, while the decoder needs 10 milliseconds per frame, so its limit is 100 frames/second, i.e., 3.33x the normal playback speed. If a seek is triggered whenever the current decoding progress lags the expected playing progress by 300 ms (i.e., 9 frames), playback proceeds as shown in table 1:
Actual decoding progress (frames)    Expected playing progress (frames)
1-45                                 1-54
54                                   55-70
70                                   71-91
91                                   92-118
118                                  119-153
153                                  154-162
162                                  163-174
174                                  175-190
190                                  191-211
211                                  212-238
238                                  239-273
TABLE 1
Playback is smooth at first. When frame 45 has been decoded, the expected playing progress has reached frame 54, a lag of 9 frames, so a seek is triggered; the seek actually lands on the nearest IDR frame before frame 54, namely frame 1. But frames 1-45 have already been decoded, so even if the decoder does not re-decode them, it still runs repeated-decoding detection, wasting computation. When frame 54 has been decoded, the expected progress has reached frame 70, a lag of 16 frames, and a seek is triggered again; as with the first seek, it lands on frame 1 and repeats the same processing. By the time frame 70 has been decoded, the actual progress lags the expected progress even further, and the cycle repeats. The observable result is that, apart from a smooth initial segment, playback continually stutters and drops frames, giving the user a poor viewing experience.
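The seek behavior in this example can be sketched as follows (a simplified model over frame ordinals; names are illustrative):

```python
def seek_landing(idr_frames, target):
    """Frame a seek actually lands on: the nearest IDR frame at or
    before the target position, since decoding can only restart at
    an IDR frame."""
    earlier = [f for f in idr_frames if f <= target]
    return max(earlier) if earlier else None

# IDR frames at 1, 121, 241 (one every 4 s at 30 fps, as in Table 1).
# Every seek target before frame 121 lands back on frame 1 -- behind
# frames already decoded -- forcing repeated decoding or detection.
for expected in (54, 70, 91, 118):
    assert seek_landing([1, 121, 241], expected) == 1
```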
In view of the above-mentioned various problems, an exemplary embodiment of the present disclosure provides a video decoding method, and fig. 4 shows a flow of the method, including steps S410 to S440:
step S410: the target video is decoded based on the first speed.
The target video is the video to be played, and the first speed is the actual decoding speed. In this exemplary embodiment the expected playing speed of the target video is faster than the actual decoding speed, and once playback starts, two progress indicators can be maintained: the actual decoding progress and the expected playing progress. The actual playback of the target video is synchronized with decoding, i.e., the actual playing progress equals the actual decoding progress. In an exemplary embodiment, the first speed is the fastest decoding speed the system can achieve.
In step S420, the position of the next IDR frame after the current decoded frame is obtained.
The frame being decoded at the current moment is the currently decoded frame, and the next IDR frame after it is the IDR frame that is located after, and closest to, the currently decoded frame. The currently decoded frame and the next IDR frame may be represented by a timestamp or by the frame's ordinal number in the target video, for example as specific time instants in the target video, or as the m-th frame and the n-th frame of the target video; the present disclosure does not limit this.
In an exemplary embodiment, the position of the next IDR frame may be obtained as follows: determine the position of the next IDR frame after the currently decoded frame from the position information of each IDR frame of the target video. The position information of the IDR frames may be obtained from an encoding information file accompanying the target video, or by parsing the target video, for example: when any frame of the target video is decoded, obtain the type of that frame; if it is an IDR frame, record its timestamp and use that timestamp as the position information of the corresponding IDR frame, thereby accumulating the position information of every IDR frame of the target video. This parsing and recording of frame types can be done the first time the target video is played. Decoding generally comprises a parse step and a decode step; taking H264-encoded video as an example: a video frame is first parsed out of the target video and converted into one frame of H264 data, from which the frame type (IDR frame, P frame, etc.) is determined; the H264 data is then passed to the decoder, which decodes it into displayable image data. In this way the position information of the IDR frames can be extracted and stored in a buffer or a designated file for subsequent playback or decoding. Once the position information of each IDR frame is available, the IDR frames and the currently decoded frame can be marked on the target video's time axis, and the position of the next IDR frame determined directly from their relative positions.
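The recording step described above can be sketched as follows (a minimal sketch; the (timestamp, frame-type) pairs stand in for the real parse step and are an assumption, not an actual decoder API):

```python
IDR = "IDR"  # placeholder for the real frame-type / NAL-unit constant

def record_idr_positions(parsed_frames):
    """Collect the timestamp of every IDR frame while parsing the video.

    `parsed_frames` yields (timestamp_ms, frame_type) pairs, standing in
    for the parse step that precedes decoding (e.g. inspecting H264 data).
    """
    return [ts for ts, ftype in parsed_frames if ftype == IDR]

def next_idr_after(idr_positions, current_ts):
    """Position of the IDR frame located after, and closest to, the
    currently decoded frame (identified by its timestamp)."""
    later = [ts for ts in idr_positions if ts > current_ts]
    return min(later) if later else None
```

On a stream with IDR frames at 0 ms, 4000 ms, and 8000 ms, `next_idr_after` would return 4000 for any current position between the first two.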
In an exemplary embodiment, the position of the next IDR frame may also be determined as follows: obtain the position of the next IDR frame after the currently decoded frame from the position of the last IDR frame and the IDR interval. Here the last IDR frame is the IDR frame located before, and closest to, the currently decoded frame. In many video compression encodings the IDR frames are uniformly spaced, i.e., the interval between any two adjacent IDR frames is equal. On that basis, the time interval between adjacent IDR frames gives the IDR interval, so the position of the next IDR frame can be calculated by adding that interval to the position of the last IDR frame. This method suits the case where the video is played for the first time: the position of the next IDR frame can be calculated even before the IDR position information of the target video has been fully parsed.
Further, the IDR interval may be obtained as follows: obtain the positions of at least two IDR frames before the currently decoded frame, and determine the interval from those positions. That is, the IDR interval can be determined from the positions of two or more IDR frames decoded before the currently decoded frame. The at least two IDR frames may be two adjacent IDR frames, whose timestamps are subtracted to obtain the interval; or two non-adjacent IDR frames, in which case the interval can still be calculated if the number of IDR frames between them is known; or, when several IDR frames have been decoded, the smallest gap between two of them may be taken as the interval; and so on.
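The interval-based prediction above can be sketched as follows (illustrative names; assumes at least two adjacent IDR positions have already been observed):

```python
def estimate_idr_interval(observed_idrs):
    """Estimate the IDR spacing from already-decoded IDR positions,
    taking the smallest gap between consecutive observed positions
    (gaps spanning missed IDR frames would overestimate the spacing)."""
    gaps = [b - a for a, b in zip(observed_idrs, observed_idrs[1:])]
    return min(gaps)

def predict_next_idr(last_idr, interval):
    """Predicted position of the next IDR frame: last IDR + interval."""
    return last_idr + interval

# With IDR frames observed at 1, 121, 241, the estimated interval is
# 120 frames, so the next IDR frame is predicted at frame 361.
```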
It should be added that the above method of calculating the next IDR frame from the interval can be regarded as a prediction. Even when the IDR frame spacing is not uniform, so that the predicted position of the next IDR frame may be inaccurate, the spacing between different IDR frames in a video usually does not vary greatly, so the prediction error is small and this exemplary embodiment remains applicable.
In step S430, it is determined whether the next IDR frame is located before the current expected decoding position.
The current expected decoding position is the decoding position the target video is expected to have reached at the current time when decoding at the preset second speed; this position, too, may be expressed as a timestamp or a frame ordinal. The second speed is the desired decoding or playing speed, for example 2x, 4x, 8x, or 16x the normal playback speed of the target video. The first speed is less than the second speed, i.e., the actual decoding speed is less than the desired decoding speed and cannot satisfy the desired playback speed. For example: the target video is expected to play at 4x normal speed, but the decoder can reach at most 3x; then the first speed is 3x, the second speed is 4x, and during decoding the actual decoding progress lags the expected decoding progress.
In this exemplary embodiment, the ordering of the next IDR frame relative to the current expected decoding position can be judged by timestamp, for example: in step S420 the timestamp of the next IDR frame is obtained, the timestamp corresponding to the current expected decoding position is calculated, and the two are compared to determine whether the next IDR frame's time is earlier than that of the expected decoding position. It can also be judged by frame number, for example: in step S420 the o-th frame of the target video is obtained as the next IDR frame, the n-th frame corresponding to the current expected decoding position is calculated, and it is determined whether o is smaller than n.
In step S440, when the next IDR frame is located before the current expected decoding position, seek decoding is performed.
Seek decoding means jumping to a specified position, determining a decoding start frame (for example, an IDR frame) near that position, and continuing decoding forward from that start frame. For example, seek decoding may use the decoder's seekTo() method, whose logic is: the decoder is given a time position as the argument; seekTo(2000), for instance, seeks to the 2000 ms position of the target video, and if the frame at that position is not an IDR frame, the decoder finds the nearest IDR frame before it, lands on that IDR frame, and continues decoding from it. Specifically, the seek may target the next IDR frame, or the last IDR frame before the current expected decoding position, where the latter means the IDR frame located before, and closest to, the current expected decoding position.
In this exemplary embodiment the target video consists of P frames, B frames, and I frames, the I frames including normal I frames and IDR frames. Of course, normal I frames are not required, and the inter-coded frames may be only P frames or only B frames. IDR frames are usually placed at intervals, whether at fixed time or frame intervals or at the start of each segment according to the content of the target video; the present disclosure does not limit this. The frames between two adjacent IDR frames include at least one of P frames, B frames, and normal I frames. When the seek reaches the next IDR frame, or the last IDR frame before the current expected decoding position, the undecoded frames before that IDR frame are discarded, the previous decoding reference information is cleared, and decoding restarts from the IDR frame: subsequent P frames and B frames are decoded, and any I frame encountered is decoded independently, ensuring that the following frames can be decoded normally.
In an exemplary embodiment, when the next IDR frame is located after the current expected decoding position, the above seek condition is not met: there is no IDR frame between the currently decoded frame and the current expected decoding position, so a seek would necessarily return to a position before the currently decoded frame. The seek is therefore skipped, and decoding continues from the currently decoded frame.
It should be added that if the next IDR frame coincides with the current expected decoding position, this may be treated as a special case of the next IDR frame being before the current expected decoding position, so that step S440 is executed, or as a special case of it being after, so that decoding continues without a seek; the present disclosure does not limit this.
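Putting steps S420-S440 together, the seek guard can be sketched as follows (a simplified model over frame ordinals; treating the equality case as "continue decoding" is one of the two options the text allows, and the names are illustrative):

```python
def should_seek(next_idr, expected_position):
    """Seek only when the next IDR frame lies strictly before the
    expected decoding position; otherwise a seek would land at or
    before the currently decoded frame and force repeated decoding."""
    return next_idr is not None and next_idr < expected_position

def decode_step(current_frame, next_idr, expected_position):
    """One decision of the decoding loop: jump to the next IDR frame
    (seek decoding) or continue decoding from the current frame."""
    if should_seek(next_idr, expected_position):
        return next_idr       # seek decoding: resume from the IDR frame
    return current_frame + 1  # no IDR before the target: keep decoding

# Currently at frame 45, expected position 54, next IDR at 121:
# the guard fails (121 is not before 54), so decoding continues.
assert decode_step(45, 121, 54) == 46
# Once the expected position (130) passes the next IDR (121), seek.
assert decode_step(70, 121, 130) == 121
```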
It should be noted that the present exemplary embodiment may decode and play the target video frame by frame, playing each frame as soon as it is decoded. In addition, for high-speed playing of the target video, a frame-skipping playing mode similar to that of fig. 2 may be adopted, for example playing one frame out of every 4 decoded frames, so that the current decoded frame and the current frame to be displayed may differ: in fig. 2, if the 3rd frame is currently decoded, the current frame to be displayed is the 5th frame, and if decoding has advanced to the 6th frame, the current frame to be displayed is the 9th frame. Based on this, in step S420 the position of the next IDR frame after the current frame to be displayed may also be obtained, and the determination of step S430 is then performed to decide whether to carry out the search decoding of step S440. If the condition of step S430 is satisfied, the frames from the current decoded frame up to the current frame to be displayed are decoded first, and the search decoding is performed only after the current frame to be displayed has been displayed; this avoids searching away without displaying the current frame to be displayed, and further improves the efficiency of video decoding and playing.
Based on the above description, in the present exemplary embodiment, when the target video is decoded at a decoding speed lower than the expected playing speed, the actual decoding progress follows the expected playing progress by means of search decoding, and before each search it is determined whether the search condition is met, that is, whether the next IDR frame after the current decoded frame lies before the current expected decoding position. This ensures that a search does not return to the current decoded frame, avoids repeated decoding or repeated decoding detection, reduces system overhead, and improves processing efficiency. Furthermore, the present exemplary embodiment ensures that the actual decoding progress effectively approaches the expected playing progress at each search, so that the decoder can then decode a certain number of frames continuously; the frequent frame dropping of the related art is reduced, the problems of playback stutter and frame loss are alleviated, and the viewing fluency for the user is improved.
To further improve the efficiency with which each search catches up with the expected progress, the difference between the expected search position (i.e. the current expected decoding position) and the actual search position (the next IDR frame, or the previous IDR frame of the current expected decoding position) should be made as small as possible. In an exemplary embodiment, after the target video starts playing, since the current decoded frame and the current expected decoding position both move forward in real time, steps S420 and S430 may be performed in real time: each time a frame is decoded, the position of the next IDR frame after the current decoded frame is monitored, the current expected decoding position is detected or calculated, and the positional relationship between the two is determined; alternatively, the above process may be performed periodically at short intervals to approximate a real-time effect. Accordingly, in step S440, the search decoding may be performed immediately in response to the current expected decoding position reaching or exceeding the position of the next IDR frame.
In other words, search decoding may occur as soon as the condition that the current expected decoding position reaches or exceeds the position of the next IDR frame is detected, i.e. when the current expected decoding position equals, or has just exceeded, the position of the next IDR frame. In general, this scheme of detecting and judging the search condition in real time allows each search to land very close to the expected search position, and allows decoding to continue for a long time before the next search, so that the user watches a continuous picture for longer and the problems of playback stutter and frame loss are further alleviated.
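The real-time monitoring described above can be illustrated with a simplified frame-accurate simulation; the function name and the specific speeds are assumptions, and the model ignores seek latency, simply checking the search condition after every decoded frame:

```python
def play(total_frames, idr_every, decode_fps, expect_fps):
    """Per-frame monitoring (steps S420-S430 in real time): after each
    decoded frame, recompute the current expected decoding position and
    seek to the next IDR frame as soon as the expected position reaches
    or exceeds it (step S440). Returns the list of seek targets."""
    decoded, ticks, seeks = 1, 0, []
    while decoded < total_frames:
        ticks += 1  # wall-clock time of one decoded frame at decode_fps
        expected = min(ticks * expect_fps // decode_fps + 1, total_frames)
        next_idr = ((decoded - 1) // idr_every + 1) * idr_every + 1
        if next_idr <= expected:  # search condition met: seek immediately
            seeks.append(next_idr)
            decoded = next_idr
        else:
            decoded += 1          # otherwise keep decoding in order
    return seeks

# 300 frames, an IDR frame every 120 frames, decoding at 25 FPS while the
# expected playing speed is 60 FPS: seeks land exactly on the IDR frames.
print(play(300, 120, 25, 60))  # [121, 241]
```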
According to the above embodiments of obtaining the position of the next IDR frame: if the position of the next IDR frame is determined according to the position information of each IDR frame in the target video, it is an accurate position; if it is determined based on the position of the last IDR frame and the interval between two IDR frames, it is a predicted position.
As described above, in the case where steps S420 and S430 are performed in real time, the relationship between the next IDR frame and the previous IDR frame at the current expected decoding position and the search decoding process are as follows:
if the position of the obtained next IDR frame is an accurate position, the next IDR frame is a previous IDR frame of the current expected decoding position, and the next IDR frame can be directly searched for and decoded;
if the position of the obtained next IDR frame is a predicted position, and the predicted position is matched with the accurate position (for example, under the condition that the IDR frame of the target video can be determined to be uniformly set), the next IDR frame is the previous IDR frame of the current expected decoding position, and the next IDR frame can be directly searched for and decoded;
if the position of the obtained next IDR frame is a predicted position, and the predicted position does not match the accurate position or cannot be verified against it (for example, when it cannot be determined whether the IDR frames of the target video are uniformly set), the next IDR frame may not be the previous IDR frame of the current expected decoding position, and the previous IDR frame of the current expected decoding position is found for decoding.
It is understood that steps S420 and S430 may also be performed in non-real time, in which case there may be more than one IDR frame between the current decoded frame and the current expected decoding position; the search then actually finds the previous IDR frame of the current expected decoding position, that is, the IDR frame closest to it, which is beneficial for catching up with the expected playing progress.
In an exemplary embodiment, after step S440, the following steps may also be performed:
and when the current decoding frame reaches or exceeds the current expected decoding position, continuously acquiring the position of the next IDR frame after the current decoding frame.
In this step, there are the following points to be explained:
1. At each search, the expected search position is the current expected decoding position n, while the position actually found is the IDR frame position o before n; decoding then proceeds sequentially from o, and no search is performed while decoding from o to n. Only when the current decoded frame reaches or exceeds position n are steps S420 to S440 started again.
2. During the playing of the target video, steps S420 to S440 are executed cyclically: after one search completes, decoding proceeds sequentially from the actual search position o toward the current expected decoding position n, and if the search condition is met again during subsequent decoding, search decoding is performed again. The present exemplary embodiment can therefore perform complete decoding and playing processing regardless of the length of the target video.
3. The current expected decoding position n represents the expected playing progress, while the position actually found is o, and the video frames from o to n can still be played. Thus each search not only quickly catches up with the expected playing progress, conforming to the expected playing logic as far as possible, but also plays the frames in between, making the played video content more complete.
Therefore, executing the above steps further increases the effectiveness of each search when the decoding progress lags well behind the playing progress, and improves the fluency of video playing.
The purpose of the search is to make the actual decoding progress catch up with the expected playing progress, and its effect is more pronounced when the gap between the two is large. Based on this, fig. 5 shows another exemplary flow of a video decoding method: in addition to the steps of the method in fig. 4, when the target video starts to play and is decoded and played frame by frame, the method may further include the following steps:
step S411, monitoring the difference value of the current decoding frame lagging the current expected decoding position;
step S412, determining whether the difference is greater than a preset threshold;
in step S413, if the difference is greater than the preset threshold, step S420 and the following steps are performed.
Step S414, if the difference is smaller than the preset threshold, step S450 is executed, and the decoding is continued from the current decoding frame.
The difference may be calculated, for example, in the following ways, which should not limit the scope of the disclosure:
(1) the current decoded frame is the m-th frame, the current expected decoding position is the n-th frame, and the frame rate of the target video is r in FPS (Frames Per Second); the difference t by which the current decoded frame lags the current expected decoding position is t = (n - m)/r;
(2) the timestamps of the current decoded frame and of the current expected decoding position in the target video are obtained directly and subtracted to obtain the difference;
(3) the first speed is v1 and the second (expected decoding) speed is v2, both in FPS; the frame rate of the target video is r; k searches have occurred so far, skipping h1, h2, ..., hk frames respectively; and the played time (from the start to the current moment) is ts. The difference is then t = [(v2 - v1)·ts - (h1 + h2 + ... + hk)]/r.
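The three ways of computing the difference can be sketched directly from the formulas above; the function names are hypothetical, and way (3) uses the reconstructed formula t = [(v2 - v1)·ts - (h1 + ... + hk)]/r:

```python
def lag_by_frame_index(m, n, r):
    """(1) Current decoded frame m, expected position n, frame rate r (FPS)."""
    return (n - m) / r

def lag_by_timestamps(decoded_ts_ms, expected_ts_ms):
    """(2) Subtract the two timestamps taken from the target video."""
    return (expected_ts_ms - decoded_ts_ms) / 1000.0

def lag_by_speeds(v1, v2, skipped, played_s, r):
    """(3) First speed v1 and second speed v2 (both FPS), frames skipped
    by each past search, played time in seconds, frame rate r."""
    return ((v2 - v1) * played_s - sum(skipped)) / r

# A 30 FPS video, decoded frame 100 vs expected frame 220: 4 s of lag;
# the same lag computed from the speed gap after one 120-frame seek.
print(lag_by_frame_index(100, 220, 30))     # 4.0
print(lag_by_speeds(25, 55, [120], 8, 30))  # 4.0
```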
The preset threshold serves as the criterion for judging whether the gap between the decoding progress and the expected playing progress is large; it may be determined according to the length of the target video, the frame rate, the difference between the first speed and the second speed, specific application requirements, and so on, which the disclosure does not limit. In an exemplary embodiment, the preset threshold may empirically be 300 ms. In another exemplary embodiment, to keep the picture jump before and after each search from being too large, the preset threshold may be made smaller than the smallest time interval between two adjacent IDR frames in the target video. Each search is then performed with only one IDR frame (never two or more) between the current decoded frame and the current expected decoding position, so that few frames are skipped during the search and no IDR frame is skipped, further improving decoding efficiency.
It should be understood that the above difference and the preset threshold may also be expressed as a number of frames: step S420 and the following steps are executed when the decoding progress falls behind the expected playing progress by a certain number of frames. The principles and calculations for the frame-count form of the difference and the threshold are similar to the exemplary ways (1) to (3) above and are therefore not repeated.
In fig. 5, after steps S410 to S412 are executed, the following procedure is equivalent to performing double judgment on whether searching and decoding can be performed, and the judgment and execution results include the following cases:
1. if the conditions of steps S413 and S430 are satisfied, performing steps S413-S420-S430-S440, i.e., performing search decoding;
2. if the condition of step S413 is satisfied but the condition of step S430 is not satisfied, performing steps S413-S420-S430-S450 without performing search decoding;
3. if the condition of step S414 is satisfied (i.e., the condition of step S413 is not satisfied), steps S414 to S450 are performed without performing the search decoding.
Therefore, only when the double judgment conditions are met, the searching and decoding are carried out.
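The double judgment enumerated in cases 1 to 3 can be condensed into one decision function; the names are hypothetical, and the boundary case of the lag exactly equaling the threshold is treated here as "continue" (step S414), one of the two options the disclosure allows:

```python
def decide(lag_s, threshold_s, next_idr_pos, expected_pos):
    """Double judgment of fig. 5: the lag threshold first (S412-S414),
    then the IDR-position search condition (S430)."""
    if lag_s <= threshold_s:
        return "continue"   # case 3: S414 -> S450, lag still small
    if next_idr_pos <= expected_pos:
        return "seek"       # case 1: S413 -> S420 -> S430 -> S440
    return "continue"       # case 2: S430 not met -> S450

print(decide(0.8, 0.3, 121, 140))  # seek
print(decide(0.8, 0.3, 121, 110))  # continue (no IDR frame to jump to)
print(decide(0.1, 0.3, 121, 140))  # continue (lag below threshold)
```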
It should be added that, if the difference value is exactly equal to the preset threshold value, in the determination of steps S412 to S414, this situation may be regarded as a special situation satisfying the condition of step S413, and step S420 is executed, or step S450 is executed as a special situation satisfying the condition of step S414, which is not limited by the present disclosure.
In an exemplary embodiment, in step S420 the position of the next IDR frame is determined according to the position information of each IDR frame of the target video, and the preset threshold in step S412 is smaller than the smallest interval between two adjacent IDR frames in the target video. If the search condition is satisfied, there is exactly one IDR frame between the current decoded frame and the current expected decoding position, namely the next IDR frame determined in step S420, and the position determined in step S420 is accurate. Based on this, the search can go directly to the next IDR frame for decoding, which further improves efficiency, since there is no need to look for the previous IDR frame from the current expected decoding position.
When the video in table 1 is played using the method flow of fig. 5, the playing situation may be as shown in table 2: the video is played from the 1st IDR frame (i.e. the 1st frame) and plays continuously until the expected playing progress reaches the next IDR frame (the 121st frame), at which point the decoding progress lags the expected playing progress by 20 frames, exceeding the 300 ms threshold, and a search to the position of the 121st frame is performed to continue decoding. In this way, the video between IDR frames is decoded and played continuously, and fluency is greatly improved.
Actual decoding progress (frames)    Expected playing progress (frames)
1-100                                1-120
121-220                              121-240
TABLE 2
Comparing the playing situations in table 1 and table 2: with the related-art scheme, only 55 frames are actually played in the first 8 seconds of the video, whereas the present exemplary embodiment plays 200 frames in the same period, about 3.6 times as many as the related art.
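The 3.6× figure follows directly from the two tables; a quick check, assuming table 1's related-art scheme plays 55 frames in the first 8 seconds as stated:

```python
# Related art (table 1): 55 frames shown in the first 8 seconds.
# This embodiment (table 2): frames 1-100 and 121-220 are played.
frames_played = (100 - 1 + 1) + (220 - 121 + 1)
ratio = frames_played / 55
print(frames_played)     # 200
print(round(ratio, 1))   # 3.6
```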
An exemplary embodiment of the present disclosure also provides a video decoding apparatus, as shown in fig. 6, the apparatus 600 may include: a decoding module 610 for decoding the target video based on the first speed; an obtaining module 620, configured to obtain a position of a next IDR frame after a current decoded frame; a determining module 630, configured to determine whether a next IDR frame is located before the current expected decoding position; wherein the current expected decoding positions are: calculating a decoding position of the target video which is decoded to the current time and is expected to arrive according to a preset second speed; and the searching module 640 is configured to perform searching decoding when the next IDR frame is located before the current expected decoding position.
In an exemplary embodiment, the lookup module 640 may find the next IDR frame to decode when performing lookup decoding.
In an exemplary embodiment, the lookup module 640 may find the previous IDR frame of the current expected decoding position for decoding when performing lookup decoding.
In an exemplary embodiment, the video decoding apparatus 600 may further include: a monitoring module 650 for monitoring a difference value of the current decoded frame lagging the current expected decoding position; the obtaining module 620 may be configured to obtain a position of a next IDR frame after the current decoded frame if the difference is greater than a preset threshold.
In an exemplary embodiment, the decoding module 610 may be further configured to continue decoding from the currently decoded frame if the difference is smaller than a preset threshold.
In an exemplary embodiment, if the position of the next IDR frame is determined according to the position information of each IDR frame, and the preset threshold is smaller than the interval between two adjacent IDR frames that are closest to each other in the target video, the search module 640 may search for the next IDR frame to decode when performing search decoding.
In an exemplary embodiment, the preset threshold may be 300 ms.
In an exemplary embodiment, if the position of the next IDR frame is determined according to the position information of each IDR frame in the target video, the lookup module 640 may be configured to find the next IDR frame for decoding in response to the current expected decoding position reaching or exceeding the position of the next IDR frame.
In an exemplary embodiment, if the position of the next IDR frame is determined according to the position of the last IDR frame and the interval between two IDR frames, the lookup module 640 may find the previous IDR frame at the current expected decoding position for decoding when performing lookup decoding.
In an exemplary embodiment, after the lookup module 640 performs the lookup decoding, the obtaining module 620 may be configured to continue to obtain the position of the next IDR frame after the current decoded frame when the current decoded frame is determined to reach or exceed the current expected decoding position.
In an exemplary embodiment, the decoding module 610 can also be used to continue decoding from the current decoded frame when the next IDR frame is located after the current expected decoding position.
In an exemplary embodiment, the video decoding apparatus 600 may further include: a display module 660, configured to display the current decoded frame after the decoding module 610 decodes the current decoded frame.
In an exemplary embodiment, the video decoding apparatus 600 may further include: the recording module 670 is configured to, when the decoding module 610 decodes to any frame of the target video, obtain a type of the current frame, and record a timestamp of the current frame if the current frame is an IDR frame, where the timestamp of the current frame is used as position information of the corresponding IDR frame.
In an exemplary embodiment, the obtaining module 620 may be configured to determine a position of a next IDR frame after a current decoded frame according to the position information of each IDR frame of the target video.
In an exemplary embodiment, the obtaining module 620 may be configured to obtain the position of the next IDR frame after the current decoded frame according to the position of the last IDR frame and the interval of two IDR frames.
In an exemplary embodiment, the obtaining module 620 may be further configured to obtain positions of at least two IDR frames located before the current decoded frame, and determine an interval between the two IDR frames according to the positions of the at least two IDR frames.
In an exemplary embodiment, the at least two IDR frames include at least two adjacent IDR frames.
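Predicting the next IDR frame position from the recorded positions of two adjacent IDR frames, as the obtaining module 620 may do, can be sketched as follows; the function name is hypothetical and a uniform IDR interval is assumed:

```python
def predict_next_idr(recorded_idr_ts_ms):
    """Predict the next IDR frame position from the recorded timestamps
    of at least two previous IDR frames, here the last two adjacent
    ones, assuming IDR frames are set at a uniform interval."""
    if len(recorded_idr_ts_ms) < 2:
        return None  # the interval cannot be determined yet
    interval = recorded_idr_ts_ms[-1] - recorded_idr_ts_ms[-2]
    return recorded_idr_ts_ms[-1] + interval

# IDR frames recorded at 0 ms and 4000 ms: the next one is predicted at
# 8000 ms (a predicted position, not necessarily the accurate one).
print(predict_next_idr([0, 4000]))  # 8000
print(predict_next_idr([0]))        # None
```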
In an exemplary embodiment, the frame type between two adjacent IDR frames in the target video includes at least one of a P frame, a B frame, and a normal I frame.
In an exemplary embodiment, the first speed is a decoding speed; the second speed includes: 2 times, 4 times, 8 times or 16 times of the normal playing speed of the target video, and the like.
Details of the solution not disclosed in the above apparatus can be found in the embodiments of the method section, and thus are not described again.
As will be appreciated by one skilled in the art, aspects of the present disclosure may be embodied as a system, method, or program product. Accordingly, various aspects of the present disclosure may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "circuit," "module," or "system."
Exemplary embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, various aspects of the disclosure may also be implemented in the form of a program product comprising program code for causing a terminal device to perform the steps according to various exemplary embodiments of the disclosure described in the above-mentioned "exemplary methods" section of this specification, when the program product is run on the terminal device.
Referring to fig. 7, a program product 700 for implementing the above method according to an exemplary embodiment of the present disclosure is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present disclosure is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++, or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
Exemplary embodiments of the present disclosure also provide an electronic device capable of implementing the above method. An electronic device 800 according to such an exemplary embodiment of the present disclosure is described below with reference to fig. 8. The electronic device 800 shown in fig. 8 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in fig. 8, electronic device 800 may take the form of a general purpose computing device. The components of the electronic device 800 may include, but are not limited to: the at least one processing unit 810, the at least one memory unit 820, a bus 830 connecting the various system components (including the memory unit 820 and the processing unit 810), and a display unit 840.
The storage unit 820 stores program code that may be executed by the processing unit 810 to cause the processing unit 810 to perform steps according to various exemplary embodiments of the present disclosure described in the "exemplary methods" section above in this specification. For example, processing unit 810 may perform the method steps shown in fig. 4 or fig. 5, and so on.
The storage unit 820 may include readable media in the form of volatile storage units, such as a random access storage unit (RAM)821 and/or a cache storage unit 822, and may further include a read only storage unit (ROM) 823.
Storage unit 820 may also include a program/utility 824 having a set (at least one) of program modules 825, such program modules 825 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 830 may be any of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 800 may also communicate with one or more external devices 900 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 800, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 800 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 850. Also, the electronic device 800 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 860. As shown, the network adapter 860 communicates with the other modules of the electronic device 800 via the bus 830. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 800, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the exemplary embodiments of the present disclosure.
Furthermore, the above-described figures are merely schematic illustrations of processes included in methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit according to an exemplary embodiment of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is to be limited only by the terms of the appended claims.

Claims (38)

1. A video decoding method, comprising:
decoding the target video based on the first speed;
acquiring the position of the next IDR frame after the current decoding frame;
determining whether the next IDR frame is located before a current expected decoding position; wherein the current expected decoding position is: calculating a decoding position of the target video which is decoded to the current time and is expected to arrive according to a preset second speed;
performing a lookup decode when the next IDR frame is located before the current expected decoding position.
2. The method of claim 1, further comprising:
and when searching and decoding, searching the next IDR frame for decoding.
3. The method of claim 1, further comprising:
when searching and decoding, searching the previous IDR frame of the current expected decoding position for decoding.
4. The method of claim 1, further comprising:
monitoring a difference value of the current decoded frame lagging the current expected decoding position;
and if the difference value is larger than a preset threshold value, executing the step of acquiring the position of the next IDR frame after the current decoding frame.
5. The method of claim 4, further comprising:
and if the difference value is smaller than the preset threshold value, continuing decoding from the current decoding frame.
6. The method as claimed in claim 4 or 5, wherein if the position of the next IDR frame is determined according to the position information of each IDR frame, and the preset threshold is smaller than the interval between two adjacent IDR frames closest to each other in the target video, the next IDR frame is found for decoding when performing the search decoding.
7. The method according to any of claims 4 to 6, wherein the preset threshold is 300 ms.
8. The method of claim 1, wherein, if the position of the next IDR frame is determined according to position information of each IDR frame in the target video, performing the lookup decoding when the next IDR frame is located before the current expected decoding position comprises:
finding the next IDR frame for decoding in response to the current expected decoding position reaching or exceeding the position of the next IDR frame.
9. The method of claim 1, wherein, if the position of the next IDR frame is determined according to the position of the previous IDR frame and the interval between two IDR frames, the IDR frame immediately preceding the current expected decoding position is found for decoding when the lookup decoding is performed.
10. The method according to any one of claims 1 to 9, wherein, after the next IDR frame or the IDR frame immediately preceding the current expected decoding position is found for decoding, the method further comprises:
continuing to acquire the position of the next IDR frame after the current decoded frame when the position of the current decoded frame reaches or exceeds the current expected decoding position.
11. The method according to any one of claims 1 to 10, further comprising:
when the next IDR frame is located after the current expected decoding position, continuing decoding from the current decoded frame.
12. The method according to any one of claims 1 to 11, further comprising:
displaying the current decoded frame after the current decoded frame is decoded.
13. The method according to any one of claims 1 to 12, further comprising:
when any frame of the target video is decoded, acquiring a type of the current frame; and
if the current frame is an IDR frame, recording a timestamp of the current frame and using the timestamp of the current frame as position information of the corresponding IDR frame.
14. The method of claim 13, wherein obtaining the position of the next IDR frame after the current decoded frame comprises:
determining the position of the next IDR frame after the current decoded frame according to the position information of each IDR frame of the target video.
15. The method of claim 1, wherein obtaining the position of the next IDR frame after the current decoded frame comprises:
acquiring the position of the next IDR frame after the current decoded frame according to the position of the previous IDR frame and the interval between two IDR frames.
16. The method of claim 15, wherein the interval of the two IDR frames is obtained by:
acquiring positions of at least two IDR frames preceding the current decoded frame, and determining the interval between two IDR frames according to the positions of the at least two IDR frames.
17. The method of claim 16, wherein the at least two IDR frames comprise at least two adjacent IDR frames.
18. The method according to any one of claims 1 to 17, wherein frames between two adjacent IDR frames in the target video comprise at least one of a P frame, a B frame, and a normal I frame.
19. A video decoding apparatus, comprising:
a decoding module, configured to decode a target video at a first speed;
an obtaining module, configured to obtain a position of a next IDR frame located after a current decoded frame;
a judging module, configured to determine whether the next IDR frame is located before a current expected decoding position, wherein the current expected decoding position is a decoding position that the target video is expected to have reached by the current time when decoded at a preset second speed; and
a lookup module, configured to perform lookup decoding when the next IDR frame is located before the current expected decoding position.
20. The apparatus of claim 19, wherein the lookup module finds the next IDR frame for decoding when performing the lookup decoding.
21. The apparatus of claim 19, wherein the lookup module finds the IDR frame immediately preceding the current expected decoding position for decoding when performing the lookup decoding.
22. The apparatus of claim 19, further comprising:
a monitoring module, configured to monitor a difference by which the current decoded frame lags behind the current expected decoding position;
wherein the obtaining module is configured to acquire the position of the next IDR frame after the current decoded frame if the difference is greater than a preset threshold.
23. The apparatus of claim 22, wherein the decoding module is further configured to continue decoding from the current decoded frame if the difference is smaller than the preset threshold.
24. The apparatus of claim 22 or 23, wherein, if the position of the next IDR frame is determined according to position information of each IDR frame, and the preset threshold is smaller than the minimum interval between two adjacent IDR frames in the target video, the lookup module finds the next IDR frame for decoding when performing the lookup decoding.
25. The apparatus according to any one of claims 22 to 24, wherein the preset threshold is 300 ms.
26. The apparatus of claim 19, wherein, if the position of the next IDR frame is determined according to the position information of each IDR frame in the target video, the lookup module is configured to find the next IDR frame for decoding in response to the current expected decoding position reaching or exceeding the position of the next IDR frame.
27. The apparatus of claim 19, wherein, if the position of the next IDR frame is determined according to the position of the previous IDR frame and the interval between two IDR frames, the lookup module finds the IDR frame immediately preceding the current expected decoding position for decoding when performing the lookup decoding.
28. The apparatus according to any one of claims 19 to 27, wherein, after the lookup module finds the next IDR frame or the IDR frame immediately preceding the current expected decoding position for decoding, the obtaining module is further configured to continue to obtain the position of the next IDR frame after the current decoded frame when the position of the current decoded frame reaches or exceeds the current expected decoding position.
29. The apparatus of any one of claims 19 to 28, wherein the decoding module is further configured to continue decoding from the current decoded frame when the next IDR frame is located after the current expected decoding position.
30. The apparatus of any one of claims 19 to 29, further comprising:
a display module, configured to display the current decoded frame after the decoding module decodes the current decoded frame.
31. The apparatus of any one of claims 19 to 30, further comprising:
a recording module, configured to: acquire a type of the current frame when the decoding module decodes any frame of the target video; and if the current frame is an IDR frame, record a timestamp of the current frame and use the timestamp as position information of the corresponding IDR frame.
32. The apparatus of claim 31, wherein the obtaining module is configured to determine a position of a next IDR frame after the current decoded frame according to the position information of each IDR frame of the target video.
33. The apparatus of claim 19, wherein the obtaining module is configured to obtain the position of a next IDR frame after the current decoded frame according to the position of a previous IDR frame and the interval between two IDR frames.
34. The apparatus of claim 33, wherein the obtaining module is further configured to acquire positions of at least two IDR frames preceding the current decoded frame and to determine the interval between the two IDR frames according to the positions of the at least two IDR frames.
35. The apparatus of claim 34, wherein the at least two IDR frames comprise at least two adjacent IDR frames.
36. The apparatus according to any one of claims 19 to 35, wherein frames between two adjacent IDR frames in the target video comprise at least one of a P frame, a B frame, and a normal I frame.
37. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method of any one of claims 1-18.
38. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method of any of claims 1-18 via execution of the executable instructions.
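Taken together, the method claims describe a catch-up strategy for accelerated playback: decode at a first speed, track the position the video should have reached at a faster preset second speed, and once the decoder lags that expected position by more than a threshold, jump to an IDR frame instead of decoding every intervening frame. The following is a minimal Python sketch of that decision logic, not the patent's implementation; names such as `Frame`, `pts_ms`, and `choose_resume_point` are illustrative, and frame positions are assumed to be millisecond timestamps:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Frame:
    pts_ms: int    # frame position as a millisecond timestamp (assumed unit)
    is_idr: bool   # True if this is an IDR (random-access) frame

def next_idr_after(frames: List[Frame], current_pts: int) -> Optional[Frame]:
    # Claim 1: acquire the position of the next IDR frame after the
    # current decoded frame.
    for f in frames:
        if f.is_idr and f.pts_ms > current_pts:
            return f
    return None

def idr_interval(idr_positions: List[int]) -> int:
    # Claims 16-17: derive the IDR interval from the positions of at least
    # two (adjacent) IDR frames.
    assert len(idr_positions) >= 2
    return idr_positions[1] - idr_positions[0]

def choose_resume_point(frames: List[Frame], current_pts: int,
                        expected_pts: int, threshold_ms: int) -> int:
    """Decide where decoding should continue.

    Claims 4-5: only consider jumping when the decoder lags the expected
    position by more than the threshold. Claims 1, 8, and 11: jump to the
    next IDR frame only if it lies at or before the expected position;
    otherwise keep decoding from the current frame.
    """
    lag = expected_pts - current_pts
    if lag <= threshold_ms:
        return current_pts                       # claim 5: small lag, no jump
    idr = next_idr_after(frames, current_pts)
    if idr is not None and idr.pts_ms <= expected_pts:
        return idr.pts_ms                        # lookup decoding: jump to IDR
    return current_pts                           # claim 11: IDR is past the target
```

For example, with an IDR frame every 1000 ms and a 300 ms threshold (claim 7), a decoder at 250 ms that should already be at 1500 ms would resume from the IDR frame at 1000 ms; choosing the threshold smaller than the smallest IDR interval (claim 6) is what makes the next IDR frame the natural jump target.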
CN201980009474.3A 2019-05-15 2019-05-15 Video decoding method, video decoding device, storage medium, and electronic apparatus Pending CN111684816A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/087111 WO2020227994A1 (en) 2019-05-15 2019-05-15 Video decoding method, video decoding apparatus, storage medium and electronic device

Publications (1)

Publication Number Publication Date
CN111684816A 2020-09-18

Family

ID=72451464

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980009474.3A Pending CN111684816A (en) 2019-05-15 2019-05-15 Video decoding method, video decoding device, storage medium, and electronic apparatus

Country Status (2)

Country Link
CN (1) CN111684816A (en)
WO (1) WO2020227994A1 (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3270983B2 (en) * 1994-12-21 2002-04-02 ソニー株式会社 Image data encoding method and apparatus, image data decoding method and apparatus

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009055462A (en) * 2007-08-28 2009-03-12 Sony Corp Data acquisition method, content-distributing apparatus, and program
US20120002953A1 (en) * 2010-06-30 2012-01-05 Verizon Patent And Licensing, Inc. Non-linear rewind of video programs
US8897626B2 (en) * 2010-06-30 2014-11-25 Verizon Patent And Licensing Inc. Non-linear rewind of video programs
CN103096132A (en) * 2013-02-05 2013-05-08 北京赛科世纪数码科技有限公司 Method and device of fast forward and fast backward of TS documents
CN104618794A (en) * 2014-04-29 2015-05-13 腾讯科技(北京)有限公司 Method and device for playing video
CN104661083A (en) * 2015-02-06 2015-05-27 南京传唱软件科技有限公司 Video playing method and system as well as stream media playing method, device and system
CN105979350A (en) * 2016-05-05 2016-09-28 联发科技(新加坡)私人有限公司 Method for searching playing position of playing file and corresponding apparatus

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022156166A1 (en) * 2021-01-21 2022-07-28 稿定(厦门)科技有限公司 Method and apparatus for playing back video file while dragging progress bar
CN114025233A (en) * 2021-10-27 2022-02-08 网易(杭州)网络有限公司 Data processing method and device, electronic equipment and storage medium
CN114025233B (en) * 2021-10-27 2023-07-14 网易(杭州)网络有限公司 Data processing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2020227994A1 (en) 2020-11-19

Similar Documents

Publication Publication Date Title
US10638169B2 Codec techniques for fast switching without a synchronization frame
CN110248204B (en) Processing method, device, equipment and storage medium for live broadcast cache
US20160073106A1 (en) Techniques for adaptive video streaming
CN109168083B (en) Streaming media real-time playing method and device
TWI571116B (en) Trick play in digital video streaming
CN101808205B (en) Moving image output method and moving image output apparatus
WO2018125590A1 (en) Advanced trick-play modes for streaming video
US20140241415A1 (en) Adaptive streaming techniques
US9531983B2 (en) Decoding interdependent frames of a video for display
WO2017107442A1 (en) Video transcoding method and device
US20060245504A1 (en) Program, decoding device, decoding method, and recording medium
JP2007511855A (en) Cache management to improve trick play performance
EP2661084B1 (en) Method to play a video data stream, device and computer program
CN109328384A (en) For providing the system and method for variable velocity in trickplay modes
US20100247066A1 (en) Method and apparatus for reverse playback of encoded multimedia content
CN110198494B (en) Video playing method, device, equipment and storage medium
CN111684816A (en) Video decoding method, video decoding device, storage medium, and electronic apparatus
KR20130137632A (en) Method for semantics based trick mode play in video system
CN110784717B (en) Encoding method, encoding device, electronic equipment and storage medium
US20230062704A1 (en) Video decoding method and device enabling improved user interaction with video content
US20150358622A1 (en) Video Encoding for Real-Time Streaming Based on Audio Analysis
CN114827724A (en) Video playing method and related equipment
EP3902275A1 (en) A method for estimating bandwidth between a video server and a video client
CN115243089B (en) Audio and video synchronous rendering method and device and electronic equipment
CN116634158A (en) Video decoding method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200918