CN114189711A - Video processing method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN114189711A
CN114189711A
Authority
CN
China
Prior art keywords
gop
video
frame
target
frames
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111357644.2A
Other languages
Chinese (zh)
Inventor
李尾冬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Cloud Network Technology Co Ltd
Original Assignee
Beijing Kingsoft Cloud Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Cloud Network Technology Co Ltd filed Critical Beijing Kingsoft Cloud Network Technology Co Ltd
Priority to CN202111357644.2A priority Critical patent/CN114189711A/en
Publication of CN114189711A publication Critical patent/CN114189711A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/2343Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234381Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by altering the temporal resolution, e.g. decreasing the frame rate by frame skipping
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/239Interfacing the upstream path of the transmission network, e.g. prioritizing client content requests
    • H04N21/2393Interfacing the upstream path of the transmission network, e.g. prioritizing client content requests involving handling client requests

Abstract

The application relates to a video processing method and device, an electronic device, and a storage medium. The method includes: acquiring a first GOP, where the first GOP is an original group of pictures (GOP) of a to-be-played segment in a target video requested by a video terminal; discarding part of the video frames in the first GOP to obtain a second GOP; adjusting the timestamps of the video frames in the second GOP according to the playing duration of the first GOP, so that the adjusted playing duration of the second GOP is the same as the playing duration of the first GOP; and sending the second GOP to the video terminal. The method and the device solve the technical problem of picture jumping caused by frame dropping when data accumulates at the server side.

Description

Video processing method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of media information processing, and in particular, to a video processing method and apparatus, an electronic device, and a storage medium.
Background
With the popularization of the internet and the development of multimedia technology, streaming transmission technology and streaming media are now widely applied to video on demand, distance education, video conferencing, network live broadcast, network news release, network advertising, and other fields. Streaming media refers to continuous time-based media transmitted over a network using streaming technology, such as audio, video, or multimedia files: the file is not fully downloaded before playing; only the beginning of the content is buffered in memory, while the rest of the data stream is transmitted and played as it arrives. Compared with traditional download-then-play, streaming media has obvious advantages: (1) the waiting time is greatly shortened, because not all data needs to be downloaded first; (2) the stream file is usually smaller than the original file, and the user does not need to download the whole file to disk, saving a large amount of disk space; (3) the use of real-time transport protocols such as RTSP (Real-Time Streaming Protocol) makes it more suitable for real-time transmission of animation, video, and audio over the network.
At present, during streaming media playback, if network jitter causes data to accumulate at the server, the server may drop data. When the server drops frames, audio frames and video frames are either discarded together or only video frames are discarded. Because the data received at the user side is then of poor quality, the played video jumps and is discontinuous; sometimes only sound is heard while the picture is frozen, which degrades playback quality and seriously affects the user experience.
For the problem in the related art of picture jumping caused by frame loss when server data accumulates, no effective solution has been proposed so far.
Disclosure of Invention
The application provides a video processing method and device, an electronic device, and a storage medium, to at least solve the technical problem in the related art of picture jumping caused by frame loss when server data accumulates.
According to an aspect of an embodiment of the present application, there is provided a video processing method, including: acquiring a first GOP (group of pictures), wherein the first GOP is an original GOP of a to-be-played segment in a target video requested by a video terminal; discarding part of video frames in the first GOP to obtain a second GOP; adjusting the time stamp of the video frame in the second GOP according to the playing time length of the first GOP so that the adjusted playing time length of the second GOP is the same as the playing time length of the first GOP; the second GOP is sent to the video terminal.
According to another aspect of the embodiments of the present application, there is also provided a video processing apparatus, including: the device comprises an acquisition module, a display module and a display module, wherein the acquisition module is used for acquiring a first GOP (group of pictures), and the first GOP is an original GOP of a segment to be played in a target video requested by a video terminal; the discarding module is used for discarding part of video frames in the first GOP to obtain a second GOP; the adjusting module is used for adjusting the time stamp of the video frame in the second GOP according to the playing time length of the first GOP so that the adjusted playing time length of the second GOP is the same as the playing time length of the first GOP; and the sending module is used for sending the second GOP to the video terminal.
According to another aspect of the embodiments of the present application, there is also provided a storage medium including a stored program which, when executed, performs the above-described method.
According to another aspect of the embodiments of the present application, there is also provided an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the above method through the computer program.
According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions, the computer instructions being stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the steps of any of the embodiments of the method described above.
In the embodiments of the application, a first GOP is obtained, where the first GOP is an original group of pictures (GOP) of a to-be-played segment in a target video requested by a video terminal; part of the video frames in the first GOP are discarded to obtain a second GOP; the timestamps of the video frames in the second GOP are adjusted according to the playing duration of the first GOP, so that the adjusted playing duration of the second GOP is the same as that of the first GOP; and the second GOP is sent to the video terminal. In this way, the GOP from which part of the video frames have been discarded is processed so that the timestamps of the remaining video frames are distributed over the playing duration before the frame loss. This makes the video frames after frame loss play continuously, solves the technical problem in the related art of picture jumping caused by frame loss when server data accumulates, and achieves the technical effect of improving the user experience.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a schematic diagram of a hardware environment for a video processing method according to an embodiment of the present application;
FIG. 2 is a flow diagram of an alternative video processing method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a GOP before video processing according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a GOP after video processing according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an alternative video processing apparatus according to an embodiment of the present application; and
fig. 6 is a block diagram of a terminal according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some of the nouns and terms appearing in the description of the embodiments of the present application are explained as follows:
GOP (Group of Pictures): in video coding, pictures are organized in units of GOPs. A GOP is a section of encoded picture data that starts with a key frame (I frame) and ends just before the next key frame. A GOP contains audio frames and video frames, whose timestamps are all monotonically increasing.
I frame: i-frames (Intra coded frames), i.e. I-frame images are Intra coded, i.e. only spatial correlation within a single frame image is used, but not temporal correlation. I frames use intra-frame compression and no motion compensation, and are the entry point for random access and the reference frame for decoding because they do not depend on other frames. The I frame is mainly used for initialization of a receiver and acquisition of a channel, and switching and insertion of programs, and the compression multiple of the I frame image is relatively low. I-frame pictures occur periodically in a sequence of pictures, with the frequency of occurrence being selectable by the encoder.
P frame: p frames (Predicted frames), forward Predicted frames, and inter-frame coding, i.e., exploiting both spatial and temporal correlation. The P frame image only adopts forward time prediction, so that the compression efficiency and the image quality can be improved. The P frame image may contain intra-coded parts, i.e. each macroblock in the P frame may be either forward predicted or intra-coded.
B frame: b-frames (Bi-directionally predicted frames) are Bi-directionally predicted interpolated frames, compressed using the correlation in two temporal directions of the video sequence, and the B-frames use the preceding I or P frame and the following P frame as reference frames, and therefore do not serve as reference frames because the coding and decoding order of the B-frames disturbs the natural order of the video images.
Time stamping: and a DTS (decoding Time stamp), wherein the Time stamp is decoded, and the player determines the Time for decoding the data of each frame according to the DST of the frame.
According to an aspect of embodiments of the present application, there is provided an embodiment of a method of video processing.
Optionally, in this embodiment, the above video processing method may be applied to a hardware environment constituted by the video terminal 101 (hereinafter simply referred to as the terminal) and the server 103, as shown in fig. 1. As shown in fig. 1, the server 103 is connected to the terminal 101 through a network and may be used to provide video processing services for the terminal or a client installed on it; a database 105 may be provided on the server or separately from it to provide data storage services for the server 103. The terminal 101 includes, but is not limited to, a PC, a mobile phone, a tablet computer, and the like.
The video processing method according to the embodiments of the present application may be executed by the server 103, or jointly by the server 103 and the terminal 101. The following description takes execution on the server as an example.
Fig. 2 is a flow chart of an alternative video processing method according to an embodiment of the present application, which may include the following steps, as shown in fig. 2:
step S202, the server acquires a first GOP, wherein the first GOP is an original picture group GOP of a to-be-played segment in a target video requested by the video terminal.
The video terminal is provided with a client (or application) that offers services such as video on demand, video conferencing, and network live broadcast. The to-be-played segment in the target video is the accumulated data generated by network problems; it comprises multiple GOPs, and the video processing method applies similar processing to each GOP.
Step S204, the server discards part of the video frames in the first GOP to obtain a second GOP.
Step S206, the server adjusts the time stamp of the video frame in the second GOP according to the playing time length of the first GOP, so that the adjusted playing time length of the second GOP is the same as the playing time length of the first GOP.
The play time length refers to a time length between a time stamp of a first frame and a time stamp of a last frame in a GOP.
In step S208, the server transmits the second GOP to the video terminal.
For example, when a user watches a live network broadcast on a video terminal and network jitter causes data accumulation, the server processes each GOP in the accumulated data using the above video processing method and sends the processed GOPs to the video terminal, so that the pictures the user watches remain continuous, without stuttering, jumping, or black screens.
Through steps S202 to S208, the GOP from which part of the video frames have been discarded is processed so that the timestamps of the remaining video frames are distributed over the playing duration before the frame loss. This makes the video frames after frame loss play continuously, solves the technical problem in the related art of picture jumping caused by frame loss when server data accumulates, and achieves the technical effect of improving the user experience.
In the technical solution provided in step S202, the server acquires a first GOP, which is an original group of pictures GOP of a to-be-played segment in a target video requested by the video terminal.
In the technical solution provided in step S204, the server discards a part of the video frames in the first GOP to obtain a second GOP.
Optionally, the server extracts predicted frames from all video frames of the first GOP, where a predicted frame is a video frame generated with reference to other frames and encoding only the difference from them; the server then discards part of the predicted frames in the first GOP to obtain the second GOP.
For example, the video frames in a GOP include I frames, B frames, and P frames. Since an I frame is a key frame, the entry point for random access, and a reference frame for decoding, it is not discarded; instead, the predicted frames (B frames and P frames) are extracted from all the video frames of the GOP, and part of the B frames or part of the P frames are discarded to obtain the second GOP. Ways to discard part of the predicted frames include, but are not limited to, the following:
(1) Discard only B frames: a B frame is decoded with reference to an I frame or P frame and is itself never used as a reference frame, so when B frames are to be dropped, any one or more B frames can be selected and discarded.
(2) Discard consecutive P frames including the last P frame: a P frame is a forward-predicted frame representing the difference from a previous reference frame, so decoding it requires that reference frame. If an earlier reference frame is discarded, the P frames that follow cannot be decoded and the picture becomes corrupted. Therefore, when P frames are to be dropped, only consecutive P frames that include the last P frame should be discarded.
(3) Drop frames uniformly according to some rule, for example, discarding one frame out of every two. In this way video frames are discarded evenly, the picture changes uniformly, and the change after frame dropping is difficult for users to perceive while watching. However, some reference frames may be discarded this way, so predicted frames that must be decoded with reference to them cannot be decoded. Additional processing is therefore required: the data in a discarded reference frame must be recovered or superimposed into an adjacent video frame. For example, if a P1 frame preceding a P2 frame is discarded and the P1 frame is a reference frame of the P2 frame, the data in the P1 frame is recovered into the P2 frame to ensure that the P2 frame decodes normally.
As a preferred embodiment, the server discards multiple consecutive P frames in the first GOP, including the last P frame in the first GOP, to obtain the second GOP. By discarding P frames in this way, the data in the discarded P frames does not need to be recovered, and the workload of the server is not increased.
For example, the last 1/3 of the P frames in a GOP may be discarded, or the last 10 P frames in a GOP may be discarded; the number of frames to discard is configurable.
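The preferred discarding strategy above (dropping the trailing run of P frames, including the last one) can be sketched as follows. This is an illustrative model only: the frame representation (dicts with a "type" key) and the function name are assumptions, not from the patent.

```python
def drop_trailing_p_frames(gop, n_drop=None):
    """Return a new GOP with its last n_drop P frames removed.

    `gop` is a list of frame dicts with a "type" key ("I", "P", or "B").
    By default, roughly the last third of the P frames is dropped.
    Dropping only the *trailing* consecutive P frames means no surviving
    frame references a discarded one, so the rest still decodes.
    """
    p_indices = [i for i, f in enumerate(gop) if f["type"] == "P"]
    if n_drop is None:
        n_drop = len(p_indices) // 3  # roughly the last third
    if n_drop == 0:
        return list(gop)
    to_drop = set(p_indices[-n_drop:])  # indices of the last n_drop P frames
    return [f for i, f in enumerate(gop) if i not in to_drop]
```

Because only timestamps and frame types matter here, the sketch ignores the encoded payloads entirely.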
In the technical solution provided in step S206, the server adjusts the timestamp of the video frame in the second GOP according to the playing time length of the first GOP, so that the adjusted playing time length of the second GOP is the same as the playing time length of the first GOP.
Optionally, the server determines the playing duration of the first GOP using the timestamp of the first video frame and the timestamp of the last video frame of the first GOP, and determines the target time interval according to the following formula: Tm = L/(N-1), where Tm represents the target time interval, L represents the playing duration of the first GOP, and N represents the number of video frames in the second GOP.
For example, if the playing duration of the first GOP is 600 ms and the number of video frames in the second GOP is 21, the target time interval is Tm = 600/(21-1) = 30 ms.
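The interval computation is a one-liner; the small helper below (name and units are illustrative, assuming milliseconds) just makes the degenerate case explicit.

```python
def target_interval(play_duration_ms, remaining_frames):
    """Tm = L / (N - 1): spacing that stretches N frames over duration L."""
    if remaining_frames < 2:
        raise ValueError("need at least two frames to define an interval")
    return play_duration_ms / (remaining_frames - 1)
```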
Optionally, the server adjusts the timestamp of the i-th video frame in the second GOP to Ti according to the following formula: Ti = (i-1) × Tm + t1, where t1 denotes the timestamp of the first video frame in the second GOP before adjustment, and Tm denotes the target time interval.
For example, if the timestamp of the first video frame in the second GOP before adjustment is t1 and the target time interval is Tm, the timestamp of the 2nd video frame is T2 = (2-1) × Tm + t1.
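Step S206 can be sketched as below, assuming frames are dicts with a "ts" (decoding timestamp) key; the function name and frame model are illustrative, not from the patent.

```python
def redistribute_timestamps(frames, original_duration):
    """Rewrite ts so frames are spaced Tm = L/(N-1) apart, keeping order.

    The first frame keeps its timestamp t1, so the adjusted GOP spans
    exactly the same play window as the original GOP. Mutates the frame
    dicts in place and returns the list for convenience.
    """
    n = len(frames)
    if n < 2:
        return frames
    tm = original_duration / (n - 1)  # target interval Tm
    t1 = frames[0]["ts"]
    for i, f in enumerate(frames, start=1):  # i runs 1..n
        f["ts"] = (i - 1) * tm + t1          # Ti = (i-1)*Tm + t1
    return frames
```

Note that only the spacing changes: the relative order of the surviving frames is preserved, as the adjusting unit described later requires.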
Optionally, after adjusting the timestamps of the video frames in the second GOP according to the playing duration of the first GOP, if audio frames exist in the first GOP, the server inserts each audio frame of the first GOP into the second GOP according to its timestamp, as follows: acquire a target audio frame from the first GOP, where the target audio frame is the audio frame with the earliest timestamp, or the audio frame whose timestamp follows that of the audio frame acquired last time; search the second GOP for the target video frame whose timestamp is closest to that of the target audio frame; if the timestamp of the target audio frame is earlier than that of the target video frame, insert the target audio frame before the target video frame; if it is later, insert the target audio frame after the target video frame.
For example, let the first audio frame be the first audio frame in the GOP; when its timestamp is t3, the timestamp of the first video frame is t5, and t3 < t5, the server inserts the first audio frame before the first video frame. Let the second audio frame be the second audio frame in the GOP; when its timestamp is t4 and t4 < t5, the server inserts the second audio frame before the first video frame and after the first audio frame.
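The audio re-insertion rule above amounts to a merge by timestamp. A hedged sketch, assuming frames are dicts with "ts" and "kind" keys (a structure not specified by the patent):

```python
import bisect

def insert_audio_frames(video_frames, audio_frames):
    """Merge audio frames into a timestamp-sorted video frame list.

    Placing each audio frame before the nearest-timestamp video frame
    when it is earlier, and after it when it is later, is equivalent to
    inserting it at its sorted position by timestamp.
    """
    out = list(video_frames)
    for af in sorted(audio_frames, key=lambda f: f["ts"]):
        ts_list = [f["ts"] for f in out]
        idx = bisect.bisect_left(ts_list, af["ts"])  # sorted insertion point
        out.insert(idx, af)
    return out
```

The equivalence holds because an audio frame whose timestamp falls between two video frames ends up between them either way, whichever of the two is "closest".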
In the technical solution provided in step S208, the server sends the second GOP to the video terminal.
The purpose of this scheme is that, when the network is poor and data accumulates at the server side, part of the video frames are discarded and the remaining video frames are evenly distributed across the GOP. This reduces data accumulation without substantially affecting the user experience, and playback quality can be guaranteed. In the related art, discarding video and audio frames together causes video jumping, which affects the user experience, while discarding only video frames produces sound without a picture. This scheme distributes the remaining video frames evenly over the original playing duration and retains the audio frames, which avoids video jumping and prevents a black screen from appearing while the audio of the GOP is playing.
As an alternative example, the following describes the technical solution of the present application in combination with the specific embodiments:
Fig. 3 is a schematic diagram of a GOP before video processing according to an embodiment of the present application, and fig. 4 is a schematic diagram of the GOP after video processing; I denotes an I frame, P denotes a P frame, and A denotes an audio frame.
The server acquires the accumulated data, counts the number of video frames in the GOP and the GOP duration, and discards the last 1/3 of the P frames in the GOP (P5, P6, and P7, as shown in FIG. 3). Then the DTS timestamps of the remaining video frames are modified so that they are evenly distributed across the GOP in timestamp order: as shown in FIG. 4, the remaining P frames (P1, P2, P3, and P4) are evenly distributed throughout the GOP, with each frame in the GOP ordered by timestamp. Modifying the timestamps of the video frames simply changes the relative positions of the audio and video frames in the GOP: as shown in FIGS. 3 and 4, after the timestamp of the P1 frame is adjusted, the time interval between the P1 frame and the first I frame increases, as does the number of A (audio) frames contained within that interval.
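The FIG. 3 to FIG. 4 transformation can be sketched end to end. The frame model (dicts with "kind", "type", and "ts" keys) and the specific numbers are illustrative assumptions; the patent does not prescribe a data structure.

```python
import math

def process_gop(gop):
    """Drop the trailing third of P frames, respace the rest, keep audio.

    Mutates the video frame dicts' timestamps in place and returns the
    merged, timestamp-ordered GOP.
    """
    video = [f for f in gop if f["kind"] == "video"]
    audio = [f for f in gop if f["kind"] == "audio"]
    duration = video[-1]["ts"] - video[0]["ts"]  # playing duration L

    # Step S204: drop the last third of the P frames (P5-P7 in FIG. 3).
    p_idx = [i for i, f in enumerate(video) if f["type"] == "P"]
    n_drop = math.ceil(len(p_idx) / 3)
    drop = set(p_idx[-n_drop:]) if n_drop else set()
    video = [f for i, f in enumerate(video) if i not in drop]

    # Step S206: respace the survivors, Ti = (i-1)*Tm + t1, Tm = L/(N-1).
    tm = duration / (len(video) - 1)
    t1 = video[0]["ts"]
    for i, f in enumerate(video):
        f["ts"] = i * tm + t1

    # Audio frames are retained and re-inserted in timestamp order.
    return sorted(video + audio, key=lambda f: f["ts"])
```

With one I frame and seven P frames spaced 10 ms apart, the last three P frames are dropped and the five survivors are respaced 17.5 ms apart over the original 70 ms window, while all audio frames keep their timestamps.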
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present application.
According to another aspect of the embodiments of the present application, there is also provided a video processing apparatus for implementing the above-described video processing method. Fig. 5 is a schematic diagram of an alternative video processing apparatus according to an embodiment of the present application, and as shown in fig. 5, the apparatus may include:
the obtaining module 52 is configured to obtain a first GOP, where the first GOP is an original group of pictures GOP of a to-be-played segment in a target video requested by a video terminal;
a discarding module 54, configured to discard a portion of the video frames in the first GOP to obtain a second GOP;
the adjusting module 56 is configured to adjust the timestamp of the video frame in the second GOP according to the playing duration of the first GOP, so that the adjusted playing duration of the second GOP is the same as the playing duration of the first GOP;
and a sending module 58, configured to send the second GOP to the video terminal.
It should be noted that the obtaining module 52 in this embodiment may be configured to execute the step S202 in this embodiment, the discarding module 54 in this embodiment may be configured to execute the step S204 in this embodiment, the adjusting module 56 in this embodiment may be configured to execute the step S206 in this embodiment, and the sending module 58 in this embodiment may be configured to execute the step S208 in this embodiment.
It should be noted here that the modules described above are the same as the examples and application scenarios implemented by the corresponding steps, but are not limited to the disclosure of the above embodiments. It should be noted that the modules described above as a part of the apparatus may operate in a hardware environment as shown in fig. 1, and may be implemented by software or hardware.
Through the above modules, the technical problem in the related art of picture jumping caused by frame loss when server data accumulates can be solved, achieving the technical effect of improving the user experience.
Optionally, the discarding module 54 further includes: an extraction unit configured to extract predicted frames from all video frames of the first GOP, where a predicted frame is a video frame generated with reference to other frames and encoding only the difference from them; and a discarding unit configured to discard part of the predicted frames in the first GOP to obtain the second GOP.
As an alternative embodiment, the discarding unit is further configured to discard multiple consecutive P frames in the first GOP to obtain the second GOP, where the multiple consecutive P frames include the last P frame in the first GOP.
Optionally, the adjusting module 56 further includes: the determining unit is used for determining a target time interval according to the playing time length of the first GOP and the number of video frames in the second GOP; and the adjusting unit is used for adjusting the time stamps of the video frames in the second GOP according to the target time interval, wherein the time interval between adjacent video frames in the adjusted second GOP is the target time interval, and the arrangement sequence of all the video frames in the adjusted second GOP is the same as that before the adjustment.
As an alternative embodiment, the determining unit is further configured to: determine the playing duration of the first GOP using the timestamp of the first video frame and the timestamp of the last video frame of the first GOP; and determine the target time interval according to the following formula: Tm = L/(N-1), where Tm represents the target time interval, L represents the playing duration of the first GOP, and N represents the number of video frames in the second GOP.
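The formula can be sketched in a few lines (a hypothetical helper; the playing duration L is taken as the last timestamp minus the first timestamp of the first GOP, as described above):

```python
def target_interval(first_gop_timestamps, n_remaining):
    """Tm = L / (N - 1): spread the N frames kept in the second GOP
    evenly across the first GOP's playing duration L."""
    L = first_gop_timestamps[-1] - first_gop_timestamps[0]
    if n_remaining < 2:
        raise ValueError("need at least two remaining frames")
    return L / (n_remaining - 1)
```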
As an alternative embodiment, the adjusting unit is further configured to: adjust the timestamp of the ith video frame in the second GOP to Ti according to the following formula: Ti = (i-1) × Tm + t1, where t1 denotes the timestamp of the first video frame in the second GOP before adjustment, Tm denotes the target time interval, and i is a positive integer.
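In code, the re-spacing step can be sketched as follows (a hypothetical helper under the same 1-based indexing as the formula):

```python
def respace(second_gop_timestamps, Tm):
    """Ti = (i - 1) * Tm + t1 for 1-based i: the first frame keeps its
    original timestamp t1; each later frame is placed Tm after the
    previous one, in the original frame order."""
    t1 = second_gop_timestamps[0]
    return [t1 + j * Tm for j in range(len(second_gop_timestamps))]
```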
As an alternative embodiment, the adjusting unit is further configured to: after adjusting the timestamps of the video frames in the second GOP according to the playing duration of the first GOP, in a case where audio frames exist in the first GOP, insert each audio frame of the first GOP into the second GOP according to its timestamp in the following way: acquiring a target audio frame from the first GOP, wherein the target audio frame is the audio frame with the earliest timestamp in the first GOP, or the audio frame whose timestamp follows that of the most recently acquired audio frame; searching the second GOP for the target video frame whose timestamp is closest to that of the target audio frame; inserting the target audio frame before the target video frame if the timestamp of the target audio frame is earlier than the timestamp of the target video frame; and inserting the target audio frame after the target video frame if the timestamp of the target audio frame is later than the timestamp of the target video frame.
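The audio-insertion loop above can be sketched as follows (a minimal illustration, not the claimed implementation; the description leaves the equal-timestamp case unspecified, and this sketch places such an audio frame after the video frame):

```python
def merge_audio(video_ts, audio_ts):
    """Insert audio frames (sorted by timestamp, as pulled one at a
    time from the first GOP) into the adjusted second GOP: find the
    video frame whose timestamp is closest to the audio frame, and
    place the audio frame before it if earlier, after it otherwise."""
    merged = [("video", t) for t in video_ts]
    for a in audio_ts:
        vids = [i for i, (k, _) in enumerate(merged) if k == "video"]
        best = min(vids, key=lambda i: abs(merged[i][1] - a))
        pos = best if a < merged[best][1] else best + 1
        merged.insert(pos, ("audio", a))
    return merged
```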
It should be noted here that the above modules implement the same examples and application scenarios as their corresponding steps, but are not limited to the disclosure of the above embodiments. The modules, as part of the apparatus, may operate in a hardware environment as shown in fig. 1, where the hardware environment includes a network environment, and may be implemented by software or by hardware.
According to another aspect of the embodiments of the present application, there is also provided a server or a terminal for implementing the above video processing method.
Fig. 6 is a block diagram of a terminal according to an embodiment of the present application. As shown in fig. 6, the terminal may include: one or more processors 601 (only one is shown in fig. 6), a memory 603, and a transmission device 605; the terminal may further include an input-output device 607.
The memory 603 may be used to store software programs and modules, such as program instructions/modules corresponding to the video processing method and apparatus in the embodiment of the present application, and the processor 601 executes various functional applications and data processing by running the software programs and modules stored in the memory 603, that is, implements the video processing method described above. The memory 603 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 603 may further include memory located remotely from the processor 601, which may be connected to the terminal through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The above-mentioned transmission device 605 is used for receiving or sending data via a network, and may also be used for data transmission between the processor and the memory. Examples of the network may include wired and wireless networks. In one example, the transmission device 605 includes a network interface controller (NIC) that can be connected to a router and other network devices via a network cable to communicate with the internet or a local area network. In another example, the transmission device 605 is a radio frequency (RF) module, which is used to communicate with the internet wirelessly.
Specifically, the memory 603 is used to store an application program.
The processor 601 may call the application stored in the memory 603 through the transmission device 605 to perform the following steps: acquiring a first group of pictures (GOP), wherein the first GOP is the original GOP of a to-be-played segment in a target video requested by a video terminal; discarding part of the video frames in the first GOP to obtain a second GOP; adjusting the timestamps of the video frames in the second GOP according to the playing duration of the first GOP so that the adjusted playing duration of the second GOP is the same as the playing duration of the first GOP; and sending the second GOP to the video terminal.
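Putting the steps together, one hedged end-to-end sketch (the frame representation, drop policy, and function name are all assumptions, not the claimed implementation) could be:

```python
def process_gop(frames, drop_count):
    """frames: list of (kind, ts) pairs for the first GOP. Drop the
    last `drop_count` P frames, then re-space the remaining timestamps
    so the second GOP plays for the same duration as the first."""
    L = frames[-1][1] - frames[0][1]           # playing duration of the first GOP
    p_idx = [i for i, (k, _) in enumerate(frames) if k == "P"]
    dropped = set(p_idx[-drop_count:]) if drop_count else set()
    kept = [f for i, f in enumerate(frames) if i not in dropped]
    t1, Tm = kept[0][1], L / (len(kept) - 1)   # Tm = L / (N - 1)
    return [(k, t1 + j * Tm) for j, (k, _) in enumerate(kept)]
```

For example, a five-frame GOP spanning 160 ms with two trailing P frames dropped yields three frames spaced 80 ms apart over the same 160 ms span, which is what keeps the picture continuous after the frame loss.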
By adopting the embodiments of the present application, a video processing scheme is provided. By processing the GOP after discarding part of its video frames, the timestamps of the remaining video frames are distributed over the playing duration before the frame loss, so that the video picture remains continuous after frames are dropped. This solves the technical problem in the related art of picture jumps caused by dropping frames when data accumulates on the server, and achieves the technical effect of improving the user experience.
Optionally, the specific examples in this embodiment may refer to the examples described in the above embodiments, and this embodiment is not described herein again.
It can be understood by those skilled in the art that the structure shown in fig. 6 is only an illustration, and the terminal may be a terminal device such as a smart phone (e.g., an Android phone or an iOS phone), a tablet computer, a palm computer, a mobile internet device (MID), or a PAD. Fig. 6 does not limit the structure of the above electronic device; for example, the terminal may include more or fewer components (e.g., network interfaces, display devices) than shown in fig. 6, or have a different configuration from that shown in fig. 6.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
Embodiments of the present application also provide a storage medium. Optionally, in this embodiment, the storage medium may store program code for executing the video processing method.
Optionally, in this embodiment, the storage medium may be located on at least one of a plurality of network devices in a network shown in the above embodiment.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: S1, acquiring a first GOP, wherein the first GOP is the original group of pictures (GOP) of a to-be-played segment in a target video requested by a video terminal; S2, discarding part of the video frames in the first GOP to obtain a second GOP; S3, adjusting the timestamps of the video frames in the second GOP according to the playing duration of the first GOP so that the adjusted playing duration of the second GOP is the same as the playing duration of the first GOP; and S4, sending the second GOP to the video terminal.
Optionally, the specific examples in this embodiment may refer to the examples described in the above embodiments, and this embodiment is not described herein again.
Optionally, in this embodiment, the storage medium may include, but is not limited to: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
If the integrated unit in the above embodiments is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in the above computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, in whole or in part, may be embodied in the form of a software product stored in a storage medium, including instructions for causing one or more computer devices (which may be personal computers, servers, network devices, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application.
In the above embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The foregoing is only a preferred embodiment of the present application and it should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be considered as the protection scope of the present application.

Claims (10)

1. A video processing method, comprising:
acquiring a first GOP (group of pictures), wherein the first GOP is an original GOP of a to-be-played segment in a target video requested by a video terminal;
discarding part of video frames in the first GOP to obtain a second GOP;
adjusting the time stamp of the video frame in the second GOP according to the playing time length of the first GOP so that the adjusted playing time length of the second GOP is the same as the playing time length of the first GOP;
and sending the second GOP to the video terminal.
2. The method of claim 1, wherein adjusting the time stamp of the video frames in the second GOP according to the playing time duration of the first GOP so that the adjusted playing time duration of the second GOP is the same as the playing time duration of the first GOP comprises:
determining a target time interval according to the playing duration of the first GOP and the number of video frames in the second GOP;
and adjusting the time stamps of the video frames in the second GOP according to the target time interval, wherein the time interval between adjacent video frames in the adjusted second GOP is the target time interval, and the arrangement sequence of all the video frames in the adjusted second GOP is the same as that before the adjustment.
3. The method of claim 2, wherein determining the target time interval based on the play-out duration of the first GOP and the number of video frames in the second GOP comprises:
determining the playing time length of the first GOP by utilizing the time stamp of the first frame video frame and the time stamp of the last frame video frame of the first GOP;
determining the target time interval according to the following formula: Tm = L/(N-1), where Tm represents the target time interval, L represents the play time period of the first GOP, and N represents the number of video frames in the second GOP.
4. The method of claim 2, wherein adjusting the timestamps of the video frames in the second GOP according to the target time interval comprises:
adjusting the timestamp of the ith video frame in the second GOP to Ti according to the following formula: Ti = (i-1) × Tm + t1, where t1 denotes the timestamp of the first frame video frame in the second GOP before adjustment, Tm denotes the target time interval, and i is a positive integer.
5. The method according to any of claims 1 to 4, wherein after adjusting the time stamp of the video frame in the second GOP according to the playing duration of the first GOP, in the case that there is an audio frame in the first GOP, the method further comprises inserting the audio frame of each frame in the first GOP into the second GOP according to the time stamp in the following way:
acquiring a target audio frame from the first GOP, wherein the target audio frame is an audio frame with an earliest time stamp or an audio frame with a time stamp after the audio frame acquired last time in the first GOP;
searching the target video frame with the timestamp closest to the timestamp of the target audio frame in the second GOP;
inserting the target audio frame before the target video frame if the timestamp of the target audio frame is earlier than the timestamp of the target video frame;
inserting the target audio frame after the target video frame if the timestamp of the target audio frame is later than the timestamp of the target video frame.
6. The method according to any of claims 1 to 4, wherein discarding a portion of the video frames in the first GOP to obtain a second GOP comprises:
extracting a predicted frame from all video frames of the first GOP, wherein the predicted frame is a video frame which is generated by referring to other frames and only contains difference part coding;
and discarding part of the predicted frames in the first GOP to obtain a second GOP.
7. The method of claim 6, wherein discarding the partially predicted frame in the first GOP to obtain a second GOP comprises:
and discarding a plurality of continuous P frames in the first GOP to obtain a second GOP, wherein the plurality of continuous P frames comprise the last P frame in the first GOP.
8. A video processing apparatus, comprising:
the device comprises an acquisition module, a display module and a display module, wherein the acquisition module is used for acquiring a first GOP (group of pictures), and the first GOP is an original GOP of a segment to be played in a target video requested by a video terminal;
the discarding module is used for discarding part of video frames in the first GOP to obtain a second GOP;
the adjusting module is used for adjusting the time stamp of the video frame in the second GOP according to the playing time length of the first GOP so that the adjusted playing time length of the second GOP is the same as the playing time length of the first GOP;
and the sending module is used for sending the second GOP to the video terminal.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor executes the method of any of the preceding claims 1 to 7 by means of the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the video processing method according to any one of claims 1 to 7.
CN202111357644.2A 2021-11-16 2021-11-16 Video processing method and device, electronic equipment and storage medium Pending CN114189711A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111357644.2A CN114189711A (en) 2021-11-16 2021-11-16 Video processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114189711A true CN114189711A (en) 2022-03-15

Family

ID=80540194

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111357644.2A Pending CN114189711A (en) 2021-11-16 2021-11-16 Video processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114189711A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114979712A (en) * 2022-05-13 2022-08-30 北京字节跳动网络技术有限公司 Video playing starting method, device, equipment and storage medium
CN115037701A (en) * 2022-06-20 2022-09-09 北京达佳互联信息技术有限公司 Video processing method, device, server and medium
CN115103223A (en) * 2022-06-02 2022-09-23 咪咕视讯科技有限公司 Video content detection method, device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6031584A (en) * 1997-09-26 2000-02-29 Intel Corporation Method for reducing digital video frame frequency while maintaining temporal smoothness
CN108540819A (en) * 2018-04-12 2018-09-14 腾讯科技(深圳)有限公司 Live data processing method, device, computer equipment and storage medium
US10116989B1 (en) * 2016-09-12 2018-10-30 Twitch Interactive, Inc. Buffer reduction using frame dropping
CN110113621A (en) * 2018-02-01 2019-08-09 腾讯科技(深圳)有限公司 Playing method and device, storage medium, the electronic device of media information
CN111212025A (en) * 2019-11-20 2020-05-29 腾讯科技(深圳)有限公司 Method and device for transmitting network self-adaptive video stream

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination