CN115623155A - Video data processing method, video data processing apparatus, and storage medium - Google Patents

Video data processing method, video data processing apparatus, and storage medium

Info

Publication number
CN115623155A
Authority: CN (China)
Prior art keywords: frame, video data, rate, coding rate, picture group
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202110786513.XA
Other languages: Chinese (zh)
Inventors: 奚驰 (Xi Chi), 李斌 (Li Bin), 罗程 (Luo Cheng)
Current Assignee: Tencent Technology Shenzhen Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202110786513.XA
Publication of CN115623155A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/80 Responding to QoS

Abstract

The embodiments of the application provide a video data processing method, a video data processing apparatus, and a storage medium. The method includes: acquiring video data to be transmitted and acquiring a target coding rate; coding the video data according to the target coding rate to obtain at least one picture group; acquiring the importance degree of each frame in the picture group; allocating redundancy rates to the frames in the picture group according to the target coding rate and the importance degree of each frame in the picture group, where the importance degree of each frame is positively correlated with the redundancy rate of that frame; and performing forward error correction coding on each frame in the picture group according to the redundancy rate of each frame. By allocating a redundancy rate to each frame according to its importance within the picture group, frames of different importance are prevented from using the same redundancy rate, which reduces the stall-duration ratio and the proportion of high-stall intervals.

Description

Video data processing method, video data processing apparatus, and storage medium
Technical Field
The present application relates to video data processing technologies, and in particular, to a video data processing method, a video data processing apparatus, and a storage medium.
Background
In order to solve the problem that packet loss is likely to occur in a video conference under low network bandwidth, related video conference systems usually adopt Forward Error Correction (FEC) redundancy coding and network retransmission. However, excessive FEC redundancy and retransmission further increase the network load, making the video conference under low network bandwidth even more prone to stalling.
Disclosure of Invention
The embodiments of the application provide a video data processing method, a video data processing apparatus, and a storage medium, which at least to some extent alleviate video stalling in video conferences under weak network conditions and improve viewing fluency.
Other features and advantages of the present application will be apparent from the following detailed description, or may be learned by practice of the application.
According to an aspect of an embodiment of the present application, there is provided a video data processing method, including:
acquiring video data to be transmitted and acquiring a target coding rate;
coding the video data according to the target coding rate to obtain at least one picture group;
acquiring the importance degree of each frame in the picture group;
allocating redundancy rates to the frames in the picture group according to the target coding rate and the importance degree of each frame in the picture group, wherein the importance degree of each frame and the corresponding redundancy rate of each frame form a positive correlation;
and carrying out forward error correction coding on each frame in the picture group according to the redundancy rate of each frame.
According to an aspect of an embodiment of the present application, there is provided a video data processing apparatus, including:
an acquisition unit, configured to acquire video data to be transmitted and acquire a target coding rate;
a video encoding unit, configured to encode the video data to obtain at least one group of pictures;
a frame importance acquiring unit, configured to acquire importance degrees of frames in the group of pictures;
a redundancy rate allocation unit, configured to allocate a redundancy rate to each frame in the group of pictures according to the target coding rate and the importance degree of each frame in the group of pictures, where the importance degree of each frame has a positive correlation with the redundancy rate corresponding to each frame;
and the error correction coding unit is used for carrying out forward error correction coding on each frame in the picture group according to the redundancy rate of each frame.
In some embodiments of the present application, based on the foregoing scheme, each group of pictures comprises an intra-coded image frame (I frame) and at least one forward-predictive-coded image frame (P frame) located after the I frame.
The frame importance acquiring unit comprises:
a distance obtaining subunit, configured to obtain a distance from each P frame in the group of pictures to the I frame;
and the importance determining subunit is used for determining the importance degree of each frame in the picture group according to the distance from each P frame in the picture group to the I frame, wherein the importance degree of each P frame in the picture group is in a negative correlation relation with the distance from each P frame to the I frame.
In some embodiments of the present application, based on the foregoing scheme, the redundancy rate allocating unit is configured to: determining the tolerance of decreasing redundancy rate according to the target coding rate and the length of the picture group; and allocating redundancy rates to the frames in the picture group based on the tolerance, wherein the redundancy rates of the frames in the picture group are decreased according to the equal difference of the distances between the frames and the I frame.
In some embodiments of the present application, based on the foregoing scheme, the redundancy rate p_i of the i-th frame in the group of pictures is expressed as:

p_i = p + ((L + 1 - 2i) / 2) · d

where p represents the average redundancy rate of the group of pictures, calculated from the overall redundancy code rate of the group of pictures and the target coding rate; d represents the tolerance (common difference) of the decreasing sequence of redundancy rates; L represents the number of frames in the group of pictures; and i is a natural number with 1 ≤ i ≤ L.
In some embodiments of the present application, based on the foregoing scheme, the video encoding unit is configured to: sequentially numbering the frames according to the sending sequence of the frames in the picture group; the first frame in each group of pictures is encoded as an I-frame and the frames other than the first frame are encoded as P-frames.
The distance acquisition subunit is configured to: and taking the difference of the numbers between each P frame and the I frame in the picture group as the distance of each P frame in the picture group from the I frame.
In some embodiments of the present application, based on the foregoing scheme, the obtaining unit is configured to: acquiring the bandwidth of a video data transmission link according to the packet loss rate of the video data transmission link and the transmission delay of the video data transmission link; and adjusting the coding rate according to the video data transmission link bandwidth to obtain a first target coding rate, wherein the video data transmission link bandwidth and the first target coding rate have positive correlation.
In some embodiments of the present application, based on the foregoing scheme, the obtaining unit is configured to: receiving a coding rate control instruction sent by a server, wherein the coding rate control instruction is generated by the server according to downlink bandwidths between the server and a plurality of video data receiving ends; and adjusting the coding rate according to the coding rate control instruction to obtain a second target coding rate.
In some embodiments of the present application, based on the foregoing scheme, when the proportion of low-bandwidth receiving ends is greater than a preset threshold, the coding rate control instruction is used to indicate that the coding rate should be reduced, where a low-bandwidth receiving end is a video data receiving end whose downlink bandwidth is lower than a preset bandwidth threshold; and when the proportion of low-bandwidth receiving ends is less than or equal to the preset threshold, the coding rate control instruction is used to indicate that the coding rate should be kept or raised.
According to an aspect of embodiments herein, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the video data processing method described in the above embodiments.
According to an aspect of embodiments of the present application, there is provided a computer-readable medium on which a computer program is stored, which, when executed by a processor, implements a video data processing method as described in the above embodiments.
According to an aspect of an embodiment of the present application, there is provided an electronic device including: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the video data processing method as in the above embodiments.
In the technical solutions provided in some embodiments of the present application, redundancy rates are allocated to frames according to the importance levels of the frames in a picture group: a frame with a high importance level affects the decoding of more subsequent frames and is allocated a higher redundancy rate, while a frame with a low importance level affects the decoding of fewer subsequent frames and is allocated a lower redundancy rate. This prevents frames with different importance levels from using the same redundancy rate, reduces network load, and reduces both the stall-duration ratio and the proportion of high-stall intervals.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:
fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solution of the embodiments of the present application can be applied.
Fig. 2 shows a schematic diagram of the placement of a video encoding device and a video decoding device in a streaming environment.
Fig. 3 shows a flow chart of a video data processing method according to an embodiment of the application.
Fig. 4 shows a diagram of a group of pictures frame structure according to an embodiment of the present application.
FIG. 5 shows a flowchart of a method of implementing step 330 according to one embodiment of the present application.
Fig. 6 shows a flowchart of a method of implementing step 340 according to one embodiment of the present application.
Fig. 7 shows a flow chart of a video data processing method according to an embodiment of the application.
FIG. 8 shows a flowchart of a method of implementing step 320, according to one embodiment of the present application.
FIG. 9 shows a flowchart of a method of implementing step 320, according to one embodiment of the present application.
Figure 10 illustrates a video conferencing system architecture diagram according to one embodiment of the present application.
Fig. 11 shows a block diagram of a video data processing apparatus according to an embodiment of the present application.
FIG. 12 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the subject matter of the present application can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the application.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
It should be noted that: reference herein to "a plurality" means two or more. "and/or" describe the association relationship of the associated objects, meaning that there may be three relationships, e.g., A and/or B may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
It is noted that the terms first, second and the like in the description and claims of the present application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the objects so used are interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in other sequences than those illustrated or described herein.
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solution of the embodiments of the present application can be applied.
As shown in fig. 1, the system architecture 100 includes a plurality of end devices that may communicate with each other over, for example, a network 150. For example, the system architecture 100 may include a first end device 110 and a second end device 120 interconnected by a network 150. In the embodiment of fig. 1, the first terminal device 110 and the second terminal device 120 perform unidirectional data transmission.
For example, the first terminal device 110 may encode video data (e.g., a stream of video pictures captured by the terminal device 110) for transmission over the network 150 to the second terminal device 120 as one or more encoded video streams. The second terminal device 120 may receive the encoded video data from the network 150, decode it to recover the video data, and display video pictures according to the recovered video data.
In one embodiment of the present application, the system architecture 100 may include a third end device 130 and a fourth end device 140 that perform bi-directional transmission of encoded video data, such as may occur during a video conference. For bi-directional data transmission, each of third end device 130 and fourth end device 140 may encode video data (e.g., a stream of video pictures captured by the end device) for transmission over network 150 to the other of third end device 130 and fourth end device 140. Each of the third terminal device 130 and the fourth terminal device 140 may also receive encoded video data transmitted by the other of the third terminal device 130 and the fourth terminal device 140, and may decode the encoded video data to recover the video data, and may display a video picture on an accessible display device according to the recovered video data.
In the embodiment of fig. 1, the first terminal device 110, the second terminal device 120, the third terminal device 130, and the fourth terminal device 140 may each be a server, a personal computer, or a smart phone, but the principles disclosed herein are not limited thereto. Embodiments disclosed herein are also applicable to laptop computers, tablet computers, media players, and/or dedicated video conferencing equipment. The network 150 represents any number of networks that communicate encoded video data between the first terminal device 110, the second terminal device 120, the third terminal device 130, and the fourth terminal device 140, including, for example, wired and/or wireless communication networks. The communication network 150 may exchange data in circuit-switched and/or packet-switched channels. The network may include a telecommunications network, a local area network, a wide area network, and/or the internet. For purposes of this application, the architecture and topology of the network 150 are immaterial to the operation of the present disclosure unless explained below.
In one embodiment of the present application, fig. 2 illustrates the placement of video encoding devices and video decoding devices in a streaming environment. The subject matter disclosed herein is equally applicable to other video-enabled applications including, for example, video conferencing, digital TV (television), storing compressed video on digital media including CDs, DVDs, memory sticks, and the like.
The streaming system may include an acquisition subsystem 213, which may include a video source 201, such as a digital camera, that creates an uncompressed video picture stream 202. In an embodiment, the video picture stream 202 includes samples taken by a digital camera. The video picture stream 202 is depicted as a thick line to emphasize its high data volume compared to the encoded video data 204 (or encoded video codestream 204). The video picture stream 202 can be processed by an electronic device 220 comprising a video encoding device 203 coupled to the video source 201. The video encoding device 203 may comprise hardware, software, or a combination of both to implement or embody aspects of the disclosed subject matter as described in greater detail below. The encoded video data 204 (or encoded video codestream 204) is depicted as a thin line to emphasize its lower data volume, and may be stored on the streaming server 205 for future use. One or more streaming client subsystems, such as client subsystem 206 and client subsystem 208 in fig. 2, may access the streaming server 205 to retrieve copies 207 and 209 of the encoded video data 204. Client subsystem 206 may include, for example, a video decoding device 210 in an electronic device 230. The video decoding device 210 decodes an incoming copy 207 of the encoded video data and generates an output video picture stream 211 that may be presented on a display 212 (e.g., a display screen) or another presentation device. In some streaming systems, the encoded video data 204, 207, and 209 (e.g., video streams) may be encoded according to certain video encoding/compression standards, such as ITU-T H.265. In an embodiment, the video coding standard under development, informally known as Versatile Video Coding (VVC), may also be used; the present application may be used in the context of the VVC standard.
It should be noted that electronic devices 220 and 230 may include other components not shown in the figures. For example, electronic device 220 may comprise a video decoding device, and electronic device 230 may also comprise a video encoding device.
In one embodiment of the present application, a video encoding apparatus compression-encodes an original video stream to generate Groups of Pictures (GOPs). A GOP is a group of continuous pictures comprising one I-frame and several P-frames, and is the basic unit accessed by a video encoder and decoder. An I-frame is an intra-coded frame (also called a key frame); it is a complete picture and can be decoded independently. A P-frame is a forward-predicted frame (also called a forward reference frame); it records the changes relative to the I-frame or a preceding frame and depends on that frame for decoding.
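To make the dependency concrete, here is a small illustrative sketch (ours, not taken from the patent): because each P-frame needs every frame before it in the GOP, losing one frame without recovery makes every later frame in that GOP undecodable.

```python
# Illustrative sketch: decode dependency inside a GOP of I + P frames.
# Frame 0 is the I-frame; each P-frame requires all frames before it.

def decodable_frames(gop_length: int, lost_frame: int) -> int:
    """Frames still decodable when frame `lost_frame` is lost unrecovered."""
    decoded = 0
    for i in range(gop_length):
        if i == lost_frame:
            break  # this frame and everything after it is undecodable
        decoded += 1
    return decoded

# Losing the I-frame kills the whole 15-frame GOP; losing the last
# P-frame costs one frame, so importance falls with distance to the I-frame.
for lost in (0, 1, 14):
    print(f"lost frame {lost}: {decodable_frames(15, lost)}/15 decodable")
```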
It should be noted that during compression encoding the video encoding apparatus needs to consider the coding rate, i.e., the bit rate (the sampling rate per unit time), measured in kbps; video with a coding rate exceeding 1024 kbps is commonly called ultra-high-definition video. Video clarity is closely tied to the coding rate: a higher coding rate means a higher sampling rate per unit time, an encoded file closer to the original, and better picture quality. However, a higher coding rate also means a heavier network load and a higher likelihood of network congestion.
Further, in order to cope with packet loss, the video encoding apparatus performs redundancy encoding on each frame after compression encoding by forward error correction FEC and sequentially transmits each frame after redundancy encoding to the video receiving end.
With FEC coding, adding n redundant packets to a compressed frame in the code stream allows that frame to withstand the loss of up to n of its video packets. The number of redundant packets therefore determines the packet-loss resistance, i.e., the error correction capability: the larger the redundancy rate, the stronger the error correction. However, increasing the redundancy rate also increases the network load, which in a low-bandwidth network environment can itself cause video stalling.
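As a rough sketch of the packet arithmetic (our illustration; the patent does not fix a specific FEC code), with a systematic erasure code such as Reed-Solomon, a frame carried in k media packets at redundancy rate r gets about ceil(k·r) parity packets and survives the loss of that many packets:

```python
import math

def redundancy_packets(media_packets: int, redundancy_rate: float) -> int:
    """Parity packets for one frame under a systematic erasure code:
    the frame survives the loss of up to this many of its packets."""
    return math.ceil(media_packets * redundancy_rate)

# A 10-packet frame at 20% redundancy tolerates 2 lost packets;
# at 50% it tolerates 5, at the price of 50% extra network load.
print(redundancy_packets(10, 0.2))  # -> 2
print(redundancy_packets(10, 0.5))  # -> 5
```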
The more subsequent frames a frame affects during decoding, the more important that frame is. The present application therefore allocates redundancy rates according to frame importance: a higher redundancy rate for frames of higher importance and a lower redundancy rate for frames of lower importance, improving error correction performance where it matters while reducing the overall network burden.
In the following, details of implementation of the technical solution of the embodiment of the present application are explained in detail from the perspective of a video sending end.
It should be noted that the video data processing method provided in the embodiment of the present application may be executed by a terminal device, and accordingly, the video data processing apparatus is generally disposed in the terminal device. However, in other embodiments of the present application, the server may also have similar functions as the terminal device, so as to execute the video data processing scheme provided by the embodiments of the present application.
It should be further noted that, according to implementation needs, the server may be an independent physical server, may also be a server cluster or a distributed system formed by a plurality of physical servers, and may also be a cloud server that provides basic cloud computing services such as cloud service, a cloud database, cloud computing, a cloud function, cloud storage, network service, cloud communication, middleware service, domain name service, security service, CDN (Content Delivery Network), and a big data and artificial intelligence platform. The terminal may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, and the like, but is not limited thereto, and the application is not limited thereto.
It should be explained that cloud computing (cloud computing) as above is a computing model that distributes computing tasks over a resource pool of a large number of computers, enabling various application systems to obtain computing power, storage space, and information services as needed. The network that provides the resources is referred to as the "cloud". Resources in the cloud can be infinitely expanded to users, and can be acquired at any time, used as required and expanded at any time. The cloud computing resource pool mainly comprises computing equipment (which is a virtualization machine and comprises an operating system), storage equipment and network equipment.
Fig. 3 shows a flow chart of a video data processing method according to an embodiment of the application; as shown in fig. 3, the method includes the following steps.
Step 310: and acquiring video data to be transmitted and acquiring a target coding rate.
In the embodiment of the application, the camera device of the video sending end collects the original video stream to obtain the video data to be sent. In order to avoid network congestion and guarantee video picture quality, when video data is encoded, an encoding rate suitable for the current network environment, that is, a target encoding rate, needs to be used. The video sending end can obtain the target coding rate through a preset strategy, or obtain the target coding rate issued by a server of the video conference system.
Step 320: and coding the video data according to the target coding rate to obtain at least one picture group.
After acquiring the target coding rate, the encoder of the video sending end compression-encodes the video data to be sent at that rate, generating one or more picture groups.
Step 330: the importance degree of each frame in the picture group is obtained.
The frames in the group of pictures are divided into I-frames and P-frames. The I-frame is the key frame of the entire group of pictures; it affects the decoding of all subsequent P-frames and has the highest importance. P-frames are forward-predictive-coded image frames that affect the decoding of the frames following them, and their importance depends on how many subsequent frames they affect.
Step 340: and allocating redundancy rates to the frames in the picture group according to the target coding rate and the importance degree of each frame in the picture group, wherein the importance degree of each frame and the corresponding redundancy rate of each frame form a positive correlation relationship.
The redundancy rate is the ratio of the redundancy code rate to the target coding code rate, and the larger the redundancy rate of the frame is, the more redundant packets of the frame are, and the stronger the error correction capability is.
The more important a frame is, the more frames its decoding affects, so its error correction capability needs to be stronger: allocating it a higher redundancy rate better withstands the risk of network packet loss and improves error correction.
Step 350: and carrying out forward error correction coding on each frame in the picture group according to the redundancy rate of each frame.
After the redundancy rate corresponding to each frame is obtained, the video sending end performs FEC coding on each frame.
According to the method and the apparatus, redundancy rates are allocated to frames according to their importance: a higher redundancy rate for frames of high importance and a lower redundancy rate for frames of low importance. This improves the error correction performance of the frames that matter while reducing network load, improving video communication quality.
Fig. 4 is a diagram illustrating the frame structure of groups of pictures according to an embodiment of the present application. As shown in fig. 4, each group of pictures includes an intra-coded image frame (I frame) and at least one forward-predictive-coded image frame (P frame) located after the I frame.
In a real-time video conference application scenario, in order to enable each frame to be decoded relying only on its preceding frames, a group of pictures includes only I-frames and P-frames. When the video sending end sends the frames in a picture group, the I-frame is sent first and the P-frames are sent out in order. In addition, in other embodiments of the present application, a group of pictures may also include at least one B-frame.
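For concreteness, here is a minimal sketch of producing such I/P-only groups of pictures at a target coding rate, using ffmpeg with libx264 as the encoder (our tooling choice and file names; the patent does not name an encoder):

```python
import subprocess

def encode(src: str, dst: str, target_kbps: int, gop_length: int) -> None:
    """Encode `src` into I/P-only GOPs of `gop_length` frames at `target_kbps`."""
    subprocess.run([
        "ffmpeg", "-i", src,
        "-c:v", "libx264",
        "-b:v", f"{target_kbps}k",  # target coding rate
        "-g", str(gop_length),      # frames per group of pictures
        "-bf", "0",                 # no B-frames: each GOP is one I-frame plus P-frames
        dst,
    ], check=True)

encode("capture.mp4", "out.mp4", target_kbps=800, gop_length=15)
```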
Based on the group of pictures frame structure shown in fig. 4, fig. 5 shows a flow chart of a method for implementing step 330 according to an embodiment of the present application, the method comprising the following steps.
Step 510: the distance of each P frame in the group of pictures from the I frame is obtained.
Step 520: and determining the importance degree of each frame in the picture group according to the distance between each P frame in the picture group and the I frame, wherein the importance degree of each P frame in the picture group and the distance between each P frame and the I frame form a negative correlation relationship.
In the embodiment of the present application, the I-frame has the highest importance, and the importance of a P-frame may be determined by its distance to the I-frame: the closer a P-frame is to the I-frame, the higher its importance and the higher its redundancy rate; the farther it is from the I-frame, the lower its importance and the lower its redundancy rate.
Fig. 6 shows a flowchart of a method of implementing step 340, including the following steps, according to one embodiment of the present application.
Step 610: and determining the tolerance of the decreasing redundancy rate according to the target coding rate and the length of the picture group.
The redundancy rates of the frames in the picture group are designed as an arithmetic sequence, whose tolerance (common difference) is determined from the target coding rate, the length of the picture group (i.e., the number of frames in it), and the average redundancy rate of the picture group.
Step 620: and allocating redundancy rates to the frames in the picture group based on the tolerance, wherein the redundancy rates of the frames in the picture group are decreased according to the equal difference of the distance between each frame and the I frame.
The redundancy rate of each frame is the ratio of that frame's redundancy code rate to the target coding rate. In the embodiment of the application, the redundancy rates of the L frames in the picture group form a decreasing arithmetic sequence with tolerance d and average redundancy rate p, so the redundancy rate of the i-th frame in the group of pictures is expressed as:

p_i = p + ((L + 1 - 2i) / 2) · d

where p represents the average redundancy rate of the picture group, calculated as the ratio of the overall redundancy code rate of the picture group to the target coding rate; L represents the number of frames in the picture group; and i is a natural number with 1 ≤ i ≤ L.
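A minimal sketch of this allocation follows. The tolerance formula itself survives only as an image in the source, so the default d = 2p/(L + 1) below is our assumption: it keeps every rate positive and extrapolates to zero one frame past the group of pictures, and any other tolerance can be passed in.

```python
def allocate_redundancy(avg_rate: float, gop_length: int,
                        tolerance: float | None = None) -> list[float]:
    """Per-frame redundancy rates: a decreasing arithmetic sequence with
    average `avg_rate` (p) over the `gop_length` (L) frames of a GOP."""
    L, p = gop_length, avg_rate
    d = tolerance if tolerance is not None else 2 * p / (L + 1)  # assumed default
    # p_i = p + ((L + 1 - 2i) / 2) * d, for i = 1..L (I-frame first)
    return [p + (L + 1 - 2 * i) / 2 * d for i in range(1, L + 1)]

rates = allocate_redundancy(avg_rate=0.2, gop_length=5)
print([round(r, 3) for r in rates])       # [0.333, 0.267, 0.2, 0.133, 0.067]
print(round(sum(rates) / len(rates), 3))  # 0.2, the average redundancy rate p
```

The I-frame and the P-frames nearest to it receive well above the average rate, while the last P-frames receive well below it, so the overall redundancy code rate is unchanged from a flat allocation.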
Fig. 7 shows a flow chart of a video data processing method according to an embodiment of the present application, as shown in fig. 7, comprising the following steps.
Steps 710-720 are the same as steps 310-320 and will not be described further.
Step 730: and numbering the frames according to the sending sequence of the frames in the picture group.
Step 740: the first frame in each group of pictures is encoded as an I-frame and the frames other than the first frame are encoded as P-frames.
In a specific application, a frame number field may be used in a protocol header of a group of pictures to indicate a frame number of each frame (which may be incremented from 1), and a frame type field may be used to indicate a type of each frame (I-frame or P-frame).
Therefore, the difference between the numbers of the P frames and the I frames in the group of pictures can be obtained through the frame number field and the frame type field, so as to obtain the distance between each P frame in the group of pictures and the I frame, thereby obtaining the importance degree of each frame.
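As an illustration only (the patent specifies the two fields but not their sizes or offsets), assume a header that packs a 2-byte frame number followed by a 1-byte frame type, with 0 meaning I-frame and 1 meaning P-frame:

```python
import struct

def distance_to_i_frame(header: bytes, i_frame_number: int) -> int:
    """Distance of a frame to its GOP's I-frame, from hypothetical header fields."""
    frame_number, frame_type = struct.unpack("!HB", header[:3])
    if frame_type == 0:  # the I-frame itself
        return 0
    return frame_number - i_frame_number

# P-frame number 4 in a GOP whose I-frame is numbered 1 sits at distance 3.
header = struct.pack("!HB", 4, 1)
print(distance_to_i_frame(header, i_frame_number=1))  # -> 3
```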
Steps 750-770 are the same as steps 330-350 and will not be described further herein.
Fig. 8 shows a flowchart of a method of implementing step 320 according to an embodiment of the present application, which includes the following steps, as shown in fig. 8.
Step 810: and acquiring the bandwidth of the video data transmission link according to the packet loss rate of the video data transmission link and the transmission delay of the video data transmission link.
The packet loss rate and the transmission delay directly reflect the network congestion condition, so the video sending end can detect the bandwidth of a video data sending link by adopting a bandwidth estimation model based on the packet loss and the delay, namely the video sending end carries out uplink bandwidth estimation.
In a specific application, the bandwidth estimation module at the video sending end may use the Google Congestion Control (GCC) algorithm used in WebRTC, or the Trendline algorithm, to detect the bandwidth of the video data sending link. The GCC or Trendline algorithm receives RTCP (RTP Control Protocol) RR (Receiver Report) messages sent by the server and derives the uplink network congestion state from the packet loss rate and transmission delay information carried in the Report Blocks of the RTCP RR, thereby estimating the video data sending link bandwidth.
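The delay-based side of GCC/Trendline is fairly involved; as a flavor of the loss-based side, the GCC draft adjusts the send-side estimate from the RTCP-reported loss fraction roughly as follows (a simplification of the public algorithm, not the patent's own procedure):

```python
def loss_based_estimate(current_bps: float, loss_fraction: float) -> float:
    """Loss-based bandwidth update in the spirit of the GCC draft."""
    if loss_fraction > 0.10:                  # heavy loss: back off
        return current_bps * (1 - 0.5 * loss_fraction)
    if loss_fraction < 0.02:                  # negligible loss: probe upward
        return current_bps * 1.05
    return current_bps                        # moderate loss: hold

print(loss_based_estimate(1_000_000, 0.15))  # -> 925000.0
print(loss_based_estimate(1_000_000, 0.01))  # -> 1050000.0
```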
Step 820: and adjusting the coding rate according to the bandwidth of the video data transmission link to obtain a first target coding rate, wherein the bandwidth of the video data transmission link and the first target coding rate form a positive correlation relationship.
If the video sending end detects that the bandwidth of the video data sending link is limited, i.e., the link bandwidth is lower than the coding rate currently used by the video sending end, it adaptively lowers the coding rate, so that the first target coding rate is lower than the currently used coding rate.
If the video sending end detects that the bandwidth of the video data sending link is abundant, the currently used coding rate can be kept or improved, so that the first target coding rate is not lower than the currently used coding rate.
In the embodiment of the application, the video sending end adaptively adjusts the coding rate by detecting the bandwidth of the sending link, so that network congestion can be avoided and the network communication quality can be improved in a network environment with low network bandwidth.
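Putting the two cases together, a minimal sketch of the rate adjustment (the 5% probe step and the names are our assumptions; the patent only requires the positive correlation with link bandwidth):

```python
def first_target_rate(current_rate_bps: float, link_bandwidth_bps: float) -> float:
    """First target coding rate tracks the estimated uplink bandwidth."""
    if link_bandwidth_bps < current_rate_bps:
        return link_bandwidth_bps                             # limited link: come down
    return min(current_rate_bps * 1.05, link_bandwidth_bps)   # abundant: keep or raise

print(first_target_rate(800_000, 500_000))    # -> 500000 (reduce)
print(first_target_rate(800_000, 2_000_000))  # -> 840000.0 (raise)
```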
Fig. 9 shows a flowchart of a method of implementing step 320 according to an embodiment of the present application, which includes the following steps, as shown in fig. 9.
Step 910: and receiving a coding rate control instruction sent by the server, wherein the coding rate control instruction is generated by the server according to the downlink bandwidth between the server and the video data receiving ends.
In the embodiment of the application, the server side of the video conference detects the downlink bandwidths of the multiple video receiving ends, and adjusts the coding rate of the video sending end according to the downlink bandwidths of the multiple video receiving ends.
Step 920: and adjusting the coding rate according to the coding rate control instruction to obtain a second target coding rate.
The video sending end adjusts the currently used coding rate according to the coding rate control instruction sent by the server end, and the second target coding rate is obtained through calculation according to the information in the control instruction.
It should be noted that the coding rate control command may also directly include the second target coding rate.
In a specific implementation, the server side counts the downlink bandwidths of the multiple video receiving ends. When the proportion of low-bandwidth receiving ends is greater than a preset threshold, the coding rate control instruction instructs the video sending end to reduce the coding rate; when the proportion of low-bandwidth receiving ends is less than or equal to the preset threshold, the coding rate control instruction instructs the video sending end to keep or raise the coding rate. A low-bandwidth receiving end is a video data receiving end whose downlink bandwidth is lower than a preset bandwidth threshold, i.e., a poor-quality user.
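A sketch of that server-side decision (the threshold values are our placeholders; the patent leaves both as configurable presets):

```python
def rate_control_instruction(downlink_bw_bps: list[float],
                             bw_threshold_bps: float = 500_000,
                             ratio_threshold: float = 0.3) -> str:
    """Instruction for the video sending end from receivers' downlink bandwidths."""
    low = sum(1 for bw in downlink_bw_bps if bw < bw_threshold_bps)
    if low / len(downlink_bw_bps) > ratio_threshold:
        return "REDUCE"        # too many poor-quality receivers
    return "KEEP_OR_RAISE"

# Two of four receivers below 500 kbps -> 50% > 30% -> reduce the coding rate.
print(rate_control_instruction([300e3, 2e6, 1.5e6, 250e3]))  # -> REDUCE
```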
According to the embodiment of the application, the server side detects the downlink bandwidth of the video receiving end and adjusts the coding rate of the video sending end according to the downlink bandwidth of the video receiving end, so that network congestion is avoided, and the video communication quality is improved.
Fig. 10 is a diagram illustrating an architecture of a video conference system according to an embodiment of the present application, and as shown in fig. 10, the video conference system includes a video transmitting end 1010, a service end 1030, and a plurality of video receiving ends 1020.
The video sending end 1010 includes a first flow control SDK 1011 and a first media engine 1012.

As shown in the figure, the video sending end 1010 adaptively adjusts the coding rate as follows:

1. The first flow control SDK (Software Development Kit) 1011 performs uplink bandwidth estimation according to the uplink packet loss rate and the transmission delay, and adjusts the coding rate currently used by the first media engine 1012 to obtain the first target coding rate.
2. The first media engine 1012 encodes the video data to be transmitted according to the first target encoding rate.
Each video receiving end 1020 includes a second flow control SDK 1021 and a second media engine 1022.
The server side comprises: a sender server 1031, a receiver server 1032, a QOS (Quality of Service) server 1033, and an OSS (Operation Support System) server 1034.
As shown, the server side 1030 adjusts the coding rate of the video sending end 1010 according to the downlink bandwidths of the video receiving ends 1020 through the following steps.
3. The second flow control SDKs respectively send the packet loss rate and the transmission delay of the downlink of the corresponding video receiving end to the receiving end server 1032 through signaling.
4. The receiver server 1032 estimates a downlink bandwidth of each video receiver according to the packet loss rate and the transmission delay of the downlink of each video receiver 1020, and transmits the downlink bandwidths of the plurality of video receivers 1020 to the sender server 1031.
In a specific implementation, the receiving-end server 1032 may estimate the downlink bandwidth of each video receiving end using the GCC or Trendline algorithm: it estimates the network delay from the packet loss rate and transmission delay of the downlink, determines the congestion status of the current network, and thereby estimates the downlink bandwidth. Finally, it sends the downlink bandwidths of the multiple video receiving ends, i.e., their maximum bandwidth capabilities, to the sending-end server 1031 through RTCP REMB (Receiver Estimated Maximum Bitrate) messages.
5. The sender server 1031 sends the downlink bandwidths of the multiple video receivers 1020 to the QOS server 1033.
6. The QOS server 1033 sends an encoding rate control instruction to the sending end server 1031 according to the preset policy issued by the OSS server 1034 and the downlink bandwidths of the multiple video receiving ends, where the encoding rate control instruction includes the second target encoding rate.
In a specific implementation, the preset policy issued by the OSS server 1034 may be: when the proportion of low-bandwidth receiving ends is greater than a preset threshold, control the video sending end to reduce the coding rate; when the proportion of low-bandwidth receiving ends is less than or equal to the preset threshold, control the video sending end to keep or raise the coding rate. A low-bandwidth receiving end is a video data receiving end whose downlink bandwidth is lower than the preset bandwidth threshold.
It should be noted that the preset policy obtained by the QOS server may also be obtained from a blockchain, and specifically may be obtained from an intelligent contract of the blockchain. For example, the OSS server may generate a preset policy, send the policy to the blockchain network, and write the policy into the blockchain after reaching consensus in the blockchain network.
The block chain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism and an encryption algorithm. A block chain (Blockchain), which is essentially a decentralized database, is a string of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, which is used for verifying the validity (anti-counterfeiting) of the information and generating a next block. The blockchain may include a blockchain underlying platform, a platform product services layer, and an application services layer.
The block chain underlying platform can comprise processing modules such as user management, basic service, intelligent contract and operation monitoring. The user management module is responsible for identity information management of all blockchain participants, and comprises the steps of maintaining public and private key generation (account management), key management, user real identity and blockchain address corresponding relation maintenance (authority management) and the like, and under the authorized condition, supervising and auditing the transaction condition of some real identities, and providing rule configuration (wind control audit) of risk control; the basic service module is deployed on all block chain node equipment and used for verifying the validity of the service request, recording the service request to storage after consensus on the valid request is completed, for a new service request, the basic service firstly performs interface adaptation analysis and authentication processing (interface adaptation), then encrypts service information (consensus management) through a consensus algorithm, transmits the service information to a shared account (network communication) completely and consistently after encryption, and performs recording and storage; the intelligent contract module is responsible for registering and issuing contracts, triggering the contracts and executing the contracts, developers can define contract logics through a certain programming language, issue the contract logics to a block chain (contract registration), call keys or other event triggering and executing according to the logics of contract clauses, complete the contract logics and simultaneously provide the function of upgrading and canceling the contracts; the operation monitoring module is mainly responsible for deployment, configuration modification, contract setting, cloud adaptation in the product release process and visual output of real-time states in product operation, such as: alarm, monitoring network conditions, monitoring node equipment health status, and the like.
The platform product service layer provides basic capability and an implementation framework of typical application, and developers can complete block chain implementation of business logic based on the basic capability and the characteristics of the superposed business. The application service layer provides the application service based on the block chain scheme for the business participants to use.
7. The sending-end server 1031 sends the coding rate control instruction to the first flow control SDK 1011.

8. The first flow control SDK 1011 controls the first media engine 1012 to adjust the coding rate.
In the technical solution of the embodiment of the present application, redundancy rates are allocated to the frames according to their importance within a group of pictures, so that frames with different importance levels are prevented from using the same redundancy rate. This improves video communication quality and reduces both the stall-duration ratio and the proportion of high-stall intervals.
It should be noted that, in the embodiment of the present application, each server may be flexibly deployed in the video conference system. For example, sender server 1031, receiver server 1032, QOS server 1033, and OSS server 1034 may be deployed in the same server, which integrates the functions of the four; or the sender server 1031 and the receiver server 1032 may be deployed in the video sender 1010 at the same time, and the video sender 1010 integrates the functions of the two.
Embodiments of the apparatus of the present application are described below, which may be used to perform the methods of video data processing in the above-described embodiments of the present application. For details that are not disclosed in the embodiments of the apparatus of the present application, please refer to the embodiments of the video data processing method described above in the present application.
Fig. 11 shows a block diagram of a video data processing apparatus according to an embodiment of the present application. As shown in fig. 11, the video data processing apparatus 1100 includes the following sections.
An obtaining unit 1110, configured to obtain video data to be sent and obtain a target coding rate;
a video encoding unit 1120, configured to encode video data to obtain at least one group of pictures;
a frame importance acquiring unit 1130 configured to acquire importance levels of frames in a group of pictures;
a redundancy rate allocating unit 1140, configured to allocate a redundancy rate to each frame in the group of pictures according to the target coding rate and the importance degree of each frame in the group of pictures, where the importance degree of each frame has a positive correlation with the redundancy rate corresponding to each frame;
an error correction coding unit 1150 is configured to perform forward error correction coding on each frame in the group of pictures according to the redundancy rate of each frame.
In some embodiments of the present application, based on the foregoing scheme, each group of pictures comprises an intra-coded image frame (I frame) and at least one forward-predictive-coded image frame (P frame) located after the I frame.
The frame importance acquiring unit comprises:
a distance acquisition subunit, configured to acquire a distance from each P frame in the group of pictures to the I frame;
and the importance determining subunit is used for determining the importance degree of each frame in the picture group according to the distance from each P frame in the picture group to the I frame, and the importance degree of each P frame in the picture group and the distance from each P frame to the I frame form a negative correlation relationship.
In some embodiments of the present application, based on the foregoing scheme, the redundancy rate allocation unit is configured to: determining the tolerance of decreasing redundancy rate according to the target coding rate and the length of the picture group; and allocating redundancy rates to the frames in the picture group based on the tolerance, wherein the redundancy rates of the frames in the picture group are decreased according to the equal difference of the distance between each frame and the I frame.
In some embodiments of the present application, based on the foregoing scheme, the redundancy rate p_i of the i-th frame in the group of pictures is expressed as:

p_i = p + ((L + 1 - 2i) / 2) · d

where p represents the average redundancy rate of the group of pictures, calculated from the overall redundancy code rate of the group of pictures and the target coding rate; d represents the tolerance (common difference) of the decreasing sequence of redundancy rates; L represents the number of frames in the group of pictures; and i is a natural number with 1 ≤ i ≤ L.
In some embodiments of the present application, based on the foregoing scheme, the video encoding unit is configured to: numbering the frames according to the sending sequence of the frames in the picture group; the first frame in each group of pictures is encoded as an I-frame and the frames other than the first frame are encoded as P-frames.
The distance acquisition subunit is configured to: the difference in the numbers between each P frame and the I frame in the group of pictures is taken as the distance of each P frame in the group of pictures from the I frame.
In some embodiments of the present application, based on the foregoing solution, the obtaining unit is configured to: acquiring the bandwidth of a video data transmission link according to the packet loss rate of the video data transmission link and the transmission delay of the video data transmission link; and adjusting the coding rate according to the bandwidth of the video data transmission link to obtain a first target coding rate, wherein the bandwidth of the video data transmission link and the first target coding rate form a positive correlation relationship.
In some embodiments of the present application, based on the foregoing solution, the obtaining unit is configured to: receiving a coding rate control instruction sent by a server, wherein the coding rate control instruction is generated by the server according to downlink bandwidths between the server and a plurality of video data receiving ends; and adjusting the coding rate according to the coding rate control instruction to obtain a second target coding rate.
In some embodiments of the present application, based on the foregoing scheme, when the proportion of low-bandwidth receiving ends is greater than a preset threshold, the coding rate control instruction is used to indicate that the coding rate should be reduced, where a low-bandwidth receiving end is a video data receiving end whose downlink bandwidth is lower than a preset bandwidth threshold; and when the proportion of low-bandwidth receiving ends is less than or equal to the preset threshold, the coding rate control instruction is used to indicate that the coding rate should be kept or raised.
FIG. 12 illustrates a schematic structural diagram of a computer system suitable for implementing the electronic device of the embodiments of the present application.
It should be noted that the computer system 1200 of the electronic device shown in fig. 12 is only an example, and should not bring any limitation to the functions and the application scope of the embodiments of the present application.
As shown in fig. 12, the computer system 1200 includes a Central Processing Unit (CPU) 1201, which can perform various appropriate actions and processes, such as performing the methods described in the above embodiments, according to a program stored in a Read-Only Memory (ROM) 1202 or a program loaded from a storage section 1208 into a Random Access Memory (RAM) 1203. In the RAM 1203, various programs and data necessary for system operation are also stored. The CPU 1201, ROM 1202, and RAM 1203 are connected to each other by a bus 1204. An Input/Output (I/O) interface 1205 is also connected to bus 1204.
The following components are connected to the I/O interface 1205: an input section 1206 including a keyboard, a mouse, and the like; an output section 1207 including a Display device such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage section 1208 including a hard disk and the like; and a communication section 1209 including a Network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 1209 performs communication processing via a network such as the internet. A driver 1210 is also connected to the I/O interface 1205 as needed. A removable medium 1211, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is mounted on the drive 1210 as necessary, so that a computer program read out therefrom is mounted into the storage section 1208 as necessary.
In particular, according to embodiments of the application, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication portion 1209 and/or installed from the removable medium 1211. The computer program executes various functions defined in the system of the present application when executed by a Central Processing Unit (CPU) 1201.
It should be noted that the computer-readable medium shown in the embodiments of the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this application, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. A computer-readable signal medium, by contrast, may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination thereof. A computer-readable signal medium may be any computer-readable medium that is not a computer-readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wired, or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or by hardware, and the described units may also be disposed in a processor. The names of these units do not in any way limit the units themselves.
As another aspect, the present application also provides a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the video data processing method described in the above embodiments.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the video data processing method described in the above embodiments.
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to the embodiments of the present application, the features and functions of two or more modules or units described above may be embodied in a single module or unit; conversely, the features and functions of one module or unit described above may be further divided among a plurality of modules or units.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with the necessary hardware. Therefore, the technical solution according to the embodiments of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, or the like) or on a network, and which includes several instructions for enabling a computing device (which may be a personal computer, a server, a touch terminal, a network device, or the like) to execute the method according to the embodiments of the present application.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (10)

1. A method of video data processing, the method comprising:
acquiring video data to be transmitted and acquiring a target coding rate;
coding the video data according to the target coding rate to obtain at least one picture group;
acquiring the importance degree of each frame in the picture group;
allocating redundancy rates to the frames in the picture group according to the target coding rate and the importance degree of each frame in the picture group, wherein the importance degree of each frame is in a positive correlation with the redundancy rate corresponding to that frame;
and carrying out forward error correction coding on each frame in the picture group according to the redundancy rate of each frame.
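The claim leaves the concrete FEC scheme open; purely as a sketch, the following shows how a per-frame redundancy rate could be turned into a parity-packet count for a packetized frame, with the encoder itself left abstract (all names here are illustrative, not from the application):

import math

def parity_packet_count(num_data_packets, redundancy_rate):
    # With redundancy rate r, roughly r * num_data_packets parity packets
    # are added, so up to that many lost packets per frame are repairable
    # under an MDS code such as Reed-Solomon.
    return math.ceil(num_data_packets * redundancy_rate)

def protect_picture_group(frames, redundancy_rates, fec_encode):
    # frames           : one list of data packets per frame in the picture group
    # redundancy_rates : per-frame rates produced by the allocation step
    # fec_encode(p, k) : abstract encoder returning k parity packets for packets p
    protected = []
    for packets, rate in zip(frames, redundancy_rates):
        k = parity_packet_count(len(packets), rate)
        protected.append(packets + fec_encode(packets, k))
    return protected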
2. The method of claim 1, wherein each picture group comprises an intra-coded picture frame (I frame) and at least one forward-predictive-coded picture frame (P frame) following the I frame;
the obtaining the importance degree of each frame in the picture group comprises:
acquiring the distance between each P frame in the picture group and the I frame;
and determining the importance degree of each frame in the picture group according to the distance between each P frame in the picture group and the I frame, wherein the importance degree of each P frame in the picture group is negatively correlated with its distance from the I frame.
3. The method of claim 2, wherein the allocating redundancy rates for the frames in the group of pictures according to the target coding rate and the importance of the frames in the group of pictures comprises:
determining a tolerance (common difference) by which the redundancy rate decreases, according to the target coding rate and the length of the picture group;
and allocating redundancy rates to the frames in the picture group based on the tolerance, wherein the redundancy rates of the frames in the picture group decrease arithmetically with the distance of each frame from the I frame.
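A minimal sketch of such an equal-difference allocation, assuming the sequence is centred on a given average rate and clamped at zero (the parameter names and the clamping choice are assumptions, not claim language):

def allocate_redundancy_rates(avg_rate, gop_length, tolerance):
    # Arithmetically decreasing rates over the picture group: frame 1
    # (the I frame) gets the largest rate, and each later frame's rate
    # drops by `tolerance` per unit of distance from the I frame, while
    # the mean over the whole group stays at avg_rate.
    mid = (gop_length + 1) / 2.0
    return [max(0.0, avg_rate + tolerance * (mid - i))
            for i in range(1, gop_length + 1)]

print(allocate_redundancy_rates(avg_rate=0.2, gop_length=5, tolerance=0.05))
# -> approximately [0.3, 0.25, 0.2, 0.15, 0.1] (up to float rounding);
#    mean 0.2, with the I frame protected the most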
4. The method according to claim 3, wherein the redundancy rate of the i-th frame in the picture group is expressed by the following formula:
[Formula: redundancy rate of the i-th frame; reproduced in the source only as image FDA0003159058760000011]
wherein p represents the average redundancy rate of the picture group, the average redundancy rate being calculated from the overall redundancy code rate of the picture group and the target coding rate; L represents the length of the picture group; and i is a natural number with 1 ≤ i ≤ L.
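The formula itself survives only as the image placeholder above. One arithmetic allocation consistent with claims 3 and 4 (average rate p over L frames, rates falling linearly with i) is the following reconstruction, offered as a plausible reading rather than the application's exact expression:

% Hypothetical reconstruction (the original formula image is not legible here):
% redundancy rate of the i-th frame, with average rate p over L frames.
p_i = \frac{2p\,(L + 1 - i)}{L + 1}, \qquad 1 \le i \le L,
% which decreases by the constant tolerance 2p/(L+1) per frame and averages to p:
\frac{1}{L}\sum_{i=1}^{L} p_i
  = \frac{2p}{L(L+1)} \sum_{i=1}^{L} (L + 1 - i)
  = \frac{2p}{L(L+1)} \cdot \frac{L(L+1)}{2}
  = p .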
5. The method of claim 2, wherein after the video data is encoded according to the target coding rate to obtain at least one group of pictures, the method further comprises:
numbering the frames sequentially according to their sending order in the picture group;
coding the first frame in each picture group as an I frame, and coding the remaining frames as P frames;
the obtaining the distance from each P frame in the picture group to the I frame includes:
and taking the difference between the number of each P frame in the picture group and the number of the I frame as the distance of that P frame from the I frame.
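As a small sketch of claims 2 and 5 together (the 1-based numbering origin is an assumption; the claim only requires consecutive numbers in sending order):

def frame_distances(gop_size):
    # Frames are numbered 1..gop_size in sending order; frame 1 is the
    # I frame, so each frame's distance is its number minus the I frame's.
    return [i - 1 for i in range(1, gop_size + 1)]

distances = frame_distances(5)                         # [0, 1, 2, 3, 4]
importance = [len(distances) - d for d in distances]   # [5, 4, 3, 2, 1]
# Importance falls as the distance from the I frame grows (negative correlation).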
6. The method of claim 1, wherein obtaining the target coding rate comprises:
acquiring the bandwidth of a video data transmission link according to the packet loss rate of the video data transmission link and the transmission delay of the video data transmission link;
and adjusting the coding rate according to the video data transmission link bandwidth to obtain a first target coding rate, wherein the video data transmission link bandwidth is in a positive correlation with the first target coding rate.
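Claim 6 does not fix an estimator; purely as an illustration, a toy heuristic that discounts measured throughput by packet loss and by delay above a baseline, and then keeps the coding rate tracking the estimate, might look like the following (every constant here is an assumption, not from the application):

def estimate_link_bandwidth(measured_throughput_bps, packet_loss_rate,
                            transmission_delay_ms, baseline_delay_ms=50.0):
    # Higher loss and higher queuing delay both shrink the usable-bandwidth
    # estimate; the specific discount shapes are made-up illustrations.
    loss_discount = 1.0 - min(packet_loss_rate, 0.5)
    delay_discount = baseline_delay_ms / max(transmission_delay_ms, baseline_delay_ms)
    return measured_throughput_bps * loss_discount * delay_discount

def first_target_coding_rate(link_bandwidth_bps, redundancy_overhead=0.2):
    # Leave headroom for the FEC redundancy so media plus parity fit the
    # link; the rate rises and falls with the estimate (positive correlation).
    return link_bandwidth_bps / (1.0 + redundancy_overhead)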
7. The method of claim 1, wherein obtaining the target coding rate comprises:
receiving a coding rate control instruction sent by a server, wherein the coding rate control instruction is generated by the server according to downlink bandwidths between the server and a plurality of video data receiving ends;
and adjusting the coding rate according to the coding rate control instruction to obtain a second target coding rate.
8. The method of claim 7,
when the proportion of low-bandwidth receiving ends is greater than a preset threshold, the coding rate control instruction indicates that the coding rate is to be reduced, the low-bandwidth receiving end being a video data receiving end whose downlink bandwidth is lower than a preset bandwidth threshold;
and when the proportion of low-bandwidth receiving ends is less than or equal to the preset threshold, the coding rate control instruction indicates that the coding rate is to be maintained or increased.
9. A video data processing apparatus, characterized in that the apparatus comprises:
an acquisition unit, configured to acquire video data to be transmitted and to acquire a target coding rate;
a video encoding unit, configured to encode the video data to obtain at least one group of pictures;
a frame importance acquiring unit, configured to acquire importance degrees of frames in the picture group;
a redundancy rate allocation unit, configured to allocate a redundancy rate to each frame in the group of pictures according to the target coding rate and the importance degree of each frame in the group of pictures, where the importance degree of each frame has a positive correlation with the redundancy rate corresponding to each frame;
and the error correction coding unit is used for carrying out forward error correction coding on each frame in the picture group according to the redundancy rate of each frame.
10. A storage medium storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the method of any one of claims 1-8.
CN202110786513.XA 2021-07-12 2021-07-12 Video data processing method, video data processing apparatus, and storage medium Pending CN115623155A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110786513.XA CN115623155A (en) 2021-07-12 2021-07-12 Video data processing method, video data processing apparatus, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110786513.XA CN115623155A (en) 2021-07-12 2021-07-12 Video data processing method, video data processing apparatus, and storage medium

Publications (1)

Publication Number Publication Date
CN115623155A (en) 2023-01-17

Family

ID=84854577

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110786513.XA Pending CN115623155A (en) 2021-07-12 2021-07-12 Video data processing method, video data processing apparatus, and storage medium

Country Status (1)

Country Link
CN (1) CN115623155A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116708934A (en) * 2023-05-16 2023-09-05 深圳东方凤鸣科技有限公司 Video coding processing method and device
CN116708934B (en) * 2023-05-16 2024-03-22 深圳东方凤鸣科技有限公司 Video coding processing method and device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40079483

Country of ref document: HK