US10785279B2 - Video encoding using starve mode - Google Patents
- Publication number
- US10785279B2 (application US15/394,699, filed 2016)
- Authority
- US
- United States
- Prior art keywords
- video
- network
- encoding
- packet
- mode
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
- H04L65/65—Network streaming protocols, e.g. real-time transport protocol [RTP] or real-time control protocol [RTCP]
- H04L65/80—Responding to QoS
- H04L65/608 (legacy)
- H04L43/0864—Round trip delays
- H04L65/602 (legacy)
- H04L65/607 (legacy)
- H04L65/70—Media network packetisation
- H04L65/752—Media network packet handling adapting media to network capabilities
- H04L65/762—Media network packet handling at the source
- H04N19/107—Selection of coding mode or of prediction mode between spatial and temporal predictive coding, e.g. picture refresh
- H04N19/124—Quantisation
- H04N19/132—Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
- H04N19/146—Data rate or code amount at the encoder output
- H04N19/164—Feedback from the receiver or from the transmission channel
- H04N19/172—Adaptive coding characterised by the coding unit, the unit being an image region, e.g. a picture, frame or field
Definitions
- This patent document relates to video processing and, in particular, to video compression for interactive real-time applications.
- The present document describes techniques for operating a video encoder, including, for example, for low-latency, real-time video communication under adverse network conditions.
- A disclosed method of operating a video encoder in a data communication network includes: monitoring a network condition, wherein the monitoring includes tracking the status of at least some compressed video packets generated by the video encoder and transferred to the data communication network; deciding, based on the monitoring, to change the operation of the video encoder to a starve mode in which a sub-optimal mode of encoding is used for generating compressed video packets; operating, in the starve mode, the video encoder to produce intra-only compressed video frames at its output; selecting an encoding parameter for the intra-only compressed video frames such that each resulting intra-encoded video frame fits within a single packet of transmission at the application layer; and transferring, selectively based on the network condition, the compressed video frames to the data communication network.
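The claimed monitor-decide-operate flow can be sketched as a simple controller. This is a minimal illustration only; the class, the bandwidth threshold, and the mode names are hypothetical and not taken from the claims:

```python
from enum import Enum, auto

class Mode(Enum):
    NORMAL = auto()
    STARVE = auto()

class StarveModeController:
    """Sketch of the claimed method: monitor the network condition and
    decide whether the encoder should run in normal or starve mode."""

    def __init__(self, min_normal_kbps=200, packet_limit_bytes=1200):
        self.min_normal_kbps = min_normal_kbps      # hypothetical low mark
        self.packet_limit_bytes = packet_limit_bytes
        self.mode = Mode.NORMAL

    def decide_mode(self, bandwidth_kbps):
        # Below the encoder's normal operating range, switch to starve
        # mode (intra-only frames, each sized to fit one packet).
        if bandwidth_kbps < self.min_normal_kbps:
            self.mode = Mode.STARVE
        else:
            self.mode = Mode.NORMAL
        return self.mode
```

In starve mode, the controller would drive the encoder to emit intra-only frames whose encoding parameter is chosen so each frame fits within `packet_limit_bytes`.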
- Also disclosed is a computer program product comprising a computer-readable medium that stores processor-executable code.
- The code includes instructions for implementing a method of performing video encoding on a user device.
- The instructions include instructions for monitoring the condition of a network for a trigger point for switching the mode of video encoding operation so as to sustain an ongoing video communication despite changes in the condition; instructions for deciding, upon detecting that a trigger point has been reached and based on the identity of the trigger point, to operate the video encoder in a corresponding starve mode by modifying at least one parameter of video encoding; and instructions for transferring, selectively based on the condition of the network, compressed video frames to the network.
- The video encoder is further controlled to produce, for each compressed video frame, a number of bits that fit within exactly one network packet of a pre-determined size.
- Also disclosed is an apparatus comprising a memory, a processor, and a network interface for performing real-time video communication.
- The apparatus includes an encoder module that produces one or more compressed video representations of a video frame, one or more buffers that store the one or more compressed video representations, and a packetizer module that checks the sizes of the one or more compressed video representations and provides feedback to the encoder module about altering a parameter for producing the one or more compressed video representations.
- The encoder module is operable in at least two modes of operation: a normal mode, in which the encoder module produces the one or more compressed video representations while refraining from altering the parameter based on the feedback, and a starve mode, in which the encoder module produces the one or more compressed video representations by performing intra-only encoding of the video frame and further based on the feedback received from the packetizer module.
- FIG. 1 illustrates an example of a communication network for video uploading and sharing, consistent with various embodiments.
- FIG. 2 illustrates a timeline of operation of a video encoder that receives network feedback.
- FIG. 3 illustrates an example of a video encoding timeline using frame rate reduction.
- FIG. 4 illustrates another example of a video encoding timeline using frame rate reduction in the compressed domain.
- FIG. 5 illustrates another example of a video encoding timeline using resolution reduction.
- FIG. 6 illustrates another example of a video encoding timeline using resolution and rate reduction.
- FIG. 7 illustrates various examples of video encoding embodiments.
- FIG. 8 is a flowchart illustrating an example method of video encoding.
- FIG. 9 is a block diagram illustrating an example computing device, consistent with various embodiments.
- FIG. 10 is a flowchart illustrating an example method of controlling the operation of a video encoder.
- Typical video encoding schemes used for real-time communications/video chat (“RTC”) are not optimized for the low bandwidth that cell phones sometimes experience.
- During RTC, video encoders typically discard lost packets rather than re-sending them, because delayed delivery of lost packets is generally not desirable.
- The disclosure is directed to a “starve mode” for encoding video when experiencing low bandwidth.
- The encoder receives information on the current bandwidth, consults a lookup table to identify an acceptable frame rate and/or resolution, and adjusts the quantization parameter (Qp) of the encoder upwards or downwards so that each Real-time Transport Protocol (RTP) packet is efficiently utilized (e.g., so that the entire 1.2 KB is used but not exceeded).
- Each subsequent RTP packet carries an I-frame, and no P-frames (e.g., “delta frames”) are sent.
- The encoder checks the Real-Time Control Protocol (RTCP) receiver report at a specified frequency and enters or leaves the starve mode as necessary (e.g., when low or normal bandwidth is experienced).
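The starve-mode behavior described above can be sketched as follows. The table values, the packet limit handling, and the `encode` stub are illustrative assumptions, not values taken from the patent:

```python
# Hypothetical table mapping available bandwidth (kbps) to an
# acceptable (frame rate, resolution) pair for starve mode.
STARVE_TABLE = [
    (50, (5, (320, 176))),
    (100, (10, (320, 176))),
    (200, (15, (640, 352))),
]

RTP_PACKET_LIMIT = 1200  # bytes; strict per-packet upper limit

def settings_for_bandwidth(kbps):
    """Pick the highest table entry whose bandwidth requirement is met."""
    chosen = STARVE_TABLE[0][1]
    for threshold, settings in STARVE_TABLE:
        if kbps >= threshold:
            chosen = settings
    return chosen

def fit_iframe_to_packet(encode, frame, qp=30, qp_min=0, qp_max=51):
    """Adjust Qp upwards (coarser) or downwards (finer) until the
    intra-coded frame fills, but does not exceed, one RTP packet."""
    best = None
    while qp_min <= qp_max:
        size = len(encode(frame, qp))
        if size > RTP_PACKET_LIMIT:
            qp_min = qp + 1          # too big: quantize more coarsely
        else:
            best = (qp, size)        # fits: try finer quantization
            qp_max = qp - 1
        qp = (qp_min + qp_max) // 2
    return best
```

With a well-behaved encoder (size monotonically decreasing in Qp), the search converges on the finest Qp whose I-frame still fits the packet.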
- FIG. 1 illustrates an example of a communication network 100 in which a distributed network 102 operates to facilitate the use of digital video among multiple users: user 104 , who may originate a video using his user device 106 , and users 114 with their user devices 116 , who may download and view the video sequence that the user device 106 uploads to a server 110 in the distributed network 102 .
- The user devices 106 and 116 may communicate with the distributed network 102 via communication networks or channels 108 and 112 .
- The channels 108 and 112 may be the same or different networks (e.g., the Internet or World Wide Web) and may change from time to time based on location.
- User devices 106 and 116 may include multiple network modules (e.g., a Wi-Fi modem, a 3G modem, a 4G modem, a WiMAX modem, etc.) and may use one or more of these network connections for communication in the upstream direction (from the device to the network) or the downstream direction (from the network to the device).
- The user 104 may capture a video sequence using a camera-enabled smartphone (user device 106 ).
- The user 104 may then instruct the user device 106 to upload the video to the server 110 , which may be operated by a service provider (e.g., a social media website).
- The service provider may operate the distributed network 102 (e.g., a geographically distributed server farm) to propagate the availability of the user's video clip to other users with whom the user 104 wishes to share it (e.g., user 114 ).
- User devices 106 and 116 often include resources for capturing video (for example, using a built-in camera), encoding or compressing the video, and transferring the video to the network via one or more of the network modules.
- User devices 106 and 116 may perform video encoding using a combination of code running on a processor and hardware assistance for computationally intensive functions such as transform calculations.
- Video encoders are often designed to produce high-visual-quality output when operating within a target output bitrate range. For example, one video encoder may be designed to operate in a normal mode over a 200 Kbps to 2 Mbps output range, while another may be designed to operate in the 1 Mbps to 6 Mbps range, and so on.
- If the video encoder is constrained to produce compressed video at a lower bitrate than its normal range of operation, the visual quality of the resulting video may significantly deteriorate.
- A video encoder may not even be able to operate below a certain output bitrate. For example, some video encoders may not be able to produce compressed video at all at bitrates below 50 Kbps.
- Video encoders are often optimized to exploit redundancies in video, such as inter-frame dependencies, to improve the compression efficiency of the encoding.
- Such optimization may impact performance in a real-time video communication application, especially when available network bandwidth is unpredictable and packets may be lost: motion-compensated compressed video typically requires more end-to-end latency for encoding, and it may also produce more objectionable visual artifacts when some packets are lost in the network.
- The visual artifacts may be worse when large video frames (e.g., intra-coded frames) occupy several network packets, which increases the possibility that at least some information from a large video frame is lost when packets are dropped.
- The well-known RTP transport protocol includes RTCP, which defines a mechanism for receiving devices to provide Quality of Service (QoS) information to transmitting devices.
- RTCP information may be received at a video encoder on a periodic basis, e.g., every 2 seconds, and may provide the video encoder with up-to-date QoS statistics such as packet loss and round-trip delay.
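The RTCP receiver reports mentioned above follow the fixed layout of RFC 3550. As a minimal illustration (compound-packet handling and validation are omitted, and the function name is our own), the loss statistics of the first report block can be read like this:

```python
def parse_rr_first_block(data: bytes):
    """Extract loss statistics from the first report block of an RTCP
    receiver report (RFC 3550, packet type 201).
    Returns (fraction_lost, cumulative_lost)."""
    version = data[0] >> 6
    ptype = data[1]
    assert version == 2 and ptype == 201, "not an RTCP RR packet"
    # 4-byte header and 4-byte sender SSRC precede the first report block.
    block = data[8:]
    # Block layout: source SSRC (4 bytes), fraction lost (1 byte,
    # fixed-point /256), cumulative packets lost (3 bytes, signed).
    fraction = block[4] / 256.0
    cumulative = int.from_bytes(block[5:8], "big", signed=True)
    return fraction, cumulative
```

A video encoder could feed these statistics into its per-epoch mode decision.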
- FIG. 2 illustrates an example timeline 200 of the operation of a video encoder that receives QoS feedback information from the network.
- The horizontal axis 204 represents time in arbitrary units, for example, in seconds.
- The time instances 201 may mark the start and end times of an encoding epoch.
- During each encoding epoch, the video encoder may operate with a fixed set of encoding parameters, such as a target output bitrate. For example, in one encoding epoch a video encoder may use a target output bitrate of 600 Kbps, while in the next encoding epoch the target bitrate may be raised to 800 Kbps, lowered to 500 Kbps, etc. It will be appreciated that while the target settings are maintained, e.g., for calculating bitrate and running internal bit allocation, the actual instantaneous bitrate may vary based on the details of the video content. Such differences between the encoder's output bitrate and the actual rate of transmission on the network can be compensated for by using a temporary storage buffer for compressed video packets.
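The temporary storage buffer mentioned above absorbs short-term mismatch between what the encoder emits and what the network can send. A minimal sketch, with hypothetical class and method names:

```python
class SmoothingBuffer:
    """Temporary buffer that absorbs the difference between the
    encoder's instantaneous output rate and the network send rate."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.level = 0  # current occupancy in bytes

    def push(self, nbytes):
        # Encoder deposits a compressed frame; return False on overflow,
        # which could itself serve as a starve-mode trigger condition.
        self.level += nbytes
        return self.level <= self.capacity

    def drain(self, nbytes):
        # Network removes up to nbytes for transmission.
        sent = min(nbytes, self.level)
        self.level -= sent
        return sent
```

The overflow signal ties into the buffer-based trigger conditions discussed later in the document.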
- Time instances 203 represent times at which the video encoder receives network QoS information, e.g., from RTCP packets.
- The time instances 203 may be separated by a relatively fixed time period (e.g., 2 seconds), while encoding epochs may be of a relatively similar duration (e.g., 4 seconds each).
- Encoder settings in the next epoch may be influenced by the most recent QoS report packet that the video encoder had time to receive and process.
- For example, a control packet may be received at time instance 203 a , just prior to the onset of the video epoch that starts at time instance 201 ; because the video encoder did not have the opportunity to process this packet, the network information contained within it is not used to decide video encoding parameters until the next video epoch 204 , as indicated by dashed arrow 205 .
- In some embodiments, attempts may be made to synchronize the video epochs and the network QoS packets with each other.
- For example, a video epoch may start at a fixed time after the video encoder receives network monitoring information.
- The video epoch duration may depend on how often the network monitoring information is received, or on the fastest and/or slowest rate at which the QoS information is received.
- The video epoch duration may be greater than the inter-packet period of the network monitoring, e.g., two to four network QoS packets may be received in each video epoch.
- A video encoder may make decisions about how to operate in each video epoch.
- Video encoders may be programmed to detect extreme network conditions, or trigger points, that could result in significant changes in video quality unless corrective action is taken. For example, if the lowest rate at which a video encoder can operate in the normal mode is 200 Kbps and the available network bandwidth falls below 220 Kbps (or 200 Kbps), then the video encoder may have to modify its encoding parameters and operate in a “starve” mode in order to maintain reasonable visual quality of the output compressed video. As described in this document, such starve-mode decisions may be triggered by more than one condition, such as network bandwidth, rate of packet loss, round-trip packet delay, a local video buffer overflowing or underflowing, and other operational conditions of the network.
- With reference to FIG. 3 to FIG. 6 , various ways of operating a video encoder in a starve mode are described.
- The operational scenarios described in FIG. 3 to FIG. 6 may occur within the above-described video encoding epochs. That is, in some embodiments, one starve mode may be used throughout a video epoch, followed by a decision about which starve mode to use in the next encoding epoch and a corresponding switch, either to another starve mode or to non-starve-mode operation of the video encoder.
- FIG. 3 is a block diagram illustration of an operational scenario 300 of a video encoder.
- The operational scenario 300 may occur, e.g., during one video epoch.
- The vertical axis 301 may generally represent the sequence of video encoder operations during the epoch.
- Row 302 represents a number of video frames v1 to v5 being received or generated at a user device.
- The video frames may have a capture resolution of, for example, 640×352. Because this resolution may differ from the native resolution at which the camera captures images, downsampling internal to the user device to a resolution suitable for video chat may be performed.
- For example, the camera may be capturing video in real time at full resolution (1920×1080, 60 frames per second).
- This sequence may be downsampled to a video chat resolution of 640×352, 15 frames per second, as represented by row 302 .
- In the operational scenario 300 , the video encoding operates by reducing the number of frames being encoded and transmitted according to the transmission bandwidth available across the network interface 310 .
- The frames to be sent (v1 and v3 from row 304 ) may be encoded into their encoded video representations ve1 and ve3.
- The video frame v2 ( 306 ), on the other hand, may be dropped.
- The encoded frames ve1 and ve3 are then sent over the network.
- The decision to drop frames may be made prior to performing video encoding. For example, in some embodiments, a look-up table may be maintained by which the video encoder can decide which frame rate to use for a particular available network bandwidth value.
- The decision about which frames to drop may be made in the uncompressed (or lightly compressed, or camera-compressed) domain of video based on a repetitive sequence (e.g., dropping every third frame) and/or in real time based on scene changes, shakiness stabilization of video frames, and so on. In operational scenario 300 , user device resources for video encoding may therefore be saved, because only video frames that are actually to be transmitted are compressed.
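The repetitive-sequence frame dropping described above amounts to simple decimation from the capture frame rate down to the target frame rate. A sketch with a hypothetical helper function:

```python
def frames_to_keep(capture_fps, target_fps, n_frames):
    """Select which capture-domain frame indices survive a simple
    repetitive decimation from capture_fps down to target_fps."""
    step = capture_fps / target_fps   # e.g., 60/15 = keep every 4th frame
    kept, next_keep = [], 0.0
    for i in range(n_frames):
        if i >= next_keep:
            kept.append(i)
            next_keep += step
    return kept
```

Only the kept frames would then be handed to the encoder, saving encode cycles, as scenario 300 describes.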
- In the operational scenario 400 of FIG. 4 , no frame-dropping decision is made in the uncompressed video domain. Instead, all video frames are compressed and a compressed video stream is generated. This stream may be stored in a temporary buffer, and a decision is made in real time about which compressed-domain bits to send.
- One advantage of this scheme is that video compression may be performed without having to first decide whether to drop frames. Such encoding may benefit from a hardware accelerator, a co-processor, or camera-based encoding, where a social media app may not be able to alter the flow of input video images in the compression pipeline.
- In the operational scenario 500 of FIG. 5 , an image resolution reduction filter 502 may be applied to reduce the size of images (as depicted by the video frames in row 504 having a smaller width than the corresponding source video images in row 302 ), followed by video compression (row 506 ) and selective transmission (row 512 ).
- FIG. 6 depicts an operational scenario 600 in which both frame rate reduction and image resolution reduction are used. Such may be the case when, for example, the available network bandwidth is very low and/or the packet drop rate is high (e.g., above a threshold).
- In FIG. 6 , row 604 represents resolution-reduced images (the ones with dashed borders being dropped from subsequent compression) and row 606 represents the corresponding compressed-domain video frames.
- Row 612 depicts the encoded frames being transmitted over the network.
- FIG. 3 to FIG. 6 thus illustrate various scenarios of operating a video encoder to produce video having a quality commensurate with, and most suitable for, available network parameters such as bandwidth, round-trip delay, packet drop rate, and so on.
- FIG. 7 depicts an example of packet syntax 700 of a network packet that may be used for transmitting compressed video.
- The network packet may have a strict upper limit on the number of bytes (in the illustrated embodiment, the upper limit is 1200 bytes).
- The network packet may comprise a header field.
- RTP defines a minimum 12-byte header field 702 ; the packet further includes a video data payload field 704 and an optional error correction code (ECC) field 706 .
- The ECC field may be, for example, 4 bytes long.
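The payload budget implied by this packet syntax is simple arithmetic: the 1200-byte limit minus the 12-byte RTP header and the optional 4-byte ECC field. A sketch (constants follow the illustrated embodiment; the function name is our own):

```python
RTP_HEADER_BYTES = 12      # minimum RTP header (no CSRC list or extensions)
ECC_BYTES = 4              # optional error-correction field in this example
PACKET_LIMIT_BYTES = 1200  # strict upper limit in the illustrated embodiment

def max_video_payload(use_ecc=True):
    """Bytes left for compressed video after the header (and optional ECC)."""
    overhead = RTP_HEADER_BYTES + (ECC_BYTES if use_ecc else 0)
    return PACKET_LIMIT_BYTES - overhead
```

So an intra-only frame in starve mode would have to compress to at most 1184 bytes (1188 without ECC) to fit a single packet.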
- The system 730 illustrates an example in which an encoder module 732 produces compressed video bits for video frames and outputs them to a buffer 734 .
- The packetizer module 736 analyzes the produced bits in the buffer 734 for size and checks whether the size is acceptable, e.g., below a strict upper limit currently in force on the network. Based on the analysis, the packetizer module 736 provides feedback 738 to the encoder module 732 .
- The feedback may indicate the size of the produced bits and thus provide information to the encoder about whether re-encoding to produce more or fewer bits should be performed.
- The feedback may be in the form of a suggested combination of encoding parameters that the encoder should use instead of the one that was used.
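The encoder-packetizer feedback loop of system 730 can be sketched as an iterative re-encode. The single `qp` knob, the step size, and the iteration budget are stand-ins for whatever parameter combination the packetizer might actually suggest:

```python
def packetize_with_feedback(encode, frame, params, limit=1184, max_iters=4):
    """System-730-style loop: encode, let the packetizer check the size,
    and re-encode with adjusted parameters until the frame fits."""
    for iteration in range(1, max_iters + 1):
        bits = encode(frame, params)
        if len(bits) <= limit:
            return bits, iteration
        # Feedback: output too large, so quantize more coarsely next try.
        params = {**params, "qp": min(params["qp"] + 4, 51)}
    return None, max_iters  # could not fit within the iteration budget
```

Returning the iteration count matters: the document later suggests tracking how many feedback iterations are needed in order to tune the encoder's bit granularity per epoch.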
- Offline machine learning, described elsewhere in the present document, could be used to establish a relationship between target compressed video packet sizes and video encoding settings.
- The system 760 depicts an alternative embodiment in which the encoder module may simultaneously produce output compressed video data at multiple encoding settings and store the results in a bank of buffers 740 .
- The packetizer module 736 may then simply pick a right-sized packet from the buffer bank.
- Optionally, the packetizer module 736 may provide feedback 738 to the encoder module 732 .
- The system 760 may use more computational power due to the multiple simultaneous encode operations, but it may produce target packets without having to go through a feedback loop that takes from the end-to-end delay budget in real-time communication scenarios such as video chat.
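The system-760 selection step, picking a right-sized packet from the bank of parallel encodings, can be sketched in a few lines (the selection policy of "largest that fits" is our assumption about what "right-sized" means):

```python
def pick_from_bank(encodings, limit=1184):
    """System-760-style selection: from simultaneously produced
    encodings of one frame, pick the largest one that still fits
    in a single network packet (best quality within the limit)."""
    fitting = [e for e in encodings if len(e) <= limit]
    return max(fitting, key=len, default=None)
```

This trades extra encode computation for zero feedback-loop latency, matching the trade-off described above.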
- During normal operation, the video encoding process monitors an ongoing video chat session and produces encoded video by operating the video encoder in a “normal” mode.
- Most commercially available or proprietary video encoders can operate in normal mode and produce satisfactory video bitstreams for a wide range of output bitrates using identical, or substantially identical, parameter settings.
- The resulting compressed video stream may use intra-encoded and predictively encoded (or bi-predictively encoded) frames.
- Some video encoders may not need any external control of parameters to produce video bitstreams from as low as 50 Kbps to as high as 6 Mbps.
- The video encoder may use certain network conditions as triggers to recognize that the video encoder settings should be changed to meet the changed network condition, for example, a change in the number of packets dropped (as reported by a receiver), a change in the round-trip delay, or a change in the network interface (e.g., moving from a Wi-Fi to a 4G network). These triggers may correspond to a low mark for one or more parameter settings of the video encoding. For example, while a video encoder may operate satisfactorily within a 50 Kbps to 6 Mbps range, when forced to produce a bitstream below 50 Kbps using the same video compression settings, video quality may dramatically deteriorate, and the video encoder may thus require a significant, or externally controlled, alteration of parameters to produce encoded video of satisfactory quality.
- One of the trigger events may trigger switching of the video encoding operation between a normal mode and a starve mode.
- The starve mode may include various combinations of the above-described scenarios 300 , 400 , 500 and 600 .
- Multiple trigger events may be used to select one of multiple available starve modes. For example, one starve mode may correspond to reduced-frame-rate operation, another to reduced-image-resolution operation, another to intra-frame-only encoding, while yet another may include more than one of these options used in a pre-determined combination, e.g., as described in FIG. 3 to FIG. 6 .
- machine learning may be used to train a video encoder system to make better decisions about which starve mode to use and what encoding settings to use, based on the knowledge of network conditions.
- machine learning may be achieved using online techniques, e.g., continuous quality monitoring using automatic or human-feedback-based video quality measurements, or using offline techniques such as controlled experiments with test sequences.
- a video encoder typically exposes many encoding parameters whose values can influence the number of compressed bits produced. Some of these parameters may apply to the entire frame, while others may apply only to portions of the frame (e.g., on a slice or macroblock basis). The availability of a large number of such parameters can be advantageously used in various embodiments of starve mode operation of a video encoder.
- some well-known encoding standards, such as H.264, use a single parameter, called the quantization parameter (Qp), to control the level of quantization performed while encoding a video frame.
- the H.264 Qp value is permitted to be between 0 and 51, and thus, using Qp alone, approximately 52 different sizes of the resulting compressed video frame are possible. This granularity may be sufficient in many cases.
- the granularity may be increased by using finer control on bit allocation and may include changing parameters such as the rate control algorithm used, whether or not intra-motion vectors are used, whether or not certain encoding features such as arithmetic encoding are used, whether or not motion vector predictor (MVP) based encoding is used, and so on.
- the level of bit granularity used by the encoder may itself be one of the video encoding parameters selected on a per-video-epoch basis, based on the video encoder keeping track of how many iterations have to be performed in the feedback system depicted in FIG. 7 before the encoder generates a packet that has the exact target network packet size.
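The feedback iteration count mentioned above can be sketched as a loop that retries at progressively coarser settings until the frame fits. Here `frame_size_at_qp` is a hypothetical stand-in for invoking a real encoder at a given Qp:

```python
def encode_to_fit(frame_size_at_qp, target_bytes: int,
                  qp_min: int = 0, qp_max: int = 51):
    """Walk Qp from finest to coarsest until the encoded frame fits the
    target packet size; return (qp, size, iterations). `frame_size_at_qp`
    stands in for re-running a real encoder at the given Qp."""
    iterations = 0
    for qp in range(qp_min, qp_max + 1):
        iterations += 1
        size = frame_size_at_qp(qp)
        if size <= target_bytes:
            return qp, size, iterations
    raise ValueError("frame cannot fit the target size even at maximum Qp")
```

A high iteration count suggests the encoder should switch to a finer bit-granularity control for the next epoch, which is the selection criterion the text describes.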
- FIG. 8 is a flowchart illustrating an example method 800 for operating a video encoder.
- the method 800 includes, at 802 , monitoring a network condition, wherein the monitoring includes tracking the status of at least some compressed video packets generated by the video encoder and transferred to the data communication network.
- the method 800 includes, at 804 , deciding, based on the monitoring, to change the operation of the video encoder to a starve mode in which a constrained mode of encoding is used for generating compressed video packets.
- the method 800 includes, at 806 , operating, in the starve mode, the video encoder to produce intra-only compressed video frames at an output of the video encoder.
- the duration for which intra-only encoding is performed may be proportional to a network condition, such as the rate at which the available network bandwidth dropped. For example, in some cases, the available network bandwidth may have fallen at a rapid rate that exceeds a threshold (as may be the case when a user device moves from Wi-Fi coverage to 4G or LTE coverage). In such cases, the video encoder may decide to operate in the starve mode longer than in cases where the network bandwidth has reduced slowly.
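A sketch of making the starve-mode duration depend on how fast bandwidth dropped; all constants here are illustrative assumptions:

```python
def starve_duration_ms(bw_drop_rate_kbps_per_s: float,
                       base_ms: int = 500,
                       rapid_threshold: float = 1000.0,
                       extension_ms: int = 1500) -> int:
    """Stay in starve mode longer after a rapid bandwidth collapse (e.g.,
    a Wi-Fi to cellular handover) than after a slow erosion."""
    if bw_drop_rate_kbps_per_s > rapid_threshold:
        return base_ms + extension_ms
    return base_ms
```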
- the method 800 includes, at 808 , selecting an encoding parameter for the intra-only compressed video frames such that each resulting intra-encoded video frame fits within a single packet of transmission at the application layer.
- An application layer packet may conform to a pre-defined syntax such as RTP or similar, and may represent a unit of data transfer.
- the user device may map each application layer packet to its own corresponding network layer packet (e.g., an internet protocol (IP) packet) to improve predictability and QoS of the video encoding and network transmission process. For example, such one-to-one mapping makes it easier to count the number of video packets being dropped or successfully delivered, simply by counting the corresponding IP packets.
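Under this one-to-one mapping, loss accounting reduces to set arithmetic on packet sequence numbers; a minimal sketch:

```python
def count_dropped(sent_seqs, received_seqs) -> int:
    """With each application-layer packet mapped to its own IP packet,
    every sequence number sent but not reported as received corresponds
    to exactly one dropped video packet."""
    return len(set(sent_seqs) - set(received_seqs))
```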
- implementations may be kept computationally simple by using a single parameter that can uniformly impact the entirety of a video frame.
- Qp, for example, is used in the denominator to reduce the coefficient values of each macroblock across an entire frame, and thus has the same, or uniform, effect on the entirety of the frame. For example, increasing the Qp value will result in fewer (or, in some cases, an equal number of) bits being produced after quantization of each and every macroblock of a video frame.
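A toy illustration of this uniform effect, assuming a simple scalar quantizer (real H.264 quantization involves per-coefficient scaling, but the monotone trend, coarser step means fewer surviving coefficients, is the same):

```python
def quantize(coeffs, qstep):
    """Divide every transform coefficient of a macroblock by the same
    step size (truncating toward zero), mimicking uniform quantization."""
    return [int(c / qstep) for c in coeffs]

def significant_coeffs(coeffs, qstep):
    """Count coefficients that survive quantization; fewer survivors
    means fewer bits out of the entropy coder for every macroblock."""
    return sum(1 for q in quantize(coeffs, qstep) if q != 0)
```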
- the method 800 may include simultaneously producing candidate video-encoded bits at two or more different encoding parameter settings to decide which of those settings result in packets that conform to the single application-layer packet restriction.
- FIG. 7 illustrates the use of multiple buffers to simultaneously produce encoded packets of different sizes.
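Choosing among simultaneously produced candidates can be sketched as picking the largest encoding that still fits the packet budget; the Qp-to-size mapping in the test is invented for illustration:

```python
def pick_candidate(candidate_sizes: dict, max_bytes: int):
    """`candidate_sizes` maps an encoding setting (here, a Qp value) to
    the byte size of the packet that setting produced in its own buffer.
    Return the setting whose packet is the largest one still within the
    budget, i.e. the best quality meeting the single-packet limit, or
    None when no candidate fits."""
    fitting = {s: n for s, n in candidate_sizes.items() if n <= max_bytes}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)
```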
- the method 800 includes, at 810 , transferring, selectively based on the network condition, the compressed video frames to the data communication network.
- the compressed video frames may be stored in an output buffer of the network interface and selectively transferred to the network as bandwidth becomes available, e.g., when the user device gains transmission opportunities on the transmission medium.
- the operation of a video encoder may be constrained in one or more of the following ways: changing at least one of the frame resolution or the frame rate of the video, or constraining each encoded video frame to be an intra-frame whose packet size is as close to, but no greater than, the network packet size.
- for example, GoRTP, a public domain implementation of the RTP protocol stack, constrains RTP packets to a 1200-byte size when sending them out over the network.
- Some embodiments that use GoRTP may thus constrain each output intra-encoded video frame to fit within the 1200-byte packet. It will be appreciated that, due to the presence of headers and optional error checksums, the actual video data may occupy fewer than 1200 bytes.
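The usable video budget under such a constraint can be computed by subtracting header overhead. The 12-byte figure below is the fixed RTP header size from RFC 3550; the error-check allowance is an assumption:

```python
def video_payload_budget(packet_size: int = 1200, rtp_header: int = 12,
                         error_check_bytes: int = 0) -> int:
    """Bytes left for compressed video after the fixed 12-byte RTP header
    (per RFC 3550) and any optional error-check bytes are carved out of
    the 1200-byte GoRTP packet limit."""
    return packet_size - rtp_header - error_check_bytes
```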
- a pre-determined schedule, e.g., a look-up table, may be used for deciding, for a given trigger point, the corresponding resolution or frame rate to be used.
- a factor-of-two reduction may be applied in both the horizontal and vertical dimensions due to the simplicity of its implementation.
- frame rate reduction may be achieved by dropping a pre-determined sequence of frames.
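A pre-determined schedule and frame-drop pattern along these lines might look like the following; the severity names, divisors, and keep/drop patterns are invented for illustration:

```python
# Illustrative pre-determined schedule: trigger severity -> downsampling
# divisor and a repeating keep/drop pattern. All values are assumptions.
SCHEDULE = {
    "mild":   {"resolution_divisor": 1, "keep_pattern": [1, 1, 1, 0]},  # drop 1 frame in 4
    "medium": {"resolution_divisor": 2, "keep_pattern": [1, 0]},        # halve the frame rate
    "severe": {"resolution_divisor": 2, "keep_pattern": [1, 0, 0, 0]},  # quarter the frame rate
}

def keep_frame(severity: str, frame_index: int) -> bool:
    """Apply the pre-determined frame-dropping sequence for a severity level."""
    pattern = SCHEDULE[severity]["keep_pattern"]
    return bool(pattern[frame_index % len(pattern)])
```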
- FIG. 9 is a block diagram of a computer system as may be used to implement features of some of the embodiments, for example, master nodes or worker nodes, as described herein.
- the computing system 900 may include one or more central processing units (“processors”) 905 , memory 910 , input/output devices 925 (e.g., keyboard and pointing devices, and display devices), storage devices 920 (e.g., disk drives), and network adapters 930 (e.g., network interfaces) that are connected to an interconnect 915 .
- the interconnect 915 is illustrated as an abstraction that represents any one or more separate physical buses, point-to-point connections, or both, connected by appropriate bridges, adapters, or controllers.
- the interconnect 915 may include, for example, a system bus, a Peripheral Component Interconnect (PCI) bus or PCI-Express bus, a HyperTransport or industry standard architecture (ISA) bus, a small computer system interface (SCSI) bus, a universal serial bus (USB), IIC (I2C) bus, or an Institute of Electrical and Electronics Engineers (IEEE) standard 1394 bus, also called Firewire.
- the memory 910 and storage devices 920 are computer-readable storage media that may store instructions that implement at least portions of the various embodiments.
- the data structures and message structures may be stored or transmitted via a data transmission medium, for example, a signal on a communications link.
- Various communications links may be used, including the Internet, a local area network, a wide area network, or a point-to-point dial-up connection.
- computer-readable media can include computer-readable storage media (e.g., “non-transitory” media) and computer-readable transmission media.
- the instructions stored in memory 910 can be implemented as software and/or firmware to program the processor(s) 905 to carry out actions described above.
- such software or firmware may be initially provided to the computing system 900 via download from a remote system to the computing system 900 (e.g., via network adapter 930 ).
- a user device may include at least a memory, a processor, and a network interface.
- the memory may store instructions that, when executed by the processor, cause the processor to transmit encoded video over the network interface.
- the instructions may include instructions for performing a video compression operation and producing encoded video frames, instructions for transferring the encoded video frames over the network interface at a present output frame rate, instructions for monitoring the present output frame rate for deviation from a target output frame rate, and instructions for selectively adjusting the quality of future encoded video frames when the present output frame rate deviates from the target output frame rate.
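The monitor-and-adjust loop in those instructions can be sketched as follows; the step sizes and the tolerance band are assumptions, with Qp used as the quality knob:

```python
def adjust_qp(present_fps: float, target_fps: float, qp: int,
              tolerance: float = 0.1) -> int:
    """When the achieved output frame rate falls below the target (frames
    are too expensive for the available bandwidth), coarsen quantization
    to cheapen future frames; when comfortably above target, cautiously
    recover quality. Qp is clamped to the H.264 range 0..51."""
    if present_fps < target_fps * (1 - tolerance):
        return min(qp + 2, 51)
    if present_fps > target_fps * (1 + tolerance):
        return max(qp - 1, 0)
    return qp
```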
- the techniques described herein may be implemented by programmable circuitry, e.g., one or more microprocessors, programmed with software and/or firmware, or entirely by special-purpose hardwired circuitry, which may be in the form of, for example, one or more ASICs, PLDs, FPGAs, etc.
- FIG. 10 is a flowchart illustrating a method 1000 of controlling the operation of a video encoder.
- the method 1000 includes, at 1002 , monitoring the condition of a network for a trigger point for switching a mode of video encoding operation to sustain an ongoing video communication due to changes in the network condition.
- the method 1000 includes, at 1004 , deciding, upon detecting that a trigger point has been reached and based on an identity of the trigger point, to operate a video encoder in a corresponding starve mode by modifying at least one parameter of video encoding.
- the video encoder is controlled to produce, for each compressed video frame, a number of bits that fit within exactly one network packet of a pre-determined size.
- the video encoder is controlled to produce intra-only encoded video frames.
- the method 1000 includes, at 1006 , transferring, selectively based on the condition of the network, compressed video frames over the network. For example, in some operational scenarios, due to the delay in the generation of encoded video frames, by the time a packet is ready to be sent out on the network, a short-term unavailability of network bandwidth may require either delaying or entirely skipping transmission of a network packet. As previously described, each network packet may be produced to occupy a number of bits as close to, but less than, a target network packet size such that a single network packet, e.g., an IP packet, carries all information that a receiver needs to uncompress and display a single video frame.
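The selective-transfer decision described above might be sketched as follows, assuming a fixed freshness deadline and a byte-budget view of short-term bandwidth availability (both are illustrative assumptions):

```python
def transmit_action(available_bytes: int, packet_bytes: int,
                    frame_age_ms: int, max_age_ms: int = 150) -> str:
    """Decide whether to send the packet now, delay it until bandwidth
    frees up, or skip it entirely because the frame is already stale."""
    if frame_age_ms > max_age_ms:
        return "skip"        # too late to be useful at the receiver
    if packet_bytes <= available_bytes:
        return "send"        # fits the short-term bandwidth budget
    return "delay"           # wait for the next transmission opportunity
```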
- the encoder in starve mode may also perform image resolution reduction and/or frame rate reduction to provide high quality of user experience even when network bandwidth availability is reduced.
- FIG. 7 shows an example of the network packet, which may include a header field, a payload field carrying video data, and an error correction code field.
- the network conditions may be reported to the video encoder via feedback from the network, e.g., RTCP packets as specified in the RTP protocol, which is a well-known industry standard.
- an apparatus for performing real-time video communication includes an encoder module that produces one or more compressed video representations of a video frame, one or more buffers that store the one or more compressed video representations, and a packetizer module that checks the sizes of the one or more compressed video representations and provides feedback to the encoder module about altering a parameter for producing the one or more compressed video representations.
- the encoder module is operable in at least two modes of operation including a normal mode in which the encoder module produces the one or more compressed video representations by refraining from altering the parameter based on the feedback, and a starve mode in which the encoder module produces the one or more compressed video representations by performing intra-only encoding of the video frame and further based on the feedback received from the packetizer module.
- the encoder module includes an image resolution filter module that operates to downsample the video frame prior to compression, based on the feedback.
- the encoder module includes a look-up table that controls the frame rate used for encoding the output compressed video based on the feedback.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/394,699 US10785279B2 (en) | 2016-12-29 | 2016-12-29 | Video encoding using starve mode |
US16/998,654 US11190570B2 (en) | 2016-12-29 | 2020-08-20 | Video encoding using starve mode |
Publications (2)
Publication Number | Publication Date |
---|---|
US20180192061A1 US20180192061A1 (en) | 2018-07-05 |
US10785279B2 true US10785279B2 (en) | 2020-09-22 |
Family
ID=62712126
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/394,699 Active 2038-08-11 US10785279B2 (en) | 2016-12-29 | 2016-12-29 | Video encoding using starve mode |
US16/998,654 Active US11190570B2 (en) | 2016-12-29 | 2020-08-20 | Video encoding using starve mode |
Country Status (1)
Country | Link |
---|---|
US (2) | US10785279B2 (en) |
Also Published As
Publication number | Publication date |
---|---|
US20200382575A1 (en) | 2020-12-03 |
US11190570B2 (en) | 2021-11-30 |
US20180192061A1 (en) | 2018-07-05 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| AS | Assignment | Owner name: FACEBOOK, INC., CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: HE, YAMING; ZUO, ZHENGPING; SIGNING DATES FROM 20170208 TO 20170209; REEL/FRAME: 041236/0039 |
| STPP | Information on status: patent application and granting procedure in general | DOCKETED NEW CASE - READY FOR EXAMINATION |
| STPP | Information on status: patent application and granting procedure in general | NON FINAL ACTION MAILED |
| STPP | Information on status: patent application and granting procedure in general | RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| STPP | Information on status: patent application and granting procedure in general | NON FINAL ACTION MAILED |
| STPP | Information on status: patent application and granting procedure in general | FINAL REJECTION MAILED |
| STPP | Information on status: patent application and granting procedure in general | DOCKETED NEW CASE - READY FOR EXAMINATION |
| STPP | Information on status: patent application and granting procedure in general | NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
| STPP | Information on status: patent application and granting procedure in general | PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
| STCF | Information on status: patent grant | PATENTED CASE |
| AS | Assignment | Owner name: META PLATFORMS, INC., CALIFORNIA. Free format text: CHANGE OF NAME; ASSIGNOR: FACEBOOK, INC.; REEL/FRAME: 058294/0215. Effective date: 20211028 |
| MAFP | Maintenance fee payment | PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Year of fee payment: 4 |