WO2021072694A1 - Adaptive resolution coding based on machine learning model - Google Patents

Adaptive resolution coding based on machine learning model

Info

Publication number
WO2021072694A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
bit rate
frame
video frame
resizing
Prior art date
Application number
PCT/CN2019/111598
Other languages
French (fr)
Inventor
Ran Wang
Yuchen SUN
Tsuishan CHANG
Changguo CHEN
Jian Lou
Original Assignee
Alibaba Group Holding Limited
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Limited
Priority to PCT/CN2019/111598
Publication of WO2021072694A1


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146Data rate or code amount at the encoder output
    • H04N19/15Data rate or code amount at the encoder output by monitoring actual compressed data size at the memory before deciding storage at the transmission buffer
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/132Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/59Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution

Definitions

  • Video streaming and downloading/uploading are very common in people’s daily lives nowadays.
  • a user may send or upload a video file from one device to another device such as a server or a computing device of another user.
  • videos of a higher quality or resolution such as high definition videos, ultra-high definition videos, etc.
  • These high-quality or high-resolution videos usually have large file sizes, which may be of several hundred megabytes to several gigabytes, etc.
  • These high-quality or high-resolution videos not only require a long period of time for uploading and transmitting over a communication network, but also incur a huge amount of traffic on the network, thus having a high transmission cost in terms of time and network bandwidth.
  • At least one video frame and a corresponding bit rate of the at least one video frame of a video may be determined.
  • the video may be a streaming video or a subset (such as a segment) of a stored video.
  • the at least one video frame and the corresponding bit rate may be inputted into a machine learning model to obtain a recommended resolution.
  • the at least one video frame and one or more other video frames associated with the at least one video frame may then be resized or resampled (e.g., downsampled) according to the recommended resolution.
  • the at least one video frame and the one or more other video frames may be encoded to obtain an encoded video according to a target bit rate.
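The five steps above can be sketched as a minimal pipeline. This is a hypothetical skeleton, not the patent's implementation: `model`, `resize`, and `encode` are placeholder callables standing in for a trained machine learning model, a resampling filter, and a rate-controlled encoder.

```python
def adaptive_resolution_encode(frames, target_bitrate, model, resize, encode):
    """Sketch of the described flow: pick at least one video frame,
    ask the model for a recommended resolution given the frame and
    bit rate, resize all frames, then encode at the target bit rate."""
    key_frame = frames[0]                       # at least one video frame
    rec_res = model(key_frame, target_bitrate)  # recommended resolution
    resized = [resize(f, rec_res) for f in frames]
    return encode(resized, target_bitrate)
```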
  • FIG. 1 illustrates an example relationship between bit rates and qualities of an example video.
  • FIG. 2 illustrates an example environment in which an adaptive resolution coding system may be used.
  • FIG. 3 illustrates an example adaptive resolution coding system in more detail.
  • FIG. 4 illustrates an example method of adaptive resolution coding.
  • FIG. 5 illustrates another example method of adaptive resolution coding.
  • the adaptive resolution coding system may first downsample the given video by a certain sampling ratio, and then encode the downsampled video at the target bit rate.
  • the adaptive resolution coding system may then transmit the encoded video to another device over a communication network, so that the other device can restore the given video (e.g., restore an original resolution of the given video) by decoding and upsampling the encoded video.
  • the quality of the video that is restored at a receiving end depends on an amount of downsampling (or a downsampling ratio) that is performed at a sending end.
  • FIG. 1 shows an example relationship 100 between bit rates and qualities of an example video that is restored after successive operations of downsampling, encoding, decoding, and upsampling.
  • the adaptive resolution coding system may employ a machine learning model to determine an optimal resolution or downsampling ratio for resizing an input video of an input resolution before encoding and transmitting the video at or around a specific bit rate to another device over a communication network.
  • the machine learning model may be trained using a training sample set of different videos having a particular resolution or different resolutions and respective known values of optimal downsampling ratios that produce the best qualities for the different videos. After values of parameters (such as weights) of the machine learning model are determined, the adaptive resolution coding system may apply the machine learning model to determine a recommended downsampling ratio or resolution for an input video.
  • the machine learning model may include, but is not limited to, a neural network model such as a convolutional neural network (CNN) , a Bayesian network, a decision tree, etc.
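As a rough illustration of what the trained model's input/output contract looks like, the toy scorer below maps a frame and a bit rate to one of a hypothetical candidate set of resolutions. The candidate list, the bits-per-pixel heuristic, and the complexity feature are all invented for this sketch; the patent instead trains a model (e.g., a CNN) on videos with known optimal downsampling ratios.

```python
import numpy as np

# Hypothetical candidate set; the text does not enumerate resolutions.
CANDIDATES = [(1920, 1080), (1280, 720), (960, 540), (640, 360)]

def recommend_resolution(frame, bitrate, min_bpp=0.05):
    """Toy stand-in for the trained model: choose the largest candidate
    resolution that still gets enough bits per pixel at the given bit
    rate, demanding more bits for busier (higher-gradient) frames."""
    complexity = float(np.mean(np.abs(np.diff(frame.astype(np.float64), axis=1))))
    needed_bpp = min_bpp * (1.0 + complexity / 255.0)
    for (w, h) in CANDIDATES:                 # largest first
        if bitrate / (w * h) >= needed_bpp:
            return (w, h)
    return CANDIDATES[-1]                     # fall back to the smallest
```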
  • the described adaptive resolution coding system may receive an input video having an input resolution and an instruction to transmit the input video at a certain bit rate (or a target bit rate) .
  • the adaptive resolution coding system may obtain one or more frames (such as intra frames) and respective one or more bit rates from the input video.
  • the adaptive resolution coding system may attempt to encode the input video at the target bit rate, and obtain one or more intra frames and respective one or more bit rates after encoding.
  • the adaptive resolution coding system may then input the one or more frames and the respective one or more bit rates into a trained machine learning model to obtain a recommended resolution or sampling ratio for resizing (e.g., downsampling) the input video.
  • the adaptive resolution coding system may resize the input video from the input resolution to the recommended resolution, and encode the resized input video according to a target bit rate for transmission over a communication network, thus reducing the transmission cost of the video while ensuring a high quality of the video after restoration (i.e., decoding and upsampling, for example) .
  • the input video may include, but is not limited to, some or all of a stored video, or some or all of a streaming video, etc.
  • functions described herein to be performed by the adaptive resolution coding system may be performed by multiple separate units or services.
  • a receiving service may receive an input video and an instruction including a target bit rate, while an acquisition service may obtain one or more frames and respective one or more bit rates from the input video.
  • a determination service may obtain a recommended resolution or sampling ratio for resizing (e.g., downsampling) the input video based on a machine learning model.
  • an encoding service may encode the resized input video according to a target bit rate, while a transmission service may transmit the encoded video to another device over a communication network.
  • the adaptive resolution coding system may be implemented as software and/or hardware installed in a single device, in other examples, the adaptive resolution coding system may be implemented and distributed in multiple devices or as services provided in one or more servers over a network and/or in a cloud computing architecture.
  • the application describes multiple and varied embodiments and implementations.
  • the following section describes an example framework that is suitable for practicing various implementations.
  • the application describes example systems, devices, and processes for implementing an adaptive resolution coding system.
  • FIG. 2 illustrates an example environment 200 usable to implement an adaptive resolution coding system.
  • the environment 200 may include an adaptive resolution coding system 202.
  • the adaptive resolution coding system 202 is described to be included in a client device 204.
  • the adaptive resolution coding system 202 may exist as an individual entity or device.
  • the environment 200 may further include another client device 206 and a server 208.
  • the adaptive resolution coding system 202 or the client device 204 may communicate data with the other client device 206 and the server 208 over a network 210.
  • the server 208 may be a server of a plurality of servers in a cloud or a data center.
  • functions of the adaptive resolution coding system 202 may be included in or provided by the client device 204. In implementations, some or all of the functions of the adaptive resolution coding system 202 may be included in a cloud computing system or architecture, and may be provided as services to the client device 204.
  • the client device 204 or the client device 206 may be implemented as any of a variety of computing devices including, but not limited to, a desktop computer, a notebook or portable computer, a handheld device, a netbook, an Internet appliance, a tablet or slate computer, a mobile device (e.g., a mobile phone, a personal digital assistant, a smart phone, etc. ) , a server computer, etc., or a combination thereof.
  • the network 210 may be a wireless or a wired network, or a combination thereof.
  • the network 210 may be a collection of individual networks interconnected with each other and functioning as a single large network (e.g., the Internet or an intranet) . Examples of such individual networks include, but are not limited to, telephone networks, cable networks, Local Area Networks (LANs) , Wide Area Networks (WANs) , and Metropolitan Area Networks (MANs) . Further, the individual networks may be wireless or wired networks, or a combination thereof.
  • Wired networks may include an electrical carrier connection (such as a communication cable, etc.) and/or an optical carrier connection (such as an optical fiber connection, etc.).
  • Wireless networks may include, for example, a WiFi network, other radio frequency networks (e.g., Zigbee, etc. ) , etc.
  • the adaptive resolution coding system 202 may receive an instruction to transmit an input video at a target bit rate.
  • the adaptive resolution coding system 202 may determine a recommended resolution or downsampling ratio for the input video based on a machine learning model, downsample the input video according to the recommended resolution or downsampling ratio, and encode the input video to obtain an encoded video for storage or transmission by the client device 204 or the adaptive resolution coding system 202.
  • FIG. 3 illustrates the adaptive resolution coding system 202 in more detail.
  • the adaptive resolution coding system 202 may include, but is not limited to, one or more processors 302, memory 304, and program data 306.
  • the adaptive resolution coding system 202 may further include one or more encoders 308, an input/output (I/O) interface 310, and/or a network interface 312.
  • some or all of the functions of the adaptive resolution coding system 202 may be implemented using hardware, for example, an ASIC (i.e., Application-Specific Integrated Circuit), an FPGA (i.e., Field-Programmable Gate Array), and/or other hardware.
  • the one or more encoders 308 of the adaptive resolution coding system 202 may be implemented using an ASIC, an FPGA, and/or any other hardware.
  • the one or more processors 302 are configured to execute instructions that are stored in the memory 304, and/or received from the input/output interface 310, and/or the network interface 312.
  • the one or more processors 302 may be implemented as one or more hardware processors including, for example, a microprocessor, an application-specific instruction-set processor, a physics processing unit (PPU) , a central processing unit (CPU) , a graphics processing unit, a digital signal processor, etc. Additionally or alternatively, the functionality described herein can be performed, at least in part, by one or more hardware logic components.
  • Illustrative types of hardware logic components that can be used include field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), system-on-a-chip systems (SOCs), and complex programmable logic devices (CPLDs).
  • the memory 304 may include processor-readable media in a form of volatile memory, such as Random Access Memory (RAM) and/or non-volatile memory, such as read only memory (ROM) or flash RAM.
  • the processor-readable media may include a volatile or non-volatile type, a removable or non-removable media, which may achieve storage of information using any method or technology.
  • the information may include a processor-readable instruction, a data structure, a program module or other data.
  • Examples of processor-readable media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electronically erasable programmable read-only memory (EEPROM), quick flash memory or other internal storage technology, compact disk read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassette tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission media, which may be used to store information that may be accessed by a computing device.
  • the processor-readable media does not include any transitory media, such as modulated data signals and carrier waves.
  • the adaptive resolution coding system 202 may further include other hardware components and/or other software components such as program units to execute instructions stored in the memory 304 for performing various operations such as processing, determination, allocation, storage, etc.
  • the adaptive resolution coding system 202 may further include a model database 314 that is configured to store information of one or more trained machine learning models used for determining recommended resolutions for videos of different input resolutions.
  • FIGS. 4 and 5 show schematic diagrams depicting example methods of adaptive resolution coding.
  • the methods of FIGS. 4 and 5 may, but need not, be implemented in the environment of FIG. 2 and using the system of FIG. 3.
  • methods 400 and 500 are described with reference to FIGS. 1-3. However, the methods 400 and 500 may alternatively be implemented in other environments and/or using other systems.
  • the methods 400 and 500 are described in the general context of computer-executable instructions.
  • computer-executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, and the like that perform particular functions or implement particular abstract data types.
  • each of the example methods is illustrated as a collection of blocks in a logical flow graph representing a sequence of operations that can be implemented in hardware, software, firmware, or a combination thereof.
  • the order in which the method is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method, or alternate methods. Additionally, individual blocks may be omitted from the method without departing from the spirit and scope of the subject matter described herein.
  • the blocks represent computer instructions that, when executed by one or more processors, perform the recited operations.
  • some or all of the blocks may represent application specific integrated circuits (ASICs) or other physical components that perform the recited operations.
  • the adaptive resolution coding system 202 may receive an input video and information of a target bit rate.
  • the adaptive resolution coding system 202 may receive an instruction to encode an input video of a certain resolution (which is referred to as the input resolution) at a target bit rate (or a target bandwidth) from a user of the client device 204.
  • the input video may include, but is not limited to, a stored video or a streaming video.
  • the input video may be a subset (e.g., a segment) of a stored video or streaming video.
  • the adaptive resolution coding system 202 may determine or obtain image information of at least one video frame and a bit rate of the at least one video frame from the input video.
  • the adaptive resolution coding system 202 may determine or obtain image information of at least one video frame and a bit rate of the at least one video frame from the input video, which can be used as an input to a trained machine learning model for determining or obtaining a recommended resolution (or a resampling ratio) .
  • the adaptive resolution coding system 202 may encode the input video of the input resolution at the target bit rate, and extract or obtain image information of at least one video frame and a bit rate of the at least one video frame from the encoded video.
  • encoding the input video of the input resolution at the target bit rate may include compressing a size of the input video, so that an average bit rate of transmitting the compressed video over a communication network (such as the network 210) is at or around the target bit rate.
  • the adaptive resolution coding system 202 may obtain at least one video frame and a bit rate thereof by calculating a prediction residual of the input video and estimating resulting bits for the at least one video frame (e.g., the first video frame of the input video) and the bit rate thereof based on the prediction residual. In this case, the adaptive resolution coding system 202 may not need to encode or compress the input video completely. In implementations, the adaptive resolution coding system 202 may randomly select a portion of the input video, encode or compress the selected portion of the input video, and extract or obtain image information of at least one video frame and a bit rate of the at least one video frame from the encoded or compressed portion of the input video.
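The residual-based shortcut above (estimating resulting bits without fully encoding) could look like the toy estimator below. The log term loosely mimics entropy coding, where larger residual magnitudes cost more bits; the `bits_scale` constant is a hypothetical calibration factor, not something the text specifies.

```python
import numpy as np

def estimate_frame_bits(residual, bits_scale=0.1):
    """Very rough toy estimate of a frame's coded size from its
    prediction residual: larger residual energy implies more bits.
    `bits_scale` is a hypothetical calibration constant."""
    r = np.abs(residual.astype(np.float64))
    return float(np.sum(np.log2(1.0 + r)) * bits_scale)
```

Dividing the estimated bits by the frame's display duration would then give an approximate bit rate for that frame.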
  • the at least one video frame may include, but is not limited to, an intra frame that is representative of the input video from among all intra frames of the input video, or an intra frame that is randomly selected from among all the intra frames of the input video.
  • the image information of the at least one video frame may include, but is not limited to, image data of the at least one video frame (such as pixel values at each coordinate in the at least one video frame) . Additionally or alternatively, the image information of the at least one video frame may include feature data of the at least one video frame.
  • the adaptive resolution coding system 202 may perform feature extraction or detection on the at least one video frame after obtaining the at least one video frame from the input video, and additionally or alternatively use feature data that is obtained from the feature extraction or detection as the image information of the at least one video frame.
  • the feature extraction or detection may include, but are not limited to, edge detection, corner detection, blob detection, curvature detection, shape-based detection, Hough transform, etc.
  • one or more types of feature extraction may be performed on the at least one video frame.
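As one concrete instance of the feature extraction listed above, edge strength can be computed with a gradient-magnitude detector. This is a minimal stand-in chosen for illustration; Sobel, Canny, or learned features are common alternatives in practice.

```python
import numpy as np

def edge_feature_map(frame):
    """Gradient-magnitude edge detector via central differences.
    Returns a per-pixel edge strength map for a grayscale frame."""
    f = frame.astype(np.float64)
    gx = np.zeros_like(f)
    gy = np.zeros_like(f)
    gx[:, 1:-1] = f[:, 2:] - f[:, :-2]   # horizontal gradient
    gy[1:-1, :] = f[2:, :] - f[:-2, :]   # vertical gradient
    return np.hypot(gx, gy)
```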
  • the intra frame that is representative of the input video may include, but is not limited to, a first intra frame of the input video, an intra frame having a bit rate that is a median of bit rates associated with intra frames of the input video, an intra frame having a bit rate that is closest to an average of the bit rates associated with the intra frames of the input video, etc.
  • the at least one intra frame may include one or more intra frames that are representative of the input video, and/or one or more intra frames that are randomly selected from among the intra frames of the input video.
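The three selection strategies just listed (first frame, median bit rate, closest to the average bit rate) can be sketched as follows; the function returns the index of the chosen intra frame, and the strategy names are labels invented for this sketch.

```python
def representative_intra_frame(bitrates, strategy="median"):
    """Pick the index of a representative intra frame from a list of
    per-intra-frame bit rates, per the strategies described above."""
    if strategy == "first":
        return 0
    if strategy == "median":
        order = sorted(range(len(bitrates)), key=lambda i: bitrates[i])
        return order[len(order) // 2]          # frame with the median bit rate
    if strategy == "mean":
        avg = sum(bitrates) / len(bitrates)    # frame closest to the average
        return min(range(len(bitrates)), key=lambda i: abs(bitrates[i] - avg))
    raise ValueError(strategy)
```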
  • the adaptive resolution coding system 202 may divide the input video into a plurality of video segments. For example, the adaptive resolution coding system 202 may divide the input video into a plurality of video segments having the same time length such as one second, two seconds, etc. In implementations, the adaptive resolution coding system 202 may divide the input video (e.g., a stored video) into a predetermined number of video segments.
  • the adaptive resolution coding system 202 may divide the input video into a plurality of video segments based on any scene change detection (SCD) method and/or shot transition detection (STD) method.
  • the plurality of video segments that are obtained by the adaptive resolution coding system 202 may have different lengths.
  • an amount of change between two video frames may be used for detecting a presence of a scene change. For example, if scenes between two video frames are different, a residual obtained after performing motion compensation between these two video frames is usually large, or a difference between pixel values of these two video frames is usually large.
  • predetermined thresholds may be set for the residual associated with motion compensation between two video frames and/or the difference between pixel values of the two video frames.
  • if such a threshold is exceeded, the adaptive resolution coding system 202 may determine that a scene change occurs between the two video frames.
  • the adaptive resolution coding system 202 may divide the input video into a plurality of video segments, with boundaries of a video segment of the plurality of video segments corresponding to positions of respective scene changes that are detected.
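The threshold-based segmentation described above can be sketched with a mean-absolute-pixel-difference test; the `threshold` default is a hypothetical value, and a production system would more likely use a motion-compensated residual, as the text notes.

```python
import numpy as np

def segment_boundaries(frames, threshold=20.0):
    """Detect scene changes in a sequence of grayscale frames: a mean
    absolute pixel difference above `threshold` marks a cut. Returns
    the indices at which new video segments start."""
    cuts = [0]
    for i in range(1, len(frames)):
        diff = np.mean(np.abs(frames[i].astype(np.float64)
                              - frames[i - 1].astype(np.float64)))
        if diff > threshold:
            cuts.append(i)          # segment boundary at a scene change
    return cuts
```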
  • the input video may be divided into a plurality of video segments, and the adaptive resolution coding system 202 may determine or obtain at least one video frame and a respective bit rate from each video segment of the plurality of video segments according to a similar approach as described above.
  • the adaptive resolution coding system 202 may input the image information of the at least one video frame and the bit rate into a trained machine learning model to determine or obtain a recommended resolution.
  • the adaptive resolution coding system 202 may be associated with one or more trained machine learning models that are configured to receive image information of one or more video frames and respective one or more bit rates as inputs, and produce a recommended resolution (or resampling ratio) as an output.
  • the one or more trained machine learning models may be able to process video frames of a particular resolution or different resolutions.
  • the adaptive resolution coding system 202 may have one or more trained machine learning models that are stored in the memory 304, e.g., stored in the model database 314. The adaptive resolution coding system 202 may select a trained machine learning model from the model database 314, and input the image information of the at least one video frame and the bit rate into the trained machine learning model to obtain a recommended resolution (or resampling ratio) .
  • one or more trained machine learning models may be stored in a remote device (for example, a server, a cloud, or a data center, etc. ) that is accessible to the adaptive resolution coding system 202 through a network, e.g., the network 210.
  • the adaptive resolution coding system 202 may send the image information of the at least one video frame and the bit rate to the remote device through the network 210 to request the remote device for determining a recommended resolution (or resampling ratio) , and receive the recommended resolution from the remote device after the remote device determines the recommended resolution (or resampling ratio) using a trained machine learning model therein.
  • the image information of the at least one video frame inputted into the machine learning model may vary.
  • the image information of the at least one video frame may include image data (e.g., pixel values) of the at least one video frame, because feature extraction or detection can be performed in the first few layers of the neural network model.
  • the image information of the at least one video frame may include feature data of the at least one video frame, such as information about a presence or absence of certain features (such as an edge, a corner, a shape, etc. ) at different positions or coordinates on the at least one video frame.
  • the adaptive resolution coding system 202 may perform such feature extraction or detection at block 404 as described above.
  • the input video may be divided into a plurality of video segments as described above.
  • the adaptive resolution coding system 202 may use at least one respective video frame and a respective bit rate of each video segment as an input to a trained machine learning model to obtain or determine a respective recommended resolution (or resampling ratio) .
  • the adaptive resolution coding system 202 may determine or calculate a resulting resolution (or resampling ratio) based on the respective recommended resolutions (or resampling ratios) of the plurality of video segments as a recommended resolution (or resampling ratio) for the input video.
  • the adaptive resolution coding system 202 may further determine a resolution (or resampling ratio) that is representative of the respective recommended resolutions (or resampling ratios) of the plurality of video segments as the recommended resolution (or resampling ratio) for the input video.
  • the resolution (or resampling ratio) that is representative of the respective recommended resolutions (or resampling ratios) of the plurality of video segments may include, but is not limited to, an average of the respective recommended resolutions (or resampling ratios) of the plurality of video segments, a median of the respective recommended resolutions (or resampling ratios) of the plurality of video segments, etc.
  • the adaptive resolution coding system 202 may randomly select one of the respective recommended resolutions (or resampling ratios) of the plurality of video segments as the recommended resolution (or resampling ratio) for the input video.
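The average/median aggregation of per-segment recommendations described above can be sketched as follows; aggregating widths and heights independently is a simplifying assumption of this sketch, since the text does not say how a representative resolution is computed componentwise.

```python
import statistics

def aggregate_recommendations(segment_resolutions, method="median"):
    """Combine per-segment recommended resolutions into one value for
    the whole input video, via the median or average described above."""
    widths = [w for (w, _) in segment_resolutions]
    heights = [h for (_, h) in segment_resolutions]
    agg = statistics.median if method == "median" else statistics.mean
    return (int(agg(widths)), int(agg(heights)))
```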
  • the adaptive resolution coding system 202 may resample or resize the at least one video frame and one or more other video frames associated with the at least one video frame based on the recommended resolution (or resampling ratio) .
  • the adaptive resolution coding system 202 may resize or resample (e.g., downsample) the at least one video frame and one or more other video frames associated with the at least one video frame from the input resolution to the recommended resolution (or by the recommended resampling ratio) .
  • the adaptive resolution coding system 202 may downsample the input video from the input resolution to the recommended resolution (or by the recommended resampling ratio) .
  • the at least one video frame may include an intra frame, and the one or more other video frames associated with the at least one video frame may include inter frames depending on the intra frame.
  • the input video may be divided into a plurality of video segments as described above, and the adaptive resolution coding system 202 may resize or resample (e.g., downsample) the input video, including the plurality of video segments, from the input resolution to the same recommended resolution (or by the same recommended resampling ratio).
  • the input video may be divided into a plurality of video segments, and the adaptive resolution coding system 202 may further divide the plurality of video segments into a plurality of video groups that may not overlap with each other.
  • Each video group may include one or more video segments.
  • the adaptive resolution coding system 202 may divide the plurality of video segments into a plurality of video groups based on a predetermined number of video segments and/or a predetermined time period.
  • the adaptive resolution coding system 202 may group video segments whose recommended resolutions (or resampling ratios) have been determined and which have not been resized or resampled as an individual video group.
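One of the grouping options above, dividing by a predetermined number of segments, can be sketched as follows. This is a minimal illustration; `group_segments` is a hypothetical helper and not part of the described system:

```python
def group_segments(segments, group_size):
    """Split a list of video segments into non-overlapping video groups
    of at most `group_size` segments each (hypothetical helper)."""
    return [segments[i:i + group_size]
            for i in range(0, len(segments), group_size)]
```

Grouping by a predetermined time period would work analogously, accumulating segments until their combined duration reaches the period.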
  • the adaptive resolution coding system 202 may resize or resample (e.g., downsample) a video group to a recommended resolution (or by a recommended resampling ratio) associated with that video group, encode the resized video group, and send the encoded video group to another device over the network 210, without waiting for other subsequent video groups, thus further speeding up a process of transmitting the input video from one device to another device.
  • a recommended resolution (or a recommended resampling ratio) associated with a video group may be determined as described above by selecting a resolution (or a resampling ratio) that is representative of resolutions (or resampling ratios) of video frames included in the video group, or by randomly selecting a resolution (or a resampling ratio) from among the resolutions (or the resampling ratios) of the video frames included in the video group.
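The two selection strategies just described (a representative resolution versus a randomly selected one) might be sketched as below. Using the statistical mode as the "representative" statistic is an assumption for illustration; the disclosure does not fix a particular statistic:

```python
from statistics import mode
import random

def recommended_resolution(frame_resolutions, strategy="representative"):
    """Pick one resolution to apply to a whole video group.
    `frame_resolutions` is, e.g., a list of vertical resolutions
    of the video frames included in the group."""
    if strategy == "representative":
        # Most common resolution among the frames (assumed statistic).
        return mode(frame_resolutions)
    # Random selection from among the frames' resolutions, as also described.
    return random.choice(frame_resolutions)
```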
  • the adaptive resolution coding system 202 may resize or resample the input video (or video segment or video group) using a predetermined resizing filter or resampling filter such as a downsampling filter.
  • the predetermined resizing filter or resampling filter may include, but is not limited to, a downsampling filter such as a bi-linear filter, an averaging filter, a Lanczos filter, a convolutional filter, etc.
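As one concrete illustration of the filter types listed above, a 2x2 averaging downsampling filter can be sketched in a few lines of Python. A production encoder would use an optimized implementation; this sketch assumes a grayscale frame with even dimensions:

```python
def downsample_2x_average(frame):
    """Downsample a 2-D list of pixel values by 2 in each dimension
    using a 2x2 averaging filter: each output pixel is the mean of a
    2x2 block of input pixels."""
    h, w = len(frame), len(frame[0])
    return [
        [(frame[y][x] + frame[y][x + 1]
          + frame[y + 1][x] + frame[y + 1][x + 1]) / 4
         for x in range(0, w, 2)]
        for y in range(0, h, 2)
    ]
```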
  • the adaptive resolution coding system 202 may encode or compress the resized video according to the target bit rate.
  • the adaptive resolution coding system 202 may encode or compress the resized video using an encoder (e.g., one of the encoders 308) at the target bit rate.
  • the adaptive resolution coding system 202 or the encoder 308 may encode the resized video into an MPEG-4 format, an H.264 format, or any format that is supported by the encoder 308 and/or agreed upon between the adaptive resolution coding system 202 and the other device (i.e., the client device 206) .
  • the adaptive resolution coding system 202 or the encoder 308 may encode a video group according to the target bit rate to produce an encoded video group without waiting for other subsequent video groups.
  • the adaptive resolution coding system 202 may transmit the encoded video to another device over a network.
  • the adaptive resolution coding system 202 may transmit the encoded video to another device (such as the client device 206) over a network, e.g., the network 210.
  • the adaptive resolution coding system 202 may send an encoded video group to the other device over the network, without waiting for other subsequent encoded video groups. This further improves the speed of video transmission, without needing to wait for encoding of the entire video to complete, which could take tens of seconds or minutes.
  • the adaptive resolution coding system 202 may further send information of the plurality of video groups to the other device, so that the other device can recover the input video from the plurality of video groups.
  • the adaptive resolution coding system 202 may include or insert respective sequence numbers of the plurality of video groups of the input video in corresponding data headers of data packets including the plurality of video groups, and a data header of a data packet including the last video group includes a special label indicating that the video group included in this data packet is the last video group of the input video. The other device can then recover the input video based on the sequence numbers included in the data headers of the data packets that are received.
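The data header described above might be sketched as follows. The exact field widths and layout are assumptions for illustration only; the disclosure does not fix a wire format:

```python
import struct

# Hypothetical 8-byte data header: 4-byte big-endian sequence number,
# 1-byte "last video group" flag, 3 reserved (pad) bytes.
HEADER_FMT = ">IB3x"

def pack_group_header(seq_num, is_last):
    """Build a data header carrying the video group's sequence number
    and a special label marking the last group of the input video."""
    return struct.pack(HEADER_FMT, seq_num, 1 if is_last else 0)

def unpack_group_header(header):
    """Recover the sequence number and last-group flag at the receiver."""
    seq_num, last_flag = struct.unpack(HEADER_FMT, header)
    return seq_num, bool(last_flag)
```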
  • an inclusion of a sequence number in a data header of a data packet including a video group as described above may or may not be used, depending on whether a strict in-order requirement (i.e., a requirement for a correct order of video groups to be displayed) is imposed at the other device.
  • the adaptive resolution coding system 202 may include or insert respective sequence numbers of the plurality of video groups of the input video in corresponding data headers of data packets including the plurality of video groups if the strict in-order requirement is imposed.
  • the adaptive resolution coding system 202 may further send additional information.
  • the additional information may include, but is not limited to, information of an original resolution of the input video (i.e., the input resolution) , information of the resampling filter (such as the downsampling filter) that is used for the resizing or resampling, etc.
  • the other device may restore the video to the original resolution by decoding and resizing (e.g., upsampling) using an opposite or conjugate filter (such as a corresponding upsampling filter) .
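A minimal sketch of the restoration step, assuming pixel replication as the upsampling operation. This is an illustrative stand-in for the "opposite or conjugate" filter mentioned above; a real decoder would use the interpolation filter matched to the encoder's downsampling filter:

```python
def upsample_2x_replicate(frame):
    """Restore spatial size by pixel replication: each input pixel
    becomes a 2x2 block of identical output pixels."""
    out = []
    for row in frame:
        wide = [p for p in row for _ in range(2)]  # repeat pixels horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat the row vertically
    return out
```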
  • the adaptive resolution coding system 202 may encode a certain video group of an input video at a target bit rate, while resizing or resampling one or more video segments that are located after the video group according to a recommended resolution (or resampling ratio) . Additionally or alternatively, the adaptive resolution coding system 202 may determine a recommended resolution (or resampling ratio) for at least one video frame of a video segment, while preliminarily encoding video segments that are located after the video segment to determine respective one or more video frames and bit rates as inputs to a trained machine learning model.
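The overlap described in this bullet, encoding one video group while resizing later ones, can be sketched as a two-stage pipeline. This is a minimal illustration, not the system's actual implementation; `resize` and `encode` are hypothetical stand-ins for the real stages:

```python
import queue
import threading

def pipeline(video_groups, resize, encode):
    """Run resizing and encoding concurrently: one worker resizes
    group n+1 while another encodes group n. Output order is preserved."""
    resized = queue.Queue()
    encoded = []

    def resizer():
        for g in video_groups:
            resized.put(resize(g))
        resized.put(None)                 # sentinel: no more groups

    def encoder():
        while (g := resized.get()) is not None:
            encoded.append(encode(g))

    t1 = threading.Thread(target=resizer)
    t2 = threading.Thread(target=encoder)
    t1.start(); t2.start(); t1.join(); t2.join()
    return encoded
```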
  • although the adaptive resolution coding system 202 is described as obtaining the bit rate of the at least one video frame and using the bit rate as one of the inputs to the machine learning model in the above blocks, in other instances, the adaptive resolution coding system 202 may obtain a size (e.g., an amount of bits) of the at least one video frame, and use the size of the at least one video frame as one of the inputs to the machine learning model instead.
  • the target bit rate is used as one of the inputs to the machine learning model, instead of the bit rate or the size of the at least one video frame.
  • the client device 206 may receive an encoded or compressed video.
  • the client device 206 may receive an encoded video from the adaptive resolution coding system 202 or another device such as the client device 204 over the network 210. In implementations, the client device 206 may further receive additional information, which may include, but is not limited to, information of an original or intended resolution to which the encoded video is to be resized or restored, information of a resizing or resampling filter that has been used in the encoded video, etc. In implementations, if the encoded video is a video group of an input video that is sent from the adaptive resolution coding system 202 or the client device 204, the additional information may further include a sequence number associated with the video group.
  • the client device 206 may decode the encoded or compressed video to obtain a decoded or decompressed video.
  • the client device 206 may decode the encoded video into a video format that is supported by the client device 206.
  • the encoded video may be a compressed video, and decoding the encoded video may include decompressing the compressed video.
  • Examples of the video format include, but are not limited to, an H.264 format, an MPEG-4 format, an AVI format, etc.
  • the client device 206 may resize the decoded video.
  • the client device 206 may resize the decoded video to the original resolution using an upsampling filter that is opposite or conjugate to the downsampling filter used in the encoded video.
  • the client device 206 may play or present the resized video to a user of the client device 206, and/or store the resized video in a memory of the client device 206.
  • the client device 206 may display or store the video group according to a correct order of the plurality of video groups. For example, the client device 206 may place the video group in a buffer of the client device 206, and arrange the video group in the right position among one or more video groups (of the plurality of video groups) that have been received by the client device 206 according to a sequence number associated with the video group. The client device 206 may display the video group after the video group (s) located prior thereto have been displayed.
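The buffering and reordering behavior just described can be sketched with a small reorder buffer keyed on sequence numbers. This is a hypothetical illustration of the client-side logic, not the actual implementation:

```python
import heapq

class ReorderBuffer:
    """Buffers video groups that may arrive out of order and releases
    them strictly by sequence number."""
    def __init__(self):
        self._heap = []
        self._next = 0   # sequence number expected next

    def push(self, seq_num, group):
        """Store an arriving group; return the list of groups that are
        now safe to display, in correct order (possibly empty)."""
        heapq.heappush(self._heap, (seq_num, group))
        ready = []
        while self._heap and self._heap[0][0] == self._next:
            ready.append(heapq.heappop(self._heap)[1])
            self._next += 1
        return ready
```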
  • if the video received by the client device 206 at block 502 is a video group of a plurality of video groups of a video intended to be stored in the client device 206, the client device 206 may place and arrange the video group in the buffer of the client device 206, and wait until all the video groups of the video are received to combine and store the video groups as a single video in the memory of the client device 206.
  • any of the acts of any of the methods described herein may be implemented at least partially by a processor or other electronic device based on instructions stored on one or more computer-readable media.
  • any of the acts of any of the methods described herein may be implemented under control of one or more processors configured with executable instructions that may be stored on one or more computer-readable media.
  • a neural network model is used herein as an example of the trained machine learning model described above. It should be noted that the present disclosure is not limited to this example neural network model, and other types of machine learning models can also be used and applicable to the present disclosure.
  • the trained machine learning model may include a neural network model, such as a convolutional neural network (CNN) model, e.g., Mobilenet v2.
  • image information of at least one video frame as described above may be pixel values of the at least one video frame.
  • training samples with image information of respective video frames of a plurality of videos and corresponding bit rates of the respective video frames of the plurality of videos (or respective target bit rates) as inputs, and respective known optimal resolutions (or resampling ratios) as outputs may be used for training the neural network model.
  • These training samples may be obtained by a brute force approach or from a third-party database.
  • parameters (such as connection weights between nodes of same and different layers, biases, etc. ) of the neural network model are learned and determined using a subset of the training samples based on a particular optimization or training algorithm, such as a gradient descent method, a conjugate gradient method, a Quasi-Newton method, etc.
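As a reminder of how an optimization algorithm of the kind named above operates, a bare-bones gradient descent loop is sketched below. Real training frameworks add batching, momentum, learning-rate schedules, etc.:

```python
def gradient_descent(grad, w0, lr=0.1, steps=100):
    """Minimal gradient descent: repeatedly step the parameter
    against the gradient of the training loss."""
    w = w0
    for _ in range(steps):
        w = w - lr * grad(w)
    return w
```

For example, minimizing the loss (w - 3)^2, whose gradient is 2(w - 3), drives w toward 3.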
  • the neural network model may be tested and validated using another subset of the training samples. If an accuracy of recognition is less than a predetermined threshold, the neural network model may be retrained until the accuracy of recognition is greater than or equal to the predetermined threshold.
  • the neural network model may have a different number of convolutional layers, and/or a different number of feature maps in each layer, depending on the desired complexity of the neural network model, the desired accuracy of recognition, and/or the desired speed of computation for determining a recommended resolution (or resampling ratio) , etc. For example, if a recommended resolution or resampling ratio is needed for videos of a higher resolution, a higher number of features may exist in video frames of a video of the higher resolution, and so a higher number of feature maps in each layer may be desirable.
  • Clause 1 A method implemented by a computing device, the method comprising: determining at least one video frame and a corresponding bit rate of the at least one video frame; inputting the at least one video frame and the corresponding bit rate into a machine learning model to obtain a recommended resolution; resizing the at least one video frame and one or more other video frames associated with the at least one video frame according to the recommended resolution; and encoding the at least one video frame and the one or more other video frames to obtain an encoded video according to a target bit rate after the resizing.
  • Clause 2 The method of Clause 1, further comprising: sending the encoded video to a receiving computing device via a network.
  • Clause 3 The method of Clause 2, further comprising: sending information of a resizing filter for resizing the at least one video frame and the one or more other video frames to the receiving computing device via the network, the information of the resizing filter enabling the receiving computing device to undo the resizing using a corresponding reversing filter.
  • Clause 4 The method of Clause 1, wherein determining the at least one video frame and the bit rate of the at least one video frame comprises: encoding an input video comprising the at least one video frame according to the target bit rate; and extracting information of the at least one video frame and the corresponding bit rate from the encoded input video.
  • Clause 5 The method of Clause 4, wherein the input video comprises a subset of a streaming video or a stored video.
  • Clause 6 The method of Clause 1, wherein the at least one frame comprises an intra frame, and the one or more other video frames comprise one or more inter frames that are encoded based on the at least one frame.
  • Clause 7 The method of Clause 1, wherein the machine learning model is configured to receive video frames of a particular resolution and determine a corresponding resolution for resizing a video comprising the video frames of the particular resolution for transmission at a designated bit rate.
  • Clause 8 One or more processor-readable media storing executable instructions that, when executed by one or more processors, cause the one or more processors to perform acts comprising: determining at least one video frame and a corresponding bit rate of the at least one video frame; inputting the at least one video frame and the corresponding bit rate into a machine learning model to obtain a recommended resolution; resizing the at least one video frame and one or more other video frames associated with the at least one video frame according to the recommended resolution; and encoding the at least one video frame and the one or more other video frames to obtain an encoded video according to a target bit rate after the resizing.
  • Clause 9 The one or more processor-readable media of Clause 8, the acts further comprising: sending the encoded video to a receiving computing device via a network.
  • Clause 10 The one or more processor-readable media of Clause 9, the acts further comprising: sending information of a resizing filter for resizing the at least one video frame and the one or more other video frames to the receiving computing device via the network, the information of the resizing filter enabling the receiving computing device to undo the resizing using a corresponding reversing filter.
  • Clause 11 The one or more processor-readable media of Clause 8, wherein determining the at least one video frame and the bit rate of the at least one video frame comprises: encoding an input video comprising the at least one video frame according to the target bit rate; and extracting information of the at least one video frame and the corresponding bit rate from the encoded input video.
  • Clause 12 The one or more processor-readable media of Clause 11, wherein the input video comprises a subset of a streaming video or a stored video.
  • Clause 13 The one or more processor-readable media of Clause 8, wherein the at least one frame comprises an intra frame, and the one or more other video frames comprise one or more inter frames that are encoded based on the at least one frame.
  • Clause 14 The one or more processor-readable media of Clause 8, wherein the machine learning model is configured to receive video frames of a particular resolution and determine a corresponding resolution for resizing a video comprising the video frames of the particular resolution for transmission at a designated bit rate.
  • Clause 15 A system comprising: one or more processors; and memory storing executable instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising: determining at least one video frame and a corresponding bit rate of the at least one video frame; inputting the at least one video frame and the corresponding bit rate into a machine learning model to obtain a recommended resolution; resizing the at least one video frame and one or more other video frames associated with the at least one video frame according to the recommended resolution; and encoding the at least one video frame and the one or more other video frames to obtain an encoded video according to a target bit rate after the resizing.
  • Clause 16 The system of Clause 15, the acts further comprising: sending the encoded video to a receiving computing device via a network.
  • Clause 17 The system of Clause 16, the acts further comprising: sending information of a resizing filter for resizing the at least one video frame and the one or more other video frames to the receiving computing device via the network, the information of the resizing filter enabling the receiving computing device to undo the resizing using a corresponding reversing filter.
  • Clause 18 The system of Clause 15, wherein determining the at least one video frame and the bit rate of the at least one video frame comprises: encoding an input video comprising the at least one video frame according to the target bit rate; and extracting information of the at least one video frame and the corresponding bit rate from the encoded input video.
  • Clause 19 The system of Clause 15, wherein the at least one frame comprises an intra frame, and the one or more other video frames comprise one or more inter frames that are encoded based on the at least one frame.
  • Clause 20 The system of Clause 15, wherein the machine learning model is configured to receive video frames of a particular resolution and determine a corresponding resolution for resizing a video comprising the video frames of the particular resolution for transmission at a designated bit rate.

Abstract

At least one video frame and a corresponding bit rate of the at least one video frame of a video may be determined. The video may be a streaming video or a subset of a stored video. The at least one video frame and the corresponding bit rate may be inputted into a machine learning model to obtain a recommended resolution. The at least one video frame and one or more other video frames associated with the at least one video frame may then be resized or resampled (e.g., downsampled) according to the recommended resolution. After resizing, the at least one video frame and the one or more other video frames may be encoded to obtain an encoded video according to a target bit rate.

Description

Adaptive Resolution Coding Based on Machine Learning Model

BACKGROUND
Video streaming and downloading/uploading are very common in people’s daily lives nowadays. A user may send or upload a video file from one device to another device such as a server or a computing device of another user. Along with the development of video technologies, people have an increasing demand for videos of a higher quality or resolution, such as high definition videos, ultra-high definition videos, etc. These high-quality or high-resolution videos usually have large file sizes, which may range from several hundred megabytes to several gigabytes. These high-quality or high-resolution videos not only require a long period of time for uploading and transmitting over a communication network, but also incur a huge amount of traffic on the network, thus having a high transmission cost in terms of time and network bandwidth.
SUMMARY
This summary introduces simplified concepts of adaptive resolution coding, which will be further described below in the Detailed Description. This summary is not intended to identify essential features of the claimed subject matter, nor is it intended for use in limiting the scope of the claimed subject matter.
This application describes example implementations of adaptive resolution coding. In implementations, at least one video frame and a corresponding bit rate of the at least one video frame of a video may be determined. The video may be a streaming video or a subset (such as a segment) of a stored video. In implementations, the at least one video frame and the corresponding bit rate may be inputted into a machine learning model to obtain a recommended resolution. The at least one video frame and one or more other video frames associated with the at least one video frame may then be resized or resampled (e.g., downsampled) according to the recommended resolution. After resizing, the at least one video frame and the one or more other video frames may be encoded to obtain an encoded video according to a target bit rate.
BRIEF DESCRIPTION OF THE DRAWINGS
The detailed description is set forth with reference to the accompanying figures. In the figures, the left-most digit (s) of a reference number identifies the figure in which the reference number first appears. The use of the same reference numbers in different figures indicates similar or identical items.
FIG. 1 illustrates an example relationship between bit rates and qualities of an example video.
FIG. 2 illustrates an example environment in which an adaptive resolution coding system may be used.
FIG. 3 illustrates an example adaptive resolution coding system in more detail.
FIG. 4 illustrates an example method of adaptive resolution coding.
FIG. 5 illustrates another example method of adaptive resolution coding.
DETAILED DESCRIPTION
Overview
As noted above, when high-quality videos (such as videos of 4K or 8K resolution) are transmitted over a network, existing technologies suffer a high cost of transmission in terms of both time and network bandwidth. As the qualities of videos that need to be transmitted increase, the cost of transmission of the videos increases sharply. This leads to a reduction in the network bandwidth that is available for other services in a network, and hence affects the network performance for the other services.
This disclosure describes an example adaptive resolution coding system. In order to reduce the transmission cost for transmitting a given video at a certain bit rate (or called a target bit rate) as described above, the adaptive resolution coding system may first downsample the given video by a certain sampling ratio, and then encode the downsampled video at the target bit rate. The adaptive resolution coding system may then transmit the encoded video to another device over a communication network, so that the other device can restore the given video (e.g., restore an original resolution of the given video) by decoding and upsampling the encoded video.
In implementations, given a certain bit rate (or bandwidth) for transmitting a video, the quality of the video that is restored at a receiving end depends on an amount of downsampling (or a downsampling ratio) that is performed at a sending end. For example, FIG. 1 shows an example relationship 100 between bit rates and qualities of an example video that is restored after successive operations of downsampling, encoding, decoding, and upsampling. As can be seen from FIG. 1, for a given bit rate, there exists, from among the different resolutions to which a video can be downsampled, a certain resolution (or a certain downsampling ratio) such that, when the video is downsampled to that resolution (or by that downsampling ratio) and then encoded at the given bit rate at a sending end, the video after restoration (i.e., decoding and upsampling) at a receiving end attains the best quality among the different resolutions.
In implementations, the adaptive resolution coding system may employ a machine learning model to determine an optimal resolution or downsampling ratio for resizing an input video of an input resolution before encoding and transmitting the video at or around a specific bit rate to another device over a communication network. In implementations, the machine learning model may be trained using a training sample set of different videos having a particular resolution or different resolutions and respective known values of optimal downsampling ratios that produce the best qualities for the different videos. After values of parameters (such as weights) of the machine learning model are determined, the adaptive resolution coding system may apply the machine learning model to determine a recommended downsampling ratio or resolution for an input video. In implementations, the machine learning model may include, but is not limited to, a neural network model such as a convolutional neural network (CNN) , a Bayesian network, a decision tree, etc.
By way of example and not limitation, the described adaptive resolution coding system may receive an input video having an input resolution and an instruction to transmit the input video at a certain bit rate (or a target bit rate) . The adaptive resolution coding system may obtain one or more frames (such as intra frames) and respective one or more bit rates from the input video. For example, the adaptive resolution coding system may attempt to encode the input video at the target bit rate, and obtain one or more intra frames and respective one or more bit rates after encoding. The adaptive resolution coding system may then input the one or more frames and the respective one or more bit rates into a trained machine learning model to obtain a recommended resolution or sampling ratio for resizing (e.g., downsampling) the input video. After obtaining the recommended resolution or sampling ratio from the trained machine learning model, the adaptive resolution coding system may resize the input video from the input resolution to the recommended resolution, and encode the resized input video according to the target bit rate for transmission over a communication network, thus reducing the transmission cost of the video while ensuring a high quality of the video after restoration (i.e., decoding and upsampling, for example) . In implementations, the input video may include, but is not limited to, some or all of a stored video, or some or all of a streaming video, etc.
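The steps of this example can be sketched end to end as below. Every stage function is a hypothetical stand-in for the real encoder and trained machine learning model; the disclosure does not prescribe these interfaces:

```python
def adaptive_resolution_encode(input_video, target_bit_rate, model,
                               trial_encode, resize, encode):
    """Sketch of the overall flow: trial-encode, consult the model,
    resize, then encode at the target bit rate."""
    # 1. Trial-encode at the target bit rate to obtain intra frames
    #    and their respective bit rates.
    intra_frames, bit_rates = trial_encode(input_video, target_bit_rate)
    # 2. Ask the trained model for a recommended resolution (or ratio).
    recommended = model(intra_frames, bit_rates)
    # 3. Resize (e.g., downsample) the video to the recommendation.
    resized = resize(input_video, recommended)
    # 4. Encode the resized video at the target bit rate.
    return encode(resized, target_bit_rate)
```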
In implementations, functions described herein to be performed by the adaptive resolution coding system may be performed by multiple separate units or services. For example, a receiving service may receive an input video and an instruction including a target bit rate, while an acquisition service may obtain one or more frames and respective one or more bit rates from the input video. A determination service may obtain a recommended resolution or sampling ratio for  resizing (e.g., downsampling) the input video based on a machine learning model. In implementations, an encoding service may encode the resized input video according to a target bit rate, while a transmission service may transmit the encoded video to another device over a communication network.
Moreover, although in the examples described herein, the adaptive resolution coding system may be implemented as software and/or hardware installed in a single device, in other examples, the adaptive resolution coding system may be implemented and distributed in multiple devices or as services provided in one or more servers over a network and/or in a cloud computing architecture.
The application describes multiple and varied embodiments and implementations. The following section describes an example framework that is suitable for practicing various implementations. Next, the application describes example systems, devices, and processes for implementing an adaptive resolution coding system.
Example Environment
FIG. 2 illustrates an example environment 200 usable to implement an adaptive resolution coding system. The environment 200 may include an adaptive resolution coding system 202. In this example, the adaptive resolution coding system 202 is described to be included in a client device 204. In some instances, the adaptive resolution coding system 202 may exist as an individual entity or device. In implementations, the environment 200 may further include another client device 206 and a server 208. The adaptive resolution coding system 202 or the client  device 204 may communicate data with the other client device 206 and the server 208 over a network 210. In implementations, the server 208 may be a server of a plurality of servers in a cloud or a data center.
In implementations, functions of the adaptive resolution coding system 202 may be included in or provided by the client device 204. In implementations, some or all of the functions of the adaptive resolution coding system 202 may be included in a cloud computing system or architecture, and may be provided as services to the client device 204.
In implementations, the client device 204 or the client device 206 may be implemented as any of a variety of computing devices including, but not limited to, a desktop computer, a notebook or portable computer, a handheld device, a netbook, an Internet appliance, a tablet or slate computer, a mobile device (e.g., a mobile phone, a personal digital assistant, a smart phone, etc. ) , a server computer, etc., or a combination thereof.
The network 210 may be a wireless or a wired network, or a combination thereof. The network 210 may be a collection of individual networks interconnected with each other and functioning as a single large network (e.g., the Internet or an intranet) . Examples of such individual networks include, but are not limited to, telephone networks, cable networks, Local Area Networks (LANs) , Wide Area Networks (WANs) , and Metropolitan Area Networks (MANs) . Further, the individual networks may be wireless or wired networks, or a combination thereof. Wired networks may include an electrical carrier connection (such as a communication cable, etc. ) and/or an optical carrier or connection (such as an optical fiber connection, etc. ) . Wireless networks may include, for example, a WiFi network and other radio frequency networks (e.g., Zigbee, etc. ) .
In implementations, the adaptive resolution coding system 202 may receive an instruction to transmit an input video at a target bit rate. The adaptive resolution coding system 202 may determine a recommended resolution or downsampling ratio for the input video based on a machine learning model, downsample the input video according to the recommended resolution or downsampling ratio, and encode the input video to obtain an encoded video for storage or transmission by the client device 204 or the adaptive resolution coding system 202.
Example Adaptive resolution coding system
FIG. 3 illustrates the adaptive resolution coding system 202 in more detail. In implementations, the adaptive resolution coding system 202 may include, but is not limited to, one or more processors 302, memory 304, and program data 306. In implementations, the adaptive resolution coding system 202 may further include one or more encoders 308, an input/output (I/O) interface 310, and/or a network interface 312. In implementations, some or all of the functions of the adaptive resolution coding system 202 may be implemented using hardware, for example, an ASIC (i.e., Application-Specific Integrated Circuit) , a FPGA (i.e., Field-Programmable Gate Array) , and/or other hardware. By way of example and not limitation, the one or more encoders 308 of the adaptive resolution coding system 202 may be implemented using an ASIC, a FPGA, and/or any other hardware.
In implementations, the one or more processors 302 are configured to execute instructions that are stored in the memory 304, and/or received from the input/output interface 310, and/or the network interface 312. In implementations, the one or more processors 302 may be implemented as one or more hardware processors including, for example, a microprocessor, an application-specific instruction-set processor, a physics processing unit (PPU) , a central processing unit (CPU) , a graphics processing unit, a digital signal processor, etc. Additionally or alternatively, the functionality described herein can be performed, at least in part, by one or more hardware logic components. For example, and without limitation, illustrative types of hardware logic components that can be used include field-programmable gate arrays (FPGAs) , application-specific integrated circuits (ASICs) , application-specific standard products (ASSPs) , system-on-a-chip systems (SOCs) , complex programmable logic devices (CPLDs) , etc.
The memory 304 may include processor-readable media in a form of volatile memory, such as Random Access Memory (RAM) and/or non-volatile memory, such as read only memory (ROM) or flash RAM. The memory 304 is an example of processor-readable media.
The processor-readable media may include a volatile or non-volatile type, a removable or non-removable media, which may achieve storage of information using any method or technology. The information may include a processor-readable instruction, a data structure, a program module or other data. Examples of processor-readable media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electronically erasable programmable read-only memory (EEPROM), flash memory or other internal storage technology, compact disk read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassette tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission media, which may be used to store information that may be accessed by a computing device. As defined herein, the processor-readable media does not include any transitory media, such as modulated data signals and carrier waves.
Although in this example, only hardware components are described in the adaptive resolution coding system 202, in other instances, the adaptive resolution coding system 202 may further include other hardware components and/or other software components such as program units to execute instructions stored in the memory 304 for performing various operations such as processing, determination, allocation, storage, etc. In some instances, the adaptive resolution coding system 202 may further include a model database 314 that is configured to store information of one or more trained machine learning models used for determining recommended resolutions for videos of different input resolutions.
Example Methods
FIGS. 4 and 5 show schematic diagrams depicting example methods of adaptive resolution coding. The methods of FIGS. 4 and 5 may, but need not, be implemented in the environment of FIG. 2 and using the system of FIG. 3. For ease of explanation, methods 400 and 500 are described with reference to FIGS. 1-3. However, the methods 400 and 500 may alternatively be implemented in other environments and/or using other systems.
The  methods  400 and 500 are described in the general context of computer-executable instructions. Generally, computer-executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, and the like that perform particular functions or implement particular abstract data types. Furthermore, each of the example methods are illustrated as a collection of blocks in a logical flow graph representing a sequence of operations that can be implemented in hardware, software, firmware, or a combination thereof. The order in which the method is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method, or alternate methods. Additionally, individual blocks may be omitted from the method without departing from the spirit and scope of the subject matter described herein. In the context of software, the blocks represent computer instructions that, when executed by one or more processors, perform the recited operations. In the context of hardware, some or all of the blocks may represent application specific integrated circuits (ASICs) or other physical components that perform the recited operations.
Referring back to FIG. 4, at block 402, the adaptive resolution coding system 202 may receive an input video and information of a target bit rate.
In implementations, the adaptive resolution coding system 202 may receive an instruction to encode an input video of a certain resolution (referred to herein as an input resolution) at a target bit rate (or a target bandwidth) from a user of the client device 204. The input video may include, but is not limited to, a stored video or a streaming video. In implementations, the input video may be a subset (e.g., a segment) of a stored video or streaming video.
At block 404, the adaptive resolution coding system 202 may determine or obtain image information of at least one video frame and a bit rate of the at least one video frame from the input video.
In implementations, in response to receiving the input video, the adaptive resolution coding system 202 may determine or obtain image information of at least one video frame and a bit rate of the at least one video frame from the input video, which can be used as an input to a trained machine learning model for determining or obtaining a recommended resolution (or a resampling ratio). For example, the adaptive resolution coding system 202 may encode the input video of the input resolution at the target bit rate, and extract or obtain image information of at least one video frame and a bit rate of the at least one video frame from the encoded video. In implementations, encoding the input video of the input resolution at the target bit rate may include compressing the input video so that an average bit rate of transmitting the compressed video over a communication network (such as the network 210) is at or around the target bit rate.
In implementations, the adaptive resolution coding system 202 may obtain at least one video frame and a bit rate thereof by calculating a prediction residual of the input video and estimating resulting bits for the at least one video frame (e.g., the first video frame of the input video) and the bit rate thereof based on the prediction residual. In this case, the adaptive resolution coding system 202 may  not need to encode or compress the input video completely. In implementations, the adaptive resolution coding system 202 may randomly select a portion of the input video, encode or compress the selected portion of the input video, and extract or obtain image information of at least one video frame and a bit rate of the at least one video frame from the encoded or compressed portion of the input video.
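By way of example and not limitation, the residual-based estimation above may be sketched as follows. This is a minimal illustration, not the disclosed implementation: the calibration constant `bits_per_unit_residual` is a hypothetical value that a real encoder would fit from rate statistics, and frames are modeled as flat lists of pixel values.

```python
def estimate_frame_bits(current, reference, bits_per_unit_residual=0.4):
    """Estimate the coded size of a frame from its prediction residual.

    `current` and `reference` are flat lists of pixel values; the
    residual is their element-wise difference. `bits_per_unit_residual`
    is a hypothetical calibration constant.
    """
    residual = sum(abs(c - r) for c, r in zip(current, reference))
    return residual * bits_per_unit_residual

def estimate_bit_rate(frame_bits, frame_rate=30):
    """Convert an estimated per-frame size to a bit rate (bits/second)."""
    return frame_bits * frame_rate
```

Identical frames yield a zero residual and thus a zero bit estimate, so the estimate can be computed without encoding or compressing the input video completely, as described above.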
In implementations, the at least one video frame may include, but is not limited to, an intra frame that is representative of the input video from among all intra frames of the input video, or an intra frame that is randomly selected from among all the intra frames of the input video. In implementations, the image information of the at least one video frame may include, but is not limited to, image data of the at least one video frame (such as pixel values at each coordinate in the at least one video frame) . Additionally or alternatively, the image information of the at least one video frame may include feature data of the at least one video frame. For example, the adaptive resolution coding system 202 may perform feature extraction or detection on the at least one video frame after obtaining the at least one video frame from the input video, and additionally or alternatively use feature data that is obtained from the feature extraction or detection as the image information of the at least one video frame. Examples of the feature extraction or detection may include, but are not limited to, edge detection, corner detection, blob detection, curvature detection, shape-based detection, Hough transform, etc. Depending on what type of machine learning model is used at a later stage (i.e., a stage of determination of a recommended resolution or resampling ratio) , one or more types of feature extraction may be performed on the at least one video frame.
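By way of example and not limitation, the edge-detection style of feature extraction mentioned above may be sketched as follows. The horizontal-gradient test and the threshold value are illustrative stand-ins; a practical system might use Sobel or Canny operators instead.

```python
def edge_features(frame, threshold=30):
    """Return a binary map marking positions where the horizontal
    gradient exceeds `threshold` -- a minimal stand-in for the edge
    detection mentioned above.

    `frame` is a 2-D list of luma values; the output marks the
    presence or absence of an edge feature at each coordinate.
    """
    features = []
    for row in frame:
        feature_row = [0] * len(row)
        for x in range(1, len(row)):
            if abs(row[x] - row[x - 1]) > threshold:
                feature_row[x] = 1  # edge present at this coordinate
        features.append(feature_row)
    return features
```

The resulting presence/absence map is the kind of feature data that could serve as image information for a model that does not extract features itself (e.g., a decision tree model).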
In implementations, the intra frame that is representative of the input video may include, but is not limited to, a first intra frame of the input video, an intra frame having a bit rate that is a median of bit rates associated with intra frames of the input video, an intra frame having a bit rate that is closest to an average of the bit rates associated with the intra frames of the input video, etc.
In implementations, the at least one intra frame may include one or more intra frames that are representative of the input video, and/or one or more intra frames that are randomly selected from among the intra frames of the input video.
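By way of example and not limitation, the closest-to-average selection rule described above may be sketched as follows. The pair-based data layout and tie-breaking rule are illustrative choices, not part of the disclosure.

```python
import statistics

def representative_intra_frame(intra_frames):
    """Pick the intra frame whose bit rate is closest to the mean of
    all intra-frame bit rates; ties are broken by frame order.

    `intra_frames` is a list of (frame_index, bit_rate) pairs. The
    median-based variant described above would substitute
    statistics.median for statistics.mean.
    """
    mean_rate = statistics.mean(rate for _, rate in intra_frames)
    return min(intra_frames, key=lambda f: (abs(f[1] - mean_rate), f[0]))
```

For instance, among intra frames with bit rates 100, 300, and 220, the frame with bit rate 220 lies closest to the mean and would be selected as representative.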
In implementations, depending on the number of video frames and/or the size of the input video, the adaptive resolution coding system 202 may divide the input video into a plurality of video segments. For example, the adaptive resolution coding system 202 may divide the input video into a plurality of video segments having the same time length such as one second, two seconds, etc. In implementations, the adaptive resolution coding system 202 may divide the input video (e.g., a stored video) into a predetermined number of video segments.
In implementations, the adaptive resolution coding system 202 may divide the input video into a plurality of video segments based on any scene change detection (SCD) method and/or shot transition detection (STD) method. In an event that a scene change detection (SCD) method and/or a shot transition detection (STD) method is employed, the plurality of video segments that are obtained by the adaptive resolution coding system 202 may have different lengths.
By way of example and not limitation, an amount of change between two video frames may be used for detecting a presence of a scene change. For example, if scenes between two video frames are different, a residual obtained after performing motion compensation between these two video frames is usually large, or a difference between pixel values of these two video frames is usually large. In implementations, predetermined threshold (s) may be set up for a residual associated with motion compensation between two video frames and/or a difference between pixel values of two video frames. In response to detecting that a residual associated with motion compensation between two video frames and/or a difference between pixel values of two video frames is/are greater than respective predetermined threshold (s) , the adaptive resolution coding system 202 may determine that a scene change occurs between the two video frames. The adaptive resolution coding system 202 may divide the input video into a plurality of video segments, with boundaries of a video segment of the plurality of video segments corresponding to positions of respective scene changes that are detected.
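By way of example and not limitation, the pixel-difference variant of the scene change test above may be sketched as follows. The threshold is a hypothetical value; the residual-after-motion-compensation test would plug into the same structure.

```python
def mean_abs_difference(frame_a, frame_b):
    """Mean absolute pixel difference between two equal-size frames
    (flat lists of luma values)."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)

def split_on_scene_changes(frames, threshold=40):
    """Split a frame sequence into video segments wherever the
    difference between consecutive frames exceeds `threshold`, so
    segment boundaries correspond to detected scene changes."""
    segments, current = [], [frames[0]]
    for prev, frame in zip(frames, frames[1:]):
        if mean_abs_difference(prev, frame) > threshold:
            segments.append(current)  # scene change: close the segment
            current = []
        current.append(frame)
    segments.append(current)
    return segments
```

Because boundaries follow detected scene changes rather than a fixed clock, the resulting segments may have different lengths, as noted above.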
In implementations, the input video may be divided into a plurality of video segments, and the adaptive resolution coding system 202 may determine or obtain at least one video frame and a respective bit rate from each video segment of the plurality of video segments according to a similar approach as described above.
At block 406, after determining at least one video frame and a respective bit rate, the adaptive resolution coding system 202 may input the image information of the at least one video frame and the bit rate into a trained machine learning model to determine or obtain a recommended resolution.
In implementations, the adaptive resolution coding system 202 may be associated with one or more trained machine learning models that are configured to receive image information of one or more video frames and respective one or more bit rates as inputs, and produce a recommended resolution (or resampling ratio) as an output. In implementations, the one or more trained machine learning models may be able to process video frames of a particular resolution or different resolutions. By way of example and not limitation, the adaptive resolution coding system 202 may have one or more trained machine learning models that are stored in the memory 304, e.g., stored in the model database 314. The adaptive resolution coding system 202 may select a trained machine learning model from the model database 314, and input the image information of the at least one video frame and the bit rate into the trained machine learning model to obtain a recommended resolution (or resampling ratio) .
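By way of example and not limitation, the model-query step above may be sketched as follows. Both the `predict` method signature and the threshold-based stand-in model are illustrative assumptions; the actual trained model, its API, and its decision boundary are not specified here.

```python
def recommend_resolution(model, frame_pixels, bit_rate):
    """Query a trained model for a recommended output resolution.

    `model` is assumed to expose a `predict(features) -> (w, h)`
    method; the feature-dict layout is illustrative only.
    """
    return model.predict({"pixels": frame_pixels, "bit_rate": bit_rate})

class ThresholdModel:
    """Hypothetical stand-in for a trained model: recommends a lower
    resolution when the frame bit rate falls below a cutoff."""
    def __init__(self, cutoff_bps, low_res, high_res):
        self.cutoff_bps = cutoff_bps
        self.low_res = low_res
        self.high_res = high_res

    def predict(self, features):
        if features["bit_rate"] < self.cutoff_bps:
            return self.low_res
        return self.high_res
```

A real deployment would load a trained model from the model database 314 (or query a remote device over the network 210) in place of `ThresholdModel`.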
Additionally or alternatively, one or more trained machine learning models may be stored in a remote device (for example, a server, a cloud, or a data center, etc. ) that is accessible to the adaptive resolution coding system 202 through a network, e.g., the network 210. The adaptive resolution coding system 202 may send the image information of the at least one video frame and the bit rate to the remote device through the network 210 to request the remote device for determining a recommended resolution (or resampling ratio) , and receive the recommended resolution from the remote device after the remote device determines the recommended resolution (or resampling ratio) using a trained machine learning model therein.
In implementations, depending on a type of the machine learning model, the image information of the at least one video frame inputted into the machine learning model may vary. By way of example and not limitation, if the machine learning model is a neural network model, the image information of the at least one video frame may include image data (e.g., pixel values) of the at least one video frame, because feature extraction or detection can be performed in the first few layers of the neural network model. In implementations, if the machine learning model is a decision tree model, the image information of the at least one video frame may include feature data of the at least one video frame, such as information about a presence or absence of certain features (such as an edge, a corner, a shape, etc. ) at different positions or coordinates on the at least one video frame. The adaptive resolution coding system 206 may perform such feature extraction or detection at block 404 as described above.
In implementations, the input video may be divided into a plurality of video segments as described above. In this case, the adaptive resolution coding system 202 may use at least one respective video frame and a respective bit rate of each video segment as an input to a trained machine learning model to obtain or determine a respective recommended resolution (or resampling ratio) . After obtaining respective recommended resolutions (or resampling ratios) for the plurality of video segments of the input video, the adaptive resolution coding system 202 may determine or calculate a resulting resolution (or resampling ratio) based on the respective recommended resolutions (or resampling ratios) of the plurality of video segments as a recommended resolution (or resampling ratio) for the input video.
In implementations, the adaptive resolution coding system 202 may further determine a resolution (or resampling ratio) that is representative of the respective recommended resolutions (or resampling ratios) of the plurality of video segments as the recommended resolution (or resampling ratio) for the input video. By way of example and not limitation, the resolution (or resampling ratio) that is representative of the respective recommended resolutions (or resampling ratios) of the plurality of video segments may include, but is not limited to, an average of the respective recommended resolutions (or resampling ratios) of the plurality of video segments, a median of the respective recommended resolutions (or resampling ratios) of the plurality of video segments, etc. In implementations, the adaptive resolution coding system 202 may randomly select one of the respective recommended resolutions (or resampling ratios) of the plurality of video segments as the recommended resolution (or resampling ratio) for the input video.
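By way of example and not limitation, reducing the per-segment recommendations to one recommendation for the whole input video, as described above, may be sketched as follows (strategy names are illustrative):

```python
import statistics

def combine_segment_ratios(segment_ratios, strategy="median"):
    """Reduce per-segment recommended resampling ratios to a single
    resampling ratio for the whole input video, using the median or
    average strategies described above."""
    if strategy == "median":
        return statistics.median(segment_ratios)
    if strategy == "mean":
        return statistics.mean(segment_ratios)
    raise ValueError(f"unknown strategy: {strategy}")
```

The random-selection strategy mentioned above would simply draw one element of `segment_ratios` instead.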
At block 408, in response to determining or obtaining the recommended resolution (or resampling ratio) for the input video, the adaptive resolution coding system 202 may resample or resize the at least one video frame and one or more other video frames associated with the at least one video frame based on the recommended resolution (or resampling ratio) .
In implementations, the adaptive resolution coding system 202 may resize or resample (e.g., downsample) the at least one video frame and one or more other video frames associated with the at least one video frame from the input resolution to the recommended resolution (or by the recommended resampling ratio) . In implementations, the adaptive resolution coding system 202 may  downsample the input video from the input resolution to the recommended resolution (or by the recommended resampling ratio) . In implementations, the at least one video frame may include an intra frame, and the one or more other video frames associated with the at least one video frame may include inter frames depending on the intra frame.
In implementations, the input video may be divided into a plurality of video segments as described above, and the adaptive resolution coding system 202 may resize or resample (e.g., downsample) the input video including the plurality of video segments from the input resolution to the same recommended resolution (or by the same recommended resampling ratio), as described in the above situation in which the input video is divided into the plurality of video segments.
In implementations, the input video may be divided into a plurality of video segments, and the adaptive resolution coding system 202 may further divide the plurality of video segments into a plurality of video groups that may not overlap with each other. Each video group may include one or more video segments. In implementations, the adaptive resolution coding system 202 may divide the plurality of video segments into a plurality of video groups based on a predetermined number of video segments and/or a predetermined time period. By way of example and not limitation, after obtaining or determining recommended resolutions (or resampling ratios) for a predetermined number of video segments and/or after a predetermined period of time has passed, the adaptive resolution coding system 202 may group video segments whose recommended resolutions (or resampling ratios) have been determined and which have not been resized or resampled as an individual video group.
In implementations, the adaptive resolution coding system 202 may resize or resample (e.g., downsample) a video group to a recommended resolution (or by a recommended resampling ratio) associated with that video group, encode the resized video group, and send the encoded video group to another device over the network 210, without waiting for other subsequent video groups, thus further speeding up a process of transmitting the input video from one device to another device. A recommended resolution (or a recommended resampling ratio) associated with a video group may be determined as described above by selecting a resolution (or a resampling ratio) that is representative of resolutions (or resampling ratios) of video frames included in the video group, or by randomly selecting a resolution (or a resampling ratio) from among the resolutions (or the resampling ratios) of the video frames included in the video group.
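By way of example and not limitation, the count-based grouping described above may be sketched as follows (the group size is a hypothetical parameter):

```python
def group_segments(segments, group_size):
    """Collect consecutive, non-overlapping video segments into groups
    of at most `group_size` segments, so that each group can be
    resized, encoded, and sent without waiting for later groups."""
    return [segments[i:i + group_size]
            for i in range(0, len(segments), group_size)]
```

Each returned group can then be processed and transmitted independently, which is what allows the pipeline above to avoid waiting for subsequent video groups.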
In implementations, the adaptive resolution coding system 202 may resize or resample the input video (or video segment or video group) using a predetermined resizing filter or resampling filter such as a downsampling filter. Examples of the predetermined resizing filter or resampling filter may include, but are not limited to, a downsampling filter such as a bi-linear filter, an averaging filter, a Lanczos filter, a convolutional filter, etc.
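By way of example and not limitation, the averaging filter named above may be sketched as a factor-of-two downsampler. This is a minimal illustration assuming even frame dimensions and integer pixel values; production resizers (bilinear, Lanczos, etc.) are considerably more elaborate.

```python
def downsample_2x(frame):
    """Downsample a frame by a factor of two in each dimension using a
    2x2 averaging filter (one of the example resizing filters above).

    `frame` is a 2-D list of pixel values with even width and height.
    """
    out = []
    for y in range(0, len(frame), 2):
        row = []
        for x in range(0, len(frame[0]), 2):
            block = (frame[y][x] + frame[y][x + 1] +
                     frame[y + 1][x] + frame[y + 1][x + 1])
            row.append(block // 4)  # integer average of the 2x2 block
        out.append(row)
    return out
```

A 4x4 frame thus becomes 2x2, halving the resolution in each dimension, i.e., a resampling ratio of one half.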
At block 410, the adaptive resolution coding system 202 may encode or compress the resized video according to the target bit rate.
In implementations, after the input video is resized or resampled, the adaptive resolution coding system 202 may encode or compress the resized video using an encoder (e.g., one of the encoders 308) at the target bit rate. For example, the adaptive resolution coding system 202 or the encoder 308 may encode the resized video into a MPEG-4 format, a H. 264 format, or any format that is supported by the encoder 308 and/or agreed upon between the adaptive resolution coding system 202 and the other device (i.e., the client device 206) . In implementations, if video grouping is employed, the adaptive resolution coding system 202 or the encoder 308 may encode a video group according to the target bit rate to produce an encoded video group without waiting for other subsequent video groups.
At block 412, the adaptive resolution coding system 202 may transmit the encoded video to another device over a network.
In implementations, after an encoded video according to the target bit rate is obtained, the adaptive resolution coding system 202 may transmit the encoded video to another device (such as the client device 206) over a network, e.g., the network 210. In implementations, if video grouping is employed, the adaptive resolution coding system 202 may send an encoded video group to the other device over the network, without waiting for other subsequent encoded video groups. This further improves the speed of video transmission, without needing to wait for the entire video to finish encoding, which could take tens of seconds or minutes.
In implementations, if the input video is sent to the other device (such as the client device 206) for storage in the other device, the adaptive resolution  coding system 202 may further send information of the plurality of video groups to the other device, so that the other device can recover the input video from the plurality of video groups. By way of example and not limitation, the adaptive resolution coding system 202 may include or insert respective sequence numbers of the plurality of video groups of the input video in corresponding data headers of data packets including the plurality of video groups, and a data header of a data packet including the last video group includes a special label indicating that the video group included in this data packet is the last video group of the input video. The other device can then recover the input video based on the sequence numbers included in the data headers of the data packets that are received.
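By way of example and not limitation, the sequence-number headers and last-group label described above may be sketched as follows (the header field names are illustrative, not a wire format from the disclosure):

```python
def make_packet(seq, payload, is_last):
    """Wrap an encoded video group in a data packet whose header
    carries the group's sequence number and a last-group label."""
    return {"header": {"seq": seq, "last": is_last}, "payload": payload}

def reassemble(packets):
    """Recover the original order of video groups from packets that
    may arrive out of order. Returns None until the last group and
    every earlier group have arrived."""
    packets = sorted(packets, key=lambda p: p["header"]["seq"])
    if not packets or not packets[-1]["header"]["last"]:
        return None  # the labeled last group has not arrived yet
    if [p["header"]["seq"] for p in packets] != list(range(len(packets))):
        return None  # an earlier group is still missing
    return [p["payload"] for p in packets]
```

The receiver can thus detect both a missing intermediate group and an absent last group before combining the groups into the input video.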
In implementations, if the input video is sent to the other device as a streaming video (or a video stream) , an inclusion of a sequence number in a data header of a data packet including a video group as described above may or may not be used, depending on whether a strict in-order requirement (i.e., a requirement for a correct order of video groups to be displayed) is imposed at the other device. For example, the adaptive resolution coding system 202 may include or insert respective sequence numbers of the plurality of video groups of the input video in corresponding data headers of data packets including the plurality of video groups if the strict in-order requirement is imposed.
In implementations, the adaptive resolution coding system 202 may further send additional information to the other device. The additional information may include, but is not limited to, information of an original resolution of the input video (i.e., the input resolution), information of the resampling filter (such as the downsampling filter) that is used for resizing or resampling, etc. As such, the other device may restore the video to the original resolution by decoding and resizing (e.g., upsampling) using an opposite or conjugate filter (such as a corresponding upsampling filter).
Although the above method blocks are described to be executed in a particular order, in some implementations, some or all of the method blocks can be executed in other orders, or in parallel. For example, the adaptive resolution coding system 202 may encode a certain video group of an input video at a target bit rate, while resizing or resampling one or more video segments that are located after the video group according to a recommended resolution (or resampling ratio) . Additionally or alternatively, the adaptive resolution coding system 202 may determine a recommended resolution (or resampling ratio) for at least one video frame of a video segment, while preliminarily encoding video segments that are located after the video segment to determine respective one or more video frames and bit rates as inputs to a trained machine learning model.
Furthermore, although the adaptive resolution coding system 202 is described to obtain the bit rate of the at least one video frame, and use the bit rate as one of the inputs to the machine learning model in the above blocks, in other instances, the adaptive resolution coding system 202 may obtain a size (e.g., an amount of bits) of the at least one video frame, and use the size of the at least one video frame as one of the inputs to the machine learning model instead. Moreover, in some implementations, the target bit rate is used as one of the inputs to the machine learning model, instead of the bit rate or the size of the at least one video frame.
Referring back to FIG. 5, at block 502, the client device 206 may receive an encoded or compressed video.
In implementations, the client device 206 may receive an encoded video from the adaptive resolution coding system 202 or another device, such as the client device 204, via the network 210. In implementations, the client device 206 may further receive additional information, which may include, but is not limited to, information of an original or intended resolution to which the encoded video is to be resized or restored, information of a resizing or resampling filter that has been used for the encoded video, etc. In implementations, if the encoded video is a video group of an input video that is sent from the adaptive resolution coding system 202 or the client device 204, the additional information may further include a sequence number associated with the video group.
At block 504, the client device 206 may decode the encoded or compressed video to obtain a decoded or decompressed video.
In implementations, the client device 206 may decode the encoded video into a video format that is supported by the client device 206. In implementations, the encoded video may be a compressed video, and decoding the encoded video may include decompressing the compressed video. Examples of the video format include, but are not limited to, an H.264 format, an MPEG-4 format, an AVI format, etc.
At block 506, the client device 206 may resize the decoded video.
In implementations, based on the additional information received from the adaptive resolution coding system 202 or the client device 204, the client device 206 may resize the decoded video to the original resolution using an upsampling filter that is opposite or conjugate to the downsampling filter used for the encoded video.
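By way of example and not limitation, the client-side restoration step above may be sketched with a pixel-replication upsampler. This is a simple stand-in for the conjugate upsampling filter, assuming the video was downsampled by a factor of two in each dimension; an actual decoder would select the filter named in the additional information.

```python
def upsample_2x(frame):
    """Restore a 2x-downsampled frame to its original dimensions by
    pixel replication: each pixel becomes a 2x2 block.

    `frame` is a 2-D list of pixel values.
    """
    out = []
    for row in frame:
        # Duplicate each pixel horizontally, then the row vertically.
        wide = [p for pixel in row for p in (pixel, pixel)]
        out.append(wide)
        out.append(list(wide))
    return out
```

Applying `upsample_2x` to a decoded 2x2 frame yields a 4x4 frame at the original resolution, after which the video can be played, presented, or stored as described at block 508.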
At block 508, the client device 206 may play or present the resized video to a user of the client device 206, and/or store the resized video in a memory of the client device 206.
In implementations, if the video received by the client device 206 at block 502 is a video group of a plurality of video groups of a streaming video (or a video stream), the client device 206 may display or store the video group according to a correct order of the plurality of video groups. For example, the client device 206 may place the video group in a buffer of the client device 206, and arrange the video group in the right position among one or more video groups (of the plurality of video groups) that have been received by the client device 206 according to a sequence number associated with the video group. The client device 206 may display the video group after the video group (s) located prior thereto is/are displayed. In implementations, if the video received by the client device 206 at block 502 is a video group of a plurality of video groups of a video intended to be stored in the client device 206, the client device 206 may place and arrange the video group in the buffer of the client device 206, and wait until all the video groups of the video are received to combine and store the video groups as a single video in the memory of the client device 206.
Any of the acts of any of the methods described herein may be implemented at least partially by a processor or other electronic device based on instructions stored on one or more computer-readable media. By way of example and not limitation, any of the acts of any of the methods described herein may be implemented under control of one or more processors configured with executable instructions that may be stored on one or more computer-readable media.
Example Machine Learning Model
By way of example and not limitation, a neural network model is used herein as an example of the trained machine learning model described above. It should be noted that the present disclosure is not limited to this example neural network model, and other types of machine learning models can also be used and applicable to the present disclosure.
In implementations, a neural network model, such as a convolutional neural network (CNN) model, may be used as the machine learning model described above. By way of example and for the sake of simplicity, a convolutional neural network (CNN) model, such as Mobilenet v2, may be used as an example and backbone for the machine learning model as described above. In this example, since the convolutional neural network model is capable of extracting image features, the image information of the at least one video frame as described above may be pixel values of the at least one video frame. In implementations, training samples with image information of respective video frames of a plurality of videos and corresponding bit rates of the respective video frames of the plurality of videos (or respective target bit rates) as inputs, and respective known optimal resolutions (or resampling ratios) as outputs, may be used for training the neural network model. These training samples may be obtained by a brute force approach or from a third-party database. In implementations, parameters (such as connection weights between nodes of same and different layers, biases, etc.) of the neural network model are learned and determined using a subset of the training samples based on a particular optimization or training algorithm, such as a gradient descent method, a conjugate gradient method, a Quasi-Newton method, etc. After training, the neural network model may be tested and validated using another subset of the training samples. If an accuracy of recognition is less than a predetermined threshold, the neural network model may be retrained until the accuracy of recognition is greater than or equal to the predetermined threshold.
Furthermore, the neural network model may have a different number of convolutional layers, and/or a different number of feature maps in each layer, depending on the desired complexity of the neural network model, the desired accuracy of recognition, and/or the desired speed of computation for determining a recommended resolution (or resampling ratio) , etc. For example, if a recommended resolution or resampling ratio is needed for videos of a higher resolution, a higher number of features may exist in video frames of a video of the higher resolution, and so a higher number of feature maps in each layer may be desirable.
Conclusion
Although implementations have been described in language specific to structural features and/or methodological acts, it is to be understood that the claims are not necessarily limited to the specific features or acts described. Rather, the specific features and acts are disclosed as exemplary forms of implementing the claimed subject matter. Additionally or alternatively, some or all of the operations may be implemented by one or more ASICs, FPGAs, or other hardware.
The present disclosure can be further understood using the following clauses.
Clause 1: A method implemented by a computing device, the method comprising: determining at least one video frame and a corresponding bit rate of the at least one video frame; inputting the at least one video frame and the corresponding bit rate into a machine learning model to obtain a recommended resolution; resizing the at least one video frame and one or more other video frames associated with the at least one video frame according to the recommended resolution; and encoding the at least one video frame and the one or more other video frames to obtain an encoded video according to a target bit rate after the resizing.
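The steps of Clause 1 can be sketched end to end as follows. The functions `recommend_resolution`, `resize`, and `encode` are illustrative stand-ins (a real system would use the trained CNN and a video codec), and the bit-rate thresholds are assumptions chosen only to make the example concrete.

```python
# Hypothetical sketch of the Clause 1 pipeline: determine a frame and bit
# rate, obtain a recommended resolution from a (stubbed) model, resize the
# frame and its associated frames, then encode at the target bit rate.

def recommend_resolution(key_frame, bit_rate_kbps):
    """Stand-in for the machine learning model: frame + bit rate in,
    recommended resolution out. This stub keys on bit rate only."""
    for floor_kbps, resolution in [(4000, (1920, 1080)),
                                   (1500, (1280, 720)),
                                   (0, (640, 360))]:
        if bit_rate_kbps >= floor_kbps:
            return resolution

def resize(frame, resolution):
    # Stub: a real resizer would resample the pixel data.
    return {"pixels": frame["pixels"], "resolution": resolution}

def encode(frames, target_bit_rate_kbps):
    # Stub: a real encoder would produce a compressed bitstream.
    return {"frames": frames, "bit_rate_kbps": target_bit_rate_kbps}

def adaptive_resolution_encode(key_frame, other_frames, target_bit_rate_kbps):
    resolution = recommend_resolution(key_frame, target_bit_rate_kbps)
    resized = [resize(f, resolution) for f in [key_frame] + other_frames]
    return encode(resized, target_bit_rate_kbps), resolution

frames = [{"pixels": b"..."} for _ in range(4)]
encoded, resolution = adaptive_resolution_encode(frames[0], frames[1:], 2000)
# resolution -> (1280, 720) under the toy thresholds
```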
Clause 2: The method of Clause 1, further comprising: sending the encoded video to a receiving computing device via a network.
Clause 3: The method of Clause 2, further comprising: sending information of a resizing filter for resizing the at least one video frame and the one or more other video frames to the receiving computing device via the network, the  information of the resizing filter enabling the receiving computing device to undo the resizing using a corresponding reversing filter.
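The filter-signaling round trip of Clause 3 can be made concrete with a deliberately simple filter pair. The 2x pixel-drop resizer and pixel-replicate reverser below are illustrative assumptions only; a real codec would typically signal an interpolation filter, and the reversal is generally lossy.

```python
# Illustrative sketch of Clause 3: the sender transmits information about
# the resizing filter alongside the resized video, so the receiver can
# apply a corresponding reversing filter to undo the resizing.

def downscale_2x(frame):
    """Keep every other sample in both dimensions (the 'resizing filter')."""
    return [row[::2] for row in frame[::2]]

def upscale_2x(frame):
    """Replicate samples to undo the resize (the 'reversing filter')."""
    out = []
    for row in frame:
        wide = [p for p in row for _ in (0, 1)]
        out.append(wide)
        out.append(list(wide))
    return out

FILTERS = {"drop2x": (downscale_2x, upscale_2x)}

def send(frame, filter_id="drop2x"):
    resizer, _ = FILTERS[filter_id]
    # The filter identifier travels alongside the resized payload so the
    # receiver knows which reversing filter to apply.
    return {"payload": resizer(frame), "filter": filter_id}

def receive(packet):
    _, reverser = FILTERS[packet["filter"]]
    return reverser(packet["payload"])

frame = [[y * 10 + x for x in range(4)] for y in range(4)]
restored = receive(send(frame))
# restored regains the original 4x4 dimensions (though detail is lost)
```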
Clause 4: The method of Clause 1, wherein determining the at least one video frame and the bit rate of the at least one video frame comprises: encoding an input video comprising the at least one video frame according to the target bit rate; and extracting information of the at least one video frame and the corresponding bit rate from the encoded input video.
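The two-pass idea in Clause 4 can be sketched as follows: first encode the input video at the target bit rate, then read back which frame is the intra frame and how many bits it consumed. Here `first_pass_encode` is a stub standing in for a real encoder's per-frame statistics output; the frame-size ratios are assumptions for illustration.

```python
# Hypothetical sketch of Clause 4: encode the input video at the target
# bit rate, then extract the intra frame and its corresponding bit rate
# from the encoder's statistics.

def first_pass_encode(num_frames, target_bit_rate_kbps, fps=30):
    """Stub: assume the intra frame takes a large share of the bit budget."""
    bits_per_frame = target_bit_rate_kbps * 1000 / fps
    stats = [{"index": 0, "type": "I", "bits": bits_per_frame * 4}]
    for i in range(1, num_frames):
        stats.append({"index": i, "type": "P", "bits": bits_per_frame * 0.5})
    return stats

def extract_intra_frame_rate(stats, fps=30):
    """Find the intra frame and express its cost as an equivalent kbps."""
    intra = next(s for s in stats if s["type"] == "I")
    return intra["index"], intra["bits"] * fps / 1000

stats = first_pass_encode(num_frames=30, target_bit_rate_kbps=3000)
index, intra_kbps = extract_intra_frame_rate(stats)
# index -> 0; intra_kbps -> 12000.0 (4x the per-frame average, as kbps)
```

The extracted frame and bit rate would then be the inputs to the machine learning model of Clause 1.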
Clause 5: The method of Clause 4, wherein the input video comprises a subset of a streaming video or a stored video.
Clause 6: The method of Clause 1, wherein the at least one frame comprises an intra frame, and the one or more other video frames comprise one or more inter frames that are encoded based on the at least one frame.
Clause 7: The method of Clause 1, wherein the machine learning model is configured to receive video frames of a particular resolution and determine a corresponding resolution for resizing a video comprising the video frames of the particular resolution for transmission at a designated bit rate.
Clause 8: One or more processor-readable media storing executable instructions that, when executed by one or more processors, cause the one or more processors to perform acts comprising: determining at least one video frame and a corresponding bit rate of the at least one video frame; inputting the at least one video frame and the corresponding bit rate into a machine learning model to obtain a recommended resolution; resizing the at least one video frame and one or more other video frames associated with the at least one video frame according to the recommended resolution; and encoding the at least one video frame and the one or more other video frames to obtain an encoded video according to a target bit rate after the resizing.
Clause 9: The one or more processor-readable media of Clause 8, the acts further comprising: sending the encoded video to a receiving computing device via a network.
Clause 10: The one or more processor-readable media of Clause 9, the acts further comprising: sending information of a resizing filter for resizing the at least one video frame and the one or more other video frames to the receiving computing device via the network, the information of the resizing filter enabling the receiving computing device to undo the resizing using a corresponding reversing filter.
Clause 11: The one or more processor-readable media of Clause 8, wherein determining the at least one video frame and the bit rate of the at least one video frame comprises: encoding an input video comprising the at least one video frame according to the target bit rate; and extracting information of the at least one video frame and the corresponding bit rate from the encoded input video.
Clause 12: The one or more processor-readable media of Clause 11, wherein the input video comprises a subset of a streaming video or a stored video.
Clause 13: The one or more processor-readable media of Clause 8, wherein the at least one frame comprises an intra frame, and the one or more other video frames comprise one or more inter frames that are encoded based on the at least one frame.
Clause 14: The one or more processor-readable media of Clause 8, wherein the machine learning model is configured to receive video frames of a particular resolution and determine a corresponding resolution for resizing a video comprising the video frames of the particular resolution for transmission at a designated bit rate.
Clause 15: A system comprising: one or more processors; and memory storing executable instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising: determining at least one video frame and a corresponding bit rate of the at least one video frame; inputting the at least one video frame and the corresponding bit rate into a machine learning model to obtain a recommended resolution; resizing the at least one video frame and one or more other video frames associated with the at least one video frame according to the recommended resolution; and encoding the at least one video frame and the one or more other video frames to obtain an encoded video according to a target bit rate after the resizing.
Clause 16: The system of Clause 15, the acts further comprising: sending the encoded video to a receiving computing device via a network.
Clause 17: The system of Clause 16, the acts further comprising: sending information of a resizing filter for resizing the at least one video frame and the one or more other video frames to the receiving computing device via the network, the information of the resizing filter enabling the receiving computing device to undo the resizing using a corresponding reversing filter.
Clause 18: The system of Clause 15, wherein determining the at least one video frame and the bit rate of the at least one video frame comprises: encoding an input video comprising the at least one video frame according to the target bit rate; and extracting information of the at least one video frame and the corresponding bit rate from the encoded input video.
Clause 19: The system of Clause 15, wherein the at least one frame comprises an intra frame, and the one or more other video frames comprise one or more inter frames that are encoded based on the at least one frame.
Clause 20: The system of Clause 15, wherein the machine learning model is configured to receive video frames of a particular resolution and determine a corresponding resolution for resizing a video comprising the video frames of the particular resolution for transmission at a designated bit rate.

Claims (20)

  1. A method implemented by a computing device, the method comprising:
    determining at least one video frame and a corresponding bit rate of the at least one video frame;
    inputting the at least one video frame and the corresponding bit rate into a machine learning model to obtain a recommended resolution;
    resizing the at least one video frame and one or more other video frames associated with the at least one video frame according to the recommended resolution; and
    encoding the at least one video frame and the one or more other video frames to obtain an encoded video according to a target bit rate after the resizing.
  2. The method of claim 1, further comprising sending the encoded video to a receiving computing device via a network.
  3. The method of claim 2, further comprising sending information of a resizing filter for resizing the at least one video frame and the one or more other video frames to the receiving computing device via the network, the information of the resizing filter enabling the receiving computing device to undo the resizing using a corresponding reversing filter.
  4. The method of claim 1, wherein determining the at least one video frame and the bit rate of the at least one video frame comprises:
    encoding an input video comprising the at least one video frame according to the target bit rate; and
    extracting information of the at least one video frame and the corresponding bit rate from the encoded input video.
  5. The method of claim 4, wherein the input video comprises a subset of a streaming video or a stored video.
  6. The method of claim 1, wherein the at least one frame comprises an intra frame, and the one or more other video frames comprise one or more inter frames that are encoded based on the at least one frame.
  7. The method of claim 1, wherein the machine learning model is configured to receive video frames of a particular resolution and determine a corresponding resolution for resizing a video comprising the video frames of the particular resolution for transmission at a designated bit rate.
  8. One or more processor-readable media storing executable instructions that, when executed by one or more processors, cause the one or more processors to perform acts comprising:
    determining at least one video frame and a corresponding bit rate of the at least one video frame;
    inputting the at least one video frame and the corresponding bit rate into a machine learning model to obtain a recommended resolution;
    resizing the at least one video frame and one or more other video frames associated with the at least one video frame according to the recommended resolution; and
    encoding the at least one video frame and the one or more other video frames to obtain an encoded video according to a target bit rate after the resizing.
  9. The one or more processor-readable media of claim 8, the acts further comprising sending the encoded video to a receiving computing device via a network.
  10. The one or more processor-readable media of claim 9, the acts further comprising sending information of a resizing filter for resizing the at least one video frame and the one or more other video frames to the receiving computing device via the network, the information of the resizing filter enabling the receiving computing device to undo the resizing using a corresponding reversing filter.
  11. The one or more processor-readable media of claim 8, wherein determining the at least one video frame and the bit rate of the at least one video frame comprises:
    encoding an input video comprising the at least one video frame according to the target bit rate; and
    extracting information of the at least one video frame and the corresponding bit rate from the encoded input video.
  12. The one or more processor-readable media of claim 11, wherein the input video comprises a subset of a streaming video or a stored video.
  13. The one or more processor-readable media of claim 8, wherein the at least one frame comprises an intra frame, and the one or more other video frames comprise one or more inter frames that are encoded based on the at least one frame.
  14. The one or more processor-readable media of claim 8, wherein the machine learning model is configured to receive video frames of a particular resolution and determine a corresponding resolution for resizing a video comprising the video frames of the particular resolution for transmission at a designated bit rate.
  15. A system comprising:
    one or more processors; and
    memory storing executable instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising:
    determining at least one video frame and a corresponding bit rate of the at least one video frame;
    inputting the at least one video frame and the corresponding bit rate into a machine learning model to obtain a recommended resolution;
    resizing the at least one video frame and one or more other video frames associated with the at least one video frame according to the recommended resolution; and
    encoding the at least one video frame and the one or more other video frames to obtain an encoded video according to a target bit rate after the resizing.
  16. The system of claim 15, the acts further comprising sending the encoded video to a receiving computing device via a network.
  17. The system of claim 16, the acts further comprising sending information of a resizing filter for resizing the at least one video frame and the one or more other video frames to the receiving computing device via the network, the information of the resizing filter enabling the receiving computing device to undo the resizing using a corresponding reversing filter.
  18. The system of claim 15, wherein determining the at least one video frame and the bit rate of the at least one video frame comprises:
    encoding an input video comprising the at least one video frame according to the target bit rate; and
    extracting information of the at least one video frame and the corresponding bit rate from the encoded input video.
  19. The system of claim 15, wherein the at least one frame comprises an intra frame, and the one or more other video frames comprise one or more inter frames that are encoded based on the at least one frame.
  20. The system of claim 15, wherein the machine learning model is configured to receive video frames of a particular resolution and determine a corresponding resolution for resizing a video comprising the video frames of the particular resolution for transmission at a designated bit rate.
PCT/CN2019/111598 2019-10-17 2019-10-17 Adaptive resolution coding based on machine learning model WO2021072694A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/111598 WO2021072694A1 (en) 2019-10-17 2019-10-17 Adaptive resolution coding based on machine learning model


Publications (1)

Publication Number Publication Date
WO2021072694A1 (en) 2021-04-22

Family

ID=75537626


Country Status (1)

Country Link
WO (1) WO2021072694A1 (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2736261A1 (en) * 2012-11-27 2014-05-28 Alcatel Lucent Method For Assessing The Quality Of A Video Stream
US20180063549A1 (en) * 2016-08-24 2018-03-01 Ati Technologies Ulc System and method for dynamically changing resolution based on content
CN109218727A (en) * 2017-06-30 2019-01-15 华为软件技术有限公司 The method and apparatus of video processing
US20190075301A1 (en) * 2017-09-01 2019-03-07 Apple Inc. Machine learning video processing systems and methods
US20190132591A1 (en) * 2017-10-26 2019-05-02 Intel Corporation Deep learning based quantization parameter estimation for video encoding


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113452996A (en) * 2021-06-08 2021-09-28 杭州朗和科技有限公司 Video coding and decoding method and device
CN113452996B (en) * 2021-06-08 2024-04-19 杭州网易智企科技有限公司 Video coding and decoding method and device
CN115190309A (en) * 2022-06-30 2022-10-14 北京百度网讯科技有限公司 Video frame processing method, training method, device, equipment and storage medium


Legal Events

Code Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 19949242; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 19949242; Country of ref document: EP; Kind code of ref document: A1)