US20140269910A1 - Method and apparatus for user guided pre-filtering - Google Patents

Method and apparatus for user guided pre-filtering

Info

Publication number
US20140269910A1
Authority
US
United States
Prior art keywords
user
filter
parameters
video
video content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US13/840,600
Inventor
Sek Chai
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SRI International Inc
Original Assignee
SRI International Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SRI International Inc filed Critical SRI International Inc
Priority to US13/840,600 priority Critical patent/US20140269910A1/en
Assigned to SRI INTERNATIONAL reassignment SRI INTERNATIONAL ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: CHAI, SEK
Publication of US20140269910A1 publication Critical patent/US20140269910A1/en
Abandoned legal-status Critical Current

Classifications

    • H04N19/00066
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/117Filters, e.g. for pre-processing or post-processing
    • H04N19/0089
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/162User input
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/167Position within a video image, e.g. region of interest [ROI]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A method and apparatus for user guided pre-filtering of video content comprising modifying one or more parameters of a pre-filter coupled to a video encoder based on feedback from a user of a device displaying the video content, applying the pre-filter to video content based on the modified parameters and encoding the pre-filtered video content for transmission over a network to display on the device.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • The present invention is related to commonly assigned and co-pending U.S. patent application Ser. No. ______ (attorney docket number SRI6628). The aforementioned patent application is herein incorporated in its entirety by reference.
  • BACKGROUND OF THE INVENTION
  • 1. Field of the Invention
  • Embodiments of the present invention generally relate to salience based compression and video transmission and, more particularly, to a method and apparatus for user guided pre-filtering.
  • 2. Description of the Related Art
  • Technologies such as vision guided compression (VGC) and salience based compression (SBC) are often used to compress video content and reduce its bit rate, lowering network bandwidth requirements by preserving important, actionable detail in salient regions at the cost of discarding "unimportant" detail in non-salient regions. However, standard VGC/SBC methods do not address a network's variable bandwidth or the delivery of actionable video over very low bandwidth networks, so video streaming may be interrupted or distorted. Current VGC/SBC implementations also do not address human reception of pre-filtered video. For example, pre-filtered video destined for human viewing does not allow for human interaction and feedback on the video content to affect encoding and pre-filtering parameters.
  • Therefore, there is a need in the art for a method and apparatus for user guided pre-filtering to perform video encoding for low and variable bandwidth networks.
  • SUMMARY OF THE INVENTION
  • An apparatus and/or method for user guided pre-filtering of video content, substantially as shown in and/or described in connection with at least one of the figures, as set forth more completely in the claims.
  • Various advantages, aspects and features of the present disclosure, as well as details of an illustrated embodiment thereof, are more fully understood from the following description and drawings.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • So that the manner in which the above recited features of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments.
  • FIG. 1 depicts a functional block diagram of an adaptive filter module in accordance with exemplary embodiments of the present invention;
  • FIG. 2 is an illustration of the impact of the adaptive filter module on a sample frame of video content in accordance with an exemplary embodiment of the present invention;
  • FIG. 3 is an illustration of the result of the pixel propagation module in accordance with exemplary embodiments of the present invention;
  • FIG. 4 depicts a computer in accordance with at least one embodiment of the present invention;
  • FIG. 5 depicts a flow diagram of a method for modifying bit-rate of video content in accordance with embodiments of the present invention; and
  • FIG. 6 depicts a flow diagram of a method for modifying bit-rate of video content in accordance with embodiments of the present invention.
  • DETAILED DESCRIPTION
  • Embodiments of the present invention generally relate to vision and network guided pre-filtering. According to one embodiment, an encoder encodes video for transmission over a network and a decoder receives the video and decodes it for display, storage or the like. As a user of a mobile device, tablet device, or the like views the video content, the user provides dynamic feedback that modifies pre-filter parameters and thereby affects the pre-filter processing.
  • FIG. 1 depicts a functional block diagram of an adaptive filter module 100 in accordance with exemplary embodiments of the present invention. An image sensor 102 senses and captures video or images of a scene (not shown). The video or image content can also optionally be stored in an image and video database 103, or stored in another form of external or internal storage. The image sensor 102, for example, records the video at a particular bit-rate, in formats such as MPEG-1, MPEG-2 (H.262), MPEG-4 AVC (H.264), HEVC (H.265), or the like. The originally captured frames may be in high definition (HD) or standard definition (SD), where even standard definition frames of a video may be several megabytes in size. HD frames are significantly larger, occupy more storage space, and require more bandwidth to transmit.
  • For example, for a video composed of SD frames, an acceptable target bit-rate may be 1-5 Mbps, whereas an HD video stream may require a network capable of 10-18 Mbps to deliver the stream at its desired clarity. For commonly used networks such as network 101, such large bandwidth requirements are impractical; therefore, a vision processor 104 is embedded between the image sensor 102 and a video encoder 106. Typical networks may include RF channels with a bandwidth of approximately 20 Megabits per second (Mbps), IP networks with a bandwidth of approximately 0.1 to 5 Mbps, and the like.
  • The vision processor 104 further comprises a pre-filter 105. The vision processor 104 applies vision guided compression (VGC) and salience based compression (SBC) to the video content in order to reduce the bit-rate and compress the video content to a manageable size without losing important details. The vision pre-filter 105 performs salience based blurring or other functions on the video content. For example, if the video content contains two moving objects on a background, the moving objects are detected and regarded as salient, and the rest of the background is considered non-salient.
  • The non-salient regions are then blurred or filtered, by various filters such as a Gaussian filter, a boxcar filter, a pillbox filter, or the like, removing a significant amount of unimportant detail that would otherwise have to be encoded. For further detail regarding SBC and VGC, please see U.S. patent application Ser. No. 12/644,707 entitled "High-Quality Region-Of-Interest Compression using commercial Off-The-Shelf encoders", filed on Dec. 22, 2009, hereby incorporated by reference in its entirety.
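  • As a rough sketch of the salience-based blurring just described, the snippet below blurs only the non-salient pixels of a frame given a binary salience mask; the use of scipy, the function names and the default kernel sizes are illustrative assumptions, not part of the patent, and a pillbox kernel could be substituted by convolving with a disk-shaped kernel instead.

```python
import numpy as np
from scipy.ndimage import uniform_filter, gaussian_filter

def prefilter_frame(frame, salience_mask, filter_type="boxcar", size=9, sigma=3.0):
    """Blur non-salient pixels of a grayscale frame; keep salient pixels sharp.

    frame         : 2-D float array (H x W)
    salience_mask : 2-D bool array, True where a pixel is salient
    """
    if filter_type == "boxcar":
        blurred = uniform_filter(frame, size=size)      # boxcar (moving-average) low-pass filter
    elif filter_type == "gaussian":
        blurred = gaussian_filter(frame, sigma=sigma)   # Gaussian low-pass filter
    else:
        raise ValueError("unsupported filter type")
    # Salient pixels keep their original values; non-salient pixels are blurred.
    return np.where(salience_mask, frame, blurred)
```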
  • The video encoder 106 encodes the compressed video content using the codecs mentioned above, such as MPEG-2 or MPEG-4, or the like. The video encoder 106 further comprises a pre-filter 107 which, unlike the vision processor 104, performs pixel-by-pixel filtering and does not take spatial attributes of the video content into account. The video encoder 106 is a standard off-the-shelf video encoder. The video encoder encodes the video in order to transmit the video at a particular bit-rate over the network 101.
  • In order for the video content to be viewed, it must first be decoded by the video decoder 108. As with the video encoder 106, the video decoder 108 is a standard off-the-shelf video decoder capable of decoding standard video formats such as MPEG-1 through MPEG-4. Once the decoder decodes the video content, the content is streamed or transmitted to a display 110, or to a storage database 112. According to other embodiments, the video decoder 108 can deliver the video content to any end-user device such as a tablet, a mobile phone, a television, or the like. The display 110 is coupled to a user interface 114, which allows a user of the display 110 to provide dynamic feedback to the adaptive filter module 100. The user interface 114 may be displayed on a touch-based mobile device, for example, a smart-phone or tablet, in which the region of interest drives the inset location or adds a new feature to be tracked and kept salient in the vision processor 104 based on, for example, where a user's touch input is detected.
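  • As a rough illustration of the touch-driven feedback described above, the sketch below turns a detected touch location into a rectangular salience mask that could be fed back to the vision processor; the fixed inset size and the function name are assumptions for illustration only.

```python
import numpy as np

def touch_to_mask(touch_xy, frame_shape, inset=64):
    """Build a binary salience mask centered on a user's touch point.

    touch_xy    : (x, y) pixel coordinates of the touch on the displayed frame
    frame_shape : (height, width) of the video frame
    inset       : half-size, in pixels, of the square region kept salient (assumed value)
    """
    h, w = frame_shape
    x, y = touch_xy
    mask = np.zeros((h, w), dtype=bool)
    x0, x1 = max(0, x - inset), min(w, x + inset)
    y0, y1 = max(0, y - inset), min(h, y + inset)
    mask[y0:y1, x0:x1] = True   # region of interest stays unfiltered
    return mask

# e.g. a touch near the center of a 640x480 frame:
mask = touch_to_mask((320, 240), (480, 640))
```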
  • In other instances, the user interface 114 is a vision-based system, where the image sensor 102 captures images in a location remote from the vision processor 104. The vision processor 104 is used to process the captured images and generate "mask" information for the vision processor 104, i.e., masking the areas to be filtered. The user interface 114, according to this embodiment, includes a latency adjustment module 115 that uses network traffic information to alter the mask information. For example, if there is high latency through the network 101, the mask information may take additional time to reach the vision processor 104. In this case, the latency adjustment module 115 can process predictive mask information. For example, if the vision processing tracks the motion of the user's gaze in order to generate the mask information, the latency adjustment module 115 can include a prediction of where the gaze will be a few moments later.
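  • The predictive masking described above can be approximated by extrapolating the gaze trajectory forward by the measured network delay; the constant-velocity assumption below is an illustrative choice, not something specified by the patent.

```python
def predict_gaze(prev_xy, curr_xy, dt, latency_s):
    """Linearly extrapolate the gaze position `latency_s` seconds ahead.

    prev_xy, curr_xy : gaze coordinates from the last two samples
    dt               : time between the two samples, in seconds
    latency_s        : measured network latency to the vision processor, in seconds
    """
    vx = (curr_xy[0] - prev_xy[0]) / dt
    vy = (curr_xy[1] - prev_xy[1]) / dt
    # The predicted point would drive the mask that is sent upstream.
    return (curr_xy[0] + vx * latency_s, curr_xy[1] + vy * latency_s)
```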
  • According to some embodiments, the user interface is coupled to a gaze tracking system where a user's face is tracked, and gaze location is determined. If a user is looking at a top corner of the image on the display, the location information (top corner) is used to generate mask information that affects the vision processor 104. The vision processor 104 would pre-filter the source image whereby the top-corner location is considered salient. Changes in the user's gaze would change the location of a salient inset location.
  • According to another embodiment, the user interface 114 is coupled to a remote system comprising a two-way video conferencing system including a camera in a conference area that also operates as a gaze tracker. The gaze tracker provides feedback to the vision processor 104 via the adaptive filter module 100. A user in such a system looking at a particular object in the conference area will see unfiltered regions where the user's gaze is focused. Thus, a video source other than the image sensor 102 is used to supply the image from which the mask is generated to affect the pre-filter 105 of the vision processor 104 through the adaptive filter module 100. In one embodiment, the sensor modality of the video source used to drive the mask generation may differ from that of the image sensor 102, such as an infrared (IR) sensor modality.
  • In another embodiment, the user interface 114 is coupled to an iris recognition system or face recognition system, in which the identity of a viewer is determined. The user interface 114 pre-filters a selected region of the image because the user's attention is not directed at that particular area of that image. In another embodiment, the user interface 114 can pre-filter a selected region of the image because the user might not have access to particular information. For example, the selected region of the image can be pre-filtered to conceal the identity of an object or person. The selected region of the image can also be pre-filtered to conceal objects, such as signs and the like, to conceal information that reveals the location of the image. In a related embodiment, multiple users are detected at different remote locations, and different salient regions can be selected for different users. In yet another embodiment, multiple users are detected in the same remote location, and salient requirements are set based on viewing angle. For example, in a 3D HDTV display, different salient regions can affect the viewing experience based on the pre-filtering of video content.
  • According to other embodiments, the user interface 114 controls the feedback to incorporate the duration of a user's gaze, affecting the level and type of pre-filtering over time. In other instances, the user interface 114 modifies the feedback to the adaptive filter module 100 based on the expression of a user of the user interface 114 to select salient regions. User movement is also incorporated into the feedback of the user interface 114 to affect parameters of the vision processor 104 and the pre-filter 105. For example, if a user is in motion, a higher level of pre-filtering is used (i.e., higher blur) because the user would be unable to perceive the difference in levels of pre-filtering due to his or her motion.
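  • One way to realize the motion-dependent pre-filtering described above is to map the magnitude of the user's motion to the blur kernel size used for non-salient pixels; the thresholds below are illustrative assumptions.

```python
def blur_size_for_user_motion(speed_px_per_s):
    """Pick a boxcar size: the faster the viewer moves, the stronger the blur
    can be without a perceptible loss in quality (threshold values assumed)."""
    if speed_px_per_s < 10:
        return 3    # nearly still viewer: light filtering
    elif speed_px_per_s < 50:
        return 7    # moderate motion
    else:
        return 11   # fast motion: aggressive filtering goes unnoticed
```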
  • In a closed network, a feedback path is present between the user interface 114 and the vision processor 104, as well as between the video decoder 108 and the vision processor 104. The video decoder 108 receives information about network bandwidth changes, vision and gaze changes, user movement, and the like and couples with the adaptive filter module 100 to send a message to the vision processor 104 concerning modifying the parameters of the pre-filter 105.
  • The adaptive filter module 100 then determines how the vision processor 104 and the pre-filter 105 will be modified to increase or decrease the bit-rate depending on the user feedback from the user interface 114. The adaptive filter module 100 may, according to one embodiment, request that the pre-filter 105 modify the type of filter being applied, for example, a boxcar, a Gaussian filter or a pillbox filter. According to other embodiments, the filter size is modified. For example, a smaller or larger region is filtered according to salient region selection. According to another embodiment, the number of salient objects being filtered is modified according to location, size of objects, amount of motion, or the like. According to yet another embodiment, the adaptive filter module 100 requests that the vision processor 104 and the pre-filter 105 vary the rate at which the filter is applied to salient objects. The degree of low-pass filtering applied to non-salient pixels in a frame greatly affects the bit rate. For a given low-pass filter shape, the degree of filtering increases with filter size.
  • For example, a boxcar filter applied to video processed with a binary salience map drastically reduces the bit-rate as the filter increases in size. For example, a 640×480 pixel video running at 30 frames per second is filtered with a boxcar filter and encoded in "constant quality" mode using H.264/MPEG-4 AVC video compression. In constant quality mode, the quantization parameter (QP) stays fixed, and bits are produced in proportion to the underlying entropy of the video signal. As QP increases, more transform coefficients are quantized to zero, and fewer coded bits per image block are produced. Major drops in bit rate, independent of QP, occur as the boxcar size increases from 1×1 to 5×5, with diminishing returns thereafter. Boxcar sizes larger than 9×9 show almost no additional drop in bit rate. The resulting bit rate is approximated as a weighted average of the two extreme bit rates produced when all pixels are filtered by each of the filters individually:

  • BR=W*BRmax+(1−W)*BRmin  (1)
  • where BRmax is the bit rate produced by filtering all pixels with the salient, or "inside", filter; BRmin is the bit rate produced by filtering all pixels with the non-salient, or "outside", filter; and W, the weighting parameter, is equal to the fraction of salient pixels in the frame. In this example, when video is filtered with a 1×1 boxcar (i.e., is not filtered at all) and encoded in constant quality mode with QP=20, the resulting bit rate is BRmax=8 Mbps. When the same video is filtered with an 11×11 boxcar and encoded in constant quality mode with QP=20, the resulting bit rate is BRmin=1 Mbps. When the fraction of salient pixels in the frame is 10% (W=0.1), the resulting bit rate is approximately BR=0.1*8+0.9*1=1.7 Mbps, a point that is plotted on the dashed line. As W approaches 1.0, BR approaches BRmax; as W approaches 0.0, BR approaches BRmin.
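  • The weighted-average model of equation (1), and the worked example above, can be reproduced directly:

```python
def approx_bit_rate(w_salient, br_max_mbps, br_min_mbps):
    """Equation (1): BR = W*BRmax + (1 - W)*BRmin."""
    return w_salient * br_max_mbps + (1.0 - w_salient) * br_min_mbps

# Values from the example above: BRmax = 8 Mbps (1x1 boxcar, QP = 20),
# BRmin = 1 Mbps (11x11 boxcar, QP = 20), and 10% salient pixels.
print(approx_bit_rate(0.1, 8.0, 1.0))   # -> 1.7 (Mbps)
```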
  • Accordingly, increasing the filter size lowers the bit rate. For instance, if the channel bit rate is 3 Mbps, a 3×3 boxcar filter is used; however, if the channel bit rate drops to 1 Mbps, an 11×11 boxcar filter is selected. Doing so increases the blur of the non-salient pixels but minimally affects the quality of the salient pixels.
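  • A simple way to act on this rule of thumb is a pre-computed lookup from the available channel bit rate to a boxcar size; only the 3 Mbps / 3×3 and 1 Mbps / 11×11 pairs come from the text, while the intermediate entries are assumptions.

```python
# (minimum channel rate in Mbps, boxcar size), ordered from best channel to worst
CHANNEL_TO_BOXCAR = [(3.0, 3), (2.0, 5), (1.5, 7), (1.0, 11)]

def boxcar_for_channel(rate_mbps):
    """Return the boxcar size to use for the measured channel bit rate."""
    for min_rate, size in CHANNEL_TO_BOXCAR:
        if rate_mbps >= min_rate:
            return size
    return 11   # very constrained channel: use the largest filter

print(boxcar_for_channel(3.0))   # -> 3
print(boxcar_for_channel(1.0))   # -> 11
```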
  • Generally speaking, the bit rate versus filter size curve is modeled with the following exponential function:

  • r(s) = a·exp(−b·s) + c  (2)
  • where r is the rate in bits per second (bps), s is the filter size (in pixels), and a, b, and c are known, non-negative, measured constants that are a function of image format and content. For a two-level salience map, the rate R produced by filtering some non-negative fraction α1 of the pixels with size s1 and the complementary non-negative fraction α2=1−α1 with size s2 is given by:

  • R = α1·r(s1) + α2·r(s2) = [α1·a·exp(−b·s1) + c] + [α2·a·exp(−b·s2) + c]  (3)
  • We know R, α1, α2, a, b and c, so the equation reduces to

  • C = α1·x1 + α2·x2  (4)
  • where C = (R−2c)/a and xi = αi·exp(−b·si) for i = 1, 2. This is a linear equation in x1 and x2, so any two values satisfying the equation can be picked. Once they are picked, the filter sizes are obtained as follows:

  • si = −ln(xi/αi)/b for i = 1, 2  (5)
  • Although this is for the two-level saliency case (N=2), it is easy to generalize this method to the N-level saliency case, where N>2. Filter sizes and filter kernels can either be generated adaptively or pre-computed and stored in a look-up table stored in the adaptive filter module 100. According to an exemplary embodiment, filter sizes increase as network bandwidth decreases, and less filtering is done in salient regions compared to non-salient regions.
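  • Equations (2) through (5) can be turned into a small routine for the two-level case: fix the salient filter size, then solve for the non-salient size that hits a target rate. Fixing s1 is one possible way of picking the free values x1 and x2, and the constants in the usage line are assumptions (a, b and c must be measured for the actual content).

```python
import math

def rate(s, a, b, c):
    """Equation (2): r(s) = a*exp(-b*s) + c."""
    return a * math.exp(-b * s) + c

def filter_sizes_for_target(R, alpha1, s1, a, b, c):
    """Solve equations (4)-(5) for the non-salient filter size s2, given a fixed
    salient filter size s1, a target rate R and salient pixel fraction alpha1."""
    alpha2 = 1.0 - alpha1
    C = (R - 2.0 * c) / a                    # C as defined after equation (4)
    x1 = alpha1 * math.exp(-b * s1)          # x_i = alpha_i * exp(-b * s_i)
    x2 = (C - alpha1 * x1) / alpha2          # from C = alpha1*x1 + alpha2*x2
    if x2 <= 0.0:
        raise ValueError("target rate not reachable with this salient filter size")
    s2 = -math.log(x2 / alpha2) / b          # equation (5)
    return s1, s2

# Illustrative constants only; real values depend on image format and content.
print(filter_sizes_for_target(R=2.0e6, alpha1=0.1, s1=1, a=8.0e6, b=0.4, c=0.5e6))
```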
  • According to other embodiments, the adaptive filter module 100 may also comprise a pixel propagation module 116, which may be directly coupled with the image sensor 102, the image and video database 103, the vision processor 104 and the video encoder 106. In some instances, the pixel propagation module 116 can be used independently of the adaptive filter module 100.
  • According to one embodiment, the pixel propagation module 116 receives video content from the image sensor 102, for example, and analyzes frame-to-frame movement in the captured video content. In scenes where the sensor 102 view is relatively fixed, but there is some movement of the sensor 102, video stabilization is initially performed in order to align the frames in the video content. Once the frames are aligned, the pixel propagation module 116 analyzes frame-to-frame pixel differences in the video content and determines that pixels which remain static are "non-salient" in the sense that they do not need to be represented in each frame.
  • The pixel propagation module 116 then propagates the pixels found in the initial frame to the other frames which share an overlapping view of the initial frame. The vision processor 104 or the video encoder 106 then performs compression directly on the video content and achieves a high compression ratio, because each frame is essentially composed of the same pixels, excluding any moving-object pixels. The highly compressed video content can then be encoded at a significantly lower bit-rate and can therefore be transmitted over low bandwidth networks. The video is decoded by video decoder 108 and displayed on display 110 with most of the background remaining static while only foreground, or salient, objects are in motion.
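  • A minimal sketch of the pixel-propagation idea, assuming the frames have already been stabilized and that a fixed per-pixel difference threshold is an acceptable test for "static" (the threshold value and grayscale layout are assumptions):

```python
import numpy as np

def propagate_static_pixels(frames, threshold=4.0):
    """Copy pixels that stay static relative to the first frame into later frames.

    frames    : list of aligned 2-D float arrays (grayscale, same shape)
    threshold : absolute per-pixel difference below which a pixel counts as static
    Returns frames in which static (non-salient) pixels are identical across the
    sequence, so a downstream encoder spends almost no bits on them.
    """
    reference = frames[0]
    out = [reference.copy()]
    for frame in frames[1:]:
        static = np.abs(frame - reference) < threshold   # pixels that did not change
        out.append(np.where(static, reference, frame))   # reuse the reference pixels
    return out
```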
  • FIG. 2 is an illustration of the impact of the adaptive filter module 100 on a sample frame of video content in accordance with an exemplary embodiment of the present invention. Illustration 200 depicts the typical scenario where an image frame 202 comprises a torso 206, a head 208 and a background 210. The vision processor 104 is applied to the frame of the video content to produce a salience detected image where the torso 206 and the head 208 are selected as salient and the background 210 is selected as non-salient by a user of the user interface 114. The background 210 has had a filter applied to it, for example, a Gaussian blur, in order to reduce the amount of detail shown, whereas the torso 206 and the head 208 are maintained at their current fidelity or sharpened.
  • However, when the adaptive filter module 100 receives user feedback from the user interface 114 that salient regions have changed, the vision processor 104 behaves differently. According to this embodiment, illustration 207 shows a frame 201, identical to frame 202, being processed by the vision processor 104, but the output image 214 contains only one salient object: the head 208. The vision processor 104 has filtered the torso 206 and the background 210 by, according to one embodiment, reducing the number of salient objects produced by the vision processor 104 so that the only salient object is the head 208. In this embodiment, when the decoder decodes the video content and displays the frame 214 on a display, the body and background will be blurred and the foreground face 208 will be sharp.
  • FIG. 3 depicts computers 300 and 350 in accordance with at least one embodiment of the present invention for implementing the functional block diagram illustrated in FIG. 1. The computer 300 includes a processor 302, various support circuits 306, and memory 304. The processor 302 may include one or more microprocessors known in the art. The support circuits 306 for the processor 302 include conventional cache, power supplies, clock circuits, data registers, I/O interface 307, and the like. The I/O interface 307 may be directly coupled to the memory 304 or coupled through the supporting circuits 306. The I/O interface 307 may also be configured for communication with input devices and/or output devices 308 such as network devices, various storage devices, mouse, keyboard, display, video and audio sensors, IMU and the like.
  • The memory 304, or computer readable medium, stores non-transient processor-executable instructions and/or data that may be executed by and/or used by the processor 302. These processor-executable instructions may comprise firmware, software, and the like, or some combination thereof. Modules having processor-executable instructions that are stored in the memory 304 comprise a vision processing module 310, an adaptive filter module 314 and a pixel propagation module 316. The vision processing module 310 further comprises a pre-filter 312. According to some embodiments, the propagation module 316 may be a portion of the adaptive filter module 314.
  • The computer 300 may be programmed with one or more operating systems (generally referred to as operating system (OS)), which may include OS/2, Java Virtual Machine, Linux, SOLARIS, UNIX, HPUX, AIX, WINDOWS, WINDOWS 95, WINDOWS 98, WINDOWS NT, WINDOWS 2000, WINDOWS ME, WINDOWS XP, WINDOWS SERVER, WINDOWS 8, IOS, and ANDROID, among other known platforms. At least a portion of the operating system may be disposed in the memory 304.
  • The memory 304 may include one or more of the following: random access memory, read-only memory, magneto-resistive read/write memory, optical read/write memory, cache memory, magnetic read/write memory, and the like, as well as signal-bearing media as described below.
  • The computer 300 may be coupled to computer 350 for implementing the user interface 114. The computer 350 includes a processor 352, various support circuits 356, and memory 354. The processor 352 may include one or more microprocessors known in the art. The support circuits 356 for the processor 352 include conventional cache, power supplies, clock circuits, data registers, I/O interface 357, and the like. The I/O interface 357 may be directly coupled to the memory 354 or coupled through the supporting circuits 356. The I/O interface 357 may also be configured for communication with input devices and/or output devices (not specifically shown) such as network devices, various storage devices, mouse, keyboard, display, video and audio sensors, IMU and the like.
  • The memory 354, or computer readable medium, stores non-transient processor-executable instructions and/or data that may be executed by and/or used by the processor 352. These processor-executable instructions may comprise firmware, software, and the like, or some combination thereof. Modules having processor-executable instructions that are stored in the memory 354 comprise a user interface 360, which further comprises a latency adjustment module 362.
  • The computer 350 may be programmed with one or more operating systems (generally referred to as operating system (OS)), which may include OS/2, Java Virtual Machine, Linux, SOLARIS, UNIX, HPUX, AIX, WINDOWS, WINDOWS 95, WINDOWS 98, WINDOWS NT, WINDOWS 2000, WINDOWS ME, WINDOWS XP, WINDOWS SERVER, WINDOWS 8, IOS, and ANDROID, among other known platforms. At least a portion of the operating system may be disposed in the memory 354.
  • The memory 354 may include one or more of the following: random access memory, read-only memory, magneto-resistive read/write memory, optical read/write memory, cache memory, magnetic read/write memory, and the like, as well as signal-bearing media as described below.
  • FIG. 4 depicts a flow diagram of a method 400 for user guided pre-filtering of video content in accordance with embodiments of the present invention. The method 400 is an implementation of the user interface module 360 and the vision processing module 310 as executed by the processor 352 and the processor 302, respectively, as shown in FIG. 4.
  • The method begins at step 402 and proceeds to step 404. At step 404, the method receives feedback from a user interface coupled to the device displaying the video content. The feedback may be user initiated, or automatically detected by the device itself. For example, a user can indicate salient and non-salient regions in the video content by tactile interaction with the user interface or display device, or the device can track user gaze to determine salient and non-salient regions. The device may also monitor user motion as an indicator of attentiveness to create the feedback information. At step 406, one or more parameters of the pre-filter are modified based on the user feedback. At step 408, the pre-filter is applied to the video content, and the video content is encoded for transmission over the network to the display device. The method terminates at step 410.
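  • The steps of method 400 can be summarized as a small control loop; every name below (the feedback source, the parameter fields, the pre-filter and encoder callables) is a placeholder for whatever the deployed system provides, not an API defined by the patent.

```python
def user_guided_prefilter_loop(get_feedback, update_params, prefilter, encode, frames):
    """Steps 404-408: receive user feedback, modify pre-filter parameters,
    apply the pre-filter, and encode the result for transmission."""
    params = {"filter_type": "boxcar", "size": 3, "salient_objects": 1}
    for frame in frames:
        feedback = get_feedback()                      # step 404: touch, gaze, motion, ...
        if feedback is not None:
            params = update_params(params, feedback)   # step 406: modify parameters
        yield encode(prefilter(frame, params))         # step 408: pre-filter and encode
```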
  • Various elements, devices, modules and circuits are described above in association with their respective functions. These elements, devices, modules and circuits are considered means for performing their respective functions as described herein. While the foregoing is directed to embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.

Claims (20)

1. A method for user guided pre-filtering of video content comprising:
modifying one or more parameters of a pre-filter coupled to a video encoder based on feedback from a user of a device displaying the video content;
applying the pre-filter to video content based on the modified parameters; and
encoding the pre-filtered video content for transmission over a network to display on the device.
2. The method of claim 1 further comprising:
selecting salient and non-salient regions based on user gaze and modifying the parameters of the pre-filter based on the selection,
wherein the device comprises an image sensor for capturing the gaze of a user.
3. The method of claim 1 further comprising:
wherein a user interface of the device for providing the feedback is remotely provided relative to the applying of the pre-filter and encoding of the pre-filtered video.
4. The method of claim 1 further comprising:
detecting user movement and modifying the parameters of the pre-filter based on the magnitude of the user motion.
5. The method of claim 1 further comprising:
wherein the encoding the pre-filtered video is performed by a standard video encoder.
6. The method of claim 1 further comprising:
selecting one or more salient regions based on the location of multiple users of the device.
7. The method of claim 1 further comprising:
wherein the parameters of the pre-filter comprise at least one of filter type, filter size, number of salient objects, rate of filter application to the salient objects, saliency regions and bit-rate.
8. The method of claim 1 further comprising:
providing available modifiable parameters to the user on the user device; and
allowing modification of the parameters from the device.
9. The method of claim 1 further comprising:
increasing pre-filtering to a predetermined limit when bandwidth of the network decreases, so as to decrease a bit-rate of the video content; and
decreasing pre-filtering when bandwidth of the network increases, so as to decrease a bit-rate of the video content.
10. The method of claim 2 further comprising:
measuring duration of the user gaze; and
modifying the feedback and pre-filtering parameters based on the duration.
11. An apparatus for user guided pre-filtering of video content comprising:
a user interface, executed on a device, for modifying one or more parameters of a pre-filter coupled to a video encoder based on feedback from a user of the device displaying the video content;
a video processor for applying the pre-filter to video content based on the modified parameters; and
a video encoder for encoding the pre-filtered video content for transmission over a network to display on the device.
12. The apparatus of claim 11 further comprising:
selecting salient and non-salient regions based on user gaze and modifying the parameters of the pre-filter based on the selection,
wherein the device comprises an image sensor for capturing the gaze of a user.
13. The apparatus of claim 11, wherein a user interface of the device for providing the feedback is provided remotely relative to the applying of the pre-filter and encoding of the pre-filtered video.
14. The apparatus of claim 11 wherein the device is further configured for:
detecting user movement and modifying the parameters of the pre-filter based on the magnitude of the user motion.
15. The apparatus of claim 11 wherein encoding the pre-filtered video is performed by a standard video encoder.
16. The apparatus of claim 11 wherein the device is further configured for:
selecting various salient regions based on multiple users of the device.
17. The apparatus of claim 11 further comprising:
wherein the parameters comprise at least one of filter type, filter size, number of salient objects, rate of filter application to the salient objects, saliency regions and bit-rate.
18. The apparatus of claim 11 wherein the device is further configured for:
providing available modifiable parameters to the user on the user device; and
allowing modification of the parameters from the device.
19. The apparatus of claim 11 wherein the device is further configured for:
increasing pre-filtering to a predetermined limit when the bandwidth decreases, so as to decrease a bit-rate of the video content; and
decreasing pre-filtering when the bandwidth increases, so as to decrease a bit-rate of the video content.
20. The apparatus of claim 18 wherein the device is further configured for:
measuring duration of the user gaze; and
modifying the feedback and pre-filtering parameters based on the duration.
US13/840,600 2013-03-15 2013-03-15 Method and apparatus for user guided pre-filtering Abandoned US20140269910A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US13/840,600 US20140269910A1 (en) 2013-03-15 2013-03-15 Method and apparatus for user guided pre-filtering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US13/840,600 US20140269910A1 (en) 2013-03-15 2013-03-15 Method and apparatus for user guided pre-filtering

Publications (1)

Publication Number Publication Date
US20140269910A1 true US20140269910A1 (en) 2014-09-18

Family

ID=51526947

Family Applications (1)

Application Number Title Priority Date Filing Date
US13/840,600 Abandoned US20140269910A1 (en) 2013-03-15 2013-03-15 Method and apparatus for user guided pre-filtering

Country Status (1)

Country Link
US (1) US20140269910A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180096461A1 (en) * 2015-03-31 2018-04-05 Sony Corporation Information processing apparatus, information processing method, and program
US11740624B2 (en) 2017-08-17 2023-08-29 Sri International Advanced control system with multiple control paradigms

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060176951A1 (en) * 2005-02-08 2006-08-10 International Business Machines Corporation System and method for selective image capture, transmission and reconstruction

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060176951A1 (en) * 2005-02-08 2006-08-10 International Business Machines Corporation System and method for selective image capture, transmission and reconstruction

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180096461A1 (en) * 2015-03-31 2018-04-05 Sony Corporation Information processing apparatus, information processing method, and program
US10559065B2 (en) * 2015-03-31 2020-02-11 Sony Corporation Information processing apparatus and information processing method
US11740624B2 (en) 2017-08-17 2023-08-29 Sri International Advanced control system with multiple control paradigms

Similar Documents

Publication Publication Date Title
US11350179B2 (en) Bandwidth efficient multiple user panoramic video stream delivery system and method
US10582196B2 (en) Generating heat maps using dynamic vision sensor events
US9247203B2 (en) Object of interest based image processing
US20140321561A1 (en) System and method for depth based adaptive streaming of video information
US10027966B2 (en) Apparatus and method for compressing pictures with ROI-dependent compression parameters
Li et al. Weight-based R-λ rate control for perceptual HEVC coding on conversational videos
US20180063549A1 (en) System and method for dynamically changing resolution based on content
US9264661B2 (en) Adaptive post-processing for mobile video calling system
US10205763B2 (en) Method and apparatus for the single input multiple output (SIMO) media adaptation
US10616498B2 (en) High dynamic range video capture control for video transmission
TW201347549A (en) Object detection informed encoding
CN113228686B (en) Apparatus and method for deblocking filter in video coding
US9210444B2 (en) Method and apparatus for vision and network guided prefiltering
EP2810432A1 (en) Video coding using eye tracking maps
US20140269910A1 (en) Method and apparatus for user guided pre-filtering
Steinert et al. Architecture of a Low Latency H. 264/AVC Video Codec for Robust ML based Image Classification: How Region of Interests can Minimize the Impact of Coding Artifacts
US9407925B2 (en) Video transcoding system with quality readjustment based on high scene cost detection and method for use therewith
US20160360230A1 (en) Video coding techniques for high quality coding of low motion content
US11252451B2 (en) Methods and apparatuses relating to the handling of a plurality of content streams
Ko et al. An energy-efficient wireless video sensor node with a region-of-interest based multi-parameter rate controller for moving object surveillance
Stabernack et al. Architecture of a low latency h. 264/AVC video codec for robust ml based image classification
Andaló et al. Transmitting what matters: Task-oriented video composition and compression
Andalo et al. TWM: A framework for creating highly compressible videos targeted to computer vision tasks
US20240089436A1 (en) Dynamic Quantization Parameter for Encoding a Video Frame
KR101981868B1 (en) Virtual reality video quality control

Legal Events

Date Code Title Description
AS Assignment

Owner name: SRI INTERNATIONAL, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:CHAI, SEK;REEL/FRAME:030029/0734

Effective date: 20130315

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION