WO2001078399A1 - Method and apparatus for transcoding of compressed image - Google Patents
Method and apparatus for transcoding of compressed image
- Publication number
- WO2001078399A1 (PCT/JP2001/000662)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- transcoder
- content
- transcoding
- bitstream
- rate
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
- H04N21/23418—Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/1066—Session management
- H04L65/1101—Session protocols
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/60—Network streaming of media packets
- H04L65/75—Media network packet handling
- H04L65/752—Media network packet handling adapting media to network capabilities
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/60—Network streaming of media packets
- H04L65/75—Media network packet handling
- H04L65/765—Media network packet handling intermediate
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/80—Responding to QoS
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/56—Provisioning of proxy services
- H04L67/565—Conversion or adaptation of application format or content
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
- H04L69/04—Protocols for data compression, e.g. ROHC
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/40—Network security protocols
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/124—Quantisation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/124—Quantisation
- H04N19/126—Details of normalisation or weighting functions, e.g. normalisation matrices or variable uniform quantisers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/132—Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/146—Data rate or code amount at the encoder output
- H04N19/147—Data rate or code amount at the encoder output according to rate distortion criteria
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/146—Data rate or code amount at the encoder output
- H04N19/152—Data rate or code amount at the encoder output by measuring the fullness of the transmission buffer
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/164—Feedback from the receiver or from the transmission channel
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/172—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/179—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a scene or a shot
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/189—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
- H04N19/19—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding using optimisation based on Lagrange multipliers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/20—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/20—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
- H04N19/25—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding with scene description coding, e.g. binary format for scenes [BIFS] compression
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/20—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
- H04N19/29—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding involving scalability at the object level, e.g. video object layer [VOL]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/30—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
- H04N19/33—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability in the spatial domain
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/40—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video transcoding, i.e. partial or full decoding of a coded input stream followed by re-encoding of the decoded output stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/42—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
- H04N19/436—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using parallelised computational arrangements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/587—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal sub-sampling or interpolation, e.g. decimation or subsequent interpolation of pictures in a video sequence
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/61—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
- H04N21/2343—Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
- H04N21/2343—Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
- H04N21/234318—Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by decomposing into objects, e.g. MPEG-4 objects
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/24—Monitoring of processes or resources, e.g. monitoring of server load, available bandwidth, upstream requests
- H04N21/2402—Monitoring of the downstream path of the transmission network, e.g. bandwidth available
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/25—Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
- H04N21/258—Client or end-user data management, e.g. managing client capabilities, user preferences or demographics, processing of multiple end-users preferences to derive collaborative data
- H04N21/25808—Management of client data
- H04N21/25825—Management of client data involving client display capabilities, e.g. screen resolution of a mobile phone
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/25—Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
- H04N21/258—Client or end-user data management, e.g. managing client capabilities, user preferences or demographics, processing of multiple end-users preferences to derive collaborative data
- H04N21/25808—Management of client data
- H04N21/25833—Management of client data involving client hardware characteristics, e.g. manufacturer, processing or storage capabilities
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/25—Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
- H04N21/266—Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel
- H04N21/2662—Controlling the complexity of the video stream, e.g. by scaling the resolution or bitrate of the video stream based on the client capabilities
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/84—Generation or processing of descriptive data, e.g. content descriptors
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/56—Provisioning of proxy services
- H04L67/564—Enhancement of application control based on intercepted application data
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
- H04L69/30—Definitions, standards or architectural aspects of layered protocol stacks
- H04L69/32—Architecture of open systems interconnection [OSI] 7-layer type protocol stacks, e.g. the interfaces between the data link level and the physical level
- H04L69/322—Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions
- H04L69/329—Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions in the application layer [OSI layer 7]
Definitions
- the present invention relates generally to information distribution systems, and more particularly to a distribution system that adapts information to bit rates available on a network.
- VOPs Video object planes
- the object can be visual data, audio data, natural data, synthetic data, basic data, composite data, or a combination thereof.
- Image objects are assembled to form composite objects or "scenes".
- the emerging MPEG-4 standard is intended to enable multimedia applications such as interactive images where natural and synthetic materials are integrated and universally accessible.
- MPEG-4 enables content-based interaction.
- Bit rate conversion includes bit rate scaling and conversion between a constant bit rate (CBR) and a variable bit rate (VBR).
- CBR constant bit rate
- VBR variable bit rate
- the basic function of bit rate scaling is to take an input bit stream and generate a scaled output bit stream that meets the new load constraints of the receiver.
- the bitstream scaler is a transcoder or filter that matches the source bitstream to the receiver load.
- scaling can be accomplished by the transcoder 100.
- the transcoder has a decoder 110 and an encoder 120.
- the compressed input bit stream 101 is completely decoded at the input rate R in and then encoded at the new output rate R out 102 to produce the output bit stream 103.
- the output rate is lower than the input rate.
- in practice, fully decoding and re-encoding the bitstream is so complex that complete decoding and encoding is not performed in the transcoder.
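The cascaded transcoder of FIG. 1 can be sketched as the composition of a full decoder and a full encoder. This is an illustrative Python sketch only; `decode` and `encode` are placeholders standing in for complete codec implementations, not functions defined in the patent.

```python
def cascade_transcode(bitstream_in, r_out, decode, encode):
    """Sketch of the cascaded transcoder of FIG. 1: fully decode the
    compressed input bitstream, then re-encode the decoded frames at
    the new (lower) output rate R_out. `decode` and `encode` are
    placeholders for complete codec implementations."""
    frames = decode(bitstream_in)        # decoder 110: decode at rate R_in
    return encode(frames, rate=r_out)    # encoder 120: re-encode at R_out < R_in

# stub codec: "decoding" passes frames through, "encoding" records count and rate
result = cascade_transcode([1, 2, 3], 64, lambda b: b,
                           lambda f, rate: (len(f), rate))
```

The complexity objection in the text is precisely that both `decode` and `encode` here are full codec passes, which is why practical transcoders avoid this structure.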
- FIG. 2 illustrates an exemplary method.
- the image bit stream is only partially decoded.
- the macroblock of the input bit stream 201 is subjected to variable length decoding (VLD) 210.
- the input bit stream is also delayed 220 and inverse quantized (IQ) 230 to form the discrete cosine transform (DCT) coefficients.
- IQ inverse quantized
- DCT discrete cosine transform
- the partially decoded data is analyzed, and at 240 and 250 a new set of quantizers is applied to the DCT blocks.
- These requantized blocks can then be variable length coded (VLC) 260 to form a lower rate new output bit stream 203.
- VLC variable length coded
- the number of bits allocated to encode texture information is controlled by a quantization parameter (QP).
- QP quantization parameter
- the above documents are similar in that the rate of texture bits is reduced by changing QP based on information contained in the original bit stream.
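The requantization path of FIG. 2 (VLD, inverse quantization, coarser requantization, VLC) can be illustrated on a single block. The sketch below assumes a simplified uniform quantizer with step size 2·QP, which is an approximation of the H.263/MPEG quantizers, and omits the entropy-coding stages; it is not the patent's implementation.

```python
import numpy as np

def requantize_block(levels, qp_in, qp_out):
    """Requantize quantized DCT levels with a coarser quantizer.

    Assumes a simplified uniform quantizer with step 2*QP (an
    approximation of the H.263/MPEG quantizers). Illustrates the
    IQ -> Q portion of the Fig. 2 pipeline; VLD/VLC omitted."""
    coeffs = levels * (2 * qp_in)                        # inverse quantization (IQ, 230)
    return np.round(coeffs / (2 * qp_out)).astype(int)   # requantize (240/250)

levels = np.array([[12, -3, 1], [2, 0, 0], [1, 0, 0]])
out = requantize_block(levels, qp_in=4, qp_out=8)
```

Doubling QP roughly halves the level magnitudes, and small high-frequency levels collapse to zero, which is how raising QP reduces the texture bit rate.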
- the information is usually extracted directly in the compressed domain and may include measurements related to the motion of the macroblock or residual energy of the DCT block. This type of analysis can be found in bit allocation analyzers.
- the bitstream may be pre-processed, but it is also important that the transcoder operates in real time. Therefore, a large processing delay on the bit stream cannot be tolerated.
- a transcoder that extracts information from a group of frames and then transcodes the content based on this lookahead information is not feasible here; it does not work for live broadcasts or video conferences. Better bit allocation could achieve better transcoding quality, but such a real-time implementation is not practical.
- Such a concept of a space-time trade-off can also be considered in the encoder.
- not all image coding standards support frame skipping.
- the group-of-pictures (GOP) structure is predetermined; that is, the intra-frame period and the distance between anchor frames are fixed. As a result, all pictures must be encoded.
- syntax allows for skipping macroblocks. If all macroblocks in a frame are skipped, the frame is effectively skipped. At least one bit is used for each macroblock in the frame to indicate this skipping. This can be inefficient for some bit rates.
- the H.263 and MPEG-4 standards allow frame skipping, and both support a syntax that allows reference frames to be specified. However, frame skipping is mainly used to satisfy buffer constraints: when buffer occupancy is too high and there is a risk of overflow, the encoder skips frames, reducing the flow of bits into the buffer and giving the buffer time to transmit its current bits.
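The buffer-driven frame-skip decision described above can be sketched as a simple threshold test. All parameter names and the 0.8 high-water mark below are illustrative inventions for the example; the standards do not prescribe them.

```python
def should_skip_frame(occupancy, frame_bits, drain_per_frame,
                      buffer_size, high_water=0.8):
    """Toy model of buffer-constrained frame skipping: skip the next
    frame when coding it would push buffer occupancy past a high-water
    mark, giving the buffer time to drain. Names and the 0.8 threshold
    are illustrative, not taken from H.263 or MPEG-4."""
    projected = occupancy + frame_bits - drain_per_frame
    return projected > high_water * buffer_size

# at 90% fullness a large frame is skipped; at 20% a small frame is coded
skip_big = should_skip_frame(9_000, 2_000, 500, 10_000)
skip_small = should_skip_frame(2_000, 1_000, 500, 10_000)
```

This captures the text's point: the skip is triggered by buffer state, not by any judgment about how important the skipped frame's content is.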
- this spatio-temporal trade-off control method has received minimal attention.
- the information available in the transcoder to make such decisions is quite different from the encoder information.
- the following describes how to make such a trade-off in the transcoder.
- the transcoder must find some alternative means of transmitting the information contained in the bitstream in order to adapt to the reduced available bit rate.
- MPEG-7 Multimedia Content Description System
- this standard specifies a set of descriptors and description schemes that can be used to describe various types of multimedia content.
- the primary application of MPEG-7 is expected to be search and retrieval. See "MPEG-7 Applications", ISO/IEC N2861, July 1999.
- the user can specify certain attributes of a particular object. At this low level of representation, these attributes can include descriptors that describe the texture, motion, and shape of the object.
- a method for representing and comparing shapes is described in U.S. Patent Application No. 09/326,759, "Method for Ordering Image Spaces to Represent Object Shapes", filed June 4, 1999. A method for describing motion activity is described in a U.S. patent application by Divakaran et al., filed September 27, 1999.
- the descriptors and description schemes provided by the MPEG-7 standard allow access to properties of the image content that the transcoder itself cannot derive. For example, these properties can represent lookahead information that would otherwise be inaccessible to the transcoder. The transcoder can access these properties only because they were derived from the original content beforehand (i.e., the content was pre-processed and stored in a database along with its associated description).
- syntactic information refers to the physical and logical aspects of the content signal
- semantic information refers to the conceptual meaning of the content.
- syntactic elements can relate to the color, shape, and motion of a particular object.
- semantic elements can refer to information that cannot be extracted from low-level descriptors, such as the time and place of an event or the name of a person in an image sequence.
- a generator simulates network constraints and user device constraints.
- the classifier is connected to receive the input compressed image and the constraints.
- the classifier generates content information from the features of the input compressed image.
- the manager generates multiple conversion modes depending on constraints and content information, and the transcoder generates one output compressed image for each of the multiple conversion modes.
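The manager's role of mapping constraints and content information to conversion modes can be sketched as a small decision function. The mode names, thresholds, and classification labels below are invented for illustration; the patent does not prescribe these values.

```python
def select_modes(available_bitrate, device_resolution, content_class):
    """Illustrative sketch of a transcoder manager deriving conversion
    modes from network constraints, device constraints, and content
    classification. All mode names, thresholds, and labels here are
    hypothetical, not taken from the patent."""
    modes = []
    if available_bitrate < 64_000:
        modes.append("summarize")            # very low rate: key frames only
    elif content_class == "high_motion":
        modes.append("reduce_spatial")       # keep frame rate, lower resolution
    else:
        modes.append("reduce_temporal")      # drop frames, keep spatial detail
    if device_resolution[0] < 352:
        modes.append("downsample_to_device") # match client display capability
    return modes

# 32 kbit/s link to a QCIF (176x144) handset showing high-motion content
modes = select_modes(32_000, (176, 144), "high_motion")
```

Each selected mode would then drive one output compressed image, matching the text's statement that the transcoder produces one output per conversion mode.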
- Figure 1 is a block diagram of a conventional transcoder.
- Figure 2 is a block diagram of a conventional partial decoder/encoder.
- FIG. 3 is a block diagram of an adaptive bit stream distribution system according to the present invention.
- Figure 4 is a block diagram of the adaptive transcoder and transcoder manager.
- FIG. 5 is a graph of a trade-off function that can be used by the transcoder and manager of FIG. 4.
- Figure 6 shows a block diagram of object-based bitstream scaling
- Figure 7 shows the search space graph
- FIG. 8 is a block diagram showing details of an object-based transcoder according to the present invention.
- Fig. 9 is a block diagram of feature extraction.
- FIG. 10 is a block diagram of an image content classifier having three stages.
- FIG. 11 is a block diagram of a descriptor scheme.
- FIG. 12 is a block diagram of transcoding by the descriptor method shown in FIG.
- Fig. 13 is a block diagram of transcoding based on the descriptor scheme shown in Fig. 11.
- Figure 14 is a block diagram of a system for generating content summaries and changes in content according to the content summaries
- Figure 15 is a graph of the transcoding function based on the content summary and content changes of Figure 14.
Best Mode for Carrying Out the Invention
- the compressed input bitstream can be converted or “scaled” to an output bitstream compressed at a target rate (i.e., the available bit rate (ABR) on the network).
- the target bit rate of the output bitstream is less than the rate of the input bitstream.
- the task of our transcoder is to convert the bitstream so that the target rate is met. We describe transcoding techniques that maintain the quality of the bitstream content while achieving the target rate.
- a conventional frame-based transcoding technique may be defined as a continuous transform.
- the output is always the sequence of frames that best represents the input sequence, since conventional techniques continuously attempt to maintain the best trade-off in spatial versus temporal quality.
- When a particular frame is skipped to meet the rate constraints, the information contained in the skipped frame is lost. If enough frames are skipped, the received bitstream becomes meaningless to the user, or at best unsatisfactory.
- the distortion is usually taken as some arbitrary distortion metric, such as the peak signal-to-noise ratio (PSNR). In such a conversion, the distortion is not a measure of how well the bitstream content is conveyed, but rather of the bit-by-bit differences (i.e., quality) between the original input bitstream and the reconstructed output bitstream.
- One embodiment for transcoding a bit sequence under low bit-rate constraints summarizes the content of the bitstream with a small number of frames. In this approach, we do not use traditional distortion metrics that focus on quality. Rather, we employ a new measure called fidelity. Fidelity takes into account the semantics and syntax of the content. Semantics and syntax refer not to bits or pixels, but to concepts meaningful to humans that the bits represent, such as words, sounds, image objects, and image events.
- Fidelity can be defined in many ways. However, fidelity, as defined here, is not related to traditional quantitative quality, eg, differences between bits. Rather, fidelity measures the degree to which one frame or any number of frames conveys the information contained in the original image sequence, that is, the content or higher-level meaning of the conveyed information, It does not measure bits.
- Fidelity is a more subjective or semantic measure than traditional distortion metrics.
- fidelity is a useful measure to evaluate the performance of non-traditional transcoders. Because the output of our transcoder according to one embodiment is a limited set of relatively high-quality frames that attempt to summarize the entire bit sequence, we call this type of transcoder a "discrete summarization transcoder".
- with discrete summarization, the smooth motion of the video can be lost because of the sparse, selective sampling of frames.
- when the rate-distortion performance of the continuous transform transcoder is severely degraded, or the target rate cannot be met, discrete summarization transcoding is used instead.
- conventional continuous transcoders lose video smoothness under such constraints: the frame rate becomes so low that the image appears jerky (a phenomenon called jerkiness), which is uncomfortable for users.
- the main advantage of discrete summarization transcoding over conventional continuous transcoding is that, under severe rate constraints, continuous transcoders drop information-rich frames, while discrete summarization transcoders try to select the information-rich frames.
- a content network device (CND) manager is described to control which transcoder is best for a given situation.
- the purpose of the CND manager is to select which transcoder to use. The selection is based on data obtained from the content, the network, and the user-device characteristics. We can also simulate these characteristics in an "offline" mode, alter the bitstream, and deliver it later.
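The selection logic described above can be sketched as follows. This is a minimal, hypothetical illustration, not the patent's implementation: the function and parameter names are assumptions, and the crossover rate would in practice be derived from the rate-quality functions of FIG. 5 and updated as content and network conditions change.

```python
# Hypothetical sketch of the CND manager's transcoder selection.
# The crossover_rate argument is an assumed model parameter; in the
# described system it changes dynamically with content and network state.

def select_transcoder(available_bit_rate, crossover_rate):
    """Pick a transcoding strategy for the current network conditions."""
    if available_bit_rate >= crossover_rate:
        return "continuous-transform"   # enough rate for smooth motion
    return "discrete-summary"           # summarize with few high-quality frames
```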
- the adaptive bitstream distribution system 300 has four main components: a content classifier 310, a model predictor 320, a content network device (CND) manager 330, and a switchable transcoder 340.
- the goal of the system 300 is to deliver the compressed bitstream 301 with the information content to the user device 360 through the network 350.
- the bitstream content can be visual data, audio data, textual data, natural data, synthetic data, elementary data, compound data, or a combination thereof.
- the network may be wireless, packet switched, or other network with unpredictable operating characteristics.
- the user device may be an image receiver, a fixed or mobile radio receiver, or another similar user device with internal resource constraints that may make it difficult to receive the bitstream at full quality.
- the system maintains the semantic fidelity of the content even when the bitstream needs to be further compressed to satisfy network and user device characteristics.
- the input compressed bitstream is directed to transcoders and content classifiers.
- the transcoder reduces the rate of the input bitstream to produce the output compressed bitstream 309, which is directed to the user device through the network.
- the content classifier 310 extracts content information (CI) 302 from the input bit stream for the manager.
- the main function of the content classifier is to map content characteristics, such as motion activity, scene-change information, and texture, to a set of parameters that the content network manager uses to make rate-quality trade-offs. To assist in this mapping, the content classifier can also receive the metadata 303.
- Metadata can be low-level or high-level.
- Metadata includes the descriptors and description schemes specified by the emerging MPEG-7 standard.
- the model predictor 320 provides real-time feedback 321 relating to the dynamics of the network 350 and, possibly, the constraining characteristics of the user device 360.
- For the network, the predictor reports congestion and the available bit rate (ABR).
- the predictor also receives and translates feedback about the packet loss ratio in the network.
- the predictor estimates the current network conditions and makes long-term network predictions 321.
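One plausible way for such a predictor to turn instantaneous feedback into a longer-term estimate is exponential smoothing. This is an illustrative sketch only (the patent does not specify the prediction model), and the smoothing factor is an assumption:

```python
# Illustrative sketch (not from the patent text) of how the model
# predictor might smooth available-bit-rate (ABR) samples into a
# long-term estimate; the smoothing factor alpha is an assumption.

def update_abr_estimate(prev_estimate, abr_sample, alpha=0.125):
    """Exponentially weighted moving average of ABR samples."""
    return (1 - alpha) * prev_estimate + alpha * abr_sample
```

A small alpha makes the estimate react slowly to transient spikes, which matches the system's stated preference for avoiding sudden rate drops.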
- user devices may have limited resources. For example, processing power, memory, and display constraints. For example, if the user device is a cellular phone, its display may be constrained to textual information or low resolution images, or worse, just audio. These properties can also influence the choice of transcoding mode.
- In addition to receiving the metadata, the manager 330 also receives inputs from both the content classifier 310 and the model predictor 320. The CND manager combines the data from these two sources to determine the optimal transcoding strategy for the switchable transcoder 340.
- CND Content Classifier:
- classification can be achieved by extracting features at various levels of the image: for example, program features, shot features, frame features, and features of sub-regions within a frame. The features themselves can be extracted using sophisticated transformations or simple local operations. Regardless of how the features are extracted, given a feature space of dimension N, each pattern can be represented as a point in this feature space.
- the content classifier 310 operates in three stages (I, II, and III; 311 to 313): first, the bitstream content is classified so that higher-level semantics can be inferred, and second, the classified content is adapted to the network and user-device characteristics.
- the first stage (I) 311 extracts a number of low-level features (e.g., motion activity, texture, or DCT coefficients) from the compressed bitstream using conventional techniques. It can also access metadata 303, such as MPEG-7 descriptors and description schemes. If such metadata is available, little processing of the compressed bitstream is needed.
- the end result of this first stage is that a predetermined set of content features is mapped to a semantic class, or to a limited set of high-level metadata. Furthermore, within each semantic class, a distinction is made based on coding complexity (i.e., complexity is conditioned on the semantic class and network characteristics, and possibly on device characteristics).
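The feature-to-class mapping can be sketched as a simple decision rule. This is a hedged illustration: the feature names, thresholds, and class labels below are hypothetical, since the patent does not fix a particular mapping.

```python
# Hedged sketch of stage-I classification: map extracted low-level
# features to one of a small set of semantic classes. The feature
# names and thresholds are hypothetical, for illustration only.

def classify_shot(motion_activity, texture_energy):
    """Map normalized features (0.0-1.0) to an assumed semantic class."""
    if motion_activity < 0.2:
        return "talking-head"      # low motion: spatial quality dominates
    if motion_activity > 0.7 and texture_energy > 0.5:
        return "high-action"       # temporal quality dominates
    return "moderate"
```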
- CI 302 partially characterizes the potential performance of this embodiment of a switchable transcoder.
- the content network device (CND) manager 330 and transcoder 340 are shown in more detail in FIG.
- the CND manager has a discrete-continuous control 431 and a content network device (CND) integrator 432.
- the transcoder 340 has a plurality of transcoders 441 to 443.
- using a switch 450, the control 431 determines how the input compressed bitstream 301 is to be transcoded: for example, with the discrete summarization transcoder 441, the continuous transform transcoder 442, or some other transcoder 443.
- the CND manager also dynamically adapts the target rate for the transcoder and considers the resources that constrain the characteristics of the network and user devices. These two very important items are determined by the control 431.
- Figure 5 shows several rate-quality functions graphically on rate 501 and quality 502 scales.
- The rate-quality function of the continuous transform transcoder 442 is represented by the convex function 503.
- the rate-quality curve for the discrete summarization transcoder 441 is represented by the linear function 504.
- Other transcoders may have different functions.
- the two transcoding techniques produce a crossover point. For rates greater than the crossover rate, it is best to use the continuous transform transcoder; for rates less than the crossover rate, the discrete summarization transcoder. Of course, the crossover point changes dynamically as the content and network characteristics change.
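Finding the crossover point amounts to locating where the two rate-quality curves intersect. A minimal sketch, under the assumption of one convex curve for the continuous transform transcoder and one linear curve for the discrete summarization transcoder (the curve shapes of FIG. 5; the specific functions and bisection method here are illustrative assumptions):

```python
# Sketch of locating the crossover rate between two rate-quality
# functions by bisection. Assumes the discrete (linear) quality
# exceeds the continuous (convex) quality at rate lo, and the
# reverse at rate hi, as in the graph of Fig. 5.

def crossover_rate(q_continuous, q_discrete, lo, hi, iters=60):
    """Bisect for the rate where the two quality curves intersect."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if q_continuous(mid) < q_discrete(mid):
            lo = mid   # discrete still better: crossover lies above mid
        else:
            hi = mid   # continuous already better: crossover lies below mid
    return (lo + hi) / 2.0
```

For example, with the hypothetical curves q_continuous(r) = r²/4 and q_discrete(r) = r + 3, the crossover lies at r = 6: below it the discrete summarizer gives higher quality, above it the continuous transformer does.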
- a continuous transform transcoder typically assumes a conventional distortion metric such as PSNR. Since such measurements do not apply to our discrete summarization transcoder, it is more reasonable to map traditional distortion metrics to "fidelity" measurements. Fidelity measures how well the content is semantically summarized, not the quantitative differences between bits. Using the same quality measure for both transcoders avoids inconsistencies in determining the best transcoding strategy.
- the CND integrator 432 receives the content information (CI) 302 from the content classifier and the bit-rate feedback 351 output from the switchable transcoder 340. Using this information, the integrator selects the optimal modeling function 505 with specific model parameters. The rate feedback 351 is used to refine the parameters dynamically. If the integrator discovers that the selected model is not optimal, it can decide to switch the rate-quality function dynamically. The integrator may also track several functions for different objects or different bitstreams, and consider the functions individually or together.
- the network predictions 321 may affect these characteristic functions by modulating particular parts of the optimal curve 505 in one direction or the other; for example, the curve may be adjusted where higher bit rates are available.
- the network model may allow a large number of bits to be consumed at a particular time, but if the long-term behavior shows that the network can quickly become congested, our system may choose to continue operating at a lower rate. In this way, the problem of a sudden drop in the available bit rate is avoided.
- the emphasis in this embodiment is on enabling the dynamic selection of the transcoding strategy that provides the best delivery of the semantic content of the bitstream, not on how the actual transcoding is performed. So far, we have described the different types of trade-offs that can be made by the switchable transcoder, including the continuous transform and discrete summarization transcoders. For each of these transcoders, an optimal rate-quality curve is assumed.
- the novelty of our system is not only that it can transcode a large number of objects of varying complexity and size, but, more importantly, that spatio-temporal trade-offs among objects can be performed in order to optimize the overall quality of the image. We focus on object-based bitstreams because of this added flexibility, and describe the various tools that can be used to manipulate the quality of a particular object.
- the texture data of one object can be reduced without changing the shape information, while the shape information of another object is reduced without changing the texture information.
- Many other combinations are also conceivable, including dropping frames.
- In a news clip, for example, it is possible to reduce the frame rate, along with the texture and shape bits, for the background without changing the information associated with the foreground newscaster.
- Traditionally, the "quality" of a bitstream is measured as the bit-by-bit difference between the input bitstream and the output bitstream.
- With object-based transcoding according to the present invention, there is no longer any restriction to manipulating the entire image. We transcode a bitstream decomposed into meaningful image objects, understanding that the delivery of each object, together with the quality of each object, has a different effect on the overall quality. Because our object-based approach has this finer level of access, the spatio-temporal quality of one object can be reduced without significantly affecting the overall quality of the stream. This is a completely different strategy from that used by traditional frame-based transcoders.
- Perceived image quality contrasts with traditional bitstream quality, which measures the differences between bits in the entire image, regardless of content.
- Perceived image quality is related to the quality of objects in the image that convey the desired information. For example, the image background can be completely lost without affecting the perceived image quality of the more important foreground objects.
- FIG. 6 shows a high-level block diagram of an object-based transcoder 600 according to another embodiment of the present invention.
- the transcoder 600 has a demultiplexer 601, a multiplexer 602, and an output buffer 603.
- the transcoder 600 also has one or more object-based transcoders 800 operated by a transcoding control unit (TCU) 610 according to control information 604.
- the unit 610 has shape, texture, temporal, and spatial analysis units 611 to 614.
- the input compressed bitstream 605 to the transcoder 600 comprises one or more object-based elementary bitstreams.
- the object-based bitstreams can be serial or parallel. The total bit rate of bitstream 605 is R_in.
- the output compressed bitstream 606 from the transcoder 600 has a total bit rate R_out such that R_out < R_in.
- the demultiplexer 601 provides one or more elementary bitstreams to each of the object-based transcoders 800, and the object-based transcoders 800 provide object data to the TCU 610.
- the transcoders 800 scale the elementary bitstreams. Before being passed to the output buffer 603, the scaled bitstreams are composed in the multiplexer 602, from where they are passed to the receiver. The buffer also provides rate-feedback information 608 to the TCU.
- the control information 604 passed to each of the transcoders 800 is provided by the TCU.
- the TCU has the function of analyzing texture and shape data, as well as temporal and spatial resolution. All of these new degrees of freedom make the object-based transcoding framework unique and highly desirable for network applications.
- MPEG-4 exploits spatio-temporal image redundancy using motion compensation and the DCT.
- the core of the object-based transcoder 800 is an adaptation of the MPEG-2 transcoder described above. The main differences are that shape information is included in the bitstream, and that tools for predicting DC and AC coefficients between blocks are provided for texture coding.
- texture transcoding depends on the shape data. In other words, the shape data cannot simply be parsed and ignored.
- the compliant bit stream syntax depends on the decoded shape data.
- The use of texture models for rate control in encoders has been extensively described in the prior art. See, e.g., Vetro et al., "MPEG-4 rate control for multiple video objects," IEEE Trans. on Circuits and Systems for Video Technology, February 1999, and the references therein.
- the variable R represents the texture bits consumed by a video object (VO),
- the variable Q represents the quantization parameter (QP),
- the variables (X1, X2) represent the first- and second-order model parameters, and
- the variable S represents the coding complexity, such as the mean absolute difference.
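Under these definitions, the quadratic rate-quantizer model from the cited rate-control work relates the quantities as follows. This is a reconstruction sketched from the cited Vetro et al. reference, not stated verbatim in this document:

```latex
R = X_1 \frac{S}{Q} + X_2 \frac{S}{Q^{2}}
```

The first term captures the rate's inverse dependence on the quantizer, and the second term refines the fit at coarse quantization.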
- the transcoding problem differs in that the original set of QPs and the actual number of bits spent are already given. Also, rather than computing the coding complexity S from the spatial domain, a new DCT-based complexity measure, S̃, must be defined. This measure is

  S̃ = (1/M_c) Σ_{m∈M} Σ_i ρ(i) |B_m(i)|

  where m is a macroblock index in the set M of coded blocks, M_c is the number of blocks in that set, B_m(i) is the i-th AC coefficient of block m, and ρ(i) is a frequency-dependent weighting.
- the complexity measure represents the energy of the AC coefficients, where the contribution of the high-frequency components is reduced by the weighting function.
- the weighting function may be selected to mimic the function of the MPEG quantization matrix.
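The complexity measure above can be sketched in a few lines. This is a hedged illustration: the particular weighting function below is an assumption, chosen only to decay with frequency as the text suggests (mimicking a quantization matrix), and the function name is hypothetical.

```python
import numpy as np

# Hedged sketch of the DCT-domain complexity measure: the mean
# frequency-weighted magnitude of the AC coefficients over the coded
# blocks. The weighting below (1 / (1 + row + col)) is an assumption
# that decays with frequency, mimicking an MPEG quantization matrix.

def dct_complexity(blocks):
    """blocks: list of 8x8 arrays of DCT coefficients."""
    weights = 1.0 / (1.0 + np.add.outer(np.arange(8), np.arange(8)))
    total = 0.0
    for b in blocks:
        ac = np.abs(np.asarray(b, dtype=float))
        ac[0, 0] = 0.0                      # exclude the DC coefficient
        total += np.sum(weights * ac)
    return total / len(blocks)              # average over the M_c blocks
```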
- the model parameters can be determined and updated continuously. In fact, the model can be updated twice for each transcoded VOP: once before transcoding, with the data in the bitstream, and once after encoding the texture with the new set of QPs. As the number of data points increases, the model parameters become more robust and converge more quickly.
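The parameter update can be sketched as a least-squares fit of (X1, X2) to observed (Q, R, S) data points, assuming the quadratic model R = X1·S/Q + X2·S/Q² sketched above from the cited rate-control work. The function name is hypothetical:

```python
import numpy as np

# Illustrative least-squares update of the model parameters (X1, X2)
# from observed (Q, R, S) data points, assuming the quadratic model
# R = X1*S/Q + X2*S/Q**2.

def fit_model(samples):
    """samples: list of (Q, R, S) tuples; returns (X1, X2)."""
    A = np.array([[S / Q, S / Q**2] for Q, R, S in samples])
    b = np.array([R for Q, R, S in samples])
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x[0], x[1]
```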
- the conditional distortion can be defined as

  D̄(Q) = Σ_{k∈K} α_k D_k(Q_k)

  where k indicates a VOP index in the VOP set K, and α_k indicates the visual significance or priority of object k.
- D_k(Q_k) is not explicitly specified, but is known to be proportional to Q_k.
- the visual significance can be a function of the object's size and complexity.
- the solution space is limited to the valid solution space shown in FIG. 7.
- the X axis indicates the image object 701
- the y axis indicates QP.
- the figure also shows a valid search space 710, a restricted search space 711, a valid path 712, and an invalid path 713.
- the candidate QPs are the nodes of the trellis, and each node is associated with an estimated rate and a conditional distortion.
- the problem can then be described as finding the set of QPs that minimizes the conditional distortion, subject to the constraint that the total rate is less than the target rate.
- skipping frames: in general, the purpose of skipping frames is to reduce the buffer occupancy level so that buffer overflow, and ultimately packet loss, is prevented. Another reason for skipping frames is to allow a trade-off between spatial and temporal quality: fewer frames are coded, but they are coded with higher quality. As a result, if the buffer is not in danger of overflowing, the decision to skip a frame is based on this spatio-temporal trade-off.
- Building on the proposed techniques for QP selection, this space-time trade-off is achieved by constraining the search to the valid solution space for the set of QPs. As shown in FIG. 7, a valid path is one in which all of its elements fall within the restricted area. If one of the elements goes outside the area, the path is invalid in that it does not maintain the specified level of spatial quality. Spatial quality is implied by the conditional distortion.
- the maximum QP value may be a function of the complexity of the object, or may simply be a percentage of the input QP. If the maximum is based on complexity, the transcoder effectively limits the objects with higher complexity to smaller QPs, because their effect on spatial quality is the most serious. On the other hand, limiting the QP based on the input QP means that the transcoder maintains a QP distribution similar to that of the originally encoded bitstream. Both approaches are valid. Determining the best way to limit the QP for each object can depend on the trade-off between spatial and temporal quality.
- In MPEG-4, the shape data is coded block by block using a context-based arithmetic coding algorithm. See Brady, "MPEG-4 standardized methods for the compression of arbitrarily shaped objects," IEEE Trans. on Circuits and Systems for Video Technology, December 1999.
- the context for each pixel is determined by the selected mode, and is calculated from a 9-bit or 10-bit causal template. The context is used to access a table of probabilities.
- FIG. 8 shows the components of an object-based transcoder 800 according to the present invention.
- the syntax of the coding standard dictates some of the architecture of the transcoder 800.
- the transcoder 800 includes a VOL/VOP parser 810, a shape scaler 820, an MB header parser 830, a motion parser 840, and a texture scaler 850.
- the transcoder also has a bus 860 that transfers the various parts of the elementary bitstream 801 to a bitstream memory 870. From this common storage, the elementary bitstream composition unit 880 forms a reduced-rate compressed bitstream compliant with the MPEG-4 standard.
- the output elementary bitstream 809 is provided to the multiplexer of FIG. 6.
- each object is associated with a video object layer (VOL) and video object plane (VOP) header.
- the VOP header contains the quantization parameter (QP) used to encode the object.
- All other bits are stored in the bitstream memory 870 until the output bitstream 606 of FIG. 6 is composed.
- MPEG-4 can encode the shape of an object. From the VOP layer, it is determined whether the VOP contains shape information (binary) or not (rectangular). If the VOP is rectangular, the object is simply a rectangular frame and there is no need to parse shape bits. If the shape is binary, it is necessary to determine whether each macroblock is transparent. A transparent block is inside the object's bounding box but outside the object's boundary, so there is no motion or texture information associated with it.
- the shape scaler 820 is composed of three sub-components: a shape decoder/parser 821, a shape downsampler 822, and a shape encoder 823. If the shape information of the bitstream is not scaled, the shape decoder/parser is simply a shape parser. This is indicated by the control information 604 received from the R-D shape analysis 611 of the transcoder control unit 610. In this case, the shape downsampler 822 and the shape encoder 823 are disabled. If the shape information is scaled, the shape decoder/parser 821 must first decode the shape information into a pixel-domain representation.
- the blocks can be downsampled by a factor of 2 or 4 using the shape downsampler 822, and re-encoded using the shape encoder 823.
- the conversion ratio is determined by the R-D shape analysis 611. Regardless of whether the shape bits are simply parsed or scaled, the output of the shape scaler 820 is transferred to the bitstream memory 870 via the bitstream bus 860.
- the macroblock layer includes bits for the coded block pattern (CBP).
- the CBP is used to signal to the decoder which blocks of a macroblock contain at least one AC coefficient. The CBP affects not only the structure of the bitstream, but also the AC and DC predictions. The reason the transcoder must deal with this parameter is that the CBP changes in response to the requantization of the DCT blocks. Therefore, the CBP is recalculated after the blocks are requantized.
- the CBP recalculation unit 856 of the texture scaler accomplishes this.
- the unit 856 transmits the variable-length codes (VLC) 855 to the bitstream memory 870 via the bitstream bus 860, replacing the corresponding header in the input bitstream.
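The recalculation step can be sketched as follows: after requantization, scan each 8x8 block of the macroblock and set a CBP bit when any AC coefficient survives. The six-block (4 luma + 2 chroma) layout matches MPEG-4, but the MSB-first bit ordering used here is an assumption for illustration:

```python
import numpy as np

# Sketch of recalculating the coded block pattern after requantization:
# for each of the six 8x8 blocks of a macroblock (4 luma + 2 chroma),
# set the corresponding CBP bit if any AC coefficient survives.
# The MSB-first bit ordering below is an assumption for illustration.

def recompute_cbp(blocks):
    """blocks: list of six 8x8 arrays of requantized DCT coefficients."""
    cbp = 0
    for n, blk in enumerate(blocks):
        ac = np.asarray(blk, dtype=float).copy()
        ac[0, 0] = 0.0                      # CBP concerns AC coefficients only
        if np.any(ac != 0.0):
            cbp |= 1 << (5 - n)             # assumed MSB-first block order
    return cbp
```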
- the texture block 851 is partially decoded.
- the result of this process is the DCT block coefficients.
- objects can be downsampled by a factor of two or four.
- whether a block is downsampled is indicated by the downsampling factor from the transcoding control unit 610 and the spatial analysis 614.
- this downsampling is performed in the DCT domain so that IDCT/DCT operations can be avoided. See U.S. Pat. No. 5,855,151, "Method and apparatus for down-converting a digital signal," filed Nov. 10, 1998, by Bao et al.
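A common low-pass approximation of DCT-domain downsampling (offered here as an illustrative sketch, not the method of the cited patent) keeps the 4x4 low-frequency corner of each 8x8 DCT block, scaled by 1/2 so that, for an orthonormal DCT, the DC term maps to the mean of the 2x-downsampled block:

```python
import numpy as np

# Hedged sketch of DCT-domain downsampling by a factor of two: keep
# the 4x4 low-frequency corner of the 8x8 DCT block, scaled by 1/2
# (the 1D scale is 1/sqrt(2) per dimension for an orthonormal DCT).
# This avoids an explicit IDCT/DCT round trip, as the text notes.

def downsample_dct_block(block_8x8):
    return 0.5 * np.asarray(block_8x8, dtype=float)[:4, :4]
```

For a flat 8x8 block of value c, the orthonormal 2D DCT has DC = 8c; the function yields 4c, which is exactly the DC of the corresponding flat 4x4 block.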
- the DCT block is temporarily stored in the coefficient memory 853. From this memory, the blocks are sent to a quantizer 854.
- the quantizer 854 quantizes the blocks according to the QP transmitted from the R-D texture analysis 612, using the techniques described above to meet the new target rate.
- the temporal analysis 613 indicates to the bitstream composition unit 880 which bits are to be composed and transmitted, and which bits to drop.
- the portion of the bitstream written to this memory is simply overwritten by the next image object's data.
- an image can be partitioned into a coarse-to-fine hierarchy 900, as shown in FIG.
- the image program, or session, 910 is considered the highest level of the hierarchy 900. This level may represent, for example, 30 minutes of programming from a news program or broadcast network.
- At the next level 920, the program 910 is divided into shots Shot-1, ..., Shot-n (921-929).
- A "shot" is a group of frames (GOF) or a group of video object planes (GOV). This level represents a smaller image segment that starts when the camera is turned on and continues until the camera is turned off. To avoid confusion, we simply call this level the shot level 920.
- a shot is composed of frames 930 or, as the most basic unit of a GOV, video object planes (VOPs) 931. Lower levels can also be considered, namely the sub-regions 941 to 942 of a frame or VOP.
- a feature extraction process 901-904 is applied to the image data at each level. It goes without saying that the data at each level is organized differently, and the relevant features vary from level to level, so a different feature extraction technique is applied at each level. That is, program-level features are extracted in a different manner than frame-level features.
- these features represent "hints" or "cues" 905-908 that can be applied by the transcoding system.
- hints are either semantic or syntactic, and can represent either high-level or low-level metadata.
- metadata can be applied to transcoding at any given level.
- metadata for higher-level data, such as shot-level data, is used for classification, bit allocation, and rate-quality considerations for a particular shot and among other shots. In this case, the metadata is not limited to the transcoders; the CND manager of FIG. 3 uses it to determine the transcoding strategy for all of the output content.
- metadata for lower-level data, such as object-level data, may be more useful to the transcoder 340 itself in helping with dynamic bit allocation, because it is difficult to classify and manage the output content at such a low level.
- Hybrid discrete summarization and continuous transform transcoders: once again, we describe techniques that focus primarily on using high-level (shot-level) metadata in the CND manager. However, such metadata can also be considered in the discrete summarization transcoder. Finally, we describe how metadata can be used to guide transcoding. As noted above, this is equally applicable to both the management and transcoding stages.
- the main function of the content classifier 310 is to map content characteristics, such as motion activity, scene-change information, and texture, to the set of parameters used to make rate-quality trade-offs. To assist in this mapping, the content classifier also accepts the metadata 303.
- An example of such metadata is the descriptors and description schemes (DS) specified by the emerging MPEG-7 standard.
- In stage III 313 of the content classifier 310, such low-level metadata is mapped to rate-quality characteristics that depend only on the content.
- FIG. 10 illustrates this. The rate-quality characteristics affect the rate-quality functions shown in FIG. 5.
- the content classifier 310 receives the low-level metadata 303.
- Stage I 311 extracts high-level metadata or classes 1001.
- Stage II 312 uses the predictions 321 to determine content-, network-, and device-dependent rate-quality (R-Q) characteristics.
- Stage III 313 extracts the R-Q characteristics 1003 that depend only on the low-level metadata.
- the spatial distribution parameters of the MPEG-7 motion activity descriptor indicate how the motion of a program is distributed, and whether it can be classified into categories of spatial and temporal importance.
- the news program includes several shots of the general moderator and various other shots related to the entire news story.
- FIG. 11, together with FIGS. 12 and 13, shows three shots 1201-1203 of a news program 1200: a general moderator's shot, an in-field reporter's shot, and a police chase shot. For simplicity, all of the news program's shots are categorized into only one of three categories, but it goes without saying that, in a practical application, the categories would differ in number and type.
- the first class 1101 indicates shots where the temporal quality of the content is less important than the spatial quality.
- the second class 1102 indicates shots where the temporal quality of the content is more important, and the third class 1103 indicates shots where the spatial and temporal quality are equally important.
- This set of classes is called SET-1 1110.
- Such classes are clearly characterized by rate and quality.
- the purpose of stage III 313 of the content classifier is to process the low-level features and map them to the most appropriate of these classes. Note that the importance of spatial and temporal quality can also be evaluated on a scale of 1 to 10, or on a real-valued scale from 0.0 to 1.0.
- the first class 1121 indicates that a shot can be compressed very easily, i.e., a large compression ratio can easily be achieved for a given distortion.
- the third class 1123 represents the absolute opposite: it indicates that it is very difficult to compress the shot content, due to high or complex motion, or to spatially active scenes.
- the second class 1122 is about halfway between the first and third classes.
- This set of classes is called SET-2 1120.
- these classes 1120 are also used by the CND manager 330. They illustrate the effect of content classification on the rate-quality decisions that are made, and on how the switchable transcoder 340 may operate.
- compression difficulty can also be categorized by numerical values. Of course, other sets of classes can be defined for other types of image programs.
- Content is classified into the rate-quality classes SET-1 and SET-2 according to the features extracted from the low-level metadata. The following describes how these classes can be derived from motion activity.
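The derivation of both class sets from motion activity can be sketched as a threshold rule. This is a hedged illustration: the thresholds and string labels are assumptions, and only the mapping direction follows the text (low activity favors spatial quality and compresses easily; high activity favors temporal quality and compresses with difficulty).

```python
# Illustrative mapping (thresholds are assumptions) from the MPEG-7
# motion-activity intensity to the SET-1 and SET-2 classes described
# above: low activity -> spatial quality matters, easy to compress;
# high activity -> temporal quality matters, hard to compress.

def classify_by_motion_activity(intensity):
    """intensity: MPEG-7 motion-activity intensity, 1 (low) to 5 (high)."""
    if intensity <= 2:
        return {"set1": "spatial-important", "set2": "easy-to-compress"}
    if intensity >= 4:
        return {"set1": "temporal-important", "set2": "hard-to-compress"}
    return {"set1": "equal-importance", "set2": "moderate"}
```

For the news example above, the static moderator shot would land in the first classes, and the police chase in the third.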
- FIG. 12 illustrates a transcoding strategy according to the SET-1 classification.
- the general moderator's shot 1201 is transcoded using the discrete summarization transcoder 1210 (see block 441 in FIG. 4). This transcoder reduces the entire shot 1201 to a single frame 1211 (i.e., a still image of the general moderator). For the duration of the shot, the full audio portion of the speaking moderator is provided.
- the in-field reporter's shot 1202 is continuously transcoded at 5 frames per second, with full audio, so that the viewer retains some sense of the background motion.
- the police chase shot 1203 is also transcoded continuously, at 30 frames per second 1231.
- the classification results can be interpreted differently than shown in FIG.
- In SET-2, the first segment is compressed very easily because of the lack of motion in the general moderator's shot 1201, and it is therefore classified into the first class 1121 of SET-2.
- This shot is continuously transcoded 1240 at a high compression ratio, at 30 frames per second 1241.
- the police chase shot 1203 is more difficult to compress because it contains high motion. Therefore, the police chase shot 1203 is classified into the third class 1123 of SET-2.
- the police chase shot 1203 is continuously transcoded 1260 at 7.5 frames per second 1261. Again, depending on the characteristics of the field reporter shot 1202, it could fall into any of the three classes. For illustration purposes, the field reporter shot 1202 is assigned to the second class 1122 and is continuously transcoded 1250 at 15 frames per second 1251.
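The per-class frame rates in this example (30, 15, and 7.5 fps) come directly from the text above; a minimal sketch of how a transcoder might store that mapping, with the dictionary form as an assumption of ours:

```python
# Frame-rate assignment per SET-2 class, following the example above.
FRAME_RATE_BY_SET2_CLASS = {
    1121: 30.0,  # easy shots keep the full frame rate
    1122: 15.0,  # moderately difficult shots are halved
    1123: 7.5,   # hard-to-compress shots are reduced further
}

def target_frame_rate(set2_class: int) -> float:
    """Return the continuous-transcoding frame rate for a SET-2 class."""
    return FRAME_RATE_BY_SET2_CLASS[set2_class]
```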
- transcoding hints can be used to generate either constant-rate or variable-rate bitstreams (CBR or VBR). For example, if the classification is based on compression difficulty (SET-2), a CBR bitstream can be generated by giving a low frame rate to a hard-to-compress frame sequence, and a VBR bitstream can be generated by allocating more bits to it.
- these classifications suggest how the content can be manipulated. In fact, the classification can greatly reduce the number of scenarios to be considered. For example, if the CND manager must consider the rate-quality trade-off for a large number of bitstreams (frames or objects) at a given time, the CND manager can consider how best to distribute transcoding responsibilities between continuous transform and discrete digest transcoding. Instead of choosing one method for all segments under consideration, a hybrid approach can be considered. The priority of the program, or the difficulty of compression indicated by its low-level features, are examples of parameters that could be used to make such a decision.
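A hedged sketch of that hybrid decision: the inputs (a priority score and a compression-difficulty score, both normalized to [0, 1]) and the thresholds are hypothetical; only the idea of mixing discrete digest and continuous transform transcoding per segment comes from the text.

```python
def choose_method(priority: float, difficulty: float) -> str:
    """Pick a transcoding method for one segment.

    Low-priority segments, or segments too hard to compress at acceptable
    quality, are reduced to a few key frames (discrete digest); the rest
    are continuously transcoded at a reduced rate. Thresholds are assumed.
    """
    if priority < 0.3 or difficulty > 0.8:
        return "discrete_digest"
    return "continuous_transform"
```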
- Figures 12 and 13 show examples of how the classifications in SET-1 1110 and SET-2 1120 affect the strategy determined by the CND manager and how the transcoder may operate. Of particular interest in FIG. 12 is that a hybrid transcoding scheme is used.
- the general moderator can be assigned a lower priority than the police chase. When dealing with object-based images, another transcoding method is to assign a lower priority to the background of the shot 1221 than to the general moderator in the foreground. All of this can be achieved, for example, through classification of the motion activity parameter(s) at the object level.
- Whether low-level features are considered individually or in combination, they can be used to effectively classify image content into meaningful classes that assist the CND manager and transcoder.
- the CND classifier 310 and CND manager 330 may appear to be redundant with the TCU 610 of FIG. 6, but they are not.
- the classifier and CND manager attempt to pre-select the best strategy for the transcoder 340. Given this strategy and instructions from the manager, the transcoder has the ability to manipulate the content in the best possible way. If the transcoder cannot satisfy the request, due to incorrect predictions or a poorly chosen strategy from the CND manager, the transcoder needs a mechanism to handle such situations (eg, temporal analysis). Therefore, metadata can also be used in the TCU. However, the purpose of metadata for the TCU is different from its purpose for the classifier and CND manager. [Effects of Meta-Data on Transcoding]
- the first method is performed in the CND manager 330, where metadata is used for bit allocation to derive a strategy and, ultimately, a decision on how to use the functions provided by the discrete digest and continuous transform transcoders 441-442. Thus, the rate-quality functions of FIG. 5 are used to make the decision.
- the second method is performed in the transcoder 340 itself. Again, metadata is used for estimation, but rather than determining a strategy, it is used for the real-time determination of coding parameters that can be used to meet the target bit rate. Thus, the coding parameters are chosen such that the transcoder achieves the optimal rate-quality function of FIG. 5.
- low-level and high-level metadata are used for both discrete summarization and continuous transformation.
- Semantic information can be associated with content automatically or by manual annotation.
- In applications where a large number of users simultaneously request various shots, the CND manager 330 must determine how much rate to assign to each shot. For the discrete digest transcoder 441, this rate may correspond to the number of frames transmitted, while for the continuous transform transcoder 442, the rate may correspond to an acceptable target frame rate. If the action level indicates a particular degree of temporal activity, bits may be assigned to each frame sequence according to this description of the content. For shots with high activity, the CND manager may determine that frame rates below a certain level will not be tolerated by the continuous transform transcoder, and that better quality can be delivered by summarizing the content with the discrete digest transcoder.
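The allocation logic above can be sketched as follows, under assumptions of ours: `activities` holds per-shot action levels, the total rate budget is split in proportion to them, and any shot whose share falls below a minimum tolerable rate is switched to the discrete digest transcoder rather than being starved by the continuous one.

```python
def allocate(activities, total_rate, min_rate):
    """Split a rate budget across shots in proportion to their action level.

    Returns a list of (method, rate) pairs; shots whose proportional share
    drops below `min_rate` fall back to discrete digest transcoding.
    Proportional splitting and the fallback threshold are assumptions.
    """
    total = sum(activities)
    plan = []
    for act in activities:
        share = total_rate * act / total
        method = "continuous" if share >= min_rate else "discrete_digest"
        plan.append((method, share))
    return plan
```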
- the process of generating high-level metadata from low-level metadata can be defined as meta-encoding.
- Such an encoding process may be performed in stage I 311 of the content classifier of the transcoding system.
- this high-level metadata generation process can also be used in stand-alone systems.
- An example of such a stand-alone system is one that instantiates a description scheme specified by the MPEG-7 standard. Such a system may be referred to as an MPEG-7 high-level meta-encoder.
- the current proposal for MPEG-7 includes high-level description schemes that serve as placeholders for various types of metadata.
- the normative part of this standard explicitly specifies the requirements that an implementation must satisfy, while the informative part merely suggests a potential technology or one possible way of doing something.
- determining the appropriate motion vectors or quantization parameters is considered an encoder issue, ie, part of the informative side of the standard.
- the standard specifies a variable-length coding (VLC) table for motion vectors and a 5-bit field for quantization parameters. How these fields are used is entirely up to the encoder and is not specified normatively by the standard, ie, it is informative.
- the SummaryDS is used to specify visual abstracts of the content, which are primarily used for content search and navigation.
- the VariationDS can be used to specify variations of the content.
- variations can be formed in a number of ways and may reflect revisions and manipulations of the original data.
- description schemes such as the SummaryDS and VariationDS do not specify how summaries or variations of the content are generated.
- the first major problem is that these variations must be generated before any request for the original image arrives. As a result, generating them on the fly for real-time transmission is not an option, because the delay associated with producing many variations of the content is too long.
- the second major problem is that network characteristics tend to change over time. Therefore, a particular pre-transcoded variation selected once under the current network conditions may not remain applicable for the entire duration of the session.
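One way to cope with drifting network conditions, sketched here as an assumption rather than the patent's mechanism, is to re-select the best-fitting pre-transcoded variation whenever the measured bandwidth changes, instead of fixing one variation for the whole session:

```python
def select_variation(variation_rates, available_bw):
    """Return the highest-rate stored variation that fits the channel.

    Falls back to the smallest variation when even that exceeds the
    currently available bandwidth. Rate units are arbitrary (eg, kbps).
    """
    fitting = [r for r in variation_rates if r <= available_bw]
    return max(fitting) if fitting else min(variation_rates)
```

Calling this periodically, or on each bandwidth-change event, approximates adaptation without real-time transcoding.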
- FIG. 14 shows such an encoder generating summary and variation data, along with instantiating the corresponding description schemes.
- the components of the encoder are similar to those of the adaptive transcoding system 300 of FIG. 3. However, the encoder differs in that it is not connected to a network and does not receive and transmit in real time while transcoding. Instead, the encoder is connected to a database where the images are stored. The encoder produces various image versions offline for later real-time distribution.
- the adaptive bitstream image delivery system 1300 has five major components: a content classifier 1310, a network-device (ND) generator 1320, a CND manager 1330, a switchable transcoder 1340, and a DS instantiator 1350.
- the system 1300 has its inputs and outputs connected to the database 1360.
- the system 1300 is also driven by network and device (ND) constraints.
- the purpose of the distribution system 1300 is to generate variation and/or summary bitstreams 1308 from the original compressed bitstream (VideoIn) 1301.
- the bitstream content can be visual, audio, or textual data; natural or synthetic; elementary or compound; or any combination thereof.
- the image distribution system 1300 is similar to the adaptive transcoding system 300. The main differences are that the system 1300 is not connected to user devices 360 via the network 350 of FIG. 3, and that the transcoding is not performed in real time.
- the ND generator 1320 takes the place of the devices and networks.
- the generator has the ability to simulate network and device (ND) constraints that may exist in real-time operation.
- the ND generator may simulate a CBR channel having 64 kbps, 128 kbps, and 512 kbps, or a VBR channel.
- the generator can also simulate channels experiencing a reduction in available bandwidth. The reduction can be linear, quadratic, or very abrupt. Many other typical conditions may be simulated, some of which may relate to user-device constraints such as limited display capabilities.
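An illustrative sketch of bandwidth-reduction profiles an ND generator might simulate; the functional forms (linear, quadratic, abrupt) follow the text above, but the time scale and the cut-off point are assumptions:

```python
def simulated_bw(bw0, t, duration, profile="linear"):
    """Return simulated available bandwidth at time t.

    bw0: starting bandwidth; duration: time over which the reduction occurs.
    The "abrupt" profile holds full bandwidth until halfway, then drops
    sharply to 10% (an assumed value).
    """
    x = min(max(t / duration, 0.0), 1.0)   # normalized elapsed time
    if profile == "linear":
        return bw0 * (1.0 - x)
    if profile == "quadratic":
        return bw0 * (1.0 - x) ** 2
    return bw0 if x < 0.5 else bw0 * 0.1   # "abrupt"
```

Feeding such profiles to the transcoder offline yields variation bitstreams tuned to the corresponding real-time conditions.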
- the database stores a number of variations of the input bitstream 1301 so that bitstreams suited to the expected real-time operating conditions are readily available to downstream transcoders.
- the variation bitstreams can be either CBR or VBR.
- the purpose of the ND generator 1320 is to simulate various network and device conditions so that the system automatically generates variations and/or summaries 1308 of the original content 1301 according to these conditions. While doing so, the system also instantiates the corresponding description schemes 1309.
- the fields of the description schemes, eg, the VariationDS and SummaryDS, are filled in according to the generated variation bitstreams.
- the CND manager must pass this information to the DS instantiator 1350. After the variations have been instantiated, the corresponding description schemes can be accessed and used, for example, by the real-time transcoder 300 as described above. [Rate-Quality functions]
- the variations and/or summaries 1308 created by the system 1300 are a subset of the points V(1), ..., V(5) computed from the optimal rate-quality function.
- FIG. 15 shows this limited number of points. Each point indicates the optimal operating point for a particular variation.
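Given stored operating points V(1)..V(5) as (rate, quality) pairs, a delivery system could choose the point whose rate best matches a requested rate without exceeding it. This selection rule and all numeric values below are hypothetical illustrations:

```python
def pick_point(points, target_rate):
    """Select the highest-rate operating point not exceeding target_rate.

    `points` is a list of (rate, quality) pairs; falls back to the
    lowest-rate point when none fits.
    """
    feasible = [p for p in points if p[0] <= target_rate]
    if feasible:
        return max(feasible, key=lambda p: p[0])
    return min(points, key=lambda p: p[0])
```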
- Each variation has an associated instantiated description scheme (DS) 1309. Both the variation bitstreams 1308 and the instantiated description schemes 1309 are stored in the database 1360 together with the original image stream 1301.
- the selector 1370 of the system 1300 receives a request for a particular image program.
- the selector provides information about the available variations and the associated DSs stored in the database 1360.
- the CND manager of the transcoder 300 utilizes this pre-transcoded data.
- the high-level metadata allows the transcoder to match specific variations of the requested image against the current real-time network and device constraints.
- the CND manager then requests that a particular variation be sent by the selector over the network 350.
- If an exact match to the current constraints is found, transcoder 340 can operate in bypass mode. If only a close match is found, transcoder 340 can still operate more efficiently than it could from the original bitstream.
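A minimal sketch of that mode decision, with the tolerance value as an assumption: if a stored variation's rate is within a small tolerance of the target, it is passed through untouched; otherwise the close match is refined, which is still cheaper than transcoding the original.

```python
def transcoder_mode(variation_rate, target_rate, tol=0.05):
    """Decide between bypass and refinement of a pre-transcoded variation.

    `tol` is an assumed relative tolerance on the rate match.
    """
    if abs(variation_rate - target_rate) <= tol * target_rate:
        return "bypass"
    return "refine_pretranscoded"
```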
- This is just one example of a practical application. It is also possible to further manipulate and modify the already-generated bitstreams 1308 to improve their compliance with the current network and device constraints. The trade-off is between generating a large number of pre-transcoded bitstreams that cover a very wide range of conditions and generating a small number of pre-transcoded bitstreams that cover only the most common conditions. A different level of quality can be expected from each approach, because the distribution system 1300 operates under relaxed time constraints.
Description
Claims
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP01902702A EP1248466A4 (en) | 2000-04-11 | 2001-01-31 | PROCESS AND DEVICE FOR TRANSCODING COMPRESSED IMAGE DATA |
AU30548/01A AU3054801A (en) | 2000-04-11 | 2001-01-31 | Method and apparatus for transcoding of compressed image |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/546,717 US6490320B1 (en) | 2000-02-02 | 2000-04-11 | Adaptable bitstream video delivery system |
US09/546,717 | 2000-04-11 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2001078399A1 true WO2001078399A1 (en) | 2001-10-18 |
Family
ID=24181700
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2001/000662 WO2001078399A1 (en) | 2000-04-11 | 2001-01-31 | Method and apparatus for transcoding of compressed image |
Country Status (4)
Country | Link |
---|---|
US (1) | US6490320B1 (ja) |
EP (1) | EP1248466A4 (ja) |
AU (1) | AU3054801A (ja) |
WO (1) | WO2001078399A1 (ja) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2006099565A (ja) * | 2004-09-30 | 2006-04-13 | Kddi Corp | コンテンツ識別装置 |
JP2006246008A (ja) * | 2005-03-03 | 2006-09-14 | Ntt Docomo Inc | 映像トランスコードシステム、映像取得装置、トランスコーダ装置、及び、映像トランスコーディング方法 |
JP2008521293A (ja) * | 2004-11-15 | 2008-06-19 | スミス マイクロ ソフトウエア,インコーポレイテッド | 既圧縮ファイルのロスレス圧縮システムおよび方法 |
Families Citing this family (126)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8028314B1 (en) | 2000-05-26 | 2011-09-27 | Sharp Laboratories Of America, Inc. | Audiovisual information management system |
TW519840B (en) * | 2000-06-02 | 2003-02-01 | Sony Corp | Image coding apparatus and method, image decoding apparatus and method, and recording medium |
US20020120780A1 (en) * | 2000-07-11 | 2002-08-29 | Sony Corporation | Two-staged mapping for application specific markup and binary encoding |
EP1303987A1 (en) * | 2000-07-13 | 2003-04-23 | Koninklijke Philips Electronics N.V. | Mpeg-4 encoder and output coded signal of such an encoder |
US6697523B1 (en) * | 2000-08-09 | 2004-02-24 | Mitsubishi Electric Research Laboratories, Inc. | Method for summarizing a video using motion and color descriptors |
JP2002064802A (ja) * | 2000-08-21 | 2002-02-28 | Sony Corp | データ伝送システム、データ伝送装置及び方法、シーン記述処理装置及び方法 |
US8020183B2 (en) | 2000-09-14 | 2011-09-13 | Sharp Laboratories Of America, Inc. | Audiovisual management system |
US6904094B1 (en) * | 2000-09-20 | 2005-06-07 | General Instrument Corporation | Processing mode selection for channels in a video multi-processor system |
US7039115B1 (en) * | 2000-09-20 | 2006-05-02 | General Instrument Corporation | Processor allocation for channels in a video multi-processor system |
US7398275B2 (en) * | 2000-10-20 | 2008-07-08 | Sony Corporation | Efficient binary coding scheme for multimedia content descriptions |
WO2002043396A2 (en) * | 2000-11-27 | 2002-05-30 | Intellocity Usa, Inc. | System and method for providing an omnimedia package |
JP4534106B2 (ja) * | 2000-12-26 | 2010-09-01 | 日本電気株式会社 | 動画像符号化システム及び方法 |
US20030038796A1 (en) * | 2001-02-15 | 2003-02-27 | Van Beek Petrus J.L. | Segmentation metadata for audio-visual content |
US6520032B2 (en) * | 2001-03-27 | 2003-02-18 | Trw Vehicle Safety Systems Inc. | Seat belt tension sensing apparatus |
US6925501B2 (en) * | 2001-04-17 | 2005-08-02 | General Instrument Corporation | Multi-rate transcoder for digital streams |
US6895050B2 (en) * | 2001-04-19 | 2005-05-17 | Jungwoo Lee | Apparatus and method for allocating bits temporaly between frames in a coding system |
US7904814B2 (en) | 2001-04-19 | 2011-03-08 | Sharp Laboratories Of America, Inc. | System for presenting audio-video content |
US20030018599A1 (en) * | 2001-04-23 | 2003-01-23 | Weeks Michael C. | Embedding a wavelet transform within a neural network |
US7237033B2 (en) | 2001-04-30 | 2007-06-26 | Aol Llc | Duplicating switch for streaming data units to a terminal |
US8572278B2 (en) | 2001-04-30 | 2013-10-29 | Facebook, Inc. | Generating multiple data streams from a single data source |
US7124166B2 (en) * | 2001-04-30 | 2006-10-17 | Aol Llc | Duplicating digital streams for digital conferencing using switching technologies |
JP3866538B2 (ja) * | 2001-06-29 | 2007-01-10 | 株式会社東芝 | 動画像符号化方法及び装置 |
US7474698B2 (en) | 2001-10-19 | 2009-01-06 | Sharp Laboratories Of America, Inc. | Identification of replay segments |
US6944616B2 (en) * | 2001-11-28 | 2005-09-13 | Pavilion Technologies, Inc. | System and method for historical database training of support vector machines |
US20030110297A1 (en) * | 2001-12-12 | 2003-06-12 | Tabatabai Ali J. | Transforming multimedia data for delivery to multiple heterogeneous devices |
WO2003052981A1 (en) * | 2001-12-14 | 2003-06-26 | The Texas A & M University System | System for actively controlling distributed applications |
US20030169816A1 (en) * | 2002-01-22 | 2003-09-11 | Limin Wang | Adaptive universal variable length codeword coding for digital video content |
FR2837330B1 (fr) * | 2002-03-14 | 2004-12-10 | Canon Kk | Procede et dispositif de selection d'une methode de transcodage parmi un ensemble de methodes de transcodage |
US8214741B2 (en) | 2002-03-19 | 2012-07-03 | Sharp Laboratories Of America, Inc. | Synchronization of video and data |
DE10218812A1 (de) * | 2002-04-26 | 2003-11-20 | Siemens Ag | Generische Datenstrombeschreibung |
US8028092B2 (en) | 2002-06-28 | 2011-09-27 | Aol Inc. | Inserting advertising content |
US7657907B2 (en) | 2002-09-30 | 2010-02-02 | Sharp Laboratories Of America, Inc. | Automatic user profiling |
KR100498332B1 (ko) * | 2002-10-24 | 2005-07-01 | 엘지전자 주식회사 | 비디오 트랜스코더의 적응적 비트율 제어장치 및 방법 |
US7042943B2 (en) | 2002-11-08 | 2006-05-09 | Apple Computer, Inc. | Method and apparatus for control of rate-distortion tradeoff by mode selection in video encoders |
SG111978A1 (en) * | 2002-11-20 | 2005-06-29 | Victor Company Of Japan | An mpeg-4 live unicast video streaming system in wireless network with end-to-end bitrate-based congestion control |
JP2004178332A (ja) * | 2002-11-28 | 2004-06-24 | Satake Corp | コンテンツ変換制御方法及びコンテンツ利用システム |
AU2003303116A1 (en) * | 2002-12-19 | 2004-07-14 | Koninklijke Philips Electronics N.V. | A residential gateway system having a handheld controller with a display for displaying video signals |
KR20050087842A (ko) * | 2002-12-20 | 2005-08-31 | 코닌클리케 필립스 일렉트로닉스 엔.브이. | 오디오-비주얼 데이터의 스트림 기록 방법 |
US7194035B2 (en) * | 2003-01-08 | 2007-03-20 | Apple Computer, Inc. | Method and apparatus for improved coding mode selection |
EP1443776B1 (en) * | 2003-01-29 | 2012-08-15 | Sony Deutschland GmbH | Video signal processing system |
JP4539018B2 (ja) * | 2003-03-04 | 2010-09-08 | ソニー株式会社 | 送信制御装置および方法、記録媒体、並びにプログラム |
US7142601B2 (en) * | 2003-04-14 | 2006-11-28 | Mitsubishi Electric Research Laboratories, Inc. | Transcoding compressed videos to reducing resolution videos |
FR2857198B1 (fr) * | 2003-07-03 | 2005-08-26 | Canon Kk | Optimisation de qualite de service dans la distribution de flux de donnees numeriques |
JP2005045357A (ja) * | 2003-07-23 | 2005-02-17 | Hitachi Ltd | リモートディスプレイプロトコル、映像表示システム及び端末装置 |
US7898951B2 (en) * | 2003-08-13 | 2011-03-01 | Jones Farm Technology 2, Llc | Encoding and transmitting variable bit streams with utilization of a constrained bit-rate channel |
US7330509B2 (en) * | 2003-09-12 | 2008-02-12 | International Business Machines Corporation | Method for video transcoding with adaptive frame rate control |
US7535959B2 (en) * | 2003-10-16 | 2009-05-19 | Nvidia Corporation | Apparatus, system, and method for video encoder rate control |
TWI244323B (en) * | 2003-10-31 | 2005-11-21 | Benq Corp | Method for transmitting video and the device thereof |
TWI262660B (en) * | 2003-11-19 | 2006-09-21 | Inst Information Industry | Video transcoder adaptively reducing frame rate |
ES2445333T3 (es) * | 2004-01-08 | 2014-03-03 | Entropic Communications, Inc. | Distribución de vectores candidatos basada en complejidad de movimiento local |
US20050163378A1 (en) * | 2004-01-22 | 2005-07-28 | Jau-Yuen Chen | EXIF-based imaged feature set for content engine |
TWI230547B (en) * | 2004-02-04 | 2005-04-01 | Ind Tech Res Inst | Low-complexity spatial downscaling video transcoder and method thereof |
US8949899B2 (en) | 2005-03-04 | 2015-02-03 | Sharp Laboratories Of America, Inc. | Collaborative recommendation system |
US8356317B2 (en) | 2004-03-04 | 2013-01-15 | Sharp Laboratories Of America, Inc. | Presence based technology |
US20050201469A1 (en) * | 2004-03-11 | 2005-09-15 | John Sievers | Method and apparatus for improving the average image refresh rate in a compressed video bitstream |
US7983835B2 (en) | 2004-11-03 | 2011-07-19 | Lagassey Paul J | Modular intelligent transportation system |
KR100967125B1 (ko) * | 2004-03-26 | 2010-07-05 | 노키아 코포레이션 | 네트워크 휴대용 장치에서의 특징 추출 |
US20050215239A1 (en) * | 2004-03-26 | 2005-09-29 | Nokia Corporation | Feature extraction in a networked portable device |
US8406293B2 (en) | 2004-06-27 | 2013-03-26 | Apple Inc. | Multi-pass video encoding based on different quantization parameters |
US8005139B2 (en) | 2004-06-27 | 2011-08-23 | Apple Inc. | Encoding with visual masking |
US20060015799A1 (en) * | 2004-07-13 | 2006-01-19 | Sung Chih-Ta S | Proxy-based error tracking for real-time video transmission in mobile environments |
US20060062312A1 (en) * | 2004-09-22 | 2006-03-23 | Yen-Chi Lee | Video demultiplexer and decoder with efficient data recovery |
US20060088105A1 (en) * | 2004-10-27 | 2006-04-27 | Bo Shen | Method and system for generating multiple transcoded outputs based on a single input |
US7945535B2 (en) * | 2004-12-13 | 2011-05-17 | Microsoft Corporation | Automatic publishing of digital content |
EP1832116A1 (en) | 2004-12-22 | 2007-09-12 | Koninklijke Philips Electronics N.V. | Video stream modifier |
US8780957B2 (en) | 2005-01-14 | 2014-07-15 | Qualcomm Incorporated | Optimal weights for MMSE space-time equalizer of multicode CDMA system |
AU2006223416A1 (en) | 2005-03-10 | 2006-09-21 | Qualcomm Incorporated | Content adaptive multimedia processing |
US20060235883A1 (en) * | 2005-04-18 | 2006-10-19 | Krebs Mark S | Multimedia system for mobile client platforms |
US8208536B2 (en) * | 2005-04-28 | 2012-06-26 | Apple Inc. | Method and apparatus for encoding using single pass rate controller |
US7548657B2 (en) * | 2005-06-25 | 2009-06-16 | General Electric Company | Adaptive video compression of graphical user interfaces using application metadata |
JP4839035B2 (ja) * | 2005-07-22 | 2011-12-14 | オリンパス株式会社 | 内視鏡用処置具および内視鏡システム |
US20070074251A1 (en) * | 2005-09-27 | 2007-03-29 | Oguz Seyfullah H | Method and apparatus for using random field models to improve picture and video compression and frame rate up conversion |
US9113147B2 (en) | 2005-09-27 | 2015-08-18 | Qualcomm Incorporated | Scalability techniques based on content information |
US8149909B1 (en) | 2005-10-13 | 2012-04-03 | Maxim Integrated Products, Inc. | Video encoding control using non-exclusive content categories |
US8081682B1 (en) | 2005-10-13 | 2011-12-20 | Maxim Integrated Products, Inc. | Video encoding mode decisions according to content categories |
US8126283B1 (en) | 2005-10-13 | 2012-02-28 | Maxim Integrated Products, Inc. | Video encoding statistics extraction using non-exclusive content categories |
US8948260B2 (en) | 2005-10-17 | 2015-02-03 | Qualcomm Incorporated | Adaptive GOP structure in video streaming |
US8654848B2 (en) | 2005-10-17 | 2014-02-18 | Qualcomm Incorporated | Method and apparatus for shot detection in video streaming |
WO2007073616A1 (en) * | 2005-12-28 | 2007-07-05 | Intel Corporation | A novel user sensitive information adaptive video transcoding framework |
US20070160134A1 (en) * | 2006-01-10 | 2007-07-12 | Segall Christopher A | Methods and Systems for Filter Characterization |
US8014445B2 (en) * | 2006-02-24 | 2011-09-06 | Sharp Laboratories Of America, Inc. | Methods and systems for high dynamic range video coding |
US8689253B2 (en) | 2006-03-03 | 2014-04-01 | Sharp Laboratories Of America, Inc. | Method and system for configuring media-playing sets |
US8194997B2 (en) * | 2006-03-24 | 2012-06-05 | Sharp Laboratories Of America, Inc. | Methods and systems for tone mapping messaging |
US9131164B2 (en) | 2006-04-04 | 2015-09-08 | Qualcomm Incorporated | Preprocessor method and apparatus |
US8130822B2 (en) * | 2006-07-10 | 2012-03-06 | Sharp Laboratories Of America, Inc. | Methods and systems for conditional transform-domain residual accumulation |
US7535383B2 (en) * | 2006-07-10 | 2009-05-19 | Sharp Laboratories Of America Inc. | Methods and systems for signaling multi-layer bitstream data |
US7885471B2 (en) * | 2006-07-10 | 2011-02-08 | Sharp Laboratories Of America, Inc. | Methods and systems for maintenance and use of coded block pattern information |
US8422548B2 (en) * | 2006-07-10 | 2013-04-16 | Sharp Laboratories Of America, Inc. | Methods and systems for transform selection and management |
US8532176B2 (en) * | 2006-07-10 | 2013-09-10 | Sharp Laboratories Of America, Inc. | Methods and systems for combining layers in a multi-layer bitstream |
US8059714B2 (en) * | 2006-07-10 | 2011-11-15 | Sharp Laboratories Of America, Inc. | Methods and systems for residual layer scaling |
US7840078B2 (en) * | 2006-07-10 | 2010-11-23 | Sharp Laboratories Of America, Inc. | Methods and systems for image processing control based on adjacent block characteristics |
US8379733B2 (en) * | 2006-09-26 | 2013-02-19 | Qualcomm Incorporated | Efficient video packetization methods for packet-switched video telephony applications |
WO2008084424A1 (en) * | 2007-01-08 | 2008-07-17 | Nokia Corporation | System and method for providing and using predetermined signaling of interoperability points for transcoded media streams |
US8233536B2 (en) | 2007-01-23 | 2012-07-31 | Sharp Laboratories Of America, Inc. | Methods and systems for multiplication-free inter-layer image prediction |
US8503524B2 (en) * | 2007-01-23 | 2013-08-06 | Sharp Laboratories Of America, Inc. | Methods and systems for inter-layer image prediction |
US7826673B2 (en) * | 2007-01-23 | 2010-11-02 | Sharp Laboratories Of America, Inc. | Methods and systems for inter-layer image prediction with color-conversion |
US8665942B2 (en) | 2007-01-23 | 2014-03-04 | Sharp Laboratories Of America, Inc. | Methods and systems for inter-layer image prediction signaling |
US7760949B2 (en) | 2007-02-08 | 2010-07-20 | Sharp Laboratories Of America, Inc. | Methods and systems for coding multiple dynamic range images |
WO2008114306A1 (ja) * | 2007-02-19 | 2008-09-25 | Sony Computer Entertainment Inc. | コンテンツ空間形成装置、その方法、コンピュータ、プログラムおよび記録媒体 |
US8767834B2 (en) | 2007-03-09 | 2014-07-01 | Sharp Laboratories Of America, Inc. | Methods and systems for scalable-to-non-scalable bit-stream rewriting |
US8175150B1 (en) * | 2007-05-18 | 2012-05-08 | Maxim Integrated Products, Inc. | Methods and/or apparatus for implementing rate distortion optimization in video compression |
US8893204B2 (en) | 2007-06-29 | 2014-11-18 | Microsoft Corporation | Dynamically adapting media streams |
KR101428671B1 (ko) | 2007-11-02 | 2014-09-17 | 에꼴 드 테크놀로지 수페리에르 | 스케일링 및 퀄리티-컨트롤 파라미터의 변경에 의한 변환이 가능한 이미지의 파일 사이즈 예측 시스템 및 방법 |
US8270739B2 (en) * | 2007-12-03 | 2012-09-18 | Ecole De Technologie Superieure | System and method for quality-aware selection of parameters in transcoding of digital images |
US8155184B2 (en) * | 2008-01-16 | 2012-04-10 | Sony Corporation | Video coding system using texture analysis and synthesis in a scalable coding framework |
US9357233B2 (en) * | 2008-02-26 | 2016-05-31 | Qualcomm Incorporated | Video decoder error handling |
US8300961B2 (en) * | 2008-12-12 | 2012-10-30 | Ecole De Technologie Superieure | Method and system for low complexity transcoding of images with near optimal quality |
EP2227023A1 (en) * | 2009-03-05 | 2010-09-08 | BRITISH TELECOMMUNICATIONS public limited company | Video streaming |
US9131007B2 (en) * | 2009-05-19 | 2015-09-08 | Vitrual World Computing, Inc. | System and method for dynamically transcoding data requests |
EP2469795B1 (en) * | 2010-02-25 | 2013-04-17 | Ntt Docomo, Inc. | Method and apparatus for rate shaping |
US9691430B2 (en) | 2010-04-01 | 2017-06-27 | Microsoft Technology Licensing, Llc | Opportunistic frame caching |
EP2577489A4 (en) * | 2010-06-02 | 2014-09-10 | Onmobile Global Ltd | METHOD AND APPARATUS FOR ADAPTING MULTIMEDIA CONTENT |
US20120275511A1 (en) * | 2011-04-29 | 2012-11-01 | Google Inc. | System and method for providing content aware video adaptation |
JP6247286B2 (ja) * | 2012-06-12 | 2017-12-13 | コーヒレント・ロジックス・インコーポレーテッド | ビデオコンテンツの符号化及び配信のための分散アーキテクチャ |
US20140040496A1 (en) * | 2012-08-06 | 2014-02-06 | General Instrument Corporation | On-demand http stream generation |
US20140044197A1 (en) * | 2012-08-10 | 2014-02-13 | Yiting Liao | Method and system for content-aware multimedia streaming |
US9516305B2 (en) * | 2012-09-10 | 2016-12-06 | Apple Inc. | Adaptive scaler switching |
US9357213B2 (en) * | 2012-12-12 | 2016-05-31 | Imagine Communications Corp. | High-density quality-adaptive multi-rate transcoder systems and methods |
US10609405B2 (en) | 2013-03-18 | 2020-03-31 | Ecole De Technologie Superieure | Optimal signal encoding based on experimental data |
US9338450B2 (en) | 2013-03-18 | 2016-05-10 | Ecole De Technologie Superieure | Method and apparatus for signal encoding producing encoded signals of high fidelity at minimal sizes |
US9661331B2 (en) | 2013-03-18 | 2017-05-23 | Vantrix Corporation | Method and apparatus for signal encoding realizing optimal fidelity |
US9247315B2 (en) * | 2013-11-07 | 2016-01-26 | Hulu, Inc. | Disabling of multiple bitrate algorithm for media programs while playing advertisements |
JP6274067B2 (ja) * | 2014-10-03 | 2018-02-07 | ソニー株式会社 | 情報処理装置および情報処理方法 |
US10264273B2 (en) * | 2014-10-31 | 2019-04-16 | Disney Enterprises, Inc. | Computed information for metadata extraction applied to transcoding |
US10454989B2 (en) * | 2016-02-19 | 2019-10-22 | Verizon Patent And Licensing Inc. | Application quality of experience evaluator for enhancing subjective quality of experience |
WO2023241690A1 (en) * | 2022-06-16 | 2023-12-21 | Douyin Vision (Beijing) Co., Ltd. | Variable-rate neural network based compression |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH08237663A (ja) * | 1994-12-22 | 1996-09-13 | At & T Corp | マルチメディア通信システム用ビデオ伝送レート整合 |
JPH08237621A (ja) * | 1994-11-01 | 1996-09-13 | At & T Corp | マルチメディア通信システムのための符号化領域画像複合化 |
JPH10164143A (ja) * | 1996-11-28 | 1998-06-19 | Hitachi Ltd | ゲートウェイ装置およびそれを用いた通信システム |
JPH11252546A (ja) * | 1998-02-27 | 1999-09-17 | Hitachi Ltd | 伝送速度変換装置 |
JP2000165436A (ja) * | 1998-11-13 | 2000-06-16 | Tektronix Inc | マルチメディア・デ―タ・フロ―のネットワ―ク・トランスコ―ディング方法及び装置 |
JP2001069502A (ja) * | 1999-08-25 | 2001-03-16 | Toshiba Corp | 映像送信端末、及び映像受信端末 |
JP2001086460A (ja) * | 1999-09-14 | 2001-03-30 | Nec Corp | トランスコードの高速化方法及び装置 |
JP2001094994A (ja) * | 1999-09-20 | 2001-04-06 | Canon Inc | 画像処理装置及び方法 |
JP2001094980A (ja) * | 1999-09-21 | 2001-04-06 | Sharp Corp | データ処理装置 |
JP2001103425A (ja) * | 1999-09-29 | 2001-04-13 | Victor Co Of Japan Ltd | 符号化データ蓄積出力装置 |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5969764A (en) * | 1997-02-14 | 1999-10-19 | Mitsubishi Electric Information Technology Center America, Inc. | Adaptive video coding method |
US6345279B1 (en) * | 1999-04-23 | 2002-02-05 | International Business Machines Corporation | Methods and apparatus for adapting multimedia content for client devices |
US6307964B1 (en) * | 1999-06-04 | 2001-10-23 | Mitsubishi Electric Research Laboratories, Inc. | Method for ordering image spaces to represent object shapes |
US6400846B1 (en) * | 1999-06-04 | 2002-06-04 | Mitsubishi Electric Research Laboratories, Inc. | Method for ordering image spaces to search for object surfaces |
US6542546B1 (en) * | 2000-02-02 | 2003-04-01 | Mitsubishi Electric Research Laboratories, Inc. | Adaptable compressed bitstream transcoder |
- 2000
  - 2000-04-11 US US09/546,717 patent/US6490320B1/en not_active Expired - Lifetime
- 2001
  - 2001-01-31 AU AU30548/01A patent/AU3054801A/en not_active Abandoned
  - 2001-01-31 WO PCT/JP2001/000662 patent/WO2001078399A1/ja active Application Filing
  - 2001-01-31 EP EP01902702A patent/EP1248466A4/en not_active Ceased
Non-Patent Citations (1)
Title |
---|
See also references of EP1248466A4 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2006099565A (ja) * | 2004-09-30 | 2006-04-13 | Kddi Corp | Content identification apparatus |
JP4553300B2 (ja) * | 2004-09-30 | 2010-09-29 | Kddi株式会社 | Content identification apparatus |
JP2008521293A (ja) * | 2004-11-15 | 2008-06-19 | スミス マイクロ ソフトウエア,インコーポレイテッド | System and method for lossless compression of already-compressed files |
JP2012054940A (ja) * | 2004-11-15 | 2012-03-15 | Smith Micro Software Inc | System and method for lossless compression of already-compressed files |
JP2012054939A (ja) * | 2004-11-15 | 2012-03-15 | Smith Micro Software Inc | System and method for lossless compression of already-compressed files |
JP2006246008A (ja) * | 2005-03-03 | 2006-09-14 | Ntt Docomo Inc | Video transcoding system, video acquisition apparatus, transcoder apparatus, and video transcoding method |
Also Published As
Publication number | Publication date |
---|---|
AU3054801A (en) | 2001-10-23 |
EP1248466A4 (en) | 2006-06-07 |
US6490320B1 (en) | 2002-12-03 |
EP1248466A1 (en) | 2002-10-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP4650868B2 (ja) | Method for transcoding compressed video | |
US6490320B1 (en) | Adaptable bitstream video delivery system | |
JP4601889B2 (ja) | Apparatus and method for converting a compressed bitstream | |
US8218617B2 (en) | Method and system for optimal video transcoding based on utility function descriptors | |
US6542546B1 (en) | Adaptable compressed bitstream transcoder | |
JP4786114B2 (ja) | Method and apparatus for coding video | |
US6925120B2 (en) | Transcoder for scalable multi-layer constant quality video bitstreams | |
Vetro et al. | Object-based transcoding for adaptable video content delivery | |
Eleftheriadis et al. | Meeting arbitrary QoS constraints using dynamic rate shaping of coded digital video | |
Kim et al. | Content-adaptive utility-based video adaptation | |
CA2491522C (en) | Efficient compression and transport of video over a network | |
Ortega | Variable bit rate video coding | |
Kim et al. | An optimal framework of video adaptation and its application to rate adaptation transcoding | |
Eleftheriadis et al. | Dynamic rate shaping of compressed digital video | |
Auli-Llinas et al. | Enhanced JPEG2000 quality scalability through block-wise layer truncation | |
KR100802180B1 (ko) | Method for controlling the bit rate of an MPEG-4 video signal according to dynamic changes in communication capacity |
Suchomski et al. | RETAVIC: using meta-data for real-time video encoding in multimedia servers | |
CN100366077C (zh) | Method and system for optimal video decoding based on utility function descriptions |
Van Der Schaar et al. | Real-time ubiquitous multimedia streaming using rate-distortion-complexity models | |
Vetro | Object-based encoding and transcoding | |
Cha et al. | Adaptive scheme for streaming MPEG-4 contents to various devices | |
Tao | Video adaptation for stored video delivery over resource-constrained networks | |
Ortega et al. | Mechanisms for adapting compressed multimedia to varying bandwidth conditions | |
Cucchiara et al. | Semantic transcoding of videos by using adaptive quantization | |
Auli-Llinas et al. | Research Article Enhanced JPEG2000 Quality Scalability through Block-Wise Layer Truncation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
ENP | Entry into the national phase |
Ref country code: JP
Ref document number: 2001 575723
Kind code of ref document: A
Format of ref document f/p: F
|
WWE | Wipo information: entry into national phase |
Ref document number: 2001902702
Country of ref document: EP
|
AK | Designated states |
Kind code of ref document: A1
Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW
|
AL | Designated countries for regional patents |
Kind code of ref document: A1
Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
WWP | Wipo information: published in national office |
Ref document number: 2001902702
Country of ref document: EP