WO2001078398A1 - Transcoding of compressed video - Google Patents
- Publication number
- WO2001078398A1 (PCT/JP2001/002354)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- content
- transcoder
- video
- level
- transcoding
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T9/00—Image coding
- G06T9/001—Model-based coding, e.g. wire frame
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/1066—Session management
- H04L65/1101—Session protocols
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/60—Network streaming of media packets
- H04L65/75—Media network packet handling
- H04L65/752—Media network packet handling adapting media to network capabilities
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/60—Network streaming of media packets
- H04L65/75—Media network packet handling
- H04L65/765—Media network packet handling intermediate
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/80—Responding to QoS
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
- H04L69/04—Protocols for data compression, e.g. ROHC
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/124—Quantisation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/146—Data rate or code amount at the encoder output
- H04N19/147—Data rate or code amount at the encoder output according to rate distortion criteria
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/146—Data rate or code amount at the encoder output
- H04N19/152—Data rate or code amount at the encoder output by measuring the fullness of the transmission buffer
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/172—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/189—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
- H04N19/19—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding using optimisation based on Lagrange multipliers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/20—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/20—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
- H04N19/25—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding with scene description coding, e.g. binary format for scenes [BIFS] compression
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/20—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
- H04N19/29—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding involving scalability at the object level, e.g. video object layer [VOL]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/30—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
- H04N19/33—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability in the spatial domain
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/40—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video transcoding, i.e. partial or full decoding of a coded input stream followed by re-encoding of the decoded output stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/42—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
- H04N19/436—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using parallelised computational arrangements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/61—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/23418—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/2343—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/2343—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
- H04N21/234318—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by decomposing into objects, e.g. MPEG-4 objects
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/25—Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
- H04N21/266—Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel
- H04N21/2662—Controlling the complexity of the video stream, e.g. by scaling the resolution or bitrate of the video stream based on the client capabilities
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/84—Generation or processing of descriptive data, e.g. content descriptors
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/85—Assembly of content; Generation of multimedia applications
- H04N21/854—Content authoring
- H04N21/8549—Creating video summaries, e.g. movie trailer
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/2343—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
- H04N21/23439—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements for generating different versions
Definitions
- The present invention relates to an information distribution system, and particularly to a distribution system that adapts the information to the available bit rate of a network.
- In MPEG-4, each video object plane (VOP) can be encoded and decoded independently.
- Objects can be visual, audio, natural, synthetic, primitive, composite, or a combination thereof.
- Video objects are assembled to form composite objects or “scenes.”
- the new MPEG-4 standard is intended to enable multimedia applications where natural and synthetic materials are integrated and access is universal, such as interactive video.
- MPEG-4 enables content-based interactivity. For example, a user may want to “cut and paste” moving shapes or objects from one video to another.
- the objects in the multimedia content are presumed to have been identified through some type of segmentation process.
- U.S. patent application Ser. No. 09/326,750 filed by Lin et al.
- a network can represent a wireless channel or the Internet. In any case, the network is limited in capacity and contention for that resource must be resolved when content needs to be transmitted.
- Bitstream conversion can be categorized as bitrate conversion, resolution conversion, and syntax conversion.
- Bit rate conversion includes bit rate scaling and conversion between a fixed bit rate (CBR) and a variable bit rate (VBR).
- CBR fixed bit rate
- VBR variable bit rate
- the basic function of bit rate scaling is to receive an input bit stream and produce a scaled output bit stream that meets the new load constraints of the receiver.
- a bitstream-scaler is a transcoder or filter that matches the bitstream on the transmitting side with the load on the receiving side.
- the transcoder includes a decoder 110 and an encoder 120.
- The compressed input bitstream 101 is completely decoded at the input rate Rin and then encoded at the new output rate Rout 102 to produce the output bitstream 103.
- the output rate is lower than the input rate.
- Because completely decoding and then fully re-encoding the bitstream is computationally complex, simplified architectures perform neither full decoding nor full encoding.
- FIG. 2 shows an example method.
- the video bitstream is only partially decoded. More specifically, the macroblock of the input bitstream 201 is subjected to variable length decoding (VLD) 210. Also, the input bit stream is delayed 220 and inverse quantized (IQ) 230 to provide discrete cosine transform (DCT) coefficients. Given the desired output bit rate, the partially decoded data is analyzed 240 and at 250 a new set of quantizers is applied to the DCT block. These requantized blocks are then variable length coded (VLC) 260 to form a new output bitstream 203 at a lower rate.
- VLC variable length coded
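The partial decode/requantize pipeline described above (VLD → IQ → requantize → VLC) can be illustrated with a toy sketch. This is not taken from the patent; the function name and the simple uniform quantizer are assumptions for illustration only:

```python
# Illustrative sketch (not from the patent) of open-loop requantization:
# quantized DCT levels from the partially decoded stream are inverse
# quantized, then re-quantized with a coarser step to lower the bit rate.

def requantize_block(dct_levels, qp_in, qp_out):
    """Requantize one block of quantized DCT levels.

    dct_levels: quantized levels from the input stream (after VLD).
    qp_in:  quantizer step used by the original encoder.
    qp_out: coarser quantizer step chosen by the rate analyzer (qp_out >= qp_in).
    """
    requantized = []
    for level in dct_levels:
        # Inverse quantize (IQ) back to an approximate DCT coefficient ...
        coeff = level * qp_in
        # ... then quantize again with the coarser step (truncating toward zero).
        requantized.append(int(coeff / qp_out))
    return requantized

# A coarser quantizer zeroes out small coefficients, so the subsequent
# VLC stage emits fewer bits for the block.
block = [12, -5, 3, 1, 0, -1, 0, 0]
print(requantize_block(block, qp_in=4, qp_out=8))  # [6, -2, 1, 0, 0, 0, 0, 0]
```

Note how the three smallest nonzero levels collapse to zero, which is exactly where the rate saving comes from.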
- Section 8 describes a simplified architecture for the same task. It uses a motion compensation (MC) loop to perform drift compensation in the frequency domain. An approximate matrix is derived for fast calculation of the MC blocks in the frequency domain, and Lagrangian optimization is used to calculate the optimal quantizer scale for transcoding. Other research by Sorial et al., “Joint transcoding of multiple MPEG video bitstreams,” Proceedings of the International Symposium on Circuits and Systems, 1999, proposes a method of jointly transcoding a plurality of MPEG-2 bit streams.
- the number of bits allocated to encode texture information is controlled by the quantization parameter (QP).
- QP quantization parameter
- the above paper is similar in that it reduces the texture bit rate by changing the QP based on information contained in the original bitstream.
- The information is usually extracted directly in the compressed domain and can include measures related to macroblock motion or the residual energy of the DCT blocks. This type of analysis is performed in a bit allocation analyzer.
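As a hedged illustration of such compressed-domain bit allocation: the scaling rule, the 0.9 factor, and the QP clamp below are invented for this sketch and are not taken from the patent or the cited papers:

```python
def allocate_qp(base_qp, residual_energy, mean_energy, rate_ratio):
    """Pick a new quantization parameter (QP) for one macroblock.

    rate_ratio = target_rate / input_rate (< 1 when reducing the rate).
    Macroblocks whose DCT residual energy is above average are quantized
    slightly less coarsely, preserving detail where activity is high.
    """
    qp = base_qp / rate_ratio            # coarser overall quantization
    if residual_energy > mean_energy:
        qp *= 0.9                        # spend relatively more bits here
    return max(1, min(31, round(qp)))    # clamp to a typical QP range

# Halving the rate roughly doubles the QP; a high-energy block gets a break.
print(allocate_qp(base_qp=8, residual_energy=500, mean_energy=300, rate_ratio=0.5))
print(allocate_qp(base_qp=8, residual_energy=100, mean_energy=300, rate_ratio=0.5))
```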
- The bitstream can be pre-processed, but it is still important that the transcoder operate in real time; large processing delays on the bitstream cannot be tolerated. For example, it is not feasible for a transcoder to extract information from a group of frames and then transcode the content based on the pre-fetched information, since that cannot work for live broadcasts or video conferencing. Better bit allocation could yield better transcoding results in terms of quality, but such an approach is impractical for real-time applications.
- this concept of the space-time trade-off may be considered in the encoder.
- The group of pictures (GOP) period and the intraframe distance are fixed.
- macroblocks can be skipped by syntax. If all macroblocks are skipped in a frame, the frame is essentially skipped. At least one bit is used for each macroblock in the frame to indicate this skipping. This can be inefficient for some bit rates.
- the H.263 and MPEG-4 standards allow for frame skipping. Both standards support a syntax that allows the specification of criteria.
- Frame skipping has been used primarily to satisfy buffer constraints. In other words, if the buffer occupancy is too high and there is a danger of overflow, the encoder skips frames to reduce the number of bits entering the buffer, and transmits the current frame once the buffer occupancy is acceptable.
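The buffer-driven skipping decision described above might be sketched as follows. The leaky-bucket buffer model and all numbers are hypothetical, chosen only to make the skip behavior visible:

```python
def should_skip_frame(buffer_bits, buffer_size, frame_bits, drain_bits):
    """Decide whether the encoder should skip the current frame.

    buffer_bits: current occupancy of the transmission buffer.
    drain_bits:  bits the channel removes from the buffer per frame interval.
    The frame is skipped when coding it would overflow the buffer.
    """
    return buffer_bits + frame_bits - drain_bits > buffer_size

# Simulate six 9000-bit frames through a 10000-bit buffer drained at
# 2000 bits per frame interval; record which frames are actually sent.
occupancy, size, drain = 0, 10_000, 2_000
sent = []
for i in range(6):
    bits = 9_000
    if should_skip_frame(occupancy, size, bits, drain):
        occupancy = max(0, occupancy - drain)           # frame skipped
    else:
        occupancy = max(0, occupancy + bits - drain)    # frame coded
        sent.append(i)
print(sent)  # only a subset of frames fits through the buffer
```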
- transcoders must find some alternative means of transmitting the information contained in the bitstream in order to accommodate the reduction in available bit rates.
- MPEG-7 is formally called the “Multimedia Content Description Interface.” See “MPEG-7 Context, Objects and Technical Roadmap,” ISO/IEC N2861, July 1999. In essence, this standard plans to incorporate a set of descriptors and description schemes that can be used to describe various types of multimedia content. The descriptors and description schemes are associated with the content itself and allow a specific user to search quickly and efficiently for material of interest. This standard is not intended to replace preceding coding standards; rather, multimedia content can be decomposed into various objects, and each object can be assigned a unique set of descriptors. It is important to note that the standard builds on the MPEG-4 representation. The standard is independent of the format in which the content is stored.
- The main application of MPEG-7 is expected to be search and retrieval. See “MPEG-7 Applications,” ISO/IEC N2861, July 1999.
- the user can specify some attributes of a particular object. In this low-level representation, these attributes can include descriptors that describe the texture, motion, and shape of a particular object.
- Methods for representing and comparing shapes are described in U.S. Patent Application Serial No. 09/326,759, filed June 4, 1999 by Lin et al., “Method for Ordering Image Space to Represent Object Shapes.”
- A method of describing motion activity is described in U.S. Patent Application Serial No. 09/4, filed September 27, 1999 by Divakaran et al.
- These descriptors and description schemes provided by the MPEG-7 standard allow access to characteristics of the video content that a transcoder cannot derive on its own. For example, these properties may represent information that would otherwise be inaccessible to the transcoder. The transcoder can access these properties only because they were derived from the content earlier, i.e., the content was pre-processed and stored in a database together with its associated metadata.
- syntactic information refers to the physical and logical signaling aspects of the content
- semantic information refers to the conceptual meaning of the content.
- syntactic elements can describe the color, shape, and movement of a particular object.
- semantic elements can refer to information that cannot be extracted from low-level descriptors, such as the time and place of an event in a video sequence or the name of a person.
- The method of transcoding compressed video divides the compressed video into hierarchical levels and extracts features from each of the hierarchical levels. Depending on the features extracted from the hierarchical levels, one of the transcoder's several conversion modes is selected. The compressed video is then transcoded according to the selected conversion mode.
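A minimal sketch of this feature-driven mode selection. The feature names, mode names, and thresholds are hypothetical; the patent describes the selection principle, not these specific values:

```python
def select_conversion_mode(features):
    """Choose a transcoding mode from features extracted at each level.

    features: dict with hypothetical keys
      'rate_ratio'      - target_rate / input_rate (sequence level)
      'motion_activity' - normalized motion measure in [0, 1]
    Thresholds below are illustrative, not from the patent.
    """
    if features["rate_ratio"] < 0.2 and features["motion_activity"] < 0.3:
        return "discrete_summary"      # summarize with a few high-quality frames
    if features["rate_ratio"] < 0.5:
        return "requantize_and_skip"   # continuous conversion, coarser QP + skips
    return "requantize_only"           # mild rate reduction suffices

# Severe rate reduction of low-motion content favors the summary mode.
print(select_conversion_mode({"rate_ratio": 0.15, "motion_activity": 0.1}))
```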
- FIG. 1 is a block diagram of a conventional transcoder.
- FIG. 2 is a block diagram of a prior art partial decoder / encoder
- FIG. 3 is a block diagram of an adaptive bitstream distribution system according to the present invention
- FIG. 4 is a block diagram of an adaptive transcoder and a transcoder manager,
- FIG. 5 is a graph of transcoding functions that can be used by the transcoder and manager of FIG. 4,
- Figure 6 shows a block diagram of object-based bitstream scaling
- Figure 7 shows the search space graph
- FIG. 8 is a block diagram showing details of an object-based transcoder according to the present invention.
- Figure 10 is a block diagram of a three-stage video content classifier
- Figure 11 is a block diagram of the descriptor method.
- FIG. 12 is a block diagram of the transcoding by the descriptor scheme of FIG. 11 (a),
- Fig. 13 is a block diagram of the transcoding by the descriptor scheme of Fig. 11 (b),
- FIG. 14 is a block diagram of a system for generating a content summary and a content variation using the content summary.
- FIG. 15 is a graph of a transcoding function based on the content summary and content variation of FIG. 14.

BEST MODE FOR CARRYING OUT THE INVENTION
- By converting, or “scaling,” the compressed input bitstream, a video can be created whose compressed output bitstream meets the target rate, i.e., the available bit rate (ABR) of the network.
- ABR available bit rate
- the delivery system will be described.
- Transcoding based on the low-level features of the bitstream and on descriptor schemes is described. It is an object of the present invention to perform transcoding while maximizing rate-quality (RQ) characteristics.
- the target rate of the output bitstream is lower than the rate of the input bitstream.
- the task of the transcoder according to the invention is to further compress the bitstream, usually due to constraints on network resources or the end user equipment's receiver load.
- Content-based transcoding techniques for video are described herein, down to the object level and the region level.
- the system according to the invention can overcome the shortcomings of conventional transcoders, namely the limitations of rate conversion, especially in real-time applications.
- Traditional transcoding techniques can reduce the rate satisfactorily, but the quality of the content is usually severely degraded. Often, the information transmitted in the reduced bit rate bitstream is lost.
- Bitstream “quality” is measured as the bit-by-bit difference between the input and output bitstreams. Described herein is a transcoding technique that can achieve the target rate while maintaining the quality of the bitstream content.

Continuous conversion
- Traditional frame-based transcoding techniques can be defined as continuous transforms.
- Because conventional techniques attempt to continuously maintain the optimal trade-off between spatial and temporal quality, the output is always the sequence of frames that best represents the input sequence. If a particular frame is skipped to meet rate constraints, the information contained in the skipped frame is not considered. If many frames are skipped, the received bitstream may be meaningless to the user, or at best unsatisfactory.

Quality distortion criteria
- The content of a bitstream is summarized with a small number of frames.
- Semantics and syntax refer not to bits or pixels but to the meaningful concepts represented by the bits, for example, words, sounds, the levels of humor and action in a video, video objects, and so on.
- Fidelity can be defined in many ways. However, fidelity as defined herein is not related to traditional quantitative quality, e.g., bit-by-bit differences. Rather, fidelity in the present invention measures the degree to which one frame or multiple frames convey the information contained in the original image sequence, i.e., the higher-level meaning of the content or transmitted information rather than the raw bits.

Discrete summary transcoder
- Fidelity is a more subjective or semantic measure than traditional distortion measures.
- fidelity is a useful measure for measuring the performance of non-conventional transcoders.
- In one embodiment, the output of the transcoder of the present invention is a finite set of relatively high-quality frames that attempts to summarize the entire sequence of bits; a transcoder of this type is therefore called a "discrete summary transcoder."
- One aspect of the bitstream, the motion, can be lost by selectively sampling only information-rich frames.
- Discrete summary transcoding is used only when the rate-distortion performance of the continuous-conversion transcoder is severely degraded or the target rate cannot be achieved.
- Under such conditions, the conventional continuous transcoder loses fluidity because the information delivery rate is unstable and the frame rate is low enough to confuse the user.
- The advantage of discrete summary transcoding over traditional continuous-conversion transcoding is that, under tight rate constraints, the continuous-conversion transcoder drops information-rich frames, whereas the discrete summary transcoder selects them.
- a content network equipment (CND) manager is described.
- the purpose of the CND manager is to choose which transcoder to use. The selection is based on data obtained from content, network and user equipment characteristics. It is also possible to simulate these device characteristics in an "off-line" mode to generate bitstream variations for later distribution.
- CND content network equipment
- The adaptive bitstream delivery system 300 has four main components: a content classifier 310, a model predictor 320, a content network device (CND) manager 330, and a switchable transcoder 340.
- the purpose of the system 300 is to deliver a compressed bit stream 301 containing information content to a user device 360 through a network 350.
- the bitstream content can be visual, audio, text, natural, synthetic, primitive, data, composite, or a combination thereof.
- the network may be wireless, packet switched, or any other network with unpredictable operating characteristics.
- The user equipment may be a video receiver, a stationary or mobile radio receiver, or other similar user equipment with internal resource constraints that may make it difficult to receive a high-quality bitstream.
- the system maintains the semantic fidelity of the content even if the bitstream needs to be further compressed to meet network and user equipment characteristics.
- the input compressed bitstream is directed to a transcoder and content classifier.
- the transcoder can ultimately reduce the rate of the output compressed bitstream 309 directed to the user equipment over the network.
- The content classifier 310 extracts content information (CI) 302 from the input bitstream for the manager.
- The main function of the content classifier is to map semantic features of the content characteristics, such as motion activity, video change information, and texture, to a set of parameters that the CND manager uses to perform rate-quality trade-offs.
- the content classifier can also accept metadata information 303. Metadata may be at low and high levels. Examples of metadata include descriptors and description schemes specified in the new MPEG-7 standard.
- A model predictor 320 provides real-time feedback 321 about the dynamics of the network 350 and the possible constraint characteristics of the user equipment 360. For example, the predictor reports network congestion and the available bit rate (ABR). Further, the predictor receives feedback on the packet loss rate in the network and translates it. The predictor estimates the current network condition and a long-term network forecast 321.
- user equipment may have limited resources. For example, processing power, memory and display constraints. For example, if the user device is a mobile phone, the display may be constrained to textual information or low resolution images, or worse, audio only. These characteristics can also influence the choice of transcoding modality.
- In addition to receiving the metadata 303, the manager 330 also receives inputs from both the content classifier 310 and the model predictor 320. The CND manager combines the output data from these two sources to determine the optimal transcoding strategy for the switchable transcoder 340.
- classification can be achieved by extracting features from various levels of video. For example, program features, shot features, frame features, and features of sub-regions within a frame. The features themselves can be extracted using sophisticated transformations or simple local operators. Regardless of how the features are extracted, given a feature space of dimension N, each pattern can be represented as a point in this feature space.
- The content classifier 310 operates in three stages (I, II, III; 311 to 313). First, the bitstream content is classified so that high-level semantics can be inferred; second, the classified content is adapted to the network and user equipment characteristics.
- A number of low-level features, such as motion activity, texture, or DCT coefficients, are extracted from the compressed bitstream using conventional techniques. The metadata 303, such as the descriptors and description schemes specified in the MPEG-7 standard, can also be accessed. If such metadata is available, less work needs to be performed on the compressed bitstream.
- The end result of this first stage is that a predetermined set of content features is mapped to a semantic class or a finite set of high-level metadata. Furthermore, within each semantic class, a distinction is made based on the complexity of the encoding. That is, complexity depends on the semantic class, the network characteristics, and possibly the device characteristics.
- the above classifications are useful in terms of understanding content and ultimately discrete summary transcoding, but are also useful as an intermediate step result.
- The second stage of classification maps the semantic classes to features of the network and device characteristics. These features help determine the characteristics of the rate-quality functions that support the system in developing a transcoding strategy.
- If a semantic class is characterized by bursty data due to object motion or video changes, this must be accounted for when estimating how much network resource should be provided.
- The third stage 313 is described below with respect to other embodiments.
- the content network equipment (CND) manager 330 and transcoder 340 are shown in more detail in FIG.
- The CND manager includes a discrete-continuous control 431 and a content network device (CND) integrator 432.
- The transcoder 340 includes a plurality of transcoders 441 to 443.
- The control 431, using the switch 450, is responsible for deciding how the input compressed bitstream 301 should be transcoded, for example, by the discrete summary transcoder 441, the continuous-conversion transcoder 442, or another transcoder 443.
- The CND manager dynamically adapts the target rate of the transcoder and considers the resources that constrain the network and user equipment characteristics. These two very important items are determined by the control 431.
- FIG. 5 graphically illustrates the rate-quality function associated with the rate 501 and quality 502 scales.
- The rate-quality function of the continuous-conversion transcoder 442 is represented by a convex function 503.
- the rate-quality curve of the discrete summary transcoder 441 is represented by a linear function 504.
- Other transcoders may have different functions.
- intersections change dynamically as content and network characteristics change.
- A continuous-conversion transcoder usually assumes a classic distortion criterion such as PSNR. Since such measures do not apply to the discrete summary transcoder according to the present invention, it makes more sense to map classical distortion measures to measures of "fidelity." Fidelity measures how well the content is semantically summarized, not a quantitative bit-by-bit difference. Using the same quality criterion prevents inconsistencies in determining the optimal transcoding strategy.
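The crossover between the two rate-quality functions can be sketched in Python. The curve shapes, constants, and function names below are illustrative assumptions rather than the patent's models; the point is only that the control can pick the transcoder whose modeled quality (or fidelity) is higher at the target rate.

```python
import math

def continuous_quality(rate, r_min=50.0, b=0.02):
    # Convex rate-quality curve (503): below an assumed minimum viable
    # rate the continuous transcoder breaks down; above it, quality
    # rises and then saturates.
    if rate < r_min:
        return 0.0
    return 1.0 - math.exp(-b * (rate - r_min))

def summary_quality(rate, base=0.3, slope=0.0005, cap=0.6):
    # Roughly linear curve (504) for the discrete summary transcoder:
    # a few high-quality frames preserve some fidelity even at very
    # low rates.
    return min(cap, base + slope * rate)

def select_transcoder(target_rate):
    # Pick whichever transcoder models the higher quality at this rate.
    if continuous_quality(target_rate) >= summary_quality(target_rate):
        return "continuous"
    return "discrete_summary"
```

With these assumed curves, very low target rates fall to the discrete summary transcoder and ample rates to the continuous one, matching the crossover behavior described above.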
- The CND integrator 432 is the part of the CND manager that combines the content information 302 from the content classifier 310 with the network-equipment predictions 321 from the model predictor. It is this part of the manager that produces the model expressed as the rate-quality function shown in FIG. 5, or as another similar optimization function.
- The CND integrator examines the mapping CI from the content classifier and the bit-rate feedback 351 output from the switchable transcoder 340. Using this information, the integrator selects an optimal modeling function 505 with some model parameters. The rate feedback 351 is used to refine the parameters dynamically. The integrator can decide to switch the rate-quality function dynamically if the selected model is found to be sub-optimal. The integrator can also track several functions for different objects or different bitstreams and consider them separately or together.

Impact of network forecasts
- The network prediction 321 can act on these characteristic functions by adjusting some portion of the optimal curve 505. For example, the most caution must be exercised when higher bit rates become available. The network model may allow a large number of bits to be spent at a particular moment, but the long-term forecast may show that congestion can build up quickly, so the system can choose to keep running at a lower rate. In this way, problems associated with a sharp drop in the available bit rate are prevented. These types of characteristics can be taken into account by adjusting the curves of the transcoder according to the invention.

Impact of equipment restrictions
- A mobile device has different operating characteristics than a stationary device, and its performance may be degraded at a high available bit rate due to, for example, Doppler spread. For this reason, a lower bit rate must be selected.
- Equipment may have limited processing, storage and display capabilities, which may affect transcoders. For example, it makes no sense to deliver video to audio-only devices.
- The switchable transcoder can include other transcoders 443, such as transcoders converting spoken language to text or converting data to spoken language. The important point is that the switchable transcoder considers the semantics of the bitstream content and the destination device, whereas most prior-art transcoders simply consider the available bit rate.

Frame-based transcoder
- Switchable transcoders, including continuous-conversion transcoders and discrete summary transcoders, have been described above.
- the optimal rate-quality curve is estimated.
- the scheme according to the present invention is flexible in that various techniques can be employed to reduce the rate depending on the ratio of the input rate to the output rate.
- The purpose of the present invention is to provide optimal overall quality for objects of varying complexity, so the degradation of each object need not be the same. As described above, in this specification objects are processed instead of frames.
- The novelty of the system is that it can transcode multiple objects of varying complexity and size and, more importantly, can make spatio-temporal trade-offs that optimize the overall quality of the video. The focus on object-based bitstreams adds flexibility, and various means available for manipulating the quality of a particular object are described.
- bitstream PP for object-based transcoding
- conventional frame-based transcoders can significantly reduce the bit rate.
- bitstream “quality” is measured as the bit-by-bit difference between the input and output bitstreams.
- Object-based transcoding according to the present invention is not constrained to manipulate the entire video; the bitstream is transcoded as meaningful video objects. It is understood that the contribution of each object, along with the quality of each object, has a different impact on the overall quality.
- The object-based method according to the present invention has a finer access level, making it possible to reduce the spatio-temporal quality of one object without greatly affecting the quality of the entire stream. This is a completely different strategy than that used by traditional frame-based transcoders. The concept of "perceptual video quality" is introduced, in contrast to the traditional bitstream quality, which measures the difference in the bits of the entire video regardless of the content. Perceptual video quality is related to the quality of the objects in the video that carry the intended information. For example, the video background can be completely lost without affecting the perceptual video quality of the more important foreground objects.
- FIG. 6 shows a high-level block diagram of an object-based transcoder 600 according to an alternative embodiment of the present invention.
- the transcoder 600 includes a demultiplexer 601, a multiplexer 602, and an output buffer 603.
- The transcoder 600 also includes one or more object-based transcoders 800 operated by a transcoding control unit (TCU) 610 according to the control information 604.
- TCU transcoding control unit
- the unit 610 includes shape, texture, temporal and spatial analyzers 611 to 614.
- the input compressed bitstream 605 for the transcoder 600 includes one or more object-based elementary bitstreams.
- the object base bit stream may be serial or parallel.
- The total bit rate of the bitstream 605 is R_in.
- The output compressed bitstream 606 from the transcoder 600 has a total bit rate R_out such that R_out < R_in.
- The demultiplexer 601 provides one or more elementary bitstreams to each of the object-based transcoders 800, and the object-based transcoders 800 provide object data 607 to the TCU 610.
- the transcoder 800 scales the elementary bit stream.
- The scaled bitstreams are composed by the multiplexer 602 before being passed to the output buffer 603, from which the output bitstream is sent to the receiver.
- The buffer 603 provides rate-feedback information 608 to the TCU.
- the control information 604 passed to each of the transcoders 800 is provided by the TCU.
- the TCU is responsible for analyzing texture and shape data as well as temporal and spatial resolution. All of these new degrees of freedom make the object-based transcoding framework very unique and desirable for network applications.
- MPEG-4 exploits the spatio-temporal redundancy of video using motion compensation and the DCT. Consequently, at the heart of the object-based transcoder 800 according to the present invention is an adaptation of the MPEG-2 transcoder described above. The main differences are that here the shape information is contained in the bitstream and that, in connection with texture coding, tools are provided to predict the DC and AC coefficients in intra blocks.
- Texture transcoding actually depends on the shape data. In other words, the shape data cannot simply be parsed and ignored; the syntax of a compliant bitstream depends on the decoded shape data.
- The object-based input and output bitstreams 605, 606 according to the present invention are completely different from conventional frame-based video programs.
- MPEG-2 does not allow for dynamic frame skipping.
- The GOP structure and reference frames are usually fixed.

Texture model
- The variable R represents the texture bits consumed for a video object (VO).
- the variable Q represents the quantization parameter QP
- The variables (X_1, X_2) are the first- and second-order model parameters.
- the variable S indicates the encoding complexity such as the sum of absolute differences.
- The value of Q is determined using the current value of (X_1, X_2).
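A common concrete form of such a texture model is the quadratic rate model R = S * (X1/Q + X2/Q^2). The patent text here does not reproduce its exact formula, so the following is a hedged sketch, under that assumed form, of how Q could be solved for a target number of texture bits.

```python
import math

def bits_for_qp(Q, S, x1, x2):
    # Assumed quadratic texture model: R = S * (x1/Q + x2/Q^2),
    # with S the coding complexity and (x1, x2) the model parameters.
    return S * (x1 / Q + x2 / (Q * Q))

def qp_for_bits(R, S, x1, x2):
    # Invert the model: R*Q^2 - x1*S*Q - x2*S = 0, take the positive root.
    disc = (x1 * S) ** 2 + 4.0 * R * x2 * S
    return (x1 * S + math.sqrt(disc)) / (2.0 * R)
```

Round-tripping a QP through the two functions returns the original value, which is a quick sanity check on the inversion.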
- The actual number of bits consumed is then known, and the model parameters can be updated. This can be done by linear regression using the results of the previous n frames.

Texture analysis
- The transcoding problem is different in that the original set of QPs and the actual number of bits are already given. Also, instead of calculating the coding complexity S from the spatial domain, a new DCT-based complexity measure S̃ must be defined. This measure is defined as:
- S̃ = (1/M_c) · Σ_{m∈M} Σ_i ρ(i) · |B_m(i)|
- B_m(i) is the i-th AC coefficient of block m.
- m is the macroblock index in the set M of coded blocks.
- M_c is the number of blocks in the set.
- ρ(i) is a frequency-dependent weight.
- the complexity measure indicates the energy of the AC coefficient, where the contribution of high frequency components is reduced by the weight function. This weighting function can be chosen to mimic that of an MPEG quantization matrix.
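As a sketch, the complexity measure just described (an average weighted sum of AC coefficient magnitudes over the coded blocks) might be computed as follows. The weight list passed in is an illustrative stand-in for an MPEG-quantization-matrix-like weighting.

```python
def dct_complexity(blocks, rho):
    # blocks: list of 8x8 DCT blocks in the coded set M, each given as a
    #         flat list of 64 coefficients (index 0 is the DC term).
    # rho:    frequency-dependent weights rho[i]; in practice these
    #         would de-emphasize high frequencies (illustrative here).
    Mc = len(blocks)
    total = 0.0
    for blk in blocks:
        # Skip the DC coefficient; sum weighted AC magnitudes.
        total += sum(rho[i] * abs(blk[i]) for i in range(1, 64))
    return total / Mc
```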
- The model parameters can be determined from the data transmitted in the bitstream and the data from past video objects, and can be updated continuously. In fact, the model can be updated twice for each transcoded VOP: once before transcoding, using the bitstream data, and again after encoding the texture with the new set of QPs. With this increase in the number of data points, the model parameters converge more robustly and faster.
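The parameter update can be implemented as an ordinary least-squares fit to recent (Q, R, S) observations. This sketch assumes the quadratic form R/S = x1/Q + x2/Q^2 and solves the 2x2 normal equations directly; the function name and data layout are illustrative assumptions.

```python
def fit_model(points):
    # points: list of (Q, R, S) observations from past VOPs.
    # Fit y = x1*u + x2*v with u = 1/Q, v = 1/Q^2, y = R/S,
    # by solving the 2x2 normal equations of least squares.
    a11 = a12 = a22 = b1 = b2 = 0.0
    for Q, R, S in points:
        u, v, y = 1.0 / Q, 1.0 / (Q * Q), R / S
        a11 += u * u
        a12 += u * v
        a22 += v * v
        b1 += u * y
        b2 += v * y
    det = a11 * a22 - a12 * a12
    x1 = (b1 * a22 - b2 * a12) / det
    x2 = (a11 * b2 - a12 * b1) / det
    return x1, x2
```

Feeding in points generated from known parameters recovers those parameters, which makes the fit easy to verify.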
- The main purpose of the texture analysis according to the present invention is to select a set of QPs that satisfies the rate constraint while minimizing distortion. It is important to note, however, that optimality depends on the chosen set. Therefore, care must be taken in how distortion is quantified. For this reason, this distortion is called conditional distortion, because it is conditioned on the chosen QP set.
- k denotes the VOP index in the set K of VOPs.
- α_k represents the visual significance or priority of object k.
- D (Q) is not explicitly specified, but is known to be proportional to Q.
- Visual significance can be a function of the relative size and complexity of the objects.
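Taken together, the definitions of k, α_k, and D(Q) suggest that the conditional distortion is a significance-weighted sum over the VOPs in the set K. The following is a reconstruction consistent with those definitions, not the patent's verbatim equation:

```latex
\bar{D}_c(\bar{Q}) \;=\; \sum_{k \in K} \alpha_k \, D_k(Q_k),
\qquad \sum_{k \in K} \alpha_k = 1,
```

where each D_k(Q_k) is taken to be proportional to Q_k, as noted above.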
- the solution space is limited to the effective solution space shown in Figure 7.
- The x-axis indicates the video object 701, and the y-axis indicates the QP.
- This figure also shows a valid search space 710, a restricted search space 711, a valid path 712, and an invalid path 713.
- the problem can be stated as follows.
- Skipping frames: In general, the purpose of skipping frames is to reduce the buffer occupancy level so that buffer overflow, and eventually packet loss, is prevented. Another reason to skip frames is to allow a trade-off between spatial and temporal quality: fewer frames are encoded, but they are encoded with higher quality. Thus, if there is no risk of buffer overflow, the decision to skip a frame is built into the QP selection process.
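The overflow-driven part of the skip decision can be sketched very simply; the 80% occupancy threshold below is an illustrative choice, not a value from the patent.

```python
def should_skip_frame(buffer_occupancy, buffer_size, threshold=0.8):
    # Skip the next frame when the output buffer is close to overflowing;
    # otherwise leave the spatio-temporal trade-off to QP selection.
    return buffer_occupancy > threshold * buffer_size
```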
- This spatio-temporal trade-off is achieved by building on the proposed technique for QP selection and constraining the search to the valid solution space for a set of QPs.
- A valid path is one in which all elements of the QP set lie in the constrained region. If one of these elements goes outside that region, the path is invalid because it does not maintain the specified level of spatial quality. Spatial quality is implied by the conditional distortion.
- Different criteria can be used to determine the maximum QP for a particular object. For example, the maximum value can be a function of the object complexity, or simply a percentage of the input QP. If the maximum is based on complexity, the transcoder limits objects with intrinsically high complexity to smaller QPs, because their impact on spatial quality is the most severe.
- Limiting the QP based on the input QP means that the transcoder maintains the same QP variance as the originally encoded bitstream. Both methods are effective.
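Both bounding rules can be sketched as follows. All names and constants here (the mode strings, the base QP of 31, the 1.5 percentage factor) are illustrative assumptions, not values from the patent.

```python
def qp_ceiling(input_qp, complexity, mode="complexity",
               avg_complexity=1.0, base_qp=31, pct=1.5):
    # "complexity": objects with high intrinsic complexity get a
    #   smaller QP ceiling, since coarse quantization hurts them most.
    # "percent": the ceiling is a fixed percentage of the input QP,
    #   preserving the original QP variance across objects.
    if mode == "complexity":
        q = int(base_qp * avg_complexity / complexity)
    else:
        q = int(pct * input_qp)
    return max(1, min(31, q))  # clamp to the usual MPEG-4 QP range
```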
- The best way to limit the QP for each object may depend on the desired trade-off between spatial and temporal quality.
- one of the advantages of working with object-based data is that the spatial quality of some objects can be different from others. In this way, bits can be saved by skipping background objects, such as stationary walls.
- Reducing the temporal resolution of certain objects can introduce holes in the assembled video. This problem can be reduced by imposing the constraint that all VOPs have the same temporal resolution.

Shape analysis
- shape data is encoded in units of blocks by so-called context-based arithmetic coding.
- See Brady, "MPEG-4 standardized methods for the compression of arbitrarily shaped objects," IEEE Trans. Circuits and Systems for Video Technology, December 1999.
- the context for each pixel is calculated based on a 9-bit or 10-bit causal template, depending on the mode selected. This context is used to access the probability look-up table, whereby the sequence of probabilities in the block drives the arithmetic encoder.
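The context computation can be sketched as packing the 9 or 10 causal template pixels into an integer index into the probability table. The bit ordering below is an illustrative assumption; the standard fixes the exact template positions and ordering.

```python
def cae_context(template_bits):
    # template_bits: the 9 (inter) or 10 (intra) binary pixels of the
    # causal template around the pixel being coded. Pack them into a
    # single context number used to index the probability look-up table.
    ctx = 0
    for k, bit in enumerate(template_bits):
        ctx |= (bit & 1) << k
    return ctx
```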
- DRC Dynamic Resolution Conversion
- FIG. 8 shows the components of an object-based transcoder 800 according to the present invention. As with prior-art transcoding architectures, the syntax of the coding standard largely dictates the architecture of the transcoder 800. Here, the main features of the transcoder according to the present invention are described in view of the MPEG-4 standard, and these features are compared with conventional frame-based transcoding.
- the transcoder 800 includes a VOL / VOP parser 810, a shape scaler 820, an MB header parser 830, a motion parser 840, and a texture scaler 850.
- the transcoder also includes a bus 860 that transfers all parts of the basic bitstream 801 to the bitstream memory 870. From this global storage, the basic bitstream configuration unit 880 can form a reduced rate compressed bitstream according to the MPEG-4 standard. Output basic bitstream 809 is provided to the multiplexer of FIG.
- each object is associated with a video object layer (VOL) and a video object plane (VOP) header.
- VOL video object layer
- VOP video object plane
- The VOP header contains the quantization parameters (QP) used to encode the object.
- QP quantization parameters
- the QP for each object is later used for modeling and analyzing texture information. All other bits are stored in bitstream memory 870 until it is time to make up output bitstream 606 of FIG.
- The VOP layer indicates 812 whether the VOP contains shape information (binary) or not (rectangular). If the VOP is rectangular, the object is simply a rectangular frame and there is no need to parse shape bits. If the shape is binary, it is necessary to determine whether each macroblock is transparent. A transparent block lies within the bounding box of the object but outside the object boundaries, so it has no associated motion or texture information.
- The shape scaler 820 has three sub-components: a shape decoder/parser 821, a shape downsampler 822, and a shape encoder 823. If the shape information of the bitstream is not scaled, the shape decoder/parser is simply a shape parser. This is indicated by the control information 604 received from the R-D shape analysis 611 of the transcoder control unit 610. In this case, the shape downsampler 822 and the shape encoder 823 are disabled. When the shape information is scaled, the shape decoder/parser 821 must first decode the shape information into its pixel-domain representation.
- the blocks can be downsampled by a factor of 2 or 4 using shape downsampler 822 and then re-encoded using shape encoder 823.
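The downsampling step can be sketched as a block-wise majority vote on the binary alpha mask. The actual MPEG-4 conversion-ratio rules differ in detail (e.g., in tie handling and filtering), so this is an illustrative stand-in.

```python
def downsample_shape(mask, factor=2):
    # mask: binary alpha mask as a list of rows of 0/1 values, with
    # dimensions divisible by `factor` (2 or 4, as in the text above).
    # Each factor x factor block collapses to one pixel by majority vote.
    h, w = len(mask), len(mask[0])
    out = []
    for y in range(0, h, factor):
        row = []
        for x in range(0, w, factor):
            block = [mask[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(1 if sum(block) * 2 >= len(block) else 0)
        out.append(row)
    return out
```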
- The rate of conversion is determined by the R-D shape analysis 611. Regardless of whether the shape bits are simply parsed or scaled, the output of the shape scaler 820 is transferred to the bitstream memory 870 via the bitstream bus 860.
- CBP coded block pattern
- The temporal analysis 613 determines which bits are to be composed and sent out, and which bits are to be dropped. In this way, the portion of the bitstream written to this memory is simply overwritten by the next video object's data.
- transcoder 800 represents a component for one object.
- multiple transcoders can scale multiple objects, as shown in Figure 6. This can be the most effective method for software implementation that considers multi-thread execution.
- the challenge in software implementation is to allocate an appropriate amount of CPU processing for each object considered.
- the case is very different for hardware implementation.
- Hardware designers usually prefer to have one piece of logic that performs a particular function. For example, rather than implementing M motion parsers for the maximum number M of objects that can be received, the hardware design includes one motion parser operating at a speed that allows multiple objects to be parsed at any given moment.
- The video can be partitioned into a coarse-to-fine hierarchy 900.
- the video program or session 910 is considered the highest level of the hierarchy 900. This level can represent a 30 minute news program from the broadcast network or a full day of programming.
- The program 910 includes a sequence of shots Shot-1, ..., Shot-n 911 to 919.
- the next level 920 is divided into shots.
- A "shot" can be a group-of-frames (GOF) or a group-of-video-object-planes (GOV) 921 to 929. This level represents a smaller segment of the video that starts when the camera is turned on and continues until the camera is turned off. To avoid any confusion, this level is simply called the shot level 920.
- A shot consists of the most basic units: frames 930 for a GOF and video object planes (VOPs) 931 for a GOV. Levels below this can also be considered, namely the sub-regions 941 to 942 of frames or VOPs.
- A feature extraction process 901 to 904 is applied to the video data at each of the levels.
- the data of each level is arranged in a different way, and the appropriate features change for each level, so different feature extraction techniques are applied to each level. That is, program-level features are extracted differently than frame features.
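The idea of level-specific extraction can be sketched as a dispatch from level name to feature function. All level names and feature names here are illustrative assumptions, not the patent's actual feature set.

```python
def extract_features(level, data):
    # One extractor per hierarchy level; each level's data is arranged
    # differently, so each extractor computes different features.
    extractors = {
        "program": lambda d: {"genre_hint": d.get("genre")},
        "shot": lambda d: {"motion_activity": d.get("activity", 0.0)},
        "frame": lambda d: {"texture_energy": sum(d.get("dct_ac", []))},
        "region": lambda d: {"shape_area": d.get("area", 0)},
    }
    return extractors[level](data)
```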
- These features represent "hints" or "cues" 905 to 908 that can be applied to transcoding systems.
- hints can be semantic or syntactic, and can represent either high-level or low-level metadata.
- the method can be applied to transcoding at any given level.
- Higher-level metadata, such as at the shot level, is used to consider the classification, bit allocation, and rate-quality for that particular shot and among other shots.
- the metadata is of limited use to the transcoder, but is very useful to the CND manager 330 of FIG. 3, which determines the transcoding strategy between all output content.
- Low-level metadata, such as at the object level, is more useful to the transcoder 340 itself, e.g., to support dynamic bit allocation, because it is difficult to classify and manage output content at such low levels.
- The main function of the content classifier 310 is to map features of the content characteristics, such as activity, video change information, and texture, to a set of parameters used to perform rate-quality trade-offs. To support this mapping function, the content classifier also accepts the metadata information 303.
- Examples of such metadata include the descriptors and description schemes (DS) specified by the new MPEG-7 standard.
- This low-level metadata is mapped to a rate-quality characteristic that depends only on the content. This is shown in FIG.
- The rate-quality characteristics in turn affect the rate-quality function shown in FIG. 5.
- the content classifier 310 receives the low-level metadata 303.
- Stage I 311 extracts high-level media or class 1001.
- Stage II 312 uses the prediction 321 to determine content-, network-, and device-dependent rate-quality (R-Q) characteristics.
- Stage III 313 extracts the R-Q characteristic 1003 that depends only on the low-level metadata.
- the news program includes the general moderator and various other shots related to the news as a whole.
- FIGs. 11(a) and 11(b), FIG. 12, and FIG. 13 relate to a news program 1200 with three shots 1201 to 1203: a moderator shot, an on-the-scene report shot, and a police-chase shot.
- All news program shots are categorized into only three categories, with the understanding that the number and type of categories will differ in practice.
- The first class 1101 represents shots where the temporal quality of the content is less important than the spatial quality.
- the second class 1102 represents shots where the spatial quality of the content is more important, and the third class 1103 represents shots where the spatial and temporal quality of the shot are equally important.
- This set of classes is called SET-1 1110.
- Such classes clearly have distinct rate-quality characteristics.
- The purpose of content classifier stage III 313 is to process the low-level features and map them to the most appropriate of these classes. Note that the importance of spatial and temporal quality can also be evaluated on a scale of 1 to 10, or on a real interval of 0.0 to 1.0.
- To further illustrate these rate-quality classes, consider another set of three distinct classes, as shown in FIG. 11(b).
- The first class 1121 indicates that the shot is very simple to compress, i.e., a large compression ratio can easily be achieved for a given distortion.
- The third class 1123 shows the exact opposite: the content of the shot is very difficult to compress, due to large/complex motion or spatially active scenes.
- the second class 1122 is somewhere between the first and third classes.
- The set of these classes is called SET-2 1120.
- These classes 1120 also illustrate the possible effect of content classification on the rate-quality decisions and on how the switchable transcoder 340 can operate.
- Compression difficulty can likewise be rated on a numerical scale.
- Other sets of classes can be defined for other types of video programs. So far, we have described two examples of rate-quality classes, SET-1 and SET-2. Content is categorized into these classes according to features extracted from the low-level metadata 303. The following describes how these classes can be derived from motion activity.
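A hedged sketch of how a SET-2 class might be derived from motion- and spatial-activity features in the low-level metadata; the thresholds and the use of the maximum as a difficulty proxy are illustrative assumptions, not the patent's method:

```python
def classify_set2(motion_activity: float, spatial_activity: float,
                  low: float = 0.3, high: float = 0.7) -> int:
    """Map normalized activity features in [0.0, 1.0] to a SET-2 class:
    1121 (easy to compress), 1122 (medium), 1123 (hard to compress).

    Taking the max of the two activities as a crude compression-difficulty
    proxy is an assumption made for illustration only.
    """
    difficulty = max(motion_activity, spatial_activity)
    if difficulty < low:
        return 1121
    if difficulty > high:
        return 1123
    return 1122

assert classify_set2(0.1, 0.2) == 1121   # near-static moderator shot
assert classify_set2(0.9, 0.5) == 1123   # high-motion chase shot
```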
- FIG. 12 shows a transcoding strategy according to the SET-1 classification.
- The general moderator shot 1201 is transcoded using the discrete summary transcoder (see block 441). This transcoder reduces the entire shot 1201 to one frame 1211, a still image of the general moderator. For the duration of the shot, the full audio of the speaking moderator is provided.
- The on-the-scene shot 1202 is continuously converted 1221 at 5 frames/sec with full audio, so that the viewer does not lose the meaning of the background motion.
- The police chase shot 1203 is continuously converted 1231 at 30 frames/sec.
- the classification results could be interpreted differently, as shown in Figure 13.
- The general moderator shot 1201 contains no motion, so the segment can be compressed very easily; it is therefore classified into the first class 1121 of SET-2.
- This shot is continuously converted 1240 at 30 frames/sec with a high compression ratio.
- The police chase shot 1203 involves high motion and is more difficult to compress; therefore, it is classified into the third class 1123 of SET-2 and is continuously converted 1260 at 7.5 frames/sec. Again, depending on its characteristics, the on-the-scene report shot 1202 can fall into any one of the three classes.
- Here, it is assigned to the second class 1122 and is continuously converted 1251 at 15 frames/sec.
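The frame-rate decisions described for Fig. 13 can be collected into a small lookup. The frame-rate values below are taken from the text; packaging them as a table with a lookup function is only an illustrative sketch:

```python
# SET-2 class -> (transcoder mode, output frame rate), per the Fig. 13 example.
SET2_STRATEGY = {
    1121: ("continuous", 30.0),   # easy to compress: full frame rate
    1122: ("continuous", 15.0),   # medium difficulty: halved frame rate
    1123: ("continuous", 7.5),    # hard to compress: strongly reduced rate
}

def frame_rate_for(set2_class: int) -> float:
    """Return the target output frame rate for a SET-2 class."""
    mode, fps = SET2_STRATEGY[set2_class]
    return fps

assert frame_rate_for(1121) == 30.0
assert frame_rate_for(1123) == 7.5
```

Under the SET-1 interpretation of Fig. 12, the same table idea would instead map the first class to the discrete summary transcoder rather than a continuous conversion.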
- Depending on the hints, either a constant or variable bit-rate stream (CBR or VBR) can be created.
- For example, according to the SET-2 compression difficulty, a CBR bitstream can be generated if a sequence of frames that is difficult to compress is held to a low frame rate; if more bits are allocated instead, a VBR bitstream can be generated.
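One way to read this trade-off: under a CBR policy the channel rate is fixed and difficult segments are held to a lower frame rate, while under a VBR policy the frame rate is kept and difficult segments receive more bits. A minimal sketch, with all constants and both policies assumed for illustration:

```python
def allocate_bits(difficulty: float, channel_bits_per_sec: float,
                  base_fps: float = 30.0, mode: str = "CBR"):
    """Return (frame_rate, bits_per_frame) for one segment.

    difficulty is a normalized SET-2-style score in [0.0, 1.0].
    CBR: hold the bit rate constant; hard segments get a lower frame rate.
    VBR: hold the frame rate; hard segments get more bits per frame.
    """
    if mode == "CBR":
        fps = base_fps / (1.0 + 3.0 * difficulty)   # e.g. 30 fps -> 7.5 fps
        return fps, channel_bits_per_sec / fps
    fps = base_fps
    return fps, (channel_bits_per_sec / fps) * (1.0 + difficulty)

fps_cbr, bpf_cbr = allocate_bits(1.0, 512_000, mode="CBR")
assert fps_cbr == 7.5                                # frame rate reduced
assert abs(bpf_cbr * fps_cbr - 512_000) < 1e-6       # bit rate held constant
```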
- The rate-quality mapping implied by each class can vary widely with the particular application.
- The classes may reflect either the difficulty of compressing the video or the level of priority assigned to the spatial and temporal quality. Both classifications are derived from low-level features.
- The classifications suggest ways in which the content can be manipulated. In practice, classification can greatly reduce the number of scenarios to consider. For example, if the CND manager has to consider the rate-quality trade-off for multiple bitstreams (frames or objects) at a given instant, the CND manager can consider optimal ways of distributing the transcoding responsibilities between the continuous transform and discrete summary transcoders. It is also possible to consider a hybrid method instead of choosing one method for all of the segments considered. The priority of the program, or the compression difficulty derived from its low-level features, is an example of a useful parameter for making such a determination.
- Figs. 12 and 13 show how the classifications in SET-1 1110 and SET-2 1120 affect the strategy determined by the CND manager and the way the transcoder manipulates the original data. What is particularly important in Fig. 12 is that a hybrid transcoding method is adopted.
- Low-level features are used to effectively cluster and classify the video content into meaningful parameters that support the decisions of the CND manager and the transcoders.
- The CND classifier 310 and the CND manager 330 may appear to conflict with the TCU 610 of Fig. 6, but this is not the case.
- The classifier and CND manager attempt to select the optimal strategy for the transcoder 340 in advance. Given this strategy and the instructions from the manager, the transcoder is responsible for manipulating the content in the best possible way. Eventually, the transcoder may fail to meet its targets because of mispredictions or because of the strategy chosen by the CND manager, and a mechanism is required to address such situations. For this purpose, the metadata can be used again in the TCU, for example for spatial analysis.
- However, the purpose of the metadata for the TCU is different from its purpose for the classifier and CND manager.
- Impact of metadata on transcoding
- The first method uses bit allocation to derive a strategy and, ultimately, a decision on how to use the functionality provided by the discrete summary and continuous transform transcoders 441-442.
- This first method resides in the CND manager 330, which uses the rate-quality functions of Fig. 5 to make its decisions.
- The second method resides in the transcoder 340 itself.
- Here, metadata is used not to make a strategic decision but to make real-time decisions on the coding parameters that are used to meet the bit-rate objectives.
- The coding parameters are selected so that the transcoder achieves the optimal rate-quality function of Fig. 5.
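A minimal sketch of this second method: a per-frame feedback rule that nudges the quantization parameter so that the output tracks the bit-rate objective. The update rule, thresholds, and QP range below are illustrative assumptions, not the patent's rate-control algorithm:

```python
def update_quantizer(qp: int, bits_produced: float, bits_target: float,
                     qp_min: int = 1, qp_max: int = 31) -> int:
    """Coarsen the quantizer when over budget, refine it when under.

    A 10% dead zone around the target avoids oscillating on small errors;
    all constants here are assumed for illustration.
    """
    if bits_produced > 1.1 * bits_target:
        qp += 2          # over budget: coarser quantization, fewer bits
    elif bits_produced < 0.9 * bits_target:
        qp -= 1          # under budget: spend bits on spatial quality
    return max(qp_min, min(qp_max, qp))

qp = 10
for produced in (15_000, 14_000, 8_000):   # target: 10_000 bits per frame
    qp = update_quantizer(qp, produced, 10_000)
assert qp == 13   # raised twice (+2, +2), then lowered once (-1)
```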
- Low-level and high-level metadata provide hints for performing discrete summary and continuous transform transcoding. These hints are useful to both the CND manager and the transcoders.
- Semantic information can be associated with content either automatically or by manual annotation.
- In applications where multiple users request different shots at the same time, the CND manager 330 must determine how much rate to assign to each shot. For the discrete summary transcoder 441, this rate can correspond to the number of frames transmitted; for the continuous transform transcoder 442, the rate can correspond to an acceptable target frame rate. Because the level of action indicates the level of temporal activity, bits can be assigned to each frame sequence according to the description of the content. For high-action shots, the CND manager can determine that frame rates below a predetermined level are unacceptable for the continuous transform transcoder, and that better-quality shots can be delivered by summarizing the content with the discrete summary transcoder.
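A sketch of one such rate-assignment policy: splitting the total channel rate among concurrently requested shots in proportion to their temporal-activity levels. The proportional rule and the example numbers are assumptions for illustration:

```python
def split_rate(total_bps: float, activity: dict) -> dict:
    """Split a total channel rate among concurrently requested shots
    in proportion to each shot's temporal-activity level (a simple
    proportional policy, assumed here for illustration)."""
    total_activity = sum(activity.values())
    return {shot: total_bps * a / total_activity
            for shot, a in activity.items()}

# Hypothetical activity scores for the three news-program shots.
rates = split_rate(512_000, {"moderator": 0.1, "scene": 0.4, "chase": 0.5})
assert rates["chase"] > rates["moderator"]            # high action -> more rate
assert abs(sum(rates.values()) - 512_000) < 1e-6      # budget is preserved
```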
- The process of generating high-level metadata from low-level metadata can be defined as metadata encoding.
- Such an encoding process can be performed in stage I 311 of the content classifier of the transcoding system according to the invention.
- This high-level generation process can also be used in stand-alone systems.
- An example of such a stand-alone system is one that instantiates the description schemes specified by the MPEG-7 standard. Such a system can be referred to as an MPEG-7 high-level encoder.
- As additional description schemes, consider the various schemes specified in the MPEG-7 working draft, such as the Summary DS, Variation DS, Hierarchical Summary DS, Highlight Segment DS, Cluster DS and Classifier DS. See ISO/IEC JTC1/SC29/WG11 N3113, "MPEG-7 Multimedia Description Schemes WD," December 1999.
- A Summary DS is used to specify a visual abstraction of the content, primarily for content browsing and navigation.
- a variation DS is used to specify the variation of the content.
- variations can be generated in a number of ways, reflecting corrections and manipulations of the original data.
- However, description schemes such as the Summary DS and Variation DS do not describe how to summarize content or how to generate content variations.
- The first major problem is that these variations must be generated prior to any request for the original video. As a result, real-time transmission is not an option, because the delay associated with generating multiple variations of the content is too long.
- The second major problem is that network characteristics can change over time. Thus, a particular pre-transcoded variation selected on the basis of the current network state may not remain suitable over the entire duration of the transmission.
- The encoder differs in that it is not connected to a network for transmitting and receiving in real time during transcoding. Instead, the encoder is connected to a database where the videos are stored. The encoder generates the various versions of a video off-line, for later real-time distribution.
- The adaptable bitstream video distribution system 1300 has five main components: a content classifier 1310, a network-device (ND) generator 1320, a CND manager 1330, a switchable transcoder 1340, and a DS instantiator 1350.
- The system 1300 has inputs and outputs connected to a database 1360. The system 1300 also has a selector 1370 connected to a network and to the database 1360.
- The purpose of the distribution system 1300 is to generate variation and/or summary bitstreams 1308 from the original compressed bitstream (video-in) 1301.
- the content of the bitstream may be visual, audio, text, natural, synthetic, primitive, data, composite, or a combination thereof.
- The video distribution system 1300 is similar to the adaptive transcoding system 300.
- The main difference is that the system 1300 is not connected to the user equipment 360 via the network 350 of Fig. 3, and transcoding is not performed in real time.
- The ND generator 1320 takes the place of the equipment and networks.
- The generator is responsible for simulating the network and device (ND) constraints that exist during real-time operation.
- For example, the ND generator can simulate a CBR or VBR channel at 64 kbps, 128 kbps, or 512 kbps.
- The generator can also simulate channels whose available bandwidth decreases over time. This reduction can be linear, step-like, or very sharp. Many other typical situations can be simulated as well, and some can relate to user-equipment limitations, such as limited display capabilities.
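A sketch of such an ND generator producing a bandwidth profile over time; the three shapes mirror the linear, step-like, and sharp reductions mentioned above, and all parameter choices are illustrative assumptions:

```python
def bandwidth_profile(shape: str, start_bps: float, end_bps: float,
                      steps: int = 10) -> list:
    """Simulate a channel whose available bandwidth drops over time.

    shape: 'linear' (gradual ramp), 'step' (one abrupt drop halfway),
    or 'sharp' (immediate drop after the first instant).
    Returns a list of simulated bandwidth samples.
    """
    if shape == "linear":
        return [start_bps + (end_bps - start_bps) * i / (steps - 1)
                for i in range(steps)]
    if shape == "step":
        return [start_bps if i < steps // 2 else end_bps
                for i in range(steps)]
    if shape == "sharp":
        return [start_bps] + [end_bps] * (steps - 1)
    raise ValueError("unknown shape: " + shape)

profile = bandwidth_profile("step", 512_000, 64_000)
assert profile[0] == 512_000 and profile[-1] == 64_000
```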
- The variation bitstreams can be either CBR or VBR.
- The purpose of the ND generator 1320 is to simulate various network-device states so that the original content can be automatically converted according to these states.
- The variations and/or summaries 1308 generated by the system 1300 conform to the optimal rate-quality function.
- the selector 1370 of the system 1300 receives a request for a particular video program.
- the selector provides information about the available variations and the associated DS stored in database 1360.
- The CND manager of the transcoding system 300 utilizes this pre-transcoded data.
- The high-level metadata allows the transcoder to match the current real-time network and device constraints with a specific variation of the requested video. If a suitable match is found, the CND manager requests that this variation be sent over the network 350 by the selector. If an exact match is found, the transcoder 340 can operate in a bypass mode; if an approximate match is found, the transcoder 340 can still operate more efficiently.
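The matching step can be sketched as a nearest-rate lookup over the stored variations; the tolerance policy deciding between bypass and further transcoding is an assumption made for illustration:

```python
def select_variation(target_bps: float, variations: dict,
                     tolerance: float = 0.05):
    """Pick the stored variation whose rate best matches the target rate.

    Returns (name, mode): 'bypass' when the match is within the relative
    tolerance, otherwise 'transcode' starting from the closest variation.
    """
    name = min(variations, key=lambda v: abs(variations[v] - target_bps))
    rel_err = abs(variations[name] - target_bps) / target_bps
    return name, ("bypass" if rel_err <= tolerance else "transcode")

# Hypothetical pre-transcoded variations stored in the database.
stored = {"v64k": 64_000, "v128k": 128_000, "v512k": 512_000}
assert select_variation(128_000, stored) == ("v128k", "bypass")
assert select_variation(200_000, stored) == ("v128k", "transcode")
```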
- This is just one practical example application. It is also possible to further manipulate and modify the already-generated bitstreams 1308 to better match the current network and device constraints. The trade-off is between creating a large number of pre-transcoded bitstreams that cover a very wide range of conditions and generating a small number of pre-transcoded bitstreams that cover only the most common conditions. In general, because transcoding with the distribution system 1300 under relaxed time constraints yields better-quality video, different levels of quality can be expected from each method.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computer Networks & Wireless Communication (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- General Business, Economics & Management (AREA)
- Business, Economics & Management (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Description
Claims
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2001575722A JP4650868B2 (ja) | 2000-04-11 | 2001-03-23 | 圧縮ビデオのトランスコーディング方法 |
EP01915736A EP1195992A1 (en) | 2000-04-11 | 2001-03-23 | Transcoding of compressed video |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/547,159 | 2000-04-11 | ||
US09/547,159 US6574279B1 (en) | 2000-02-02 | 2000-04-11 | Video transcoding using syntactic and semantic clues |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2001078398A1 true WO2001078398A1 (en) | 2001-10-18 |
Family
ID=24183560
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2001/002354 WO2001078398A1 (en) | 2000-04-11 | 2001-03-23 | Transcoding of compressed video |
Country Status (5)
Country | Link |
---|---|
US (1) | US6574279B1 (ja) |
EP (1) | EP1195992A1 (ja) |
JP (1) | JP4650868B2 (ja) |
CN (1) | CN1366775A (ja) |
WO (1) | WO2001078398A1 (ja) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1488628A2 (en) * | 2001-12-28 | 2004-12-22 | Nokia Corporation | Method and apparatus for selecting macroblock quantization in a video encoder |
JPWO2004093457A1 (ja) * | 2003-04-10 | 2006-07-13 | 日本電気株式会社 | 動画像圧縮符号化方式変換装置及び動画像通信システム |
JP2008533841A (ja) * | 2005-03-10 | 2008-08-21 | クゥアルコム・インコーポレイテッド | マルチメディア処理のためのコンテンツ分類 |
US8654848B2 (en) | 2005-10-17 | 2014-02-18 | Qualcomm Incorporated | Method and apparatus for shot detection in video streaming |
US8780957B2 (en) | 2005-01-14 | 2014-07-15 | Qualcomm Incorporated | Optimal weights for MMSE space-time equalizer of multicode CDMA system |
US8879856B2 (en) | 2005-09-27 | 2014-11-04 | Qualcomm Incorporated | Content driven transcoder that orchestrates multimedia transcoding using content information |
US8948260B2 (en) | 2005-10-17 | 2015-02-03 | Qualcomm Incorporated | Adaptive GOP structure in video streaming |
US9131164B2 (en) | 2006-04-04 | 2015-09-08 | Qualcomm Incorporated | Preprocessor method and apparatus |
Families Citing this family (86)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7484172B2 (en) * | 1997-05-23 | 2009-01-27 | Walker Digital, Llc | System and method for providing a customized index with hyper-footnotes |
US20010047517A1 (en) * | 2000-02-10 | 2001-11-29 | Charilaos Christopoulos | Method and apparatus for intelligent transcoding of multimedia data |
JP2001339460A (ja) * | 2000-05-26 | 2001-12-07 | Matsushita Electric Ind Co Ltd | デジタル送受信装置 |
FR2813484A1 (fr) * | 2000-08-31 | 2002-03-01 | Koninkl Philips Electronics Nv | Traitement de donnees en une serie temporelle d'etapes |
JP2002152759A (ja) * | 2000-11-10 | 2002-05-24 | Sony Corp | 画像情報変換装置および画像情報変換方法 |
KR100433516B1 (ko) * | 2000-12-08 | 2004-05-31 | 삼성전자주식회사 | 트랜스코딩 방법 |
US6925501B2 (en) * | 2001-04-17 | 2005-08-02 | General Instrument Corporation | Multi-rate transcoder for digital streams |
US7734997B2 (en) * | 2001-05-29 | 2010-06-08 | Sony Corporation | Transport hint table for synchronizing delivery time between multimedia content and multimedia content descriptions |
CN1286326C (zh) * | 2001-05-31 | 2006-11-22 | 佳能株式会社 | 信息存储设备及其方法 |
JP2003087785A (ja) * | 2001-06-29 | 2003-03-20 | Toshiba Corp | 動画像符号化データの形式変換方法及び装置 |
JP3866538B2 (ja) * | 2001-06-29 | 2007-01-10 | 株式会社東芝 | 動画像符号化方法及び装置 |
US20030105880A1 (en) * | 2001-12-04 | 2003-06-05 | Koninklijke Philips Electronics N.V. | Distributed processing, storage, and transmision of multimedia information |
DE10218812A1 (de) * | 2002-04-26 | 2003-11-20 | Siemens Ag | Generische Datenstrombeschreibung |
FR2842983B1 (fr) | 2002-07-24 | 2004-10-15 | Canon Kk | Transcodage de donnees |
US7292574B2 (en) * | 2002-09-30 | 2007-11-06 | Intel Corporation | Automated method for mapping constant bit-rate network traffic onto a non-constant bit-rate network |
US7042943B2 (en) | 2002-11-08 | 2006-05-09 | Apple Computer, Inc. | Method and apparatus for control of rate-distortion tradeoff by mode selection in video encoders |
US7194035B2 (en) * | 2003-01-08 | 2007-03-20 | Apple Computer, Inc. | Method and apparatus for improved coding mode selection |
US7606305B1 (en) * | 2003-02-24 | 2009-10-20 | Vixs Systems, Inc. | Method and system for transcoding video data |
US7327784B2 (en) * | 2003-02-24 | 2008-02-05 | Vixs Systems, Inc. | Method and system for transcoding video data |
US9612965B2 (en) * | 2003-06-24 | 2017-04-04 | Hewlett-Packard Development Company, L.P. | Method and system for servicing streaming media |
KR20060127022A (ko) * | 2004-01-05 | 2006-12-11 | 코닌클리케 필립스 일렉트로닉스 엔.브이. | 코딩 방법 및 대응하는 코딩된 신호 |
US20050175099A1 (en) * | 2004-02-06 | 2005-08-11 | Nokia Corporation | Transcoder and associated system, method and computer program product for low-complexity reduced resolution transcoding |
KR20050090841A (ko) * | 2004-03-10 | 2005-09-14 | 엘지전자 주식회사 | 비트율 제어 방법 |
KR101196429B1 (ko) * | 2004-03-12 | 2012-11-01 | 삼성전자주식회사 | 동영상 트랜스코딩 방법 및 그 장치, 이에 사용되는움직임 벡터 보간방법 |
US7983835B2 (en) | 2004-11-03 | 2011-07-19 | Lagassey Paul J | Modular intelligent transportation system |
US7818444B2 (en) | 2004-04-30 | 2010-10-19 | Move Networks, Inc. | Apparatus, system, and method for multi-bitrate content streaming |
WO2006000887A1 (en) * | 2004-06-23 | 2006-01-05 | Nokia Corporation | Methods, systems and computer program products for expressing classes of adaptation and classes of content in media transcoding |
US8406293B2 (en) | 2004-06-27 | 2013-03-26 | Apple Inc. | Multi-pass video encoding based on different quantization parameters |
US8005139B2 (en) | 2004-06-27 | 2011-08-23 | Apple Inc. | Encoding with visual masking |
FR2879387B1 (fr) * | 2004-12-15 | 2007-04-27 | Tdf Sa | Procede de transmission a debit binaire variable a travers un canal de transmission. |
US7974193B2 (en) | 2005-04-08 | 2011-07-05 | Qualcomm Incorporated | Methods and systems for resizing multimedia content based on quality and rate information |
US8208536B2 (en) * | 2005-04-28 | 2012-06-26 | Apple Inc. | Method and apparatus for encoding using single pass rate controller |
JP4839035B2 (ja) * | 2005-07-22 | 2011-12-14 | オリンパス株式会社 | 内視鏡用処置具および内視鏡システム |
JP4921476B2 (ja) | 2005-09-28 | 2012-04-25 | テレフオンアクチーボラゲット エル エム エリクソン(パブル) | メディアコンテンツの管理 |
US20070160134A1 (en) * | 2006-01-10 | 2007-07-12 | Segall Christopher A | Methods and Systems for Filter Characterization |
US8582905B2 (en) * | 2006-01-31 | 2013-11-12 | Qualcomm Incorporated | Methods and systems for rate control within an encoding device |
US20070201388A1 (en) * | 2006-01-31 | 2007-08-30 | Qualcomm Incorporated | Methods and systems for resizing multimedia content based on quality and rate information |
US8014445B2 (en) * | 2006-02-24 | 2011-09-06 | Sharp Laboratories Of America, Inc. | Methods and systems for high dynamic range video coding |
US8194997B2 (en) * | 2006-03-24 | 2012-06-05 | Sharp Laboratories Of America, Inc. | Methods and systems for tone mapping messaging |
US8532176B2 (en) * | 2006-07-10 | 2013-09-10 | Sharp Laboratories Of America, Inc. | Methods and systems for combining layers in a multi-layer bitstream |
US7885471B2 (en) * | 2006-07-10 | 2011-02-08 | Sharp Laboratories Of America, Inc. | Methods and systems for maintenance and use of coded block pattern information |
US8059714B2 (en) * | 2006-07-10 | 2011-11-15 | Sharp Laboratories Of America, Inc. | Methods and systems for residual layer scaling |
US8422548B2 (en) * | 2006-07-10 | 2013-04-16 | Sharp Laboratories Of America, Inc. | Methods and systems for transform selection and management |
US7840078B2 (en) * | 2006-07-10 | 2010-11-23 | Sharp Laboratories Of America, Inc. | Methods and systems for image processing control based on adjacent block characteristics |
US7535383B2 (en) * | 2006-07-10 | 2009-05-19 | Sharp Laboratories Of America Inc. | Methods and systems for signaling multi-layer bitstream data |
US8130822B2 (en) * | 2006-07-10 | 2012-03-06 | Sharp Laboratories Of America, Inc. | Methods and systems for conditional transform-domain residual accumulation |
US8761248B2 (en) * | 2006-11-28 | 2014-06-24 | Motorola Mobility Llc | Method and system for intelligent video adaptation |
US8804829B2 (en) * | 2006-12-20 | 2014-08-12 | Microsoft Corporation | Offline motion description for video generation |
US7826673B2 (en) * | 2007-01-23 | 2010-11-02 | Sharp Laboratories Of America, Inc. | Methods and systems for inter-layer image prediction with color-conversion |
US8665942B2 (en) | 2007-01-23 | 2014-03-04 | Sharp Laboratories Of America, Inc. | Methods and systems for inter-layer image prediction signaling |
US8503524B2 (en) * | 2007-01-23 | 2013-08-06 | Sharp Laboratories Of America, Inc. | Methods and systems for inter-layer image prediction |
US8233536B2 (en) | 2007-01-23 | 2012-07-31 | Sharp Laboratories Of America, Inc. | Methods and systems for multiplication-free inter-layer image prediction |
US8411734B2 (en) | 2007-02-06 | 2013-04-02 | Microsoft Corporation | Scalable multi-thread video decoding |
US7760949B2 (en) | 2007-02-08 | 2010-07-20 | Sharp Laboratories Of America, Inc. | Methods and systems for coding multiple dynamic range images |
WO2008114306A1 (ja) * | 2007-02-19 | 2008-09-25 | Sony Computer Entertainment Inc. | コンテンツ空間形成装置、その方法、コンピュータ、プログラムおよび記録媒体 |
US8767834B2 (en) | 2007-03-09 | 2014-07-01 | Sharp Laboratories Of America, Inc. | Methods and systems for scalable-to-non-scalable bit-stream rewriting |
US8265144B2 (en) | 2007-06-30 | 2012-09-11 | Microsoft Corporation | Innovations in video decoder implementations |
US9648325B2 (en) | 2007-06-30 | 2017-05-09 | Microsoft Technology Licensing, Llc | Video decoding implementations for a graphics processing unit |
US8290036B2 (en) * | 2008-06-11 | 2012-10-16 | Optibase Technologies Ltd. | Method, apparatus and system for concurrent processing of multiple video streams |
US8311115B2 (en) | 2009-01-29 | 2012-11-13 | Microsoft Corporation | Video encoding using previously calculated motion information |
US8396114B2 (en) | 2009-01-29 | 2013-03-12 | Microsoft Corporation | Multiple bit rate video encoding using variable bit rate and dynamic resolution for adaptive video streaming |
US8270473B2 (en) | 2009-06-12 | 2012-09-18 | Microsoft Corporation | Motion based dynamic resolution multiple bit rate video encoding |
FR2954035B1 (fr) * | 2009-12-11 | 2012-01-20 | Thales Sa | Procede d'estimation de la qualite video a une resolution quelconque |
US20130039303A1 (en) * | 2010-02-11 | 2013-02-14 | Sony Corporation | Mapping apparatus and method for transmission of data in a multi-carrier broadcast system |
US8705616B2 (en) | 2010-06-11 | 2014-04-22 | Microsoft Corporation | Parallel multiple bitrate video encoding to reduce latency and dependences between groups of pictures |
US8687700B1 (en) * | 2010-06-18 | 2014-04-01 | Ambarella, Inc. | Method and/or apparatus for object detection utilizing cached and compressed classifier information |
US8712930B1 (en) | 2010-08-09 | 2014-04-29 | Google Inc. | Encoding digital content based on models for predicting similarity between exemplars |
US8885729B2 (en) | 2010-12-13 | 2014-11-11 | Microsoft Corporation | Low-latency video decoding |
US9706214B2 (en) * | 2010-12-24 | 2017-07-11 | Microsoft Technology Licensing, Llc | Image and video decoding implementations |
GB2488159B (en) * | 2011-02-18 | 2017-08-16 | Advanced Risc Mach Ltd | Parallel video decoding |
US8515193B1 (en) | 2011-04-08 | 2013-08-20 | Google Inc. | Image compression using exemplar dictionary based on hierarchical clustering |
US8982942B2 (en) * | 2011-06-17 | 2015-03-17 | Microsoft Technology Licensing, Llc | Adaptive codec selection |
MY189650A (en) | 2011-06-30 | 2022-02-23 | Microsoft Technology Licensing Llc | Reducing latency in video encoding and decoding |
US8731067B2 (en) | 2011-08-31 | 2014-05-20 | Microsoft Corporation | Memory management for video decoding |
US8525883B2 (en) * | 2011-09-02 | 2013-09-03 | Sharp Laboratories Of America, Inc. | Methods, systems and apparatus for automatic video quality assessment |
US9591318B2 (en) | 2011-09-16 | 2017-03-07 | Microsoft Technology Licensing, Llc | Multi-layer encoding and decoding |
US9819949B2 (en) | 2011-12-16 | 2017-11-14 | Microsoft Technology Licensing, Llc | Hardware-accelerated decoding of scalable video bitstreams |
US11089343B2 (en) | 2012-01-11 | 2021-08-10 | Microsoft Technology Licensing, Llc | Capability advertisement, configuration and control for video coding and decoding |
HK1205426A2 (en) * | 2015-09-24 | 2015-12-11 | Tfi Digital Media Ltd | Method for distributed video transcoding |
US10499056B2 (en) * | 2016-03-09 | 2019-12-03 | Sony Corporation | System and method for video processing based on quantization parameter |
EP3340105A1 (en) * | 2016-12-21 | 2018-06-27 | Axis AB | Method for and apparatus for detecting events |
CA3028701A1 (en) * | 2017-12-28 | 2019-06-28 | Comcast Cable Communications, Llc | Content-aware predictive bitrate ladder |
US10419773B1 (en) * | 2018-03-22 | 2019-09-17 | Amazon Technologies, Inc. | Hybrid learning for adaptive video grouping and compression |
EP4218247A1 (en) * | 2020-09-24 | 2023-08-02 | Centurylink Intellectual Property LLC | Content delivery using distributed ledger and ai-based transcoding technologies |
US12032591B2 (en) | 2020-09-24 | 2024-07-09 | Centurylink Intellectual Property Llc | Content delivery using distributed ledger and AI-based transcoding technologies |
US11910056B2 (en) * | 2020-09-24 | 2024-02-20 | Centurylink Intellectual Property Llc | Content delivery using distributed ledger and AI-based transcoding technologies |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH08111870A (ja) * | 1994-10-12 | 1996-04-30 | Kokusai Denshin Denwa Co Ltd <Kdd> | 画像情報の再符号化方法及び装置 |
JPH10271494A (ja) * | 1997-03-26 | 1998-10-09 | Nec Commun Syst Ltd | 動画符号変換装置 |
JPH1174798A (ja) * | 1997-06-30 | 1999-03-16 | Hewlett Packard Co <Hp> | 圧縮入力ビットストリーム処理装置 |
JP2000069442A (ja) * | 1998-08-24 | 2000-03-03 | Sharp Corp | 動画システム |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6421733B1 (en) * | 1997-03-25 | 2002-07-16 | Intel Corporation | System for dynamically transcoding data transmitted between computers |
US6173287B1 (en) * | 1998-03-11 | 2001-01-09 | Digital Equipment Corporation | Technique for ranking multimedia annotations of interest |
US6298071B1 (en) * | 1998-09-03 | 2001-10-02 | Diva Systems Corporation | Method and apparatus for processing variable bit rate information in an information distribution system |
US6236395B1 (en) * | 1999-02-01 | 2001-05-22 | Sharp Laboratories Of America, Inc. | Audiovisual information management system |
US6345279B1 (en) * | 1999-04-23 | 2002-02-05 | International Business Machines Corporation | Methods and apparatus for adapting multimedia content for client devices |
US6430558B1 (en) * | 1999-08-02 | 2002-08-06 | Zen Tech, Inc. | Apparatus and methods for collaboratively searching knowledge databases |
-
2000
- 2000-04-11 US US09/547,159 patent/US6574279B1/en not_active Expired - Lifetime
-
2001
- 2001-03-23 WO PCT/JP2001/002354 patent/WO2001078398A1/ja not_active Application Discontinuation
- 2001-03-23 CN CN01800896A patent/CN1366775A/zh active Pending
- 2001-03-23 EP EP01915736A patent/EP1195992A1/en not_active Withdrawn
- 2001-03-23 JP JP2001575722A patent/JP4650868B2/ja not_active Expired - Lifetime
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH08111870A (ja) * | 1994-10-12 | 1996-04-30 | Kokusai Denshin Denwa Co Ltd <Kdd> | 画像情報の再符号化方法及び装置 |
JPH10271494A (ja) * | 1997-03-26 | 1998-10-09 | Nec Commun Syst Ltd | 動画符号変換装置 |
JPH1174798A (ja) * | 1997-06-30 | 1999-03-16 | Hewlett Packard Co <Hp> | 圧縮入力ビットストリーム処理装置 |
JP2000069442A (ja) * | 1998-08-24 | 2000-03-03 | Sharp Corp | 動画システム |
Non-Patent Citations (1)
Title |
---|
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, vol. 9, no. 1, February 1999 (1999-02-01), pages 186 - 199, XP002941470 * |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1488628A2 (en) * | 2001-12-28 | 2004-12-22 | Nokia Corporation | Method and apparatus for selecting macroblock quantization in a video encoder |
EP1488628A4 (en) * | 2001-12-28 | 2008-12-10 | Nokia Corp | METHOD AND DEVICE FOR SELECTING THE MACROBLOCK QUANTIZATION IN A VIDEO PROCESSOR |
JPWO2004093457A1 (ja) * | 2003-04-10 | 2006-07-13 | 日本電気株式会社 | 動画像圧縮符号化方式変換装置及び動画像通信システム |
US8780957B2 (en) | 2005-01-14 | 2014-07-15 | Qualcomm Incorporated | Optimal weights for MMSE space-time equalizer of multicode CDMA system |
JP2008533841A (ja) * | 2005-03-10 | 2008-08-21 | クゥアルコム・インコーポレイテッド | マルチメディア処理のためのコンテンツ分類 |
JP2013085287A (ja) * | 2005-03-10 | 2013-05-09 | Qualcomm Inc | マルチメディア処理のためのコンテンツ分類 |
US9197912B2 (en) | 2005-03-10 | 2015-11-24 | Qualcomm Incorporated | Content classification for multimedia processing |
US8879857B2 (en) | 2005-09-27 | 2014-11-04 | Qualcomm Incorporated | Redundant data encoding methods and device |
US8879856B2 (en) | 2005-09-27 | 2014-11-04 | Qualcomm Incorporated | Content driven transcoder that orchestrates multimedia transcoding using content information |
US8879635B2 (en) | 2005-09-27 | 2014-11-04 | Qualcomm Incorporated | Methods and device for data alignment with time domain boundary |
US9071822B2 (en) | 2005-09-27 | 2015-06-30 | Qualcomm Incorporated | Methods and device for data alignment with time domain boundary |
US9088776B2 (en) | 2005-09-27 | 2015-07-21 | Qualcomm Incorporated | Scalability techniques based on content information |
US9113147B2 (en) | 2005-09-27 | 2015-08-18 | Qualcomm Incorporated | Scalability techniques based on content information |
US8948260B2 (en) | 2005-10-17 | 2015-02-03 | Qualcomm Incorporated | Adaptive GOP structure in video streaming |
US8654848B2 (en) | 2005-10-17 | 2014-02-18 | Qualcomm Incorporated | Method and apparatus for shot detection in video streaming |
US9131164B2 (en) | 2006-04-04 | 2015-09-08 | Qualcomm Incorporated | Preprocessor method and apparatus |
Also Published As
Publication number | Publication date |
---|---|
US6574279B1 (en) | 2003-06-03 |
CN1366775A (zh) | 2002-08-28 |
JP4650868B2 (ja) | 2011-03-16 |
EP1195992A1 (en) | 2002-04-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP4650868B2 (ja) | 圧縮ビデオのトランスコーディング方法 | |
US6490320B1 (en) | Adaptable bitstream video delivery system | |
US6493386B1 (en) | Object based bitstream transcoder | |
US6542546B1 (en) | Adaptable compressed bitstream transcoder | |
US8218617B2 (en) | Method and system for optimal video transcoding based on utility function descriptors | |
JP4786114B2 (ja) | 映像をコード化するための方法及び装置 | |
Vetro et al. | Object-based transcoding for adaptable video content delivery | |
Kim et al. | Content-adaptive utility-based video adaptation | |
US6925120B2 (en) | Transcoder for scalable multi-layer constant quality video bitstreams | |
US20050271140A1 (en) | Bit stream separating and merging system, apparatus, method and computer program product | |
JP2005323353A (ja) | 高忠実度のトランスコーディング | |
JPH09163362A (ja) | ソフトウェア実行型端末相互スケーラブルビデオ送達システム用ソフトウェアベースエンコーダ | |
JP2001511983A (ja) | 知覚特性利用型のトレリスに基づいて低ビットレートでビデオ符号化を行なうレート制御方法及び装置 | |
Kim et al. | An optimal framework of video adaptation and its application to rate adaptation transcoding | |
Valentim et al. | Evaluating MPEG-4 video decoding complexity for an alternative video complexity verifier model | |
Safranek et al. | Methods for matching compressed video to ATM networks | |
Eleftheriadis et al. | Dynamic rate shaping of compressed digital video | |
Eleftheriadis et al. | Optimal data partitioning of MPEG-2 coded video | |
KR100802180B1 (ko) | 엠펙-4 비디오 신호의 비트율을 동적인 통신 용량 변화에따라 제어하는 방법 | |
CN100366077C (zh) | 基于实用函数描述的最优视频解码的方法和系统 | |
Smith | Receiver-Driven Video Adaptation | |
Bojkovic | MPEG and ITU-T video communication: standardization process | |
Kang et al. | MPEG-21 DIA-based video adaptation framework and its application to rate adaptation | |
Bocheck et al. | Content-based VBR Video Tra c Modeling and its Application to Dynamic Network Resource Allocation | |
Tao | Video adaptation for stored video delivery over resource-constrained networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WWE | Wipo information: entry into national phase |
Ref document number: 01800896.8 Country of ref document: CN |
|
ENP | Entry into the national phase |
Ref document number: 2001 575722 Country of ref document: JP Kind code of ref document: A |
|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): CN JP |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
WWE | Wipo information: entry into national phase |
Ref document number: 2001915736 Country of ref document: EP |
|
WWP | Wipo information: published in national office |
Ref document number: 2001915736 Country of ref document: EP |
|
WWW | Wipo information: withdrawn in national office |
Ref document number: 2001915736 Country of ref document: EP |