WO1997043856A1 - Compression and coding of audio-visual services (Compression et codage de services audiovisuels) - Google Patents

Compression and coding of audio-visual services (Compression et codage de services audiovisuels)

Info

Publication number
WO1997043856A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound
video
video signal
dimensional space
signal
Prior art date
Application number
PCT/AU1997/000297
Other languages
English (en)
Inventor
Michael R. Frater
John Frederick Arnold
Original Assignee
Unisearch Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Unisearch Limited filed Critical Unisearch Limited
Priority to AU26863/97A
Publication of WO1997043856A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00 Television systems
    • H04N 7/14 Systems for two-way working
    • H04N 7/141 Systems for two-way working between two video terminals, e.g. videophone
    • H04N 7/142 Constructional details of the terminal equipment, e.g. arrangements of the camera and the display
    • H04N 7/15 Conference systems
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/233 Processing of audio elementary streams
    • H04N 21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N 21/23418 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N 21/2343 Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N 21/234345 Reformatting operations performed only on part of the stream, e.g. a region of the image or a time segment
    • H04N 21/234354 Reformatting operations performed by altering signal-to-noise ratio parameters, e.g. requantization
    • H04N 21/236 Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream; Assembling of a packetised elementary stream
    • H04N 21/2368 Multiplexing of audio and video streams
    • H04N 21/242 Synchronization processes, e.g. processing of PCR [Program Clock References]
    • H04N 21/25 Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices
    • H04N 21/266 Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system
    • H04N 21/2665 Gathering content from different sources, e.g. Internet and satellite
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/41 Structure of client; Structure of client peripherals
    • H04N 21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N 21/42203 Sound input device, e.g. microphone
    • H04N 21/4223 Cameras
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations; Client middleware
    • H04N 21/4302 Content synchronisation processes, e.g. decoder synchronisation
    • H04N 21/4307 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N 21/43076 Synchronising the rendering of the same content streams on multiple devices, e.g. when family members are watching the same movie on different devices
    • H04N 21/433 Content storage operation, e.g. storage operation in response to a pause request; caching operations
    • H04N 21/4334 Recording operations
    • H04N 21/439 Processing of audio elementary streams
    • H04N 21/4394 Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream
    • H04N 21/4402 Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N 21/440245 Reformatting operations performed only on part of the stream, e.g. a region of the image or a time segment
    • H04N 21/440254 Reformatting operations performed by altering signal-to-noise parameters, e.g. requantization
    • H04N 21/45 Management operations performed by the client for facilitating the reception of or the interaction with the content, e.g. learning user preferences for recommending movies
    • H04N 21/462 Content or additional data management, e.g. creating a master electronic program guide from data received from the Internet and a Head-end
    • H04N 21/4622 Retrieving content or additional data from different sources, e.g. from a broadcast channel and the Internet
    • H04N 21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/81 Monomedia components thereof
    • H04N 21/8106 Monomedia components involving special audio data, e.g. different tracks for different languages
    • H04N 21/816 Monomedia components involving special video data, e.g. 3D video

Definitions

  • the present invention relates to the field of audio-visual services over a telecommunications network and in particular, the invention provides techniques for improved signal compression.
  • This compression is of a type known as "lossy"; i.e., it usually results in a reduction in audio-visual quality.
  • One of the key decisions that must be made is to identify those parts of the picture which are most important to a user's perception of quality, and should therefore be transmitted at the highest quality. The remainder of the picture can then be transmitted at a lower quality. Many techniques have been proposed for achieving this, based solely on the use of information contained in the video pictures.
  • video quality is defined as the fidelity with which the dynamic image or part of the dynamic image of a field of view is represented by a video signal, or compressed video signal, including both the spatial and temporal resolution of the respective image or partial image.
  • the present invention consists in an audio-visual signal processing system including: a) video signal input means for receiving a primary video signal representing an image of a three dimensional space; b) sound signal input means for receiving a sound signal representing sounds including a sound produced within the three dimensional space; c) Direction Information Extraction means arranged to process the video and/or sound signals to extract information indicative of a location of a sound source within the three dimensional space; and d) video signal processing means arranged to identify portions of the primary video signal corresponding to an image encompassing the location of the sound source within the three dimensional space and to produce a secondary video signal from the primary video signal, in which portions corresponding to the location of the sound source within the three dimensional space have a higher video quality (as hereinbefore defined) than at least part of the remainder of the secondary video signal.
  • the present invention consists in an audio-visual signal processing system including: a) video signal input means for receiving a primary video signal representing an image of a three dimensional space; b) sound signal input means for receiving a sound signal representing sounds including a sound produced within the three dimensional space; c) Direction Information Extraction means arranged to process the video and/or sound signals to extract information indicative of a location of a sound source of interest within the three dimensional space; and d) audio processing means arranged to attenuate components of the sound signals representative of sounds not originating from the location of the sound source of interest.
  • the present invention consists in a method of processing a video signal in an audio visual system including the steps of: a) inputting a primary video signal representing an image of a three dimensional space; b) inputting a sound signal representing sounds including a sound produced within the three dimensional space; c) processing the primary video signal and/or the sound signal to produce location data representing a location of a sound source within the three dimensional space; d) using the location data to identify portions of the primary video signal corresponding to an image portion encompassing the location of the sound source; and e) processing the primary video signal to produce a secondary video signal in which portions of the secondary video signal representing the image portion encompassing the location of the sound source have a higher video quality (as hereinbefore defined) than at least part of the remainder of the secondary video signal.
  • the present invention consists in a method of processing an audio signal in an audio-visual system including the steps of: a) directing a camera at a field of view containing a sound source to produce a primary video signal representing a dynamic image of a three dimensional space; b) inputting a sound signal representing sounds including a sound produced within the three dimensional space; c) processing the primary video signal and/or the sound signal to produce location data representing a location of a sound source within the three dimensional space; d) processing the sound signal to selectively attenuate components of the sound signal representative of sounds from different sound sources depending upon the location of each source relative to the sound source represented by the location data.
  • the audio-visual processing system is part of an audio-visual transmission system, including a video signal generating means such as a video camera or video tape recorder and a sound signal generating means such as a microphone or similar transducer, or a tape recorder or the audio output of a video tape recorder.
  • a plurality of audio signals are used, each generated by a microphone at one of a plurality of different locations around the three dimensional space and, in the case of signals retrieved from a tape recorder, the signal from each microphone is recorded on a different recording channel or is multiplexed with other signals in such a way that it can be separated without loss of phase relative to other signals and separately processed.
  • the location of sound sources is achieved by the correlation of the plurality of sound signals, and the locations are mapped into a co-ordinate system from which the images of the location can be identified in the video signal.
  • the location of sound sources of interest can be identified by identifying areas of movement within the video image (e.g., lip areas) and these image areas can be used to identify sound components issuing from the source of interest by mapping the image areas onto a suitable co-ordinate system and identifying the sounds originating from those co-ordinates.
  • a plurality of video cameras are used to provide video signals of images from multiple points of view. By processing such multiple images simultaneously, three dimensional locations of sound sources can be more accurately identified and closer correlation between video images and sound sources can be determined.
  • the video signal is transmitted as a digitally encoded signal in which signals representing those portions of the image which change rapidly are transmitted more frequently than signals representing portions of the image that change less frequently or do not change at all.
  • the sound signals are also represented digitally and are multiplexed with the video signals.
  • the present invention provides an audio-visual transmitter which includes: a) video input means to receive video information representing an image of a target; b) sound input means to receive audio information which permits the location of a source of sound in three dimensional space relative to the target to be determined; c) correlation means to map the audio information onto the image information and to identify a portion of the image information corresponding to an image area encompassing the location of the sound source; and d) communication means arranged to generate modified video information wherein video quality (as defined herein) of the image represented by the modified video information varies with proximity to the image area encompassing the location of the sound source, the communication means being further arranged to transmit or communicate the modified video information and the audio information to a remote location.
  • the video quality of the video information corresponding to the area of the image encompassing the location of the sound source is improved relative to the rest of the video image.
  • the basis of the new techniques proposed in embodiments of the present invention is the use of directional information in the audio as well as the raw video data to identify regions of interest in pictures so that these regions can be transmitted at higher quality than other less important parts.
  • cancellation of extraneous audio sources will also be possible, thus enhancing the overall sound quality.
  • Figure 1 is a block diagram of an audio-visual compression system in accordance with the present invention
  • Figure 2 depicts a camera and microphone arrangement diagram for an embodiment of the invention
  • Figure 3 illustrates the definition of "horizontal direction of arrival"
  • Figure 4 illustrates the definition of "vertical direction of arrival"
  • video consists of a sequence of frames, each of which is represented by a number of pixels. Each pixel represents the brightness and colour information at a particular point in the frame.
  • adjacent pixels tend to be very similar.
  • pixels in the same location in different frames that are closely spaced in time tend to be similar.
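These two redundancies are what video compression exploits. As a small numpy sketch (the synthetic frame data below are an illustrative assumption, not taken from the patent), both the differences between adjacent pixels and the differences between successive frames are far smaller in magnitude than the raw pixel values, and so cheaper to code:

```python
import numpy as np

rng = np.random.default_rng(4)
# A smooth synthetic frame: neighbouring pixels differ only slightly.
base = np.cumsum(rng.normal(0, 1, (64, 64)), axis=1)
next_frame = base + rng.normal(0, 0.1, base.shape)   # small temporal change

# Spatial redundancy: horizontal pixel differences are far smaller
# in magnitude than the raw pixel values...
spatial_residual = np.abs(np.diff(base, axis=1)).mean()
# ...and temporal redundancy: frame differences are smaller still.
temporal_residual = np.abs(next_frame - base).mean()
raw_magnitude = np.abs(base - base.mean()).mean()
print(spatial_residual < raw_magnitude and temporal_residual < raw_magnitude)
```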
  • the present invention provides a first step towards implementing a combined representation of these two signals. This is widely regarded as a very important problem (for example, papers on this topic are regularly solicited for international conferences in this area); it is also regarded as being a very difficult problem. Embodiments of the present invention provide methods whereby audio information can be used to produce an improved representation of the video, more efficiently than could be achieved by looking at the video alone.
  • DCT Discrete Cosine Transform
  • the mouth area could be more finely quantised thus resulting in a higher quality rendition.
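A minimal sketch of what finer quantisation of a DCT-coded block buys (the helper names and QP values are hypothetical, not the patent's implementation): an 8x8 block quantised with a small step is reconstructed more faithfully than the same block quantised coarsely.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal type-II DCT basis, as used by H.261/MPEG-style coders."""
    k = np.arange(n)
    c = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    c[0] /= np.sqrt(2)
    return c * np.sqrt(2 / n)

def code_block(block, qp):
    """Transform an 8x8 block, quantise with step 2*qp, and reconstruct."""
    d = dct_matrix()
    coeffs = d @ block @ d.T
    q = np.round(coeffs / (2 * qp))      # coarser for larger qp
    return d.T @ (q * 2 * qp) @ d        # decoder-side reconstruction

rng = np.random.default_rng(1)
block = rng.integers(0, 256, (8, 8)).astype(float)
err_fine = np.abs(code_block(block, qp=2) - block).mean()
err_coarse = np.abs(code_block(block, qp=16) - block).mean()
print(err_fine < err_coarse)  # True: finer quantisation -> higher quality
```

Spending the small quantisation step only on mouth-area blocks concentrates the bit budget where it matters most perceptually.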
  • most videophone and video conference links utilise a frame rate which is low (say 6 frames per second) compared with the frame rate available from the camera producing the analogue video to the coder (25 frames per second). It would be possible to transmit frames which only contain lip information thus providing a full refresh rate for this important information (and so maintaining good lip sync between video and audio) while only refreshing other areas of the video material at a lower rate. In the remainder of this application, we describe a technique by which this can be achieved.
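A minimal sketch of such a two-rate refresh schedule, assuming the 25/6 frame-per-second figures quoted above (the function and region names are illustrative):

```python
CAMERA_FPS = 25    # analogue camera rate mentioned in the text
LINK_FPS = 6       # typical coded-frame rate of a videophone link

def regions_to_send(frame_index):
    """Decide which regions are refreshed in a given camera frame.

    The lip region is refreshed at the full camera rate to preserve
    lip sync; the rest of the scene only at the (lower) link rate.
    """
    send = ["lips"]
    # Spread LINK_FPS background refreshes evenly over CAMERA_FPS frames.
    before = (frame_index * LINK_FPS) // CAMERA_FPS
    after = ((frame_index + 1) * LINK_FPS) // CAMERA_FPS
    if before != after:
        send.append("background")
    return send

schedule = [regions_to_send(i) for i in range(CAMERA_FPS)]
full = sum(1 for s in schedule if "background" in s)
print(full)  # background refreshed 6 times out of 25 frames
```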
  • it is required that the position of the lips of the speaker in question be known precisely so that different video coding techniques can be used for this area compared to the rest of the scene. It is well known that any deterioration of the reconstruction of the lips destroys the quality of the video interaction more than any other part of the scene.
  • since the lips of the speaker are the source of sound, it is aimed to determine their position by estimating the direction of this sound source using an array of microphones suitably placed in a plane near the camera and in front of the speaker.
  • Direction of arrival estimation techniques using an array of sensors are well known for radar, sonar and seismology.
  • the present application requires searching for a sound source located at the lips of the speaker.
  • One possible technique for locating this source is to consider points on a spiral starting at the outer boundary of the area under consideration and measure the power arriving from the direction of each point using the optimal beamforming methods described above. The point which yields the maximum power would be identified as the location of the lips.
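A sketch of this search under simplifying assumptions (far-field source, a linear three-microphone array, integer-sample delays; all names and parameter values are illustrative, not from the patent). The candidates are visited from the outer boundary inwards, a one-dimensional analogue of the spiral, and the steered delay-and-sum power peaks at the true direction:

```python
import numpy as np

C = 340.0                            # speed of sound, m/s
FS = 32000                           # sampling rate, Hz
MICS = np.array([-0.1, 0.0, 0.1])    # linear 3-microphone array, metres

def steering_delays(theta_deg):
    """Integer-sample delays for a far-field source at azimuth theta."""
    return np.round(MICS * np.sin(np.radians(theta_deg)) / C * FS).astype(int)

def steered_power(signals, theta_deg):
    """Delay-and-sum beamformer power when steered towards theta."""
    out = sum(np.roll(sig, -d)
              for sig, d in zip(signals, steering_delays(theta_deg)))
    return float(np.mean(out ** 2))

# Simulate a broadband source at 20 degrees (integer-sample propagation).
rng = np.random.default_rng(2)
s = rng.standard_normal(4000)
signals = [np.roll(s, d) for d in steering_delays(20.0)]

# Scan candidate directions from the outer boundary inwards.
candidates = []
lo, hi = -60, 60
while lo <= hi:
    candidates.append(lo)
    if hi != lo:
        candidates.append(hi)
    lo, hi = lo + 5, hi - 5
best = max(candidates, key=lambda th: steered_power(signals, th))
print(best)  # the scan locks onto the true 20-degree source
```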
  • the microphone array is connected to an appropriate digital signal processor (eg. a member of the Motorola DSP56000 family) which will be used to control the microphone array in real time and thus extract the position of the audio source.
  • the information gathered by the microphone array is utilised by the video coding algorithm to determine the position of the lips of the user.
  • microphone arrays can be used for noise cancellation.
  • a beamforming approach can be used to cancel noise from other sources such as background noise.
  • this approach will also be beneficial in improving the coding efficiency of the audio coding algorithm used.
  • Figure 1 shows one possible implementation of the principles of the present invention. Audio information is acquired via an array of two or more microphones 11, 12, while picture information is acquired via one or more cameras 13, and these signals are then processed as shown in Figure 1.
  • the Extract Directional Information block 15 identifies, within the field of view of the camera 13 and the microphone array 11, 12, those areas that are of most interest.
  • the direction of sound sources can be identified by measuring the difference in propagation delay between the sound source and two or more microphones 11, 12. This can be performed by examining the cross correlation between the signals from the different microphones 11 and 12. Where the delay difference is small, it may be possible to use a subtraction operation rather than multiplication in the cross correlation.
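A minimal numpy sketch of the cross-correlation delay estimate (the function name and synthetic signals are assumptions for illustration):

```python
import numpy as np

def estimate_delay(x1, x2, fs):
    """Estimate how much later the sound arrives at microphone 2 than at
    microphone 1 (in seconds) from the peak of the cross-correlation."""
    corr = np.correlate(x2, x1, mode="full")
    lag = int(np.argmax(corr)) - (len(x1) - 1)
    return lag / fs

# Synthetic check: the same burst reaches microphone 2 five samples later.
fs = 32000                       # sampling rate, Hz
rng = np.random.default_rng(0)
burst = rng.standard_normal(256)
x1 = np.concatenate([burst, np.zeros(16)])
x2 = np.concatenate([np.zeros(5), burst, np.zeros(11)])
delay = estimate_delay(x1, x2, fs)
print(round(delay * fs))  # 5: a five-sample lag is recovered
```

With two or more such pairwise delays and known microphone positions, the direction (and, with enough microphones, the location) of the source can be triangulated.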
  • the video information can be used by looking for regions with high motion.
  • the directional information is used to apply beamforming techniques to select particular directions or to attenuate particular regions of the field of view of the microphone array.
  • the simplest action for this block would be to either select a single microphone or sum the outputs of all microphones together.
  • the output could be multi-channel audio.
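A sketch of the delay-and-sum idea applied to attenuation (the source positions, array geometry and the per-source decomposition used for measurement are illustrative assumptions): steering at the talker leaves a sideways interferer incoherent, so its relative level drops.

```python
import numpy as np

C = 340.0                            # speed of sound, m/s
FS = 32000                           # sampling rate, Hz
MICS = np.array([-0.1, 0.0, 0.1])    # linear 3-microphone array, metres

def delays(theta_deg):
    """Integer-sample per-microphone delays for a far-field source."""
    return np.round(MICS * np.sin(np.radians(theta_deg)) / C * FS).astype(int)

rng = np.random.default_rng(3)
speech = rng.standard_normal(4000)   # stands in for the talker at 0 degrees
noise = rng.standard_normal(4000)    # interferer off to the side at 40 degrees
mics = [np.roll(speech, ds) + np.roll(noise, dn)
        for ds, dn in zip(delays(0.0), delays(40.0))]

# Delay-and-sum steered at the talker: speech adds coherently while the
# interferer's copies stay misaligned, so its relative level drops.
enhanced = sum(np.roll(m, -d) for m, d in zip(mics, delays(0.0))) / len(mics)

# Measure the improvement by passing each source through the same path alone.
sp = sum(np.roll(np.roll(speech, ds), -d)
         for ds, d in zip(delays(0.0), delays(0.0))) / len(mics)
nz = sum(np.roll(np.roll(noise, dn), -d)
         for dn, d in zip(delays(40.0), delays(0.0))) / len(mics)
assert np.allclose(enhanced, sp + nz)    # linearity check
in_snr = np.mean(speech ** 2) / np.mean(noise ** 2)
out_snr = np.mean(sp ** 2) / np.mean(nz ** 2)
print(out_snr > in_snr)  # True: the beamformer suppresses the interferer
```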
  • the Video Compression block 16 uses a video compression algorithm, which takes in video pictures and produces a compressed video bitstream, eg. ISO/IEC MPEG 2 Video, ITU-T H.263. This block may include filtering to reduce the spatial and/or temporal resolution, or for other purposes.
  • the Audio Compression block 17 uses an audio compression algorithm, which takes in a digital audio stream, and produces a compressed audio bitstream. Examples include ISO/IEC MPEG 2 Audio and ITU-T G.723.
  • the multiplexer 18 combines the various video and audio bitstreams so that they can be carried on a single channel, eg. ITU-T H.222 and ISO/IEC
  • the field of view of the camera 13 is the area between an angle θHmax to the left and right of the perpendicular to the plane of the microphones and an angle θVmax above and below this perpendicular.
  • the angle of the field of view is 2θHmax horizontally and 2θVmax vertically.
  • each microphone is connected to a 16 bit analog to digital converter, which has sampling frequency fs, which here takes the value of 32,000 samples per second.
  • the sequence of samples output from the analog to digital converter connected to the first microphone 11 is denoted by x1(t), where t is the time at which the sample is taken.
  • the sequence of samples from the second and third microphones 12, 21, are denoted by x2(t) and x3(t) respectively.
  • k denotes the time corresponding to the beginning of the video frame
  • i_V is the value of i that corresponds to a sound source whose vertical location places it on the edge of the field of view of the camera (i.e.
  • the location of the sound source in the video pictures can be calculated. This location lies to the right of the center of the picture by an amount:
  • the information on the location of sound sources is applied to a video coding system conforming with ITU-T Recommendation H.261.
  • Macroblocks in each picture belong to one of two classes:
  • the information on the location of sound sources is applied to the video coding by using two different schemes for choosing the value of the H.261 parameter QP for macroblocks in the two classes.
  • a variable bitrate is employed.
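The cross-correlation approach to direction finding described in the first point above can be sketched as follows. This is an illustrative sketch, not the patented method: the function names, the 343 m/s speed of sound, and the far-field arcsin model relating delay to direction are assumptions introduced here.

```python
import numpy as np

def estimate_delay(x1, x2, fs):
    """Estimate how much x1 lags x2 (in seconds) as the lag that
    maximises the cross correlation of the two microphone signals."""
    corr = np.correlate(x1, x2, mode="full")    # lags -(N-1)..(N-1)
    lag = int(np.argmax(corr)) - (len(x2) - 1)  # lag in samples
    return lag / fs

def delay_to_angle(delay, mic_spacing, c=343.0):
    """For a distant source, delay = spacing*sin(theta)/c, so the
    direction (measured from the perpendicular to the microphone
    pair, as in the description above) is arcsin(c*delay/spacing)."""
    s = np.clip(c * delay / mic_spacing, -1.0, 1.0)
    return float(np.arcsin(s))
```

With the 32,000 samples-per-second rate given below and microphones 10 cm apart, a one-sample change in the delay estimate corresponds to roughly 6 degrees of direction, which bounds the angular resolution of this simple estimator.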
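The beamforming step mentioned above (selecting particular directions and attenuating others in the microphone array's field of view) can be illustrated with a minimal delay-and-sum beamformer. The integer-sample shifts and circular `np.roll` are simplifications for illustration; a practical implementation would use fractional-delay filtering.

```python
import numpy as np

def delay_and_sum(signals, steering_delays, fs):
    """Average the microphone channels after advancing each one by its
    steering delay, so sound arriving from the steered direction adds
    coherently while sound from other directions is attenuated."""
    shifts = np.round(np.asarray(steering_delays) * fs).astype(int)
    out = np.zeros(len(signals[0]))
    for sig, sh in zip(signals, shifts):
        out += np.roll(sig, -sh)   # circular shift: adequate for a sketch
    return out / len(signals)
```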
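The last points above — mapping the estimated direction to a location in the picture, then choosing different H.261 quantiser values QP for the two macroblock classes — can be sketched as below. The expression for the offset to the right of the picture centre is not reproduced in this text; the tangent-law mapping here is one plausible form for a pinhole camera, and the QP values 8 and 20 (H.261 permits QP from 1 to 31) and the pixel radius are illustrative assumptions, not values from the patent.

```python
import math

def source_to_pixel_offset(theta, theta_max, picture_width):
    """Horizontal offset (pixels, positive to the right of picture
    centre) of a source at direction theta, for a camera whose
    half-width field of view is theta_max (assumed pinhole model)."""
    return (picture_width / 2) * math.tan(theta) / math.tan(theta_max)

def choose_qp(mb_centres, source_col, radius, qp_near=8, qp_far=20):
    """Two-class quantiser selection: macroblocks whose centre column
    lies within `radius` pixels of the sound-source column get the
    finer quantiser (lower QP), the rest the coarser one."""
    return [qp_near if abs(c - source_col) <= radius else qp_far
            for c in mb_centres]
```

Coding the sound source's region with a lower QP spends more bits where a viewer's attention is likely to be, which is the quality gain the description claims for the variable-bitrate case.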


Abstract

The invention concerns a new method for improving audio-visual quality that uses directional information derived from the sound (for example, captured by two or more microphones) to improve the quality of video services, by using that information to identify important parts of the picture and transmitting those parts with higher quality (for example, by adjusting the quantisation parameters of the compression process); and that uses directional information derived from the sound and the video (for example, using segmentation techniques) to improve the quality of audio services by attenuating sound coming from unwanted regions (for example, by applying beamforming techniques). The flow chart in the figure illustrates one possible embodiment of the principles of the invention. Audio information is acquired through an array of one or more microphones (11, 12) and video information through one or more cameras (13); the signals are then processed as shown in the figure.
PCT/AU1997/000297 1996-05-16 1997-05-15 Compression and coding of audio-visual services WO1997043856A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU26863/97A AU2686397A (en) 1996-05-16 1997-05-15 Compression and coding of audio-visual services

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
AUPN9889A AUPN988996A0 (en) 1996-05-16 1996-05-16 Compression and coding of audio-visual services
AUPN9889 1996-05-16

Publications (1)

Publication Number Publication Date
WO1997043856A1 true WO1997043856A1 (fr) 1997-11-20

Family

ID=3794201

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AU1997/000297 WO1997043856A1 (fr) 1996-05-16 1997-05-15 Compression and coding of audio-visual services

Country Status (2)

Country Link
AU (1) AUPN988996A0 (fr)
WO (1) WO1997043856A1 (fr)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000002388A1 (fr) * 1998-07-03 2000-01-13 Antonio Messina Method and apparatus for automatically controlling video cameras using microphones
WO2000056070A1 (fr) * 1999-03-18 2000-09-21 Qualcomm Incorporated Videophone with audio source tracking
EP1205762A1 (fr) * 1999-06-11 2002-05-15 Japan Science and Technology Corporation Method and apparatus for determining a sound source
GB2391126A (en) * 2002-07-18 2004-01-28 Lg Electronics Inc Calculation method for prediction motion vector
US9094771B2 (en) 2011-04-18 2015-07-28 Dolby Laboratories Licensing Corporation Method and system for upmixing audio to generate 3D audio

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1994000951A1 (fr) * 1992-06-29 1994-01-06 British Telecommunications Public Limited Company Coding and decoding of video signals
EP0617537A1 (fr) * 1993-03-24 1994-09-28 AT&T Corp. Conferencing arrangement for compressed information signals


Cited By (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7277116B1 (en) 1998-07-03 2007-10-02 Antonio Messina Method and apparatus for automatically controlling video cameras using microphones
WO2000002388A1 (fr) * 1998-07-03 2000-01-13 Antonio Messina Method and apparatus for automatically controlling video cameras using microphones
WO2000056070A1 (fr) * 1999-03-18 2000-09-21 Qualcomm Incorporated Videophone with audio source tracking
EP1205762A1 (fr) * 1999-06-11 2002-05-15 Japan Science and Technology Corporation Method and apparatus for determining a sound source
EP1205762A4 (fr) * 1999-06-11 2005-07-06 Japan Science & Tech Agency Method and apparatus for determining a sound source
US7035418B1 (en) 1999-06-11 2006-04-25 Japan Science And Technology Agency Method and apparatus for determining sound source
US8565544B2 (en) 2002-07-18 2013-10-22 Lg Electronics Inc. Apparatus for predicting a motion vector for a current block in a picture to be decoded
US8644631B2 (en) 2002-07-18 2014-02-04 Lg Electronics Inc. Method of predicting a motion vector for a current block in a current picture
US8428373B2 (en) 2002-07-18 2013-04-23 Lg Electronics Inc. Apparatus for determining motion vectors and a reference picture index for a current block in a picture to be decoded
US8463058B2 (en) 2002-07-18 2013-06-11 Lg Electronics Inc. Calculation method for prediction motion vector
US8467620B2 (en) 2002-07-18 2013-06-18 Lg Electronics Inc. Method of determining motion vectors and a reference picture index for a current block in a picture to be decoded
US8467622B2 (en) 2002-07-18 2013-06-18 Lg Electronics Inc. Method of determining motion vectors and a reference picture index for a current block in a picture to be decoded
US8467621B2 (en) 2002-07-18 2013-06-18 Lg Electronics Inc. Method of determining motion vectors and a reference picture index for a current block in a picture to be decoded
US8472738B2 (en) 2002-07-18 2013-06-25 Lg Electronics Inc. Apparatus for determining motion vectors and a reference picture index for a current block in a picture to be decoded
US8509550B2 (en) 2002-07-18 2013-08-13 Lg Electronics Inc. Apparatus for determining motion vectors and a reference picture index for a current block in a picture to be decoded
US8548264B2 (en) 2002-07-18 2013-10-01 Lg Electronics Inc. Apparatus for predicting a motion vector for a current block in a picture to be decoded
GB2391126A (en) * 2002-07-18 2004-01-28 Lg Electronics Inc Calculation method for prediction motion vector
US8571335B2 (en) 2002-07-18 2013-10-29 Lg Electronics Inc. Calculation method for prediction motion vector
US8634667B2 (en) 2002-07-18 2014-01-21 Lg Electronics Inc. Method of predicting a motion vector for a current block in a current picture
US8634666B2 (en) 2002-07-18 2014-01-21 Lg Electronics Inc. Apparatus for determining motion vectors and a reference picture index for a current block in a picture to be decoded
US8639048B2 (en) 2002-07-18 2014-01-28 Lg Electronics Inc. Method of determining motion vectors and a reference picture index for a current block in a picture to be decoded
GB2391126B (en) * 2002-07-18 2006-11-15 Lg Electronics Inc Calculation method for prediction motion vector
US8644630B2 (en) 2002-07-18 2014-02-04 Lg Electronics Inc. Apparatus for determining motion vectors and a reference picture index for a current block in a picture to be decoded
US8649621B2 (en) 2002-07-18 2014-02-11 Lg Electronics Inc. Apparatus for determining motion vectors and a reference picture index for a current block in a picture to be decoded
US8649622B2 (en) 2002-07-18 2014-02-11 Lg Electronics Inc. Method of determining motion vectors and a reference picture index for a current block in a picture to be decoded
US8655089B2 (en) 2002-07-18 2014-02-18 Lg Electronics Inc. Apparatus for determining motion vectors and a reference picture index for a current block in a picture to be decoded
US8712172B2 (en) 2002-07-18 2014-04-29 Lg Electronics Inc. Method of predicting a motion vector for a current block in a current picture
US8908983B2 (en) 2002-07-18 2014-12-09 Lg Electronics Inc. Method of predicting a motion vector for a current block in a current picture
US10897613B2 (en) 2002-07-18 2021-01-19 Lg Electronics Inc. Method of predicting a motion vector for a current block in a current picture
US9544590B2 (en) 2017-01-10 Lg Electronics Inc. Method of predicting a motion vector for a current block in a current picture
US9544589B2 (en) 2002-07-18 2017-01-10 Lg Electronics Inc. Method of predicting a motion vector for a current block in a current picture
US9544591B2 (en) 2002-07-18 2017-01-10 Lg Electronics Inc. Method of predicting a motion vector for a current block in a current picture
US9560354B2 (en) 2002-07-18 2017-01-31 Lg Electronics Inc. Method of predicting a motion vector for a current block in a current picture
US10425639B2 (en) 2002-07-18 2019-09-24 Lg Electronics Inc. Method of predicting a motion vector for a current block in a current picture
US9094771B2 (en) 2011-04-18 2015-07-28 Dolby Laboratories Licensing Corporation Method and system for upmixing audio to generate 3D audio

Also Published As

Publication number Publication date
AUPN988996A0 (en) 1996-06-06

Similar Documents

Publication Publication Date Title
Torres et al. Video coding: the second generation approach
US6496607B1 (en) Method and apparatus for region-based allocation of processing resources and control of input image formation
Zafar et al. Multiscale video representation using multiresolution motion compensation and wavelet decomposition
Sikora Trends and perspectives in image and video coding
US5995150A (en) Dual compressed video bitstream camera for universal serial bus connection
JP2818340B2 (ja) Motion video compression system and method
Wu et al. Perceptual visual signal compression and transmission
EP1438862B1 (fr) Pretraitement adapte au mouvement pour la reduction du bruit dans un signal video numerique
US6275614B1 (en) Method and apparatus for block classification and adaptive bit allocation
US20230082561A1 (en) Image encoding/decoding method and device for performing feature quantization/de-quantization, and recording medium for storing bitstream
Schiller et al. Efficient coding of side information in a low bitrate hybrid image coder
JP2014502443A (ja) Generation of a depth indication map
US6987889B1 (en) System and method for dynamic perceptual coding of macroblocks in a video frame
JP2001514826A (ja) Method and apparatus for transmitting and displaying still images
JP2001525131A (ja) Video coding method performing adaptive compression coding according to viewer information data
JP2006520141A (ja) System and method for transmitting multiple video sequences
MXPA04011439A (es) Method and apparatus for transcoding compressed video bitstreams
JPH04323989A (ja) Coding apparatus for high-definition television
US6865229B1 (en) Method and apparatus for reducing the “blocky picture” effect in MPEG decoded images
CA2576679C (fr) Method and apparatus for interpolating a reference pixel in an annular image and encoding/decoding an annular image
US20050021620A1 (en) Web data conferencing system and method with full motion interactive video
Benzler Spatial scalable video coding using a combined subband-DCT approach
CN115699775A (zh) Image coding method based on chroma deblocking parameter information for a monochrome colour format in a video or image coding system
WO1997043856A1 (fr) Compression and coding of audio-visual services
US6864909B1 (en) System and method for static perceptual coding of macroblocks in a video frame

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH HU IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK TJ TM TR TT UA UG US UZ VN YU AM AZ BY KG KZ MD RU TJ TM

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH KE LS MW SD SZ UG AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: JP

Ref document number: 97540319

Format of ref document f/p: F

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: CA