CA2048623A1 - Video telephone systems

Video telephone systems

Info

Publication number
CA2048623A1
Authority
CA
Canada
Prior art keywords
image
grey
scale
pixel
difference signal
Prior art date
Legal status
Abandoned
Application number
CA002048623A
Other languages
French (fr)
Inventor
Robert L. Harvey
Patrick R. Hirschler-Marchand
David J. Cipolle
Kipton C. Kumler
Current Assignee
Massachusetts Institute of Technology
Original Assignee
Individual
Priority date
Filing date
Publication date
Application filed by Individual
Publication of CA2048623A1

Classifications

    • H - ELECTRICITY
        • H04 - ELECTRIC COMMUNICATION TECHNIQUE
            • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
                • H04N 19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
                    • H04N 19/50 - using predictive coding
                        • H04N 19/503 - involving temporal prediction
                • H04N 7/00 - Television systems
                    • H04N 7/04 - Systems for the transmission of one television signal, i.e. both picture and sound, by a single carrier
                        • H04N 7/045 - the carrier being frequency modulated
                    • H04N 7/12 - Systems in which the television signal is transmitted via one channel or a plurality of parallel channels, the bandwidth of each channel being less than the bandwidth of the television signal
                    • H04N 7/14 - Systems for two-way working
                        • H04N 7/15 - Conference systems
                • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
                    • H04N 21/20 - Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
                        • H04N 21/25 - Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
                            • H04N 21/266 - Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel
                                • H04N 21/2662 - Controlling the complexity of the video stream, e.g. by scaling the resolution or bitrate of the video stream based on the client capabilities
                    • H04N 21/60 - Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client
                        • H04N 21/61 - Network physical structure; Signal processing
                            • H04N 21/6106 - specially adapted to the downstream path of the transmission network
                                • H04N 21/6137 - involving transmission via a telephone network, e.g. POTS
                            • H04N 21/6156 - specially adapted to the upstream path of the transmission network
                                • H04N 21/6187 - involving transmission via a telephone network, e.g. POTS

Abstract

Abstract of the Disclosure Differential image data compression systems and techniques are disclosed for use in video telephone systems for full duplex, real time transmission of audio and video images over nominal 3 kHz residential telephone lines and other narrow bandwidth channels. Each video telephone device consists of a transmitting portion (12) and a receiving portion (38). In the transmitting section (12) a reduced grey-scale (luminance) image, preferably consisting of only black-and-white pixels, is compared in an image processing module (20) with a similarly reduced image derived from previous values to determine pixel positions that have changed.
Information representative of the changes is then encoded and modulated and (time or frequency) multiplexed with the audio and/or chrominance signals for transmission. At the receiver (38), the incoming signal is demodulated and demultiplexed to separate the audio and video portions, the image portion is decoded and the luminance value is updated by an image updating unit (50). Adaptive resolution, pixel averaging and interpolation techniques are also disclosed for picture enhancement.

Description


VIDEO TELEPHONE SYSTEMS

Background of the Invention

The technical field of this invention is image processing and, more specifically, differential motion detection processes and devices. In particular, the invention relates to video telephones for transmitting both sound and images in real time over standard residential telephone lines.

When video conferencing was first demonstrated at the New York World's Fair in 1964, public expectations were raised that a new technology would soon render the telephone obsolete. However, various technical constraints have made video telephone systems prohibitively costly to all but a relatively small group. In particular, the amount of image data that must be transmitted has posed a most significant problem because such data far exceeds the capacity of existing standard residential telephone networks.


Researchers have attempted to overcome this obstacle in two ways: first, by using a different medium for data transmission to enable a higher data transfer rate; or second, by using image data manipulation techniques to compress the amount of data required to represent the image. This invention primarily is concerned with the latter approach of data compression.

Much of the work on video conferencing has been directed toward data transmission over special, high-quality transmission lines, such as fiber optics, which are capable of transmitting at least several times as much data as standard residential telephone lines. For example, an Integrated Services Digital Network (ISDN) service is being implemented with a 64 kbit/sec. video transmission rate to replace, in some instances, the standard 3 kHz telephone lines that can handle at best up to about 20 kbit/sec., depending upon the signal processing employed. These special lines are relatively costly and currently are available only in limited areas.

An object of this invention is to provide an image data compression process to enable video telephones to be used over the present, copper-based, residential telephone network, as well as other low bandwidth transmission media.

Another object of this invention is to provide an inexpensive video telephone that may be used with standard video cameras and video display screens or personal computers to provide videoconferencing capabilities between users connected to the standard residential telephone network.


Summary of the Invention

Differential motion detection data compression systems and techniques are disclosed for use in low-cost, video telephone systems for full duplex, real time transmission of audio and video images over nominal 3 kHz residential telephone lines and other low bandwidth channels.

Each video telephone device consists of a transmitting portion and a receiving portion. In a simple "black-and-white" embodiment, the transmitting section transforms an image (e.g., either from a standard video camera or an alternative imaging device such as a charge coupled device or CCD) into a reduced grey-scale image preferably consisting of only black-and-white pixels. This image is then compared with a similarly reduced image derived from previous image data to determine pixel positions that have changed. Information representative of the changes between image frames is then encoded to further compress the data and then modulated and multiplexed with the audio signal for transmission over a standard 3 kHz telephone line.
In another embodiment, color images can be transmitted by decomposing the color video signal into its luminance and chrominance components and then processing the luminance values in accordance with this invention. As used herein, the term "grey-scale image" is intended to encompass both simple "black-and-white" images and the luminance component of color images. Techniques for encoding and transmitting the chrominance values of color images, as well as reconstruction of a color image from the luminance and chrominance information, will be described below.


Coherent modulation/demodulation can be used to enable transmission and reception of the video and audio signals over a standard residential telephone line. Coherent modulation produces frequency transformations of the signals to position the signal bandwidth in the telephone line channel, nominally 0 to 3 kHz. The coherent modulation also is used to enable multiplexing two analog signals simultaneously onto the telephone line bandwidth, as described in more detail below. Techniques for reducing crosstalk between the transmitted audio and video signals, as well as alternative frequency division multiplexing techniques for transmittal of the audio and video signals, are also disclosed below.
In another aspect of the invention, adaptive resolution apparatus and methods are disclosed in which different data compression techniques are used, depending on the degree of motion in the image over time. In one illustrated embodiment, three states (fast motion, intermediate motion and slow motion) are defined and different data processing steps are practiced in the transmitter based on the state determination.
The receiving section reverses the data compression process of the transmitting section. The incoming signal is demodulated and demultiplexed to separate the audio and video portions, the image portion is decoded, and the reduced grey-scale image of the previous frame is updated accordingly. Prior to the display of the updated image, the image can be transformed from a reduced grey-scale state into a fuller grey-scale image or a reconstructed luminance signal by overlapping and averaging blocks of pixel values.

When chrominance information is also encoded, various transmission schemes can be employed. For example, luminance and audio information can be coherently modulated, as in the black-and-white case, but over a slightly narrowed bandwidth (e.g., over a 0 - 2500 Hz band with a first carrier frequency, f1), and the I & Q color components can be coherently modulated in a second band (e.g., over a 2500 - 3000 Hz band with a second carrier frequency, f2). Alternatively, a luminance signal L and chrominance signals, e.g., Xred and Xblue color signals, can be multiplexed over time. In yet another approach, the color signals can be sampled over time and then time domain multiplexed over the audio channel.

The image data compression techniques of the present invention can be applied not only to video telephones and video conferencing systems but to graphic image storage devices, high definition television (HDTV), cable television, facsimile machines and computerized document storage on magnetic or optical disks. In addition, the images that can be processed vary from still, black-and-white characters to fast-moving, high-resolution color images of intricate objects.

The invention can also be adapted for image transmission over other narrow band media, such as radio transmissions through the air. In addition, the invention can be adapted to transmit graphic images of text generated by a computer instead of a video camera. The video telephones of the present invention also are compatible with conventional telephones and can receive and/or transmit audio signals alone whenever communication with a regular telephone or other audio transceiver is desired. Likewise, the systems of the present invention can be used not only with analog signals produced by conventional TV cameras but also with the signals produced by imaging devices such as CCDs and the like. These features, as well as the addition, subtraction or substitution of other components, will be obvious to those familiar with the art.

It should also be noted that throughout this specification, the video telephone system has been described in terms of transmission via telephone lines having a nominal bandwidth from about 0 to about 3 kiloHertz. However, telephone bandwidths actually are slightly offset from this range, typically operating from about 300 Hz to about 3.4 kHz. Those skilled in the art will appreciate this distinction and can readily adjust the parameters described herein to match actual conditions in use.


Brief Description of the Drawings

FIG. 1 is a schematic block diagram of a black-and-white video telephone system in accordance with the present invention;

FIG. 2 is a schematic block diagram of a color video telephone system in accordance with the present invention;
FIG. 3 is a more detailed schematic diagram of an image processing module for use in the transmitter of FIG. 1;

FIG. 4 is an illustrative matrix of dithered threshold values for a 4 x 4 block of pixels useful in a grey-scale reduction unit according to the invention;

FIG. 5A illustrates a hysteresis process for adjusting the dithered threshold values in a grey-scale reduction unit to decrease toggling and image flickering for a white pixel value in a previous frame;

FIG. 5B illustrates a similar hysteresis adjustment for a black pixel value in a previous frame;

FIG. 6 is a more detailed schematic diagram of the modulation and multiplexing elements of the transmitter and the demodulation and demultiplexing elements of the receiver of the system of FIG. 1;

FIG. 7 is a schematic illustration of a system for suppressing cross-talk between the video and audio signals in a system, such as shown in FIG. 6;

FIG. 8 is a schematic illustration of an alternative modulation and demodulation approach for use in the present invention;

FIGS. 9A-9D illustrate an averaging process useful in the image averaging unit of the receiver of FIG. 1.

FIG. 10 is a schematic block diagram of a video telephone system employing an adaptive resolution module;

FIG. 11A is an illustration of a matrix of dithered threshold values for coarse resolution in the adaptive system of FIG. 10;

FIG. 11B is an illustration of a matrix of dithered threshold values for intermediate resolution in the adaptive system of FIG. 10; and

FIG. 11C is an illustration of a matrix of dithered threshold values for fine resolution in the adaptive system of FIG. 10.

Detailed Description

In FIG. 1, a video telephone system 10 in accordance with the present invention is shown, including a transmitter section 12, having a sampling unit 14, an image processing module 20 (including a grey-scale reduction unit 16, a frame memory 15 and a motion detection unit 18), a differential image encoding unit 22, an optional error correction unit 24, an image modulator 26, an optional audio modulator 28 and a multiplexing mixer 30. System 10 further includes a receiver section 38 having a demultiplexing and demodulating unit 42, an optional speaker 44 for audio output, an optional error detector 46, an image decoder 48, an image updating unit 50 (including image memory 52 and comparator 54), a pixel averaging unit 56, and a display or monitor driver 58 for video output.

Image data is first compressed by the sampling and grey-scale reduction units 14 and 16, then compared with a previously reduced grey-scale image in the motion detection unit 18 to produce image data representative of the changes from the previous image frame. In the final stage, the differential image data is further compressed by the encoding unit 22, such that the image data may be modulated and transmitted over a 3 kHz or other narrow bandwidth channel.


The transmitted image data is received by the receiving section 38, as shown in FIG. 1, which reconstructs the new image. The receiving section 38 also has three levels of data decompression. After demodulation, the encoded image data is decoded by decoding unit 48 to provide the differential image, which is then used in the image updating unit 50 to make the designated changes to the previous image frame. Finally, the image is averaged in averaging unit 56 to yield a greater range of shades of grey, which can then be displayed on monitor 58.

In FIG. 2, an alternative system 10A is shown, including transmitter 12A and receiver 38A for incorporating color information, with like reference characters indicating elements substantially similar in function to those shown in FIG. 1. System 10A includes filter 11 which decomposes a color (e.g., NTSC) video signal into its luminance component L and two chrominance components I and Q. The luminance component L can be processed by module 20, encoder 24 and modulator 26 in a manner substantially identical to the processing of a black-and-white image, as shown in FIG. 1. The luminance data can then be multiplexed with audio data via mixer 30 and transmitted over a first portion of the bandwidth (e.g., a nominal frequency band ranging from about 0 to about 2500 Hz), while the chrominance information occupies a second portion of the bandwidth (e.g., a nominal frequency band from about 2500 to about 3000 Hz). The I and Q chrominance values can be modulated by modulator 13 and mixed together in mixer 17. The luminance/audio and chrominance data can then be multiplexed together via mixer 19 for transmission.



In the receiver 38A of FIG. 2, the luminance and chrominance values (as well as audio signals, if any) are demultiplexed and demodulated by unit 42, and the luminance data is decoded by decoding unit 48 to provide the differential image, which is then used in the updating unit 50 to update the image. Again, in a manner analogous to the process of FIG. 1, the luminance values can be averaged to yield a greater range of grey values, which are then input into display driver 58, together with the chrominance values, to provide a color video output.

With reference again to FIG. 1, the image processing module 20 will be described in more detail. In the first level of data compression, the grey-scale reducing unit 16 transforms the image by reducing the number of grey levels for each pixel. The resultant image, when viewed by the human eye from a distance, has an appearance which is strikingly similar to the original image; however, it requires fewer bits of information. In one preferred embodiment, the transformation entails reducing an image having 256 shades of grey into two shades, white and black. This results in an 8-fold reduction in the data required to represent the image, as each pixel is converted from an 8-bit grey-scale representation to a 1-bit representation.

To allow the compressed image data to appear as various shades of grey to the human eye, a dithering comparison is employed. In one preferred embodiment, the grey value of each pixel is compared to a threshold value which varies with its pixel position. For grey values greater than the threshold, i.e., lighter in shade, the pixel value becomes 1, representative of pure white. For grey values less than the threshold, the transformed pixel value becomes 0, or pure black.

Different pixel positions have different threshold values which are selected to provide a proportional combination of black-and-white pixels such that when an area or block of pixels is viewed from a distance, the image appears the desired shade of grey. The selected threshold values produce primarily white pixels for light shades of grey, and increasingly more black pixels per unit area for darker shades of grey. Various dithering methods known in the art can be employed in the present invention. See, for example, Ochi et al., "A New Halftone Reproduction and Transmission Method Using Standard Black & White Facsimile Code," Vol. COM-35, IEEE Transactions on Communications, pp. 466-470 (1987), herein incorporated by reference for further background materials on dithering methods.
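The thresholding just described can be sketched in a few lines. The fragment below is an illustrative reconstruction only: the 4 x 4 matrix shown is a classic Bayer ordered-dither pattern scaled to the 0-255 range, not the line-type values of the patent's FIG. 4, and the function name is invented for the example.

```python
import numpy as np

# Classic Bayer ordered-dither matrix scaled to 0-255 (a stand-in; the
# patent's FIG. 4 uses a line-type pattern chosen to favor run-length coding).
DITHER_4X4 = np.array([
    [ 0,  8,  2, 10],
    [12,  4, 14,  6],
    [ 3, 11,  1,  9],
    [15,  7, 13,  5],
]) * 16 + 8

def dither(frame):
    """Reduce an 8-bit grey-scale frame to a 1-bit halftone.

    Each pixel is compared against the threshold at its position within the
    tiled 4 x 4 matrix: above threshold -> 1 (white), otherwise 0 (black).
    Assumes frame dimensions divisible by 4, as with the 128 x 128 image.
    """
    h, w = frame.shape
    thresholds = np.tile(DITHER_4X4, (h // 4, w // 4))
    return (frame > thresholds).astype(np.uint8)
```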


In the second level of data compression, the compressed or dithered image is then compared in a motion detection unit 18 to the compressed image from the previous image frame, as stored in an image memory unit. The motion detection unit 18 detects which pixels have been changed between the two image frames and records the pixel locations. In one illustrated embodiment, pixel positions with no change have a value of 0, and those that have changed, either from white to black or black to white, have a value of 1. The new compressed image is then stored in the image memory unit for comparison with the next image frame.

In the third level of data compression, the differential image is next encoded in the differential image encoding unit 22 to further compress the data prior to transmission. In a preferred embodiment, run-length encoding is used, many versions of which are known in the art. In normal operation, the image will not change too much from frame to frame, leaving long series of 0 bits in the differential image. Run-length encoding represents various lengths (or runs) of consecutive 0's or 1's as code words. For example, in one embodiment, the shortest code words can be assigned to the most likely runs, effectively compressing the number of bits required to represent the data where there are long series of consecutive 0's or 1's. See, for example, Gharavi, "Conditional Run-length and Variable Length Coding of Digital Pictures," IEEE Transactions on Communications, pp. 671-677 (1987), incorporated herein by reference for further explanation of coding schemes.
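As a rough sketch of the idea (plain run extraction only; the Modified Huffman tables that map runs to short code words are omitted), the differential bit stream can be reduced to (value, run length) pairs:

```python
def run_lengths(bits):
    """Collapse a binary sequence into (value, run-length) pairs.

    A mostly unchanged frame produces a few very long runs of 0s, which a
    Modified Huffman table would then map to short code words.
    """
    runs = []
    if not bits:
        return runs
    current, count = bits[0], 1
    for b in bits[1:]:
        if b == current:
            count += 1
        else:
            runs.append((current, count))
            current, count = b, 1
    runs.append((current, count))
    return runs

# A differential row with little motion compresses to just three runs.
print(run_lengths([0] * 60 + [1] * 3 + [0] * 65))   # [(0, 60), (1, 3), (0, 65)]
```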



The encoded differential image data is then ready for transmission over a narrow bandwidth channel, particularly a 3 kHz telephone line. However, additional optional coding techniques, such as forward error correction, may be conducted prior to transmission, and these will be described below.

The receiving section 38, shown in FIG. 1 (or the similar receiver 38A shown in FIG. 2), generally reverses and decodes the three levels of data compression described in the transmitter, but in the opposite order. In the first level of data decompression, the encoded differential image data is decoded in decoding unit 48 using the reverse of the process used in the differential image encoding unit 22.
In the preferred embodiment, this decoding process would reverse the selected run-length coding scheme.

In the second level of data decompression, the previous image frame as stored in the image memory unit 52 is updated by the differential image data in the image updating unit. For pixel positions in which a change occurred, represented by 1 in the preferred embodiment, the pixel value of the corresponding pixel position is changed. This would switch a black pixel to white and vice versa. For pixel positions in which the differential image value is 0, the pixel value of the corresponding pixel position remains unchanged.



In the third level of data decompression, the updated compressed or dithered image is partially restored to its original image with multiple grey-scales in the image averaging unit. The value of each pixel position is calculated by averaging the pixel values of all positions within a prescribed region.

In addition to the three levels of data compression and decompression, the invention may include an error code generator 24 and error detector 46, as shown in dotted lines in FIG. 1. This adaptation may be desirable for use over noisy transmission lines. One commonly-used error correction technique is forward error correction (FEC), in which redundant bits of data are added to the data stream in a specified manner, such that the FEC decoder in the receiving section can check for errors due to noise. See, for example, S. Lin and D. Costello, Error Control Coding: Fundamentals and Applications (Prentice-Hall, Englewood Cliffs, NJ 1983), incorporated herein by reference, for a further description of FEC systems.

While the FEC method is a preferred technique, other error correction techniques can also be employed. For example, a joint modulation-coding scheme may be used to combine 24 and 26 into a single unit. At the receiver 38, a corresponding demodulation-decoding unit combining 42 and 46 would be used. Possible choices for this technique are tamed frequency modulation, continuous phase modulation, and trellis-coded modulation. Other choices are obvious to those familiar with the art.



These techniques provide noise reduction without increasing the signal bandwidth, but require more complexity.

Another optional element of the invention as shown in FIGS. 1 and 2 is the audio modulator 28 and mixer 30 for multiplexing an audio signal with the modulated image signal for simultaneous audio and video transmission. When an audio modulator and mixer are used, the receiving section 38 then separates the audio and video portions of the signal by an analogous demodulation and demultiplexing unit 42. While this combination is envisioned to be a highly desirable feature, particularly for video conferencing, it is not an essential element. For some applications, an audio portion may be superfluous or undesired.

FIG. 3 shows the image processing module 20 of FIGS. 1 and 2 (the grey-scale reducing unit 16, the frame memory unit 15 and the motion detection unit 18) in greater detail. In particular, FIG. 3 illustrates an embodiment which includes a hysteresis-dithered, thresholding process for converting a multiple grey-scale image into a halftone image consisting of black-and-white pixels. The halftone image in turn is compared to that from the previous image frame to provide a differential image, which is further processed by the image encoding unit in the third compression stage.

As shown in FIG. 3, the image processing module 20 includes an analog comparator 80, an ordered dither threshold table 82, a frame memory 15, inverter 84, summer 86, summer 88, digital-to-analog converter 90 and an exclusive OR gate 92.

Inputs to the image processing module 20 of FIG. 3 are luminance values which can be derived from any standard video camera that converts an image into an NTSC or similar format (such as PAL or SECAM). The analog signal representative of the image from the camera is then passed through a clamp and sample circuit which provides the reduced analog image, which is an analog signal representative of an image screen of 128 x 128 pixels at a rate of 10 frames per second. This can be accomplished by sampling the NTSC signal 128 times per line and one time out of every four lines. The sampled pixel values from the analog signal are real numbers representative of an 8-bit grey-scale consisting of 256 shades of grey from pure black (0) to pure white (255).
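For readers working from digitized frames rather than the analog clamp-and-sample circuit described here, the reduction can be approximated as below (a sketch only; it assumes a source of roughly 512 or more luminance lines):

```python
import numpy as np

def reduce_frame(source):
    """Approximate the clamp-and-sample stage on a digitized luminance frame.

    Keeps one line out of every four and resamples each kept line to
    128 points, yielding the 128 x 128 image described in the text.
    """
    lines, samples = source.shape
    rows = source[::4, :][:128]                      # one line in four
    cols = np.linspace(0, samples - 1, 128).astype(int)
    return rows[:, cols]                             # 128 samples per line
```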

While this is the preferred embodiment, it should be understood that the image size can be any N x K matrix of pixels, the frame rate may be varied, and any number of grey levels may be used. Such alternatives will be obvious to those familiar with the art; however, if the resolution is increased, the frame rate will generally have to be decreased to permit the image data to compress sufficiently to permit effective transmission of the real time image over a 3 kHz telephone line. Similarly, if the frame rate is increased, the image resolution will have to be decreased accordingly.

In an alternate embodiment, the camera, clamp and sample circuit may be replaced by a digital or analog storage device or other means for generating a digital or analog signal representative of an image.

The next stage of the process entails converting the sampled analog pixel values into 1-bit binary values representative of black or white. This is accomplished in comparator 80 by comparing the real value of each pixel with a threshold value, as stored in the ordered dither threshold table 82, as shown in FIG. 3. The table is a digital memory device representative of a shade of grey for each of the 128 x 128 pixel positions. Different pixel positions have different threshold levels to permit a grey area spanning a given group of neighboring pixel positions to be represented by combinations of black-and-white pixels to give the perception of the particular shade of grey when viewed from a distance. For example, for an 8-bit grey-scale spanning from pure black (0) to pure white (255), a medium dark grey of shade level 63 over a block of pixels would be converted into black-and-white pixels with about three times as many black pixels as white.

The output of the analog comparator 80 is stored in frame memory 15 and also used to "dither" the threshold values used to process the next frame. As shown in FIG. 3, a hysteresis-ordered dither threshold is implemented by inverter 84 and summers 86 and 88, which operate to define a hysteresis band around each threshold value, Txy ± δ, which serves to reduce flicker in the analog comparator 80.

A set of illustrative threshold values for the ordered dither threshold table are shown in FIG. 4. The 128 x 128 pixel image is broken down into 4 x 4 pixel blocks. There are 32 x 32 superblocks of these 4 x 4 blocks. The threshold values are selected to create a line-type dither pattern, which facilitates greater data compression in the preferred embodiment of the differential image encoding stage. As will be described below, a 1-dimensional, Modified Huffman run-length encoding scheme compresses data effectively where there is a long series of the same value.

For the example shown in FIG. 4, grey level 63 would result in alternating black-and-white pixels on the first row, all black pixels on the second row, alternating black-and-white pixels on the third row, and all black pixels on the fourth row. For a given 4 x 4 block, this results in 12 black pixels and four white pixels -- exactly the desired 3-to-1 ratio. In addition, the efficiency of the run-length encoding will be maximized by this dithered pattern as every other row consists of continuous black pixels.

As noted above, to prevent unnecessary flickering of certain pixels from black to white between image frames, the preferred embodiment includes a hysteresis adjustment of the dithered threshold values. As shown in FIGS. 5A and 5B, the hysteresis adjustment increases the threshold value for a given pixel position if the corresponding pixel position in the previous image frame is black, and decreases the threshold value if the corresponding pixel position in the previous frame is white.


To illustrate this process, we return to the example of grey level 63, as applied to the ordered dither threshold table of FIG. 4. Note that the second position in the second row of the matrix has a threshold value of 54, which is very close to level 63. Minor fluctuations in the grey level that may occur between sampled image frames could result in the grey-scale oscillating between 63 and 55, for example, every tenth of a second, which would result in the grey level toggling between black-and-white every image frame. This would result in an unnecessary increase in the amount of data that would be transmitted to the receiving section. To prevent such unwanted toggling, each dither threshold is adjusted by a predetermined amount to ensure any change in shade is sufficient to warrant toggling of the pixel value.
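A minimal sketch of this hysteresis rule, continuing the array conventions of the earlier fragment (the offset value is an assumed setting of the adjustable dither parameter, not a figure from the patent):

```python
DELTA = 8  # hysteresis half-band; an assumed, adjustable value

def dither_with_hysteresis(frame, thresholds, prev_halftone):
    """Threshold a frame while biasing each pixel toward its previous state.

    A pixel that was black (0) in the previous halftone frame has its
    threshold raised by DELTA (harder to turn white); one that was white (1)
    has its threshold lowered by DELTA (harder to turn black).
    """
    adjusted = thresholds + DELTA * (prev_halftone == 0) \
                          - DELTA * (prev_halftone == 1)
    return (frame > adjusted).astype('uint8')
```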

Once the dither threshold values are adjusted with respect to the previous image frame, the new image is compared to the threshold values to transform the multiple grey-scale image into a halftone image. As shown in FIG. 3, the adjusted dithered threshold value is converted from digital to analog in a D-to-A converter 90, and sent along with the reduced analog image to the analog comparator 80. Each reduced analog pixel value that is greater than the adjusted analog threshold value becomes a digital output of 1, or white; each reduced analog pixel value that is less than the adjusted analog threshold value becomes a digital output of 0, or black. The analog comparator 80 also converts the compared results into digital values for each pixel.


The digital output from the analog comparator 80, i.e., the first level compressed image, is simultaneously sent to the frame memory 15 and to the motion detection unit 18, which is shown in FIG. 3 as being an XOR gate 92 for 1-bit adding of the halftone pixel values generated by the analog comparator and the halftone pixel values of the previous halftone image frame as stored in the frame memory. For pixel values that did not change between frames, the output of the XOR gate is 0. For pixel values that changed between frames, the output of the XOR gate is 1. These values are then sent to the differential image encoding unit.
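In array terms, the comparison performed by gate 92 is a single exclusive-OR (a sketch, again assuming numpy-style binary frames):

```python
def detect_motion(halftone, frame_memory):
    """Return the differential image and refresh the frame memory.

    XOR yields 1 exactly where a pixel toggled between frames and 0 where
    it did not -- the 1-bit addition described for XOR gate 92.
    """
    diff = halftone ^ frame_memory    # differential image for the encoder
    frame_memory[:] = halftone        # store the new frame for the next comparison
    return diff
```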

As the digital output of the analog comparator 80 is sent to the XOR gate 92, this data is also sent to the frame memory 15 to replace the currently-stored pixel values. The new digital values representative of the halftone image replace the old values in the frame memory 15 representative of the previous halftone image frame. This updates the frame memory 15 for processing of the next frame in time.

The final data compression stage is the differential image encoding scheme, as shown in FIG. 1. The preferred embodiment is a 1-dimensional Modified Huffman run-length code. This encoding scheme transforms long series of 0 or 1 bits into shorter codes. Integrated circuits for implementation of such encoding techniques are commercially available, e.g., the AM7971 compression-expansion processor chip (Advanced Micro Devices, Inc., Sunnyvale, California). Alternative embodiments may be substituted for the 1-dimensional Modified Huffman code, such as the 2-dimensional Modified Huffman code, and other variations.

The encoded differential image values may either be sent directly over a transmission line or multiplexed and modulated with the audio portion for simultaneous transmission over the same bandwidth. FIG. 6 illustrates this process.

The video signal from the encoder is processed by image modulating module 26 comprising a delay modulator 76 and mixer 78. The incoming video signal, essentially a binary bit stream, is converted into a rectangular waveform of two levels according to the following rules: a transition from one level to the other level is placed at the midpoint of the bit cell when the binary data contains a one. No transition is used for a zero unless it is followed by another zero, in which case the transition is placed at the end of the bit cell for the first zero. The resulting waveform is then low pass filtered to remove higher harmonics and yields an analog signal which lies within the 0 - 3 kHz range. For further details on delay modulation, see Hecht et al., "Delay Modulation," Vol. 57, Proc. IEEE (Letters), pp. 1314-1316 (July, 1969). The processed video portion is then modulated by a cosine function to permit coherent (in-phase and quadrature) modulation and added to the audio portion, which is similarly modulated with a sine function.
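The two transition rules can be sketched directly (an illustration of the waveform construction only; the low-pass filter and the cosine carrier stage are omitted, and the samples-per-cell figure is an arbitrary choice):

```python
def delay_modulate(bits, samples_per_cell=8):
    """Build the two-level delay-modulation waveform for a bit stream.

    A '1' produces a level transition at the midpoint of its bit cell; a '0'
    produces no transition unless the following bit is also '0', in which
    case the level toggles at the end of the first zero's cell.
    """
    level, half = 1, samples_per_cell // 2
    wave = []
    for i, b in enumerate(bits):
        if b == 1:
            wave += [level] * half
            level = -level                          # mid-cell transition
            wave += [level] * (samples_per_cell - half)
        else:
            wave += [level] * samples_per_cell
            if i + 1 < len(bits) and bits[i + 1] == 0:
                level = -level                      # transition between zeros
    return wave
```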


On the receiving end, a frequency and phase recovery unit detects and tracks the phase at which the signal arrives, and demodulators 43, 45 separate the sine and cosine components of the signal, providing an audio and a video signal. The video signal is then further processed by a delay demodulator 79 to recover the original binary bit stream.

After demodulation, the process of decoding the image in the receiver is essentially the reverse of that described above, with the exception of grey-scale recovery. Instead, a pseudo grey-scale is achieved by averaging individual pixel values with their neighbors.

Alternatively, demultiplexing and demodulation unit 42 can also include filter elements 83 and 85 and a video/audio recovery module 87 (shown in dotted lines in FIG. 6) to suppress cross talk. The output of low pass filter 85 is an audio signal r1 and the output of high pass filter 83 is a video signal r2.


The signals r1 and r2 contain the transmitted audio and video signals plus additional cross-talk terms. They have the form

r1 = Saudio + T1[Svideo]

r2 = Svideo + T2[Saudio]

The functions T1[Svideo] and T2[Saudio] are the cross-talk terms. If they were absent, r1 would be the audio signal and r2 would be the video signal. T1[] and T2[] are transformations defined by the processing steps carried out between the transmitter and the receiver. Thus, the transform functions encompass the filtering as well as the multiplication operations (e.g., multiplication by the carrier and its quadrature) that occur during modulation and multiplexing.

Given r1 and r2, the above equations can be solved for the audio and video signals. A practical method is to use recursion. Rewriting the equations in recursive form yields:

Ŝaudio = r1 - T1[Ŝvideo]

Ŝvideo = r2 - T2[Ŝaudio]

where ^ indicates an approximation.

A three-step recursive process can be used to recover the audio and video signals, as shown in FIG. 7. In the first step, an initial estimate of Ŝaudio(t) is produced by applying the transform function T1 to an initialization value of Svideo(t) in element 93. This initial value can be obtained from a previous image frame (with an appropriate delay) or, during start-up, from an initialization signal. The transform T1[Svideo(t)] is then subtracted from r1 in summer 94 to obtain the initial estimate of Ŝaudio(t).

In the second step, an estimate of Ŝvideo(t+Δ) is produced by applying the transform function T2 to the estimate of Ŝaudio(t) (obtained in step one) and then subtracting this transformed signal from r2(t+Δ) in summer 96. The delayed signal r2(t+Δ) is obtained by passing the received video signal r2 through delay element 97. The factor Δ compensates for time delays inherent in the transformations T1 and T2. (Although these delays may be different, for purposes of illustration they are assumed to be the same.) The output of summer 96 is an estimate of the video signal, Ŝvideo(t+Δ).

In the third step, Ŝaudio(t+2Δ) is produced by applying a delayed transform of T1 to the estimate of Ŝvideo(t+Δ) in element 98 and then subtracting this transformed signal from r1(t+2Δ) in summer 99 to yield a refined audio estimate Ŝaudio(t+2Δ). (Further recursions can be implemented if desired to obtain more refined estimates of the audio and/or video signals.)

The results of steps two and three are the outputs of the recovery system. The time delay, Δ, associated with T1 and T2 is less than one millisecond, a delay which is normally imperceptible to the users.
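Collapsing the delay bookkeeping, the three steps amount to the short recursion below (a sketch: T1 and T2 stand for whatever transmit-path transforms apply, passed in here as plain functions, and the initialization value follows the conventions described above):

```python
def recover(r1, r2, T1, T2, s_video_init):
    """Three-step recursive separation of the audio and video signals.

    r1, r2       -- received audio-path and video-path signals
    T1, T2       -- cross-talk transforms (callables), assumed known
    s_video_init -- initialization for the video signal (a delayed previous
                    frame, or a start-up reference)
    """
    s_audio = r1 - T1(s_video_init)   # step one: initial audio estimate
    s_video = r2 - T2(s_audio)        # step two: video estimate
    s_audio = r1 - T1(s_video)        # step three: refined audio estimate
    return s_audio, s_video
```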

In FIG. 8, an alternative modulation apparatus is shown, including data encoder 22, error correction encoder 24, delay modulator 31, an audio high pass filter, and mixer 35 in the transmitter section, and filter elements 36, 37 and a delay demodulator in the receiver section. This simplified system is based on the observation that the necessary video data rate for normal use of a video telephone (e.g., without handwaving or gross head movements) is about 2,400 bits per second (b/s). The delay modulator for a 2,400 b/s input stream can produce an analog signal in a band ranging from 0 to about 1100 Hz.

By filtering the audio signal to the portion of the bandwidth above 1100 Hertz, the audio and video signals can be frequency division multiplexed (FDM). That is, the video signal lies in the 0 to 1100 Hz band. The audio signal lies in the 1100 to 3000 Hz (or higher) band. The loss in audio signal-to-noise ratio would be about 25 percent, which is tolerable over most telephone channels.

The video signal in the 0 to 1100 Hz band can also be moved to another part of the band by modulation. Such a relocation of the video signal may be desirable to reduce its effect on the voice quality (insofar as much of the energy in normal voice signals lies below 2000 Hz). For example, by modulating with a 1000 Hz carrier, the video can be moved to the 1300 to 2100 Hz band. A carrier recovery system, similar to that discussed previously, can then be used to synchronize the transmitter and receiver for demodulation.

For the case of color transmissions, the frequency bandwidth can be further divided to provide a first band for chrominance information, a second band for luminance information and a third band for audio information. For example, color information can be transmitted over a narrow band of nominal frequency from 0 to 500 Hz. The selection of particular frequency ranges for such bands is within the ordinary abilities of those skilled in the art.

Various other modulation techniques can also be practiced in accordance with the invention. For example, all of the signals (or a subset, such as the chrominance and luminance information) can be multiplexed over time, rather than frequency. Thus, one can time sample the L, Xred and Xblue signals and send the three images on a rotating basis. Alternatively, one can time sample the color I and Q signals and transmit them using time-domain multiplexing over the audio channel.


The advantage of this scheme is simplicity. FIG. 4 shows a block diagram. The disadvantage is its loss of audio signal-to-noise ratio and its limitation in tracking motion in the video image. There may be applications where these disadvantages are unimportant.

FIGS. 9A-9D illustrate an image averaging process useful in the image averaging unit 56 of receiver 38 shown in FIG. 1. In the illustrated embodiment, 5 x 6 blocks of pixel values are averaged, with the averaged value being applied to the pixel situated in the upper left hand corner of the block. This provides 30 shades of grey. In FIG. 9A, an initial pixel value is averaged; in FIG. 9B, the pixel in the next column is averaged using a 5 x 6 matrix of pixel values, which is displaced one column to the right. In FIG. 9C, a pixel in the next row relative to the pixel illustrated in FIG. 9A is shown. This pixel is averaged using a 5 x 6 matrix of pixel values which is displaced one row downward relative to the matrix of FIG. 9A. Similarly, in FIG. 9D, the averaging process is illustrated for a pixel one row below and one column to the right of the original pixel shown in FIG. 9A.
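A sketch of this sliding-block average (assuming numpy input and clipping the block at the image edges, which the text does not specify):

```python
import numpy as np

def block_average(halftone, block=(5, 6)):
    """Pseudo grey-scale recovery from a binary frame.

    Each output pixel is the mean of the 5 x 6 block whose upper-left
    corner sits at that pixel, giving up to 30 grey shades.
    """
    h, w = halftone.shape
    bh, bw = block
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = halftone[i:min(i + bh, h), j:min(j + bw, w)].mean()
    return out
```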


Various other picture "enhancement" techniques can also be employed in the image averaging unit 56 to reduce the "blockiness" of the picture. For example, spatial filtering techniques can be used to average pixel values across a line or from one line to the next. Moreover, as discussed in more detail below, in some cases it is also possible to average pixel values over time (i.e., from one frame to another) to further enhance image quality.

Additionally, interpolation techniques can be used to "fill in" additional data values (e.g., intermediate values between pixels or between lines). With reference again to FIG. 1, the pixel averaging unit 56 in receiver 38 can further include means for interpolating pixel values to improve the resolution of the reconstructed image. The effect of such interpolation is to smooth out the discontinuities in the reconstructed image, provide a subjectively more pleasing image and allow the use of a larger display at the receiving end. These interpolation functions can take place entirely at the receiving end. No modification is required at the transmitter.

In the illustrated embodiments, the original scene is described by N samples per line and M lines per frame, corresponding to MxN picture elements or pixels per frame. For instance, possible choices for M and N are N = 90 samples per line and M = 128 lines per frame, for a total of 90 x 128 = 11,520 pixels per frame. For each frame, the receiver calculates the luminance levels at each of the MxN pixels.

The pixel averaging unit 56 can further include a resolution multiplier which introduces additional interpolation points or pixels in the reconstructed signal, specifically, between any two consecutive pixels in a same row or in a same column. When one interpolation point is added between any two original pixels, the total number of pixels per frame is multiplied by 4.

For the purpose of illustration, assume in the description that only one pixel is added between any such horizontal or vertical pair, and consider arbitrary rows Ri, Ri+1 and Ri+2, and columns Cj, Cj+1 and Cj+2 in the reconstructed picture. Let us call Pij the pixel at the intersection of row Ri and column Cj. In one embodiment, the resolution multiplier can proceed as follows:

In step one, interpolated columns are generated. On row Ri, a new pixel, Pi,j+1/2, is added halfway between pixels Pi,j and Pi,j+1. Its luminance, bi,j+1/2, is equal to:

a bij + (1-a) bi,j+1,     if bi,j+2 > 2 bi,j+1 - bij
1/2 bij + 1/2 bi,j+1,     if bi,j+2 = 2 bi,j+1 - bij
(1-a) bij + a bi,j+1,     if bi,j+2 < 2 bi,j+1 - bij

where a is a selectable parameter which can range typically from 0 to about 1/2. One suitable value of a is 1/4.

This procedure can next be repeated for all values of j (j = 1, ..., M) within row Ri, and for all values of i (i = 1, ..., N). This results in the creation of new columns, Cj+1/2, located between Cj and Cj+1 over the whole display, thereby doubling the number of columns.

In step two, a similar process can be employed to interpolate rows. On column Cj, a new pixel, Pi+1/2,j, is added halfway between pixels Pi,j and Pi+1,j. Its luminance, bi+1/2,j, is equal to:

a bij + (1-a) bi+1,j,     if bi+2,j > 2 bi+1,j - bij
1/2 bij + 1/2 bi+1,j,     if bi+2,j = 2 bi+1,j - bij
(1-a) bij + a bi+1,j,     if bi+2,j < 2 bi+1,j - bij

Again, a is a selectable parameter (e.g., a = 1/4). When step two is repeated for all possible values of i and j, it results in an overall doubling of the number of rows of each frame.

After the enhancement has been completed, the number of pixel values per frame is 4MN, consisting of MN original pixel values and 3MN new interpolated pixel values. This enhancement process can be repeated any number of times. It will result each time in a quadrupling of the number of pixel values within each frame.
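The column rule of step one can be sketched for a single row as follows (the row rule of step two is symmetric); the edge handling is an assumption, since the text does not say what happens where no look-ahead pixel exists:

```python
def interpolate_row(b, a=0.25):
    """Insert one point between each horizontal pair using the three-case rule.

    For luminances b[j], b[j+1], b[j+2], the new midpoint leans toward
    b[j+1], is the plain average, or leans toward b[j], according to
    whether b[j+2] is greater than, equal to, or less than 2*b[j+1] - b[j].
    """
    out = []
    for j in range(len(b) - 1):
        out.append(b[j])
        if j + 2 < len(b) and b[j + 2] > 2 * b[j + 1] - b[j]:
            mid = a * b[j] + (1 - a) * b[j + 1]
        elif j + 2 < len(b) and b[j + 2] < 2 * b[j + 1] - b[j]:
            mid = (1 - a) * b[j] + a * b[j + 1]
        else:
            mid = 0.5 * (b[j] + b[j + 1])   # equality, or no look-ahead at the edge
        out.append(mid)
    out.append(b[-1])
    return out
```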


In FIG. 10, a system 100 is shown that provides for adaptive resolution in image processing. Adaptive resolution provides a means to enhance the resolution of the received picture depending upon the degree of motion existing in the original scene. For example, when there is animated motion in the scene (e.g., rapid head movement during a videophone conversation), the basic resolution techniques described above can be applied. However, when there is slow motion in the original scene (i.e., the face is not moving very much), a different protocol is employed. Finally, when there is no motion (i.e., either there is no motion at all, or the amount of motion is very small), yet another motion detection approach is taken.

As shown in FIG. 10, system 100 includes a transmitter section 112 having a grey-scale reduction unit 102, a multi-frame buffer 104, a motion estimation unit 106, a threshold look-up table 108, differential image buffer 114, reference frame buffer 116, image data encoders (e.g., run-length and Huffman coding elements) 118, and channel buffer 120, as well as control circuit 110. The receiver 138 includes image data decoders 140, differential buffer 142, reference frame buffer 144, a multi-frame buffer 146, and a grey-scale computer 150.


In FIG. 10, the image data is compressed by the grey scale reduction unit 102 to yield binary luminance values. Unit 102 applies threshold matrices to the incoming data (using the look-up tables stored in element 108) in a manner analogous to that described above in FIGS. 1-5. However, in this embodiment, a multiframe buffer 104 is used to store a series of binary frame values. These values are then compared by motion estimator 106 to determine the motion state (e.g., fast, intermediate or slow). Depending on the motion state, different threshold values are selected from element 108.

Differential buffer 114 contains the changes between the last received frame in buffer 104 and the reference frame from buffer 116. The contents of the reference frame buffer 116 are updated at different times depending on the motion state, as described in more detail below. In the illustrated embodiment, the contents of the reference buffer will be the last frame when fast motion is occurring, or will be an average of the four most recent frames for intermediate motion, or will be an average of sixteen frames during the slow motion operating mode.

In the system of FIG. 10, the motion estimator 106 estimates the average amount of motion existing in the original scene between two instants in time. Motion estimation is an ongoing process at the transmitter, and every frame, a new motion estimate is generated. This estimate is used to either keep the resolution level unchanged or switch to a different resolution level.


For example, if L is a number representing the maximum level of motion allowed on the transmission channel, then a fast motion state can be defined as existing when the motion estimate is between the maximum motion level L and L/4. An intermediate motion state can be defined to exist when the motion estimate is between L/4 and L/16. A third state --slow motion-- can be defined to exist when the motion estimate is less than L/16.

In one preferred embodiment, a change in the motion level in the scene can be signalled by the transmitter 112 to the receiver 138 by imbedding into the transmitted video bit stream a "resolution sync word" consisting of two additional bits of information per frame. In this way, it is possible for the receiver 138 to decode the resolution sync word and know the resolution level to be used in the reconstruction of images. A different reconstruction procedure is then used in grey level computer 150 for each of the different resolution levels.

In the illustration of FIG. 10, motion estimation is based on the differential information D(n), D(n-1), D(n-2), D(n-3), which represents the changes which have occurred over the four most recent frames. Specifically, the differential information at frame F(n) is equal to the difference between binary (i.e., black-and-white) frame F(n) and the previous binary frame F(n-1):

D(n) = F(n) - F(n-1).

F(n) is a binary matrix of 0's and 1's. Let sum[M] be the sum of all the elements of the matrix M.


Then, the motion estimate at time n, ME(n), can be defined as:

ME(n) = sum[D(n)] + sum[D(n-1)] + sum[D(n-2)] + sum[D(n-3)]

      = sum[F(n)] - sum[F(n-4)]

and the motion estimate at time n+1 is

ME(n+1) = sum[D(n+1)] + sum[D(n)] + sum[D(n-1)] + sum[D(n-2)]

        = sum[F(n+1)] - sum[F(n-3)].

The motion estimate represents the total number of bit changes that have occurred over the past four frames. This provides a reading of the motion level at the end of each frame. The four-frame averaging process and the readout of the motion estimate are synchronized to the frame sync.
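A sketch of the running estimate (assuming each D(n) arrives as a numpy 0/1 matrix; the class and function names are invented for the example):

```python
from collections import deque

class MotionEstimator:
    """Tracks ME(n): the total bit changes over the four most recent frames."""

    def __init__(self):
        self.window = deque(maxlen=4)   # holds sum[D(n-3)] ... sum[D(n)]

    def update(self, diff_frame):
        """Feed D(n) and return ME(n)."""
        self.window.append(int(diff_frame.sum()))
        return sum(self.window)

def motion_state(me, L):
    """Map a motion estimate to a resolution state, per the thresholds below."""
    if me >= L / 4:
        return "coarse"        # fast motion
    if me >= L / 16:
        return "intermediate"
    return "fine"              # slow motion
```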

- If ME(n) is between L and L/4, a coarse resolution level is used (e.g., same as described above in connection with FIGS. 3 and 4).

- If ME(n) is between L/4 and L/16, an intermediate resolution level is used.

- If ME(n) is less than L/16, a fine resolution level is used.

It should be clear that other threshold choices can be made in distinguishing motion states.

In the embodiment of FIGS. 3 and 4, there was only one way to calculate the grey levels. This was done by averaging over 4x4 blocks the bit values within a binary frame F(n), i.e., the values at positions i, i+1, i+2, i+3 on lines j, j+1, j+2, and j+3.

In other words, for frame F(n), the grey level, Gi,j(n), of pixel (i,j) was defined as:

Gi,j(n) = Σ (k=0..3) Σ (l=0..3) Fi+k,j+l(n)

However, in the embodiment of FIG. 10, the grey level can be calculated as:

Gi,j(n) = Σ (m=0..q-1) Σ (k=0..p-1) Σ (l=0..p-1) Fi+k,j+l(n-m)

The notation (pxp, q) is used to represent this class of grey level estimates. The notation underlines the fact that the estimate is the spatial sum of the binary values over a block of size pxp and the time sum over q frames. With this notation, a three-level grey estimation scheme is illustrated in FIGS. 11A, 11B and 11C.
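The (pxp, q) estimate translates directly into code (a sketch assuming numpy binary frames and the index convention of the formulas above):

```python
import numpy as np

def grey_level(frames, i, j, p, q):
    """(p x p, q) grey level of pixel (i, j).

    frames[0] is F(n), frames[1] is F(n-1), and so on; the binary values in
    a p x p block are summed over the q most recent frames. The (4x4, 1),
    (2x2, 4) and (1x1, 16) settings all give the same 16-level grey range.
    """
    return sum(int(frames[m][i:i + p, j:j + p].sum()) for m in range(q))

# Fine resolution (1x1, 16): pure time averaging of a single pixel.
frames = [np.random.randint(0, 2, (128, 128)) for _ in range(16)]
print(grey_level(frames, 10, 20, p=1, q=16))   # a value from 0 to 16
```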

Specifically, FIG. 11A illustrates the coarse resolution level, in which spatial averaging over a 4x4 pixel block from a single frame is used to derive a binary value in the grey-scale reduction unit. FIG. 11B illustrates the intermediate resolution level, where the grey level of a pixel is derived from both spatial averaging over a 2x2 block and time averaging over 4 successive frames. FIG. 11C illustrates the fine resolution level, where the averaging is over a 1x1 block, or 1 pixel, and over 16 successive frames. There is no spatial averaging, but only time averaging.

As the amount of motion decreases, the spatial averaging is decreased and more time averaging is introduced. The grey level resolution (e.g., thresholds) can be left unchanged at 16 levels or 4 bits of grey.

When n (less than or equal to 16) successive frames are averaged, n different threshold matrices are used. In total, the procedure uses one 4x4 threshold matrix, M, at the coarse resolution level, four 2x2 threshold matrices, M1 to M4, at the intermediate resolution, and 16 threshold levels at the fine resolution. The threshold matrices M, M1, M2, M3, M4 are given in FIGS. 11A and 11B. The 16 scalar thresholds are the values 16, 32, 48, ..., up to 256, i.e., values that are multiples of 16, as illustrated schematically in FIG. 11C. These matrices are for illustration purposes and other matrices can perform equally well.

With reference again to FIG. 10, a multi-frame buffer 104 is used in the transmitter 112 to calculate the motion level in the scene. A four-frame buffer is sufficient to calculate the estimate once every frame. Reference buffer 116 is used to calculate the motion estimate and generate the differential information.


While in the coarse resolution mode, or when switching to the coarse resolution mode, the previously-received frame can be used as the reference frame for calculating the differential information. The same convention is used at the receiver. The decision to switch to a different resolution level (such as intermediate or fine) can occur at the end of any frame.

Assuming that the decision to switch to intermediate resolution occurs immediately at the end of frame F(n), then the differential information D(n+1), D(n+2), D(n+3), D(n+4) is calculated using F(n) as the reference frame. This convention is also followed by the receiver. In other words:

D(n+1) = F(n+1) - F(n)
D(n+2) = F(n+2) - F(n)
D(n+3) = F(n+3) - F(n)
D(n+4) = F(n+4) - F(n)

In addition, the transmitter does not switch to another resolution (coarse or fine) until all four differential frames have been transmitted. At that point, whether a resolution switch occurs or not, the last transmitted frame becomes the new reference frame.


If one assumes instead that a decision to switch to fine resolution occurred at the end of frame F(n), then this frame is used as the reference for the next 16 frames:

D(n+1) = F(n+1) - F(n)
D(n+2) = F(n+2) - F(n)
...
D(n+16) = F(n+16) - F(n).

Again, the resolution does not switch during this period, until the new pictures have been formed.
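Both conventions reduce to holding one reference frame for a fixed span: four frames at intermediate resolution, sixteen at fine. A minimal sketch, assuming a dict-like frame store indexed by frame number:

```python
def differentials_against_reference(frames, n, span):
    """Compute D(n+1) .. D(n+span) against the fixed reference F(n),
    as in the intermediate (span=4) and fine (span=16) conventions
    described above.  `frames` maps frame index -> binary array."""
    ref = frames[n].astype(int)
    return [frames[n + k].astype(int) - ref for k in range(1, span + 1)]
```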

With reference again to FIG. 10, it should be noted that controller 110 can be used to desensitize the updating mechanism of the differential buffer 114 based upon conditions in the channel buffer 120. When the channel buffer 120 exceeds a limit (defined by the transmission bandwidth), controller 110 can increase hysteresis by incrementing the dither parameter, thereby making it more difficult to toggle a particular pixel and, hence, reducing the number of pixel changes recorded in the differential buffer 114. This same mechanism also provides flicker control.
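A minimal sketch of one such per-pixel adjustment; the direction-of-shift rule and the parameter name `delta` are assumptions, since the text only states that the threshold is modified to make toggling harder:

```python
def binarize_with_hysteresis(lum, threshold, prev_bit, delta):
    """Threshold one pixel with hysteresis: the effective threshold is
    shifted by `delta` in the direction that favours keeping the
    previous binary value, so the pixel is harder to toggle."""
    effective = threshold + delta if prev_bit == 0 else threshold - delta
    return 1 if lum >= effective else 0
```

Incrementing `delta` when the channel buffer fills would then widen the dead band around each threshold, which is one way the controller could reduce recorded pixel changes.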

At the receiver 138, a multi-frame buffer 146 can be used to store data values over a series of frames so that the grey-level computer 150 can calculate the grey levels, e.g., by spatial averaging in the fast motion mode, by space and time averaging over four frames in the intermediate motion mode, or by time averaging over 16 frames in the slow motion mode.
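In terms of the (pxp, q) notation above, the three receiver modes correspond to the following parameter pairs; the mode labels are illustrative names, and `grey_level()` refers to the earlier sketch:

```python
# Illustrative mode -> (p, q) table for the receiver's grey-level computer.
MODE_PARAMS = {
    "fast": (4, 1),          # spatial averaging over a 4x4 block, one frame
    "intermediate": (2, 4),  # 2x2 block over four successive frames
    "slow": (1, 16),         # single pixel over sixteen successive frames
}

def receiver_grey_level(frames, i, j, mode):
    p, q = MODE_PARAMS[mode]
    return grey_level(frames, i, j, p, q)  # reuses the earlier sketch
```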

It should be appreciated that various alternative averaging techniques can be substituted for this method, including, for example, approaches in which the pixel to be averaged is centered in the matrix, as well as methods in which weighted values are applied to various pixel values within the matrix.
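For instance, a centred, weighted variant might look like the following sketch; the window size, the weight values, and the lack of boundary handling are all assumptions:

```python
import numpy as np

def weighted_grey_level(frame, i, j, weights):
    """Alternative estimate with pixel (i, j) centred in the window and
    a weight applied to each position -- one of the substitutions the
    text suggests.  Interior pixels only; no boundary handling."""
    r = weights.shape[0] // 2
    block = frame[i - r:i + r + 1, j - r:j + r + 1]
    return float((block * weights).sum())

# e.g. a 3x3 window with the centre pixel weighted most heavily:
CENTRE_WEIGHTED = np.array([[1, 2, 1],
                            [2, 4, 2],
                            [1, 2, 1]]) / 16
```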

What is claimed is:

Claims (23)

1. In a signal processing apparatus for image data compression, the combination comprising:
storage means for storing a reduced grey-scale image derived from image values;
comparison means for comparing a reduced grey-scale image of a current image frame with a reference image from said storage means, and for generating a luminance difference signal representative of the pixel positions at which the grey-scale value has changed between a previous image frame and a current image frame; and encoding means for encoding said difference signal.
2. The system of claim 1 wherein the system further comprises a grey-scale reduction means for reducing the number of grey levels available to represent each pixel of an image frame.
3. The system of claim 2 in which the grey-scale reduction means further comprises a dithered threshold means for converting a multiple grey-scale image into a halftone image.
4. The system of claim 3 in which the dithered threshold means further comprises a hysteresis adjustment in which the threshold value applied to each pixel is modified in order to reduce toggling of the pixel value.
5. The system of claim 4 in which the system further comprises means for varying the hysteresis adjustment.
6. The system of claim 1 in which the encoding means further comprises a run-length encoding means for representing series of repeating differential image data bits in a coded fashion, such that long series of said bits are represented by fewer bits.
7. The system of claim 1 wherein the system further comprises a modulation means for modulating a carrier signal with the encoded luminance difference signal for transmission.
8. The system of claim 7 wherein the modulation means further comprises means for multiplexing an audio signal with said luminance difference signal.
9. The system of claim 7 wherein the modulation means further comprises means for multiplexing a chrominance signal with said luminance difference signal.
10. The system of claim 1 wherein the system further includes an adaptive resolution means for determining the degree of motion in successive image frames and for modifying the resolution in response to such determination.
11. In a signal processing apparatus for decoding and reconstructing an image from compressed image data, the combination comprising:
a differential image decoding unit for decoding a difference signal representative of changes in a reduced grey-scale image; and an image updating unit for updating changes to a previously stored image by adding said decoded difference signal to said previously stored image.
12. The system of claim 11 which further comprises an image averaging unit for averaging blocks of pixel values to increase the number of grey levels of said updated image.
13. The system of claim 11 which further comprises an image interpolating unit for generating a more detailed image by interpolation.
14. A method of signal processing for image data compression, the method comprising:
storing a reduced grey-scale image derived from image values;
comparing a reduced grey-scale image of a current image frame with a previously stored reference grey-scale image;
generating a luminance difference signal representative of the pixel positions at which the grey-scale has changed between a previous image frame and a current image frame; and encoding said difference signal.
15. The method of claim 14 wherein the method further comprises reducing the number of grey levels available to represent each pixel of an image frame prior to storage and comparison.
16. The method of claim 15 in which the step of reducing the number of grey levels further comprises converting a multiple grey-scale image into a halftone image.
17. The method of claim 15 in which the step of reducing the number of grey levels further comprises comparing a dithered threshold value to the luminance value of each pixel, and assigning the pixel a reduced grey-scale value based upon the comparison.
18. The method of claim 17 in which the step of comparing a dithered threshold value further comprises applying a hysteresis adjustment to said threshold value in order to reduce toggling of the pixel value.
19. The method of claim 18 in which the method further comprises varying the hysteresis adjustment to desensitize the comparison step.
20. The method of claim 14 in which the step of encoding the luminance difference signal further comprises run-length encoding said signal such that commonly repeated series of image data bits are assigned shorter code words.
21. The method of claim 14 wherein the method further comprises modulating a carrier signal with the encoded luminance difference signal for transmission.
22. The method of claim 21 wherein the modulation step further comprises multiplexing an audio signal with said luminance difference signal.
23. The method of claim 14 wherein the method further comprises measuring the degree of change in the encoded luminance difference signal and performing different comparisons based upon the degree of change.
CA002048623A 1989-12-28 1990-12-26 Video telephone systems Abandoned CA2048623A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US45828289A 1989-12-28 1989-12-28
US458,282 1989-12-28

Publications (1)

Publication Number Publication Date
CA2048623A1 true CA2048623A1 (en) 1991-06-29

Family

ID=23820144

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002048623A Abandoned CA2048623A1 (en) 1989-12-28 1990-12-26 Video telephone systems

Country Status (6)

Country Link
EP (1) EP0462261A1 (en)
JP (1) JPH04505537A (en)
KR (1) KR920702157A (en)
AU (1) AU650256B2 (en)
CA (1) CA2048623A1 (en)
WO (2) WO1991010328A1 (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6124882A (en) * 1992-02-19 2000-09-26 8×8, Inc. Videocommunicating apparatus and method therefor
US6121998A (en) * 1992-02-19 2000-09-19 8×8, Inc. Apparatus and method for videocommunicating having programmable architecture permitting data revisions
FI103628B1 (en) 1997-09-02 1999-07-30 Nokia Mobile Phones Ltd Video Data Transmission
JP2001028763A (en) * 1999-04-29 2001-01-30 Mitsubishi Electric Inf Technol Center America Inc Method for transferring picture data to video display unit from controller, video display unit adapter, video display system and method for double-buffering picture data for displaying it on television display screen
EP1387340A1 (en) * 2002-07-30 2004-02-04 Deutsche Thomson-Brandt Gmbh Method and device for processing video data for a display
EP1387343B1 (en) * 2002-07-30 2009-03-25 Thomson Licensing Method and device for processing video data for display on a display device
US20110150194A1 (en) 2009-12-23 2011-06-23 Ramprakash Narayanaswamy Web-Enabled Conferencing and Meeting Implementations with Flexible User Calling Features
US20110149811A1 (en) 2009-12-23 2011-06-23 Ramprakash Narayanaswamy Web-Enabled Conferencing and Meeting Implementations with Flexible User Calling Features
US8914734B2 (en) 2009-12-23 2014-12-16 8X8, Inc. Web-enabled conferencing and meeting implementations with a subscription-based model
US20110149809A1 (en) 2009-12-23 2011-06-23 Ramprakash Narayanaswamy Web-Enabled Conferencing and Meeting Implementations with Flexible User Calling and Content Sharing Features
CN101917622B (en) * 2010-08-24 2012-09-05 中国科学院光电技术研究所 14-bit width image compression hardware coder
JP2013005204A (en) * 2011-06-16 2013-01-07 Sony Corp Video transmitting apparatus, video receiving apparatus, and video transmitting method
US8817801B1 (en) 2011-07-08 2014-08-26 8X8, Inc. Conferencing and meeting implementations with advanced features
HUE042692T2 (en) * 2011-12-13 2019-07-29 Jvc Kenwood Corp Video decoding device, video decoding method, and video decoding program

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3937878A (en) * 1975-01-21 1976-02-10 Bell Telephone Laboratories, Incorporated Animated dithered display systems
FR2529044B1 (en) * 1982-06-18 1986-06-20 Inst Nat Rech Inf Automat VISUAL TELECOMMUNICATIONS METHODS AND DEVICES, ESPECIALLY FOR THE USE OF THE DEAF
FR2549329B1 (en) * 1983-07-13 1987-01-16 Thomson Csf METHOD AND DEVICE FOR DETECTING MOVING POINTS IN A TELEVISION IMAGE FOR DIGITAL TELEVISION SYSTEMS OF CONDITIONAL COOLING THROUGHPUT
FI70662C (en) * 1984-12-14 1986-09-24 Valtion Teknillinen VIDEOKOMPRIMERINGSFOERFARANDE
JP2794281B2 (en) * 1986-07-10 1998-09-03 株式会社日立製作所 Code decoding processor for dither signal
US4967272A (en) * 1988-01-27 1990-10-30 Communications Satellite Corporation Bandwidth reduction and multiplexing of multiple component TV signals
FR2629595B1 (en) * 1988-03-31 1991-06-21 Oreal METHOD AND DEVICE FOR MEASURING THE ELASTICITY OF A SURFACE LAYER, PARTICULARLY OF THE SKIN

Also Published As

Publication number Publication date
JPH04505537A (en) 1992-09-24
AU7166491A (en) 1991-07-24
KR920702157A (en) 1992-08-12
EP0462261A1 (en) 1991-12-27
WO1991010328A1 (en) 1991-07-11
AU650256B2 (en) 1994-06-16
WO1991010324A1 (en) 1991-07-11

Similar Documents

Publication Publication Date Title
US5543939A (en) Video telephone systems
US5055927A (en) Dual channel video signal transmission system
US5430487A (en) Method and apparatus for improving temporal video signal processing using motion vectors transmitted with the video signal
JPS6335093A (en) High efficiency encoder
AU650256B2 (en) Video telephone systems
JPH06189281A (en) Video signal encoding device using compression of adaptive frame/field format
JPH07203321A (en) Processing equipment for ntsc tv signal wherein digital signal is contained in quadrature image carrier wave
JPH01265683A (en) Television system with improved visibility
JP2000165866A (en) Method and device for encoding image and image decoding method
Haskell et al. Interframe coding of 525-line, monochrome television at 1.5 Mbits/s
US4866509A (en) System for adaptively generating signal in alternate formats as for an EDTV system
US5202755A (en) Encoding system of a simulcast high definition television and method thereof
CA1192997A (en) Bilevel coding of colour video signals
Matsui et al. High-speed transmission of sequential freeze-pictures by extracting changed areas
JP2849385B2 (en) Color image compression encoder
Haskell Interframe coding of monochrome television-a review
JP2518223B2 (en) High efficiency encoder
Troxel Application of pseudorandom noise to DPCM
JP2508509B2 (en) Digital color-video signal interpolation circuit
JPH08205130A (en) Image data transmission and reception device
Matsumoto et al. 3045 Mbps digital coding system for transmission of (4: 2: 2) digital component TV signals recommended by CCIR
Meeker High definition and high frame rate compatible NTSC broadcast television system
JPS62266989A (en) Highly efficient encoder
Koga et al. Low bit rate motion video coder/decoder for teleconferencing
JPH0828873B2 (en) Image transmission equipment

Legal Events

Date Code Title Description
FZDE Discontinued

Effective date: 19950626