A Method of and a System for Processing Digital Information
TECHNICAL FIELD
The present invention relates to a method of and a system for processing digital
information. More particularly, the invention is concerned with processing information
representing video signals and/or audio signals where the processing involves compression to
reduce the quantity of information needed to reproduce the signals.
DISCLOSURE OF THE INVENTION
An object of the invention is to provide a system composed of one or more
microprocessors in a convenient plug-in unit, which can be used with any suitable existing
personal computer or network computer (and which may be termed the "host computer"), to
enable such computers to be used to process and compress video signals for storage or
transmission.
In one aspect of the invention there is provided a system for processing digital
information utilizing a unit, conveniently a plug-in unit, composed of one or more
microprocessors that normally require auxiliary memory to operate and simulation means for
creating such auxiliary memory from that of another host computer to which the unit is
connected. Microprocessor address pin connections need not be made between the
microprocessors and memory or other parts of the system but instead the state of the address
pins on the microprocessors can be determined through a test interface built into the
microprocessors, such as the IEEE 1149.1 Standard Test Access Port and Boundary-Scan Architecture.
The system can have one or more video and/or audio inputs which can be either digital or analogue and, in the case of analogue inputs, means is provided for converting the analogue signals into digital form. The system may employ compression means for compressing the video and/or audio digital information. Such a system may be connected to a host computer which may assist in audio and/or video compression and decompression.
Preferably, means to configure the system operate in such an order that later stages of the configuration are assisted by the parts of the system already configured and by the host computer, and in such a way that the system can be reconfigured while it is being used.
In another aspect the invention provides a system for digitizing audio by the use of a multichannel low resolution analogue-to-digital converter and an external amplifier so that one channel of the analogue-to-digital converter can digitize low amplitude audio signals with greater precision than another channel that digitizes the unamplified signal. A technique such as interpolation can be used to estimate the complete waveform.
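By way of illustration only, the channel selection described above might be sketched as follows; the gain factor, full-scale value and clipping test are assumptions for the sketch rather than details taken from the invention:

```python
def reconstruct_sample(raw, amplified, gain=8, full_scale=127):
    """Pick the higher-precision reading from a two-channel converter.

    The amplified channel resolves low-amplitude signals more finely;
    it is used whenever it has not clipped, otherwise the unamplified
    channel is used.  Gain and full-scale values are illustrative.
    """
    if abs(amplified) < full_scale:      # amplified channel still in range
        return amplified / gain          # finer effective resolution
    return float(raw)                    # fall back to unamplified channel
```

A software switch of this kind can run per sample in real time, as the description suggests.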
In a further aspect the invention provides a method of processing digital information utilizing one or more microprocessors which normally require additional memory to operate
which involves simulating memory using a host computer as if the memory were directly
accessible.
A system in accordance with the invention may comprise a combination of the following:
video-input means, video-digitizing means, audio-input means, audio-digitizing means, means
for effecting video compression in hardware or software or both; means for effecting audio
compression in hardware or software or both; means for further compression in hardware or
software or both; transmission means, means for effecting storage, display means, means for
storing program and configuration data; means for controlling the system in hardware or
software or both; means for simulating memory external to a microprocessor of the system in
hardware or software or both; means for communicating information to a host computer;
and/or means for communicating external memory access information to the host computer.
Another aspect of the invention is a system for digitizing and processing video and/or
audio for storage, transmission or processing in a digital computer system. In operation, the
system may process the digitized video to look for moving or changing parts of the image, or
to recognize objects in the image. This aspect of the invention may also compress the video.
Since the invention is intended for processing digital information, there may be additional
features which allow video and audio information to be accessed and processed.
The video compression is achieved by splitting the image up into groups such as rectangular blocks of pixels, called "super blocks". For each of these super blocks a single U
and a single V value for colour, and a Y value for each of minimum and maximum luminance, are coded. The system examines the pixels in the super block, and decides whether each pixel
is nearer the maximum or the minimum luminance, and then codes that information in one bit
called a "shape bit". Groups of pixels that share the same shape bit value can be compressed further by using a single shape bit for the whole group, together with an indication of whether the group is encoded as individual shape bits or as that single bit.
The image can be filtered both spatially and temporally. Spatial filtering removes noise
such as spot noise by comparing the shape bit for each pixel with its neighbours using a small
look-up table; the contents of the table can then be altered to change the filtering behaviour.
Temporal filtering is done by having a counter for each of the four super block components U,
V and minimum and maximum luminance. The counter stores historical information about the
accumulated noise in these values in order to find practical estimates of the expected values for
Y, U or V from the noisy source.
The data is subsequently recompressed into a representation which allows for four
possible shape values for each pixel: undefined, uncertain, maximum and minimum. The
additional uncertain value allows for an extra grey scale in the output (allowing for anti-aliasing
of edges) and reduces the data rate for storage or transmission by encoding pixels which
fluctuate between shape 0 and shape 1. The system also counts the number of pixels in each super block that are at this uncertain value, and if this count reaches a critical level then the complete super block is transmitted or stored.
To lower the production cost of the plug-in unit, the compression can be split into two parts. The first part uses little memory but needs to operate at high speeds (synchronized with the incoming video), whereas the second part needs large tables in memory but has less severe timing constraints. An efficient way of implementing the system is by having a fast microprocessor connected to the video and/or audio inputs to process at high speed and a relatively low-speed connection to another computer which could be thought of as a host computer serving to implement the second part of the compression. Host computers can be personal computers or network computers or other devices and typically would have several megabytes of memory for storing the compression data and programs. Usually the host would also have means for storing the data, for example on disc, or for transmitting the data through network or modem connections.
The receiving computer (which can be any kind of general-purpose computer) preferably reconstructs the images by bilinearly interpolating the colour and luminance values for each pixel from the values for the current super block and its neighbours. The system can additionally enhance the contrast of edges by estimating where these were in the original image and then interpolating around these edges in such a way as to leave the contrast of the edges unchanged.
BRIEF DESCRIPTION OF THE DRAWINGS
Embodiments of the invention will now be described, by way of example only, with
reference to the accompanying drawings, wherein:
Figure 1 is a block schematic diagram representing the main hardware components of a
unit and at least part of a system constructed in accordance with the invention used to process
both video and audio;
Figures 2 and 3 represent the sequence of elementary processing steps carried out in the
system shown in Figure 1;
Figure 4 represents the sequence of processing steps carried out in the system on the
incoming video data to compress this into key frames;
Figure 5 represents the sequence of processing steps carried out in the system to
recompress the key frames into delta (difference) frames ready for storage or transmission; and
Figure 6 shows the sequence of processing steps carried out in the system or on a
receiving computer to display the stored or transmitted video.
DESCRIPTION OF BEST MODES OF CARRYING OUT THE INVENTION
A possible embodiment of the invention may contain the following functional components.
The first component provides the means to provide digital video to the rest of the
system.
The second component provides the means to provide digital audio to the rest of the
system.
The third component provides the means for processing the digital video information.
The fourth component provides the means for processing the digital audio information.
The fifth component provides the means to interface between the unit of the invention
and the host computer.
The sixth component provides the means to interconnect the components of the unit of the invention.
The seventh component provides the means to establish the state of address pins on the microprocessor.
The eighth component provides the means for storing the code and configuration
information for the various programmable devices in the system.
The ninth component provides the means for configuring the devices in the unit of the
invention.
The tenth component is an external device which provides the means for handling output
from the unit of the invention in digital form.
In one implementation of the invention, the first component or means is a Philips
SAA7110A video digitizer chip.
The second and ninth components or means are a Microchip PIC16C74A microcontroller.
The third component or means is in combination a SRAM-based FPGA, such as Altera
6000 and 8000 series, and a StrongARM SA-110 microprocessor and the host computer.
The fourth component or means is in combination the PIC16C74A microcontroller and
the host computer.
The fifth component or means is in combination an IEEE 1284 compatible parallel
printer port, the PIC16C74A microcontroller and the SRAM-based FPGA.
The sixth component or means is the SRAM-based FPGA.
The seventh component or means is in combination the PIC16C74A microcontroller and
the IEEE 1149.1 test interface on the StrongARM SA-110 microprocessor.
The eighth and tenth components or means are the host computer. The host computer is
typically either a personal computer or a network computer.
The main hardware components of a system and a plug-in unit constructed in accordance
with the invention are laid out in Figure 1. An explanation of the various components in Figure
1 now follows.
101 Video digitizer: this is an integrated circuit, such as a Philips SAA7110A, which allows
analogue video from one or more devices such as a video camera or a video tape
machine to be converted into digital information which can then be processed. In
another implementation of the invention, this is replaced by a digital camera chip, which
takes its input directly from light and so removes the need for an additional video source
such as a camera or a tape machine.
102 FPGA programmable logic: this is a SRAM based FPGA integrated circuit, such as Altera 6000 and 8000 series, which can be programmed to contain a wide range of combinations of logic gates. These logic gates perform certain operations more efficiently than a software system, but the device itself is fully programmable so that the unit of the invention retains the flexibility inherent in a software system. The FPGA performs glue logic functions such as connecting the video digitizer (device 101), StrongARM (device 103), PIC microcontroller (device 104) and the host computer (device 124) via the IEEE 1284 compatible parallel port (signals 117 and 118). In addition, it performs some processing on the data stream to assist in the processing of the digital video and/or audio information.
103 StrongARM microprocessor such as SA-110: this is typical of a new range of embedded microprocessors. These have the following features in common: low cost, fast instruction execution, low power consumption, and large internal cache memory. In this implementation, the StrongARM implements most of the compression.
104 PIC microcontroller such as Microchip PIC16C74A: this digitizes the audio from the audio input via a combination of connections 121 and 122. In addition, it connects to the FPGA (device 102), which it programs on start-up, and to the StrongARM (device 103), from which it reads the addresses of any external memory requests. The microcontroller subsequently requests this information from the host (device 124) via the parallel port (signals 117 and 118) before sending the data to the FPGA (device 102) to forward to the StrongARM (device 103).
105 Audio preamplifier: this amplifies incoming audio signals.
106 Audio amplifier for low amplitude signals: the output from this component can be used
instead of the output from component 105 by means of a real-time software switch
which operates in response to the level of incoming samples.
107 14.3MHz crystal oscillator: this is used by the PIC microcontroller (device 104), the
FPGA (device 102) and indirectly through the FPGA by the StrongARM (device 103)
for their system clocks.
108 26.8MHz crystal: this is used by the video digitizer (device 101) for its system clock.
109 Video input to the SAA7110 digitizer (device 101).
110 14.3MHz clock signal for the FPGA (device 102) and PIC microcontroller (device 104).
111 This is an I2C bus, and allows the microcontroller (device 104) to initialize and control
the video digitizer integrated circuit (device 101).
112 Control signals: information about line and field sync and the pixel clock flows from the digitizer (device 101) to the FPGA (device 102).
113 YUV data: digital pixel information is transferred from the video digitizer (device 101) to the FPGA (device 102) for processing. This information will typically be in a standard format, for example 8 bits accuracy for the Y (luminance) on every pixel, and 8 bits each
of U and V (chrominance) on every pair of pixels.
114 Control signals: this is a bidirectional link. The StrongARM (device 103) reports to the FPGA (device 102) every time it accesses a non-cached location and requires a simulated memory access. The FPGA signals the StrongARM to wait until the information it has requested is available. In addition, as digital video or audio data becomes available, the StrongARM interrupt lines are triggered by the FPGA to signal to the StrongARM to read this data.
115 Data bus: data such as instruction and data cache initial contents and new pixel and audio data is transferred to the StrongARM through this connection.
116 Control signals: this is a bidirectional link. The microcontroller (device 104) programs the FPGA (device 102) with its initial configuration. The FPGA signals the microcontroller to check the address lines on the StrongARM (device 103) when the StrongARM has requested an external access from the FPGA.
117 Control signals: the standard nine control lines on a parallel port are connected so as to allow the FPGA (device 102) and the microcontroller (device 104) to share control of the printer port to the host computer (device 124).
118 Data signals: this allows the FPGA (device 102), the microcontroller (device 104), and the host computer (device 124) to share the 8 parallel port data lines.
119 IEEE 1149.1 test interface: this is a bidirectional link between the StrongARM (device 103) and the microcontroller (device 104). The microcontroller requests information about the state of the I/O connections on the StrongARM, such as the state of its address pins, which is then provided by the StrongARM back to the microcontroller.
120 Clock signal: 3.57 MHz clock for StrongARM timings.
121 Preamplified audio.
122 More highly amplified audio.
123 StrongARM address lines: The system is designed in such a way as to not require any external connections to the StrongARM address lines. This reduces printed circuit board
area and pin count on the FPGA (device 102), reducing electromagnetic interference,
reducing the cost and increasing the reliability of the system.
124 Host computer: This is not part of the plug-in unit, but is necessary for the unit to
perform. The host computer will be a personal computer or a network computer with an
IEEE 1284 parallel port. Such a host computer will typically include some means for
displaying, transmitting or storing the data sent to it through the parallel port interface
(signals 117 and 118). In addition, the host computer will typically contain the data
required for configuration of the FPGA (device 102) and the software for the StrongARM (device 103).
Figure 2 shows the initialisation procedure adopted in an embodiment of the invention.
The four devices labelled at the top, namely StrongARM (device 103), FPGA (device
102), microcontroller (device 104) and host computer (device 124) all have the ability to
process information and react to events. In effect this is a parallel computer system, where
each device waits for the appropriate time to be initialized or to initialize.
Figure 3 shows the memory read cycle of the StrongARM microprocessor 103. As
external memory accesses are simulated and the address lines are not connected, the various
components cooperate to ensure that execution continues smoothly despite the absence of
external memory devices in the invention.
Figures 4 and 5 outline the method for compressing video information. This compression is done in several phases.
Reduce luminance to 6 bits (401, 402 and 403): Luminance is 8 bits after digitizing, of
which 7 bits are used as the index into a look-up table to give a 6-bit luminance value.
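A minimal sketch of this table look-up follows, assuming a simple linear table; the actual table contents are not fixed by the description and could implement gamma correction or any other transfer curve:

```python
# The top 7 bits of the 8-bit digitized luminance index a 128-entry
# table whose entries are 6-bit values.  A linear table is assumed
# here purely for illustration.
LUMA_LUT = [i >> 1 for i in range(128)]   # 7-bit index -> 6-bit value

def reduce_luma(y8):
    """Reduce an 8-bit luminance sample to 6 bits via the table."""
    return LUMA_LUT[y8 >> 1]
```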
Extracting shape (403 and 404): The image is split into 8x8 blocks, called "super
blocks". These are represented as a single U and a single V value for colour, and two Y values
Ymin and Ymax for minimum and maximum luminance. A shape bit is a bit which indicates whether a pixel (or a block of 2x2, 4x4 or 8x8 pixels) is nearer the minimum or the maximum luminance in the super block, and can be thought of as a one-bit luminance value.
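The shape extraction described above can be sketched as follows; the mid-point threshold is an assumption about how "nearer" the minimum or maximum is decided:

```python
def extract_shape(block):
    """Compute Ymin, Ymax and one shape bit per pixel for a luminance
    block (normally 8x8).  A pixel's shape bit is 1 when it is nearer
    the maximum luminance, 0 when nearer the minimum; the mid-point
    threshold is an assumption for this sketch."""
    flat = [y for row in block for y in row]
    ymin, ymax = min(flat), max(flat)
    mid = (ymin + ymax) / 2
    shape = [[1 if y > mid else 0 for y in row] for row in block]
    return ymin, ymax, shape
```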
Temporal filtering (405): The two colour components U and V, and the two luminance
values Ymin and Ymax are filtered in a temporal way. The system uses four bits of memory per super block for each of these values to store historical information about the accumulated noise; these bits are in addition to the 6 bits required to store each Y value and the 6 bits required to store each of U and V.
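One plausible reading of this counter-based temporal filter is sketched below; the saturation threshold, the 4-bit counter range and the update rule are assumptions, as the description does not fix them:

```python
def temporal_filter(stored, counter, sample, limit=7):
    """Sketch of a counter-based temporal filter for one component
    (U, V, Ymin or Ymax).  The stored value only moves after `limit`
    net samples pull in the same direction, so frame-to-frame noise
    is suppressed.  Returns the updated (stored, counter) pair."""
    if sample > stored:
        counter = min(counter + 1, limit)
    elif sample < stored:
        counter = max(counter - 1, -limit)
    if counter == limit:                 # sustained upward pressure
        stored, counter = stored + 1, 0
    elif counter == -limit:              # sustained downward pressure
        stored, counter = stored - 1, 0
    return stored, counter
```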
Spatial filtering (406): Shape, as described above, is filtered to remove spot noise, which
is noise where only one or a few pixels deviate from other local pixels. A look-up table is used
which takes five input bits, being the shape bit for a pixel and four of its nearest neighbours.
This look-up table is stored in a processor register for fast access, and generates a shape bit as output which is then used as the shape for the pixel. The filtering is implemented using a 32-bit look-up table and typically performs a median function. At the edge of each super block, the filtering assumes that all pixels over the super block edge are the same shape as the central value.
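The five-input look-up table can be sketched as a 32-bit word held in a register, here loaded with the majority (median) function the text mentions; the bit packing order of the five inputs is an assumption:

```python
# Build a 32-entry look-up table, one bit per entry, implementing a
# median (majority-of-five) over a pixel's shape bit and its four
# nearest neighbours.  The whole table fits in one 32-bit register,
# and its contents can be changed to alter the filtering behaviour.
MEDIAN_LUT = 0
for index in range(32):
    if bin(index).count("1") >= 3:       # majority of the five bits set
        MEDIAN_LUT |= 1 << index

def filter_shape_bit(centre, up, down, left, right):
    """Return the spatially filtered shape bit for one pixel."""
    index = centre | (up << 1) | (down << 2) | (left << 3) | (right << 4)
    return (MEDIAN_LUT >> index) & 1
```

With this table an isolated set bit (spot noise) is removed because it is outvoted by its neighbours.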
Fractal compression of shape (407): The shape of all the pixels in each super block is compressed in a fractal way: a single "0" bit for a uniform super block in which all the pixels are the same luminance, or a "1" bit followed by four bits indicating the shape for subsets of 4x4 pixels. These four bits then either indicate the subset is all of the same luminance in which case a single bit follows indicating whether that is the maximum or minimum luminance, or that four more bits follow to indicate the luminance for each of the four subblocks of 2x2 pixels. These four bits again then either indicate the subset is all of the same luminance in which case a single bit follows indicating whether that is the maximum or minimum luminance, or that four more bits follow to indicate the luminance for each of the four pixels.
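A sketch of this fractal (quadtree) encoding follows; the exact bit ordering, and the emission of a value bit after the "0" flag for a uniform block, are assumptions made so that the sketch is self-contained and decodable:

```python
def encode_shape(block, size=8):
    """Quadtree encoding of a square block of shape bits.

    A uniform block is emitted as a 0 flag plus one value bit (the
    value bit is an assumption of this sketch); a mixed block is
    emitted as a 1 flag followed by the encodings of its four
    quadrants, down to individual pixels.  Returns a list of 0/1 bits.
    """
    values = {bit for row in block for bit in row}
    if len(values) == 1:                 # uniform: flag + value bit
        return [0, values.pop()]
    bits = [1]                           # mixed: subdivide into quadrants
    half = size // 2
    for r0 in (0, half):
        for c0 in (0, half):
            sub = [row[c0:c0 + half] for row in block[r0:r0 + half]]
            if half == 1:
                bits.append(sub[0][0])   # single pixel: emit its bit
            else:
                bits.extend(encode_shape(sub, half))
    return bits
```

A uniform 8x8 super block thus costs two bits, while detailed blocks pay proportionally more, matching the description above.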
Compression of U and V (408): The two colour components U and V are compressed by taking advantage of spatial similarities of the colours in the image.
Key frames: the unit stores complete frames at the full resolution, e.g. 320x240 pixels, compressed as described above. These key frames are transmitted over the parallel port to the host computer. The unit does not calculate differences between frames' luminance values; this is left to the host with its much larger memory, but the U and V values are compressed spatially to reduce frame size.
Noise reduction on the host (501 to 506): Once the data is received by the host, it is
decompressed and then recompressed giving delta (difference) frames. The source pixels are
all specified as one shape bit, indicating either a maximum or minimum luminance value.
However, after the compression on the host, they are all one of four values: undefined,
uncertain, maximum and minimum. Pixels of undefined state can be switched to either
maximum or minimum luminance by sending or storing a "1" or a "0" bit. Pixels of maximum
or minimum luminance state can be changed to uncertain by sending or storing a "1" bit.
Otherwise, a "0" bit is sent or stored. Thus pixels which fluctuate between maximum and
minimum luminance values are not re-sent if they are considered to be local noise. These
pixels are displayed on the receiving machine as the average luminance of the maximum and
minimum values, so giving the effect of anti-aliasing along noisy edges, with very low data
rate.
If the number of uncertain pixels in any super block reaches a critical level (a level which
can be changed) then all the pixels in the super block are set to the undefined state, which will
cause them to be resent as either maximum or minimum luminance.
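The per-pixel update rules above can be sketched as a small state machine; the state names are illustrative, and the decision of when a maximum/minimum flip should be marked uncertain is simplified to "any flip":

```python
# Four possible shape values per pixel after recompression on the host.
UNDEFINED, UNCERTAIN, MINIMUM, MAXIMUM = range(4)

def encode_pixel(state, new_shape):
    """Emit the delta bits for one pixel and return (bits, next_state).

    Undefined pixels are resolved explicitly with a 1 (maximum) or 0
    (minimum) bit; a defined pixel that flips sends a single 1 bit and
    becomes uncertain; an unchanged pixel sends a single 0 bit.
    """
    if state == UNDEFINED:
        bit = 1 if new_shape == MAXIMUM else 0
        return [bit], new_shape
    if new_shape == state:               # unchanged: cheap "0" bit
        return [0], state
    return [1], UNCERTAIN                # flip treated as local noise
```

A receiver would display an uncertain pixel at the average of the maximum and minimum luminance, giving the anti-aliasing effect described above.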
History compression (505) is a loss-free means for lowering the data rate by looking for exact matches between the encoding of the current shape and the encoding of the shape in a previous frame or frames.
Figure 6 outlines the method to decompress the video on the receiver.
Interpolation on the receiver: The image reconstructed on the receiver (either connected through some network to the transmitting machine, or playing back images that have been stored on disc) would appear quite blocky, as a consequence of the low number of bits per pixel transmitted or stored. Interpolation of the U and V colour values at super block resolution gives an adequate image when each 4x4 pixel quadrant of the super block has its U and V values calculated by bilinear interpolation with the U and V values for the four super blocks neighbouring the super block corner. The Ymin and Ymax luminance values are also interpolated in a similar way; however, a Y value is taken from a neighbouring super block only when it is nearer in luminance to the central Y value than its complement. If this is not the case then the super block is probably on an edge in the image, and because antialiasing with a luminance value taken from the wrong side of the edge is not desirable the central Y value is taken in those cases.