WO2001084849A1 - Video data transmission - Google Patents

Video data transmission

Info

Publication number
WO2001084849A1
WO2001084849A1 PCT/GB2001/001830
Authority
WO
WIPO (PCT)
Prior art keywords
pixels
blocks
image
sections
block
Prior art date
Application number
PCT/GB2001/001830
Other languages
French (fr)
Inventor
Philip Brinley John
Original Assignee
Clearstream Technologies Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from GB0010552A external-priority patent/GB0010552D0/en
Priority claimed from GB0010549A external-priority patent/GB0010549D0/en
Priority claimed from GB0104606A external-priority patent/GB2362055A/en
Application filed by Clearstream Technologies Limited filed Critical Clearstream Technologies Limited
Priority to AU52355/01A priority Critical patent/AU5235501A/en
Publication of WO2001084849A1 publication Critical patent/WO2001084849A1/en

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/94Vector quantisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding
    • G06T9/008Vector quantisation
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction

Definitions

  • This invention relates to a method and apparatus for transmitting image data in real time across a telecommunications or data network, and in particular but not exclusively, to a method and apparatus for transmitting or "streaming" such image data across a low speed network, such as the Internet, although it may be equally applied to high and low speed connections.
  • Image compression and transmission and video streaming techniques generally are well known.
  • an image signal such as a video signal is sampled, quantized and digitally encoded before transmission across a network.
  • the end user's terminal device decodes the incoming signal, inverse quantizes it and processes it to reproduce the input video signal.
  • most Internet users are connected via modems that operate at a maximum of 56kb/s.
  • Real time video data, in order to preserve its image quality and frame rate, must be sampled at a relatively high rate, and the quantization step size must be such that the resultant video stream needs to be received and processed at a rate much greater than 56kb/s.
  • substantial quantities of the video data are lost or distorted when it is transmitted across a low speed network, such that image quality and frame rate are significantly reduced.
  • US Patent No. 5,638,125 describes a method of compressing video data and transmitting it, with the aim of preserving picture quality of the received signals. This is achieved by dividing each image into blocks and using a neural network to vary the quantization step size for each block according to variable parameters such as image complexity and luminance. However, large quantities of colour and image data are still required to be transmitted at a high rate, unsuitable for many low speed networks.
  • US Patent No. 5,828,413 describes a system for compressing and transmitting moving image data, particularly suitable for use in video conferencing and video telephony where the image data is required to be transmitted in real-time across a low speed connection.
  • the transmitter splits a set of image frames into a number of "superblocks", each including the data or signal representing a plurality of pixels.
  • the individual superblocks are processed into a set of vectors which are encoded by means of vector quantization.
  • a dedicated code book is created for each superblock based on localised recent history of previous frames, each code book consisting of a number of "blocks" which are composed of three-dimensional vectors representative of image data from a predetermined number of previous frames.
  • the code books are duplicated at the receiver so that some identical sections of adjacent sets of frames can be reproduced at the receiver without having to re-transmit all of the image data across the connection.
  • a substantial amount of image data still needs to be transmitted to achieve a good quality image at the receiver end because the code books are only useful for reproducing sections of an image which have not changed between adjacent frames.
  • the code books are updated periodically (for example, every second) which requires the periodic transmission of the same image data to update the receiver code books.
  • large quantities of colour and image data are still required to be transmitted at a high rate, unsuitable for many low speed networks.
  • apparatus for processing image data comprising means for dividing an image frame into blocks or sections made up of a plurality of pixels, code book or memory means for storing a plurality of blocks or sections having different predetermined combinations of pixels, each of said combinations being associated with unique identifying data, means for comparing each block or section of said image with said stored blocks or sections and identifying a combination of pixels from said code book means which substantially matches (either exactly or to within a predetermined tolerance level) the combination of pixels in said block or section of said image, and means for outputting the identifying data associated with said matching combination.
  • a method of processing image data comprising the steps of dividing an image frame into blocks or sections made up of a plurality of pixels, storing a plurality of said pixel blocks or sections having different predetermined combinations of pixels together with corresponding unique identifying data associated with each combination, comparing each block or section of said image with said stored blocks or sections, identifying a stored combination of pixels which substantially matches (either exactly or to within a predetermined tolerance level) the combination of pixels in said block or section of said image, and outputting the identifying data associated with the matching combination.
  • each block or section of each frame is preferably identified by short bit streams which are transmitted across the network.
  • the blocks or sections are reproduced by the receiving node using the received bit streams and a matching code book.
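The block-matching encoding described above can be sketched in Python, purely as an illustration and not as part of the specification: each greyscale 4 x 4 block of a frame is compared against the code book entries and matched either exactly or to within a tolerance. The function name, the dictionary code book and the mean-absolute-difference tolerance measure are all assumptions made for this sketch.

```python
import numpy as np

def encode_frame(frame, codebook, tol=0.0):
    """Encode a frame as a list of code book identifiers, one per 4x4
    block. `frame` is an (H, W) greyscale array with H and W multiples
    of 4; `codebook` maps an identifier to a 4x4 block. A block matches
    an entry when the mean absolute pixel difference is within `tol`.
    Returns (ids, misses): identifiers for matched blocks, plus the
    positions of blocks with no match (which, per the description,
    would be added to the code book and transmitted in full)."""
    ids, misses = [], []
    for y in range(0, frame.shape[0], 4):
        for x in range(0, frame.shape[1], 4):
            block = frame[y:y+4, x:x+4]
            for ident, entry in codebook.items():
                if np.abs(block - entry).mean() <= tol:
                    ids.append(ident)
                    break
            else:
                # no code book entry within tolerance
                ids.append(None)
                misses.append((y, x))
    return ids, misses
```

Each matched block is then represented on the wire by its identifier alone, a short bit stream rather than the full pixel data.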
  • the image data may comprise one or more still images such as photographs, or moving images, such as video data made up of a plurality of frames.
  • the apparatus beneficially also includes means for identifying a combination of pixels which is not stored in the code book means, allocating unique identifying data to said combination and storing said combination and identifier in the code book means.
  • the new combination and identifier is also transmitted across the network for dynamic updating of the code book stored in the user end terminal.
  • the pixel blocks or sections may comprise a square or rectangle consisting of a predetermined N by M number of pixels where N and M are integers, and more preferably greater than 2.
  • the integers N and M may be different or they may be equal to each other.
  • the apparatus beneficially includes means, preferably in the form of feedback means, for identifying areas of an image where attention to detail is relatively unimportant, such as relatively large areas of the same colour, and means for combining a plurality of blocks or sections into a single larger block or section of pixels for comparison against said blocks or sections stored in another code book.
  • the apparatus could be arranged to analyse an image, dividing only detailed areas into the relatively small blocks or sections of pixels, and the rest into larger blocks or sections.
  • the code book means comprises a first code book for storing a plurality of combinations of pixel blocks or sections (and their corresponding identifiers) and another code book for storing a plurality of combinations of the larger blocks or sections and their corresponding identifiers.
  • the apparatus preferably includes means for comparing corresponding pixel blocks or sections from adjacent images or frames in a sequence, means for identifying any differences in the blocks or sections (corresponding, for example, to motion in the sequence) and only sending identifying data corresponding to blocks or sections in a frame which have changed.
  • preferably, the comparison of each pixel block or section with the stored blocks or sections is carried out by one or more neural networks.
  • apparatus for receiving and decoding image data comprising means for receiving unique identifying data corresponding to combinations of pixels making up an image, code book or memory means for storing a plurality of blocks or sections having different predetermined combinations of pixels and the identifying data corresponding to each combination, means for comparing the incoming identifying data with the identifying data stored in said code book means, means for identifying the pixel block or section to which said data corresponds, and means for outputting the respective pixel block or section.
  • a method of receiving and decoding image data comprising the steps of receiving unique identifying data corresponding to combinations of pixels making up an image, storing a plurality of different predetermined combinations of pixels together with corresponding unique identifying data associated with each combination, comparing the incoming identifying data with the stored identifying data, identifying the combination of pixels to which said data corresponds, and outputting the respective combination of pixels.
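The complementary decoding step, in which the receiver looks up each received identifier in its duplicate code book and outputs the stored block, can be sketched similarly. This is again a hypothetical illustration, assuming 4 x 4 blocks written back in raster order:

```python
import numpy as np

def decode_frame(ids, codebook, shape):
    """Rebuild a frame from received code book identifiers: each
    identifier is looked up in the receiver's duplicate code book and
    the stored 4x4 block is written into the output frame in raster
    order. `shape` is the (height, width) of the decoded frame."""
    h, w = shape
    frame = np.empty((h, w))
    it = iter(ids)
    for y in range(0, h, 4):
        for x in range(0, w, 4):
            frame[y:y+4, x:x+4] = codebook[next(it)]
    return frame
```

Only the short identifiers cross the network; the pixel data itself comes from the receiver's local code book.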
  • the image data may comprise one or more still images, such as photographs, or moving images, such as video data made up of a plurality of frames.
  • the apparatus also beneficially includes means for receiving pixel blocks or sections not stored in the code book means, together with corresponding unique identifying data, and updating said code book means accordingly.
  • the apparatus beneficially includes separate code book means corresponding to each block or section size defined.
  • apparatus for processing image data comprising means for dividing an image into blocks or sections made up of a plurality of pixels, code book or memory means for storing a plurality of blocks or sections having different predetermined combinations of pixels, each of said combinations being associated with unique identifying data, means for comparing each block or section of said image with said stored blocks or sections, means for determining that a combination of pixels in said block or section of said image does not match (either exactly or to within a predetermined tolerance level) any of the predetermined combinations of pixels stored in said code book means, and means for storing said unmatched combination together with unique identifying data in said memory or code book means.
  • a method of processing image data comprising the steps of dividing an image into blocks or sections made up of a plurality of pixels, storing a plurality of blocks or sections having different predetermined combinations of pixels, each of said combinations being associated with identifying data, comparing each block or section of said image with said stored blocks or sections, determining that a combination of pixels in said block or section of said image does not match (either exactly or to within a predetermined tolerance level) any of the stored combinations of pixels and storing said unmatched combination together with unique identifying data.
  • the newly-stored combination and its unique identifying data is preferably output together with the identifying data corresponding with other combinations of pixels in the image.
  • one or more key details may be essential in the sense that, if they are lost, the entire sequence loses its context. This is of particular concern when such key details occupy only a very small area of a frame.
  • the ball is the key detail which defines all other activity in the sequence.
  • the ball may only occupy a single pixel in a frame.
  • apparatus for processing image data comprising means for dividing at least a portion of an image into blocks or sections made up of a plurality of pixels, code book or memory means for storing a plurality of blocks or sections having different predetermined combinations of pixels corresponding to the position of a predetermined element of said image relative to the other pixels in the block or section, each of said combinations being associated with unique identifying data, means for comparing each block or section of the image with said stored blocks or sections and identifying a block or section of pixels from said code book means which substantially matches (either exactly or to within a predetermined tolerance level) the combination of pixels in said block or section of said image, and means for outputting the identifying data associated with said matching combination.
  • a method of processing image data comprising the steps of dividing at least a portion of an image into blocks or sections made up of a plurality of pixels, storing a plurality of blocks or sections having different predetermined combinations of pixels corresponding to the position of a predetermined element of said image relative to the other pixels in the block or section, each of said combinations being associated with unique identifying data, comparing each block or section of the image with said stored blocks or sections and identifying a block or section of pixels from said stored blocks or sections which substantially matches (either exactly or to within a predetermined tolerance level) the combination of pixels in said block or section of said image, and outputting the identifying data associated with said matching combination.
  • one or more key elements in a video sequence can be identified and the system "taught" to recognise the element within each frame of the sequence as it changes position.
  • At least the comparing and identifying steps are carried out by a neural network.
  • each of the stored blocks or sections of pixels corresponds to a different position of said predetermined element relative to the other pixels in the block or section.
  • the predetermined element occupies only a single pixel in an image
  • the number of stored combinations will be the same as the number of pixels in a block or section, each of the combinations having the pixel occupied by the predetermined element in a different position relative to the rest of the pixels in the block or section. If, however, the predetermined element occupies, either partially or fully, more than one pixel, the number of stored combinations corresponding to a single block or section of the image will be greater.
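As an illustration of this counting argument, the code book for a single-pixel key element in an n x n block can be generated programmatically. The sketch below (hypothetical names; background coded 0 and the key element coded 1) produces exactly n*n stored combinations, one per possible position of the element:

```python
import numpy as np

def key_element_codebook(n=3):
    """Build the code book for a key element occupying a single pixel
    in an n x n block: one entry per possible position of the element,
    giving n*n entries in total. Each entry is a block of background
    pixels (0) with the element (1) at a different position."""
    book = {}
    for i in range(n * n):
        block = np.zeros((n, n), dtype=int)
        block[divmod(i, n)] = 1  # row, column of the element
        book[i] = block
    return book
```

For a 3 x 3 block this yields nine combinations; an element covering more than one pixel would, as the description notes, require correspondingly more entries.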
  • the stored combinations of pixels may be user-defined, or "learned" by the system, which identifies the predetermined element in an image or video sequence and then identifies and stores different combinations of pixels corresponding to the changing position of the predetermined element during the sequence.
  • apparatus for processing moving image data such as video data, comprising means for dividing an image frame into blocks or sections made up of a plurality of pixels, code book or memory means for storing a plurality of blocks or sections having different predetermined combinations of pixels, each of said blocks or sections being associated with unique identifying data, means for comparing each block or section of said image frame with said stored blocks or sections and identifying a combination of pixels from said code book means which substantially matches (either exactly or to within a predetermined tolerance level) the combination of pixels in said block or section of said image frame, and means for outputting the identifying data associated with said matching combination, the apparatus further comprising means for determining the number of blocks or sections in the image frame whose combination of pixels has changed from the previous frame and outputting only the identifying data associated with the changed blocks or sections if the number of changed blocks or sections is less than a predetermined number.
  • a method for processing moving image data such as video data, comprising the steps of dividing an image frame into blocks or sections made up of a plurality of pixels, storing a plurality of blocks or sections having different predetermined combinations of pixels, each of said blocks or sections being associated with unique identifying data, comparing each block or section of the image frame with the stored blocks or sections and identifying a combination of pixels from said code book means which substantially matches (either exactly or to within a predetermined tolerance level) the combination of pixels in said block or section of said frame, determining the number of blocks or sections in the frame whose combination of pixels has changed since the previous frame of a sequence, and outputting the identifying data associated only with the changed blocks if the number of changed blocks is less than a predetermined number.
  • the predetermined number is preferably user-defined as m%, where m is a positive number.
  • one or more feedback loops are preferably provided at the point where the number of changed blocks or sections is compared with the predetermined number or value. If the number of changed blocks or sections is less than the predetermined number, the system outputs the identifying data associated only with the changed blocks or sections. If, however, the number of changed blocks exceeds the predetermined number, the system may be arranged to shift all of the blocks or sections by one pixel in any direction and the number of changed blocks or sections may then be determined again. If the number is now less than the predetermined number, the identifying data associated only with the changed blocks or sections is output.
  • the system may be arranged to shift the blocks or sections by one pixel in the opposite direction, and the process repeated, and so on.
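This feedback loop can be sketched as follows, purely as an illustration: the changed-block count is computed for the unshifted grid and, if it exceeds the limit, recomputed with the current frame shifted by one pixel in each direction in turn (which leaves most blocks unchanged when the scene has simply panned). The wrap-around shift via `np.roll` and all function names are assumptions made for this sketch:

```python
import numpy as np

def changed_blocks(prev, curr, bs=4, dx=0, dy=0):
    """Count blocks whose pixels differ between two frames when the
    block grid is offset by (dx, dy) pixels on the current frame."""
    shifted = np.roll(curr, (-dy, -dx), axis=(0, 1))
    n = 0
    for y in range(0, prev.shape[0], bs):
        for x in range(0, prev.shape[1], bs):
            if not np.array_equal(prev[y:y+bs, x:x+bs],
                                  shifted[y:y+bs, x:x+bs]):
                n += 1
    return n

def best_shift(prev, curr, limit, bs=4):
    """Feedback loop: accept the unshifted count if it is under the
    limit; otherwise try a one-pixel shift in each direction and keep
    the first that brings the count under the limit. Returns
    (shift, count), or (None, unshifted count) if no shift helps."""
    base = changed_blocks(prev, curr, bs)
    if base < limit:
        return (0, 0), base
    for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
        c = changed_blocks(prev, curr, bs, dx, dy)
        if c < limit:
            return (dx, dy), c
    return None, base
```

When a shift succeeds, only the identifying data for the (few) genuinely changed blocks need be transmitted, together with the shift itself.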
  • the number of different pixel blocks or sections which are required to be stored in the code book may be tens of thousands, especially as the entries are built up after encoding several different types of image sequence.
  • it may require an unacceptable amount of processing time and/or capability to compare an input block or section against every block or section in a single code book, particularly as the size of the code book increases, taking into account the inherent restrictions on processing time when dealing with a live sequence of video frames.
  • apparatus for processing image data comprising means for dividing an image into blocks or sections made up of a plurality of pixels, first memory means for storing a plurality of blocks or sections having different predetermined combinations of pixels, each of said combinations being associated with unique identifying data, one or more second memory or code book means for storing a plurality of blocks or sections, which are at least a subset of the blocks or sections stored in the first memory means, together with unique identifying data associated with each of the blocks or sections in said subset, means for comparing each block or section of said image with the blocks or sections stored in the or one or more of the second memory means and identifying a combination of pixels from said second memory means which substantially matches (either exactly or to within a predetermined tolerance level) the combination of pixels in said block or section of said image, and means for outputting the identifying data associated with said matching combination.
  • a method of processing image data comprising the steps of dividing an image into blocks or sections made up of a plurality of pixels, storing in a first memory means a plurality of blocks or sections having different predetermined combinations of pixels, each of said combinations being associated with unique identifying data, storing in each of one or more second memory means a plurality of blocks or sections which are at least a partial subset of the blocks or sections stored in the first memory means together with unique identifying data associated with each of said blocks in said subset, comparing each block or section of said image with the blocks stored in the or one or more of the second memory means, identifying a combination of pixels from said second memory means which substantially matches (either exactly or to within a predetermined tolerance level) the combination of pixels in said block or section of said image, and outputting the identifying data associated with said matching combination.
  • the first memory means is a type of reference or master library with perhaps tens or even hundreds of thousands of blocks stored in it, each identified by a unique address having a large number of bits.
  • the or each second memory means or code book may store subsets of reference library entries, together with addresses having fewer bits (because each code book has far fewer entries).
  • the or each code book to be used in encoding an image or image sequence such as a video sequence may be user-defined for each image or sequence. However, there may be one or more pre-defined code books, especially for certain types of sequence, for example, sports matches, or for certain colour palettes or combinations . These ready-made code books can then be selected by the user or by the system itself based on one or more parameters entered or deduced from analysis of the image or sequence.
  • the apparatus beneficially includes means for determining that a combination of pixels in said block or section of said image does not match (either exactly or to within a predetermined tolerance level) any of the combinations of pixels stored in the or one or more of said second memory means and for comparing said combination of pixels in said block or section of said image with the blocks or sections stored in said first memory means, and means for outputting the identifying data associated with a block or section stored in said first memory means if that stored block or section substantially matches (either exactly or to within a predetermined tolerance level) the block or section of said image and for storing any combination of pixels which does not substantially match any of the stored blocks or sections, together with unique identifying data, in one or both of said first or second memory means.
  • This newly-stored combination of pixels and associated unique identifying data is transmitted with the rest of the encoded data to update the code book in the end user terminal.
  • apparatus for receiving and decoding image data comprising means for receiving unique identifying data corresponding to combinations of pixels making up an image, first memory means for storing a plurality of blocks or sections having different predetermined combinations of pixels and identifying data corresponding to each combination, one or more second memory means for storing unique identifying data corresponding to a subset of the blocks or sections stored in said first memory means, means for comparing the incoming identifying data with the identifying data stored in the or a selected one or more of said second memory means, means for identifying the pixel blocks or sections to which said data corresponds, and means for outputting the respective pixel blocks or sections.
  • a method of receiving and decoding image data comprising the steps of receiving unique identifying data corresponding to combinations of pixels making up an image, storing in first memory means a plurality of blocks or sections having different predetermined combinations of pixels and identifying data corresponding to each combination, storing in one or more second memory means unique identifying data corresponding to a subset of the blocks or sections stored in said first memory means, comparing the incoming identifying data with the identifying data stored in the or a selected one or more second memory means, identifying the pixel blocks or sections to which said data corresponds, and outputting the respective pixel blocks or sections.
  • the actual pixel blocks or sections may also be stored in the second memory means, or they may be retrieved from the first memory means using the unique identifying data in the second memory means as a reference.
  • Information relating to the code book(s) used to encode an image or a frame of, for example, a video sequence may be transmitted at the start of each stream of identifying data relating to that image or frame, so that the receiver knows which code book to use to decode the data and reconstitute the image or frame. If more than one code book is used and information relating to new code book entries is also transmitted, it should preferably include information as to which code book (i.e. second memory means) it should be added to, to ensure correct duplication of code books in the receiver. The new entry would beneficially also be added to the reference library (first memory means) of the receiver.
  • smaller libraries can be built using codes from a master library, and this process does not need to be performed in real time.
  • the choice of codes making up a library may be hard coded into the encoding and decoding software.
  • the smaller code books provide a translation between a subset of the master library and the master library, but allow smaller bit streams to be used to identify each pixel block, and substantially reduce processing time required for the comparison of a pixel block or section of an image or frame and the code book entries.
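A minimal sketch of such a translation table, assuming integer master-library addresses (the helper names are hypothetical and not taken from the specification):

```python
import math

def build_small_codebook(master_ids):
    """Translate a subset of master-library addresses into short local
    codes. Returns (to_short, to_master): to_short maps a master
    address to its short code, to_master maps back, so the receiver
    can recover the master entry from the short identifier."""
    to_short = {m: s for s, m in enumerate(sorted(master_ids))}
    to_master = {s: m for m, s in to_short.items()}
    return to_short, to_master

def short_id_bits(codebook):
    """Bits needed per short identifier in this code book, versus the
    master library's full address width."""
    return max(1, math.ceil(math.log2(len(codebook))))
```

For example, a four-entry code book needs only 2-bit identifiers, however wide the master-library addresses are, which both shrinks the bit stream and shrinks the search space for block matching.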
  • a library of different predetermined combinations of pixels is stored in the code book or memory means, the library comprising one or more source blocks consisting of colour and/or pixel data representing a predetermined combination of pixels, and one or more transform means which, when applied to a source block, produces a block having a different predetermined combination of the same pixels, thereby enabling a very large number of blocks to be stored in a relatively small memory capacity.
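One concrete (assumed) choice of transform means is the eight rotations and reflections of a square block. The sketch below expands stored source blocks into the full set of distinct transformed blocks, so that up to eight library entries cost only one stored source block plus a transform index:

```python
import numpy as np

def expand_library(sources):
    """Expand stored source blocks into the full library by applying
    the eight rotation/reflection transforms to each source block and
    keeping only the distinct results."""
    blocks = []
    seen = set()
    for src in sources:
        for flip in (src, np.fliplr(src)):
            for k in range(4):  # 0, 90, 180, 270 degree rotations
                b = np.rot90(flip, k)
                key = b.tobytes()
                if key not in seen:
                    seen.add(key)
                    blocks.append(b)
    return blocks
```

Symmetric source blocks yield fewer distinct results (the transforms coincide), while a fully asymmetric block yields all eight.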
  • the probability of a particular combination of pixels appearing in a single image or sequence of frames may be variable, which minimises the benefit of statistical compression.
  • apparatus for processing image data comprising means for dividing an image into blocks or sections made up of a plurality of pixels, memory means for storing a plurality of blocks or sections having different predetermined combinations of pixels, each of said combinations being associated with unique identifying data, means for comparing each block or section of said image with the blocks or sections stored in the memory means and identifying a combination of pixels which substantially matches (either exactly or to within a predetermined tolerance level) the combination of pixels in said block or section of said frame, and means for outputting the identifying data associated with said matching combination, wherein one or more of said stored combinations of pixels is allocated a probability of occurrence in an image or sequence of images, the unit length of said unique identifying data associated with said at least one stored combination being dependent on the probability of occurrence in an image or sequence of images of said combination of pixels.
  • the combinations of pixels which are most likely to occur, i.e. those expected to occur most often, can be allocated unique identifying data comprising a relatively short bit stream, and the combinations less likely to occur can be allocated longer bit streams.
  • a method of processing image data comprising the steps of dividing an image into blocks or sections made up of a plurality of pixels, storing a plurality of blocks or sections having different predetermined combinations of pixels, each of said combinations being associated with unique identifying data, comparing each block or section of said image with the stored blocks or sections and identifying a combination of pixels which substantially matches (either exactly or to within a predetermined threshold level) the combination of pixels in said block or section of said image, and outputting the identifying data associated with said matching combination, wherein at least one of said stored combinations of pixels is allocated a probability of occurrence in an image or sequence of images, the unit length of said unique identifying data associated with said at least one stored combination being dependent on the probability of occurrence in an image or sequence of images of said combination of pixels.
  • the probabilities of occurrence of at least some of the stored pixel combinations may beneficially be stored along with the unique identifying data associated with each combination.
  • the probabilities of occurrence of at least some combinations of pixels may be predefined, i.e. hard-coded into the system. For example, the combination of all green pixels in a video sequence of a football match will have a very high probability of occurrence.
  • the system may be arranged to determine the number of occurrences of each combination of pixels and allocate probabilities and identifying data accordingly, thereby ensuring that the minimum amount of data is used to represent every image or frame.
  • the combinations of pixels for which the identifying data has been changed would ideally need to be sent (together with the respective new identifying data) to the receiving user end terminal to update the code book so enabling it to decode the incoming data stream.
  • a large number of pixel combinations have changed their identifying data, then a large amount of additional data needs to be sent to update the receiver code book, with the possibility of actually increasing the amount of data required to transmit a sequence .
  • updated probability/ identifying data is only transmitted for combinations of pixels which occur most commonly.
  • the exact number of combinations which fit this classification may be user-defined or hard-coded into the system.
  • a predetermined number of combinations and their updated probabilities are transmitted, with all other probabilities set to an equal level.
  • the predetermined number of combinations and their updated probabilities are normalised against the original code probabilities.
  • an apparatus and/or method comprising two or more of the ten aspects of the invention as defined above.
  • the image data may comprise one or more still images, such as photographs, or moving images, such as video data made up of a plurality of frames.
  • FIGURE 1A is a schematic diagram to illustrate a first neural network training process
  • FIGURE 1B is a schematic diagram to illustrate a second neural network training process
  • FIGURE 2 is an image divided into equally sized pixel blocks
  • FIGURE 3 is a schematic block diagram of apparatus according to a first embodiment of the present invention
  • FIGURE 4 is a schematic block diagram of the code book and pixel block analysis/comparison means of the apparatus of Figure 3;
  • FIGURE 5 is an example of an image which can be encoded using different sized pixel blocks
  • FIGURE 6 is an image divided into different sized pixel blocks;
  • FIGURE 7 is a schematic block diagram of apparatus according to a second embodiment of the present invention.
  • FIGURE 8 is a schematic diagram of the combination of pixels required to depict a predetermined element of an image which occupies a single pixel in a 3 x 3 block of pixels;
  • FIGURE 9 is a schematic diagram of the combinations of pixels which may be required to depict a predetermined element of an image which occupies more than one pixel in a 3 x 3 block of pixels;
  • FIGURE 10 shows two adjacent frames of a video sequence
  • FIGURE 11 is a schematic block diagram of the operation of the code book and pixel block analysis/comparison means of Figure 4 when there has been a single pixel shift between adjacent frames of a video sequence;
  • FIGURE 12 is a schematic diagram of part of the apparatus according to an aspect of the present invention.
  • FIGURE 13 is a schematic diagram of a single master library and several smaller libraries for use in an embodiment of the invention.
  • FIGURE 14 is a schematic diagram illustrating a source block and a relational block of pixels
  • FIGURE 15 is a schematic diagram illustrating various pixel block transforms which can be used in an exemplary embodiment of the invention.
  • FIGURE 16 is a schematic diagram illustrating the method of locating delta blocks using run length which is employed in an exemplary embodiment of the invention
  • FIGURE 17 is a schematic diagram to illustrate the operation of an aspect of the invention.
  • FIGURES 18 - 21 are tables illustrating the operation of an aspect of the invention.
  • Neural networks generally comprise thousands of individual neurons, analogous to nerves in the human brain.
  • Each neuron is capable of only simple calculations, e.g. summing inputs and applying a threshold function to determine if the neuron will have an output.
  • the outputs of the neurons can be weighted and categorised to determine the best solution to an input problem.
  • An input P1 is received by a neural network, and the user trains the neural network that it should be classified as type X, with a tolerance of a certain radius.
  • a second input P2 is received and the user trains the neural network that it should be classified as type Y.
  • a third input P3 is received that is within the tolerance of P1; P3 is therefore classified as also being of type X.
  • This is known as the radial based function (RBF) approach.
  • the K Nearest Neighbour (KNN) approach may alternatively be used.
  • apparatus comprises a sampling circuit 20 which receives and samples a digital video signal 22.
  • the sampling rate of the sampling circuit 20 is user-defined in accordance with the data throughput capability of the network across which the video signal is to be transmitted.
  • a typical PAL video signal may have a rate of 25 frames per second, whereas a typical low speed communications network may only be capable of handling 5 frames per second. Therefore, in this case, every fifth frame would be sampled.
  • the sampled data is converted by a file formatter 24 into a format for inputting information into a neural network 26 for analysis.
  • the neural network 26 is pre-programmed with information such as image size and shape, colour depth and desired block size and shape, so that it divides the incoming image into a plurality of blocks for analysis using a code book 28.
  • the code book 28 stores a plurality of blocks 30, each of which has a different combination of pixels.
  • a block of pixels is input to the neural network 26, which then analyses the block and compares the combination of pixels with the combinations stored in the code book 28.
  • the combinations held in the code book are set with tolerance levels, so that incoming blocks do not need to be identical to the stored combinations in order to be considered to match.
  • the neural network 26 identifies the stored combination which most closely matches the block 33 under analysis, within the predetermined tolerance level, and outputs a data stream 34 corresponding to that combination to a data buffer 37.
  • the block 33 being analysed most closely matches Block 3 in the code book, so the neural network outputs 0011 which is the binary representation of the number 3.
  • the input blocks being analysed will not fall within the tolerance levels of any of the blocks stored in the code book 28.
  • the unrecognised block is dynamically added (at 35) to the code book 28, together with a unique identifying number as well as the relevant red, green and blue or grey scale colour information for each pixel in the block.
  • the newly-stored block remains stored in the code book 28, complementing the other entries, for use when categorising other, subsequent blocks.
  • the new block information stored in the code book 28 is also duplicated and sent to a file formatter 36, where it is merged with the identifying data relating to each frame in a sequence output by the neural network 26 as a result of the analysis process.
  • the output file output by the file formatter 36 is added to a server 38 for transmission over a data or telecommunications network 40 at the request of an end user.
  • the formatted data held on the server 38 is transmitted across the network 40 to a data buffer 42 in the end user's terminal device, which may be a personal computer, games console, mobile telephone or other multimedia device.
  • the end user's terminal device includes software adapted to process and view the video stream produced by the apparatus.
  • the software includes a duplicate (code book 44) of the code book 28 used to classify each block of pixels during the encoding process.
  • the incoming data is examined for any additional code book information, generated by unrecognised blocks in the encoding process.
  • the information is used (at 46) to update the code book (44) and maintain synchronism between the code books 28,44 in the encoding and decoding processes respectively.
  • the received data is analysed on an identifying code-by-code basis. For a specific code, e.g. 0011, the terminal simply looks up the block information held in the code book 44 (at 48) for that code and determines the pixel information necessary for display. This process is applied across the entire input stream to reconstitute each frame (at 50) and, in time, the complete sequence of frames.
  • the sequence is displayed on a screen 52. It is important to note that the encoding process uses the Neural Network for significant levels of processing to encode the video stream, but the terminal device is simply required to look up the pixel information and does not require a Neural Network to decode the stream.
  • One of the key advantages of using the approach of the present invention to video streaming is that it is only necessary to send a digital code to uniquely identify a block of pixels, instead of detailed colour information. This can lead to a significant reduction in the amount of data to be transmitted.
  • the streaming solution can be tailored so that adjacent frames are analysed to see which blocks have changed, corresponding to motion in the sequence. It is only necessary to send information on the blocks that have changed between frames rather than the entire frame of blocks, which leads to further reductions in data transmission.
  • the quantity of data transmission can be yet further reduced by a second embodiment of the present invention.
  • the Neural Network can be programmed to identify areas where larger blocks can be used instead of smaller ones. This may be achieved by analysing a frame in parallel across two Neural networks analysing the smaller blocks 12 in a frame 10 using one of the networks and analysing the larger blocks 60 using the other neural network or the same neural network which switches between different code books can be used for both block sizes.
  • Figure 7 shows an example of two neural networks analysing an input frame using different sized code books. A frame in a video 10 is input to the two neural networks 26a, 26b.
  • code book 70 is made up of codes relating to n x n pixel blocks
  • code book 72 is made up of codes relating to 2n x 2n pixel blocks, where n is an integer greater than 1.
  • because the second code book 72 has larger blocks, it is statistically less likely to find an exact or close match to an input block, as the potential variation in a 2n x 2n block is very much higher than in an n x n block, even though the tolerance levels for the 2n x 2n blocks can be set higher than for n x n blocks. As a result, coding the entire frame at 2n x 2n would not give as good a picture as at n x n, although there will be cases where close matches to a 2n x 2n block might be found.
  • the two neural networks 26a, 26b analyse the frame 10, in parallel.
  • the n x n analysis is carried out as described with reference to Figure 4 of the drawings using code book 70, and the output from the n x n analysis is stored (at 78). If the 2n x 2n analysis, which is carried out as described with reference to Figure 4 using code book 72, does not identify any suitable blocks, then only the n x n blocks are used for encoding the frame. If, however, there are suitable 2n x 2n blocks which fall within the tolerance levels set for code book 72, the corresponding codes are used to replace (at 80) the relevant codes for the n x n blocks occupying the same physical location in the frame. The revised combination of n x n and 2n x 2n codes is then output at 82.
  • the end user's terminal device includes both the n x n and the 2n x 2n code books, so that it can reconstitute each frame of a video signal and output it as described with reference to Figure 2.
  • both code books can be updated with new pixel blocks if necessary, as previously described.
  • n x n and 2n x 2n block coding may have ramifications for the layout of the blocks in the frame. Therefore, in order to ensure that the image is reconstructed correctly, it may be necessary to add location information to the 2n x 2n blocks to exactly identify the co-ordinates that the block should occupy in the frame.
  • the following relates to means for identifying and encoding important contextual objects in frames of a video sequence.
  • the ball can appear in any one of 9 positions in the block.
  • the code book which acts as the look-up table for the blocks in the neural network system is taught each of nine different options 84a-84i for the ball against a green (grass) background, as shown in Figure 8.
  • the following also offers the potential for inter-frame compression by only requiring the transmission of blocks that have changed between adjacent frames.
  • Figure 10 shows adjacent frames from a video sequence. While they might look identical, the image in the second frame 88 is one pixel higher in the vertical direction than the image in the first frame 90. If the image is being coded into blocks of 3 pixels by 3 pixels using a neural network, the pixel information in block 92 is shifted relative to the same block 94 in the previous frame.
  • This aspect of the invention includes apparatus and a technique for compensating for the shift in pixels in any direction, thereby optimising the frame sequence for inter- frame compression and avoiding unnecessary data transfer.
  • Changes in the block structure of adjacent frames will usually be as a result of some change in the scene of video, e.g. changing from one camera angle to another.
  • This embodiment of the invention performs some simple testing to see if there is evidence of an unstable source.
  • the neural network performs a series of tests on a frame, shifting it by one or more pixels in one or both of the x and y directions, re-encoding and comparing with the previous frame .
  • Figure 12 shows the sequence of actions to achieve this solution in more detail.
  • a frame (n) 20 is input to the neural network 22 and encoded using the code book 28.
  • a test is performed to determine whether a given number of blocks have changed since the previous frame (n-1).
  • the threshold for this comparison may be under the control of the user or hard-coded into the system. If fewer than the threshold number of blocks have changed, the codes representing the changed blocks are output at 102.
  • the system effectively concludes that the reason is due to change of scene and not due to an unstable image.
  • the system then codes all blocks in frame n without any pixel shifts and moves on to the next frame.
  • the following relates to a means of flexible management of the code books required to encode a frame in a video sequence using neural networks .
  • the neural network 26 compares the input block 33 with every block held in the code book 28.
  • the neural network can determine that Block 3 is the best match.
  • the output of the process is the identifier of the specific block in the code book, which in this case is 0011, the binary representation of the number three.
  • the neural network may impose a practical limit on the number of blocks that can be analysed simultaneously. It is usually not possible to encode an entire video sequence using a single code book of a few thousand entries, which may be the limitation of the neural network.
  • This aspect of the invention provides multiple code books for use in a sequence of video to provide an overall reference block count of many tens of thousands, while at the same time keeping the data addressing requirements of the code books to a minimum. In this way the video quality can be optimised without compromising the limited data capabilities of some network connections.
  • An exemplary embodiment of this aspect of the invention uses a master library 110 of codes and then chooses blocks from this library to make up several smaller code books.
  • Figure 13 shows an example of a master library of blocks of 3 x 3 pixels. In this example it is assumed that there are up to 65,000 entries in the master library, and as such 16-bit addressing is required to uniquely identify each block.
  • a number of smaller libraries 112 are built using codes from the master library 110. This process need not be done in real time. In fact the choice of codes making up a library may be hard-coded into the encoding and decoding software.
  • the object of the smaller code books 112 is to provide a translation between a subset of the master library 110 and the library itself.
  • the code book uses 12-bit addressing and then each entry is translated to a 16-bit address to uniquely identify the block in the master library 110.
  • with 12-bit addressing, each code book 112 would have up to 4,096 entries.
  • when encoding a frame of video, the neural network uses the 12-bit code book 112 for each input block, and only 12 bits to encode each block.
  • the decoder needs to know which code book to use for decoding. This information can be set at the start of each frame. Allocating 8 bits in each frame would allow for 256 code books, each drawn from the population of the master library 110.
  • it is possible to develop code books for certain types of video and instruct the neural network to use these when possible.
  • a football match may have a typical code book, or a scene featuring a particular palette of colours may be more suitable for encoding using one particular code book over another.
  • the apparatus may be arranged so as to try one or more code books for every frame. In this way the best image quality for the frame can be achieved.
  • switching code books for every frame may, however, remove the potential for inter-frame compression, where only codes identifying blocks which have changed are transmitted.
  • Applying the code book management process as described above is complementary to the dynamic code book management described above where unknown blocks are added to the code book at both the encoder and decoder.
  • the system can compare it against the master library (or one or more of the other smaller code books), and only send the combination of pixels itself and newly-allocated identifying data to update the receiver code book if it cannot be matched anywhere.
  • the colour information for all of the pixels in the master library makes it a very large file, especially if it is made up of several hundred thousand or more blocks.
  • a library of say 1 million blocks each of
  • the first type is a source block which has the same colour and pixel data as described above.
  • the second type is a relational block derived from the source block.
  • the pixel and colour data for this block is not stored, but is instead derived by applying a transform to the relevant source block.
  • the block on the left is a source block. Its pixel and colour data will be stored in the library.
  • the relational block on the right is the same block rotated through 90 degrees. Its colour data can be derived by applying a known transform to the correct source block.
  • inter-frame compression is achieved by comparing adjacent frames in a sequence and only sending information on blocks that have changed between the frames. These are known as Delta Blocks.
  • each delta block must have an 8-bit address to uniquely identify its location in the frame.
  • the delta frame is made up as shown in Figure 16. (The numbers used below are low by neural network standards, but they provide a convenient basis for the explanation.)
  • the probability of a block appearing in an input frame can be determined by experiment, and this information may be held permanently in the encoder and decoder. Consider the situation where 30 codes are used as shown in Figure 18, along with the probability of occurrence expressed as a percentage at 120. For ease of explanation, in this case the probability decreases as the code number increases. If an input frame is encoded and the occurrence of the codes broadly matches the stored probabilities, then statistical compression is likely to offer significant advantages in the reduction of the amount of data required to transmit the information.
  • One solution to the problem would be to completely refresh the probability information in the code book with the new data and send this information to the end user for updating the probability information used in decompression.
  • this aspect of the invention takes an intermediate course of action, in which updated probability information is sent only for codes that occur commonly.
  • the exact number of codes that fit this classification can be determined by the user.
  • a predetermined process (see below) deals with the other codes; this process is followed by both the compressor and decompressor, ensuring that synchronisation of the probability data is maintained.
  • the black blocks are delta blocks, i.e. those that have changed since the previous frame.
  • the first block that is sent is a reference block (shown in grey) occupying the top left corner of the frame. From this point on, blocks are written and read from left to right.
  • the first delta block is nine places to the right of the reference block. This is run length R1. It is only necessary to send this number, rather than an x, y coordinate, to identify the location of the block. After sending the code for the first delta block, the run length to the next delta block (R2) is sent.
  • the technique can be applied from any point so long as it is consistently applied.
  • the following relates to a means of further compressing the encoded data stream output from a neural network when used to encode video for streaming over a network, as described above.
  • This aspect of the invention concerns means of applying statistical compression techniques to the output of the encoding process by including information in the data stream concerning the probability of appearance of a particular code book entry. This data can then be used for the encoding and decoding process to maximise potential for statistical compression.
  • large code books or libraries may be split into smaller more manageable sized code books to match the limitations of the neural network. In this way it is possible to develop different code books for different applications. For example a specific palette of codes may be used to create a code book for a video such as a soccer match.
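
The run-length scheme for locating delta blocks (Figure 16) can be sketched in a few lines. This is an illustrative sketch only, not part of the patent disclosure; the function names and sample block positions are invented for the example:

```python
# Sketch of locating delta blocks by run length: starting from a reference
# block, only the number of blocks skipped before each changed block is
# sent, instead of an x, y coordinate (illustrative names and values).

def run_lengths(changed_indices):
    """changed_indices: raster-order positions of delta blocks; return the
    run length from the reference block (position 0) and between deltas."""
    runs, last = [], 0
    for idx in changed_indices:
        runs.append(idx - last)   # blocks skipped since the previous one
        last = idx
    return runs

def positions(runs):
    """Inverse operation: recover raster positions from the run lengths."""
    out, pos = [], 0
    for r in runs:
        pos += r
        out.append(pos)
    return out

deltas = [9, 14, 30]              # e.g. first delta nine places after ref
print(run_lengths(deltas))        # [9, 5, 16]
print(positions([9, 5, 16]))      # [9, 14, 30]
```

The decoder simply accumulates the run lengths to recover each delta block's position, so both ends stay synchronised without coordinates ever being sent.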

Abstract

Apparatus for processing image data, comprising means (26) for dividing an incoming image into a plurality of blocks or sections made up of a plurality of pixels, code book means (28) for storing a plurality of blocks or sections having different predetermined combinations of pixels, each of the combinations being associated with unique identifying data, means (26) for comparing each block or section of the image with the stored blocks or sections and identifying the combination of pixels from the code book means (28) which substantially matches (either exactly or to within a predetermined tolerance level) the combination of pixels in the block or section of the image, and means (26) for outputting the identifying data (34) associated with the matching combination. Apparatus is also provided for receiving and decoding image data, comprising means for receiving unique identifying data corresponding to combinations of pixels making up an image, code book or memory means for storing a plurality of blocks or sections having different predetermined combinations of pixels and the identifying data corresponding to each combination, means for comparing the incoming identifying data with the identifying data stored in the code book means, means for identifying the pixel block or section to which the data corresponds, and means for outputting the respective pixel block or section.

Description

Video Data Transmission
This invention relates to a method and apparatus for transmitting image data in real time across a telecommunications or data network, and in particular, but not exclusively, to a method and apparatus for transmitting or "streaming" such image data across a low speed network, such as the Internet, although it may be equally applied to high and low speed connections.
Image compression and transmission and video streaming techniques generally are well known. In principle, an image signal such as a video signal is sampled, quantized and digitally encoded before transmission across a network. The end user's terminal device decodes the incoming signal, inverse quantizes it and processes it to reproduce the input video signal. However, most Internet users are connected via modems that operate at a maximum of 56kb/s. Real time video data, in order to preserve its image quality and frame rate, must be sampled at a relatively high rate, and the quantization step size must be such that the resultant video stream needs to be received and processed at a rate much greater than 56kb/s. As a result, substantial quantities of the video data are lost or distorted when it is transmitted across a low speed network, such that image quality and frame rate are significantly reduced.
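The scale of the mismatch can be illustrated with a rough calculation. The resolution and colour depth below are assumptions chosen for illustration, not figures from the specification:

```python
# Rough raw-bitrate estimate for uncompressed video (illustrative values).
width, height = 176, 144          # QCIF resolution, assumed for the example
bits_per_pixel = 24               # 8 bits each for red, green and blue
frames_per_second = 25            # PAL frame rate

bits_per_frame = width * height * bits_per_pixel
raw_bitrate = bits_per_frame * frames_per_second   # bits per second

modem = 56_000                    # 56 kb/s modem connection
print(raw_bitrate)                # 15206400 b/s, i.e. roughly 15 Mb/s
print(raw_bitrate / modem)        # several hundred times the modem rate
```

Even a small frame at a modest frame rate exceeds a 56kb/s connection by more than two orders of magnitude, which is why heavy data reduction is needed before streaming.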
US Patent No. 5,638,125 describes a method of compressing video data and transmitting it, with the aim of preserving picture quality of the received signals. This is achieved by dividing each image into blocks and using a neural network to vary the quantization step size for each block according to variable parameters such as image complexity and luminance. However, large quantities of colour and image data are still required to be transmitted at a high rate, unsuitable for many low speed networks .
US Patent No. 5,828,413 describes a system for compressing and transmitting moving image data, particularly suitable for use in video conferencing and video telephony where the image data is required to be transmitted in real-time across a low speed connection. The transmitter splits a set of image frames into a number of "superblocks", each including the data or signal representing a plurality of pixels. The individual superblocks are processed into a set of vectors which are encoded by means of vector quantization. A dedicated code book is created for each superblock based on localised recent history of previous frames, each code book consisting of a number of "blocks" which are composed of three-dimensional vectors representative of image data from a predetermined number of previous frames. The code books are duplicated at the receiver so that some identical sections of adjacent sets of frames can be reproduced at the receiver without having to re-transmit all of the image data across the connection. However, a substantial amount of image data still needs to be transmitted to achieve a good quality image at the receiver end because the code books are only useful for reproducing sections of an image which have not changed between adjacent frames. Further, the code books are updated periodically (for example, every second) which requires the periodic transmission of the same image data to update the receiver code books. Thus, large quantities of colour and image data are still required to be transmitted at a high rate, unsuitable for many low speed networks.
We have now devised an arrangement which seeks to overcome all of the problems outlined above and provide a method and apparatus for encoding and decoding image data, whether still or moving, making it suitable for transmission across a low speed network, but which is also applicable to higher speed connections.
In accordance with a first aspect of the present invention, there is provided apparatus for processing image data, comprising means for dividing an image frame into blocks or sections made up of a plurality of pixels, code book or memory means for storing a plurality of blocks or sections having different predetermined combinations of pixels, each of said combinations being associated with unique identifying data, means for comparing each block or section of said image with said stored blocks or sections and identifying a combination of pixels from said code book means which substantially matches (either exactly or to within a predetermined tolerance level) the combination of pixels in said block or section of said image, and means for outputting the identifying data associated with said matching combination.
Also in accordance with the first aspect of the present invention, there is provided a method of processing image data comprising the steps of dividing an image frame into blocks or sections made up of a plurality of pixels, storing a plurality of said pixel blocks or sections having different predetermined combinations of pixels together with corresponding unique identifying data associated with each combination, comparing each block or section of said image with said stored blocks or sections, identifying a stored combination of pixels which substantially matches (either exactly or to within a predetermined tolerance level) the combination of pixels in said block or section of said image, and outputting the identifying data associated with the matching combination.
Thus, a particular combination of pixels is identified by a relatively small amount of data, and no image information is lost even after transmission across a low speed network. Each block or section of each frame is preferably identified by short bit streams which are transmitted across the network. The blocks or sections are reproduced by the receiving node using the received bit streams and a matching code book.
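As an illustration of the matching step, the following sketch uses a brute-force distance search in place of the neural network described in the specification; the function names, the distance measure and the tolerance value are all assumptions made for the example:

```python
# Sketch of the code-book matching step (a brute-force stand-in for the
# neural network described in the text; all names are illustrative).

def encode_block(block, code_book, tolerance):
    """Return the identifier of the stored block closest to `block`,
    or None if nothing matches within `tolerance`."""
    best_id, best_dist = None, None
    for ident, stored in code_book.items():
        # Sum of absolute pixel differences as a simple distance measure.
        dist = sum(abs(a - b) for a, b in zip(block, stored))
        if best_dist is None or dist < best_dist:
            best_id, best_dist = ident, dist
    if best_dist is not None and best_dist <= tolerance:
        return best_id
    return None

# A single 3 x 3 grey-scale block flattened to 9 values.
code_book = {3: [10, 10, 10, 10, 200, 10, 10, 10, 10]}
print(encode_block([10, 10, 10, 12, 198, 10, 10, 10, 10], code_book, 16))  # 3
print(encode_block([90] * 9, code_book, 16))                               # None
```

The `None` result corresponds to the unrecognised-block case, where the combination would be added to the code book with a new identifier and also sent to the receiver.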
The image data may comprise one or more still images such as photographs, or moving images, such as video data made up of a plurality of frames.
The apparatus beneficially also includes means for identifying a combination of pixels which is not stored in the code book means, allocating unique identifying data to said combination and storing said combination and identifier in the code book means. The new combination and identifier is also transmitted across the network for dynamic updating of the code book stored in the user end terminal.
Thus, the only data which is transmitted is the unique identifiers of all pixel blocks or sections and any newly-identified combinations, thereby substantially reducing the colour and image data being transmitted. The pixel blocks or sections may comprise a square or rectangle consisting of a predetermined N by M number of pixels where N and M are integers, and more preferably greater than 2. The integers N and M may be different or they may be equal to each other. However, the sections need not be blocks, i.e. square/rectangular - they may be abstract shapes. In one specific embodiment, N = M = 3 for all blocks. The apparatus beneficially includes means, preferably in the form of feedback means, for identifying areas of an image where attention to detail is relatively unimportant, such as relatively large areas of the same colour, and means for combining a plurality of blocks or sections into a single larger block or section of pixels for comparison against said blocks or sections stored in another code book. Alternatively, of course, the apparatus could be arranged to analyse an image, dividing only detailed areas into the relatively small blocks or sections of pixels, and the rest into larger blocks or sections. In both cases, the code book means comprises a first code book for storing a plurality of combinations of pixel blocks or sections (and their corresponding identifiers) and another code book for storing a plurality of combinations of the larger blocks or sections and their corresponding identifiers.
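The substitution of larger blocks can be sketched as follows. This is an illustrative model only: it assumes a frame's n x n codes are keyed by block position, and the plain dictionary manipulation stands in for the neural network analysis described above:

```python
# Sketch of the two-code-book idea: encode a region at n x n, then replace
# groups of four codes with a single 2n x 2n code where the larger code
# book finds a match (illustrative, not the patented neural network).

def merge_large_blocks(small_codes, large_matches):
    """small_codes: dict (row, col) -> n x n code.
    large_matches: dict (row, col) -> 2n x 2n code covering the four small
    blocks whose top-left small block is at (row, col)."""
    out = dict(small_codes)
    for (r, c), big_code in large_matches.items():
        # Drop the four n x n codes this larger block covers...
        for dr in (0, 1):
            for dc in (0, 1):
                out.pop((r + dr, c + dc), None)
        # ...and record the single 2n x 2n code with its location.
        out[(r, c)] = ("large", big_code)
    return out

small = {(0, 0): 7, (0, 1): 7, (1, 0): 7, (1, 1): 7, (0, 2): 9}
print(merge_large_blocks(small, {(0, 0): 42}))
# five codes are reduced to two: one large code plus one remaining small code
```

Tagging the surviving large code with its position reflects the need, noted below, to send location information with the 2n x 2n blocks.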
As a result, even less data characterising a complete image or sequence of images or frames needs to be transmitted across the network. The apparatus preferably includes means for comparing corresponding pixel blocks or sections from adjacent images or frames in a sequence, means for identifying any differences in the blocks or sections (corresponding, for example, to motion in the sequence) and only sending identifying data corresponding to blocks or sections in a frame which have changed.
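The inter-frame (delta) scheme can be sketched as a simple comparison of the code streams for adjacent frames. The names and data below are invented for illustration:

```python
# Sketch of inter-frame compression: only codes for blocks that changed
# between adjacent frames are kept for transmission (illustrative names).

def delta_codes(prev_frame, curr_frame):
    """Both frames are lists of block identifiers in raster order; return
    (index, code) pairs for positions whose code has changed."""
    return [(i, code)
            for i, (old, code) in enumerate(zip(prev_frame, curr_frame))
            if old != code]

prev = [1, 1, 2, 3, 1, 1]
curr = [1, 4, 2, 3, 1, 5]
print(delta_codes(prev, curr))   # [(1, 4), (5, 5)] -- 2 of 6 blocks sent
```

Here only two of the six block codes need to be transmitted; the receiver overwrites those positions in its copy of the previous frame.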
Preferably, analysis of each pixel block or section is carried out by one or more neural networks .
In accordance with a second aspect of the present invention, there is provided apparatus for receiving and decoding image data, comprising means for receiving unique identifying data corresponding to combinations of pixels making up an image, code book or memory means for storing a plurality of blocks or sections having different predetermined combinations of pixels and the identifying data corresponding to each combination, means for comparing the incoming identifying data with the identifying data stored in said code book means, means for identifying the pixel block or section to which said data corresponds, and means for outputting the respective pixel block or section.
Also in accordance with the second aspect of the present invention, there is provided a method of receiving and decoding image data, comprising the steps of receiving unique identifying data corresponding to combinations of pixels making up an image, storing a plurality of different predetermined combinations of pixels together with corresponding unique identifying data associated with each combination, comparing the incoming identifying data with the stored identifying data, identifying the combination of pixels to which said data corresponds, and outputting the respective combination of pixels.
The image data may comprise one or more still images, such as photographs, or moving images, such as video data made up of a plurality of frames. The apparatus also beneficially includes means for receiving pixel blocks or sections not stored in the code book means, together with corresponding unique identifying data, and updating said code book means accordingly.
Once again, two or more different block or section sizes may be utilised to identify an image, in which case, the apparatus beneficially includes separate code book means corresponding to each block or section size defined.
In accordance with a third aspect of the present invention, there is provided apparatus for processing image data comprising means for dividing an image into blocks or sections made up of a plurality of pixels, code book or memory means for storing a plurality of blocks or sections having different predetermined combinations of pixels, each of said combinations being associated with unique identifying data, means for comparing each block or section of said image with said stored blocks or sections, means for determining that a combination of pixels in said block or section of said image does not match (either exactly or to within a predetermined tolerance level) any of the predetermined combinations of pixels stored in said code book means, and means for storing said unmatched combination together with unique identifying data in said memory or code book means.
Also in accordance with the third aspect of the present invention, there is provided a method of processing image data comprising the steps of dividing an image into blocks or sections made up of a plurality of pixels, storing a plurality of blocks or sections having different predetermined combinations of pixels, each of said combinations being associated with identifying data, comparing each block or section of said image with said stored blocks or sections, determining that a combination of pixels in said block or section of said image does not match (either exactly or to within a predetermined tolerance level) any of the stored combinations of pixels and storing said unmatched combination together with unique identifying data. The newly-stored combination and its unique identifying data is preferably output together with the identifying data corresponding to other combinations of pixels in the image. In some images or video sequences, one or more key details may be essential in the sense that, if they are lost, the entire sequence loses its context. This is of particular concern when such key details occupy only a very small area of a frame. For example, in a video sequence of a football match, the ball is the key detail which defines all other activity in the sequence. When the ball is a relatively long distance from the camera, it may only occupy a single pixel in a frame.
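A minimal sketch of this encode-side behaviour: a block that matches no stored combination within tolerance is added to the code book with new unique identifying data. The sum-of-absolute-differences distance and the sequential integer identifiers are assumptions for illustration only.

```python
# Sketch of the dynamic code book update of the third aspect: unmatched
# input blocks are stored with a new unique identifier.

def encode_block(block, codebook, tolerance=0):
    """Return (identifier, newly_added) for an input block, adding the
    block as a new entry when nothing matches within `tolerance`."""
    for ident, stored in codebook.items():
        dist = sum(abs(a - b) for a, b in zip(stored, block))
        if dist <= tolerance:
            return ident, False           # matched an existing entry
    new_id = len(codebook)                # next sequential identifier (assumed)
    codebook[new_id] = tuple(block)       # dynamic code book update
    return new_id, True                   # entry was newly stored

book = {0: (0, 0, 0, 0), 1: (9, 9, 9, 9)}
ident, added = encode_block((9, 9, 9, 9), book)     # exact match
ident2, added2 = encode_block((5, 5, 5, 5), book)   # no match: new entry
```

The newly-stored entry and its identifier would then be output alongside the frame's identifying data, as the method describes.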
In accordance with a fourth aspect of the present invention, there is provided apparatus for processing image data, comprising means for dividing at least a portion of an image into blocks or sections made up of a plurality of pixels, code book or memory means for storing a plurality of blocks or sections having different predetermined combinations of pixels corresponding to the position of a predetermined element of said image relative to the other pixels in the block or section, each of said combinations being associated with unique identifying data, means for comparing each block or section of the image with said stored blocks or sections and identifying a block or section of pixels from said code book means which substantially matches (either exactly or to within a predetermined tolerance level) the combination of pixels in said block or section of said image, and means for outputting the identifying data associated with said matching combination.
Also in accordance with the fourth aspect of the present invention, there is provided a method of processing image data, comprising the steps of dividing at least a portion of an image into blocks or sections made up of a plurality of pixels, storing a plurality of blocks or sections having different predetermined combinations of pixels corresponding to the position of a predetermined element of said image relative to the other pixels in the block or section, each of said combinations being associated with unique identifying data, comparing each block or section of the image with said stored blocks or sections and identifying a block or section of pixels from said stored blocks or sections which substantially matches (either exactly or to within a predetermined tolerance level) the combination of pixels in said block or section of said image, and outputting the identifying data associated with said matching combination.
Thus, one or more key elements in a video sequence can be identified and the system "taught" to recognise the element within each frame of the sequence as it changes position.
Beneficially, at least the comparing and identifying steps are carried out by a neural network.
Preferably, each of the stored blocks or sections of pixels corresponds to a different position of said predetermined element relative to the other pixels in the block or section. Thus, if the predetermined element occupies only a single pixel in an image, the number of stored combinations will be the same as the number of pixels in a block or section, each of the combinations having the pixel occupied by the predetermined element in a different position relative to the rest of the pixels in the block or section. If, however, the predetermined element occupies, either partially or fully, more than one pixel, the number of stored combinations corresponding to a single block or section of the image will be greater.
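The enumeration described above, one stored combination per possible position of a single-pixel key element, can be sketched for a 3 x 3 block (nine combinations). The background and element pixel values are illustrative assumptions.

```python
# Sketch: for a key element occupying a single pixel, store one n x m
# combination per possible position of that pixel -- nine for a 3 x 3
# block, as the text explains.

def element_positions(n=3, m=3, background=0, element=1):
    """Generate every n x m block (row-major tuple) with the element
    pixel in a different position; len(result) == n * m."""
    combos = []
    for pos in range(n * m):
        block = [background] * (n * m)
        block[pos] = element        # element at a different position each time
        combos.append(tuple(block))
    return combos

combos = element_positions()        # nine distinct combinations
```

If the element partially or fully occupies several pixels, more combinations per block would be stored, as the paragraph notes.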
The stored combinations of pixels may be user-defined, or "learned" by the system, which identifies the predetermined element in an image or video sequence and then identifies and stores different combinations of pixels corresponding to the changing position of the predetermined element during the sequence.
Generally, and to minimise the amount of data which needs to be output, only information relating to areas of an image which have changed since a previous image or frame should ideally be output. However, a single pixel shift within an image, commonly caused for example by camera shake during shooting of a video sequence or error during the video post production process, can cause an entire frame to be recoded and re-transmitted, which unnecessarily increases the quantity of data being transmitted and reduces the quality of the image at the receiver because the pixel shift is actually an error which is transferred to the receiver.
In accordance with a fifth aspect of the present invention, there is provided apparatus for processing moving image data such as video data, comprising means for dividing an image frame into blocks or sections made up of a plurality of pixels, code book or memory means for storing a plurality of blocks or sections having different predetermined combinations of pixels, each of said blocks or sections being associated with unique identifying data, means for comparing each block or section of said image frame with said stored blocks or sections and identifying a combination of pixels from said code book means which substantially matches (either exactly or to within a predetermined tolerance level) the combination of pixels in said block or section of said image frame, and means for outputting the identifying data associated with said matching combination, the apparatus further comprising means for determining the number of blocks or sections in the image frame whose combination of pixels has changed from the previous frame and outputting only the identifying data associated with the changed blocks or sections if the number of changed blocks or sections is less than a predetermined number.
Also in accordance with the fifth aspect of the present invention, there is provided a method for processing moving image data such as video data, comprising the steps of dividing an image frame into blocks or sections made up of a plurality of pixels, storing a plurality of blocks or sections having different predetermined combinations of pixels, each of said blocks or sections being associated with unique identifying data, comparing each block or section of the image frame with the stored blocks or sections and identifying a combination of pixels from said code book means which substantially matches (either exactly or to within a predetermined tolerance level) the combination of pixels in said block or section of said frame, determining the number of blocks or sections in the frame whose combination of pixels has changed since the previous frame of a sequence, and outputting the identifying data associated only with the changed blocks if the number of changed blocks is less than a predetermined number.
The predetermined number is preferably user-defined as m%, where m is a positive number.
In a preferred embodiment, one or more feedback loops are preferably provided at the point where the number of changed blocks or sections is compared with the predetermined number or value. If the number of changed blocks or sections is less than the predetermined number, the system outputs the identifying data associated only with the changed blocks or sections. If, however, the number of changed blocks exceeds the predetermined number, the system may be arranged to shift all of the blocks or sections by one pixel in any direction and the number of changed blocks or sections may then be determined again. If the number is now less than the predetermined number, the identifying data associated only with the changed blocks or sections is output. If, however, the number of changed blocks still exceeds the predetermined number, the system may be arranged to shift the blocks or sections by one pixel in the opposite direction, and the process repeated, and so on.

The number of different pixel blocks or sections which are required to be stored in the code book may be tens of thousands, especially as the entries are built up after encoding several different types of image sequence. When performing the encoding process, it may require an unacceptable amount of processing time and/or capability to compare an input block or section against every block or section in a single code book, particularly as the size of the code book increases, and especially when dealing with a live sequence of video frames, taking into account the inherent restrictions on processing time. Furthermore, for a code book having 65,000 block or section entries, each block or section must be identified by a 16-bit stream, which can result in an excessive burden on the network over which the image data is to be transmitted, especially for low speed connections.
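The shift-and-recount feedback loop described above can be sketched as follows, under the assumptions that frames are row-major lists of pixel values and that only single-pixel horizontal shifts (one direction, then the opposite) are tried.

```python
# Sketch of the feedback loop: if too many blocks changed, try a
# single-pixel shift (e.g. to cancel camera shake) and recount.

def shift_frame(frame, width, dx):
    """Shift a row-major frame horizontally by dx pixels, padding with 0."""
    rows = [frame[i:i + width] for i in range(0, len(frame), width)]
    if dx > 0:
        rows = [[0] * dx + r[:-dx] for r in rows]
    elif dx < 0:
        rows = [r[-dx:] + [0] * -dx for r in rows]
    return [p for r in rows for p in r]

def count_changed(prev, curr):
    return sum(1 for a, b in zip(prev, curr) if a != b)

def best_shift(prev, curr, width, threshold):
    """Return (dx, changed) for the first shift whose change count falls
    below the threshold, trying no shift, then +1, then -1 pixel."""
    for dx in (0, 1, -1):
        changed = count_changed(prev, shift_frame(curr, width, dx))
        if changed < threshold:
            return dx, changed
    return 0, count_changed(prev, curr)   # no shift helped; recode fully

# A frame identical to the previous one but displaced left by one pixel:
prev = [1, 2, 3, 0,
        4, 5, 6, 0]
curr = [2, 3, 0, 0,
        5, 6, 0, 0]
dx, changed = best_shift(prev, curr, 4, threshold=3)
```

Here a one-pixel shift reduces the changed count from six pixels to two, so only the shifted delta need be coded rather than the whole frame.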
In accordance with a sixth aspect of the present invention, there is provided apparatus for processing image data, comprising means for dividing an image into blocks or sections made up of a plurality of pixels, first memory means for storing a plurality of blocks or sections having different predetermined combinations of pixels, each of said combinations being associated with unique identifying data, one or more second memory or code book means for storing a plurality of blocks or sections, which are at least a subset of the blocks or sections stored in the first memory means, together with unique identifying data associated with each of the blocks or sections in said subset, means for comparing each block or section of said image with the blocks or sections stored in the or one or more of the second memory means and identifying a combination of pixels from said second memory means which substantially matches (either exactly or to within a predetermined tolerance level) the combination of pixels in said block of said image, and means for outputting the identifying data associated with said matching combination.
Also in accordance with the sixth aspect of the invention, there is provided a method of processing image data, comprising the steps of dividing an image into blocks or sections made up of a plurality of pixels, storing in a first memory means a plurality of blocks or sections having different predetermined combinations of pixels, each of said combinations being associated with unique identifying data, storing in each of one or more second memory means a plurality of blocks or sections which are at least a partial subset of the blocks or sections stored in the first memory means together with unique identifying data associated with each of said blocks in said subset, comparing each block or section of said image with the blocks stored in the or one or more of the second memory means, identifying a combination of pixels from said second memory means which substantially matches (either exactly or to within a predetermined tolerance level) the combination of pixels in said block or section of said image, and outputting the identifying data associated with said matching combination.
Thus, the first memory means is a type of reference or master library with perhaps tens or even hundreds of thousands of blocks stored in it, each identified by a unique address having a large number of bits. The or each second memory means or code book may store subsets of reference library entries, together with addresses having fewer bits (because each code book has far fewer entries). The or each code book to be used in encoding an image or image sequence such as a video sequence may be user-defined for each image or sequence. However, there may be one or more pre-defined code books, especially for certain types of sequence, for example, sports matches, or for certain colour palettes or combinations. These ready-made code books can then be selected by the user or by the system itself based on one or more parameters entered or deduced from analysis of the image or sequence.
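The address-width saving from a small code book referencing a subset of the master library can be illustrated as follows; the library contents and the particular subset chosen are toy values, not from the specification.

```python
# Sketch: a small code book re-addresses a subset of master-library
# entries with short codes, so each block identifier needs fewer bits.

import math

master = {i: "block-%d" % i for i in range(65536)}   # 16-bit master addresses

# A task-specific code book referencing only 16 master entries:
subset_ids = [0, 7, 42, 99, 100, 255, 256, 1000,
              2048, 4096, 8192, 16384, 30000, 40000, 50000, 65535]
codebook = {short: master_id for short, master_id in enumerate(subset_ids)}

bits_master = math.ceil(math.log2(len(master)))      # 16 bits per block
bits_codebook = math.ceil(math.log2(len(codebook)))  # only 4 bits per block

def decode(short_code):
    """Translate a short code book address back to a master-library block."""
    return master[codebook[short_code]]
```

With 16 entries, each identifier in the transmitted stream needs only 4 bits instead of 16, a four-fold reduction, and far fewer comparisons are needed per input block.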
As stated above, in order to decode the encoded image data, the end user terminal should have a duplicate of the code book(s) used to encode the image. Thus, the apparatus beneficially includes means for determining that a combination of pixels in said block or section of said image does not match (either exactly or to within a predetermined tolerance level) any of the combinations of pixels stored in the or one or more of said second memory means and for comparing said combination of pixels in said block or section of said image with the blocks or sections stored in said first memory means, and means for outputting the identifying data associated with a block or section stored in said first memory means if that stored block or section substantially matches (either exactly or to within a predetermined tolerance level) the block or section of said image and for storing any combination of pixels which does not substantially match any of the stored blocks or sections, together with unique identifying data, in one or both of said first or second memory means. This newly-stored combination of pixels and associated unique identifying data is transmitted with the rest of the encoded data to update the code book in the end user terminal.
Further, in accordance with a seventh aspect of the present invention, there is provided apparatus for receiving and decoding image data, comprising means for receiving unique identifying data corresponding to combinations of pixels making up an image, first memory means for storing a plurality of blocks or sections having different predetermined combinations of pixels and identifying data corresponding to each combination, one or more second memory means for storing unique identifying data corresponding to a subset of the blocks or sections stored in said first memory means, means for comparing the incoming identifying data with the identifying data stored in the or a selected one or more of said second memory means, means for identifying the pixel blocks or sections to which said data corresponds, and means for outputting the respective pixel blocks or sections.
Also in accordance with the seventh aspect of the present invention there is provided a method of receiving and decoding image data, comprising the steps of receiving unique identifying data corresponding to combinations of pixels making up an image, storing in first memory means a plurality of blocks or sections having different predetermined combinations of pixels and identifying data corresponding to each combination, storing in one or more second memory means unique identifying data corresponding to a subset of the blocks or sections stored in said first memory means, comparing the incoming identifying data with the identifying data stored in the or a selected one or more second memory means, identifying the pixel blocks or sections to which said data corresponds, and outputting the respective pixel blocks or sections.
The actual pixel blocks or sections may also be stored in the second memory means, or they may be retrieved from the first memory means using the unique identifying data in the second memory means as a reference.
Information relating to the code book(s) used to encode an image or a frame of, for example, a video sequence may be transmitted at the start of each stream of identifying data relating to that image or frame, so that the receiver knows which code book to use to decode the data and constitute the image or frame. If more than one code book is used and information relating to new code book entries is also transmitted, it should preferably include information as to which code book (i.e. second memory means) it should be added to, to ensure correct duplication of code books in the receiver. The new entry would beneficially also be added to the reference library (first memory means) of the receiver.
Thus, smaller libraries can be built using codes from a master library, and this process does not need to be performed in real time. In fact, the choice of codes making up a library may be hard coded into the encoding and decoding software. The smaller code books provide a translation between a subset of the master library and the master library, but allow smaller bit streams to be used to identify each pixel block, and substantially reduce the processing time required for the comparison of a pixel block or section of an image or frame against the code book entries.
In one preferred embodiment of the present invention according to any one of the seven aspects, a library of different predetermined combinations of pixels is stored in the code book or memory means, the library comprising one or more source blocks consisting of colour and/or pixel data representing a predetermined combination of pixels, and one or more transform means which, when applied to a source block, produces a block having a different predetermined combination of the same pixels, thereby enabling a very large number of blocks to be stored in a relatively small memory capacity.
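A sketch of the source-block-plus-transform library of the preferred embodiment above; the particular transforms shown (90-degree rotation and horizontal mirror of a 3 x 3 block) are assumed examples of the transform means, chosen for illustration.

```python
# Sketch: a stored source block plus simple transforms stands in for
# many explicit code book entries, saving memory capacity.

def rotate90(block):
    """Rotate a 3 x 3 block (row-major 9-tuple) clockwise by 90 degrees."""
    b = block
    return (b[6], b[3], b[0],
            b[7], b[4], b[1],
            b[8], b[5], b[2])

def mirror_h(block):
    """Mirror a 3 x 3 block left-to-right."""
    b = block
    return (b[2], b[1], b[0],
            b[5], b[4], b[3],
            b[8], b[7], b[6])

source = (1, 2, 3,
          4, 5, 6,
          7, 8, 9)

derived = rotate90(source)   # a distinct block, stored only as a transform
```

One stored source block plus a set of such transforms yields several distinct library entries at the cost of a single block's pixel data.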
It would, of course, be advantageous to further compress the encoded data stream to minimise the data required to represent an image or video sequence to ensure that the maximum picture size and frame rate can be achieved for a connection of fixed speed.
Known statistical compression techniques such as Huffman and Arithmetic coding, and their derivatives, rely on knowledge of the probability of a particular sequence of data appearing in a data stream. The sequence with the highest probability of occurrence is coded with the least number of bits. For example, in the English language the letter E appears most frequently. In a statistical compression process the letter E would be encoded with the least number of bits as it has to be coded most often. A letter such as Z which appears relatively infrequently would be encoded with more bits.
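The letter-frequency principle above can be illustrated with a minimal Huffman construction over toy frequencies; this is a generic textbook coder, not the patent's own scheme, and the frequency values are invented for the example.

```python
# Sketch: a minimal Huffman code-length calculation. The most frequent
# symbol ('E') receives the shortest code; the rarest ('Z') the longest.

import heapq

def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code built
    from a {symbol: frequency} table."""
    # Heap entries are (frequency, tiebreak, {symbol: depth}).
    heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        fa, _, a = heapq.heappop(heap)      # two least frequent subtrees
        fb, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (fa + fb, counter, merged))
        counter += 1
    return heap[0][2]

lengths = huffman_code_lengths({"E": 120, "T": 90, "A": 80, "Z": 1})
```

With these frequencies, E is assigned a 1-bit code while Z needs 3 bits, exactly the trade-off the paragraph describes.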
In an image streaming system for encoding image data, the probability of a particular combination of pixels appearing in a single image or sequence of frames may be variable, which minimises the benefit of statistical compression.
We have now devised an arrangement which enables an image data stream to be compressed using statistical compression techniques.
In accordance with an eighth aspect of the present invention, there is provided apparatus for processing image data, comprising means for dividing an image into blocks or sections made up of a plurality of pixels, memory means for storing a plurality of blocks or sections having different predetermined combinations of pixels, each of said combinations being associated with unique identifying data, means for comparing each block or section of said image with the blocks or sections stored in the memory means and identifying a combination of pixels which substantially matches (either exactly or to within a predetermined tolerance level) the combination of pixels in said block or section of said frame, and means for outputting the identifying data associated with said matching combination, wherein one or more of said stored combinations of pixels is allocated a probability of occurrence in an image or sequence of images, the unit length of said unique identifying data associated with said at least one stored combination being dependent on the probability of occurrence in an image or sequence of images of said combination of pixels.
Thus, the combinations of pixels which are most likely to occur, i.e. those expected to occur most often, can be allocated unique identifying data comprising a relatively short bit stream, and the combinations less likely to occur can be allocated longer bit streams.
Also in accordance with the eighth aspect of the present invention, there is provided a method of processing image data, comprising the steps of dividing an image into blocks or sections made up of a plurality of pixels, storing a plurality of blocks or sections having different predetermined combinations of pixels, each of said combinations being associated with unique identifying data, comparing each block or section of said image with the stored blocks or sections and identifying a combination of pixels which substantially matches (either exactly or to within a predetermined tolerance level) the combination of pixels in said block or section of said image, and outputting the identifying data associated with said matching combination, wherein at least one of said stored combinations of pixels is allocated a probability of occurrence in an image or sequence of images, the unit length of said unique identifying data associated with said at least one stored combination being dependent on the probability of occurrence in an image or sequence of images of said combination of pixels.

The probabilities of occurrence of at least some of the stored pixel combinations may beneficially be stored along with the unique identifying data associated with each combination. The probabilities of occurrence of at least some combinations of pixels may be predefined, i.e. hard-coded into the system. For example, the combination of all green pixels in a video sequence of a football match will have a very high probability of occurrence. In addition, or alternatively, once an image frame or sequence has been encoded, but before it is transmitted, the system may be arranged to determine the number of occurrences of each combination of pixels and allocate probabilities and identifying data accordingly, thereby ensuring that the minimum amount of data is used to represent every image or frame.
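The post-encoding re-allocation described above can be sketched as follows: occurrences of each identifier are counted, and the commonest combinations receive the lowest (and hence shortest-codeable) identifiers. The mapping scheme is an illustrative assumption.

```python
# Sketch: after encoding, count each block identifier's occurrences and
# re-allocate identifiers so the most frequent combinations get the
# shortest bit streams.

from collections import Counter

def reallocate_ids(encoded_stream):
    """Map old identifiers to new ones ordered by descending frequency,
    so identifier 0 (the shortest code) goes to the commonest block."""
    by_freq = [ident for ident, _ in Counter(encoded_stream).most_common()]
    return {old: new for new, old in enumerate(by_freq)}

# An "all green" block (id 7) dominates a football-pitch frame:
stream = [7, 7, 7, 3, 7, 7, 1, 7, 3, 7]
mapping = reallocate_ids(stream)
```

Any identifiers changed in this way would need to be sent to the receiver to keep its code book synchronised, which is the trade-off the following paragraphs address.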
In this case, however, the combinations of pixels for which the identifying data has been changed would ideally need to be sent (together with the respective new identifying data) to the receiving user end terminal to update the code book so enabling it to decode the incoming data stream. Obviously, if a large number of pixel combinations have changed their identifying data, then a large amount of additional data needs to be sent to update the receiver code book, with the possibility of actually increasing the amount of data required to transmit a sequence .
Thus, in a preferred embodiment, updated probability/ identifying data is only transmitted for combinations of pixels which occur most commonly. The exact number of combinations which fit this classification may be user-defined or hard-coded into the system.
In one embodiment, a predetermined number of combinations and their updated probabilities are transmitted, with all other probabilities set to an equal level. In an alternative embodiment, the predetermined number of combinations and their updated probabilities are normalised against the original code probabilities.
In either case, provided the encoding and decoding systems handle the remaining combinations and probabilities in the same way, only the newly-identified highest probabilities need be transmitted.
Also in accordance with the present invention, there is provided an apparatus and/or method comprising two or more of the eight aspects of the invention as defined above.
In all cases, the image data may comprise one or more still images, such as photographs, or moving images, such as video data made up of a plurality of frames.
Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, in which:
FIGURE 1A is a schematic diagram to illustrate a first neural network training process;
FIGURE IB is a schematic diagram to illustrate a second neural network training process;
FIGURE 2 is an image divided into equally sized pixel blocks;
FIGURE 3 is a schematic block diagram of apparatus according to a first embodiment of the present invention;
FIGURE 4 is a schematic block diagram of the code book and pixel block analysis/comparison means of the apparatus of Figure 3;
FIGURE 5 is an example of an image which can be encoded using different sized pixel blocks;
FIGURE 6 is an image divided into different sized pixel blocks;
FIGURE 7 is a schematic block diagram of apparatus according to a second embodiment of the present invention;
FIGURE 8 is a schematic diagram of the combination of pixels required to depict a predetermined element of an image which occupies a single pixel in a 3 x 3 block of pixels;
FIGURE 9 is a schematic diagram of the combinations of pixels which may be required to depict a predetermined element of an image which occupies more than one pixel in a 3 x 3 block of pixels;
FIGURE 10 shows two adjacent frames of a video sequence;
FIGURE 11 is a schematic block diagram of the operation of the code book and pixel block analysis/comparison means of Figure 4 when there has been a single pixel shift between adjacent frames of a video sequence;
FIGURE 12 is a schematic diagram of part of the apparatus according to an aspect of the present invention;
FIGURE 13 is a schematic diagram of a single master library and several smaller libraries for use in an embodiment of the invention;
FIGURE 14 is a schematic diagram illustrating a source block and a relational block of pixels;
FIGURE 15 is a schematic diagram illustrating various pixel block transforms which can be used in an exemplary embodiment of the invention;
FIGURE 16 is a schematic diagram illustrating the method of locating delta blocks using run length which is employed in an exemplary embodiment of the invention;
FIGURE 17 is a schematic diagram to illustrate the operation of an aspect of the invention; and
FIGURES 18 - 21 are tables illustrating the operation of an aspect of the invention.
Neural networks generally comprise thousands of individual neurons, analogous to nerves in the human brain.
Each neuron is capable of only simple calculations, e.g. summing inputs and applying a threshold function to determine if the neuron will have an output. The outputs of the neurons can be weighted and categorised to determine the best solution to an input problem. Consider the simple two-dimensional space shown in Figure 1A. An input P1 is received by a neural network, and the user trains the neural network that it should be classified as "type X", with a tolerance of a certain radius. A second input P2 is received and the user trains the neural network that it should be classified as type Y. A third input P3 is received that is within the tolerance of P1; P3 is therefore classified as also being of type X. This is known as the radial basis function (RBF) approach. With more and more training it is possible to completely cover the two-dimensional space so that any input can be classified as being of a certain type.
An alternative to the RBF approach for mapping an n-dimensional space is the K Nearest Neighbour (KNN) technique. Referring to Figure 1B, a number of "prototype" blocks P1, P2, P3, P4 are stored, which are representative of different combinations of pixels, together with their fields of influence. An unknown input is classified by measuring the distance to the stored prototypes and choosing the one that is nearest. In this approach, the field of influence is no longer spherical as with the RBF technique described with reference to Figure 1A. Instead, the space shown in Figure 1B is mapped using four prototypes, each having a defined field of influence.
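The two classification schemes described with reference to Figures 1A and 1B can be sketched on two-dimensional points: the RBF approach accepts a prototype only within its radius of influence, while KNN simply picks the nearest stored prototype. The prototypes, radii and labels are toy values for illustration.

```python
# Sketch: RBF (radius-limited) versus KNN (nearest-prototype)
# classification over a simple two-dimensional space.

import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def classify_rbf(point, prototypes):
    """prototypes: [(centre, radius, label)]; None if outside all fields."""
    for centre, radius, label in prototypes:
        if dist(point, centre) <= radius:
            return label
    return None

def classify_knn(point, prototypes):
    """prototypes: [(centre, label)]; always returns the nearest label."""
    return min(prototypes, key=lambda p: dist(point, p[0]))[1]

protos_rbf = [((0.0, 0.0), 1.0, "X"), ((5.0, 5.0), 1.0, "Y")]
protos_knn = [((0.0, 0.0), "X"), ((5.0, 5.0), "Y")]

inside = classify_rbf((0.5, 0.5), protos_rbf)    # within P1's field
outside = classify_rbf((2.5, 2.5), protos_rbf)   # outside all fields
nearest = classify_knn((2.0, 2.0), protos_knn)   # nearest prototype
```

Note the practical difference: an RBF classifier can return "no match" (prompting a new code book entry), whereas KNN always returns the nearest prototype.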
One of the many applications of neural networks is pattern recognition. Neural networks can be trained to recognise patterns such as letters and digits. In this invention, such recognition has been further developed, with the neural network being trained to recognise and classify combinations of pixels that make up an image. Figure 2 shows an example of a frame 10 of a video sequence that has been split into component blocks 12. Each block can be any shape, size or combination of colours. However, in the following description, each block comprises 3 by 3 pixels.

Referring to Figure 3 of the drawings, apparatus according to an exemplary embodiment of the present invention comprises a sampling circuit 20 which receives and samples a digital video signal 22. The sampling rate of the sampling circuit 20 is user-defined in accordance with the data throughput capability of the network across which the video signal is to be transmitted. A typical PAL video signal may have a rate of 25 frames per second, whereas a typical low speed communications network may only be capable of handling 5 frames per second. Therefore, in this case, every fifth frame would be sampled.
The sampled data is converted by a file formatter 24 into a format for inputting information into a neural network 26 for analysis. The neural network 26 is pre-programmed with information such as image size and shape, colour depth and desired block size and shape, so that it divides the incoming image into a plurality of blocks for analysis using a code book 28.
Referring to Figure 4 of the drawings, the code book 28 stores a plurality of blocks 30, each of which has a different combination of pixels. A block of pixels is input to the neural network 26, which then analyses the block and compares the combination of pixels with the combinations stored in the code book 28. The combinations held in the code book are set with tolerance levels, so that incoming blocks do not need to be identical to the stored combinations in order to be considered to match. The neural network 26 identifies the stored combination which most closely matches the block 33 under analysis, within the predetermined tolerance level, and outputs a data stream 34 corresponding to that combination to a data buffer 37.
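The matching step of Figure 4 can be sketched as follows, with a sum-of-absolute-differences distance standing in for the neural network 26 and a fixed 4-bit identifier stream. Both the distance metric and the bit width are assumptions for illustration.

```python
# Sketch: find the stored combination closest to the input block and,
# if it lies within tolerance, output the block's identifier as a
# fixed-width bit stream (e.g. 0011 for Block 3).

def closest_match(block, codebook, tolerance):
    """Return (identifier, bit string) of the nearest stored block
    within tolerance, or (None, None) if no entry is close enough."""
    best_id, best_dist = None, None
    for ident, stored in codebook.items():
        d = sum(abs(a - b) for a, b in zip(stored, block))
        if best_dist is None or d < best_dist:
            best_id, best_dist = ident, d
    if best_dist is not None and best_dist <= tolerance:
        return best_id, format(best_id, "04b")
    return None, None

book = {1: (0, 0, 0, 0), 3: (8, 8, 8, 8), 5: (2, 2, 2, 2)}
ident, bits = closest_match((8, 8, 7, 8), book, tolerance=2)   # near Block 3
```

A (None, None) result corresponds to the unrecognised-block case, which triggers the dynamic code book update described in the next paragraph.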
In the example shown, the block 33 being analysed most closely matches Block 3 in the code book, so the neural network outputs 0011, which is the binary representation of the number 3. Referring back to Figure 3 of the drawings, in some cases the input blocks being analysed will not fall within the tolerance levels of any of the blocks stored in the code book 28. When this occurs, the unrecognised block is dynamically added (at 35) to the code book 28, together with a unique identifying number as well as the relevant red, green and blue or grey scale colour information for each pixel in the block. The newly-stored block remains stored in the code book 28, complementing the other entries, for use when categorising other, subsequent blocks. The new block information stored in the code book 28 is also duplicated and sent to a file formatter 36, where it is merged with the identifying data relating to each frame in a sequence output by the neural network 26 as a result of the analysis process. The file output by the file formatter 36 is added to a server 38 for transmission over a data or telecommunications network 40 at the request of an end user.
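The encode-side behaviour described above — match a block against the code book within a tolerance, otherwise add it dynamically — can be sketched as follows. The distance measure, the tolerance value and the identifier allocation scheme are illustrative assumptions, not the disclosed neural-network classification.

```python
# Illustrative sketch of tolerance-based block classification with dynamic
# code book growth. Blocks are flat lists of pixel values; the distance
# measure and tolerance are assumptions for illustration.

def block_distance(a, b):
    """Sum of absolute per-pixel differences between two flattened blocks."""
    return sum(abs(x - y) for x, y in zip(a, b))

def encode_block(block, code_book, tolerance=30):
    """Return the code of the closest stored block within tolerance;
    otherwise add the block to the code book and return its new code."""
    best_code, best_dist = None, None
    for code, stored in code_book.items():
        d = block_distance(block, stored)
        if best_dist is None or d < best_dist:
            best_code, best_dist = code, d
    if best_dist is not None and best_dist <= tolerance:
        return best_code
    new_code = len(code_book)          # next free identifier (simplified)
    code_book[new_code] = list(block)  # dynamic code book update
    return new_code
```

A block within tolerance of a stored entry returns that entry's code; an unmatched block is stored and is available when categorising subsequent blocks, mirroring the dynamic update at 35.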
When an end user connects to the server 38 over the network 40 and selects a video stream, the formatted data held on the server 38 is transmitted across the network 40 to a data buffer 42 in the end user's terminal device, which may be a personal computer, games console, mobile telephone or other multimedia device.
The end user's terminal device includes software adapted to process and view the video stream produced by the apparatus. The software includes a duplicate (code book 44) of the code book 28 used to classify each block of pixels during the encoding process.
The incoming data is examined for any additional code book information, generated by unrecognised blocks in the encoding process. The information is used (at 46) to update the code book (44) and maintain synchronism between the code books 28,44 in the encoding and decoding processes respectively.
The received data is analysed on a code-by-code basis. For a specific code, e.g. 0011, the terminal simply looks up the block information held in the code book 44 (at 48) for that code and determines the pixel information necessary for display. This process is applied across the entire input stream to reconstitute each frame (at 50) and, in time, the complete sequence of frames.
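The decode-side process is, as the text emphasises, only a look-up. A minimal sketch (the dictionary representation of code book 44 is an assumption):

```python
# Sketch of the decode side: no neural network is needed, only a look-up of
# each received code in the duplicate code book 44.

def decode_stream(codes, code_book):
    """Map each received code to its stored block of pixel values."""
    return [code_book[c] for c in codes]
```

The returned blocks are then laid out in order to reconstitute the frame for display.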
The sequence is displayed on a screen 52. It is important to note that the encoding process uses the Neural Network for significant levels of processing to encode the video stream, but the terminal device is simply required to look up the pixel information and does not require a Neural Network to decode the stream.
One of the key advantages of using the approach of the present invention to video streaming is that it is only necessary to send a digital code to uniquely identify a block of pixels, instead of detailed colour information. This can lead to a significant reduction in the amount of data to be transmitted.
Furthermore, the streaming solution can be tailored so that adjacent frames are analysed to see which blocks have changed, corresponding to motion in the sequence. It is only necessary to send information on the blocks that have changed between frames rather than the entire frame of blocks, which leads to further reductions in data transmission.
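The inter-frame comparison described above can be sketched as follows. The `(index, code)` output format is an illustrative assumption; the disclosure only requires that information be sent for changed blocks.

```python
# Sketch of inter-frame compression: compare the block codes of adjacent
# frames and emit only the blocks that changed, corresponding to motion.

def delta_codes(prev_codes, curr_codes):
    """Return (index, code) pairs for blocks that differ between frames."""
    return [(i, c) for i, (p, c) in enumerate(zip(prev_codes, curr_codes))
            if p != c]
```

For a largely static scene, most per-block codes repeat between frames and the delta list is short, which is the source of the further data reduction.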
The quantity of data transmission can be yet further reduced by a second embodiment of the present invention.
Referring to Figure 5 of the drawings, in general, the use of smaller blocks gives better picture quality. In the area of the frame 10 marked 1, picture quality is relatively much more important than in the areas marked 2 and 3, where the areas are of substantially the same colour or shade with relatively few details. Relatively small blocks should therefore be used to code area 1, but larger blocks can be used to code areas 2 and 3, thereby reducing the amount of data to be transmitted with little impact on picture quality. There are two ways in which the frame can be coded using different sized blocks. The first is through user control, and the second is through analysis by the neural network.
In the case of user control, it is possible for the user to define a minimum of two areas in the frame 10 and use larger blocks 60 in one area and smaller blocks 12 in another. Typically the smaller blocks 12 would be in the centre of the frame 10, as shown in Figure 6 of the drawings, and the larger blocks 60 toward the edges. Alternatively, the neural network can be programmed to identify areas where larger blocks can be used instead of smaller ones. This may be achieved by analysing a frame in parallel across two neural networks, with one network analysing the smaller blocks 12 in a frame 10 and the other analysing the larger blocks 60; alternatively, the same neural network, switching between different code books, can be used for both block sizes. Of course, many different sized blocks can be used to code a frame, depending on user requirements and specifications. Figure 7 shows an example of two neural networks analysing an input frame using different sized code books. A frame in a video 10 is input to the two neural networks 26a, 26b.
For the purpose of this example, code book 70 is made up of codes relating to n x n pixel blocks, and code book 72 is made up of codes relating to 2n x 2n pixel blocks, where n is an integer greater than 1.
As the second code book 72 has larger blocks, it is statistically less likely to find an exact or close match to an input block, as the potential variation in the block is very much higher for 2n x 2n pixels than for n x n pixels, even though the tolerance levels for the 2n x 2n blocks can be set higher than for n x n blocks. As a result, coding the entire frame at 2n x 2n would not give as good a picture as at n x n, although there will be cases where close matches to a 2n x 2n block might be found.
The two neural networks 26a, 26b analyse the frame 10 in parallel. The n x n analysis is carried out as described with reference to Figure 4 of the drawings, using code book 70, and the output from the n x n analysis is stored (at 78). If the 2n x 2n analysis, which is carried out as described with reference to Figure 4 using code book 72, does not identify any suitable blocks, then only the n x n blocks are used for encoding the frame. If, however, there are suitable 2n x 2n blocks which fall within the tolerance levels set for code book 72, the corresponding codes are used to replace (at 80) the relevant codes for the n x n blocks occupying the same physical location in the frame. The revised combination of n x n and 2n x 2n codes is then output at 82.
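The replacement step at 80 can be sketched as follows, under the assumption that each matched 2n x 2n block covers a 2 x 2 group of n x n blocks. The data representation (row-major small-block codes plus a dictionary of large-block matches keyed by the top-left small-block position) is an illustrative assumption, not the disclosed structure.

```python
# Sketch of merging the two analyses: where a 2n x 2n match was found, its
# single code replaces the four n x n codes covering the same area.

def merge_codes(small_codes, large_matches, blocks_per_row):
    """small_codes: one code per n x n block, row-major.
    large_matches: {(row, col) of top-left small block: 2n x 2n code}.
    Returns (remaining small codes with positions, large codes with positions)."""
    covered = set()
    for (r, c) in large_matches:
        covered |= {(r, c), (r, c + 1), (r + 1, c), (r + 1, c + 1)}
    small_out = [((i // blocks_per_row, i % blocks_per_row), code)
                 for i, code in enumerate(small_codes)
                 if (i // blocks_per_row, i % blocks_per_row) not in covered]
    large_out = sorted(large_matches.items())
    return small_out, large_out
```

Each large code carries its position, which corresponds to the location information discussed below for correct reconstruction.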
The end user's terminal device includes both the n x n and the 2n x 2n code books, so that it can reconstitute each frame of a video signal and output it as described with reference to Figure 2. Of course, both code books can be updated with new pixel blocks if necessary, as previously described.
It is important to recognise that the mixing of n x n and 2n x 2n block coding may have ramifications for the layout of the blocks in the frame. Therefore, in order to ensure that the image is reconstructed correctly, it may be necessary to add location information to the 2n x 2n blocks to identify exactly the co-ordinates that the block should occupy in the frame.
The following relates to means for identifying and encoding important contextual objects in frames of a video sequence.
Referring to Figure 8 of the drawings, for particular types of video it is possible to pay specific attention to certain objects. Consider again the example of a football in a video of a soccer match. When the football is far from the camera it might only occupy a single pixel on screen.
If the video sequence is being coded by a neural network with 3 x 3 pixel blocks, the ball can appear in any one of 9 positions in the block. In order to ensure that the ball is always in picture and not lost in the compression, the code book which acts as the look-up table for the blocks in the neural network system is taught each of nine different options 84a-84i for the ball against a green (grass) background, as shown in Figure 8.
As the ball can appear randomly in any location on screen, it is necessary to ensure that the nine blocks 84a-84i shown in Figure 8 appear in the neural network code book. For any input block with a football, one of these blocks can be used to identify the location of the ball on screen.
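The nine taught blocks of Figure 8 can be generated directly: a single bright pixel at each of the nine positions of a 3 x 3 block, against the background colour. The specific colour values are illustrative assumptions.

```python
# Sketch of generating the nine ball-position blocks 84a-84i of Figure 8:
# one white (ball) pixel per block, all other pixels grass green.
# Colour values are assumptions for illustration.

GREEN, WHITE = (0, 128, 0), (255, 255, 255)

def ball_blocks():
    """One 3 x 3 block per possible ball position, row-major, as flat tuples."""
    return [tuple(WHITE if i == pos else GREEN for i in range(9))
            for pos in range(9)]
```

Adding these nine entries to the code book guarantees that any input block containing the single-pixel ball has an exact match available.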
If the ball is nearer the camera it might take up more pixels, as shown at 86 of Figure 9, where the central white pixel is surrounded by some blurring. In this case it is necessary to learn the additional blocks shown in Figure 9, which represent all of the possible combinations of pixels for any pixel block including the ball.
This recognition approach and the teaching of the code book to look for key objects in the video sequence can of course be applied to a wide range of objects. The following relates to a means for accounting for slight pixel shifts between frames, caused by problems such as camera shake, when encoding streamed video.
The following also offers the potential for inter-frame compression by only requiring the transmission of blocks that have changed between adjacent frames.
For optimum compression between frames, little or no movement should be observed. While this might look to be the case upon casual observation, even changes as small as 1 pixel in any direction can cause the complete structure of a block or section pattern to change, causing the frame to be entirely re-coded and transmitted.
Figure 10 shows adjacent frames from a video sequence. While they might look identical, the image in the second frame 88 is one pixel higher in the vertical direction than the image in the first frame 90. If the image is being coded into blocks of 3 pixels by 3 pixels using a neural network, the pixel information in block 92 is shifted relative to the same block 94 in the previous frame.
As a result when coded using a neural network, as shown in Figure 11, the input block is shifted up by one pixel, with the result that a different code book entry provides the best classification and a different code is output. The situation is mirrored in all of the other blocks in the frame.
A single pixel shift has caused the entire frame to be re-transmitted at the cost of valuable data transmission capacity. These sorts of pixel-level discrepancies between adjacent frames are very common. Two of the main causes are slight camera shake when the video is being shot and errors at some stage in the video post-production process. This aspect of the invention includes apparatus and a technique for compensating for the shift in pixels in any direction, thereby optimising the frame sequence for inter-frame compression and avoiding unnecessary data transfer.
Changes in the block structure of adjacent frames will usually be as a result of some change in the scene of video, e.g. changing from one camera angle to another. This embodiment of the invention performs some simple testing to see if there is evidence of an unstable source. Briefly, to achieve this, the neural network performs a series of tests on a frame, shifting it by one or more pixels in one or both of the x and y directions, re-encoding and comparing with the previous frame .
Figure 12 shows the sequence of actions to achieve this solution in more detail. A frame (n) 20 is input to the neural network 22 and encoded using the code book 28. A test is performed to determine whether a given number of blocks have changed since the previous frame (n-1). The threshold for this comparison may be under the control of the user or hard-coded into the system. If less than the threshold number of blocks have changed, the codes representing the changed blocks are output at 102.
If, however, more than the threshold number of blocks have changed, it could be indicative of a change of scene, or an unstable source, perhaps caused by camera shake. The frame (n) is shifted by one pixel sequentially in each direction at 104, and re-encoded by the neural network at 106. If at any point in these cycles the number of blocks changing between frames n and n-1 falls below the threshold, the codes representing the blocks are output and the process moves on to frame n+1.
In this way it takes four feedback loops to test for a pixel's worth of movement in each direction. Similarly, eight or more feedback loops could be used for 2 or more pixel shifts in each direction.
If, after testing for the defined number of shifts in each direction the number of blocks changing is still above the threshold, the system effectively concludes that the reason is due to change of scene and not due to an unstable image. The system then codes all blocks in frame n without any pixel shifts and moves on to the next frame.
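The loop of Figure 12 can be sketched as below. The `encode` and `shift` helpers stand in for the neural-network encoding of a frame into block codes and for moving the image by whole pixels; both, and the candidate ordering, are illustrative assumptions.

```python
# Sketch of the shift-testing loop of Figure 12: try the frame unshifted,
# then shifted by one pixel in each direction; if nothing brings the
# changed-block count under the threshold, treat it as a scene change.

def compensate_shift(frame, prev_codes, encode, shift, threshold, max_shift=1):
    """Return (codes, shift_used). encode(frame) -> list of block codes;
    shift(frame, dx, dy) -> shifted frame."""
    candidates = [(0, 0)]
    for s in range(1, max_shift + 1):
        candidates += [(s, 0), (-s, 0), (0, s), (0, -s)]
    for dx, dy in candidates:
        codes = encode(shift(frame, dx, dy))
        changed = sum(1 for p, c in zip(prev_codes, codes) if p != c)
        if changed < threshold:
            return codes, (dx, dy)
    return encode(frame), (0, 0)   # scene change: code all blocks, unshifted
```

With `max_shift=1` this performs the four feedback loops described in the text; `max_shift=2` gives the eight-loop variant for two-pixel shifts.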
The following relates to a means of flexible management of the code books required to encode a frame in a video sequence using neural networks. To produce a reasonable representation of an image when encoded and decoded using the neural network approach, described above with reference to Figures 1 to 4 in particular, it may be necessary to have access to a very large number of reference blocks to choose from, in some cases over 100,000. This number is of course proportional to the size of the block of pixels. An image coded using 3 x 3 pixel blocks will need far fewer reference blocks than an image coded using 8 x 8 blocks. This is because larger blocks lead to far greater combinations of colours. As described with reference to Figure 4 of the drawings, an input block 33 is compared with all of the blocks held in a code book 28.
The neural network 26 compares the input block 33 with every block held in the code book 28. Working under rules established by the user and/or the system, the neural network can determine that Block 3 is the best match. The output of the process is the identifier of the specific block in the code book, which in this case is 0011, the binary representation of the number three. However, the neural network may impose a practical limit on the number of blocks that can be analysed simultaneously. It is usually not possible to encode an entire video sequence using a single code book of a few thousand entries, which may be the limit of the neural network. Even if it were possible to look up an input block against tens of thousands of reference blocks in a code book, the more reference blocks available, the more bits required to uniquely identify them, and as a result the data transmission requirement starts to increase. For example, for a code book with 256 entries, only 8 bits are required to uniquely identify each entry. For a code book with 20,000 blocks, 15 bits would be required for each code.
This aspect of the invention provides multiple code books for use in a sequence of video to provide an overall reference block count of many tens of thousands, while at the same time keeping the data addressing requirements of the code books to a minimum. In this way the video quality can be optimised without compromising the limited data capabilities of some network connections. An exemplary embodiment of this aspect of the invention uses a master library 110 of codes and then chooses blocks from this library to make up several smaller code books. Figure 13 shows an example of a master library of blocks of 3 x 3 pixels. In this example it is assumed that there are up to 65,000 entries in the master library, and as such 16-bit addressing is required to uniquely identify each block.
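The addressing arithmetic used in these examples can be checked with a one-line helper (the function name is illustrative):

```python
import math

# Sketch of the addressing arithmetic: the number of bits needed to uniquely
# identify every entry in a code book or master library.

def bits_required(entries):
    """Smallest number of address bits covering the given number of entries."""
    return max(1, math.ceil(math.log2(entries)))
```

This confirms the figures in the text: 8 bits for 256 entries, 15 bits for 20,000 blocks and 16 bits for a 65,000-entry master library.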
When performing the encoding process using a neural network, it may be impossible to compare an input block against every block in the master library due to the restrictions on processing time when dealing with a live sequence of video frames. This becomes much more likely as the size of the master library increases. Additionally, it may be undesirable to send 16 bits to identify each block, as this can result in an excessive burden for, in particular, low speed connections in a network.
Instead, a number of smaller libraries 112 are built using codes from the master library 110. This process need not be done in real time. In fact the choice of codes making up a library may be hard-coded into the encoding and decoding software.
The object of the smaller code books 112 is to provide a translation between a subset of the master library 110 and the library itself. In the example of Figure 13, the code book uses 12-bit addressing and each entry is then translated to a 16-bit address to uniquely identify the block in the master library 110. With 12-bit addressing, each code book 112 would have up to 4,096 entries. When encoding a frame of video, the neural network uses the 12-bit code book 112 for each input block, so only 12 bits are needed to encode each block. The decoder needs to know which code book to use for decoding. This information can be set at the start of each frame. Allocating 8 bits in each frame would allow for 256 code books, each drawn from the population of the master library 110. In this way it is possible to create code books for certain types of video and instruct the neural network to use these when possible. For example a football match may have a typical code book, or a scene featuring a particular palette of colours may be more suitable for encoding using one particular code book over another.
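The two-level addressing of Figure 13 can be sketched as below. Representing each sub-code-book as a list of master-library addresses is an illustrative assumption.

```python
# Sketch of two-level addressing: a frame carries an 8-bit code-book
# selector, then 12-bit codes; each sub-code-book translates a 12-bit code
# into a 16-bit master-library address (up to 4,096 entries per book).

def decode_addresses(book_id, codes_12bit, sub_books):
    """sub_books: {book_id: list of 16-bit master-library addresses}."""
    table = sub_books[book_id]
    return [table[c] for c in codes_12bit]
```

The decoder reads the book identifier at the start of the frame, then resolves each 12-bit code through that book into the full 16-bit library address.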
The benefit of this approach is that the number of bits required for addressing the codes is minimised, while maintaining the advantage of the large number of codes in the master library to choose from. If the neural network processing capability allows, the apparatus may be arranged so as to try one or more code books for every frame. In this way the best image quality for the frame can be achieved. However, there needs to be a balanced approach as changing code books for each frame may remove the potential for inter-frame compression, where only codes identifying blocks which have changed are transmitted.
Applying the code book management process as described above is complementary to the dynamic code book management described above, where unknown blocks are added to the code book at both the encoder and decoder. In this case, if a combination of pixels from a frame cannot be matched to an entry in one of the smaller code books, the system can compare it against the master library (or one or more of the other smaller code books), and only send the combination of pixels itself and newly-allocated identifying data to update the receiver code book if it cannot be matched anywhere.
The colour information for all of the pixels in the master library makes it a very large file, especially if it is made up of several hundred thousand or more blocks. For example, a library of say 1 million blocks, each of 16 pixels and 24-bit colour, would require 1,000,000 x 16 x 24 = 384Mbits, or 48MB, of storage. This would be a significant problem for playback devices with a low technical specification, e.g. mobile terminals. An alternative approach is to classify the blocks in two ways. The first type is a source block, which has the same colour and pixel data as described above. The second type is a relational block derived from a source block. The pixel and colour data for this block is not stored, but is instead derived by applying a transform to the relevant source block. Consider the example in Figure 14. The block on the left is a source block. Its pixel and colour data will be stored in the library. The relational block on the right is the same block rotated through 90 degrees. Its colour data can be derived by applying a known transform to the correct source block.
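The derivation of a relational block from a source block, as in Figure 14, can be sketched for one transform (90-degree rotation); the other transforms would follow the same pattern. The dispatch-by-code structure is an illustrative assumption.

```python
# Sketch of deriving a relational block: apply the transform identified by a
# stored 4-bit transform code to the stored source block. Blocks are square
# grids (lists of rows) of pixel values.

def rotate90(block):
    """Rotate a square block 90 degrees clockwise."""
    return [list(row) for row in zip(*block[::-1])]

def derive(source, transform_code, transforms):
    """Apply the transform identified by transform_code to the source block."""
    return transforms[transform_code](source)
```

Only the source block's pixel data is stored; the relational block costs just its source identifier and transform code.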
Using the approach of source and relational blocks it is possible to define a set of transforms. The following shows 16 different transforms that might be used. Figure 15 shows how these transforms would look when applied to a source block.
[Table of the 16 transforms: shown in the original drawings only]
In this case only 1 in 16 of the blocks in the code book will be a source block with colour pixel data. The rest can be identified by a 4-bit transform code and the identity of the source block to which the transform is to be applied. Using this approach, a library of 1m blocks of 16 pixels may be created using 65,536 source blocks, each with the potential for 16 different transforms. The space required for the source blocks will be: 65,536 x 16 x 24 = 25.2Mbits, or approximately 3.15MB.
Each relational block will require 16 bits to identify the correct source block and 4 bits to identify the transform. Assuming that the remainder of the 1m blocks is made up of relational blocks, the additional space would be: 934,464 x (16 + 4) = 18.7Mbits, or approximately 2.33MB. This would give a total library size of 5.48MB, which is very much smaller than the 48MB required to store the library if only source blocks are used.
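The storage figures quoted above can be checked directly, using decimal megabytes (1MB = 10^6 bytes), which is the convention the figures appear to use:

```python
# Checking the library storage arithmetic from the text.

source_bits = 65_536 * 16 * 24           # source blocks: pixels x colour depth
relational_bits = 934_464 * (16 + 4)     # 16-bit source id + 4-bit transform
full_bits = 1_000_000 * 16 * 24          # all 1m blocks stored as source blocks

source_mb = source_bits / 8 / 1e6        # ~3.15MB
relational_mb = relational_bits / 8 / 1e6  # ~2.34MB (quoted as 2.33MB)
total_mb = source_mb + relational_mb     # ~5.48MB, against 48MB for full storage
```

The total of roughly 5.48MB agrees with the text, against 48MB for the source-block-only library.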
In a known video streaming system using neural networks, such as that described in US-5,638,125, inter-frame compression is achieved by comparing adjacent frames in a sequence and only sending information on blocks that have changed between the frames. These are known as Delta Blocks.
Information on blocks that have changed carries an overhead in terms of the location of the block. For example, a frame consisting of 256 blocks requires that each delta block have an 8-bit address to uniquely identify its location in the frame.
There are many instances where changes in the image have occurred between adjacent frames, but it is more efficient to send a new key frame than it is to send the block change and address information. For example, a key frame with say 256 blocks and 16-bit coding per block requires 4,096 bits (ignoring the header). Suppose that only 70%, or 179 blocks, change between this frame and the next one. Applying inter-frame compression, it is possible to send only the 179 delta blocks to recreate the image. However, adding an 8-bit address to identify the position in the frame for each of the delta blocks makes the new frame size: 179 x (16 + 8) = 4,296 bits. This shows that sending this number of block changes is less efficient than sending the frame in its entirety.
The need to add an address to identify the unique location of the delta block would severely limit the value of inter-frame compression. An alternative to the unique identifier is to make use of a run-length approach. In this case the delta frame is made up as shown in Figure 16.
This example shows a part of a delta frame. The black blocks are delta blocks, i.e. those that have changed since the previous frame. The first block that is sent is a reference block (shown in grey) occupying the top left corner of the frame. From this point on, blocks are written and read from left to right. The first delta block is nine places to the right of the reference block. This is run length R1. It is only necessary to send this number, rather than an x, y coordinate, to identify the location of the block. After sending the code for the first delta block, the run length to the next delta block (R2) is sent.
Using this approach it is possible to choose a maximum of, say, a 4-bit run length. In cases where the run length between delta blocks is greater than 16 blocks, an additional reference block is sent, as shown at the end of row 2.
Taking the earlier example, the delta frame size is now: 179 x (16 + 4) = 3,580 bits. This is more efficient than sending the key frame, which is 4,096 bits.
Although the starting point was chosen to be the top left corner of the frame, the technique can be applied from any point so long as it is consistently applied.
The following relates to a means of further compressing the encoded data stream output from a neural network when used to encode video for streaming over a network, as described above.
This aspect of the invention concerns means of applying statistical compression techniques to the output of the encoding process by including information in the data stream concerning the probability of appearance of a particular code book entry. This data can then be used in the encoding and decoding process to maximise the potential for statistical compression. With particular reference to the aspect of the invention described with reference to Figure 13, large code books or libraries may be split into smaller, more manageably sized code books to match the limitations of the neural network. In this way it is possible to develop different code books for different applications. For example, a specific palette of codes may be used to create a code book for a video such as a soccer match. In this case, as the green field will appear in most frames, it should be possible to identify that a code identifying a pixel block with all green colours is more likely to appear than most others. This code can then be allocated a higher probability of occurrence than other codes. By experience it will be possible to determine the average probability of occurrence of any code for a broad classification of input frames. The probability of a code being used can be added to the code book information and duplicated at the encoder and decoder. Figure 17 shows an example of a code book 28 that contains code probabilities. This information can be included in the software used to decode video streams encoded using any aspect of the present invention.
While this type of "fixed" approach to probability coding can provide some significant benefits, there are some situations in which the distribution of blocks in a particular frame does not match the probabilities held in the code book. This can have a negative effect on statistical compression, meaning that the compression process adds to the data requirement rather than subtracting from it.
To avoid this problem, by means of another aspect of the invention, it is possible to dynamically update the probability information on a frame-by-frame basis and include this information for transfer to the decoder, where it can be used to update the probability information held locally in memory.
By way of example, consider a neural network with 30 neurons and a code book of 30 entries. (This number is very low by neural network standards, but it is a convenient basis for the explanation.) The probability of a block appearing in an input frame can be determined by experiment, and this information may be held permanently in the encoder and decoder. Consider the situation where 30 codes are used as shown in Figure 18, along with the probability of occurrence expressed as a percentage at 120. For ease of explanation, in this case the probability decreases as the code number increases. If an input frame is encoded and the occurrence of the codes broadly matches the stored probabilities, then statistical compression is likely to offer significant advantages in reducing the amount of data required to transmit the information.
If, however, the occurrences of the codes are as shown in Figure 19, then applying statistical compression using the stored probability information will actually increase the amount of data required to transmit the information.
One solution to the problem would be to completely refresh the probability information in the code book with the new data and send this information to the end user for updating the probability information used in decompression.
While this might be possible with a small code book, it is not likely to be possible with a neural network using thousands of codes in a code book and restricted to a low speed data connection.
This aspect of the invention therefore takes an intermediate course of action, in which updated probability information is sent for only those codes that occur commonly. The exact number of codes that fit this classification can be determined by the user. A predetermined process (see below) identifies how the other codes are dealt with, and this process is followed by both the compressor and decompressor, ensuring that synchronisation of the probability data is maintained.
Two alternative processes for dealing with the dynamic probability information are proposed. In the first process, a predetermined number of codes and their probabilities are sent to the decompressor and all other probabilities are set to an equal level. In the frame considered in Figure 19, the new probability information used by the compressor and decompressor will be as shown in Figure 20. The probabilities of the less frequent codes have been equalised.
In the second process, the highest probabilities are again transmitted, but the remaining probabilities are normalised against the original code probabilities for the code book (as shown in Figure 18). This means that a code which had a high probability of occurrence in the original code book will have a higher probability of occurrence than others when normalised. Figure 21 shows how the data in Figure 19 will appear under this scheme. The less frequently appearing codes are now in the same order as in the original table (see 140), i.e. ascending numerically in this case.
As long as the compressor and decompressor follow the same rules, only the probabilities for the highest occurring codes need to be sent. This means that statistical compression can be effective for only a small overhead in the management of the probability data.
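The two probability-update processes can be sketched as below. Only the top codes have their measured probabilities sent; the rest are either equalised (first process) or normalised against the original code-book probabilities (second process). The dictionary representation and percentage units are illustrative assumptions.

```python
# Sketch of the two dynamic probability-update processes. measured and
# original map code -> probability (percent); both compressor and
# decompressor run the same rule, keeping the tables synchronised.

def update_probabilities(measured, original, top_n, process):
    """Return the probability table shared by compressor and decompressor."""
    top = sorted(measured, key=measured.get, reverse=True)[:top_n]
    sent = {c: measured[c] for c in top}          # only these are transmitted
    rest = [c for c in original if c not in sent]
    remaining = 100 - sum(sent.values())
    if process == 1:                              # equalise less frequent codes
        tail = {c: remaining / len(rest) for c in rest}
    else:                                         # normalise against original
        weight = sum(original[c] for c in rest)
        tail = {c: remaining * original[c] / weight for c in rest}
    return {**sent, **tail}
```

Because the tail probabilities are reconstructed by rule at both ends, only the `top_n` transmitted probabilities cost any bandwidth, which is the small overhead referred to above.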
Specific embodiments of the invention have been described above by way of example only. For example, still images, such as photographs, can be compressed, transmitted and decompressed in the same manner as has been disclosed in relation to individual video frames. It will be appreciated by persons skilled in the art that several modifications are possible without departing from the scope of the invention as defined by the appended claims.

Claims
1) Apparatus for processing image data, comprising means for dividing an image into blocks or sections made up of a plurality of pixels, code book or memory means for storing a plurality of blocks or sections having different combinations of pixels, each of said combinations being associated with unique identifying data, and means for comparing each block or section of said image with said stored blocks or sections and identifying a combination of pixels from said code book means which substantially matches (either exactly or to within a predetermined tolerance level) the combination of pixels in said block or section of said image.
2) Apparatus according to claim 1, wherein said image data comprises one or more still images, or moving images such as video data made up of a plurality of frames.
3) Apparatus according to claim 1 or claim 2, including means for determining that said combination of pixels in said block or section of said image does not match any of the combinations of pixels stored in said code book means, either exactly or to within said predetermined tolerance level, and means for storing said unmatched combination together with unique identifying data in said code book means.
4) Apparatus according to any one of claims 1 to 3, wherein said identifying data is a bit stream.
5) Apparatus according to any one of claims 1 to 4, wherein each pixel block or section comprises a rectangle or square consisting of N x M pixels, where N and M are integers greater than 1. 6) Apparatus according to any one of claims 1 to 4, wherein said pixel blocks or sections comprise an abstract shape consisting of a plurality of pixels.
7) Apparatus according to any one of claims 1 to 6, including means for combining said pixel blocks or sections to form relatively larger pixel blocks or sections, second memory or code book means for storing different combinations of pixels for said relatively larger pixel blocks or sections, each of said combinations being associated with unique identifying data, means for comparing each relatively larger block or section of said image with said relatively larger blocks or sections stored in said second code book means and identifying a combination of pixels from said code book means which substantially matches the combination of pixels in said block or section of said image, either exactly or to within a predetermined tolerance level, and outputting the identifying data associated with said matching combination.
8) Apparatus according to any one of claims 1 to 7 comprising means for comparing corresponding pixel blocks or sections from adjacent images or frames of, for example, a sequence of video data, means for identifying any differences in the blocks or sections (corresponding, for example, to motion in a video sequence) and means for only outputting identifying data corresponding to blocks or sections which have changed.
9) Apparatus according to any one of claims 1 to 8 including a neural network.
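The encoding scheme of claims 1 to 9 can be illustrated with a minimal Python sketch. Everything here is assumed for illustration and not taken from the specification: the function name `encode_block`, the 2x2 block size, and the use of mean absolute pixel difference as the "predetermined tolerance" metric are all hypothetical choices; the codebook index plays the role of the claims' "unique identifying data".

```python
import numpy as np

def encode_block(block, codebook, tolerance):
    """Return the index of the first codebook entry matching `block`
    exactly or to within `tolerance` (mean absolute pixel difference);
    return None when no stored combination matches."""
    for index, entry in enumerate(codebook):
        if np.mean(np.abs(entry.astype(int) - block.astype(int))) <= tolerance:
            return index
    return None

# 2x2 blocks of 8-bit pixels; the codebook indices serve as the
# "unique identifying data" of the claims.
codebook = [np.zeros((2, 2), dtype=np.uint8),
            np.full((2, 2), 255, dtype=np.uint8)]
block = np.array([[3, 0], [1, 2]], dtype=np.uint8)
print(encode_block(block, codebook, tolerance=4))  # matches entry 0
```

A linear scan is used only for clarity; a practical encoder would use a nearest-neighbour search (or, per claim 9, a neural network classifier) over the codebook.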
10) A method of processing image data, comprising the steps of dividing an image into blocks or sections made up of a plurality of pixels, storing a plurality of blocks or sections having different combinations of pixels, each of said combinations being associated with unique identifying data, comparing each block or section of said image with said stored blocks or sections, identifying a stored combination of pixels which substantially matches (either exactly or to within a predetermined tolerance level) the combination of pixels in said block or section of said image, and outputting the identifying data associated with the matching combination.
11) A method according to claim 10, wherein said image data comprises one or more still images, or moving images, such as video data made up of a plurality of frames.
12) Apparatus for processing video image data substantially as herein described with reference to the accompanying drawings.
13) A method of processing image data substantially as herein described with reference to the accompanying drawings.
14) Apparatus for receiving and decoding image data, comprising means for receiving identifying data corresponding to different combinations of pixels making up an image, code book or memory means for storing a plurality of blocks or sections having different combinations of pixels and identifying data corresponding to each combination, means for comparing the incoming identifying data with the identifying data stored in said code book means, means for identifying the pixel block or section to which said data corresponds, and means for outputting the respective pixel block or section.
15) Apparatus according to claim 14, including means for receiving a block or section having a combination of pixels, and its unique identifying data, which is not stored in said code book means, and for storing the new combination together with its unique identifying data in said code book means.
16) Apparatus according to claim 14 or claim 15, wherein said image data comprises one or more still images, or moving images, such as video data made up of a plurality of frames.
17) A method of receiving and decoding image data comprising the steps of receiving identifying data corresponding to different combinations of pixels making up an image, storing a plurality of blocks or sections having different combinations of pixels and identifying data corresponding to each combination, comparing the incoming identifying data with the stored identifying data, identifying the pixel block or section to which said data corresponds, and outputting the respective pixel block or section.
18) A method according to claim 17, wherein said image data comprises one or more still images, or moving images, such as video data made up of a plurality of frames.
19) Apparatus for receiving and decoding image data substantially as herein described with reference to the accompanying drawings.
20) A method of receiving and decoding image data substantially as herein described with reference to the accompanying drawings.
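The complementary decoder of claims 14 to 20 is simpler: incoming identifiers are looked up in the stored codebook and the recovered blocks are tiled back into an image. The following sketch is illustrative only; the function name `decode_image`, the 2x2 blocks, and the row-major tiling order are assumptions, not taken from the specification.

```python
import numpy as np

def decode_image(indices, codebook, blocks_per_row):
    """Look up each incoming identifier in the stored codebook and
    tile the recovered pixel blocks back into an image."""
    rows = []
    for start in range(0, len(indices), blocks_per_row):
        row = [codebook[i] for i in indices[start:start + blocks_per_row]]
        rows.append(np.hstack(row))  # blocks side by side within a row
    return np.vstack(rows)           # rows stacked into the full image

codebook = [np.zeros((2, 2), dtype=np.uint8),
            np.full((2, 2), 255, dtype=np.uint8)]
# Four 2x2 blocks, two per row, rebuild a 4x4 image.
image = decode_image([0, 1, 1, 0], codebook, blocks_per_row=2)
print(image.shape)  # (4, 4)
```

Because only the short identifiers travel over the channel, the decoder's codebook must stay synchronised with the encoder's, which is what the update mechanism of claim 15 provides.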
21) Apparatus for processing image data comprising means for dividing an image into blocks or sections made up of a plurality of pixels, code book or memory means for storing a plurality of blocks or sections having different combinations of pixels, each of said combinations being associated with unique identifying data, means for comparing each block or section of said image with said stored blocks or sections, means for determining that a combination of pixels in said block or section of said image does not match (either exactly or to within a predetermined tolerance level) any of the combinations of pixels stored in said code book means, and means for storing said unmatched combination together with unique identifying data in said memory or code book means.
22) Apparatus according to claim 21, wherein said image data comprises one or more still images, or moving images, such as video data, made up of a plurality of frames.
23) A method of processing image data comprising the steps of dividing an image into blocks or sections made up of a plurality of pixels, storing a plurality of blocks or sections having different combinations of pixels, each of said combinations being associated with identifying data, comparing each block or section of said image with said stored blocks or sections, and determining that a combination of pixels in said block or section of said image does not match (either exactly or to within a predetermined tolerance level) any of the stored combinations of pixels and storing said unmatched combination together with unique identifying data.
24) A method according to claim 23, wherein said image data comprises one or more still images, or moving images, such as video data made up of a plurality of frames.
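The adaptive behaviour of claims 21 to 24, in which an unmatched block is appended to the codebook under a newly assigned identifier, might be sketched as follows. The name `encode_adaptive` and the mean-absolute-difference tolerance test are illustrative assumptions carried over from the earlier sketch, not terms from the specification.

```python
import numpy as np

def encode_adaptive(block, codebook, tolerance):
    """Match `block` against the codebook; when no entry matches to
    within `tolerance`, store the block as a new entry and return its
    newly assigned identifier (its index)."""
    for index, entry in enumerate(codebook):
        if np.mean(np.abs(entry.astype(int) - block.astype(int))) <= tolerance:
            return index
    codebook.append(block.copy())     # unmatched: grow the codebook
    return len(codebook) - 1

codebook = [np.zeros((2, 2), dtype=np.uint8)]
new_block = np.full((2, 2), 200, dtype=np.uint8)
print(encode_adaptive(new_block, codebook, tolerance=4))  # 1: stored as new entry
print(len(codebook))  # 2
```

In a streaming system the new block and its identifier would also be transmitted once (per claim 15) so the decoder can make the same codebook update.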
25) A video or image streaming system comprising apparatus according to any one of claims 1 to 8 and apparatus according to claim 14 or claim 15.
26) A video or image streaming system according to claim 25, including apparatus according to claim 21.
27) A video or image streaming method comprising the method of claim 10 and the method of claim 17.
28) A video or image streaming method according to claim 27, including the method of claim 23.
29) Apparatus for processing image data, comprising means for dividing at least a portion of an image into blocks or sections made up of a plurality of pixels, code book or memory means for storing a plurality of blocks or sections having different combinations of pixels corresponding to the position of a predetermined element of said image relative to the other pixels in the block or section, each of said combinations being associated with unique identifying data, means for comparing each block or section of the image with said stored blocks or sections and identifying a block or section of pixels from said code book means which substantially matches (either exactly or to within a predetermined tolerance level) the combination of pixels in said block or section of said image, and means for outputting the identifying data associated with said matching combination.
30) Apparatus according to claim 29, wherein said means for comparing each block or section of the image with said stored blocks or sections and identifying a block or section of pixels from said code book means which substantially matches (either exactly or to within a predetermined tolerance level) the combination of pixels in said block or section of said image comprises a neural network.
31) Apparatus according to claim 29 or claim 30, wherein each of the stored blocks or sections of pixels corresponds to a different position of said predetermined element relative to the other pixels in the block or section.
32) A method of processing image data, comprising the steps of dividing at least a portion of an image into blocks or sections made up of a plurality of pixels, storing a plurality of blocks or sections having different combinations of pixels corresponding to the position of a predetermined element of said image relative to the other pixels in the block or section, each of said combinations being associated with unique identifying data, comparing each block or section of the image with said stored blocks or sections and identifying a block or section of pixels from said stored blocks or sections which substantially matches (either exactly or to within a predetermined tolerance level) the combination of pixels in said block or section of said image, and outputting the identifying data associated with the matching combination.
33) Apparatus for processing moving image data such as video data, comprising means for dividing an image frame into blocks or sections made up of a plurality of pixels, code book or memory means for storing a plurality of blocks or sections having different combinations of pixels, each of said blocks or sections being associated with unique identifying data, means for comparing each block or section of said image frame with said stored blocks or sections and identifying a combination of pixels from said code book means which substantially matches (either exactly or to within a predetermined tolerance level) the combination of pixels in said block or section of said image frame, and means for outputting the identifying data associated with said matching combination, the apparatus further comprising means for determining the number of blocks or sections in the image frame whose combination of pixels has changed from the previous frame and outputting only the identifying data associated with the changed blocks or sections if the number of changed blocks or sections is less than a predetermined number.
34) Apparatus according to claim 33, wherein said predetermined number is user defined as m%, where m is a positive number.
35) Apparatus according to claim 33 or claim 34, wherein one or more feedback loops are provided at the point where the number of changed blocks or sections is compared with the predetermined number or value.
36) A method for processing moving image data such as video data, comprising the steps of dividing an image frame into blocks or sections made up of a plurality of pixels, storing a plurality of blocks or sections having different combinations of pixels, each of said blocks or sections being associated with unique identifying data, comparing each block or section of the image frame with the stored blocks or sections and identifying a combination of pixels from said stored blocks or sections which substantially matches (either exactly or to within a predetermined tolerance level) the combination of pixels in said block or section of said frame, determining the number of blocks or sections in the frame whose combination of pixels has changed since the previous frame of a sequence, and outputting the identifying data associated only with the changed blocks if the number of changed blocks is less than a predetermined number.
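The conditional-update logic of claims 33 to 36 can be sketched as a comparison of per-block identifiers between consecutive frames. The function name `changed_blocks`, the `(position, identifier)` output format, and the use of `None` to signal "send the whole frame" are all illustrative assumptions.

```python
def changed_blocks(prev_indices, curr_indices, max_changed):
    """Compare per-block codebook identifiers of consecutive frames and
    return (position, identifier) pairs for changed blocks only, provided
    the number of changes is below `max_changed`; otherwise return None
    to signal that the whole frame should be transmitted instead."""
    changes = [(pos, idx) for pos, (prev, idx)
               in enumerate(zip(prev_indices, curr_indices)) if prev != idx]
    return changes if len(changes) < max_changed else None

prev = [0, 1, 1, 0]   # identifiers for the previous frame's blocks
curr = [0, 1, 2, 0]   # one block has changed (motion in that region)
print(changed_blocks(prev, curr, max_changed=2))  # [(2, 2)]
```

Per claim 34, `max_changed` would in practice be derived from a user-defined percentage m% of the blocks in a frame.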
37) Apparatus for processing image data, comprising means for dividing an image into blocks or sections made up of a plurality of pixels, first memory means for storing a plurality of blocks or sections having different combinations of pixels, each of said combinations being associated with unique identifying data, one or more second memory or code book means for storing a plurality of blocks or sections, which are at least a subset of the blocks or sections stored in the first memory means, together with unique identifying data associated with each of the blocks or sections in said subset, means for comparing each block or section of said image with the blocks or sections stored in the or one or more of the second memory means and identifying a combination of pixels from said second memory means which substantially matches (either exactly or to within a predetermined tolerance level) the combination of pixels in said block of said image, and means for outputting the identifying data associated with said matching combination.
38) A method of processing image data, comprising the steps of dividing an image into blocks or sections made up of a plurality of pixels, storing in a first memory means a plurality of blocks or sections having different combinations of pixels, each of said combinations being associated with unique identifying data, storing in each of one or more second memory means a plurality of blocks or sections which are at least a partial subset of the blocks or sections stored in the first memory means, together with unique identifying data associated with each of said blocks in said subset, comparing each block or section of said image with the blocks stored in the or one or more of the second memory means, identifying a combination of pixels from said second memory means which substantially matches (either exactly or to within a predetermined tolerance level) the combination of pixels in said block or section of said image, and outputting the identifying data associated with said matching combination.
39) Apparatus for receiving and decoding image data comprising means for receiving unique identifying data corresponding to combinations of pixels making up an image, first memory means for storing a plurality of blocks or sections having different combinations of pixels and identifying data corresponding to each combination, one or more second memory means for storing unique identifying data corresponding to a subset of the blocks or sections stored in said first memory means, means for comparing the incoming identifying data with the identifying data stored in the or a selected one or more second memory means, means for identifying the pixel blocks or sections to which said data corresponds, and means for outputting the respective pixel blocks or sections.
40) A method of receiving and decoding image data, comprising the steps of receiving unique identifying data corresponding to combinations of pixels making up an image, storing in first memory means a plurality of blocks or sections having different combinations of pixels and identifying data corresponding to each combination, storing in one or more second memory means unique identifying data corresponding to a subset of the blocks or sections stored in said first memory means, comparing the incoming identifying data with the identifying data stored in the or a selected one or more second memory means, identifying the pixel blocks or sections to which said data corresponds, and outputting the respective pixel blocks or sections.
41) Apparatus or method according to any one of the preceding claims, wherein a library of different predetermined combinations of pixels is stored in said code book or memory means, said library comprising one or more source blocks consisting of colour and/or pixel data representing a predetermined combination of pixels, and one or more transform means which, when applied to a source block, produces a block having a different predetermined combination of the same pixels.
42) Apparatus for processing image data, comprising means for dividing an image into blocks or sections made up of a plurality of pixels, memory means for storing a plurality of blocks or sections having different combinations of pixels, each of said combinations being associated with unique identifying data, means for comparing each block or section of said image with the blocks or sections stored in the memory means and identifying a combination of pixels which substantially matches (either exactly or to within a predetermined tolerance level) the combination of pixels in said block or section of said image, and means for outputting the identifying data associated with said matching combination, wherein one or more of said stored combinations of pixels is allocated a probability of occurrence in an image or sequence of images, the unit length of said unique identifying data associated with said at least one stored combination being dependent on the probability of occurrence in an image or sequence of images of said combination of pixels.
43) A method of processing image data, comprising the steps of dividing an image into blocks or sections made up of a plurality of pixels, storing a plurality of blocks or sections having different combinations of pixels, each of said combinations being associated with unique identifying data, comparing each block or section of said image with the stored blocks or sections and identifying a combination of pixels which substantially matches (either exactly or to within a predetermined tolerance level) the combination of pixels in said block or section of said image, and outputting the identifying data associated with said matching combination, wherein at least one of said stored combinations of pixels is allocated a probability of occurrence in an image or sequence of images, the unit length of said unique identifying data associated with said at least one stored combination being dependent on the probability of occurrence in an image or sequence of images of said combination of pixels.
44) Apparatus according to claim 42 or a method according to claim 43, wherein the probabilities of occurrence of at least some of the stored pixel combinations are stored together with the unique identifying data associated with each combination.
45) A video or image streaming system substantially as herein described with reference to the accompanying drawings.
46) A video or image streaming method substantially as herein described with reference to the accompanying drawings.
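The probability-dependent identifier lengths of claims 42 to 44 correspond to classic variable-length coding. One standard way to realise them, shown here purely as an illustrative sketch (Huffman code-length assignment; the function name and output format are assumptions, not from the specification), is:

```python
import heapq
import itertools

def huffman_code_lengths(probabilities):
    """Assign a bit length to each codebook entry so that more probable
    entries receive shorter identifiers (Huffman construction)."""
    counter = itertools.count()  # tie-breaker so the heap never compares lists
    heap = [(p, next(counter), [i]) for i, p in enumerate(probabilities)]
    heapq.heapify(heap)
    lengths = [0] * len(probabilities)
    while len(heap) > 1:
        p1, _, ids1 = heapq.heappop(heap)
        p2, _, ids2 = heapq.heappop(heap)
        for i in ids1 + ids2:
            lengths[i] += 1  # each merge adds one bit to every member entry
        heapq.heappush(heap, (p1 + p2, next(counter), ids1 + ids2))
    return lengths

# A frequent block gets a shorter identifier than rare ones.
print(huffman_code_lengths([0.7, 0.15, 0.15]))  # [1, 2, 2]
```

Per claim 44, the probabilities themselves could be stored alongside the identifiers so that encoder and decoder derive the same code lengths.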
PCT/GB2001/001830 2000-05-03 2001-04-25 Video data transmission WO2001084849A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU52355/01A AU5235501A (en) 2000-05-03 2001-04-25 Video data transmission

Applications Claiming Priority (10)

Application Number Priority Date Filing Date Title
GB0010549.4 2000-05-03
GB0010552.8 2000-05-03
GB0010552A GB0010552D0 (en) 2000-05-03 2000-05-03 Video data transmission
GB0010549A GB0010549D0 (en) 2000-05-03 2000-05-03 Video data transmission
GB0015670A GB0015670D0 (en) 2000-05-03 2000-06-28 Video data transmission
GB0015668.7 2000-06-28
GB0015668A GB0015668D0 (en) 2000-05-03 2000-06-28 Video data transmission
GB0015670.3 2000-06-28
GB0104606.9 2001-02-23
GB0104606A GB2362055A (en) 2000-05-03 2001-02-23 Image compression using a codebook

Publications (1)

Publication Number Publication Date
WO2001084849A1 true WO2001084849A1 (en) 2001-11-08

Family

ID=27515937

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2001/001830 WO2001084849A1 (en) 2000-05-03 2001-04-25 Video data transmission

Country Status (2)

Country Link
AU (1) AU5235501A (en)
WO (1) WO2001084849A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1994006099A1 (en) * 1992-09-01 1994-03-17 Apple Computer, Inc. Improved vector quantization
EP0765085A2 (en) * 1995-09-21 1997-03-26 AT&T Corp. Method and apparatus for image processing using model-based localized quantization

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
MANIKOPOULOS C N: "FINITE STATE VECTOR QUANTISATION WITH NEURAL NETWORK CLASSIFICATION OF STATES", IEE PROCEEDINGS F. COMMUNICATIONS, RADAR & SIGNAL PROCESSING, INSTITUTION OF ELECTRICAL ENGINEERS. STEVENAGE, GB, vol. 140, no. 3 PART F, 1 June 1993 (1993-06-01), pages 153 - 161, XP000381034, ISSN: 0956-375X *
NASRABADI N M ET AL: "A MULTILAYER ADDRESS VECTOR QUANTIZATION TECHNIQUE", IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS, IEEE INC. NEW YORK, US, vol. 37, no. 7, 1 July 1990 (1990-07-01), pages 912 - 921, XP000160478 *
NASRABADI N M ET AL: "IMAGE CODING USING VECTOR QUANTIZATION: A REVIEW", IEEE TRANSACTIONS ON COMMUNICATIONS, IEEE INC. NEW YORK, US, vol. 36, no. 8, 1 August 1988 (1988-08-01), pages 957 - 971, XP000052119, ISSN: 0090-6778 *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7238697B2 (en) 2001-02-22 2007-07-03 Bayer Cropscience Ag Pyridylpyrimidines for use as pesticides
US9842271B2 (en) 2013-05-23 2017-12-12 Linear Algebra Technologies Limited Corner detection
US11605212B2 (en) 2013-05-23 2023-03-14 Movidius Limited Corner detection
US11062165B2 (en) 2013-05-23 2021-07-13 Movidius Limited Corner detection
US11042382B2 (en) 2013-08-08 2021-06-22 Movidius Limited Apparatus, systems, and methods for providing computational imaging pipeline
US10572252B2 (en) 2013-08-08 2020-02-25 Movidius Limited Variable-length instruction buffer management
US9727113B2 (en) 2013-08-08 2017-08-08 Linear Algebra Technologies Limited Low power computational imaging
US9910675B2 (en) 2013-08-08 2018-03-06 Linear Algebra Technologies Limited Apparatus, systems, and methods for low power computational imaging
US9934043B2 (en) 2013-08-08 2018-04-03 Linear Algebra Technologies Limited Apparatus, systems, and methods for providing computational imaging pipeline
US10001993B2 (en) 2013-08-08 2018-06-19 Linear Algebra Technologies Limited Variable-length instruction buffer management
US10360040B2 (en) 2013-08-08 2019-07-23 Movidius, LTD. Apparatus, systems, and methods for providing computational imaging pipeline
US11768689B2 (en) 2013-08-08 2023-09-26 Movidius Limited Apparatus, systems, and methods for low power computational imaging
US10521238B2 (en) 2013-08-08 2019-12-31 Movidius Limited Apparatus, systems, and methods for low power computational imaging
US9146747B2 (en) 2013-08-08 2015-09-29 Linear Algebra Technologies Limited Apparatus, systems, and methods for providing configurable computational imaging pipeline
US11579872B2 (en) 2013-08-08 2023-02-14 Movidius Limited Variable-length instruction buffer management
US11567780B2 (en) 2013-08-08 2023-01-31 Movidius Limited Apparatus, systems, and methods for providing computational imaging pipeline
US11188343B2 (en) 2013-08-08 2021-11-30 Movidius Limited Apparatus, systems, and methods for low power computational imaging
US9196017B2 (en) 2013-11-15 2015-11-24 Linear Algebra Technologies Limited Apparatus, systems, and methods for removing noise from an image
US9270872B2 (en) 2013-11-26 2016-02-23 Linear Algebra Technologies Limited Apparatus, systems, and methods for removing shading effect from image
KR101781776B1 (en) 2014-06-27 2017-09-25 구루로직 마이크로시스템스 오이 Encoder and decoder
US10460704B2 (en) 2016-04-01 2019-10-29 Movidius Limited Systems and methods for head-mounted display adapted to human visual mechanism
US10949947B2 (en) 2017-12-29 2021-03-16 Intel Corporation Foveated image rendering for head-mounted display devices
US11682106B2 (en) 2017-12-29 2023-06-20 Intel Corporation Foveated image rendering for head-mounted display devices

Also Published As

Publication number Publication date
AU5235501A (en) 2001-11-12

Similar Documents

Publication Publication Date Title
GB2362055A (en) Image compression using a codebook
JP3978478B2 (en) Apparatus and method for performing fixed-speed block-unit image compression with estimated pixel values
CN100521550C (en) Coding method, decoding method, coder and decoder for digital video
CN101184236B (en) Video compression system
US7162091B2 (en) Intra compression of pixel blocks using predicted mean
US7415154B2 (en) Compression of palettized color images with variable length color codes
RU2417518C2 (en) Efficient coding and decoding conversion units
US5463701A (en) System and method for pattern-matching with error control for image and video compression
US20060017592A1 (en) Method of context adaptive binary arithmetic coding and apparatus using the same
US20050259877A1 (en) Intra compression of pixel blocks using predicted mean
CN110087083B (en) Method for selecting intra chroma prediction mode, image processing apparatus, and storage apparatus
US9179143B2 (en) Compressed video
CN1155221C (en) Method and system for encoding and decoding method and system
CN113542740B (en) Image transmission method and device
US6614939B1 (en) Image compression apparatus and decoding apparatus suited to lossless image compression
CN105100814B (en) Image coding and decoding method and device
JPH05300382A (en) Method and device for encoding bit plane
EP2198613A2 (en) Textual image coding
JP2005516554A6 (en) Compression of paletted color images using variable-length color codes
WO2019023709A1 (en) Efficient lossless compression of captured raw image information systems and methods
CN1166211C (en) Method and apparatus for encoding motion vector based on number of vilid reference motion vectors
WO2001084849A1 (en) Video data transmission
CN101653004A (en) Decoder for selectively decoding predetermined data units from a coded bit stream
US5933105A (en) Context-based arithmetic encoding/decoding method and apparatus
CN112218092A (en) Encoding method, apparatus and storage medium for string encoding technique

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP