EP0653125A1 - Data compression system using source representation - Google Patents

Data compression system using source representation

Info

Publication number
EP0653125A1
EP0653125A1 EP93918503A EP93918503A EP0653125A1 EP 0653125 A1 EP0653125 A1 EP 0653125A1 EP 93918503 A EP93918503 A EP 93918503A EP 93918503 A EP93918503 A EP 93918503A EP 0653125 A1 EP0653125 A1 EP 0653125A1
Authority
EP
European Patent Office
Prior art keywords
data
sequence
sequences
maximal
series
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP93918503A
Other languages
German (de)
English (en)
French (fr)
Inventor
Laurence E. Wright, Jr.
Ernest G. Kimme
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
REDBAND TECHNOLOGIES Inc
Original Assignee
REDBAND TECHNOLOGIES Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by REDBAND TECHNOLOGIES Inc

Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B1/00 Details of transmission systems, not covered by a single one of groups H04B3/00 - H04B13/00; Details of transmission systems not characterised by the medium used for transmission
    • H04B1/66 Details of transmission systems, not covered by a single one of groups H04B3/00 - H04B13/00; Details of transmission systems not characterised by the medium used for transmission for reducing bandwidth of signals; for improving efficiency of transmission

Definitions

  • This invention relates to a system for compressing or transmitting digital or analog data sequences with a high degree of compression.
  • data carriers such as telephone and electrical lines, radio and microwave links, optical fibers, and the like have a limited capacity to carry data.
  • There are corresponding limits to the ability of transmitters and receivers to handle data.
  • many offices now have telefax machines that allow digital data (obtained by converting the light and dark regions of the document to be sent into a series of data "bits," that is, to a sequence of "1's" and "0's") to be transferred over telephone lines to a receiving machine elsewhere. Because of known physical and electrical limitations of the metal telephone wires, data cannot be transmitted faster than at a certain rate.
  • the "modems," built into the telefax machines, that actually apply the digital data to the telephone lines also have upper limits on their transmission speed. No matter how fast the telefax machine can convert a document to digital form, if its internal modem cannot transmit more than, say, 2400 data bits per second, the message cannot be sent and received at a faster rate. Finally, the speed at which the telefax machine can scan a document also limits the speed at which a message can be sent.
  • a way to improve transmission speed is to make the machines themselves faster. If the telefax machine is able to scan documents fast enough, if its modem can transmit at 4800 bits per second instead of only 2400, if the transmission medium is able to carry data at this rate, and if the receiving device is able to receive at this rate, then transmission speed can be doubled.
  • Optical fibers for example, transmit data in the form of light pulses, and as such they have a much greater data capacity than traditional copper wires, over which data is transmitted in the form of electrical signals.
  • Still another way to increase transmission speed is to compress the data to be transmitted.
  • this document is scanned as a large number of lines, each consisting of a large number of adjacent points or picture elements ("pixels").
  • each pixel is for example assigned a value of "1" if it is mostly black and a value of "0" if it is mostly white.
  • the page can therefore be represented as a string of 1 million bits (1's and 0's). With only ten lines of actual text, however, very few of the points are black, and probably at least 99% of the transmitted data string will consist of zeroes; in particular, the bottom half of the page contains no text at all, but needs 500,000 bits, all zeroes, to represent it.
  • One transmission method in this example would be to transmit all 1 million data bits as they are.
  • When the telefax receiver receives the string (10, 10, 5, 500,000) it begins by printing 10 black points, then switches to printing 10 white points, then it switches to printing 5 black points, then it switches to printing 500,000 white points.
  • the receiver can receive and store the string (10, 10, 5, 500,000) very quickly even though it will take much longer actually to reprint the telefaxed document.
  • the drawback of this data compression scheme is that it achieves a high degree of compression only when the transmitted data contains many long strings of unchanging binary digits. The maximum degree of compression would be had if the transmitted page were all black (1 million "1's" in a row) or all white (1 million "0's"), since one then would only have to transmit a single number (1,000,000) to recreate the text fully.
  • this scheme would achieve no compression at all if the original document consisted of a checkerboard pattern of alternating black and white pixels; in such case, the system would have to transmit a string of a million "1's" (each string is only one pixel long, and there is a color change after every pixel). Since these would not be simple binary bits, but rather a million blocks of bits (each long enough to represent the largest possible transmitted number 1,000,000), it would actually take much longer to transmit the "compressed" data stream.
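  • The run-length scheme described in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the coding used by any actual telefax standard: the names `rle_encode` and `rle_decode` are hypothetical, and the convention that the first run is black (1), so that a page starting with white yields a leading run of length 0, is an assumption chosen to match the string (10, 10, 5, 500,000).

```python
def rle_encode(bits, first=1):
    """Return the lengths of alternating runs, starting with a run of `first`.

    Assumption: if the data does not start with `first`, the first run
    length is 0, so that the decoder always knows which colour comes first.
    """
    runs, current, count = [], first, 0
    for b in bits:
        if b == current:
            count += 1
        else:
            runs.append(count)        # close the current run
            current, count = b, 1     # start a run of the other colour
    runs.append(count)
    return runs


def rle_decode(runs, first=1):
    """Rebuild the bit list from alternating run lengths."""
    bits, current = [], first
    for n in runs:
        bits.extend([current] * n)
        current ^= 1                  # switch between 1 (black) and 0 (white)
    return bits


page = [1] * 10 + [0] * 10 + [1] * 5 + [0] * 500_000
print(rle_encode(page))               # [10, 10, 5, 500000]

checkerboard = [1, 0] * 4
print(rle_encode(checkerboard))       # [1, 1, 1, 1, 1, 1, 1, 1]
```

  • The checkerboard case returns one run length per pixel, which is why the scheme gives no gain, and may even expand the data, when the input alternates rapidly.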
  • the compression system requires the input data itself to have a certain structure in order for the system to be efficient.
  • the telefax example was given as one type of system where compression is useful but may not in every case achieve the desired result. Similar problems of data compression and efficiency are encountered in many other technical fields, such as the transmission of speech in digital form, digital television, and other areas in which information is to be communicated as a sequence of numbers. Since a majority of transmission of information, such as over modern carriers (satellites, optical fibers, etc.), is accomplished digitally (by transmitting binary numbers), the goal of increased capacity through data compression is found in many aspects of modern telecommunications technology. Accordingly, there is a need for a data compression system which provides for a greater capacity for transmitting or handling data, and a system which provides faster handling of data resulting in greater data or information throughput. There is also a need for a system and method providing for a greater data compression than previously afforded.
  • an object of this invention is to provide a system for data communication that achieves a greater degree of data compression than is possible using existing systems.
  • a data compression system and process which allows for greater capacity for data handling and transmission, faster data handling and therefore greater information throughput and which provides for greater data compression, all without loss.
  • a data compression system is provided with means for accepting a series of data at an input representing information to be transferred.
  • Means are provided for selecting one or more labels to represent at least a portion of the series of data, the selecting means including systems of equations having corresponding labels and which systems of equations produce numerical data sequences.
  • Means are also provided for transmitting the labels.
  • a data compression system includes means for selecting one or more equations, representing one or more families of orthogonal bases to represent at least a portion of the series of data wherein each equation has a corresponding unique label. Means are further provided for transmitting to an output the one or more labels corresponding to the selected equations.
  • the means for selecting includes means for selecting one or more maximal sequences.
  • a data compression system is provided which includes means for accepting a series of data representing information to be transferred. Means are also provided for selecting one or more maximal sequences to represent at least a portion of the series of data. Means are also provided for transmitting to an output representations of the one or more maximal sequences.
  • a method is also described herein for constructing a set of algorithms to be used in a data compression system, such as may be in the form of a plurality of finite state machines.
  • the method includes the steps of defining a set of equations wherein equations in the set of equations are orthogonal with respect to each other to form orthogonal bases to be used to represent at least a part of a sequence of data and retaining the set of orthogonal bases.
  • the method further includes the step of assigning a unique label to each basis, wherein each label represents the corresponding basis formed in the step of defining the set of equations.
  • the step of defining a set of equations includes the step of defining at least one maximal sequence.
  • the step of defining at least one maximal sequence may include defining at least one maximal sequence of length N, and N - 1 cyclic shifts of the maximal sequence of length N.
  • In a further preferred form of the invention, an encoding processor divides the data sequence generated by a source into blocks of N binary bits, which are then converted into blocks of N numbers.
  • the N numbers are the corresponding arithmetic values of the binary bits or numbers.
  • the value N is preferably chosen to be a Mersenne prime number of suitable size to enable efficient compression yet provide for acceptably rapid computation.
  • the encoder then calculates a maximal sequence of length N for the first block. This maximal sequence, plus its N - 1 cyclic shifts, forms a basis of N vectors.
  • the system may also provide a number "i" specifying by how much the "0th" basis is to be shifted to get a different basis vector.
  • the N vectors or bases may then be stored in one or more processors, for example, such as a transmitting computer and a receiving computer or a computer suitable for storing compressed data, and assigned unique labels for use in communicating data.
  • incoming data is converted, a maximal sequence is found and one or more of its N - 1 cyclic shifts is selected to represent a portion of the data.
  • the labels for the maximal sequence and the related cyclic shifts are then transmitted to the receiving computer, and the receiving computer can recreate the same data block without loss.
  • the system transmits the label for the maximal sequence and the indices of the basis (the degree of shift of the "0th" basis).
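  • The idea of sending a label and a shift index instead of the data block can be sketched as follows. This is a simplified illustration, not the encoder of the invention: it only handles a block that happens to equal one cyclic shift of a stored sequence, whereas the description speaks of selecting one or more shifts to represent the data. The label "M0", the table layout and the function names are hypothetical; the stored 7-bit sequence 1100101 is used purely as an example.

```python
# Hypothetical table shared by transmitter and receiver: label -> stored sequence.
TABLE = {"M0": "1100101"}


def rotate_right(seq, k):
    """Cyclically shift `seq` to the right by k positions."""
    k %= len(seq)
    return seq[-k:] + seq[:-k] if k else seq


def encode(block, table=TABLE):
    """Return (label, shift) if the block equals a shift of a stored sequence."""
    for label, seq in table.items():
        for shift in range(len(seq)):
            if rotate_right(seq, shift) == block:
                return label, shift
    return None                      # block is not a single shift of any entry


def decode(label, shift, table=TABLE):
    """Regenerate the block from the label and the shift index."""
    return rotate_right(table[label], shift)


message = encode("1110010")
print(message)                       # ('M0', 1)
print(decode(*message))              # 1110010
```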
  • the system can increase the capacity by which it handles data, thereby increasing the speed with which data is handled and increasing the throughput of data or other information.
  • This system provides a relatively high amount of data compression.
  • This method and system can be used for a wide range of applications, including the telefax application previously discussed.
  • One aspect of the telefax example that one should observe is that, using the described data compression scheme, one is not transmitting the actual data string representing the original message (the digitized document), but rather a "blueprint" for how the receiving machine itself can reconstruct the original message.
  • the system, instead of transmitting a shortened (compressed) version of the data sequence, constructs a series of data "building blocks" or "data generators" which it can combine to form a model of the data source that, when activated, generates the same data sequence one wishes to transmit. Instead of transmitting data, one may transmit information that tells the receiver how to combine its data generators in such a way that the receiver itself creates the data sequence anew.
  • the "data generators” may be described mathematically as so-called “finite state machines,” which may be implemented using one or more processors. The characteristics of these finite state machines are pre-defined and stored both in the transmitter and in the receiver.
  • the system analyzes the input sequence to determine which of the series of constructed finite state machines will generate the data input sequence.
  • the characteristics of the finite state machines may be created upon initialization of the system. Thereafter, codes or instructions used for creating the characteristics of the finite state machines may be transmitted to the receiver.
  • the transmitter and receiver are then similarly configured, and the transmitter can then send signals identifying which finite state machine, or combination thereof, will recreate the data sequence input to the transmitter.
  • the transmitter instead transmits a signal identifying which, or which combination, of the finite state machines is able to recreate the input sequence; such a transmission requires very few data bits.
  • the receiver can then activate this finite state machine and let it run to regenerate the data input sequence.
  • the invention therefore develops a model or representation of the original data source, whereupon the model recreates the desired data sequence without loss within the receiver.
  • Fig. 1 is a schematic representation of two data processing units which can be used for the data compression of the present invention.
  • Fig. 2 is a schematic and block diagram of a transmitter and receiver incorporating aspects of the present invention and depicting the process according to the present invention.
  • Figs. 3A and 3B are flow charts depicting the overall process of analysis and encoding and synthesis and decoding, respectively, according to the process of the present invention.

Description of the Invention

  • a data compression system, depicted in the top block of Fig. 2, may be implemented in a number of applications.
  • Fig. 1 depicts a transmitting and processing computer 2 containing a data compression system that receives and compresses data according to the present invention. The computer 2 outputs one or more maximal sequences, selected by the data compression system to represent at least a portion of the series of data, to a modem 3, which transmits the maximal sequences to a receiving modem 4 linked to a receiving computer 5. The receiving computer 5 contains the synthesis and decoding system, which accepts the maximal sequences and synthesizes the series of data from the maximal sequences and the other information received from the transmitting computer 2.
  • the systems depicted in Fig. 1 are simply one representation of apparatus which can be beneficially used with the present invention to more easily and quickly handle data and information.
  • Other applications of the invention easily come to mind, including for telefax applications, transmission of speech in digital form, digital television or recording and the like.
  • FIG. 2 illustrates the concept in the form of a greatly simplified telecommunications system.
  • a transmitter 10 is intended to transmit data over a transmission medium 12 to a receiver 14.
  • the medium 12 may be electrical wires, electromagnetic radiation, optical fibers, etc.
  • the transmitter includes, or, more commonly, is connected to, a data source 16, which produces an output signal that can be converted into numerical form.
  • the source could be, for example, a document scanner, a speech digitizer, a computer containing data one wishes to transmit, a television camera, telemetry or scanning equipment in a satellite, or any of the virtually countless other modern devices that generate information in digital form.
  • the numerical data may be numerical sequences from primary digital data or discretized forms of analog data, for example.
  • the source 16 has generated a string of fourteen binary digits.
  • the transmitter also contains a coder 17, whose function is described below.
  • the receiver 14 includes a controller or processor 18 and N digital networks or finite state machines (labelled FSM1, FSM2, ..., FSMN).
  • Each finite state machine generates a unique output signal X1, X2, ..., XN in the form of a finite string of binary digits.
  • FSM1, for example, when activated, generates the sequence 1100101.
  • the structure of finite state machines is well understood in the field of digital electronics, and is discussed, for example, in Digital Networks and Computer Systems, Taylor L. Booth, John Wiley and Sons, Inc., 1971. Of interest to this discussion is that the components FSM1 - FSMN generate unique strings of a fixed number of binary digits.
  • the controller 18 is connected (via output lines A1 - AN) to each of the finite state machines FSM1 - FSMN, and receives the generated data strings from the state machines via input lines I1 - IN.
  • the controller 18 is able to select any or all of the state machines and to combine their output signals.
  • (101 AND 011) is 001; that is, each digit in the result is a one only if the corresponding digits in both input strings were ones.
  • To form (X OR Y) one forms a resulting string that has ones only in positions where either or both of the input strings has a one in the corresponding position.
  • (101 OR 011) is 111.
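  • The two digit-wise operations just described can be written out directly. The sketch below works on bit strings so that it mirrors the notation of the text; the function names `bit_and` and `bit_or` are hypothetical.

```python
def bit_and(x, y):
    """Digit-wise AND: a one only where both input strings have a one."""
    if len(x) != len(y):
        raise ValueError("strings must have equal length")
    return "".join("1" if a == "1" and b == "1" else "0" for a, b in zip(x, y))


def bit_or(x, y):
    """Digit-wise OR: a one where either or both input strings have a one."""
    if len(x) != len(y):
        raise ValueError("strings must have equal length")
    return "".join("1" if a == "1" or b == "1" else "0" for a, b in zip(x, y))


print(bit_and("101", "011"))   # 001
print(bit_or("101", "011"))    # 111
```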
  • the source 16 has generated an output string of 14 binary digits (10001001001110) that are to be transmitted. As is mentioned above, one could simply transmit these 14 bits, but this would achieve no compression. One could also try some encoding scheme such as transmitting the length of strings, but in this case, since one cannot assume long, unchanging strings, even such a scheme will not increase transmission efficiency.
  • the coder 17 contains information about the output strings that the state machines FSM1 - FSMN generate. The coder first divides the source output string into blocks B1 and B2, each having seven bits (the same length as the output strings of the state machines).
  • the transmitter can therefore simply transmit short signals identifying which state machine(s) the controller 18 is to select, as well as short signals indicating which operations are to be performed on their output signals.
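  • A toy version of this exchange is sketched below. Only the output of FSM1 (1100101) and the 14-bit source string come from the text; the output assigned to FSM2, the instruction format ("USE", "AND") and all function names are hypothetical choices made so that the example works out, and they should not be read as the configuration shown in FIG. 2.

```python
BLOCK = 7

# Outputs of the receiver's state machines. FSM1 is taken from the text;
# FSM2 is a hypothetical value chosen for this illustration.
FSM_OUTPUT = {1: "1100101", 2: "1001110"}


def split_blocks(bits, n=BLOCK):
    """Divide the source string into consecutive blocks of n bits."""
    return [bits[i:i + n] for i in range(0, len(bits), n)]


def bit_and(x, y):
    return "".join("1" if a == "1" and b == "1" else "0" for a, b in zip(x, y))


def decode(message):
    """Rebuild the source string from short instructions.

    Each instruction is ("USE", k) for the output of machine k,
    or ("AND", j, k) for the digit-wise AND of machines j and k.
    """
    blocks = []
    for instr in message:
        if instr[0] == "USE":
            blocks.append(FSM_OUTPUT[instr[1]])
        elif instr[0] == "AND":
            blocks.append(bit_and(FSM_OUTPUT[instr[1]], FSM_OUTPUT[instr[2]]))
        else:
            raise ValueError(f"unknown instruction {instr!r}")
    return "".join(blocks)


source = "10001001001110"
print(split_blocks(source))                 # ['1000100', '1001110']
message = [("AND", 1, 2), ("USE", 2)]       # what the transmitter would send
print(decode(message) == source)            # True
```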
  • the state machines, through proper combination, act as the source itself: the source is modelled within the receiver, and all that is transmitted is the information necessary to configure the model.
  • the finite state machines shown in FIG. 2 are merely representative of the class of structures that can be used to represent a data source. In certain applications, it may be sufficient to have a single state machine, which generates different output strings depending on what starting values the controller gives it.
  • the state machines each have a set output data sequence that they generate, but each sequence can be generated in a different cyclical order; for example, a machine whose output was ABC could also generate CAB and BCA by shifting each letter to the right one step and "wrapping around" so that the last letter in one sequence becomes the first letter of the next sequence. (Letters of the alphabet are used only for the sake of clarity; strings of binary digits can be cycled in the same manner.) With the example shown in FIG. 2 it is not altogether apparent that one will be able to model all possible sources 16 by combination or other manipulation of the outputs of the state machines FSM1 - FSMN, although it is almost always possible to construct a finite state machine that will generate a desired finite output.
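  • The cyclic reordering described above is a rotation with wrap-around. A minimal sketch follows; the function names are hypothetical, and the same code works for strings of binary digits.

```python
def rotate_right(seq, k=1):
    """Shift every element k steps to the right, wrapping around."""
    k %= len(seq)
    return seq[-k:] + seq[:-k] if k else seq


def all_shifts(seq):
    """The sequence together with all of its distinct cyclic shifts."""
    return [rotate_right(seq, k) for k in range(len(seq))]


print(all_shifts("ABC"))       # ['ABC', 'CAB', 'BCA']
print(rotate_right("1100101")) # 1110010
```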
  • the degree of compression that can be achieved by such a transmission system is relatively small, since it may take more than seven bits actually to model the source.
  • the key is to choose a set of output data (bases) from state machines or their equivalents (such as simple, stored listings of their desired output sequences), and suitable operations to perform on the set, such that one can model efficiently a desired or anticipated class of sources.
  • This invention provides just such a key; it determines a complete basis of a much higher order than three, so that the system according to the invention is able to reconstruct sources that produce much longer data sequences, and thus is able to achieve a high degree of data compression.

Maximals as a Complete Basis

  • a data sequence of binary digits is first converted to an equivalent sequence of "real numbers," in part so that ordinary arithmetic operations may be performed on each digit in the normally understood manner (for example, rather than having to manipulate AND's and OR's, one can use normal multiplication and addition).
  • One way to do this is to assign the value -1 to every binary "1" and the value +1 to every binary "0"; thus, the binary sequence (1, 0, 0, 1) is converted to the arithmetic sequence (-1, 1, 1, -1).
  • the assignment can be represented using the known function "aval," which stands for
  • All operations in the system according to the invention are then carried out using conventional arithmetical operations on the arithmetical ("aval") equivalent of data sequences.
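  • The conversion from binary digits to arithmetic values can be sketched as below, using the assignment stated above (binary 1 becomes -1, binary 0 becomes +1). The name `aval` follows the text; `aval_inverse` is a hypothetical name for the reverse mapping.

```python
def aval(bits):
    """Map binary digits to arithmetic values: 1 -> -1, 0 -> +1."""
    return [-1 if b == 1 else 1 for b in bits]


def aval_inverse(values):
    """Map arithmetic values back to binary digits: -1 -> 1, +1 -> 0."""
    return [1 if v == -1 else 0 for v in values]


print(aval([1, 0, 0, 1]))              # [-1, 1, 1, -1]
print(aval_inverse([-1, 1, 1, -1]))    # [1, 0, 0, 1]
```

  • One consequence of this particular assignment is that the exclusive-OR of two binary digits corresponds to the ordinary product of their arithmetic values.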
  • the system according to the invention also calculates a complete basis for data sequences that will fully model all sources in a predetermined class.
  • the basis that is calculated is a collection of so-called maximal sequences.
  • the theory of maximal sequences is developed in the literature, but a brief description is given below.
  • a maximal sequence of length N and its N-1 cyclic shifts are chosen as the base sequences.
  • N is a Mersenne prime
  • a single maximal sequence of length N and its N-1 cyclic shifts comprises a universal basis for all sequences of N real numbers. In other words, if one is given a sequence of N real numbers, one can also construct a single maximal sequence that, together with its N-1 cyclically shifted sequences (shifted one step to the left or right and
  • the method according to the invention is accordingly for an encoding processor to divide the data sequence generated by a source into blocks of N binary bits, which are then converted into blocks of N numbers (the corresponding arithmetic values of the binary bits or numbers).
  • N is chosen to be a Mersenne prime number of suitable size to enable efficient compression yet acceptably rapid computation.
  • the size of N will depend on the application, and on any knowledge of the structure of the source sequences.
  • the encoder then calculates a maximal sequence of length N (using predetermined and well-defined equations) for the first block.
  • This maximal sequence, plus its N-1 cyclic shifts, forms a basis of N vectors. Note, however, that it is not necessary to transmit all N vectors; it is sufficient to transmit the original maximal (0-shift). Since all others have the same elements, just shifted by a certain number of positions, once a receiver has the "0th" basis, it is only necessary to transmit a single number i specifying by how much the "0th" basis is to be shifted to get a different basis vector.
  • the values to be transmitted to the receiver are the indices of the bases (the degree of shift of the "0th" basis), and a decoding processor in the receiver can then reconstruct the source data block.
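  • To illustrate why a sequence and its cyclic shifts can serve as a basis for blocks of N numbers, the sketch below expands an arbitrary block in the N shifts of a length-7 sequence and then rebuilds it exactly. This is not the encoding procedure of the invention, whose equations are not reproduced here. It assumes that the example sequence (-1, -1, 1, 1, -1, 1, -1), the arithmetic form of 1100101, has correlation N with itself and -1 with each of its other shifts; under that assumption the coefficients have the closed form used in `coefficients`. All function names are hypothetical.

```python
from fractions import Fraction


def rotate_right(seq, k):
    k %= len(seq)
    return seq[-k:] + seq[:-k] if k else seq


def shift_basis(maximal):
    """The 0th basis vector and its N-1 cyclic shifts."""
    return [rotate_right(maximal, k) for k in range(len(maximal))]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def coefficients(block, basis):
    """Solve block = sum_k c[k] * basis[k].

    Assumes dot(basis[j], basis[k]) is N for j == k and -1 otherwise, so the
    Gram matrix is (N+1)I - J and its inverse is (I + J) / (N + 1).
    """
    n = len(basis)
    b = [dot(block, v) for v in basis]
    total = sum(b)
    return [Fraction(bk + total, n + 1) for bk in b]


def reconstruct(coeffs, basis):
    n = len(basis[0])
    return [sum(c * v[i] for c, v in zip(coeffs, basis)) for i in range(n)]


maximal = [-1, -1, 1, 1, -1, 1, -1]          # arithmetic form of 1100101
basis = shift_basis(maximal)
block = [-1, 1, 1, 1, -1, 1, 1]              # arithmetic form of 1000100
c = coefficients(block, basis)
print(reconstruct(c, basis) == block)        # True
```

  • The coefficients are in general fractions rather than single shift indices, so this sketch shows only that the shifts span all blocks of length N, not how compression is obtained.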
  • the receiver creates a model or representation of the source, which is then "activated" to recreate the data sequence the "real" source generated. If the successive data blocks differ too greatly from the preceding block (determined according to known formulas), the encoding processor calculates a new maximal and the encoding and compression procedure is carried out again.
  • the encoding and decoding processors also perform certain other calculations on the various data sequences in order to transform them into more easily manipulated and analyzed forms.
  • the blocking of source data into blocks of Mersenne prime length, the calculation of maximals, and the transmission only of maximals and shift indices for source reconstruction remain.
  • the method according to the invention provides a much greater degree of data compression for long data sequences than is possible using existing compression systems.
  • a binary data sequence (the input sequence) is analyzed to determine which finite state machine, which may be in the form of an algorithm
  • If Source Representation Data Communication is utilized as an operational element of a data transmission system in which the Source Representation of the input sequence is itself represented as a binary sequence (the Source Representation Code of the input sequence), and if this Source Representation Code is the sequence actually transmitted, and if the receive element of the system retrieves the input sequence as the output sequence in accordance with the above description of Source Representation Data Communication, then:
  • Representation Data Communication is that the input sequence is transmitted and its replica output sequence is received with a code-bandwidth transmission factor that is the ratio of the length (number of binary elements or bits) of the input sequence to the length (similarly defined) of the Source Representation Code of the input sequence.

Construction of Generator Classes of Algorithms

  • An algorithm is a specification of a finite collection of arithmetic processes that produce assignments of numerical values to resultants (outputs) by application of arithmetic operations to numbers that are assigned values of operands (inputs).
  • An algorithm is described by a collection of algebraic equations. For the purposes of this discussion, the complexity of an algorithm is indicated by the number of independent equations in its specification.
  • Algorithms may have intermediate resultants, and some of these may also be intermediate operands; such algorithms are said to be recursive. This concept is also the basis for decomposition and concatenation of algorithms. These last attributes of algorithms are strictly ancillary to this discussion.
  • Any generator class of algorithms is a subclass of the class of all binary arithmetic algorithms. Every finite binary sequence (in any realization, a Source Representation Data Communication can operate only upon finite sequences) has at least one (trivial) generator algorithm.
  • the example just given of the trivial generator indicates that any minimal generator of a sequence of length n has at most n + 1 independent (non-redundant) equations in its specification.
  • the set of basis algorithms is an algebraic structure common to all generator algorithms for the given collection of sequences.
  • This set of basis algorithms can be minimized by discarding those basis algorithms that can be synthesized using other basis algorithms; the basis algorithm syntheses of the elements of the generator class of algorithms can then be restated in terms of this minimal set of basis algorithms.
  • the basis algorithm syntheses can also be reduced in complexity by the elimination of redundant terms. Application of these steps will produce a minimum basis algorithm synthesis for each generator algorithm, but there is in general no guarantee that this minimum synthesis will be unique; there may be several valid minimum syntheses for a given generator algorithm. Notwithstanding this fact, specification of any minimum basis algorithm synthesis of a particular generator algorithm will uniquely identify the binary sequence produced by that generator.
  • the details of specification of the elements of the basis generator class of algorithms are unnecessary to this identification if the objective of identification is restricted to selection only from among the original collection of sequences; all of these sequences have the same set of basis generators and differ only in their basis algorithm syntheses.
  • Each such basis representation completely specifies a corresponding synthesis formula;
  • Each such synthesis formula completely specifies a generator algorithm for the original sequence.
  • the original sequence is retrieved by execution of the generator algorithm.
  • any basis representation of the original sequence specifies a synthesis formula, and any such synthesis formula enables retrieval of the original sequence.
  • The function inverse to ival is designated ival^-1 and is defined in the conventional manner.
  • This function is defined for the real numbers +1 and -1 and takes on binary number values.
  • the correlation of two binary sequences is a real number that quantifies the degree of similarity of the sequences. Sequence pairs with large absolute correlation values are highly similar, and pairs with low absolute correlation values are highly dissimilar.
  • the value of the correlation function for any two binary sequences is the number of termwise agreements less the number of termwise disagreements. Sequences with zero correlation have exactly as many terms that differ as terms that are the same. Clearly sequences whose lengths are odd numbers cannot have zero correlation; such sequences may exhibit a minimal degree of correlation with correlation values of +1 or -1. Sequences with correlation values of N are termwise identical, and sequences with correlation values of -N are termwise complementary.
  • the correlation of binary sequences is the conventional numerical correlation of the arithmetic evaluation of the sequences given by the aval function. If all binary numbers are represented in the Source Representation Data Communication computations by their aval equivalents, then correlations can be implemented with the conventional correlation formulas.
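  • The two descriptions of correlation given above, agreements less disagreements and the arithmetic correlation of the aval equivalents, can be compared directly. The sketch below is an illustration only; the unnormalized sum of products is taken as the "conventional" formula, which is an assumption, and the function names are hypothetical.

```python
def aval(bits):
    """Arithmetic equivalent: binary 1 -> -1, binary 0 -> +1."""
    return [-1 if b == 1 else 1 for b in bits]


def correlation_by_counting(x, y):
    """Number of termwise agreements less the number of disagreements."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    return sum(1 if a == b else -1 for a, b in zip(x, y))


def correlation_arithmetic(x, y):
    """Sum of termwise products of the arithmetic equivalents."""
    return sum(a * b for a, b in zip(aval(x), aval(y)))


x = [1, 1, 0, 0, 1, 0, 1]
y = [1, 0, 0, 1, 1, 1, 0]
print(correlation_by_counting(x, y))   # -1
print(correlation_arithmetic(x, y))    # -1
```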
  • Source Representation Data Communication is initialized by application of the aval function to replace all input sequences by their arithmetic equivalents, and all Source Representation Data Communication processing is performed using conventional arithmetic operations thereon.
  • binary data is represented biuniquely as sequences of +1 and -1 numbers. This representation is to be understood throughout the remainder of this presentation; the aval notation will be suppressed, and binary data sequences will be treated as sequences of real numbers having values +1 and -1.
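  • the formula for the correlation function is then simply the sum of the termwise products, $\sum_{n} a_n b_n$, for two sequences $a$ and $b$ of +1 and -1 values.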
  • basis collections are orthonormal (orthogonal and normal), that is, the correlation of two basis sequences has the value 0 (orthogonality) if the sequences are distinct, but has the value 1 (normality) if they are identical. Normality is easily achieved for any sequence by multiplicative scaling, but orthogonality requires some calculation effort if the candidate collection does not already have this property.
  • the Gram-Schmidt process will convert any collection of candidate basis elements into an orthonormal collection, but if the collection has ab initio a common correlation value, this process reduces to a simple arithmetic scaling. Since the Gram-Schmidt process in general requires on the order of $N^2$ computations to orthogonalize N sequences of length N, it will materially simplify a Source Representation Data Communication to start with a collection of sequences that have some small common correlation value.
  • a maximal sequence and its termwise cyclic shifts are a strong candidate for this selection; the proper mutual correlations of such a collection of sequences all have the same minimum value (0, +1, or -1).
  • the disadvantage of this selection is that maximal sequences are structurally complicated except for the sequences whose lengths are Mersenne primes, that is, for lengths N that are prime numbers and that have the form $2^p - 1$. (p must also be prime for such an N to be prime.)
  • Mersenne primes have been found of such large magnitude that a requirement that the sequences to be considered be of Mersenne prime lengths is not a practical restriction on any Source Representation Data Communication that depends on such an assumption.
  • the formal statement of maximality is as follows:
  • Trivial replication of complements is avoided by uniform choice of the negative option.
  • the collection of cyclic shifts of a maximal $\{m\}$ is converted to an orthonormal basis (relative to the metric generated by the correlation operation) by a linear arithmetic transformation; if $\{m\}$ is any element of the collection, its corresponding orthonormal basis element is
  • Shifted versions of m produce shifted versions of its corresponding orthonormal basis element.
  • This procedure is referred to here as the arithmetization transformation. Its inverse is:
  • the N sequences obtained in this way are orthonormal, as is readily verified by direct calculation of their correlations, and therefore any sequence of N real numbers can be represented uniquely as a linear combination of these shifted basis elements.
  • the shifts of this maximal therefore generate a universal basis for representation of all numerical sequences of length N; the computational processes of the representation are implemented in principle by application of the inverse arithmetization transformation just described to the conventional orthogonal basis computations.
  • N is any fixed Mersenne prime
  • a single maximal sequence of length N and its N - 1 cyclic shifts comprise a universal basis for all sequences of N real numbers.
  • the maximals of a given length have structural properties that separate them into properly and trivially distinct species; when N is a Mersenne prime, the number of properly distinct species of maximals is $(N - 1)/p$
  • N a Mersenne prime
  • the properly distinct species are structurally related in such a way that the transformations from one such species to another form an Abelian group, that is, these transforms are all iterations of a single transform.
  • a universal basis can always be constructed for a collection of sequences of realizable length N that requires no more than p ⁇ (1 + $\log_2 M_{[N]}$) + 1 binary numbers for its complete specification, where $M_{[N]}$ is the smallest Mersenne prime that is not less than N. Partitions of Collections of Sequences Generated by Universal Basis Representations
  • N is a Mersenne prime
  • ⁇ (s) whose domain is the non-negative integers mod $(N - 1)/p$.
  • This function is referred to as the characteristic or signature of the species, and it specifies the distribution of signs in the binary arithmetic form of any maximal.
  • the characteristic is constrained by the relation
  • any cyclic shift of a characteristic is also a characteristic.
  • the cyclic shift operations on characteristics form an Abelian group since all cyclic shifts are iterations of the cyclic shift that advances every term by exactly one position.
  • $b(n, k; v)$ is the $n$th coefficient of the k-term cyclic shift of a maximal sequence of species v, and L is equal to $(N - 1)/p$.
  • Any binary sequence a (actually, any sequence) has a unique basis representation in terms of each one of these bases:
  • the totality of possible N-term representations therefore creates a basis representation partition of the collection of all sequences of length N, and the structure of this partition depends entirely on the family B of basis collections.
  • a basis representation partition consists of disjoint collections of sequences, and specification of any sequence of a given partition requires only specification of its unique basis index v.
  • Any finite ordered family B of orthonormal bases partitions the totality of sequences of a fixed length into disjoint collections of sequences with a common basis representation; any sequence is completely identified relative to the partition to which it belongs only by specification of the order index of its associated basis.
  • An ordered family B of N-term orthonormal bases is specified.
  • An input bit stream is segmented into sequences of common length N.
  • a specific basis is selected and the basis representation of the first of these input sequences in terms of the selected basis is computed.
  • Each subsequent input sequence is compared with all the sequences of the partition generated by the basis representation found for the first input sequence, and if a termwise match is found, the index of the corresponding basis is recorded as the representation number for that particular sequence. When no match is found, a new basis representation is computed and the above process is repeated.
  • the partitions of sequences of length N created by the basis representations are independent of the organization of the basis family (construction and order); they depend only upon the input sequence. These partitions were seen to be uniquely labeled by the N-term sequences identified as basis representations and the prior selection of a 0th basis.
  • the basis representations are therefore independent of the history of the particular Source Representation Data
  • the basis family then consists of the orthonormal bases generated by all the cyclic shifts of these maximals. The remaining steps are generic:
  • the generator class of algorithms consists of all linear combinations of the basis elements from single basis sequence collections of B.
  • the synthesis formulas in this case are specifications of the members of the family of bases. If the sequence length is a Mersenne prime, the synthesis formulas can be reduced to generator algorithms for the maximal sequences of length N.
  • the source representation of any input sequence consists of successive blocks of binary data that include the initializing basis representation followed by a sequence of basis indices.
  • the Source Representation Code for R blocks of input data as described in section 7 above consists of N strings of N binary numbers (the binary form of the basis representation of the first block) followed by R - 1 basis indices which are numbers from 0 to L - 1, L being the number of distinct bases in the family B.
  • the number of bits required to specify the basis representation is $N^2$
  • the number of bits required to represent a basis index is $\log_2 L$; the number of bits required to represent R blocks of input data is then $N^2 + (R - 1)\log_2 L$
  • N is a Mersenne prime
  • $N \approx 2^p$ and $L \approx 2^p/p$, so that this ratio becomes $p - \log_2 p$
  • a Mersenne prime is a prime number N of the form $2^p - 1$.
  • the integer p is the order of a multiplicative subgroup of the residue class mod N
  • the integer $L = (N - 1)/p$ ($N - 1$ is divisible by p if N is a Mersenne prime) is the order of a multiplicative subgroup of the residue class mod N; this second subgroup is isomorphic with the quotient group of the residue class mod N relative to the first subgroup.
  • every element of the residue class mod N is either 0 or uniquely of the form $g^{\lambda} \cdot 2^{\rho}$ for some $\lambda$, $\rho$ subject to $0 \le \lambda \le L - 1$, $0 \le \rho \le p - 1$. $\lambda$ is a member of the residue class mod L and $\rho$ is a member of the residue class mod p. Sign Invariance of Maximals Relative to the Generator 2 Subgroup and the Characteristic
  • the 2's sampling property states the invariance of the sign of principal-phase maximal sequence elements over the generator 2 subgroup of its index set; $m(2^{\rho} g^{\lambda})$ is a function of $\lambda$ alone.
  • the characteristic is specified above in terms of the coefficients of its maximal sequence. This specification can be phrased so as to exhibit the coefficients in terms of the characteristic; the device used is the Kronecker delta $\delta(\cdot)$, defined to have the value 1 only when its argument is zero, and to have the value zero otherwise:
  • X ( ⁇ i , ⁇ 2 ) ⁇ ⁇ ( ( 1 - g 2 Pl - g 2 Pz ) mod N ) ;
  • the range of the numbers $n^p \bmod N$, n being the non-zero elements of the residue class mod N, consists of powers of the quotient group generator g.
  • $\{\, g^{\lambda} \bmod N : 0 \le \lambda \le L - 1 \,\}$
  • a Source Representation Data Communication Process may consist basically of two independent system processes; one process operates on input sequences and produces representations of them, and the other operates on representations and reconstructs the original input sequences. These two systems are referred to, respectively, as the Analysis System (Encode) and the Synthesis System (Decode). Obviously Analysis is associated with transmission and Synthesis with reception. Generic flow charts of these processes are exhibited in Figure 3; the necessary elements of prior system design are included also.
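
The statements above about correlation and about representing sequences in terms of the cyclic shifts of a maximal sequence can be illustrated with a small sketch. It is not taken from the patent: the bit-to-sign mapping (0 to +1, 1 to -1), the length-7 sequence produced by the recurrence s[n+3] = s[n] XOR s[n+1], the closed-form coefficient formula, and every function name are assumptions chosen for illustration. The sketch relies on the cyclic shifts of a maximal sequence sharing the common correlation value -1, and uses that property to obtain the coefficients of an arbitrary sequence by simple scaling rather than by a general Gram-Schmidt pass.

```python
from fractions import Fraction
from itertools import product


def lfsr_maximal(p=3, taps=(0, 1)):
    """One period (2**p - 1 bits) of s[n+p] = XOR of s[n+t] for t in taps.

    The default taps are an assumption that gives a maximal sequence for p = 3.
    """
    state = [1] * p
    out = []
    for _ in range(2 ** p - 1):
        out.append(state[0])
        new_bit = 0
        for t in taps:
            new_bit ^= state[t]
        state = state[1:] + [new_bit]
    return out


def to_signs(bits):
    """Assumed arithmetic evaluation: bit 0 -> +1, bit 1 -> -1."""
    return [1 - 2 * b for b in bits]


def correlation(a, b):
    """Termwise agreements minus disagreements for +1/-1 sequences."""
    return sum(x * y for x, y in zip(a, b))


def cyclic_shifts(seq):
    """All cyclic shifts of seq, starting with seq itself."""
    return [seq[k:] + seq[:k] for k in range(len(seq))]


def analyze(x, shifts):
    """Coefficients of x relative to the shifts.

    With self-correlation N and common cross-correlation -1, the
    correlations y_j = corr(x, shift_j) satisfy y_j = (N + 1) c_j - sum(c),
    and sum(y) = sum(c), so c_j = (y_j + sum(y)) / (N + 1).
    """
    n = len(shifts)
    y = [correlation(x, m) for m in shifts]
    total = sum(y)
    return [Fraction(v + total, n + 1) for v in y]


def synthesize(coeffs, shifts):
    """Rebuild the sequence as a linear combination of the shifts."""
    length = len(shifts[0])
    return [sum(c * m[i] for c, m in zip(coeffs, shifts)) for i in range(length)]


maximal = to_signs(lfsr_maximal())
shifts = cyclic_shifts(maximal)

# Round-trip every +1/-1 sequence of length 7 through analyze/synthesize.
all_ok = all(
    synthesize(analyze(list(c), shifts), shifts) == list(c)
    for c in product([1, -1], repeat=7)
)
```

Because the matrix of mutual correlations of the shifts is (N + 1) times the identity minus the all-ones matrix, the coefficients follow from the correlations with one addition and one division each, which is the kind of simplification the text attributes to starting from a collection with a small common correlation value.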

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Apparatus For Radiation Diagnosis (AREA)
EP93918503A 1992-08-03 1993-07-29 Data compression system using source representation Withdrawn EP0653125A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US92418892A 1992-08-03 1992-08-03
US924188 1992-08-03
PCT/US1993/007153 WO1994003984A1 (en) 1992-08-03 1993-07-29 Data compression system using source representation

Publications (1)

Publication Number Publication Date
EP0653125A1 (en) 1995-05-17

Family

ID=25449841

Family Applications (1)

Application Number Title Priority Date Filing Date
EP93918503A Withdrawn EP0653125A1 (en) 1992-08-03 1993-07-29 Data compression system using source representation

Country Status (9)

Country Link
EP (1) EP0653125A1
JP (1) JPH07509824A
CN (1) CN1082784A
AU (1) AU4793093A
CA (1) CA2141669A1
IL (1) IL106335A0
MX (1) MX9304659A
TW (1) TW256970B
WO (1) WO1994003984A1

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6373986B1 (en) * 1998-04-08 2002-04-16 Ncr Corporation Compression of data transmission by use of prime exponents

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH01195770A (ja) * 1988-01-29 1989-08-07 Yokogawa Hewlett Packard Ltd 画像データ圧縮伝送方法
US5109438A (en) * 1990-04-25 1992-04-28 Hughes Aircraft Company Data compression system and method
US5235418A (en) * 1991-11-19 1993-08-10 Scientific-Atlanta, Inc. Method and apparatus for low frequency removal in vector quantization

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO9403984A1 *

Also Published As

Publication number Publication date
TW256970B 1995-09-11
IL106335A0 (en) 1993-12-28
WO1994003984A1 (en) 1994-02-17
MX9304659A (es) 1994-02-28
CA2141669A1 (en) 1994-02-17
JPH07509824A (ja) 1995-10-26
CN1082784A (zh) 1994-02-23
AU4793093A (en) 1994-03-03

Similar Documents

Publication Publication Date Title
US5790599A (en) Data compression system using source representation
US6038317A (en) Secret key cryptosystem and method utilizing factorizations of permutation groups of arbitrary order 2l
US7921145B2 (en) Extending a repetition period of a random sequence
Knagenhjelm et al. The Hadamard transform-a tool for index assignment
Meier et al. Fast correlation attacks on stream ciphers
RU2232463C2 (ru) Устройство и способ кодирования/декодирования канала в системе мобильной связи множественного доступа с кодовым разделением каналов
US5045852A (en) Dynamic model selection during data compression
JP2011004406A (ja) 係数の位置をコード化する方法及び装置
US6636549B1 (en) Method for calculating phase shift coefficients of an M sequence
WO1997033254A1 (en) System and method for the fractal encoding of datastreams
Penzhorn Correlation attacks on stream ciphers: Computing low-weight parity checks based on error-correcting codes
US5270956A (en) System and method for performing fast algebraic operations on a permutation network
Chiou et al. A complexity analysis of the JPEG image compression algorithm
WO1999059330A2 (en) Method and apparatus for decoding jpeg symbols
AU2004225405A1 (en) Apparatus for decoding an error correction code in a communication system and method thereof
Al Jabri et al. Zero-error codes for correlated information sources
Cohen Hermite and Smith normal form algorithms over Dedekind domains
WO1994003984A1 (en) Data compression system using source representation
US7539719B2 (en) Method and apparatus for performing multiplication in finite field GF(2n)
AU745212B2 (en) Circuit and method for arbitrarily shifting M-sequence
Sabin On determining all codes in semi-simple group rings
EP1497928A4 (en) SYSTEM AND METHOD FOR USING MICROWAVES IN COMMUNICATIONS
JP2000516072A (ja) データの符号化および復号の方法および装置
RU2251816C2 (ru) Способ поточного кодирования дискретной информации
Huguet et al. Vector quantization in image sequence coding

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 19950206

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FR GB GR IE IT LI NL PT SE

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 19950304