RU2080738C1

RU2080738C1 - Method for compression of information signals

Info

Publication number: RU2080738C1
Application number: RU93007368A
Authority: RU
Inventors: Д.А. Железцов; А.В. Железцов
Original assignee: Железцов Дмитрий Александрович
Priority date: 1993-02-03
Filing date: 1993-02-03
Publication date: 1997-05-27

Abstract

FIELD: code conversion; data transmission devices in complex information systems. SUBSTANCE: the method divides the sequence of information signals into subsequences of a given length L and compares them element by element with the contents of a given static dictionary of size 2^L and a dynamic dictionary of size 2^M-2^L. The input stream is then encoded with code words of fixed length M; the resulting code sequences of length M+L are stored in the dynamic dictionary and sent for transmission. In addition, the dynamic dictionary records, for each mapping between a modified binary stream and its code word of fixed length M, the number of times the stream has been used. EFFECT: compression of minimal information sequences for transmission without waiting for the entire data volume of the input source to be compressed. 2 dwg

Description

The invention relates to code conversion and can be used to compress transmitted information in the data transmission systems of complex information systems.

Huffman's method of information compression is known [1]. It uses a prefix coding scheme with variable-length code words. Minimum redundancy is achieved when the most probable symbols are assigned the shortest codes and the least probable symbols the longest codes.

The main disadvantage of Huffman's method is that the probabilities of occurrence of the symbols must be fixed in advance.

The closest in technical essence is the Lempel-Ziv compression method [2].
The Lempel-Ziv method uses a syntactic approach to eliminate stream redundancy by dynamically encoding the data of the source. Streams of symbols are parsed from the data and used to build a dynamic dictionary of streams that stores stream-to-word code mappings.

Each stream is denoted by a code word based on the address of the stream in the dictionary. Frequently occurring streams are grouped into longer streams, so that many symbols are represented by one simple code word. The length of the code word depends on the size of the dynamic dictionary of streams and ranges from 9 to 13 bits. The code word itself is simply the dictionary address of the corresponding stream.

The disadvantages of this method are that small amounts of information cannot be compressed, that the entire data volume of the input source must be compressed before transmission can begin, and that it compresses streams of symbolic (character) information.

The aim of the invention is to eliminate the disadvantages of the prototype, above all by compressing minimal volumes of binary information for transmission without waiting for the entire data volume of the input source to be compressed.

The proposed method comprises an operation of analyzing the information streams of the input source data and an operation of building a dynamic dictionary of streams that stores stream-to-word code mappings.

The new essential features of the invention are as follows. The binary information stream is sequentially divided into elementary streams not exceeding a given length L. The elementary streams obtained are grouped and modified by an analysis operation, namely an element-by-element comparison with the contents of an introduced static dictionary of streams, which maps the 2^L binary combinations to code words of fixed length M, and of a dynamic dictionary of streams of size 2^M-2^L, followed by a re-encoding operation using code words of fixed length M. The modified binary stream of length M+L is entered into the dynamic dictionary of streams and transmitted. In addition, the dynamic dictionary records, for each resulting mapping between a modified binary stream and a code word of fixed length M, the number of times the stream has been used.

The sequence of operations of the method is considered for the case of a fixed code word length M=3 and a given elementary stream length L=2.

Fig. 1 shows the operation of splitting the selected information stream; Fig. 2 shows the set of operations that constitute the essence of the method.

Fig. 1 shows the sequential division of the binary information stream arriving from the input source into elementary streams not exceeding the given length.
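The splitting step can be illustrated with a minimal Python sketch. The function name `split_elementary` and the representation of the binary stream as a string of `0` and `1` characters are assumptions made for illustration; the patent does not prescribe a data representation.

```python
def split_elementary(bits: str, L: int) -> list[str]:
    """Split a bit string into consecutive elementary streams of at most L bits."""
    if L < 1:
        raise ValueError("L must be a positive integer")
    return [bits[i:i + L] for i in range(0, len(bits), L)]


# With L = 2, as in the example discussed in the description.
print(split_elementary("0001000110", 2))  # ['00', '01', '00', '01', '10']
```

A final chunk shorter than L is kept as is, which matches the wording "not exceeding the given length".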

Fig. 2 shows the introduced static dictionary of streams of size 2^L=4, where position 1 holds the possible elementary streams of the given length and position 2 holds the corresponding fixed-length code words. Based on the results of the analysis operation, an element-by-element comparison first with the contents of the static dictionary and then with the entries already present in the dynamic dictionary, the dynamic dictionary is extended with new rows up to a size of 2^M-2^L. In the dynamic dictionary, position 1 records the number of times the analyzed group of elementary streams is used in subsequent constructions, position 2 holds the fixed-length code word, position 3 shows the original form of the analyzed group of elementary streams, and position 4 holds the compressed code mapping that is stored and used for transmission.
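A minimal sketch of how such a static dictionary could be built is given below. It assumes that the code word of each elementary stream is simply its row number written in M bits; the description confirms this only for the pair 00 and 000, so the remaining rows are an assumption. The function name `build_static_dictionary` is hypothetical.

```python
from itertools import product


def build_static_dictionary(L: int, M: int) -> dict[str, str]:
    """Map every L-bit elementary stream to an M-bit code word (assumed: its row number)."""
    if M < L + 1:
        # The claims appear to require M >= L + 1, so that 2**M - 2**L dynamic rows remain.
        raise ValueError("M must be at least L + 1")
    return {
        "".join(bits): format(row, f"0{M}b")
        for row, bits in enumerate(product("01", repeat=L))
    }


static = build_static_dictionary(L=2, M=3)
print(static)  # {'00': '000', '01': '001', '10': '010', '11': '011'}
```

With L=2 and M=3 this leaves the code words 100 to 111 free for the dynamic dictionary.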

In the chosen example, position 2 of the first row of the dynamic dictionary receives the next fixed-length code word in order, 100, and position 3 receives the corresponding analyzed group of elementary streams, 00 01, whose length is one elementary stream longer than can be encoded with the fixed-length code words already known from the static or dynamic dictionary. Position 4 receives 000 01: the first elementary stream of the group, 00, corresponds to the fixed-length code word 000 in the static dictionary, and the second elementary stream, 01, is simply copied. The resulting compressed code mapping of constant length, 000 01, is ready for transmission. The next group of elementary streams, again one elementary stream longer than can be encoded with the known fixed-length code words from the static and dynamic dictionaries, is then selected from the input source data: 00 01 10. In the second row of the dynamic dictionary, position 2 receives the next code word in order, 101, and position 3 receives the selected group 00 01 10. The concatenation of the first two elementary streams of the selected group corresponds to the code word 100 in the first row of the dynamic dictionary, so position 1 of the first row is set to 1, the number of uses of the combined stream, and the third elementary stream of the group, 10, is simply copied. Position 4 of the second row thus holds the compressed code mapping of constant length 100 10, ready for transmission. The whole binary information stream arriving from the input source is analyzed and compressed in the same way; the further stages of compression are shown in the remaining rows of the dynamic dictionary.
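The walk-through above can be sketched in Python as follows. This is an illustrative reading of the description, not the patented implementation. Assumptions: the stream is a string of `0`/`1` characters whose length is a multiple of L; static code words are row numbers in M bits; encoding stops when the dynamic dictionary is full (the correction step is not included); and a tail that cannot be extended by one more elementary stream is returned unencoded, since the text does not say how it is handled. The names `encode`, `rows` and `by_group` are hypothetical.

```python
def encode(bits: str, L: int = 2, M: int = 3):
    """Return (dynamic dictionary rows, unencoded remainder) for a bit string."""
    if len(bits) % L:
        raise ValueError("this sketch assumes the input length is a multiple of L")
    chunks = [bits[i:i + L] for i in range(0, len(bits), L)]
    # Static dictionary: each L-bit stream -> its row number in M bits (assumption).
    static = {format(i, f"0{L}b"): format(i, f"0{M}b") for i in range(2 ** L)}
    capacity = 2 ** M - 2 ** L      # number of rows in the dynamic dictionary
    rows = []                       # dynamic rows, in order of creation
    by_group = {}                   # group of elementary streams -> its row
    pos = 0
    while pos < len(chunks) and len(rows) < capacity:
        known, code, hit = chunks[pos], static[chunks[pos]], None
        end = pos + 1
        # Extend the known prefix while the longer group is already in the dictionary.
        while end < len(chunks) and known + chunks[end] in by_group:
            known += chunks[end]
            hit = by_group[known]
            code = hit["code"]
            end += 1
        if end == len(chunks):
            break                   # no further elementary stream to append
        if hit is not None:
            hit["uses"] += 1        # position 1: count the reuse of that row
        row = {
            "uses": 0,                                        # position 1
            "code": format(2 ** L + len(rows), f"0{M}b"),     # position 2
            "group": known + chunks[end],                     # position 3
            "output": code + chunks[end],                     # position 4, M + L bits
        }
        rows.append(row)
        by_group[row["group"]] = row
        pos = end + 1
    return rows, "".join(chunks[pos:])


rows, rest = encode("0001000110")
for row in rows:
    print(row["uses"], row["code"], row["group"], row["output"])
# 1 100 0001 00001
# 0 101 000110 10010
```

For the input 00 01 00 01 10 the sketch reproduces the two rows described in the text: code word 100 with output 000 01, then code word 101 with output 100 10, and a use count of 1 in the first row.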

When the dynamic dictionary of streams overflows, a dictionary correction operation is applied, based on the calculated average number of stream uses given by the relation:

$$S = \frac{1}{n} \sum_{i=1}^{n} A_i$$

where A_i is the number of uses of the i-th stream (position 1) and n is the total number of streams recorded at the moment of overflow.

For the chosen example of Fig. 2, S = 3/4.
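As a small illustration, the average number of uses can be computed exactly with Python's `fractions` module. The per-row counts `[1, 2, 0, 0]` below are hypothetical: the text states only the result 3/4 and that the third and fourth rows fall below it, and these counts are one set of values consistent with both statements.

```python
from fractions import Fraction


def average_uses(uses: list[int]) -> Fraction:
    """Average number of uses over all rows recorded at the moment of overflow."""
    return Fraction(sum(uses), len(uses))


# Hypothetical use counts for the four dynamic rows of the example.
print(average_uses([1, 2, 0, 0]))  # 3/4
```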

Rows of the dynamic dictionary of streams whose accumulated number of uses is less than S are cleared; in the example these are the third and fourth rows. All references to these rows are cleared as well. The remaining stream-to-word mappings are shifted toward the beginning of the dynamic dictionary, and the fixed-length code words of the shifted streams are modified accordingly.
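The correction step can be sketched as below. This is a simplified reading: rows with fewer uses than the average are dropped, the survivors are shifted to the start, and each survivor's code word is reassigned from its new row number. Updating the references stored inside the compressed outputs is omitted, because the text does not specify it in detail. The function name, the row layout and the sample rows (use counts and the groups of rows three and four) are hypothetical.

```python
from fractions import Fraction


def correct_dictionary(rows: list[dict], L: int = 2, M: int = 3):
    """Drop rows used less often than the average S, then renumber the rest."""
    if not rows:
        return Fraction(0), []
    S = Fraction(sum(row["uses"] for row in rows), len(rows))
    kept = [dict(row) for row in rows if row["uses"] >= S]
    for index, row in enumerate(kept):
        # Dynamic rows are numbered right after the 2**L rows of the static dictionary.
        row["code"] = format(2 ** L + index, f"0{M}b")
    return S, kept


rows = [
    {"uses": 1, "code": "100", "group": "0001"},
    {"uses": 2, "code": "101", "group": "000110"},
    {"uses": 0, "code": "110", "group": "00011011"},  # hypothetical group
    {"uses": 0, "code": "111", "group": "00011000"},  # hypothetical group
]
S, kept = correct_dictionary(rows)
print(S, [row["code"] for row in kept])  # 3/4 ['100', '101']
```

In this sample the surviving rows already sit at the start of the dictionary, so their code words are unchanged; when an earlier row is removed, the rows after it receive new, smaller code words.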

Thus the new essential features make it possible to combine the compression of binary information streams of minimum length 2·L or more with the ability to transmit a compressed block of constant length M+L, which is stored in the periodically corrected dynamic dictionary of streams. This can be seen from the example (Figs. 1-2) and from the compression ratio calculated below:
$$K_{\text{comp}} = \left(1 - \frac{C}{R}\right) \cdot 100\% \qquad (1)$$
where C is the sum of the lengths of the compressed streams (Fig. 2, position 4),
R is the sum of the lengths of the original elementary streams (Fig. 1).
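Relation (1) is simple enough to check numerically. The sketch below uses the totals the example reports, C = 20 bits and R = 26 bits; the function name `compression_ratio` is chosen for illustration. The value C = 20 is consistent with four dynamic rows of M+L = 5 bits each.

```python
def compression_ratio(compressed_bits: int, original_bits: int) -> float:
    """Compression ratio in percent, following relation (1)."""
    if original_bits <= 0:
        raise ValueError("original_bits must be positive")
    return (1 - compressed_bits / original_bits) * 100


M, L, dynamic_rows = 3, 2, 4
C = dynamic_rows * (M + L)   # 20 bits of compressed output
R = 26                       # bits of original elementary streams in the example
print(round(compression_ratio(C, R)))  # 23
```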

By relation (1):
$$K_{\text{comp}} = \left(1 - \frac{20}{26}\right) \cdot 100\% \approx 23\%$$
Obviously, the invention is not limited to the embodiment described above; other variants that do not go beyond the scope of the invention can be derived from it. For example, depending on the standard word lengths used for information exchange in complex information systems, M and L may take the values L=8, M=12.
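To make the effect of the parameter choice concrete, the following sketch derives the dictionary sizes and block lengths that follow from L and M using only the expressions given in the description (2^L, 2^M-2^L, M+L and 2·L). The function name and the dictionary keys are illustrative.

```python
def dictionary_parameters(L: int, M: int) -> dict[str, int]:
    """Sizes implied by the element length L and code word length M."""
    if M < L + 1:
        raise ValueError("M must be at least L + 1")
    return {
        "static_rows": 2 ** L,             # one row per possible elementary stream
        "dynamic_rows": 2 ** M - 2 ** L,   # rows left for the dynamic dictionary
        "output_bits": M + L,              # length of every transmitted block
        "min_input_bits": 2 * L,           # shortest group that can be compressed
    }


print(dictionary_parameters(L=2, M=3))
# {'static_rows': 4, 'dynamic_rows': 4, 'output_bits': 5, 'min_input_bits': 4}
print(dictionary_parameters(L=8, M=12))
# {'static_rows': 256, 'dynamic_rows': 3840, 'output_bits': 20, 'min_input_bits': 16}
```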

Claims

A method for compressing a sequence of information signals, based on receiving the original signal sequences, forming sequences of code signals of equal duration corresponding to the sequences of information signals, and subsequently transmitting them, characterized in that the received sequences of information signals are divided into subsequences of a given duration L; in the initial state a static dictionary of code signals is formed line by line, containing all 2^L possible combinations of subsequences of information signals of duration L and the 2^L corresponding code signals of duration M, where M ≥ L+1; a dynamic dictionary of code signals is formed, which is filled line by line, up to a volume of 2^M-2^L, with subsequences of information signals of various durations and the uniquely corresponding code signals of duration M+L, which are transmitted as they are formed; the code signals of duration M+L are formed by k-fold sequential combination of subsequences of information signals and their analysis by element-by-element comparison with the contents of the dynamic and static dictionaries of code signals, the sequentially performed operations of combination and analysis being completed upon obtaining a combined subsequence of information signals of duration k·L that is one subsequence longer than can be found in the dictionaries, upon recording the combined subsequence in the first free line of the dynamic dictionary, and upon uniquely assigning to it a code signal of duration M+L comprising the number of the line of the static or dynamic dictionary at which the last analysis operation was performed, represented by a subsequence of code signals of duration M, and the last subsequence of information signals of duration L recorded in the current line; the number of the first line of the dynamic dictionary is one greater than the number of the last line of the static dictionary of code signals; for each filled line of the dynamic dictionary a signal A_i is formed, corresponding to the number of recurrences, in subsequent lines, of the combination of subsequences of information signals contained in that line; when its volume is exceeded, the dynamic dictionary of code signals is transformed by eliminating those lines for which the value A_i is less than S, where

$$S = \frac{1}{n} \sum_{i=1}^{n} A_i$$

n is the total number of reused combinations of subsequences of information signals;
and the subsequent correspondence between a subsequence of information signals and a code signal is converted to the preceding one by replacing the value of the subsequence of the code signal of duration M with its new value of the same duration, corresponding to its new location in the dictionary.