CN103427844B - High-speed lossless data compression method based on a hybrid GPU-CPU platform - Google Patents

High-speed lossless data compression method based on a hybrid GPU-CPU platform

Info

Publication number
CN103427844B
CN103427844B CN201310321071.7A
Authority
CN
China
Prior art keywords
data
thread
window
length
compression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310321071.7A
Other languages
Chinese (zh)
Other versions
CN103427844A (en)
Inventor
金海
郑然
周斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN201310321071.7A priority Critical patent/CN103427844B/en
Publication of CN103427844A publication Critical patent/CN103427844A/en
Application granted granted Critical
Publication of CN103427844B publication Critical patent/CN103427844B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a high-speed lossless data compression method based on a hybrid GPU-CPU platform, comprising: the CPU reads the data file to be compressed and copies it from main memory into the global memory of the GPU; a thread-block group bk[a] on the GPU and a number of threads b per block are configured; the length of the compression-dictionary window is set to c, and the head pointer to the first dictionary window is set to p_dic_h; the pre-read window size is set to d, with a pointer p_pre_r to the first pre-read window whose initial value is set to p_dic_h-c; a worker thread group threads[a*b] and (a*b/2)/c gMatrix matrices of size c*d are initialized; (a*b/2) threads of the worker thread group threads[a*b] are invoked to process q=(a*b/2)/c segments of length c+d in the file to be compressed; in each of the q result matrices gMatrix, the diagonal segment with the most consecutive 1s is found, and the ternary result array locations[p] of each result matrix is determined. The invention can greatly improve the compression speed of massive data.

Description

High-speed lossless data compression method based on a hybrid GPU-CPU platform
Technical field
The invention belongs to the field of computer data compression, and more particularly relates to a high-speed lossless data compression method based on a hybrid GPU-CPU platform.
Background technology
Research by International Data Corporation (IDC) shows that over the past decade the total amount of information in the world has doubled roughly every two years. In 2011 the total amount of data created and replicated worldwide was 1.8 ZB (1800 EB); it was expected to reach 8 ZB (8000 EB) by 2015, and in the coming decade, by 2020, the world's data was expected to grow more than 50-fold. Meanwhile, although network bandwidth and new storage technologies have developed rapidly, they still fall far short of the performance required for transferring and storing today's massive data volumes. One of the key technologies for meeting this challenge is data compression: it effectively reduces the volume of data that must be transferred and stored, thereby controlling transfer and storage costs and enabling low-cost, efficient data management.
Traditional compression theory and algorithms on the CPU platform focus on improving the compression ratio, whereas the era of massive data places higher demands on compression speed. To improve speed, parallel data compression has become a new direction of development. A prerequisite for parallel compression is finding data-level parallelism; partitioning the data into blocks and compressing the blocks simultaneously is a natural parallel approach. Existing schemes such as those of J. Gilchrist, GZIP, and S. Pradhan achieve parallel compression on the CPU platform using multithreading, multicore processors, and clusters, respectively. In practice, however, as the volume of data to be processed grows, the large communication volume produced by their interaction, the large amounts of memory occupied by intermediate results, and a CPU hardware architecture that is inherently unsuitable for large-scale parallel computation together prevent parallel compression algorithms on the CPU platform from reaching the targeted speed.
Some scholars have also ported classical lossless compression algorithms from the CPU platform, such as run-length encoding and BZIP2, to the GPU platform with improvements. To address the shortcomings of CPU-based compression described above, these algorithms concentrate on exploiting the GPU's shared and global memory so as to minimize communication between modules and reduce memory consumption, thereby improving compression speed. However, because these algorithms were originally designed for the CPU platform, they are inherently unsuited to the very different hardware architecture of the GPU, so in practical applications the achievable improvement in compression speed still leaves much to be desired.
Summary of the invention
In view of the above defects or improvement needs of the prior art, the present invention provides a high-speed lossless data compression method based on a hybrid GPU-CPU platform. Its object is to decompose the data compression procedure into a parallel-computation part and a serial-computation part: the parallel part partitions the data into multiple compression dictionaries and pre-read windows, organizes them as multiple matrices, and hands the file to be compressed to multiple thread blocks on the GPU for parallel processing, while the serial part leaves the generation and output of the compressed encoding to the CPU. This mode of operation combines the respective strengths of the GPU and the CPU, greatly improving the compression speed of massive data while ensuring that the compression ratio does not decrease.
To achieve the above object, according to one aspect of the present invention, there is provided a high-speed lossless data compression method based on a hybrid GPU-CPU platform, comprising the following steps:
(1) The CPU reads the data file to be compressed and copies it from main memory into the global memory of the GPU;
(2) Configure the thread-block group bk[a] on the GPU and the number of threads b in each thread block, where a is the total number of thread blocks;
(3) Set the length of the compression-dictionary window to c, and set the head pointer to the first dictionary window to p_dic_h;
(4) Set the pre-read window size to d and the pointer p_pre_r to the first pre-read window; the initial value of this pointer is set to p_dic_h-c;
(5) Initialize the worker thread group threads[a*b] and (a*b/2)/c gMatrix matrices, each of size c*d;
(6) Invoke (a*b/2) threads of the worker thread group threads[a*b] to process q=(a*b/2)/c segments of length c+d in the file to be compressed;
(7) In each of the q result matrices gMatrix, find the diagonal segment with the most consecutive 1s, and determine the ternary result array locations[p] of each result matrix. Each element of the array stores a ternary result (x, y, length), where p is the number of diagonal segments in the result matrix and equals c+d-1, x is the offset, relative to its result matrix, of the dictionary position corresponding to the diagonal segment, y is the offset, relative to its result matrix, of the pre-read-window position corresponding to the diagonal segment, and length is the length of the diagonal segment;
(8) Find, in the locations[p] array of each gMatrix, the element with the maximum length value: a thread T3, with thread number th3, is one of threads 0 through (q-1) of thread group threads[a*b]; each such thread finds the element with the maximum length value in the ternary result array locations[p] of its gMatrix matrix and stores the corresponding parameters x, y, and length in the global matching-result array match[q], each element of which also stores a ternary result (x, y, length);
(9) Compress the file to be compressed according to the matching-result array match[q];
(10) Judge whether the pointer p_pre_r has reached the end of the file to be compressed; if so, the process ends. Otherwise slide the dictionary and pre-read windows forward, i.e. set p_pre_r=p_pre_r+q*d and p_dic_h=p_dic_h+q*d, and return to step (6).
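As a concrete illustration, the control flow of steps (6) through (10) can be sketched serially in Python. This is a sketch only: `longest_match` stands in for the GPU matrix matching of steps (6)-(8), the function names are illustrative, and the pre-read window is placed immediately after its dictionary window, consistent with the worked example later in this document (p_dic_h=0, p_pre_r=4096):

```python
def longest_match(dic: bytes, pre: bytes):
    """Stand-in for steps (6)-(8): the longest substring of `pre` that
    also occurs in `dic`, returned as (x, y, length) with x the offset
    in the dictionary window and y the offset in the pre-read window;
    (-1, -1, 0) when the best match is shorter than 3 bytes."""
    best = (-1, -1, 0)
    for y in range(len(pre)):
        for x in range(len(dic)):
            n = 0
            while (x + n < len(dic) and y + n < len(pre)
                   and dic[x + n] == pre[y + n]):
                n += 1
            if n > best[2]:
                best = (x, y, n)
    return best if best[2] >= 3 else (-1, -1, 0)


def compress_file(data: bytes, c: int = 4096, d: int = 64, q: int = 64):
    """Serial sketch of the window-sliding loop of steps (6)-(10);
    q corresponds to (a*b/2)/c on the GPU."""
    matches = []
    p_dic_h = 0
    p_pre_r = p_dic_h + c      # pre-read window right after the dictionary
    while p_pre_r < len(data):
        for k in range(q):     # one iteration handles q window pairs
            dic = data[p_dic_h + k * d : p_dic_h + k * d + c]
            pre = data[p_pre_r + k * d : p_pre_r + k * d + d]
            if not pre:
                break
            matches.append(longest_match(dic, pre))   # fills match[q]
        p_dic_h += q * d       # step (10): slide both windows forward
        p_pre_r += q * d
    return matches
```

The sketch only shows how the windows tile the file and how each iteration advances the matching loop by q*d bytes; on the GPU, `longest_match` is replaced by the massively parallel matrix matching of steps (6)-(8).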
Preferably, the value of d equals 16*n, where n ranges from 1 to 8.
Preferably, in step (6), p_dic_h points to the head of the first dictionary window of the file to be compressed and p_pre_r points to the head of its first pre-read window, while (p_dic_h-d) points to the head of the second dictionary window and (p_pre_r-d) to the head of the second pre-read window. Divided this way, one iteration of the loop can process q dictionary-window/pre-read-window pairs.
Preferably, step (6) specifically comprises, for each pending dictionary window and its corresponding pre-read data window, performing the following steps:
(6-1) Set a counter i=0;
(6-2) Configure threads T1 with thread numbers th1 running from (c*k) through (c*(k+1)-1) in thread group threads[a*b/2], where 0<=k<q. Each such thread judges whether byte (th1 mod c) of the k-th dictionary window matches each of bytes i*16 through (i+1)*16-1 of the k-th pre-read data window, returning 1 when two bytes match and 0 otherwise, and writes these matching results back to positions ((th1 mod c)*d+i*16) through ((th1 mod c)*d+i*16+16) of the k-th gMatrix matrix in global memory;
(6-3) Set i=i+1 and judge whether i<n; if so, return to step (6-2), otherwise proceed to step (7).
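Sub-steps (6-1) through (6-3) amount to filling a c*d bit matrix whose entry (x, y) records whether dictionary byte x equals pre-read byte y; on the GPU each thread T1 fills one row, 16 columns per pass over i. A serial Python sketch (the function name is illustrative):

```python
def build_gmatrix(dic: bytes, pre: bytes):
    """Serial sketch of step (6-2): gMatrix[x][y] = 1 exactly when byte x
    of the dictionary window equals byte y of the pre-read window.
    On the GPU, thread th1 fills row (th1 mod c), writing one 16-entry
    strip per pass i of sub-step (6-3)."""
    return [[1 if db == pb else 0 for pb in pre] for db in dic]
```

A match of length L between the two windows then shows up in this matrix as a run of L consecutive 1s along a diagonal, which is what step (7) searches for.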
Preferably, when length is less than 3, no match was found; in this case x and y are simply assigned the value -1.
Preferably, step (7) specifically comprises the following sub-steps:
(7-1) Configure threads T2 with thread numbers th2 running from (c*k) through (c*(k+1)+d-1) in thread group threads[a*b]; each T2 searches its diagonal segment for the maximal sub-segment of consecutive 1s and records the parameters x, y, and length of that sub-segment;
(7-2) The data x, y, and length obtained by thread T2 are stored in element (th2 mod p) of the ternary result array locations.
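A match between the dictionary substring at offset x and the pre-read substring at offset y appears in gMatrix as a run of consecutive 1s along the diagonal through (x, y), which is why step (7) assigns one thread T2 to each of the p = c+d-1 diagonals. A serial sketch of that scan (the indexing convention and names are illustrative):

```python
def scan_diagonals(g):
    """Serial sketch of step (7): for each of the p = c+d-1 diagonals of
    the c*d matrix g, record the longest run of consecutive 1s as
    (x, y, length), or (-1, -1, 0) when the run is shorter than 3
    (a match under 3 bytes would encode longer than the literals)."""
    c, d = len(g), len(g[0])
    locations = []
    for diag in range(c + d - 1):      # one GPU thread T2 per diagonal
        # Diagonals starting in column 0, then those starting in row 0.
        x, y = (diag, 0) if diag < c else (0, diag - c + 1)
        best, run, sx, sy = (-1, -1, 0), 0, x, y
        while x < c and y < d:
            if g[x][y] == 1:
                if run == 0:
                    sx, sy = x, y      # a new run starts here
                run += 1
                if run > best[2]:
                    best = (sx, sy, run)
            else:
                run = 0
            x += 1
            y += 1
        locations.append(best if best[2] >= 3 else (-1, -1, 0))
    return locations
```

Each diagonal is independent of the others, which is what makes one-thread-per-diagonal a natural GPU mapping.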
Preferably, step (9) specifically comprises the following sub-steps:
(9-1) The matching-result array match[q] is transferred from the GPU to the main memory of the CPU;
(9-2) The CPU restores the data stored in the matching-result array match[q] into the offset and length of the longest matching substring of each pre-read window within its dictionary window, and outputs the compressed-encoding triple array compress[q]. Each element of the array stores a ternary result (flag, offset, length), where flag is a flag byte indicating whether the output is compressed data, and offset and length are respectively the offset and length of the longest matching substring of the pre-read window within the corresponding dictionary window;
(9-3) Compress the file to be compressed according to the flag byte.
Preferably, in step (9-3), the flag bytes obtained are of two types, literal flag bytes and mixed flag bytes. The first bit of a literal flag byte is 0 and its remaining 7 bits give the length of the literal data output, so at most 128 consecutive literal bytes can follow it. The first bit of a mixed flag byte is 1 and its remaining 7 bits describe the next 7 items of mixed literal and compressed data: a 0 bit indicates that the corresponding output item is literal data, and a 1 bit indicates that it is a compressed encoding. Compression is realized in this way; if no match is found, the data is output unchanged.
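The two flag-byte formats of step (9-3) can be sketched as follows. The exact bit order within the low 7 bits of the mixed flag byte is not spelled out in the text, so the sketch assumes most-significant-bit-first, an assumption merely arithmetically consistent with the worked example's flag value 224 (11100000); both function names are illustrative:

```python
def literal_flag(n: int) -> int:
    """Literal flag byte: leading bit 0, low 7 bits = the number n of
    literal bytes that follow. The worked example uses flag 3
    (00000011) for 3 literals; reaching the stated maximum of 128
    would need an n-1 encoding, which is an assumption not made here."""
    assert 0 < n <= 127
    return n                         # leading bit is already 0

def mixed_flag(kinds) -> int:
    """Mixed flag byte: leading bit 1; each of the low 7 bits describes
    one of the next (up to 7) output items, 1 = compressed
    (offset, length) token, 0 = literal byte. MSB-first bit order is
    an assumption consistent with flag 224 = 0b11100000."""
    assert len(kinds) <= 7
    bits = 0
    for i, is_compressed in enumerate(kinds):
        if is_compressed:
            bits |= 1 << (6 - i)     # MSB-first within the low 7 bits
    return 0x80 | bits
```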
According to another aspect of the present invention, there is provided a high-speed lossless data compression system based on a hybrid GPU-CPU platform, comprising:
A first module for reading the data file to be compressed and copying it from main memory into the global memory of the GPU;
A second module for configuring the thread-block group bk[a] on the GPU and the number of threads b in each thread block, where a is the total number of thread blocks;
A third module for setting the length of the compression-dictionary window to c, and setting the head pointer to the first dictionary window to p_dic_h;
A fourth module for setting the pre-read window size to d and the pointer p_pre_r to the first pre-read window, the initial value of this pointer being set to p_dic_h-c;
A fifth module for initializing the worker thread group threads[a*b] and (a*b/2)/c gMatrix matrices, each of size c*d;
A sixth module for invoking (a*b/2) threads of the worker thread group threads[a*b] to process q=(a*b/2)/c segments of length c+d in the file to be compressed;
A seventh module for finding, in each of the q result matrices gMatrix, the diagonal segment with the most consecutive 1s and determining the ternary result array locations[p] of each result matrix, each element of the array storing a ternary result (x, y, length), where p is the number of diagonal segments in the result matrix and equals c+d-1, x is the offset, relative to its result matrix, of the dictionary position corresponding to the diagonal segment, y is the offset, relative to its result matrix, of the pre-read-window position corresponding to the diagonal segment, and length is the length of the diagonal segment;
An eighth module for finding, in the locations[p] array of each gMatrix, the element with the maximum length value: a thread T3, with thread number th3, is one of threads 0 through (q-1) of thread group threads[a*b]; each such thread finds the element with the maximum length value in the ternary result array locations[p] of its gMatrix matrix and stores the corresponding parameters x, y, and length in the global matching-result array match[q], each element of which also stores a ternary result (x, y, length);
A ninth module for compressing the file to be compressed according to the matching-result array match[q];
A tenth module for judging whether the pointer p_pre_r has reached the end of the file to be compressed: if so, the process ends; otherwise the dictionary and pre-read windows slide forward, i.e. p_pre_r=p_pre_r+q*d and p_dic_h=p_dic_h+q*d are set, and control returns to the sixth module.
In general, compared with the prior art, the technical scheme conceived above by the present invention yields the following beneficial effects:
(1) The present invention accelerates data matching through parallel matrix matching: the proposed mode of operation matches q matrices of size c*d in parallel in a single iteration, increasing the step length of the matching loop from d bytes to q*d bytes and thereby accelerating matching;
(2) The present invention achieves asynchronous processing, overlapping the search for redundant data with the output of the compressed encoding, so that the operations on the CPU and the GPU can execute in a pipeline-like fashion; this also reduces the overall compression time to some extent and improves compression speed.
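The overlap described in (2) can be sketched as a two-stage producer/consumer pipeline. This is a schematic of the idea only: in the real system the producer stage would be the GPU matching kernel plus asynchronous copies and the consumer stage the CPU encoder; all names here are illustrative:

```python
import queue
import threading

def pipelined(blocks, gpu_match, cpu_encode):
    """Sketch of the asynchronous overlap: while the consumer (standing
    in for the CPU) encodes the match results of block k, the producer
    (standing in for the GPU) is already matching block k+1."""
    handoff = queue.Queue(maxsize=2)   # bounded, like a double buffer
    results = []

    def producer():                    # plays the role of the GPU stage
        for block in blocks:
            handoff.put(gpu_match(block))
        handoff.put(None)              # end-of-stream marker

    worker = threading.Thread(target=producer)
    worker.start()
    while (m := handoff.get()) is not None:   # CPU stage
        results.append(cpu_encode(m))
    worker.join()
    return results
```

The bounded queue is what gives the pipeline behavior: the producer can run at most two blocks ahead, so matching and encoding proceed concurrently without unbounded buffering.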
Accompanying drawing explanation
Fig. 1 is a flow chart of the high-speed lossless data compression method based on the hybrid GPU-CPU platform of the present invention.
Fig. 2 is a schematic diagram of the division of the file to be compressed into compression-dictionary windows and their corresponding pre-read windows.
Fig. 3 to Fig. 6 are schematic diagrams of an application example of the present invention.
Detailed description
In order to make the objects, technical scheme, and advantages of the present invention clearer, the present invention is further elaborated below in conjunction with the drawings and embodiments. It should be appreciated that the specific embodiments described here serve only to explain the present invention and are not intended to limit it. In addition, the technical features involved in the embodiments of the present invention described below may be combined with one another as long as they do not conflict.
As shown in Fig. 1, the high-speed lossless data compression method of the present invention based on the hybrid GPU-CPU platform comprises the following steps:
(1) The CPU reads the data file to be compressed and copies it from main memory into the global memory of the GPU;
(2) Configure the thread-block group bk[a] on the GPU and the number of threads b in each thread block, where a is the total number of thread blocks and a positive integer, and b ranges from 256 to 1024;
(3) Set the length of the compression-dictionary window to c, and set the head pointer to the first dictionary window to p_dic_h. Specifically, c ranges from 2 KB to 8 KB, with a preferred value of 4 KB, and the initial value of p_dic_h points to the beginning of the file to be compressed;
(4) Set the pre-read window size to d and the pointer p_pre_r to the first pre-read window; the initial value of this pointer is set to p_dic_h-c. The value of d equals 16*n, where n ranges from 1 to 8 with a preferred value of 4, i.e. the preferred size of each pre-read window is 64 B;
(5) Initialize the worker thread group threads[a*b] and (a*b/2)/c gMatrix matrices, each of size c*d;
(6) Invoke (a*b/2) threads of the worker thread group threads[a*b] to process q=(a*b/2)/c segments of length c+d in the file to be compressed. The division of the file into compression-dictionary windows and their corresponding pre-read windows is shown in Fig. 2: p_dic_h points to the head of the first dictionary window and p_pre_r to the head of the first pre-read window, while (p_dic_h-d) points to the head of the second dictionary window and (p_pre_r-d) to the head of the second pre-read window. Divided this way, one iteration of the loop can process q dictionary-window/pre-read-window pairs. Specifically, for each pending dictionary window and its corresponding pre-read data window, the following steps are performed:
(6-1) Set a counter i=0;
(6-2) Configure threads T1 with thread numbers th1 running from (c*k) through (c*(k+1)-1) in thread group threads[a*b/2], where 0<=k<q. Each such thread judges whether byte (th1 mod c) of the k-th dictionary window matches (i.e. equals) each of bytes i*16 through (i+1)*16-1 of the k-th pre-read data window, returning 1 when two bytes match (i.e. are equal) and 0 otherwise, and writes these matching results back to positions ((th1 mod c)*d+i*16) through ((th1 mod c)*d+i*16+16) of the k-th gMatrix matrix in global memory;
(6-3) Set i=i+1 and judge whether i<n; if so, return to step (6-2), otherwise proceed to step (7);
(7) In each of the q result matrices gMatrix, find the diagonal segment with the most consecutive 1s, and determine the ternary result array locations[p] of each result matrix. Each element of the array stores a ternary result (x, y, length), where p is the number of diagonal segments in the result matrix and equals c+d-1, x is the offset, relative to its result matrix, of the dictionary position corresponding to the diagonal segment, y is the offset, relative to its result matrix, of the pre-read-window position corresponding to the diagonal segment, and length is the length of the diagonal segment (i.e. the number of 1s it contains). When length is less than 3, no match was found (if the matching substring is shorter than three bytes, the code after compression is longer than the uncompressed data, which is pointless); in this case x and y are meaningless and can simply be assigned -1.
This step specifically comprises the following sub-steps:
(7-1) Configure threads T2 with thread numbers th2 running from (c*k) through (c*(k+1)+d-1) in thread group threads[a*b]; each T2 searches its diagonal segment for the maximal sub-segment of consecutive 1s and records the parameters x, y, and length of that sub-segment;
(7-2) The data x, y, and length obtained by thread T2 are stored in element (th2 mod p) of the ternary result array locations;
(8) Find, in the locations[p] array of each gMatrix, the element with the maximum length value: a thread T3, with thread number th3, is one of threads 0 through (q-1) of thread group threads[a*b]; each such thread finds the element with the maximum length value in the ternary result array locations[p] of its gMatrix matrix and stores the corresponding parameters x, y, and length in the global matching-result array match[q], each element of which also stores a ternary result (x, y, length);
(9) Compress the file to be compressed according to the matching-result array match[q], which specifically comprises the following sub-steps:
(9-1) The matching-result array match[q] is transferred from the GPU to the main memory of the CPU;
(9-2) The CPU restores the data stored in the matching-result array match[q] into the offset and length of the longest matching substring of each pre-read window within its dictionary window, and outputs the compressed-encoding triple array compress[q]. Each element of the array stores a ternary result (flag, offset, length), where flag is a flag byte indicating whether the output is compressed data, and offset and length are respectively the offset and length of the longest matching substring of the pre-read window within the corresponding dictionary window;
(9-3) Compress the file to be compressed according to the flag byte. Specifically, the flag bytes obtained are of two types, literal flag bytes and mixed flag bytes. The first bit of a literal flag byte is 0 and its remaining 7 bits give the length of the literal data output, so at most 128 consecutive literal bytes can follow it. The first bit of a mixed flag byte is 1 and its remaining 7 bits describe the next 7 items of mixed literal and compressed data: a 0 bit indicates that the corresponding output item is literal data, and a 1 bit indicates that it is a compressed encoding. Compression is realized in this way; if no match is found, the data is output unchanged;
(10) Judge whether the pointer p_pre_r has reached the end of the file to be compressed; if so, the process ends. Otherwise slide the dictionary and pre-read windows forward, i.e. set p_pre_r=p_pre_r+q*d and p_dic_h=p_dic_h+q*d, and return to step (6).
The high-speed lossless data compression system based on the hybrid GPU-CPU platform of the present invention comprises:
A first module for reading the data file to be compressed and copying it from main memory into the global memory of the GPU;
A second module for configuring the thread-block group bk[a] on the GPU and the number of threads b in each thread block, where a is the total number of thread blocks;
A third module for setting the length of the compression-dictionary window to c, and setting the head pointer to the first dictionary window to p_dic_h;
A fourth module for setting the pre-read window size to d and the pointer p_pre_r to the first pre-read window, the initial value of this pointer being set to p_dic_h-c;
A fifth module for initializing the worker thread group threads[a*b] and (a*b/2)/c gMatrix matrices, each of size c*d;
A sixth module for invoking (a*b/2) threads of the worker thread group threads[a*b] to process q=(a*b/2)/c segments of length c+d in the file to be compressed;
A seventh module for finding, in each of the q result matrices gMatrix, the diagonal segment with the most consecutive 1s and determining the ternary result array locations[p] of each result matrix, each element of the array storing a ternary result (x, y, length), where p is the number of diagonal segments in the result matrix and equals c+d-1, x is the offset, relative to its result matrix, of the dictionary position corresponding to the diagonal segment, y is the offset, relative to its result matrix, of the pre-read-window position corresponding to the diagonal segment, and length is the length of the diagonal segment;
An eighth module for finding, in the locations[p] array of each gMatrix, the element with the maximum length value: a thread T3, with thread number th3, is one of threads 0 through (q-1) of thread group threads[a*b]; each such thread finds the element with the maximum length value in the ternary result array locations[p] of its gMatrix matrix and stores the corresponding parameters x, y, and length in the global matching-result array match[q], each element of which also stores a ternary result (x, y, length);
A ninth module for compressing the file to be compressed according to the matching-result array match[q];
A tenth module for judging whether the pointer p_pre_r has reached the end of the file to be compressed: if so, the process ends; otherwise the dictionary and pre-read windows slide forward, i.e. p_pre_r=p_pre_r+q*d and p_dic_h=p_dic_h+q*d are set, and control returns to the sixth module.
Example
In order to clearly set forth the principle of the present invention, its implementation process is illustrated below.
(1) The CPU reads the data file to be compressed and copies it from main memory into the global memory of the GPU;
(2) Configure the thread-block group bk[1024] on the GPU, with 512 threads in each thread block;
(3) Set the length of the compression-dictionary window to 4096 bytes; the head pointer to the first dictionary window is p_dic_h=0;
(4) Set the pre-read window length to 64 bytes; the pointer to the first pre-read window is p_pre_r=4096;
(5) Initialize the worker thread group threads[1024*512] and 64 gMatrix matrices, each of size 4096*64;
(6) Invoke 1024*256 threads of the worker thread group threads[1024*512] to process 64 segments of length (4096+64) B in the file to be compressed. The division of the file into compression-dictionary windows and their corresponding pre-read windows is shown in Fig. 3:
p_dic_h points to the head of the first dictionary window and p_pre_r to the head of the first pre-read window, while (p_dic_h-64) points to the head of the second dictionary window and (p_pre_r-64) to the head of the second pre-read window. Divided this way, one iteration can process 64 dictionary-window/pre-read-window pairs. Specifically, for each pending dictionary window and its corresponding pre-read data window, the following steps are performed (here we describe the processing of the first dictionary-window/pre-read-window pair):
To describe the compression process simply and clearly, let the data of the first compression dictionary in the file to be compressed be "hellothisaeybc…isa", with a length of 4096 bytes, and let the first 16 B of the data of the first pre-read window be "thisisaexampletosh". Threads T0 through T4095, 4096 threads in all, have thread numbers 0 through 4095. Tm1 (m1 ∈ [0,4095]) is any one of these threads, with thread number m1; it judges whether byte m1 of the dictionary window matches (i.e. equals) each of bytes 0 through 15 of the pre-read data window, returning 1 when two bytes match (i.e. are equal) and 0 otherwise, and writes these matching results back to positions m1*64 through m1*64+16 of the corresponding gMatrix matrix in global memory. Because only 16 B of data are selected for compression, we execute only the iteration i=0 of the loop and obtain a result occupying one quarter of the 4096*64 result matrix gMatrix, i.e. a result of size 4096*16, as shown in Fig. 4:
(7) The following describes only how the diagonal segments with the most consecutive 1s are found in the first of the 64 result matrices gMatrix.
(7-1) Threads T0 through T4110, (4096+15) threads in all, have thread numbers 0 through 4110. Tm2 (m2 ∈ [0,4110]) is any one of these threads, with thread number m2; Tm2 searches its diagonal segment for the maximal sub-segment of consecutive 1s, as shown in Fig. 5:
(7-2) Thread Tm2 obtains the parameters x, y, and length and stores them in element m2 of the ternary result array locations;
In this example, threads T5 and T10 have found sub-segments of consecutive 1s and assign the corresponding elements of the locations array: locations(5)={5,0,6} and locations(10)={10,2,3}; all other elements of the locations array take the value {-1,-1,0}, as shown in Fig. 6:
(8) Find, in the locations array of each gMatrix, the element with the maximum length value: threads T0 through T63, 64 threads in all, have thread numbers 0 through 63. Tm3 (m3 ∈ [0,63]) is any one of these threads, with thread number m3; Tm3 finds the element with the maximum length value in the locations array of its gMatrix matrix and stores the corresponding parameters x, y, and length in the global matching-result array match[64], each element of which also stores a ternary result (x, y, length). In this example, thread T0 finds that the element of the locations array with the maximum length value in the 0th result matrix gMatrix is the 5th element, and writes its parameters into the 0th element of the array match, i.e. match(0)={5,0,6};
(9) The data file to be compressed is compressed according to the matching-result array match[64]; this step specifically comprises the following sub-steps:
(9-1) The matching-result array match[64] is transferred from the GPU to the main storage of the CPU;
(9-2) The CPU restores the data stored in the matching-result array match[64] into the offset and length of the longest match of each pre-read window within its compression dictionary window, and outputs the compressed-coding triples compress[64]. In this example, the data of match(0) are restored into the offset and length of the longest match of the 0th pre-read window within the 0th compression dictionary window, namely offset = 5 and length = 6, and a mixed flag byte with value 224 (11100000) is output, so that compress(0) = {224, 5, 6}. The first (4096+16) bytes of the data file to be compressed are then output after compression as: hellothisaeybc ... isa22456amplet3osh. Here the 4096 B of data in the first compression dictionary window cannot be compressed and are output verbatim; the "22456" following them indicates that a 6-byte substring of the pre-read data window has been compressed, its original text starting at offset 5 of the compression dictionary window and being 6 bytes long; next an uncompressed string "amplet", 6 bytes long, is output; then an original flag byte with value 3 (00000011) is output; and finally the 3 uncompressed original bytes "osh" are output.
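The flag-byte convention used in this output (and spelled out in claim 4) can be illustrated with a small decoder sketch; the exact bit ordering inside a mixed flag byte is an assumption made here for illustration:

```python
# A sketch of the flag-byte convention: a byte with its top bit clear is
# an "original" flag whose low 7 bits count the literal bytes that
# follow; a byte with its top bit set is a "mixed" flag whose low 7 bits
# mark each of the next 7 items as compressed (1) or literal (0).
def parse_flag(flag):
    """Classify a flag byte; returns ('literal', count) or ('mixed', bits)."""
    if flag & 0x80:
        # Bit ordering (most significant of the low 7 bits first) is an
        # illustrative assumption, not fixed by the patent text.
        bits = [(flag >> (6 - k)) & 1 for k in range(7)]
        return ("mixed", bits)
    return ("literal", flag & 0x7F)

print(parse_flag(224))  # ('mixed', [1, 1, 0, 0, 0, 0, 0])
print(parse_flag(3))    # ('literal', 3)
```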
(10) It is judged whether the pointer p_pre_r has reached the end of the data file to be compressed. If it points to the end of the file, the process terminates; otherwise the dictionary window and the pre-read window are slid forward, i.e. p_pre_r = p_pre_r + 64*64 and p_dic_h = p_dic_h + 64*64 are set, and the process returns to step (6).
Those skilled in the art will readily understand that the foregoing is merely a preferred embodiment of the present invention and is not intended to limit it; any modifications, equivalent replacements, improvements and the like made within the spirit and principles of the present invention shall all fall within the scope of protection of the present invention.

Claims (4)

1. A high-speed lossless data compression method based on a hybrid GPU and CPU platform, characterized in that it comprises the following steps:
(1) the CPU reads the data file to be compressed and copies it from main memory into the global storage of the GPU;
(2) the thread-block group bk[a] on the GPU and the number of threads b in each thread block are set, where a is the total number of thread blocks;
(3) the length of the compression dictionary window is set to c, and the head pointer p_dic_h is set to point to the first compression dictionary window;
(4) the pre-read window size is set to d, along with the pointer p_pre_r pointing to the first pre-read window; the initial value of this pointer is set to p_dic_h-c;
(5) the worker thread group threads[a*b] is initialized, together with (a*b/2)/c gMatrix matrices, each of size c*d;
(6) the (a*b/2) threads in the worker thread group threads[a*b] are called to process q=(a*b/2)/c stretches of data of length c+d in the data file to be compressed; in step (6), p_dic_h points to the head of the first compression dictionary window of the data file to be compressed, p_pre_r points to the head of its first pre-read window, (p_dic_h-d) points to the head of the second compression dictionary window, and (p_pre_r-d) points to the head of the second pre-read window; divided in this way, q compression dictionary windows and pre-read data windows can be processed in one iteration; specifically, for the data of each pending compression dictionary window and of the pre-read data window corresponding to it, the following steps are performed respectively:
(6-1) a counter i=0 is set;
(6-2) a thread T1 with thread number th1 is set, T1 being one of the (c*k)-th to (c*(k+1)-1)-th threads in thread group threads[a*b/2], where 0<=k<q; T1 judges whether byte (th1 mod c) of the k-th compression dictionary window matches each of bytes i*16 to (i+1)*16-1 of the k-th pre-read data window, returning 1 when the two bytes match and 0 otherwise, and writes the matching results back to positions ((th1 mod c)*d+i*16) through ((th1 mod c)*d+i*16+16) of the k-th gMatrix matrix in global storage;
(6-3) i=i+1 is set and it is judged whether i<n; if so, proceed to step (6-2), otherwise proceed to step (7);
(7) in each of the q result matrices gMatrix, the diagonal segment with the longest run of consecutive 1s is found, and the ternary result array locations[p] of each result matrix is determined, each element of the array storing a ternary result (x, y, length), where p is the number of diagonal segments in the result matrix and equals c+d-1, x denotes the offset of the compression dictionary position corresponding to the diagonal segment within its result matrix, y denotes the offset of the pre-read window position corresponding to the diagonal segment within its result matrix, and length denotes the length of the diagonal segment; this step specifically comprises the following sub-steps:
(7-1) a thread T2 with thread number th2 is set, T2 being one of the (c*k)-th to (c*(k+1)+d-1)-th threads in thread group threads[a*b]; T2 is responsible for searching its diagonal segment for the sub-segment with the longest run of consecutive 1s and records that sub-segment's corresponding parameters x, y and length;
(7-2) thread T2 stores the obtained data x, y and length in the (th2 mod p)-th element of the ternary result array locations;
(8) the element with the largest length value in the locations[p] array of each gMatrix is found: a thread T3 with thread number th3 is set, T3 being one of the 0th to (q-1)-th threads in thread group threads[a*b]; T3 is responsible for finding the element with the largest length value in the ternary result array locations[p] corresponding to its gMatrix matrix and stores the corresponding parameters x, y and length in the global matching-result array match[q], each element of which also stores a ternary result (x, y, length);
(9) the data file to be compressed is compressed according to the matching-result array match[q]; this step specifically comprises the following sub-steps:
(9-1) the matching-result array match[q] is transferred from the GPU to the main storage of the CPU;
(9-2) the CPU restores the data stored in the matching-result array match[q] into the offset and length of the longest match of each pre-read window within its compression dictionary window, and outputs the compressed-coding triples compress[q]; each element of the compressed-coding triple array stores a ternary result (flag, offset, length), where flag is a flag byte indicating whether the output is compressed data, and offset and length denote, respectively, the offset and length of the longest match of the pre-read window within its corresponding compression dictionary window;
(9-3) the data file to be compressed is compressed according to the flag byte;
(10) it is judged whether the pointer p_pre_r has reached the end of the data file to be compressed; if so, the process terminates; otherwise the dictionary window and the pre-read window are slid forward, i.e. p_pre_r=p_pre_r+q*d and p_dic_h=p_dic_h+q*d are set, and the process returns to step (6).
2. The high-speed lossless data compression method according to claim 1, characterized in that the value of d equals 16*n, where n ranges from 1 to 8.
3. The high-speed lossless data compression method according to claim 1, characterized in that a length value less than 3 indicates that no match was found, in which case x and y are both assigned the value -1.
4. The high-speed lossless data compression method according to claim 1, characterized in that in step (9-3) the flag bytes obtained comprise original flag bytes and mixed flag bytes; in an original flag byte the first bit is 0 and the remaining 7 bits denote the length of the original data output, so that at most 128 consecutive original data bytes can follow; in a mixed flag byte the first bit is 1 and the remaining 7 bits describe 7 items of mixed original and compressed data, the corresponding bit indicating original data when it is 0 and a compressed code when it is 1; data compression is thereby achieved, and if no match is found, the data are output verbatim.
CN201310321071.7A 2013-07-26 2013-07-26 A kind of high-speed lossless data compression method based on GPU and CPU mixing platform Active CN103427844B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310321071.7A CN103427844B (en) 2013-07-26 2013-07-26 A kind of high-speed lossless data compression method based on GPU and CPU mixing platform

Publications (2)

Publication Number Publication Date
CN103427844A CN103427844A (en) 2013-12-04
CN103427844B true CN103427844B (en) 2016-03-02

Family

ID=49652097

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310321071.7A Active CN103427844B (en) 2013-07-26 2013-07-26 A kind of high-speed lossless data compression method based on GPU and CPU mixing platform

Country Status (1)

Country Link
CN (1) CN103427844B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104091305B (en) * 2014-08-11 2016-05-18 詹曙 A kind of for Fast image segmentation method computer graphic image processing, based on GPU platform and morphology PCA
CN105630529A (en) * 2014-11-05 2016-06-01 京微雅格(北京)科技有限公司 Loading method of FPGA (Field Programmable Gate Array) configuration file, and decoder
CN104965761B (en) * 2015-07-21 2018-11-02 华中科技大学 A kind of more granularity divisions of string routine based on GPU/CPU mixed architectures and dispatching method
CN106019858B (en) * 2016-07-22 2018-05-22 合肥芯碁微电子装备有限公司 A kind of direct-write type lithography machine image data bitwise compression method based on CUDA technologies
CN107508602A (en) * 2017-09-01 2017-12-22 郑州云海信息技术有限公司 A kind of data compression method, system and its CPU processor
CN110308982B (en) * 2018-03-20 2021-11-19 华为技术有限公司 Shared memory multiplexing method and device
CN110007855B (en) * 2019-02-28 2020-04-28 华中科技大学 Hardware-supported 3D stacked NVM (non-volatile memory) memory data compression method and system
CN111628779B (en) * 2020-05-29 2023-10-20 深圳华大生命科学研究院 Parallel compression and decompression method and system for FASTQ file
CN112463388B (en) * 2020-12-09 2023-03-10 广州科莱瑞迪医疗器材股份有限公司 SGRT data processing method and device based on multithreading

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101123723A (en) * 2006-08-11 2008-02-13 北京大学 Digital video decoding method based on image processor
CN101937082A (en) * 2009-07-02 2011-01-05 北京理工大学 GPU (Graphic Processing Unit) many-core platform based parallel imaging method of synthetic aperture radar
CN101937555A (en) * 2009-07-02 2011-01-05 北京理工大学 Parallel generation method of pulse compression reference matrix based on GPU (Graphic Processing Unit) core platform
CN102436438A (en) * 2011-12-13 2012-05-02 华中科技大学 Sparse matrix data storage method based on ground power unit (GPU)
US8374242B1 (en) * 2008-12-23 2013-02-12 Elemental Technologies Inc. Video encoder using GPU
CN103177414A (en) * 2013-03-27 2013-06-26 天津大学 Structure-based dependency graph node similarity concurrent computation method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110157192A1 (en) * 2009-12-29 2011-06-30 Microsoft Corporation Parallel Block Compression With a GPU


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Parallel Implementation of H.264/AVC Video Compression Coding on the CUDA Platform; Hu Xiaoling; China Master's Theses Full-text Database, Information Science and Technology; 2011-09-15 (No. 09); pp. 1-68 *
Optimal parallel dictionary matching and compression; Farach M et al.; Proceedings of the 7th ACM Symposium on Parallel Algorithms and Architectures (SPAA 95); 1995-04-26; pp. 244-253 *
Overview of parallel processing approaches to image and video compression; Shen Ke et al.; SPIE 2186: Proceedings of the Conference on Image and Video Compression; 1994-05-01; pp. 197-208 *
Design and Implementation of Parallel Algorithms for Key Modules of a GPU-Based H.264 Encoder; Cui Chen; China Master's Theses Full-text Database, Information Science and Technology; 2012-10-15 (No. 10); pp. 1-64 *

Also Published As

Publication number Publication date
CN103427844A (en) 2013-12-04


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant