CN117896441B - Gateway data intelligent optimization method - Google Patents


Info

Publication number
CN117896441B
Authority
CN
China
Prior art keywords
character segment
compressed
character
interpolation
segment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410257458.9A
Other languages
Chinese (zh)
Other versions
CN117896441A (en
Inventor
曲宝春
王玲兰
张斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Aixiongsi Communication Technology Co ltd
Original Assignee
Suzhou Aixiongsi Communication Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Aixiongsi Communication Technology Co ltd
Priority to CN202410257458.9A
Publication of CN117896441A
Application granted
Publication of CN117896441B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/04 Protocols for data compression, e.g. ROHC
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/565 Conversion or adaptation of application format or content
    • H04L67/5651 Reducing the amount or size of exchanged application data

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention relates to the technical field of data compression, in particular to an intelligent gateway data optimization method. The method comprises the following steps: acquiring a data stream to be compressed; based on the query window, matching characters in the data stream to be compressed, and obtaining the corresponding similarity degree when each character segment in the data stream to be compressed is matched according to the number of segmented sub-character segments when each character segment in the data stream to be compressed is matched and the interval of adjacent sub-character segments in the search buffer zone; determining interpolation feasibility corresponding to each character segment based on the similarity degree, and screening target character segments; obtaining interpolation rationality corresponding to each target character segment according to the number of single interval areas, the number of multiple interval areas and the similarity degree when each target character segment is matched, and further screening the character segments to be compressed; and compressing the character segment to be compressed by adopting an LZ series algorithm to obtain compressed data. The invention improves the compression efficiency while guaranteeing the compression effect of the data stream.

Description

Gateway data intelligent optimization method
Technical Field
The invention relates to the technical field of data compression, in particular to an intelligent gateway data optimization method.
Background
With the development of the Internet, the era of big data has arrived. Whether on the Internet or on a local server, data must be transmitted and stored constantly. The raw data received by devices is huge in volume, occupies a large amount of bandwidth and storage space during transmission and storage, and takes longer to transmit as the data volume grows. A number of data compression algorithms have been developed in response.
The LZ series algorithms compress data losslessly. Their principle is to slide a dynamic window over the data stream. The window is divided into a search buffer (also called the look-up buffer) and a look-ahead buffer, and the search buffer is much larger than the look-ahead buffer. As the window slides, characters in the look-ahead buffer are represented by an offset into the search buffer and a matching length, which reduces the storage space. For the LZ series algorithms, a larger window gives a better compression effect, but compression takes longer, so compression efficiency drops and the time cost rises.
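The sliding-window principle described above can be illustrated with a minimal sketch. It assumes the classic LZ77 triple (offset, match length, next character); the function names and the small buffer sizes are illustrative assumptions and are not taken from this patent.

```python
def lz77_encode(data, search_size=16, lookahead_size=8):
    """Encode `data` as (offset, match length, next character) triples."""
    i, triples = 0, []
    while i < len(data):
        start = max(0, i - search_size)            # left edge of the search buffer
        max_len = min(lookahead_size, len(data) - i - 1)  # keep one char as "next"
        best_len, best_off = 0, 0
        for j in range(start, i):                  # try every start in the search buffer
            length = 0
            while length < max_len and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        triples.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return triples


def lz77_decode(triples):
    """Rebuild the original string from the triples."""
    out = []
    for offset, length, ch in triples:
        for _ in range(length):
            out.append(out[-offset])               # copy from `offset` characters back
        out.append(ch)
    return "".join(out)


triples = lz77_encode("abcabcabcd")
restored = lz77_decode(triples)
```

Enlarging `search_size` lets the encoder find longer matches, but the inner search loop then visits more start positions, which is the time cost the background refers to.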
Disclosure of Invention
In order to solve the problem of low compression efficiency of the existing LZ series algorithm when compressing data streams, the invention aims to provide an intelligent gateway data optimization method, which adopts the following technical scheme:
The invention provides an intelligent optimization method for gateway data, which comprises the following steps:
Acquiring a data stream to be compressed;
Constructing a query window, wherein the query window comprises a look-ahead buffer area and a look-up buffer area; based on the query window, respectively matching characters in the data stream to be compressed, and obtaining the corresponding similarity degree when each character segment in the data stream to be compressed is matched according to the number of segmented sub-character segments when each character segment in the data stream to be compressed is matched and the interval of adjacent sub-character segments in a search buffer zone; determining interpolation feasibility corresponding to each character segment based on the similarity degree; screening a target character segment based on the interpolation feasibility;
Obtaining interpolation rationality corresponding to each target character segment according to the number of single interval areas, the number of multiple interval areas and the similarity degree when each target character segment is matched; screening character segments to be compressed based on the interpolation rationality;
and compressing the character segment to be compressed by adopting an LZ series algorithm to obtain compressed data.
Preferably, the obtaining the similarity degree corresponding to each character segment in the data stream to be compressed when matching according to the number of the segmented sub-character segments when each character segment in the data stream to be compressed matches and the interval of the adjacent sub-character segments in the search buffer zone includes:
For the nth character segment in the data stream to be compressed:
For the m-th interval of adjacent sub-character segments in the search buffer when the nth character segment is matched: the ratio of the length of the m-th interval to the total length of the nth character segment is taken as the m-th interval duty ratio;
And obtaining the corresponding similarity degree when the nth character segment is matched according to all the interval duty ratios of adjacent sub-character segments in the searching buffer area and the number of the segmented sub-character segments when the nth character segment is matched in the data stream to be compressed.
Preferably, the following formula is used to calculate the corresponding similarity when the nth character segment is matched:
;
Wherein $E_n$ is the similarity degree corresponding to the nth character segment when it is matched, $k$ is the number of sub-character segments into which the nth character segment is segmented when matched, $x_i$ is the length of the i-th interval of adjacent sub-character segments in the search buffer, $L_n$ is the total length of the nth character segment, and $e$ is the natural constant.
Preferably, the determining the interpolation feasibility corresponding to each character segment based on the similarity degree includes:
If the similarity is greater than a preset similarity threshold, enabling interpolation feasibility corresponding to the corresponding character segment to be a preset first numerical value;
If the similarity is smaller than or equal to a preset similarity threshold, enabling interpolation feasibility corresponding to the corresponding character segment to be a preset second numerical value;
the preset first value is greater than the preset second value.
Preferably, filtering the target character segment based on the interpolation feasibility includes:
and taking the character segment with interpolation feasibility as a preset first numerical value as a target character segment.
Preferably, the obtaining interpolation rationality corresponding to each target character segment according to the number of single spacing areas, the number of multiple spacing areas and the similarity degree when each target character segment is matched includes:
For the a-th target character segment:
Twice the number of multi-interval regions when the a-th target character segment is matched is taken as a first characteristic value; the sum of the first characteristic value, the constant 2 and the number of single-interval regions when the a-th target character segment is matched is determined as the compressed length of the a-th target character segment;
And obtaining interpolation rationality corresponding to the a-th target character segment based on the difference between the compressed length and the original length of the a-th target character segment and the corresponding similarity degree when the a-th target character segment is matched.
Preferably, obtaining interpolation rationality corresponding to the a-th target character segment based on a difference between the compressed length and the original length of the a-th target character segment and a similarity corresponding to the a-th target character segment when the a-th target character segment is matched, includes:
calculating the ratio of the compressed length of the a-th target character segment to the original length of the a-th target character segment, and calculating the difference between a constant 1 and the ratio;
And determining the product of the difference and the similarity degree corresponding to the a-th target character segment when matched as the interpolation rationality corresponding to the a-th target character segment.
Preferably, the filtering the character segment to be compressed based on the interpolation rationality includes:
and determining the target character segment with interpolation rationality larger than a preset rationality threshold as the character segment to be compressed.
Preferably, the compressing the character segment to be compressed by adopting the LZ series algorithm to obtain compressed data includes:
When any character segment to be compressed is compressed by adopting an LZ series algorithm, the format of the tuple stored data is specifically as follows: (offset, single character, matching length, interpolation amount, interpolation length).
Preferably, the LZ series algorithm is adopted to match the characters in the data stream to be compressed.
The invention has at least the following beneficial effects:
The method matches the characters in the data stream to be compressed and obtains the similarity degree of each character segment from the number of sub-character segments produced when the segment is matched and the intervals between adjacent sub-character segments in the search buffer. The smaller the similarity degree, the more places need interpolation and the longer the encoder needs to compute. The method therefore determines the interpolation feasibility of each character segment from the similarity degree and screens out target character segments. It then obtains the interpolation rationality of each target character segment from the number of single-interval regions, the number of multi-interval regions and the similarity degree when the segment is matched. A target character segment with greater interpolation rationality takes less computation time to compress and saves more space after compression.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a gateway data intelligent optimization method according to an embodiment of the present invention.
Detailed Description
In order to further describe the technical means and effects adopted by the present invention to achieve the preset purpose, the following detailed description is given to a gateway data intelligent optimization method according to the present invention with reference to the accompanying drawings and the preferred embodiments.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The following specifically describes a specific scheme of the gateway data intelligent optimization method provided by the invention with reference to the accompanying drawings.
An embodiment of a gateway data intelligent optimization method comprises the following steps:
The specific scenario addressed by this embodiment is as follows. The embodiment mainly uses an LZ series algorithm to compress a data stream. During compression, the window of the LZ algorithm is kept unchanged, that is, the search time is unchanged, and the bytes in the look-ahead buffer are fuzzy-matched against the search buffer. This finds character segments in the look-ahead buffer that are "incomplete" compared with the characters in the search buffer. The "incomplete" part is filled in and marked, so that several groups of characters are fused into one, which saves space. During decompression the filled-in characters are deleted, so the compression remains lossless.
The embodiment provides a gateway data intelligent optimization method, as shown in fig. 1, which comprises the following steps:
step S1, a data stream to be compressed is obtained.
In this embodiment, the data file is collected by the collection module first, where the data file may include an image, a numerical data file, and the like. And reading the acquired data file, finding the starting position of the data to be compressed, and constructing a data stream to be compressed.
To this end, a data stream to be compressed is obtained.
S2, constructing a query window, wherein the query window comprises a look-ahead buffer area and a look-up buffer area; based on the query window, respectively matching characters in the data stream to be compressed, and obtaining the corresponding similarity degree when each character segment in the data stream to be compressed is matched according to the number of segmented sub-character segments when each character segment in the data stream to be compressed is matched and the interval of adjacent sub-character segments in a search buffer zone; determining interpolation feasibility corresponding to each character segment based on the similarity degree; and screening the target character segment based on the interpolation feasibility.
After the data stream to be compressed is obtained, this embodiment compresses it based on the LZ series algorithm. In data compressed with LZ series algorithms there are not only identical characters but also similar characters. Through a series of operations the similar characters can also be represented by LZ tuples, so that multiple groups of characters are fused into one group and the compressed size is reduced.
In the compression process of the LZ series algorithm, characters in the look-ahead buffer of the sliding window are matched against characters in the search buffer, and the matched characters are represented by tuples. Besides matching identical characters, fuzzy matching is also needed so that characters with high similarity are matched as well. This embodiment provides a method for filling in the missing part of an "incomplete" character segment: after filling, the filled-in characters and their positions are marked, and the filled-in character string is compressed, which reduces the tuple length. Marking the filled-in characters adds characters of its own, so the interpolation rationality must be judged to decide whether interpolation yields a gain. For two character segments that are similar but lie too far apart in the search buffer, or that are too short, the interpolation marks add characters and reduce the compression effect. Based on this, the embodiment combines interpolation feasibility and interpolation rationality to determine whether a fuzzy-matched character segment should be interpolated.
The LZ series algorithm finds identical characters by matching earlier and later parts of the data stream, and uses characters stored in the dictionary to represent the characters input to the encoder. The search must be completed within a window, so this embodiment first constructs a query window and specifies the search mode. The window size is 96KB, of which the look-ahead buffer is 32KB and the search buffer is 64KB. Because the characters of the look-ahead buffer may closely resemble characters of the search buffer during matching, the search buffer in this embodiment is 2 times the length of the look-ahead buffer. In specific applications, the implementer can set these sizes according to the specific situation.
When the query window slides over the data stream to be compressed, each slide retains some characters of the data stream, and after any slide all characters in the look-ahead buffer of the query window form a character segment; each slide corresponds to one character segment. Note that in this embodiment only character segments with a length greater than 3 are searched and matched, and these are recorded as character segments to be matched.
When any character segment in the data stream to be compressed is searched and matched, matching against the search buffer starts from the last character of the segment; once a character matching the last character is found, the search continues forward. Three outcomes can occur: complete matching, interval matching and no matching. Complete matching means the search buffer contains a string identical to the character segment, so conventional LZ compression is performed; the number of sub-character segments in a complete match is 1. Interval matching means that, starting from the last character, the search buffer contains several pieces that together make up the character segment of the look-ahead buffer, but with intervals between them. Interval-matched character segments are compressed with the LZ optimization algorithm of this embodiment. In interval matching the character segment of the look-ahead buffer is split into several pieces in the search buffer, and each piece is a sub-character segment. No matching means the two character strings differ, and no compression is performed. The character segments mentioned hereafter in this embodiment are all character segments to be matched.
The LZ optimization algorithm in this embodiment matches identical characters in sequence, starting from the last character of the character segment and moving forward. If the last character cannot be matched, none of the characters before it are matched. If matching reaches the first character of the character segment within the search range, the match is complete; otherwise the segment is unmatched. Because the matching is fuzzy, an overly large query range would increase the compression time. This embodiment obtains the similarity degree of each character segment in the data stream to be compressed from the number of sub-character segments produced when the segment is matched and the intervals between adjacent sub-character segments in the search buffer.
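The backward matching and its three outcomes can be sketched as follows. This is a simplified illustration under stated assumptions: the match is anchored at the rightmost occurrence of the segment's last character, each earlier character is matched at its nearest preceding occurrence, and no limit on the query range is enforced. The function names `backward_match` and `classify` are hypothetical.

```python
def backward_match(segment, search_buffer):
    """Return the buffer index matched to each character of `segment`, or None."""
    limit = len(search_buffer)
    positions = []
    for ch in reversed(segment):
        # Nearest occurrence of `ch` strictly before the previously matched index.
        limit = search_buffer.rfind(ch, 0, limit)
        if limit < 0:
            return None                      # a character could not be matched
        positions.append(limit)
    positions.reverse()
    return positions


def classify(segment, search_buffer):
    """Return (outcome, number of sub-character segments, interval lengths)."""
    positions = backward_match(segment, search_buffer)
    if positions is None:
        return "none", 0, []
    gaps = [b - a - 1 for a, b in zip(positions, positions[1:]) if b - a > 1]
    outcome = "interval" if gaps else "complete"
    return outcome, len(gaps) + 1, gaps


result = classify("1235789", "123456789")
```

Here `result` reports an interval match with three sub-character segments (`123`, `5`, `789`) separated by two intervals of length 1.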
For the nth character segment in the data stream to be compressed:
For the m-th interval of adjacent sub-character segments in the search buffer when the nth character segment is matched, the ratio of the length of the m-th interval to the total length of the nth character segment is taken as the m-th interval duty ratio. In this way the interval duty ratio of every interval of adjacent sub-character segments in the search buffer is obtained for the nth character segment. The similarity degree corresponding to the nth character segment when matched is then obtained from all the interval duty ratios of adjacent sub-character segments in the search buffer and the number of sub-character segments into which the nth character segment is segmented. The specific calculation formula of the similarity degree corresponding to the nth character segment is as follows:
;
Wherein $E_n$ is the similarity degree corresponding to the nth character segment when it is matched, $k$ is the number of sub-character segments into which the nth character segment is segmented when matched, $x_i$ is the length of the i-th interval of adjacent sub-character segments in the search buffer, $L_n$ is the total length of the nth character segment, and $e$ is the natural constant.
The number of sub-character segments and the interval lengths when the nth character segment is matched reflect how fragmented the segment is. The more sub-character segments there are, the more places need interpolation, and the more time the encoder needs. The more sub-character segments and the longer the intervals between adjacent sub-character segments in the search buffer, the smaller the similarity between the nth character segment and the characters in the search buffer. When k=1 the interval is 0, and the nth character segment exactly matches the characters of the search buffer.
The greater the similarity degree of the nth character segment, that is, the fewer the sub-character segments and the shorter the intervals between them, the fewer bytes need to be interpolated and the less time the encoder takes. This embodiment therefore determines the interpolation feasibility from the similarity degree. Specifically, if the similarity is greater than a preset similarity threshold, the interpolation feasibility corresponding to the character segment is set to a preset first numerical value; if the similarity is smaller than or equal to the preset similarity threshold, the interpolation feasibility corresponding to the character segment is set to a preset second numerical value; the preset first value is greater than the preset second value. The specific judgment formula of interpolation feasibility corresponding to the nth character segment is as follows:
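The interval duty ratios and the qualitative behaviour of the similarity degree can be sketched in code. The duty ratio follows the definition in the text. The way the ratios and $k$ are combined below, an exponential decay, is only an assumed stand-in chosen to reproduce the stated properties (a value of 1 for a complete match, and smaller values for more sub-character segments or longer intervals); it should not be read as the patent's actual formula, which is not reproduced in this text.

```python
import math


def interval_duty_ratios(gap_lengths, segment_length):
    """Ratio of each interval length to the total length of the character segment."""
    return [x / segment_length for x in gap_lengths]


def similarity(gap_lengths, segment_length):
    """ASSUMED form: exp(-(k - 1) * sum of duty ratios), with k sub-character segments."""
    k = len(gap_lengths) + 1                 # k segments are separated by k - 1 intervals
    ratios = interval_duty_ratios(gap_lengths, segment_length)
    return math.exp(-(k - 1) * sum(ratios))


complete = similarity([], 7)                 # complete match: k = 1, no intervals
lightly_split = similarity([1, 1], 7)        # e.g. segment 1235789
heavily_split = similarity([1, 1, 1, 1], 5)  # e.g. segment 13579
```

Under this assumed form, `complete` equals 1 and the value falls as the segment is split into more pieces or the intervals grow, which is the ordering the paragraph above describes.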
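$$G_n=\begin{cases}T_1, & E_n > E_0\\ T_2, & E_n \le E_0\end{cases}$$;
Wherein $G_n$ is the interpolation feasibility corresponding to the nth character segment, $E_n$ is the similarity degree corresponding to the nth character segment when matched, $E_0$ denotes the preset similarity threshold, $T_1$ is the preset first value, and $T_2$ is the preset second value.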
In this embodiment, the preset first value is 1, and the preset second value is 0, and in a specific application, an implementer may set according to a specific situation.
A character segment whose interpolation feasibility is the preset first value is split few times, and the intervals between its adjacent sub-character segments are short, so the distance to be interpolated is short and the maximum interpolation time the encoder allows is longer; as the similarity degree approaches 1, the interpolation time approaches 0.
When the dynamic query window slides over the data stream, the interpolation feasibility is calculated for the character string in the look-ahead buffer. Character strings whose feasibility is the preset first value proceed to the following interpolation step, that is, character segments whose interpolation feasibility is the preset first value are taken as target character segments. In this way the embodiment screens the target character segments out of all character segments.
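The feasibility rule and the screening of target character segments can be sketched as follows. The first and second values 1 and 0 follow this embodiment; the threshold value 0.5 is an assumption for illustration only, since the text leaves the preset similarity threshold to the implementer, and the function names are hypothetical.

```python
def interpolation_feasibility(similarity, threshold=0.5, first_value=1, second_value=0):
    """Return the first value when similarity exceeds the threshold, else the second."""
    return first_value if similarity > threshold else second_value


def select_target_segments(segments, threshold=0.5):
    """Keep the segments whose feasibility equals the first value (1).

    `segments` is a list of (segment, similarity) pairs.
    """
    return [seg for seg, sim in segments
            if interpolation_feasibility(sim, threshold) == 1]


targets = select_target_segments([("1235789", 0.9), ("13579", 0.2), ("12569", 0.5)])
```

A similarity exactly equal to the threshold is rejected, matching the "smaller than or equal to" branch of the rule.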
Step S3, obtaining interpolation rationality corresponding to each target character segment according to the number of single interval areas, the number of multiple interval areas and the similarity degree when each target character segment is matched; and filtering the character segment to be compressed based on the interpolation rationality.
For characters stored in the dictionary, the LZ series algorithm uses triples of the form (offset, match length, next character), which occupy 3 bytes. Representing a single matched character with such a tuple would increase the space used, so a matched data stream needs a minimum length, and only character segments with a length greater than 3 are selected. In the interpolation method of this embodiment, if LZ tuples were used to represent the length and offset of each interval, every interval would need two bytes. When a section of the data stream is split by several single characters, each interval mark still needs two bytes, so the compressed length is equal to that of the original LZ algorithm, or the result even costs more time and space, and the goal of reducing compressed size and computation time is not achieved.
To solve the single-character problem, this embodiment redefines the tuple format on the basis of the original LZ tuple, as follows: (offset, single character, matching length, interpolation amount, interpolation length). The offset and matching length are the same as in the conventional LZ algorithm. A single-character interval is represented by its position relative to the offset in the search buffer, while a multi-character interval is represented by the pair <interpolation amount, interpolation length>, where the interpolation amount is the position relative to the offset and the interpolation length is the length of the interval. The matching length is the full length of the matched string and is therefore the largest value in the tuple; on this basis, the entries before the matching length give the positions of single intervals, and the entries after it give the interpolation regions of multiple intervals.
The interpolation method for the data stream is as follows. First, a data stream whose interpolation feasibility is the preset first value is selected, and the position of its first character in the search buffer is found and recorded as the offset. The search buffer is then traversed from front to back starting at the offset: each single-character interval is recorded in the tuple as a single character position, each multi-character interval is recorded as (interpolation amount, interpolation length), and finally the matching length is placed between the two groups, which completes the tuple.
The tuple comprises an offset, a single-interval region, a matching length and a multi-interval region. The length of the character segments is flexible and the interval distance is also variable, so this embodiment places no restriction on the intervals or the lengths of the character segments, but the compression result does not necessarily meet practical needs. Take the matched data stream 123456789 as an example. If the look-ahead buffer reads the characters 1235789, the compressed bytes are 1469. If the look-ahead buffer reads 12569, the compressed bytes are 193272, which is unchanged compared with the original bytes. If the look-ahead buffer reads 13579, the compressed bytes are 124689, which is longer than the original data. The compression results of the three character segments differ. To reduce the compression time while ensuring the compression effect, this embodiment must judge whether a compression result is reasonable. For characters with multi-character intervals, the compressed data tends to be shorter, and the longer the data stream, the better the compression effect; but when the intervals are mostly single characters, the rationality of compression must be considered. Based on this, this embodiment obtains the interpolation rationality corresponding to each target character segment from the number of single-interval regions, the number of multi-interval regions and the similarity degree when each character segment in the data stream to be compressed is matched.
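Because the matching length is the largest value in the tuple, a decoder can use it as a separator between single-interval positions and multi-interval pairs. The sketch below illustrates this, assuming the tuple is a flat list of integers and that the offset and relative positions are 1-based, as in the 123456789 example of this embodiment. The function name `decode` is hypothetical.

```python
def decode(tup, search_buffer):
    """Rebuild a character segment from (offset, singles..., match length, pairs...)."""
    offset, rest = tup[0], tup[1:]
    pivot = rest.index(max(rest))            # the matching length is the largest entry
    singles = rest[:pivot]                   # positions of single-character intervals
    match_length = rest[pivot]
    pairs = rest[pivot + 1:]                 # (interpolation amount, interpolation length)

    removed = set(singles)
    for position, length in zip(pairs[0::2], pairs[1::2]):
        removed.update(range(position, position + length))

    span = search_buffer[offset - 1: offset - 1 + match_length]
    # Delete the filled-in characters so that the original segment is restored.
    return "".join(ch for i, ch in enumerate(span, start=1) if i not in removed)


restored = decode([1, 4, 6, 9], "123456789")
```

Deleting the positions listed in the tuple is what keeps the scheme lossless: the search buffer supplies the full span and the tuple says which characters were only filled in.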
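The three compressed results quoted above can be reproduced with a short sketch of the tuple construction. It assumes a simplified backward matcher (rightmost anchor, no query-range limit) and 1-based offsets and relative positions; the function names are hypothetical and the code is an illustration rather than the patent's implementation.

```python
def match_positions(segment, search_buffer):
    """Match `segment` backward in `search_buffer`; return buffer indices or None."""
    limit, positions = len(search_buffer), []
    for ch in reversed(segment):
        limit = search_buffer.rfind(ch, 0, limit)
        if limit < 0:
            return None
        positions.append(limit)
    return positions[::-1]


def build_tuple(positions):
    """Build [offset, single positions..., match length, (amount, length) pairs...]."""
    first = positions[0]
    singles, multis = [], []
    for a, b in zip(positions, positions[1:]):
        gap = b - a - 1
        relative = a - first + 2             # 1-based position of the first skipped char
        if gap == 1:
            singles.append(relative)
        elif gap > 1:
            multis.extend((relative, gap))
    match_length = positions[-1] - first + 1  # full span covered in the search buffer
    return [first + 1, *singles, match_length, *multis]


def encode(segment, search_buffer):
    positions = match_positions(segment, search_buffer)
    return None if positions is None else build_tuple(positions)


codes = {seg: "".join(map(str, encode(seg, "123456789")))
         for seg in ("1235789", "12569", "13579")}
```

The tuple length equals 2 plus the number of single-interval regions plus twice the number of multi-interval regions, which is the compressed length used in the interpolation rationality below.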
For the a-th target character segment:
The number of the multiple interval areas when the a target character segment is matched is doubled as a first characteristic value; determining the sum of the first characteristic value, the constant 2 and the number of single interval areas in the timing of the a-th target character segment as the length of the compressed a-th target character segment in the timing of the a-th target character segment; calculating the ratio of the compressed length of the a-th target character segment to the original length of the a-th target character segment, and calculating the difference between a constant 1 and the ratio; it should be noted that: the original length of the a-th target character segment is the length of the a-th target character segment when the a-th target character segment is not compressed. And determining the product of the similarity degree corresponding to the difference value and the a-th target character segment when the difference value is matched with the a-th target character segment as interpolation rationality corresponding to the a-th target character segment. The specific calculation formula of interpolation rationality corresponding to the a-th target character segment is as follows:
$$P_a = \left(1 - \frac{l_1 + 2 l_2 + 2}{L_a}\right) \times E_a$$
where $P_a$ is the interpolation rationality corresponding to the a-th target character segment, $l_1$ is the number of single-interval regions when the a-th target character segment is matched, $l_2$ is the number of multiple-interval regions when the a-th target character segment is matched, $L_a$ is the original length of the a-th target character segment, and $E_a$ is the similarity degree corresponding to the a-th target character segment when it is matched.
$l_1 + 2 l_2 + 2$ represents the compressed length of the a-th target character segment. When $1 - \frac{l_1 + 2 l_2 + 2}{L_a}$ is positive, the compressed length is smaller than the original length, so compressing the a-th target character segment is reasonable, that is, the interpolation rationality corresponding to the a-th target character segment is greater than 0. When $1 - \frac{l_1 + 2 l_2 + 2}{L_a}$ is 0, the compressed length equals the original length and the occupied space is unchanged compared with the original bytes; compression is unreasonable in this case, that is, the interpolation rationality corresponding to the a-th target character segment equals 0. When $1 - \frac{l_1 + 2 l_2 + 2}{L_a}$ is negative, the compressed length is greater than the original length, the compressed data occupies more space, and the purpose of compression is not achieved. The greater the similarity degree, the less time interpolation consumes and the higher the compression efficiency, so the value of the interpolation rationality is larger.
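The calculation described above can be sketched as a small function. It follows the verbal definition in this embodiment (one minus the ratio of compressed length to original length, multiplied by the similarity degree); the function and parameter names are hypothetical, and the similarity degree is taken as an input because its formula is defined elsewhere in the document.

```python
def interpolation_rationality(single_regions: int,
                              multi_regions: int,
                              original_length: int,
                              similarity: float) -> float:
    """Interpolation rationality of one target character segment.

    compressed length = 2 * multi_regions + single_regions + 2
    rationality       = (1 - compressed length / original length) * similarity
    """
    if original_length <= 0:
        raise ValueError("original_length must be positive")
    compressed_length = 2 * multi_regions + single_regions + 2
    return (1.0 - compressed_length / original_length) * similarity


if __name__ == "__main__":
    # Illustrative inputs only: 2 single-interval regions, original length 7,
    # and an assumed similarity degree of 0.8.
    print(interpolation_rationality(2, 0, 7, 0.8))
```

A positive result means the compressed segment is shorter than the original, zero means no saving, and a negative result means compression would enlarge the segment.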
In this embodiment, a target character segment whose interpolation rationality is greater than a preset rationality threshold is determined as a character segment to be compressed. The preset rationality threshold in this embodiment is 0; in a specific application, the implementer may set it according to the specific situation.
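As a minimal sketch of this screening step, the following keeps only the target character segments whose interpolation rationality exceeds the threshold. The mapping from segment to rationality, the function name and the sample values are hypothetical and serve only to illustrate the comparison against the default threshold of 0.

```python
from typing import Dict, List


def select_segments_to_compress(rationality_by_segment: Dict[str, float],
                                threshold: float = 0.0) -> List[str]:
    """Return the segments whose interpolation rationality is strictly above `threshold`."""
    return [segment
            for segment, rationality in rationality_by_segment.items()
            if rationality > threshold]


if __name__ == "__main__":
    # Made-up rationality values for three candidate segments.
    candidates = {"1235789": 0.34, "12569": 0.0, "13579": -0.18}
    print(select_segments_to_compress(candidates))
```

Because the comparison is strict, a segment whose rationality equals the threshold is not selected, in line with the statement that compression is unreasonable when the rationality equals 0.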
At this point, the character segments to be compressed have been screened out by the method provided in this embodiment, and they are compressed in the subsequent step.
Step S4, compressing the character segments to be compressed by adopting an LZ series algorithm to obtain compressed data.
In this embodiment, the character segments have been screened in the above steps and the character segments to be compressed have been determined. These segments are better suited to compression: compressing them is more efficient and saves a large amount of space, so they are compressed next.
Specifically, the character segments to be compressed are compressed by adopting an LZ series algorithm to obtain compressed data. After compression, the data is transmitted and stored as required. The LZ series algorithm is prior art and is not described in detail here.
The decompression process is as follows. When the input to the decoder is an ordinary character, the character is output directly. When the input is a tuple, the first field, the offset, is read first; each subsequent field is then read as the position of a character relative to the offset, and the characters at those positions are deleted, until the maximum value is read, which is the matching length. After that, the fields are read two at a time as an interpolation relative position and an interpolation length, and the corresponding regions are deleted from the character segment of the matching length that follows the offset. The characters finally output are the original characters. The decompressed data stream is identical to the original data stream, so the method is a lossless compression within the LZ series of algorithms.
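To illustrate the lossless round trip that LZ-series algorithms provide, the following is a minimal sketch of textbook LZ77 with (offset, match length, next character) triples. It is not the extended tuple format of this embodiment, which additionally carries single characters and interpolation position/length pairs; the window sizes and function names are assumptions chosen for the example.

```python
from typing import List, Tuple

Token = Tuple[int, int, str]  # (offset, match_length, next_char)


def lz77_encode(data: str, window: int = 32, lookahead: int = 16) -> List[Token]:
    """Encode `data` as textbook LZ77 triples using a brute-force longest-match search."""
    tokens: List[Token] = []
    i = 0
    while i < len(data):
        best_offset, best_length = 0, 0
        # Always leave one character to serve as the literal next_char.
        max_length = min(lookahead, len(data) - i - 1)
        for j in range(max(0, i - window), i):
            length = 0
            while length < max_length and data[j + length] == data[i + length]:
                length += 1
            if length > best_length:
                best_offset, best_length = i - j, length
        tokens.append((best_offset, best_length, data[i + best_length]))
        i += best_length + 1
    return tokens


def lz77_decode(tokens: List[Token]) -> str:
    """Rebuild the original string from LZ77 triples."""
    out: List[str] = []
    for offset, length, next_char in tokens:
        start = len(out) - offset
        # Copy one character at a time so overlapping matches work.
        for k in range(length):
            out.append(out[start + k])
        out.append(next_char)
    return "".join(out)


if __name__ == "__main__":
    text = "abcabcabcabc"
    encoded = lz77_encode(text)
    print(encoded)
    print(lz77_decode(encoded) == text)
```

Decoding the token list reproduces the input exactly, which is the lossless property referred to above.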
According to the method, characters in the data stream to be compressed are matched, and the similarity degree corresponding to each character segment when it is matched is obtained according to the number of sub-character segments into which the segment is split during matching and the intervals between adjacent sub-character segments in the search buffer. The smaller the similarity degree, the more places require interpolation, that is, the longer the encoder needs for computation. The method therefore determines the interpolation feasibility corresponding to each character segment based on the similarity degree and screens out the target character segments. It then obtains the interpolation rationality corresponding to each target character segment according to the number of single-interval regions, the number of multiple-interval regions and the similarity degree when the target character segment is matched. A target character segment with larger interpolation rationality takes less computation time to compress and saves more space after compression. The character segments to be compressed are therefore screened based on the interpolation rationality and compressed by adopting an LZ series algorithm, which achieves the effect of merging several characters into one and improves compression efficiency.
It should be noted that: the foregoing description of the preferred embodiments of the present invention is not intended to be limiting, but rather, any modifications, equivalents, improvements, etc. that fall within the principles of the present invention are intended to be included within the scope of the present invention.

Claims (7)

1. An intelligent optimization method for gateway data, characterized by comprising the following steps:
Acquiring a data stream to be compressed;
Constructing a query window, wherein the query window comprises a look-ahead buffer and a search buffer; based on the query window, matching the characters in the data stream to be compressed, and obtaining the similarity degree corresponding to each character segment in the data stream to be compressed when it is matched, according to the number of sub-character segments into which each character segment is split when it is matched and the intervals of adjacent sub-character segments in the search buffer; determining interpolation feasibility corresponding to each character segment based on the similarity degree; screening target character segments based on the interpolation feasibility;
Obtaining interpolation rationality corresponding to each target character segment according to the number of single interval areas, the number of multiple interval areas and the similarity degree when each target character segment is matched; screening character segments to be compressed based on the interpolation rationality;
compressing the character segment to be compressed by adopting an LZ series algorithm to obtain compressed data;
wherein obtaining the similarity degree corresponding to each character segment in the data stream to be compressed when it is matched, according to the number of sub-character segments into which each character segment is split when it is matched and the intervals of adjacent sub-character segments in the search buffer, comprises:
For the n-th character segment in the data stream to be compressed:
For the m-th interval of adjacent sub-character segments in the search buffer when the n-th character segment is matched: taking the ratio of the length of the m-th interval to the total length of the n-th character segment as the m-th interval proportion;
obtaining the similarity degree corresponding to the n-th character segment when it is matched according to all interval proportions of adjacent sub-character segments in the search buffer and the number of sub-character segments into which the n-th character segment is split when it is matched;
the similarity degree corresponding to the n-th character segment when it is matched is calculated by adopting the following formula:
wherein $E_n$ is the similarity degree corresponding to the n-th character segment when it is matched, $k$ is the number of sub-character segments into which the n-th character segment is split when it is matched, $x_i$ is the length of the i-th interval of adjacent sub-character segments in the search buffer, $L_n$ is the total length of the n-th character segment, and $e$ is the natural constant;
obtaining interpolation rationality corresponding to each target character segment according to the number of single spacing areas, the number of multiple spacing areas and the similarity degree when each target character segment is matched, wherein the interpolation rationality comprises the following steps:
For the a-th target character segment:
Taking twice the number of multiple-interval regions when the a-th target character segment is matched as a first characteristic value; determining the sum of the first characteristic value, the constant 2 and the number of single-interval regions when the a-th target character segment is matched as the compressed length of the a-th target character segment;
Obtaining the interpolation rationality corresponding to the a-th target character segment based on the difference between the compressed length and the original length of the a-th target character segment and the similarity degree corresponding to the a-th target character segment when it is matched;
Wherein a single-interval region is a region in which the number of characters existing between two characters is 1, and a multiple-interval region is a region in which the number of characters existing between two characters is greater than or equal to 2.
2. The intelligent optimization method for gateway data according to claim 1, wherein the determining the interpolation feasibility corresponding to each character segment based on the similarity degree includes:
If the similarity is greater than a preset similarity threshold, enabling interpolation feasibility corresponding to the corresponding character segment to be a preset first numerical value;
If the similarity is smaller than or equal to a preset similarity threshold, enabling interpolation feasibility corresponding to the corresponding character segment to be a preset second numerical value;
the preset first value is greater than the preset second value.
3. The intelligent optimization method of gateway data according to claim 2, wherein the filtering the target character segment based on the interpolation feasibility comprises:
taking the character segment whose interpolation feasibility is the preset first numerical value as a target character segment.
4. The intelligent optimization method of gateway data according to claim 1, wherein obtaining interpolation rationality corresponding to the a-th target character segment based on a difference between a compressed length and an original length of the a-th target character segment and a similarity corresponding to the a-th target character segment when the a-th target character segment is matched, comprises:
calculating the ratio of the compressed length of the a-th target character segment to the original length of the a-th target character segment, and calculating the difference between a constant 1 and the ratio;
determining the product of the difference and the similarity degree corresponding to the a-th target character segment when it is matched as the interpolation rationality corresponding to the a-th target character segment.
5. The intelligent optimization method of gateway data according to claim 1, wherein the filtering the character segments to be compressed based on the interpolation rationality comprises:
and determining the target character segment with interpolation rationality larger than a preset rationality threshold as the character segment to be compressed.
6. The intelligent optimization method of gateway data according to claim 1, wherein the compressing the character segments to be compressed by adopting the LZ series algorithm to obtain compressed data comprises:
When any character segment to be compressed is compressed by adopting an LZ series algorithm, the format of the tuple stored data is specifically as follows: (offset, single character, matching length, interpolation amount, interpolation length).
7. The intelligent optimization method of gateway data according to claim 1, wherein the matching of characters in the data stream to be compressed is performed by adopting an LZ series algorithm.
CN202410257458.9A 2024-03-07 2024-03-07 Gateway data intelligent optimization method Active CN117896441B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410257458.9A CN117896441B (en) 2024-03-07 2024-03-07 Gateway data intelligent optimization method

Publications (2)

Publication Number Publication Date
CN117896441A CN117896441A (en) 2024-04-16
CN117896441B true CN117896441B (en) 2024-05-24

Family

ID=90648996

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410257458.9A Active CN117896441B (en) 2024-03-07 2024-03-07 Gateway data intelligent optimization method

Country Status (1)

Country Link
CN (1) CN117896441B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002368625A (en) * 2001-06-11 2002-12-20 Fuji Xerox Co Ltd Encoding quantity predicting device, encoding selection device, encoder, and encoding method
JP2009187292A (en) * 2008-02-06 2009-08-20 Fuji Xerox Co Ltd Image processing apparatus and image processing program
CN116932493A (en) * 2022-03-30 2023-10-24 华为技术有限公司 Data compression method and related device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
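Efficient regular expression matching on LZ77 compressed strings using negative factors; Yutong Han et al.; World Wide Web; 20190323; full text *
Research and Implementation of a Parallel Compression Algorithm Based on Approximate Pattern Matching (基于近似模式匹配的并行压缩算法的研究与实现); Wang Yiran (王乂冉); China Excellent Master's Theses Full-text Database; 20200831; full text *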

Also Published As

Publication number Publication date
CN117896441A (en) 2024-04-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant