CN117156014B

CN117156014B - Engineering cost data optimal storage method and system

Info

Publication number: CN117156014B
Application number: CN202311217477.0A
Authority: CN
Inventors: 梁艳香; 崔改孝; 彭青山
Original assignee: Zhejiang Huachi Project Management Consulting Co ltd
Current assignee: Zhejiang Huachi Project Management Consulting Co ltd
Priority date: 2023-09-20
Filing date: 2023-09-20
Publication date: 2024-03-12
Anticipated expiration: 2043-09-20
Also published as: CN117156014A

Abstract

The invention relates to the technical field of data compression and storage, and in particular to a method and a system for optimized storage of engineering cost data. The method comprises the following steps: collecting engineering cost data; acquiring the sizes of a dictionary and a sliding window; while encoding the engineering cost data with LZ77 coding, acquiring each matching character string at each matching step together with its offset and length; for each matching character string at each matching step, acquiring the longest length of the matching character string at the corresponding next pre-matching step; acquiring a preference for each matching character string at each matching step and selecting the optimal matching character string according to the preferences; compressing the engineering cost data according to the optimal matching character strings to obtain compressed data; and storing the compressed data on a server. The invention improves compression efficiency.

Description

Engineering cost data optimal storage method and system

Technical Field

The invention relates to the technical field of data compression and storage, and in particular to a method and system for optimized storage of engineering cost data.

Background

Engineering cost refers to the costs and expenses involved in construction, civil engineering, or other engineering projects, covering every stage from early planning to completion. Engineering cost data allow a project to be managed and controlled effectively and help ensure that it is completed on time, to the required quality, and within budget. The collected engineering cost data of a project therefore need to be compressed and stored, and decompressed when they need to be reviewed.

The LZ77 compression algorithm is a lossless compression algorithm based on a dictionary and a sliding window: previously seen data serve as the dictionary, so that frequently repeated long character strings are replaced by shorter numerical codes. During compression, the characters to be compressed in the sliding window are matched against the characters in the dictionary to obtain a matching character string, and the matched string in the data is marked by its offset (the distance from the first character of the matching character string in the dictionary to the first character of the string being matched in the sliding window) and its length. Several matching character strings may exist for the same characters in the sliding window; the existing method selects the longest one for encoding. If that longest matching character string lies far back in the dictionary, its offset is large, the corresponding binary code is longer, and the compression result occupies more storage space when it is stored.
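The conventional longest-match rule described above can be sketched as follows. This is a minimal illustration, not code from the patent; the function name `longest_match` and the convention that a match must lie entirely inside the dictionary are assumptions. The example shows the problem the method targets: the longest match sits far back in the dictionary, so its offset needs six bits, while a one-character match exists just one position back.

```python
def longest_match(data: str, pos: int, dict_size: int, win_size: int) -> tuple[int, int]:
    """Conventional LZ77 search: (offset, length) of the longest match.

    The dictionary is data[pos - dict_size:pos] and the sliding window is
    data[pos:pos + win_size]. Among equal-length matches the nearest one
    (smallest offset) is returned. (0, 0) means no match.
    """
    dict_start = max(0, pos - dict_size)
    window = data[pos:pos + win_size]
    best = (0, 0)
    for length in range(1, len(window) + 1):
        # rfind returns the right-most (nearest) occurrence lying
        # completely inside data[dict_start:pos].
        start = data.rfind(window[:length], dict_start, pos)
        if start == -1:
            break  # a longer prefix cannot match if this one does not
        best = (pos - start, length)
    return best


data = "ab" + "-" * 60 + "a" + "ab"   # the sliding window starts at index 63
offset, length = longest_match(data, 63, dict_size=100, win_size=10)
print(offset, length, offset.bit_length())   # 63 2 6
```

Here the longest match "ab" has offset 63 (6 bits), whereas "a" alone would have offset 1 (1 bit).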

Disclosure of Invention

To solve the above problems, the invention provides a method and system for optimized storage of engineering cost data.

The engineering cost data optimized storage method of the invention adopts the following technical scheme:

An embodiment of the present invention provides a method for optimally storing engineering cost data, comprising the following steps:

collecting engineering cost data;

acquiring the sizes of a dictionary and a sliding window; while encoding the engineering cost data with LZ77 coding, acquiring each matching character string at each matching step; acquiring the offset and the length of each matching character string at each matching step; acquiring, for each matching character string at each matching step, the longest length of the matching character string at the corresponding next pre-matching step; acquiring the preference of each matching character string at each matching step from its offset and length combined with that longest next pre-matching length; and acquiring the optimal matching character string at each matching step according to the preferences;

compressing the engineering cost data according to the sizes of the dictionary and the sliding window and the optimal matching character string at each matching step to obtain compressed data;

storing the compressed data on a server.

Preferably, acquiring the sizes of the dictionary and the sliding window comprises the following specific steps:

the size of the sliding window is preset to N, and the size of the dictionary is preset to 10×N.

Preferably, acquiring each matching character string at each matching step while encoding the engineering cost data with LZ77 coding comprises the following specific steps:

while encoding the engineering cost data with LZ77 coding, each process of matching the characters to be compressed in the sliding window against the characters in the dictionary is recorded as the current matching. The first character in the sliding window under the current matching is taken as the character string to be matched; when the dictionary contains character strings identical to it, all such strings are obtained, the one closest to the character string to be matched is taken as its matching character string, and this character string to be matched is called the last character string to be matched. While the last character string to be matched has a matching character string, the first two characters in the sliding window under the current matching are taken as the new character string to be matched; when the dictionary contains character strings identical to it, all such strings are obtained, the closest one is taken as its matching character string, and the new character string to be matched becomes the last character string to be matched. This continues, one character longer each time, until the last character string to be matched has no matching character string, at which point the iteration stops;

all matching character strings obtained in this way are taken as the matching character strings under the current matching.

Preferably, acquiring the offset and the length of each matching character string at each matching step comprises the following specific steps:

the distance between the first character of each matching character string in the dictionary and the first character of its corresponding character string to be matched in the sliding window is taken as the offset of that matching character string, and the number of characters of each matching character string is taken as its length.
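The enumeration of matching character strings and their offsets and lengths can be sketched as below. This is a minimal sketch assuming that text is a Python string, that the dictionary is the `dict_size` characters before position `pos`, and that a match must lie entirely inside the dictionary; the name `candidate_matches` is hypothetical.

```python
def candidate_matches(data: str, pos: int, dict_size: int, win_size: int) -> list[tuple[int, int]]:
    """Enumerate the matching character strings of one matching step.

    Prefixes of the sliding window of length 1, 2, 3, ... are matched in turn;
    for each prefix the occurrence in the dictionary closest to the window is
    kept. Enumeration stops at the first prefix with no occurrence.
    Returns a list of (offset, length) pairs.
    """
    dict_start = max(0, pos - dict_size)
    window = data[pos:pos + win_size]
    matches = []
    for length in range(1, len(window) + 1):
        prefix = window[:length]
        # Right-most occurrence fully inside the dictionary = closest match.
        start = data.rfind(prefix, dict_start, pos)
        if start == -1:
            break
        offset = pos - start   # distance between the two first characters
        matches.append((offset, length))
    return matches


# Dictionary "abzaz", sliding window "ab": "a" is nearest at offset 2,
# while "ab" only occurs at offset 5.
print(candidate_matches("abzazab", pos=5, dict_size=50, win_size=10))
# [(2, 1), (5, 2)]
```

Because each prefix extends the previous one, a prefix with no occurrence guarantees that no longer prefix occurs either, which is why the loop can stop there.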

Preferably, acquiring, for each matching character string at each matching step, the longest length of the matching character string at the corresponding next pre-matching step comprises the following specific steps:

for any matching character string under the current matching, the sliding window and the dictionary under the current matching are slid rightward by the length of that matching character string to obtain the sliding window and the dictionary of the next pre-matching; the characters to be compressed in that sliding window are matched against the characters in that dictionary using longest matching, and the length of the result is taken as the longest length of the next pre-matching matching character string corresponding to the matching character string under the current matching.
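The look-ahead step can be sketched as below. The helper names are hypothetical, matches are assumed to lie entirely inside the dictionary, and, following the wording above, the window is shifted by the candidate's length only. The example shows that choosing a shorter candidate can leave a longer match for the next step.

```python
def longest_match_length(data: str, pos: int, dict_size: int, win_size: int) -> int:
    """Length of the longest prefix of the window data[pos:pos+win_size]
    that occurs entirely inside the dictionary data[pos-dict_size:pos]."""
    dict_start = max(0, pos - dict_size)
    window = data[pos:pos + win_size]
    best = 0
    for length in range(1, len(window) + 1):
        if data.find(window[:length], dict_start, pos) == -1:
            break
        best = length
    return best


def next_prematch_length(data: str, pos: int, match_length: int,
                         dict_size: int, win_size: int) -> int:
    """L2 for a candidate of length match_length at position pos: slide the
    window and dictionary right by match_length, then take the longest match."""
    return longest_match_length(data, pos + match_length, dict_size, win_size)


data = "abzazabz"
# At pos 5 the candidates are "a" (length 1) and "ab" (length 2).
print(next_prematch_length(data, 5, 1, 50, 10))  # 2  ("bz" matches)
print(next_prematch_length(data, 5, 2, 50, 10))  # 1  ("z" matches)
```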

Preferably, acquiring the preference of each matching character string at each matching step from its offset and length, combined with the longest length of the corresponding next pre-matching matching character string, comprises the following specific steps:

[Formula for the preference p: not reproduced in this text]

wherein p represents the preference of any matching character string under the current matching; L₁ represents the length of that matching character string; L₂ represents the longest length of the matching character string of the next pre-matching corresponding to that matching character string; d represents the offset of that matching character string; d_max represents the maximum offset of all matching character strings under the current matching; and exp() represents the exponential function with the natural constant e as its base.

Preferably, acquiring the optimal matching character string at each matching step according to the preference of each matching character string comprises the following specific steps:

the matching character string with the largest preference at each matching step is taken as the optimal matching character string at that step.
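The exact formula for p is not legible in this text, so the sketch below uses a hypothetical stand-in, p = (L₁ + L₂)·exp(−d/d_max), chosen only because it has the behaviour the description states: p rises with L₁ and L₂ and falls as d grows towards d_max. It is not the patent's formula. Selecting the optimal matching character string is then a plain argmax over the candidates.

```python
import math


def preference(l1: int, l2: int, d: int, d_max: int) -> float:
    """HYPOTHETICAL preference score, not the patent's formula.

    Increases with the candidate length l1 and the next pre-matching
    length l2, decreases with the offset d (normalised by d_max).
    """
    return (l1 + l2) * math.exp(-d / d_max)


def optimal_match(candidates: list[tuple[int, int, int]]) -> tuple[int, int, int]:
    """Pick the candidate (offset d, length L1, next length L2) with the
    largest preference; d_max is the largest offset among the candidates."""
    d_max = max(d for d, _, _ in candidates)
    return max(candidates, key=lambda c: preference(c[1], c[2], c[0], d_max))


# (offset, L1, L2) for the two candidates "a" and "ab" of "abzazabz" at position 5
print(optimal_match([(2, 1, 2), (5, 2, 1)]))   # (2, 1, 2)
```

With this stand-in the nearer, shorter match wins (p ≈ 2.01 versus p ≈ 1.10) because it also leaves a longer match for the next step.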

Preferably, compressing the engineering cost data according to the sizes of the dictionary and the sliding window and the optimal matching character string at each matching step to obtain compressed data comprises the following specific steps:

the characters of the engineering cost data are preloaded in order into the sliding window, up to its size, as the characters to be compressed, and the dictionary is initially empty; the first character in the sliding window is encoded first, as the character itself, and the code is output;

the sliding window and the dictionary are then moved rightward by one character, and the characters to be compressed in the sliding window are matched against the characters in the dictionary; if no matching character string exists, the first character in the sliding window is encoded as the character itself, the code is output, and the sliding window and the dictionary are moved rightward by one character for the next matching;

if matching character strings exist, the optimal matching character string is acquired according to the preference of each matching character string; the offset of the optimal matching character string, its length, and the character in the sliding window that follows the corresponding character string to be matched are output as its code, and the sliding window and the dictionary are moved rightward according to the length of the optimal matching character string for the next matching;

this process is repeated until the sliding window is empty, and all output codes, in output order, form a coding sequence that serves as the compressed data of the engineering cost data.
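Putting the steps together, a compact encoder might look as follows. This is a sketch under stated assumptions, not the patent's implementation: it uses a hypothetical preference function (the patent's formula is not legible here), writes a literal as the triple (0, 0, c), keeps every match fully inside the dictionary, and advances the window by length + 1 after a match, as in conventional LZ77, so that the emitted next character is consumed (the text above says the window moves by the match length). The first character comes out as a literal automatically because the dictionary is still empty.

```python
import math


def candidate_matches(data, pos, dict_size, win_size):
    """(offset, length) of the nearest dictionary occurrence of each window prefix.

    The window is shortened by one character at the end of the data so that
    a "next character" always exists after a match.
    """
    dict_start = max(0, pos - dict_size)
    window = data[pos:min(pos + win_size, len(data) - 1)]
    matches = []
    for length in range(1, len(window) + 1):
        start = data.rfind(window[:length], dict_start, pos)
        if start == -1:
            break
        matches.append((pos - start, length))
    return matches


def preference(l1, l2, d, d_max):
    # HYPOTHETICAL stand-in: larger for longer matches, smaller for larger offsets.
    return (l1 + l2) * math.exp(-d / d_max)


def encode(data, win_size=10):
    dict_size = 10 * win_size          # dictionary = 10 x N, as in the embodiment
    codes, pos = [], 0
    while pos < len(data):
        cands = candidate_matches(data, pos, dict_size, win_size)
        if not cands:
            # No match (always the case for the first character): emit a literal.
            codes.append((0, 0, data[pos]))
            pos += 1
            continue
        d_max = max(d for d, _ in cands)
        best, best_p = None, -1.0
        for d, l1 in cands:
            # L2: longest match after sliding right by l1 (next pre-matching).
            nxt = candidate_matches(data, pos + l1, dict_size, win_size)
            l2 = max((l for _, l in nxt), default=0)
            p = preference(l1, l2, d, d_max)
            if p > best_p:
                best, best_p = (d, l1), p
        d, l1 = best
        codes.append((d, l1, data[pos + l1]))  # (offset, length, next character)
        pos += l1 + 1
    return codes


print(encode("abzazabz"))
# [(0, 0, 'a'), (0, 0, 'b'), (0, 0, 'z'), (3, 1, 'z'), (2, 1, 'b'), (0, 0, 'z')]
```

The output list of triples corresponds to the coding sequence described above; turning it into binary is a separate step.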

Preferably, storing the compressed data on the server comprises the following specific steps:

the compressed data is converted into binary numbers and stored in a server.
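The text does not specify the binary layout. As one possible choice, the sketch below (an assumption, not the patent's format) writes each (offset, length, character) triple as three unsigned LEB128-style variable-length integers, with the character stored as its Unicode code point. With such a variable-length layout a smaller offset really does take fewer bytes, which is the storage effect the method aims at.

```python
def write_uvarint(value: int, out: bytearray) -> None:
    """Append value as an unsigned LEB128-style varint (7 bits per byte)."""
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)   # more bytes follow
        else:
            out.append(byte)
            return


def read_uvarint(buf: bytes, i: int) -> tuple[int, int]:
    """Read one varint starting at buf[i]; return (value, next index)."""
    result, shift = 0, 0
    while True:
        byte = buf[i]
        i += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, i


def to_binary(codes: list[tuple[int, int, str]]) -> bytes:
    out = bytearray()
    for offset, length, ch in codes:
        for v in (offset, length, ord(ch)):
            write_uvarint(v, out)
    return bytes(out)


def from_binary(buf: bytes) -> list[tuple[int, int, str]]:
    codes, i = [], 0
    while i < len(buf):
        offset, i = read_uvarint(buf, i)
        length, i = read_uvarint(buf, i)
        cp, i = read_uvarint(buf, i)
        codes.append((offset, length, chr(cp)))
    return codes


codes = [(0, 0, "a"), (3, 1, "费"), (90, 2, ";")]
blob = to_binary(codes)
assert from_binary(blob) == codes
print(len(to_binary([(3, 1, "a")])), len(to_binary([(200, 1, "a")])))  # 3 4
```

An offset below 128 fits in one byte, while offset 200 needs two, so the same triple grows from 3 to 4 bytes.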

An embodiment of the invention provides an engineering cost data optimized storage system comprising the following modules:

the data acquisition module is used for acquiring engineering cost data;

the preference acquisition module is used for acquiring the sizes of the dictionary and the sliding window; acquiring each matching character string at each matching step while encoding the engineering cost data with LZ77 coding; acquiring the offset and the length of each matching character string at each matching step; acquiring, for each matching character string at each matching step, the longest length of the matching character string at the corresponding next pre-matching step; acquiring the preference of each matching character string at each matching step from its offset and length combined with that longest next pre-matching length; and acquiring the optimal matching character string at each matching step according to the preferences;

the engineering cost data compression module is used for compressing the engineering cost data according to the sizes of the dictionary and the sliding window and the optimal matching character string at each matching step to obtain compressed data;

and the compressed data storage management module is used for storing the compressed data to the server.

The beneficial effects of the technical scheme of the invention are as follows: while encoding engineering cost data with LZ77 coding, the invention acquires each matching character string at each matching step together with its offset and length, and the longest length of the matching character string at the corresponding next pre-matching step; from these it derives a preference for each matching character string, selects the optimal matching character string at each matching step for encoding, and compresses the engineering cost data according to the sizes of the dictionary and the sliding window and these optimal matching character strings. Because the optimal matching character string is chosen by preference, a match with a smaller offset can be selected without degrading compression efficiency, which avoids the situation in which directly encoding the longest matching character string produces a large offset whose binary representation occupies more storage space when it is subsequently stored.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of steps of an optimized storage method for project cost data according to the present invention;

FIG. 2 is a system block diagram of an engineering cost data optimized storage system of the present invention;

FIG. 3 is a schematic diagram of matching of a sliding window and a dictionary.

Detailed Description

To further describe the technical means adopted by the invention to achieve its intended purpose and their effects, the specific implementation, structure, features, and effects of the engineering cost data optimized storage method according to the invention are described in detail below with reference to the accompanying drawings and preferred embodiments. In the following description, different occurrences of "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

The specific scheme of the engineering cost data optimized storage method provided by the invention is described below with reference to the accompanying drawings.

Referring to fig. 1, a flowchart of steps in an optimized storage method for engineering cost data according to an embodiment of the present invention is shown, where the method includes the following steps:

S001, collecting engineering cost data.

The costs of the engineering project in the design stage are collected, including architectural design, structural design, and drainage design costs, as well as the costs in the construction stage, including labor costs and material procurement costs; the collected costs of the engineering project are recorded as engineering cost data.

S002, acquiring the sizes of the dictionary and the sliding window.

Note that LZ77 is a lossless compression algorithm based on a dictionary and a sliding window: it builds the dictionary from already-encoded characters and replaces more complex character strings with shorter numerical codes. During compression, the characters to be compressed in the sliding window are matched against the encoded characters in the dictionary to obtain a matching character string, and the matched string in the data to be compressed is marked by the distance of the first character of the matching character string in the dictionary (the offset) and its length. LZ77 requires the sizes of the dictionary and the sliding window to be specified before encoding. A larger dictionary can usually hold more encoded characters and therefore make better use of the repeated character strings in engineering cost data, so that matching the data to be encoded in the sliding window against the dictionary yields more and longer matching character strings. The dictionary is therefore set larger than the sliding window, so that the matching character strings obtained are as long as possible; in this embodiment, the dictionary size is set as a multiple of the sliding-window size so that the dictionary accommodates more characters.

In the embodiment of the invention, the size of the sliding window is set to N and the size of the dictionary to 10×N based on empirical values; in this embodiment N = 10 (so the dictionary holds 100 characters), and in other embodiments the operator can set N according to the specific implementation.
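As a minimal illustration of how the two sizes are used (the class and method names are hypothetical), the dictionary can be taken as the 10×N characters immediately before the current position and the sliding window as the N characters starting at it:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowConfig:
    """Sliding-window size N and dictionary size 10 x N (N = 10 in the embodiment)."""
    n: int = 10

    @property
    def window_size(self) -> int:
        return self.n

    @property
    def dict_size(self) -> int:
        return 10 * self.n

    def views(self, data: str, pos: int) -> tuple[str, str]:
        """Return (dictionary, sliding window) when the window starts at pos."""
        dict_start = max(0, pos - self.dict_size)
        return data[dict_start:pos], data[pos:pos + self.window_size]


cfg = WindowConfig()
dictionary, window = cfg.views("x" * 150 + "cost=1200;", 150)
print(cfg.dict_size, len(dictionary), window)   # 100 100 cost=1200;
```

Near the start of the data the dictionary is simply shorter than 10×N, which matches the empty initial dictionary described for the encoding step.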

Once the sizes of the dictionary and the sliding window are obtained, the engineering cost data can be compressed accordingly.

S003, acquiring the preference of each matching character string under the current matching, and acquiring the optimal matching character string under the current matching for encoding according to these preferences.

It should be noted that, once a larger dictionary is set, several matching character strings may exist when the characters to be compressed in the sliding window are matched against the dictionary during LZ77 compression of engineering cost data. The existing method encodes the longest matching character string; however, if the first character of that longest match in the dictionary lies far from the first character of the string being matched in the sliding window, that is, if its offset is large, the binary code of the offset is longer when the compression result is stored, which reduces compression efficiency. On the other hand, a longer matching character string removes more repeated information from the characters to be compressed and thus improves compression efficiency. Therefore, a matching character string with a smaller offset and a larger length under the current matching should be preferred for encoding.

It should further be noted that, in LZ77 encoding, once the current matching is completed the sliding window and the dictionary slide rightward and the next matching is performed, so the length of the matching character string obtained at the next matching depends on the matching character string chosen at the current matching. If the matching character string chosen at the current matching makes the longest matching character string at the next matching shorter, the next matching removes less repeated information from the characters to be compressed and its compression efficiency falls. The preference of each matching character string therefore should not be based only on its offset and length under the current matching; it must also take into account the length of the longest matching character string obtained at the corresponding next pre-matching. The smaller the offset of a matching character string under the current matching and the larger the longest length at its next pre-matching, the higher its preference.

In the embodiment of the invention, while encoding the engineering cost data with LZ77 coding, each process of matching the characters to be compressed in the sliding window against the characters in the dictionary is recorded as the current matching, and each matching character string under the current matching and its offset and length are obtained as follows. The first character in the sliding window under the current matching is taken as the character string to be matched; when the dictionary contains character strings identical to it, the one closest to the character string to be matched is taken as its matching character string, and the character string to be matched is called the last character string to be matched. While the last character string to be matched has a matching character string, the first two characters in the sliding window under the current matching are taken as the new character string to be matched; when the dictionary contains character strings identical to it, all such strings are obtained, the closest one is taken as its matching character string, and the new character string to be matched becomes the last character string to be matched. This continues until the last character string to be matched has no matching character string, at which point the iteration stops.

The distance between the first character of each matching character string under the current matching in the dictionary and the first character of its corresponding character string to be matched in the sliding window is taken as the offset of that matching character string, and its number of characters is taken as its length.

For example, referring to FIG. 3, the characters in the dictionary are ddaabacldbb, and the characters to be matched in the sliding window are aabca;

the first character of the sliding window, a, is matched against the characters in the dictionary, giving the third, fourth, and seventh characters of the dictionary; the seventh character is closest to the first character a of the sliding window, so it is taken as the matching character string, with offset 3 and length 1;

the first two characters of the sliding window, aa, are matched against the characters in the dictionary, and the string formed by the third and fourth characters of the dictionary is taken as the matching character string, with offset 6 and length 2; and so on, until the character string taken from the sliding window has no matching character string, at which point the iteration stops.

Obtaining the longest length of the next pre-matching matching character string corresponding to each matching character string under the current matching: for any matching character string under the current matching, the sliding window and the dictionary under the current matching are slid rightward by the length of that matching character string to obtain the sliding window and the dictionary of the next pre-matching; the characters to be compressed in that sliding window are matched against the characters in that dictionary using longest matching, and the length of the result is taken as the longest length of the next pre-matching matching character string corresponding to the matching character string under the current matching.

In the embodiment of the invention, the preference of each matching character string under the current matching is obtained as follows:

[Formula for the preference p: not reproduced in this text]

wherein p represents the preference of any matching character string under the current matching; L₁ represents the length of that matching character string; L₂ represents the longest length of the matching character string of the next pre-matching corresponding to that matching character string; d represents the offset of that matching character string; d_max represents the maximum offset of all matching character strings under the current matching; and exp() represents the exponential function with the natural constant e as its base. The smaller the offset of a matching character string under the current matching, the larger its length, and the larger the longest length of the matching character string obtained at its corresponding next pre-matching, the higher its preference, and the more it should be selected as the optimal matching character string under the current matching.

The matching character string with the highest preference under the current matching is taken as the optimal matching character string under the current matching.

In this way, the preference of each matching character string under the current matching is obtained from its offset and length and from the length of the longest matching character string obtained at the corresponding next pre-matching, and the optimal matching character string under the current matching is selected for encoding according to these preferences. This allows a matching character string with a smaller offset to be chosen without degrading compression efficiency, and avoids the problem that directly encoding the longest matching character string may produce a large offset whose binary representation occupies more storage space when it is subsequently stored.

S004, compressing the engineering cost data with LZ77 coding to obtain compressed data.

When the engineering cost data are encoded with the LZ77 algorithm and the characters in the sliding window are matched against the characters in the dictionary, the existing method marks and encodes the longest matching character string; however, the offset of the longest matching character string may be large, so its binary representation occupies more storage space when it is stored. Therefore, when several matching character strings exist, the optimal matching character string is obtained from them by the method of step S003 and used for encoding.

In the embodiment of the invention, the engineering cost data are encoded to obtain the compressed data as follows:

the size of the sliding window is set to N and the size of the dictionary to 10×N; the characters of the engineering cost data are preloaded in order into the sliding window, up to its size, as the characters to be compressed, and the dictionary is initially empty; the first character in the sliding window is encoded first, as the character itself, and the code is output;

after the sliding window and the dictionary move one character to the right, the characters to be compressed in the sliding window are matched against the characters in the dictionary. If there is no matching character string, the first character in the sliding window is output as itself, and the sliding window and the dictionary then move one character to the right for the next match;
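The encoding loop can be sketched in Python as below. This is a minimal illustration that rests on several assumptions. `window` plays the role of N and caps the match length, the dictionary holds the preceding 10 × N characters, and a match of at least one character counts. The steps above only spell out the no-match branch. For the match branch, the sketch takes a pluggable `choose` function. Its default, `longest`, is the conventional longest-match rule, not the preference-based selection of step S003. All function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple, Union

Match = Tuple[int, int]                                # (offset, length)
Token = Union[Tuple[str, str], Tuple[str, int, int]]   # ("lit", ch) or ("ref", offset, length)


def find_matches(data: str, pos: int, dict_size: int, window: int) -> List[Match]:
    """Every (offset, length) where the look-ahead at pos matches dictionary text."""
    matches: List[Match] = []
    for offset in range(1, min(pos, dict_size) + 1):
        start = pos - offset
        length = 0
        # Extend the match while characters agree, capped by the window size N.
        while (length < window and pos + length < len(data)
               and data[start + length] == data[pos + length]):
            length += 1
        if length > 0:
            matches.append((offset, length))
    return matches


def longest(matches: Sequence[Match]) -> Match:
    """Conventional rule: longest match, ties broken by the smaller offset."""
    return max(matches, key=lambda m: (m[1], -m[0]))


def lz77_encode(data: str, window: int,
                choose: Callable[[Sequence[Match]], Match] = longest) -> List[Token]:
    dict_size = 10 * window            # dictionary of 10 x N characters
    tokens: List[Token] = []
    pos = 0
    while pos < len(data):
        matches = find_matches(data, pos, dict_size, window)
        if not matches:
            tokens.append(("lit", data[pos]))   # no match: output the character itself
            pos += 1                            # slide window and dictionary by one
        else:
            offset, length = choose(matches)
            tokens.append(("ref", offset, length))
            pos += length                       # slide past the encoded characters
    return tokens


print(lz77_encode("abcabcabc", window=3))
# [('lit', 'a'), ('lit', 'b'), ('lit', 'c'), ('ref', 3, 3), ('ref', 3, 3)]
```

Passing a different `choose` is where a preference-based selection could plug in without touching the sliding-window logic.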

At this point, the engineering cost data have been encoded and the compressed data obtained.

S005, optimally storing the compressed data.

The compressed data are converted into binary data and stored on a server. When the engineering cost data need to be viewed or used, the compressed data stored on the server are decoded with the LZ77 algorithm to obtain the decompressed data.
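Decoding reverses the encoding. The Python sketch below assumes the compressed data have already been read back from binary into a list of tokens. Each token is either `("lit", ch)`, a character output as itself, or `("ref", offset, length)`, a dictionary reference. This token layout and the binary format behind it are assumptions, because the text does not specify them.

```python
from typing import List, Tuple, Union

Token = Union[Tuple[str, str], Tuple[str, int, int]]   # ("lit", ch) or ("ref", offset, length)


def lz77_decode(tokens: List[Token]) -> str:
    out: List[str] = []
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])                  # character stored as itself
        else:
            _, offset, length = tok
            start = len(out) - offset
            if start < 0:
                raise ValueError(f"offset {offset} reaches before the start of the output")
            for k in range(length):
                # Copy one character at a time so overlapping references work.
                out.append(out[start + k])
    return "".join(out)


print(lz77_decode([("lit", "a"), ("lit", "b"), ("ref", 2, 5)]))  # abababa
```

Copying character by character lets a reference overlap the text it is producing (offset smaller than length), which LZ77 permits.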

The engineering cost data optimized storage system provided by the invention is described in detail below with reference to the accompanying drawings.

Referring to FIG. 2, which shows a block diagram of an engineering cost data optimized storage system according to one embodiment of the present invention, the system comprises the following modules:

the data acquisition module is used for acquiring engineering cost data;

The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent substitution, improvement, etc. made within the principles of the present invention shall fall within its scope of protection.

Claims

1. A method for optimally storing engineering cost data, characterized by comprising the following steps:

collecting engineering cost data;

storing the compressed data to a server;

during the encoding of the engineering cost data with LZ77 coding, obtaining each matching character string under each match, which comprises the following specific steps:

taking all the obtained matching character strings as each matching character string under the current matching;

obtaining the preference of each matching character string under each match from its offset and length, combined with the longest length of the matching character string of the next pre-match corresponding to it, which comprises the following specific steps:


2. The method for optimally storing engineering cost data according to claim 1, wherein the step of obtaining the size of the dictionary and the sliding window comprises the following specific steps:

3. The method for optimally storing engineering cost data according to claim 1, wherein obtaining, from each matching character string under each match, its offset and length comprises the following specific steps:

4. The method for optimally storing engineering cost data according to claim 1, wherein obtaining, from each matching character string under each match, the longest length of the matching character string of the corresponding next pre-match comprises the following specific steps:

5. The method for optimally storing engineering cost data according to claim 1, wherein obtaining the optimal matching character string under each match according to the preference of each matching character string under that match comprises the following specific steps:

6. The method for optimally storing engineering cost data according to claim 1, wherein compressing the engineering cost data according to the sizes of the dictionary and the sliding window and the optimal matching character string under each match to obtain compressed data comprises the following specific steps:

7. The method for optimally storing engineering cost data according to claim 1, wherein the storing the compressed data to the server comprises the following specific steps:

converting the compressed data into binary numbers and storing them on a server.

8. An engineering cost data optimized storage system, the system comprising:

the data acquisition module is used for acquiring engineering cost data;


wherein p represents the preference of a given matching character string under the current match; l₁ represents the length of that matching character string; l₂ represents the longest length of the matching character string of the next pre-match corresponding to that matching character string; d represents the offset of that matching character string; d_max represents the maximum offset of all matching character strings under the current match; and exp() represents the exponential function with the natural constant e as its base;
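The inputs of the preference p can be gathered for one position of the data with the Python sketch below. It is an illustration under assumptions, not the claimed procedure. Each dictionary offset with a match of at least one character yields one candidate, whose length l₁ is capped at the window size N. The "next pre-match" is read here as the longest match starting right after that candidate. The dictionary is taken to be the preceding 10 × N characters. The formula for p is not reproduced in this text, so the sketch stops at l₁, l₂, d and d_max. The names `Candidate`, `match_length` and `candidates` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    d: int   # offset of the matching character string
    l1: int  # length of the matching character string
    l2: int  # longest length found by the next pre-match after it


def match_length(data: str, pos: int, offset: int, window: int) -> int:
    """Length of the match between data[pos:] and the text `offset` characters back."""
    n = 0
    while n < window and pos + n < len(data) and data[pos - offset + n] == data[pos + n]:
        n += 1
    return n


def candidates(data: str, pos: int, window: int) -> Tuple[List[Candidate], int]:
    """All candidates at pos plus d_max; d_max is 0 when nothing matches."""
    dict_size = 10 * window
    found: List[Candidate] = []
    for d in range(1, min(pos, dict_size) + 1):
        l1 = match_length(data, pos, d, window)
        if l1 == 0:
            continue
        nxt = pos + l1  # assumed reading of "next pre-match": the position after this match
        l2 = max((match_length(data, nxt, k, window)
                  for k in range(1, min(nxt, dict_size) + 1)), default=0)
        found.append(Candidate(d, l1, l2))
    d_max = max((c.d for c in found), default=0)
    return found, d_max


print(candidates("ababcabc", pos=5, window=3))
# ([Candidate(d=3, l1=3, l2=0), Candidate(d=5, l1=2, l2=1)], 5)
```

In this example the longer candidate (l₁ = 3) leaves nothing for the next pre-match, while the shorter one (l₁ = 2) sets up a further match of length 1. This is the kind of trade-off that l₂ lets the preference weigh.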