CN117156014B - Engineering cost data optimal storage method and system - Google Patents


Info

Publication number
CN117156014B
Authority
CN
China
Legal status
Active
Application number
CN202311217477.0A
Other languages
Chinese (zh)
Other versions
CN117156014A
Inventor
梁艳香
崔改孝
彭青山
Current Assignee
Zhejiang Huachi Project Management Consulting Co ltd
Original Assignee
Zhejiang Huachi Project Management Consulting Co ltd
Application filed by Zhejiang Huachi Project Management Consulting Co ltd
Priority to CN202311217477.0A
Publication of CN117156014A
Application granted
Publication of CN117156014B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/04 Protocols for data compression, e.g. ROHC
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608 Saving storage space on storage systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Security & Cryptography (AREA)
  • Human Computer Interaction (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention relates to the technical field of data compression and storage, in particular to a method and system for the optimized storage of engineering cost data. The method comprises the following steps: collecting engineering cost data; acquiring the sizes of the dictionary and the sliding window; while encoding the engineering cost data with LZ77, acquiring each matching character string under each matching together with its offset and length; for each matching character string under each matching, acquiring the longest length of the matching character string of the corresponding next pre-matching; obtaining the optimal matching character string under each matching according to the preference of each matching character string; compressing the engineering cost data according to the optimal matching character strings to obtain compressed data; and storing the compressed data to a server. The invention improves compression efficiency.

Description

Engineering cost data optimal storage method and system
Technical Field
The invention relates to the technical field of data compression and storage, in particular to an engineering cost data optimal storage method and system.
Background
Engineering cost refers to the costs and expenses involved in construction, civil engineering or other engineering projects, covering every stage from early planning to completion. Engineering cost data allow a project to be managed and controlled effectively, which helps ensure that it is completed on time, to the required quality and within budget. The collected engineering cost data of a project therefore need to be compressed for storage and decompressed when they must be reviewed.
The LZ77 compression algorithm is a lossless compression algorithm based on a dictionary and a sliding window: the dictionary holds previously encoded characters, and repeated character strings are replaced by shorter numeric codes. During compression, the characters to be compressed in the sliding window are matched against the characters in the dictionary to obtain a matching character string, which is then encoded by its offset (the distance between the first character of the matching character string in the dictionary and the first character of the string being matched in the sliding window) and its length. When the characters to be compressed are matched against the dictionary, several matching character strings may exist. The existing method encodes the longest one; however, if the longest matching character string lies far back in the dictionary, its offset is large, the corresponding binary code is long, and the stored compression result occupies more storage space.
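To make the problem concrete, the following minimal Python sketch shows the conventional greedy rule described above (not the invention's method): take the longest prefix of the sliding window that occurs in the dictionary, using its nearest occurrence. The function name `longest_match` and the sample strings are illustrative assumptions.

```python
def longest_match(dictionary: str, window: str) -> tuple[int, int]:
    """Conventional greedy LZ77 choice: (offset, length) of the longest window prefix in the dictionary.

    The offset counts from the match's first character to the first character of
    the window; (0, 0) means no match.
    """
    best = (0, 0)
    for k in range(1, len(window) + 1):
        pos = dictionary.rfind(window[:k])  # nearest occurrence of this prefix
        if pos == -1:
            break  # if a prefix is absent, every longer prefix is absent too
        best = (len(dictionary) - pos, k)
    return best


dictionary = "abcde" + "z" * 10 + "abx"
window = "abcq"
print(longest_match(dictionary, window))         # (18, 3): longest, but far back
print(len(dictionary) - dictionary.rfind("ab"))  # 3: a shorter match much closer
```

Here the greedy rule pays for an offset of 18 to gain one extra character over a match at offset 3, which is exactly the trade-off the invention weighs.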
Disclosure of Invention
To solve the above problem, the invention provides an engineering cost data optimized storage method and system.
The invention relates to an engineering cost data optimized storage method which adopts the following technical scheme:
an embodiment of the present invention provides a method for optimally storing engineering cost data, comprising the steps of:
collecting engineering cost data;
acquiring the sizes of the dictionary and the sliding window; while encoding the engineering cost data with LZ77, acquiring each matching character string under each matching; acquiring the offset and length of each matching character string under each matching; acquiring, for each matching character string under each matching, the longest length of the matching character string of the corresponding next pre-matching; acquiring the preference of each matching character string under each matching from its offset and length combined with that longest next-pre-matching length; and acquiring the optimal matching character string under each matching according to these preferences;
compressing engineering cost data according to the sizes of the dictionary and the sliding window and the optimal matching character strings under each matching to obtain compressed data;
the compressed data is stored to a server.
Preferably, the step of obtaining the size of the dictionary and the sliding window includes the following specific steps:
the size of the preset sliding window is N, and the size of the preset dictionary is 10×N.
Preferably, in the process of encoding the engineering cost data by using the LZ77 encoding, each matching character string under each matching is obtained, which comprises the following specific steps:
in the process of encoding the engineering cost data with LZ77, each matching of the characters to be compressed in the sliding window against the characters in the dictionary is recorded as the current matching. The first character in the sliding window under the current matching is taken as the character string to be matched; when the dictionary contains strings identical to it, all such strings are obtained, the one closest to the character string to be matched is taken as its matching character string, and this character string to be matched is called the previous character string to be matched. When the previous character string to be matched has a matching character string, the first two characters in the sliding window under the current matching are taken as the new character string to be matched; when the dictionary contains strings identical to it, all such strings are obtained, the one closest to it is taken as its matching character string, and it becomes the previous character string to be matched. This continues until the previous character string to be matched has no matching character string, at which point iteration stops;
and taking all the obtained matching character strings as each matching character string under the current matching.
Preferably, the step of obtaining the offset and the length of each matching string under each matching according to each matching string under each matching includes the following specific steps:
the distance between the first character of each matching character string under each matching in the dictionary and the first character of its corresponding character string to be matched in the sliding window is obtained and taken as the offset of that matching character string; the number of characters of each matching character string under each matching is obtained and taken as its length.
Preferably, obtaining the longest length of the matching character string of the next pre-matching corresponding to each matching character string under each matching includes the following specific steps:
for any matching character string under the current matching, the sliding window and the dictionary under the current matching are slid rightwards by the length of that matching character string to obtain the sliding window and the dictionary of the next pre-matching; the characters to be compressed in that sliding window are longest-matched against the characters in that dictionary, and the length of the result is taken as the longest length of the matching character string of the next pre-matching corresponding to the matching character string under the current matching.
Preferably, obtaining the preference of each matching character string under each matching according to its offset and length, combined with the longest length of the matching character string of the corresponding next pre-matching, includes the following specific steps:
[Formula image: preference p as a function of L_1, L_2, d and d_max]
where p represents the preference of any matching character string under the current matching; L_1 represents the length of that matching character string; L_2 represents the longest length of the matching character string of the next pre-matching corresponding to it; d represents its offset; d_max represents the maximum offset of all matching character strings under the current matching; and exp() represents the exponential function with the natural constant e as its base.
Preferably, obtaining the optimal matching character string under each matching according to the preference of each matching character string under each matching includes the following specific steps:
and obtaining the matching character string with the largest preference under each matching as the optimal matching character string under each matching.
Preferably, compressing the engineering cost data according to the sizes of the dictionary and the sliding window and the optimal matching character string under each matching to obtain compressed data comprises the following specific steps:
the characters of the engineering cost data are preloaded in order into the sliding window, up to its size, as the characters to be compressed, while the dictionary is initially empty; the first character in the sliding window is encoded first, as the character itself, and the code is output;
after the sliding window and the dictionary are moved rightwards by one character, the characters to be compressed in the sliding window are matched against the characters in the dictionary; if no matching character string exists, the first character in the sliding window is encoded as the character itself, the code is output, and the sliding window and the dictionary are moved rightwards by one character for the next matching;
if matching character strings exist, the optimal matching character string is obtained according to the preference of each; its offset, its length and the character in the sliding window that follows its corresponding character string to be matched form its code, which is output, and the sliding window and the dictionary are moved rightwards by the length of the optimal matching character string for the next matching;
this is repeated until the sliding window is empty, at which point iteration stops, and all output codes, in output order, form a code sequence that serves as the compressed data of the engineering cost data.
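The compression steps above can be sketched end to end in Python. This is a hedged illustration under stated assumptions, not the patent's implementation: the function names are hypothetical; the preference formula is a hypothetical stand-in, since the original expression is not reproduced in this text; literals are emitted as `(0, 0, c)`, an assumed convention; and after a match the encoder advances by L_1 + 1 rather than L_1, because the emitted triple also carries the next character (the textbook LZ77 convention), so that character is not encoded twice. L_2 is still computed by sliding by L_1, as the next pre-matching is described. A small decoder is included to check the round trip.

```python
import math


def candidate_matches(dictionary: str, window: str) -> list[tuple[int, int]]:
    """(offset, length) of the nearest dictionary match for each prefix of `window`."""
    matches = []
    for k in range(1, len(window) + 1):
        pos = dictionary.rfind(window[:k])
        if pos == -1:
            break  # longer prefixes cannot match either
        matches.append((len(dictionary) - pos, k))
    return matches


def longest_len(data: str, pos: int, dict_size: int, win_size: int) -> int:
    """Longest match length for the window starting at `pos` (used for L_2)."""
    dictionary = data[max(0, pos - dict_size):pos]
    window = data[pos:pos + win_size]
    length = 0
    for k in range(1, len(window) + 1):
        if window[:k] not in dictionary:
            break
        length = k
    return length


def preference(l1: int, l2: int, d: int, d_max: int) -> float:
    # HYPOTHETICAL formula: grows with L_1 and L_2, shrinks as d grows.
    return (l1 + l2) * math.exp(-d / d_max)


def compress(data: str, win_size: int = 10, dict_size: int = 100) -> list[tuple[int, int, str]]:
    """Encode `data` as (offset, length, next_char) triples; (0, 0, c) is a literal."""
    codes = []
    i = 0
    while i < len(data):
        dictionary = data[max(0, i - dict_size):i]
        window = data[i:i + win_size]
        # Match only window[:-1] so that a next character always follows the match.
        cands = candidate_matches(dictionary, window[:-1])
        if not cands:
            codes.append((0, 0, data[i]))  # no match: emit a literal
            i += 1
            continue
        d_max = max(d for d, _ in cands)

        def score(cand: tuple[int, int]) -> float:
            d, l1 = cand
            # Next pre-matching: slide by L_1, as described in the text.
            l2 = longest_len(data, i + l1, dict_size, win_size)
            return preference(l1, l2, d, d_max)

        d, l1 = max(cands, key=score)
        codes.append((d, l1, data[i + l1]))
        i += l1 + 1  # the triple consumes the match plus the next character
    return codes


def decompress(codes: list[tuple[int, int, str]]) -> str:
    """Invert `compress`: copy `length` characters from `offset` back, then append the next char."""
    out: list[str] = []
    for d, length, ch in codes:
        start = len(out) - d
        out.extend(out[start:start + length])
        out.append(ch)
    return "".join(out)
```

Because every candidate is a genuine dictionary match, `decompress(compress(text)) == text` holds whichever candidate the preference selects; the preference only affects how compactly the offsets can later be stored.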
Preferably, storing the compressed data in the server includes the following specific step:
the compressed data is converted into binary numbers and stored in a server.
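The text says only that the compressed data are converted to binary. The argument that a larger offset needs a longer binary code holds for a variable-length integer code, so the sketch below assumes Elias gamma coding (a swapped-in technique, not named in the source), under which an integer n ≥ 1 costs 2⌊log2 n⌋ + 1 bits. Offsets and lengths are shifted by one so that a literal `(0, 0, c)` is encodable, and the character is written as its UTF-8 bytes. All helper names are hypothetical.

```python
def elias_gamma(n: int) -> str:
    """Elias gamma code of n >= 1 as a bit string: (bits-1) zeros, then n in binary."""
    if n < 1:
        raise ValueError("Elias gamma is only defined for n >= 1")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def encode_triple(offset: int, length: int, ch: str) -> str:
    """Bit string for one (offset, length, next_char) code.

    Offset and length are shifted by one so a literal (0, 0, c) is encodable;
    the character is written as its UTF-8 bytes.
    """
    char_bits = "".join(format(b, "08b") for b in ch.encode("utf-8"))
    return elias_gamma(offset + 1) + elias_gamma(length + 1) + char_bits


def pack_bits(bits: str) -> bytes:
    """Pad the bit string with zeros to a whole number of bytes and pack it."""
    padded = bits + "0" * (-len(bits) % 8)
    return bytes(int(padded[j:j + 8], 2) for j in range(0, len(padded), 8))


# A near match (offset 3) costs fewer offset bits than a far one (offset 18).
print(len(elias_gamma(3 + 1)), len(elias_gamma(18 + 1)))  # 5 9
```

A real storage format would also need to record the number of codes (or bits) so that the zero padding added by `pack_bits` is not misread on decoding; that is omitted here.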
An embodiment of the invention provides an engineering cost data optimized storage system, which comprises the following modules:
the data acquisition module is used for acquiring engineering cost data;
the preference obtaining module is used for acquiring the sizes of the dictionary and the sliding window; while encoding the engineering cost data with LZ77, acquiring each matching character string under each matching; acquiring the offset and length of each matching character string under each matching; acquiring, for each matching character string under each matching, the longest length of the matching character string of the corresponding next pre-matching; acquiring the preference of each matching character string under each matching from its offset and length combined with that longest next-pre-matching length; and acquiring the optimal matching character string under each matching according to these preferences;
the project cost data compression module compresses project cost data according to the sizes of the dictionary and the sliding window and the optimal matching character strings under each matching to obtain compressed data;
and the compressed data storage management module is used for storing the compressed data to the server.
The technical scheme of the invention has the following beneficial effects. While encoding engineering cost data with LZ77, the invention obtains each matching character string under each matching together with its offset and length, and the longest length of the matching character string of the corresponding next pre-matching; from these it obtains the preference of each matching character string, selects the optimal matching character string under each matching, and compresses the engineering cost data according to the sizes of the dictionary and the sliding window and the optimal matching character strings. Because the optimal matching character string is chosen by preference, a match with a smaller offset can be used without sacrificing compression efficiency, which avoids the large offsets, and hence the long stored binary codes, that can result from always encoding the longest matching character string.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the steps of an engineering cost data optimized storage method according to the present invention;
FIG. 2 is a system block diagram of an engineering cost data optimized storage system of the present invention;
FIG. 3 is a schematic diagram of matching of a sliding window and a dictionary.
Detailed Description
In order to further describe the technical means and effects adopted by the invention to achieve the preset aim, the following is a detailed description of the specific implementation, structure, characteristics and effects of an engineering cost data optimizing storage method according to the invention with reference to the attached drawings and the preferred embodiment. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The specific scheme of the engineering cost data optimized storage method provided by the invention is described below with reference to the attached drawings.
Referring to fig. 1, a flowchart of steps in an optimized storage method for engineering cost data according to an embodiment of the present invention is shown, where the method includes the following steps:
S001, collecting engineering cost data.
The costs of an engineering project in the design stage are collected, including architectural design, structural design and drainage design costs, together with its costs in the construction stage, including labor and material procurement costs; the collected costs are recorded as the engineering cost data.
S002, acquiring the sizes of the dictionary and the sliding window.
Note that LZ77 is a lossless compression algorithm based on a dictionary and a sliding window: it builds the dictionary from already-encoded characters and replaces repeated character strings with shorter numeric codes. During compression, the characters to be compressed in the sliding window are matched against the encoded characters in the dictionary to obtain a matching character string, which is marked by the distance to its first character in the dictionary and by its length. LZ77 requires the sizes of the dictionary and the sliding window to be specified before encoding. A larger dictionary can usually hold more encoded characters and so makes better use of the repeated character strings in engineering cost data: matching the data to be encoded in the sliding window against it yields more, and longer, matching character strings. The dictionary is therefore set larger than the sliding window so that the matching character strings obtained are as long as possible; in this embodiment, the dictionary size is set according to the sliding window size so that it accommodates more characters.
In this embodiment, the size of the sliding window is set to N based on an empirical value and the size of the dictionary to 10×N, with N = 10 (so the dictionary holds 100 characters); in other embodiments, the implementer can set N according to the specific situation.
Obtaining the sizes of the dictionary and the sliding window allows the engineering cost data to be compressed conveniently according to them.
S003, acquiring the preference of each matching character string under the current matching, and selecting the optimal matching character string under the current matching for encoding according to these preferences.
It should be noted that, with a larger dictionary, several matching character strings may exist when the characters to be compressed in the sliding window are matched against the dictionary during LZ77 compression of engineering cost data. The existing method encodes the longest matching character string. However, if the first character of that string in the dictionary is far from the first character of the string being matched in the sliding window, i.e., the offset is large, the binary code for the offset is long when the compression result is stored, which reduces compression efficiency. On the other hand, a longer matching character string removes more repeated information from the characters to be compressed and so improves compression efficiency. A matching character string under the current matching with a smaller offset and a greater length should therefore be preferred for encoding.
It should further be noted that in LZ77 encoding, once the current matching is completed, the sliding window and the dictionary slide rightwards and the next matching is performed, so the length of the matching character string obtained in the next matching depends on the matching character string chosen in the current matching. If the optimal matching character string chosen now shortens the longest matching character string available in the next matching, less repeated information can be removed in the next matching and its compression efficiency drops. The preference of each matching character string therefore cannot be based only on its offset and length under the current matching; it must also take into account the longest length of the matching character string of the next pre-matching corresponding to it. The smaller the offset of a matching character string under the current matching, the greater its length, and the greater the longest length of the matching character string in its next pre-matching, the higher its preference.
In this embodiment, in the process of encoding the engineering cost data with LZ77, each matching of the characters to be compressed in the sliding window against the characters in the dictionary is recorded as the current matching, and each matching character string under the current matching, together with its offset and length, is obtained as follows. The first character in the sliding window under the current matching is taken as the character string to be matched; when the dictionary contains strings identical to it, the one closest to the character string to be matched is taken as its matching character string, and this character string to be matched is called the previous character string to be matched. When the previous character string to be matched has a matching character string, the first two characters in the sliding window under the current matching are taken as the new character string to be matched; when the dictionary contains strings identical to it, all such strings are obtained, the one closest to it is taken as its matching character string, and it becomes the previous character string to be matched. This continues until the previous character string to be matched has no matching character string, at which point iteration stops;
and taking all the obtained matching character strings as each matching character string under the current matching.
The distance between the first character of each matching character string under the current matching in the dictionary and the first character of its corresponding character string to be matched in the sliding window is taken as the offset of that matching character string, and its number of characters is taken as its length.
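The candidate enumeration and the offset/length definition above can be sketched in Python as follows, assuming the dictionary and window are plain strings and that "closest" means the rightmost occurrence in the dictionary (smallest offset). The function name and the sample strings are illustrative assumptions, not taken from the patent.

```python
def candidate_matches(dictionary: str, window: str) -> list[tuple[int, int]]:
    """List (offset, length) for every matching string under the current match.

    For k = 1, 2, ... the first k characters of the window form the string to be
    matched; its nearest identical string in the dictionary (smallest offset) is
    taken as the matching string. Iteration stops at the first k with no match.
    """
    matches = []
    for k in range(1, len(window) + 1):
        pos = dictionary.rfind(window[:k])  # rightmost = closest to the window
        if pos == -1:
            break
        offset = len(dictionary) - pos      # distance from match start to window start
        matches.append((offset, k))
    return matches


print(candidate_matches("xaabyyab", "aabq"))  # [(2, 1), (7, 2), (7, 3)]
```

Note that the one-character candidate is found much closer (offset 2) than the longer ones (offset 7), which is the situation the preference is meant to arbitrate.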
For example, referring to FIG. 3, the characters in the dictionary are ddaabacldbb and the characters to be matched in the sliding window are aabca;
the first character of the sliding window, a, is matched against the characters in the dictionary; it matches the third, fourth and seventh characters of the dictionary, of which the seventh is closest to the first character a of the sliding window and is taken as the matching character string, with an offset of 3 and a length of 1;
the first two characters of the sliding window, aa, are matched against the characters in the dictionary; the string formed by the third and fourth characters of the dictionary is taken as the matching character string, with an offset of 6 and a length of 2; and so on, until the string taken from the sliding window has no matching character string, at which point iteration stops.
Obtaining the longest length of the matching character string of the next pre-matching corresponding to each matching character string under the current matching: for any matching character string under the current matching, the sliding window and the dictionary under the current matching are slid forward (rightwards) by the length of that matching character string to obtain the sliding window and the dictionary of the next pre-matching; the characters to be compressed in that sliding window are longest-matched against the characters in that dictionary, and the length of the result is taken as the longest length of the matching character string of the next pre-matching corresponding to the matching character string under the current matching.
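The next pre-matching step can be sketched in Python as below, assuming the data are a string, the window starts at index `pos`, and the dictionary is the up to `dict_size` characters immediately before it. The function names and the sample string are hypothetical.

```python
def longest_match_length(data: str, pos: int, dict_size: int, win_size: int) -> int:
    """Length of the longest prefix of the window at `pos` found in the dictionary before it."""
    dictionary = data[max(0, pos - dict_size):pos]
    window = data[pos:pos + win_size]
    length = 0
    for k in range(1, len(window) + 1):
        if window[:k] not in dictionary:
            break
        length = k
    return length


def next_prematch_length(data: str, pos: int, l1: int, dict_size: int, win_size: int) -> int:
    """L_2 for a candidate of length l1 at `pos`: slide by l1, then match longest."""
    return longest_match_length(data, pos + l1, dict_size, win_size)


data = "xaabyyabaabq"  # window starts at index 8 ("aabq")
print([next_prematch_length(data, 8, l1, 100, 10) for l1 in (1, 2, 3)])  # [2, 1, 0]
```

In this sample, taking the longest candidate (length 3) leaves nothing matchable in the next step, while the one-character candidate leaves a two-character match available.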
In the embodiment of the invention, the preference degree of each matching character string under the current matching is obtained from the following quantities:
where $p$ denotes the preference of any matching character string under the current matching; $L_1$ denotes the length of that matching character string; $L_2$ denotes the longest length of the matching character string of the next pre-matching corresponding to it; $d$ denotes its offset; $d_{\max}$ denotes the maximum offset among all matching character strings under the current matching; and $\exp()$ denotes the exponential function with the natural constant $e$ as its base. The smaller the offset and the larger the length of a matching character string under the current matching, and the larger the longest matching length obtained by its next pre-matching, the higher its preference, and the more it should be selected as the optimal matching character string under the current matching.
The matching character string with the highest preference under the current matching is taken as the optimal matching character string under the current matching.
Thus, the preference of each matching character string under the current matching is obtained from its offset and length together with the longest matching length of its next pre-matching, and the optimal matching character string selected by preference is used for encoding. This keeps the offset small without compromising the compression efficiency of the selected string, and avoids the problem that directly encoding the longest matching character string may yield a large offset whose binary representation then occupies more storage space.
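Because the text defines the variables of $p$ but not their exact combination, the selection step can only be sketched with a stand-in. The function below uses the hypothetical form $p = (L_1 + L_2)\exp(-d / d_{\max})$, chosen only because it rises with $L_1$ and $L_2$ and falls with $d$ as the text describes; it is not the patent's formula. Candidates are assumed to be `(offset, length, next_len)` tuples.

```python
import math


def preference(length: int, next_len: int, offset: int, max_offset: int) -> float:
    # HYPOTHETICAL stand-in for the patent's formula: grows with L1 and L2,
    # shrinks as the offset d approaches d_max
    return (length + next_len) * math.exp(-offset / max_offset)


def best_match(candidates: list[tuple[int, int, int]]) -> tuple[int, int, int]:
    """Pick the (offset, length, next_len) candidate with the highest preference."""
    max_offset = max(offset for offset, _, _ in candidates)  # d_max over current candidates
    # max() keeps the first candidate among equal scores
    return max(candidates, key=lambda c: preference(c[1], c[2], c[0], max_offset))
```

With candidates `(6, 1, 2)`, `(9, 2, 1)` and `(9, 3, 1)`, the stand-in scores are about 1.54, 1.10 and 1.47, so the nearer single-character match wins over the longer but more distant one, which is the trade-off the preference is meant to capture.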
S004, compressing the engineering cost data using LZ77 encoding to obtain compressed data.
When the engineering cost data is encoded with the LZ77 algorithm, the existing method encodes the characters in the sliding window using their longest matching character string in the dictionary. However, the offset of the longest matching character string may be large, so storing its binary representation occupies more space. Therefore, when matching the characters in the sliding window against the characters in the dictionary yields several matching character strings, the optimal matching character string is selected from them according to the method of step S003 and used for encoding.
In the embodiment of the invention, the engineering cost data is encoded to obtain the compressed data as follows:
the size of the sliding window is set to N and the size of the dictionary to 10×N; characters of the engineering cost data are preloaded in sequence into the sliding window, according to its size, as the characters to be compressed, while the dictionary is initially empty; the first character in the sliding window is encoded as itself and the code is output;
the sliding window and the dictionary are then moved right by one character, and the characters to be compressed in the sliding window are matched against the characters in the dictionary; if no matching character string exists, the first character in the sliding window is encoded as itself, the code is output, and the sliding window and the dictionary are moved right by one character for the next matching;
if matching character strings exist, the optimal matching character string is selected according to the preference of each matching character string; the offset of the optimal matching character string, its length, and the character in the sliding window that follows its corresponding character string to be matched are output as its code, and the sliding window and the dictionary are moved right by the length of the optimal matching character string for the next matching and encoding;
this repeats until the sliding window is empty, and all output codes, in output order, form a coding sequence that serves as the compressed data of the engineering cost data.
At this point, the engineering cost data has been encoded and the compressed data obtained.
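Steps S003 and S004 together can be sketched as a small Python encoder. It is a minimal illustration, not the patent's implementation, under several assumptions: tokens are `(offset, length, next_char)` triples with `(0, 0, ch)` for an unmatched character; the preference uses the hypothetical stand-in $(L_1 + L_2)\exp(-d / d_{\max})$, since the text defines only its variables; matches must lie wholly inside the dictionary; and, unlike the text, which moves the window by the match length, the sketch advances by length + 1 so that the trailing character is not encoded twice, capping match lengths so that a trailing character always exists. The helper names are hypothetical.

```python
import math


def _longest(data: str, pos: int, limit: int, dict_size: int) -> int:
    """Longest prefix of data[pos:pos + limit] found in the dictionary before pos."""
    dictionary = data[max(0, pos - dict_size):pos]
    best = 0
    for length in range(1, limit + 1):
        if data[pos:pos + length] not in dictionary:
            break
        best = length
    return best


def _candidates(data: str, pos: int, limit: int, dict_size: int) -> list[tuple[int, int]]:
    """(offset, length) of the nearest dictionary occurrence of each window prefix."""
    dict_start = max(0, pos - dict_size)
    dictionary = data[dict_start:pos]
    found = []
    for length in range(1, limit + 1):
        idx = dictionary.rfind(data[pos:pos + length])
        if idx == -1:
            break
        found.append((pos - (dict_start + idx), length))
    return found


def encode(data: str, window_size: int) -> list[tuple[int, int, str]]:
    dict_size = 10 * window_size  # dictionary holds 10 x N characters, as in the text
    tokens = []
    pos = 0
    while pos < len(data):
        # keep one character back so every token has a trailing character
        limit = min(window_size, len(data) - pos - 1)
        cands = _candidates(data, pos, limit, dict_size)
        if not cands:
            tokens.append((0, 0, data[pos]))  # no match: emit the character itself
            pos += 1
            continue
        d_max = max(offset for offset, _ in cands)

        def pref(cand: tuple[int, int]) -> float:
            offset, length = cand
            # L2: longest match after pre-sliding by this candidate's length
            next_len = _longest(data, pos + length,
                                min(window_size, len(data) - pos - length - 1), dict_size)
            # HYPOTHETICAL preference formula, not the patent's
            return (length + next_len) * math.exp(-offset / d_max)

        offset, length = max(cands, key=pref)
        tokens.append((offset, length, data[pos + length]))
        pos += length + 1  # skip the match and its trailing character
    return tokens
```

For `"abababab"` with N = 4, the first tokens are `(0, 0, 'a')`, `(0, 0, 'b')` and `(2, 2, 'a')`: at position 2 the length-2 candidate wins because its pre-match finds a longer follow-up match.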
S005, optimally storing the compressed data.
The compressed data is converted into binary data and stored on a server; when the engineering cost data needs to be viewed or used, the compressed data stored on the server is decoded with the LZ77 algorithm to obtain the decompressed data.
The engineering cost data optimized storage system provided by the invention is described in detail below with reference to the accompanying drawings.
Referring to FIG. 2, which shows a block diagram of an engineering cost data optimized storage system according to one embodiment of the present invention, the system comprises the following modules:
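Step S005 can be sketched as packing the tokens into fixed-size binary records and decoding them back. The record layout (big-endian `uint16` offset, `uint16` length, `uint32` code point) is a hypothetical choice for illustration; it assumes offsets and lengths below 65536 and tokens in the `(offset, length, next_char)` form with `(0, 0, ch)` for literals.

```python
import struct

# HYPOTHETICAL record layout: uint16 offset, uint16 length, uint32 Unicode code point
_RECORD = struct.Struct(">HHI")


def to_binary(tokens: list[tuple[int, int, str]]) -> bytes:
    """Pack (offset, length, next_char) tokens into 8-byte records."""
    return b"".join(_RECORD.pack(off, ln, ord(ch)) for off, ln, ch in tokens)


def from_binary(blob: bytes) -> list[tuple[int, int, str]]:
    """Unpack 8-byte records back into tokens."""
    return [(off, ln, chr(cp)) for off, ln, cp in _RECORD.iter_unpack(blob)]


def decode(tokens: list[tuple[int, int, str]]) -> str:
    """Rebuild the text: copy `length` characters from `offset` back, then the next character."""
    out: list[str] = []
    for off, ln, ch in tokens:
        start = len(out) - off
        for i in range(ln):  # character-by-character copy also handles overlapping matches
            out.append(out[start + i])
        out.append(ch)
    return "".join(out)
```

A fixed-width layout spends the same bytes on every offset; a variable-length integer encoding would let small offsets take fewer bytes, which is where preferring nearer matches pays off in storage.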
the data acquisition module is used for acquiring engineering cost data;
the preference obtaining module is used for: obtaining the sizes of the dictionary and the sliding window; obtaining, in the process of encoding the engineering cost data using LZ77 encoding, each matching character string under each matching; obtaining the offset and the length of each matching character string under each matching; obtaining, for each matching character string under each matching, the longest length of the matching character string of the next pre-matching; obtaining the preference of each matching character string under each matching from its offset and length combined with that longest length; and obtaining the optimal matching character string under each matching according to these preferences;
the engineering cost data compression module is used for compressing the engineering cost data according to the sizes of the dictionary and the sliding window and the optimal matching character string under each matching, to obtain compressed data;
and the compressed data storage management module is used for storing the compressed data to the server.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the invention, but any modifications, equivalent substitutions, improvements, etc. within the principles of the present invention should be included in the scope of the present invention.

Claims (8)

1. A method for optimally storing engineering cost data, characterized by comprising the following steps:
collecting engineering cost data;
obtaining the sizes of a dictionary and a sliding window; obtaining, in the process of encoding the engineering cost data using LZ77 encoding, each matching character string under each matching; obtaining the offset and the length of each matching character string under each matching; obtaining, for each matching character string under each matching, the longest length of the matching character string of the next pre-matching; obtaining the preference of each matching character string under each matching from its offset and length combined with that longest length; and obtaining the optimal matching character string under each matching according to these preferences;
compressing engineering cost data according to the sizes of the dictionary and the sliding window and the optimal matching character strings under each matching to obtain compressed data;
storing the compressed data to a server;
wherein obtaining each matching character string under each matching, in the process of encoding the engineering cost data using LZ77 encoding, comprises the following specific steps:
in the process of encoding the engineering cost data using LZ77 encoding, any one process of matching the characters to be compressed in the sliding window with the characters in the dictionary is recorded as the current matching; the first character in the sliding window under the current matching is taken as the character string to be matched; when the dictionary contains character strings identical to it, all such strings are obtained, the one closest to the character string to be matched is taken as its matching character string, and this character string to be matched is called the previous character string to be matched; when the previous character string to be matched has a matching character string, the first two characters in the sliding window under the current matching are taken as the new character string to be matched; when the dictionary contains character strings identical to it, all such strings are obtained, the one closest to the new character string to be matched is taken as its matching character string, and the new character string to be matched becomes the previous character string to be matched; this continues, and the iteration stops when the previous character string to be matched has no matching character string;
taking all the obtained matching character strings as each matching character string under the current matching;
wherein obtaining the preference of each matching character string under each matching, according to its offset and length combined with the longest length of the matching character string of the next pre-matching, comprises the following specific steps:
where $p$ denotes the preference of any matching character string under the current matching; $L_1$ denotes the length of that matching character string; $L_2$ denotes the longest length of the matching character string of the next pre-matching corresponding to it; $d$ denotes its offset; $d_{\max}$ denotes the maximum offset among all matching character strings under the current matching; and $\exp()$ denotes the exponential function with the natural constant $e$ as its base.
2. The method for optimally storing engineering cost data according to claim 1, wherein obtaining the sizes of the dictionary and the sliding window comprises the following specific steps:
the size of the sliding window is preset to N, and the size of the dictionary is preset to 10×N.
3. The method for optimally storing engineering cost data according to claim 1, wherein obtaining the offset and the length of each matching character string under each matching comprises the following specific steps:
the distance between the first character of each matching character string (in the dictionary) under each matching and the first character of its corresponding character string to be matched (in the sliding window) is taken as the offset of that matching character string, and the number of characters in each matching character string under each matching is taken as its length.
4. The method for optimally storing engineering cost data according to claim 1, wherein obtaining, for each matching character string under each matching, the longest length of the matching character string of the next pre-matching comprises the following specific steps:
for any matching character string under the current matching, the sliding window and the dictionary under the current matching are slid rightward by the length of that matching character string to obtain the sliding window and the dictionary for the next pre-matching; the characters to be compressed in that sliding window are longest-matched against the characters in the dictionary, and the length of the result is taken as the longest length of the matching character string of the next pre-matching corresponding to that matching character string.
5. The method for optimally storing engineering cost data according to claim 1, wherein obtaining the optimal matching character string under each matching according to the preference of each matching character string under each matching comprises the following specific steps:
and obtaining the matching character string with the largest preference under each matching as the optimal matching character string under each matching.
6. The method for optimally storing engineering cost data according to claim 1, wherein compressing the engineering cost data according to the sizes of the dictionary and the sliding window and the optimal matching character string under each matching to obtain compressed data comprises the following specific steps:
characters of the engineering cost data are preloaded in sequence into the sliding window, according to its size, as the characters to be compressed, while the dictionary is initially empty; the first character in the sliding window is encoded as itself and the code is output;
the sliding window and the dictionary are then moved right by one character, and the characters to be compressed in the sliding window are matched against the characters in the dictionary; if no matching character string exists, the first character in the sliding window is encoded as itself, the code is output, and the sliding window and the dictionary are moved right by one character for the next matching;
if matching character strings exist, the optimal matching character string is selected according to the preference of each matching character string; the offset of the optimal matching character string, its length, and the character in the sliding window that follows its corresponding character string to be matched are output as its code, and the sliding window and the dictionary are moved right by the length of the optimal matching character string for the next matching and encoding;
this repeats until the sliding window is empty, and all output codes, in output order, form a coding sequence that serves as the compressed data of the engineering cost data.
7. The method for optimally storing engineering cost data according to claim 1, wherein storing the compressed data to the server comprises the following specific steps:
the compressed data is converted into binary numbers and stored in a server.
8. An engineering cost data optimized storage system, the system comprising:
the data acquisition module is used for acquiring engineering cost data;
the preference obtaining module is used for: obtaining the sizes of the dictionary and the sliding window; obtaining, in the process of encoding the engineering cost data using LZ77 encoding, each matching character string under each matching; obtaining the offset and the length of each matching character string under each matching; obtaining, for each matching character string under each matching, the longest length of the matching character string of the next pre-matching; obtaining the preference of each matching character string under each matching from its offset and length combined with that longest length; and obtaining the optimal matching character string under each matching according to these preferences;
wherein obtaining each matching character string under each matching, in the process of encoding the engineering cost data using LZ77 encoding, comprises the following specific steps:
in the process of encoding the engineering cost data using LZ77 encoding, any one process of matching the characters to be compressed in the sliding window with the characters in the dictionary is recorded as the current matching; the first character in the sliding window under the current matching is taken as the character string to be matched; when the dictionary contains character strings identical to it, all such strings are obtained, the one closest to the character string to be matched is taken as its matching character string, and this character string to be matched is called the previous character string to be matched; when the previous character string to be matched has a matching character string, the first two characters in the sliding window under the current matching are taken as the new character string to be matched; when the dictionary contains character strings identical to it, all such strings are obtained, the one closest to the new character string to be matched is taken as its matching character string, and the new character string to be matched becomes the previous character string to be matched; this continues, and the iteration stops when the previous character string to be matched has no matching character string;
taking all the obtained matching character strings as each matching character string under the current matching;
wherein obtaining the preference of each matching character string under each matching, according to its offset and length combined with the longest length of the matching character string of the next pre-matching, comprises the following specific steps:
where $p$ denotes the preference of any matching character string under the current matching; $L_1$ denotes the length of that matching character string; $L_2$ denotes the longest length of the matching character string of the next pre-matching corresponding to it; $d$ denotes its offset; $d_{\max}$ denotes the maximum offset among all matching character strings under the current matching; and $\exp()$ denotes the exponential function with the natural constant $e$ as its base;
the engineering cost data compression module is used for compressing the engineering cost data according to the sizes of the dictionary and the sliding window and the optimal matching character string under each matching, to obtain compressed data;
and the compressed data storage management module is used for storing the compressed data to the server.
CN202311217477.0A 2023-09-20 2023-09-20 Engineering cost data optimal storage method and system Active CN117156014B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311217477.0A CN117156014B (en) 2023-09-20 2023-09-20 Engineering cost data optimal storage method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311217477.0A CN117156014B (en) 2023-09-20 2023-09-20 Engineering cost data optimal storage method and system

Publications (2)

Publication Number Publication Date
CN117156014A CN117156014A (en) 2023-12-01
CN117156014B true CN117156014B (en) 2024-03-12

Family

ID=88884176

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311217477.0A Active CN117156014B (en) 2023-09-20 2023-09-20 Engineering cost data optimal storage method and system

Country Status (1)

Country Link
CN (1) CN117156014B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106656198A (en) * 2016-11-23 2017-05-10 郑州云海信息技术有限公司 LZ77-based coding method
CN107623855A (en) * 2016-07-13 2018-01-23 谭心瑶 A kind of embedded rate steganography device of height based on compressed encoding and steganography method
CN116388767A (en) * 2023-04-11 2023-07-04 河南大学 Security management method for software development data

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5895545B2 (en) * 2012-01-17 2016-03-30 富士通株式会社 Program, compressed file generation method, compression code expansion method, information processing apparatus, and recording medium
US20190377804A1 (en) * 2018-06-06 2019-12-12 Yingquan Wu Data compression algorithm

Also Published As

Publication number Publication date
CN117156014A (en) 2023-12-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant