CN113726341A - Data processing method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN113726341A
Authority
CN
China
Prior art keywords
data
compressed
block
length
compression
Legal status
Granted
Application number
CN202110979093.7A
Other languages
Chinese (zh)
Other versions
CN113726341B (en)
Inventor
张锦涛
姜伟浩
Current Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Application filed by Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN202110979093.7A (patent/CN113726341B/en)
Publication of CN113726341A (patent/CN113726341A/en)
Application granted
Publication of CN113726341B (patent/CN113726341B/en)
Legal status: Active (current)


Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/3068 Precoding preceding compression, e.g. Burrows-Wheeler transformation
    • H03M7/3071 Prediction
    • H03M7/3084 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiments of the invention provide a data processing method and apparatus, an electronic device, and a storage medium. Data to be compressed whose length is a preset to-be-compressed data length is acquired and compressed to obtain compressed data, and the data length of the compressed data is recorded. The ratio of the data length of the compressed data to a preset block data length is calculated as the block occupancy rate. When the block occupancy rate meets a compression block occupancy rate condition, the data length of the compressed data is subtracted from the preset block data length to obtain a filling data length. Content to be filled of the filling data length is acquired and integrated with the compressed data to obtain block data of the preset block data length, and the block data is stored in a data block of the storage medium. Because compressed data is turned into block data of a uniform, preset length before it is stored, the space in each block can be fully used and storage space utilization is improved.

Description

Data processing method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computer application technologies, and in particular, to a data processing method and apparatus, an electronic device, and a storage medium.
Background
In data storage, data is generally compressed before it is stored, to reduce the storage space taken up by redundant data. Data compression is therefore an important means of reducing the storage space occupied during data storage.
Disclosure of Invention
Embodiments of the present invention provide a data processing method and apparatus, an electronic device, and a storage medium to improve storage space utilization. The specific technical solutions are as follows:
in a first aspect, an embodiment of the present invention provides a data processing method, including:
acquiring data to be compressed whose length is a preset to-be-compressed data length;
compressing the data to be compressed to obtain compressed data, and recording the data length of the compressed data;
calculating the ratio of the data length of the compressed data to a preset block data length to obtain a block occupancy rate;
when the block occupancy rate meets a compression block occupancy rate condition, subtracting the data length of the compressed data from the preset block data length to obtain a filling data length;
acquiring content to be filled whose length is the filling data length, and integrating the compressed data with the content to be filled to obtain block data of the preset block data length;
and storing the block data into a data block of a storage medium.
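The claimed steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `zlib` stands in for the unspecified compressor, the 4096-byte block length and the [0.8, 1.0) occupancy range are hypothetical values, and the content to be filled is assumed to be zero bytes (blank fill) appended after the compressed data.

```python
import os
import zlib
from typing import Optional

BLOCK_LEN = 4096  # hypothetical preset block data length (4 KB)


def build_block(data: bytes, block_len: int = BLOCK_LEN,
                u_min: float = 0.8, u_max: float = 1.0) -> Optional[bytes]:
    """Compress `data` and pad the result to exactly `block_len` bytes.

    Returns None when the block occupancy rate falls outside the assumed
    compression block occupancy range [u_min, u_max).
    """
    compressed = zlib.compress(data)             # stand-in lossless compressor
    occupancy = len(compressed) / block_len      # block occupancy rate
    if not (u_min <= occupancy < u_max):
        return None                              # condition not met
    fill_len = block_len - len(compressed)       # filling data length
    return compressed + bytes(fill_len)          # compressed data + blank fill


if __name__ == "__main__":
    block = build_block(os.urandom(3500))        # random data barely compresses
    print(None if block is None else len(block))
```

Because a zlib stream marks its own end, a reader can recover the payload with `zlib.decompressobj().decompress(block)`; the trailing fill is left in the decompressor's `unused_data`.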
Optionally, acquiring the data to be compressed with the preset to-be-compressed data length includes:
acquiring original data;
adding the original data to a buffer to be compressed;
when the length of the data in the buffer to be compressed is greater than or equal to a compression threshold, acquiring the recorded compression ratio, and calculating the preset to-be-compressed data length from the recorded compression ratio through a compression length prediction algorithm;
and acquiring data of the preset to-be-compressed data length from the buffer to be compressed as the data to be compressed.
Optionally, after the ratio of the data length of the compressed data to the preset block data length is calculated to obtain the block occupancy rate, the method further includes:
judging whether the block occupancy rate falls within a preset compression block occupancy rate range;
and if so, determining that the block occupancy rate meets the compression block occupancy rate condition.
Optionally, the compression block occupancy rate range includes an occupancy rate upper limit value and an occupancy rate lower limit value;
after judging whether the block occupancy rate falls within the compression block occupancy rate range, the method further includes:
if the block occupancy rate is greater than or equal to the occupancy rate upper limit value, obtaining a new preset to-be-compressed data length through a compression length reduction algorithm, and returning to the step of acquiring data of the preset to-be-compressed data length from the buffer to be compressed;
if the block occupancy rate is smaller than the occupancy rate lower limit value, judging whether the current number of compression attempts is greater than or equal to the maximum number of compression attempts;
when the block occupancy rate is smaller than the occupancy rate lower limit value and the current number of compression attempts is greater than or equal to the maximum number of compression attempts, determining that the block occupancy rate meets the compression block occupancy rate condition;
and when the block occupancy rate is smaller than the occupancy rate lower limit value and the current number of compression attempts is smaller than the maximum number of compression attempts, obtaining a new preset to-be-compressed data length through a compression length increasing algorithm, and returning to the step of acquiring data of the preset to-be-compressed data length from the buffer to be compressed.
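The adjustment loop in the optional steps above can be sketched as below. Assumptions not taken from the source: `zlib` is the compressor, the occupancy range is [0.8, 1.0), the maximum number of compression attempts is 5, new lengths are rounded down, and the length never exceeds the data available in the buffer. The 98.5% and 101.5% factors are the ones this document gives for the compression length reduction and increasing algorithms.

```python
import os
import zlib
from typing import Tuple


def compress_to_block(buffer: bytes, predicted_len: int, block_len: int = 4096,
                      u_min: float = 0.8, u_max: float = 1.0,
                      max_attempts: int = 5) -> Tuple[bytes, int, int]:
    """Return (compressed_data, consumed_length, attempts).

    Compresses a prefix of `buffer`, shrinking or growing the prefix until
    the block occupancy rate meets the (assumed) occupancy condition.
    """
    length = min(predicted_len, len(buffer))
    attempts = 0
    while True:
        compressed = zlib.compress(buffer[:length])
        attempts += 1                                  # count this attempt
        occupancy = len(compressed) / block_len
        if occupancy >= u_max:
            # Too large for one block: compression length reduction (98.5%).
            length = length * 985 // 1000
            continue
        if occupancy < u_min and attempts < max_attempts:
            # Block under-filled: compression length increase (101.5%),
            # capped at the data actually available in the buffer.
            length = min(length * 1015 // 1000, len(buffer))
            continue
        # In range, or under-filled but out of attempts: condition met.
        return compressed, length, attempts


if __name__ == "__main__":
    data = os.urandom(10000)                  # incompressible sample input
    compressed, used, tries = compress_to_block(data, predicted_len=5000)
    print(used, len(compressed), tries)
```

As in the text, only the under-filled branch is bounded by the attempt limit; the reduction branch terminates because each step strictly shrinks the input.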
Optionally, after compressing the data to be compressed to obtain the compressed data, the method further includes:
increasing the number of compression attempts by 1, and recording the compression ratio of the current compression, where the compression ratio is the ratio of the data length of the compressed data to the data length of the data to be compressed.
Optionally, calculating the preset to-be-compressed data length from the compression ratio through the compression length prediction algorithm includes:
calculating, from the compression ratio, the preset to-be-compressed data length by the formula
[formula image BDA0003228319730000031; not reproduced in this text]
where $L_{predict}$ is the preset to-be-compressed data length, $R_{last}$ is the recorded compression ratio, $U_{min}$ is the occupancy rate lower limit value, and $L_{block}$ is the preset block data length.
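The prediction formula survives only as an image reference (BDA0003228319730000031), so its exact form is unknown here. The sketch below is a guess at one plausible reading, stated as an assumption: it picks the input length that, at the last recorded compression ratio, would produce compressed data filling the block exactly to the occupancy lower limit, i.e. $L_{predict} = U_{min} \cdot L_{block} / R_{last}$. Treat it as illustrative, not as the patented formula; the function name is hypothetical.

```python
def predict_length(r_last: float, u_min: float, l_block: int) -> int:
    """ASSUMED form of the compression length prediction algorithm.

    Chooses the input length whose compressed size, at compression ratio
    r_last (compressed length / original length), would equal u_min * l_block.
    """
    if not r_last > 0:
        raise ValueError("compression ratio must be positive")
    return int(u_min * l_block / r_last)
```

For example, with a 4096-byte block, a lower limit of 0.8, and a last ratio of 0.5, this predicts 6553 input bytes.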
Optionally, obtaining a new preset to-be-compressed data length through the compression length reduction algorithm includes:
obtaining the new preset to-be-compressed data length by the formula $L_{predict1} = 98.5\% \times L_{predict}$;
obtaining a new preset to-be-compressed data length through the compression length increasing algorithm includes:
obtaining the new preset to-be-compressed data length by the formula $L_{predict1} = 101.5\% \times L_{predict}$;
where $L_{predict1}$ is the new preset to-be-compressed data length and $L_{predict}$ is the preset to-be-compressed data length before the update.
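The two update formulas translate directly into code. Rounding down to whole bytes is an assumption (the source does not say how fractional lengths are handled); integer arithmetic is used so that the percentages are applied exactly, without floating-point error. The function names are hypothetical.

```python
def reduce_length(l_predict: int) -> int:
    """Compression length reduction: L_predict1 = 98.5% * L_predict (floored)."""
    return l_predict * 985 // 1000


def increase_length(l_predict: int) -> int:
    """Compression length increasing: L_predict1 = 101.5% * L_predict (floored)."""
    return l_predict * 1015 // 1000
```

For a 10000-byte length these give 9850 and 10150. With floor rounding, the increase leaves lengths below 67 bytes unchanged, which is one reason a cap on compression attempts is useful.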
Optionally, after storing the block data in a data block of a storage medium, the method further includes:
when a data access request is received, acquiring data mapping information, wherein the data mapping information is used for indicating the position of the block data;
searching the cache for the block data based on the data mapping information;
if the block data exists in the cache, parsing the compressed data out of the block data, and decompressing the compressed data to obtain decompressed data;
and if the block data does not exist in the cache, acquiring the block data from a storage medium according to the data mapping information, and storing the block data into the cache.
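The read path can be sketched as follows. Assumptions: a Python `dict` mapping block offsets to block bytes stands in for the storage medium, the data mapping information is reduced to a single offset, the cache is a small LRU built on `collections.OrderedDict` (the source does not specify an eviction policy), and each block is a zlib stream followed by fill bytes. After a cache miss the block is cached and then decompressed, which the source implies but does not spell out.

```python
import zlib
from collections import OrderedDict
from typing import Dict


class BlockReader:
    """Serve reads from a block cache, falling back to the storage medium."""

    def __init__(self, storage: Dict[int, bytes], capacity: int = 128):
        self.storage = storage          # hypothetical: offset -> block data
        self.cache = OrderedDict()      # offset -> block data, LRU order
        self.capacity = capacity

    def read(self, offset: int) -> bytes:
        block = self.cache.get(offset)
        if block is None:
            # Cache miss: fetch the block from the storage medium and cache it.
            block = self.storage[offset]
            self.cache[offset] = block
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)   # evict least recently used
        else:
            self.cache.move_to_end(offset)       # mark as recently used
        # Parse the compressed data out of the block (the zlib stream is
        # self-delimiting, so trailing fill is ignored) and decompress it.
        return zlib.decompressobj().decompress(block)
```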
In a second aspect, an embodiment of the present invention provides a data processing apparatus, including:
the first acquisition module is used for acquiring data to be compressed whose length is a preset to-be-compressed data length;
the compression module is used for compressing the data to be compressed to obtain compressed data;
the first recording module is used for recording the data length of the compressed data;
the calculating module is used for calculating the ratio of the data length of the compressed data to a preset block data length to obtain the block occupancy rate;
the obtaining module is used for subtracting the data length of the compressed data from the preset block data length to obtain a filling data length when the block occupancy rate meets the compression block occupancy rate condition;
the second acquisition module is used for acquiring content to be filled whose length is the filling data length;
the integration module is used for integrating the compressed data with the content to be filled to obtain block data of the preset block data length;
and the storage module is used for storing the block data into a data block of a storage medium.
Optionally, the first acquisition module is specifically configured to: acquire original data; add the original data to a buffer to be compressed; when the length of the data in the buffer to be compressed is greater than or equal to a compression threshold, acquire the recorded compression ratio and calculate the preset to-be-compressed data length from it through a compression length prediction algorithm; and acquire data of the preset to-be-compressed data length from the buffer to be compressed as the data to be compressed.
Optionally, the apparatus further comprises:
the judging module is used for judging, after the block occupancy rate is obtained as the ratio of the data length of the compressed data to the preset block data length, whether the block occupancy rate falls within a preset compression block occupancy rate range, and if so, determining that the block occupancy rate meets the compression block occupancy rate condition.
Optionally, the occupancy rate range of the compressed block includes an occupancy rate upper limit value and an occupancy rate lower limit value;
the judging module is specifically configured to: if the block occupancy rate is greater than or equal to the occupancy rate upper limit value, obtain a new preset to-be-compressed data length through a compression length reduction algorithm and return to the first acquisition module; if the block occupancy rate is smaller than the occupancy rate lower limit value, judge whether the current number of compression attempts is greater than or equal to the maximum number of compression attempts; when the block occupancy rate is smaller than the occupancy rate lower limit value and the current number of compression attempts is greater than or equal to the maximum number of compression attempts, determine that the block occupancy rate meets the compression block occupancy rate condition; and when the block occupancy rate is smaller than the occupancy rate lower limit value and the current number of compression attempts is smaller than the maximum number of compression attempts, obtain a new preset to-be-compressed data length through a compression length increasing algorithm and return to the first acquisition module.
Optionally, the apparatus further comprises:
an increasing module, configured to increase, by 1, the number of compression attempts after the data to be compressed is compressed to obtain compressed data;
and the second recording module is used for recording a compression ratio corresponding to the current compression, wherein the compression ratio is the ratio of the data length of the compressed data to the data length of the data to be compressed.
Optionally, the calculating module is specifically configured to calculate, from the compression ratio, the preset to-be-compressed data length by the formula
[formula images BDA0003228319730000051 and BDA0003228319730000052; not reproduced in this text]
where $L_{predict}$ is the preset to-be-compressed data length, $R_{last}$ is the compression ratio, $U_{min}$ is the occupancy rate lower limit value, and $L_{block}$ is the preset block data length.
Optionally, the judging module is specifically configured to obtain a new preset to-be-compressed data length by the formula $L_{predict1} = 98.5\% \times L_{predict}$ (compression length reduction), and to obtain a new preset to-be-compressed data length by the formula $L_{predict1} = 101.5\% \times L_{predict}$ (compression length increase);
where $L_{predict1}$ is the new preset to-be-compressed data length and $L_{predict}$ is the preset to-be-compressed data length before the update.
Optionally, the apparatus further comprises:
a third obtaining module, configured to obtain data mapping information when a data access request is received after the storing of the block data into a data block of a storage medium, where the data mapping information is used to indicate a location of the block data;
the searching module is used for searching the cache for the block data based on the data mapping information;
the decompression module is used for analyzing compressed data from the block data and decompressing the compressed data to obtain decompressed data if the block data exists in the cache;
and the cache module is used for acquiring the block data from a storage medium according to the data mapping information and storing the block data into the cache if the block data does not exist in the cache.
In a third aspect, an embodiment of the present invention provides a data processing device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of the first aspect when executing the program stored in the memory.
An embodiment of the present invention further provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the method steps in the first aspect.
Embodiments of the present invention also provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method steps of the first aspect.
The embodiment of the invention has the following beneficial effects:
The data processing method and apparatus, electronic device, and storage medium provided by the embodiments of the present invention acquire data to be compressed whose length is the preset to-be-compressed data length; compress it to obtain compressed data and record the data length of the compressed data; calculate the ratio of that length to the preset block data length as the block occupancy rate; when the block occupancy rate meets the compression block occupancy rate condition, subtract the data length of the compressed data from the preset block data length to obtain a filling data length; acquire content to be filled of the filling data length and integrate it with the compressed data to obtain block data of the preset block data length; and store the block data in a data block of the storage medium. In this way, compressed data is turned into block data of the preset block data length, so that all stored block data has a consistent length. Moreover, the block occupancy rate, i.e., the proportion of the block data taken up by compressed data, is taken into account: the compressed data is integrated with the content to be filled only when the block occupancy rate meets the compression block occupancy rate condition.
Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the invention.
Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them.
FIG. 1 is a flow chart of a data processing method according to an embodiment of the present invention;
FIG. 2 is another flow chart of a data processing method according to an embodiment of the present invention;
FIG. 3 is a flowchart of acquiring data to be compressed with the preset to-be-compressed data length according to an embodiment of the present invention;
FIG. 4 is a flowchart of a data processing method according to an embodiment of the present invention;
FIG. 5 is a flow chart of a data processing method to which embodiments of the present invention are applied;
FIG. 6 is another flow chart of a data processing method to which embodiments of the present invention are applied;
FIG. 7 is a flowchart of a data processing method according to an embodiment of the present invention;
FIG. 8 is a flowchart of a data processing method according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of a data processing apparatus according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of another structure of a data processing apparatus according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of another structure of a data processing apparatus according to an embodiment of the present invention;
FIG. 12 is a schematic diagram of a data processing apparatus according to an embodiment of the present invention;
FIG. 13 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived from the embodiments given herein by one of ordinary skill in the art, are within the scope of the invention.
The data processing method provided by the embodiments of the invention can be applied to an electronic device, such as a server.
An embodiment of the present invention provides a data processing method, which may include:
acquiring data to be compressed with preset data length to be compressed;
compressing data to be compressed to obtain compressed data, and recording the data length of the compressed data;
calculating the ratio of the data length of the compressed data to the preset block data length to obtain the block occupancy rate;
when the block occupancy rate meets the condition of the compression block occupancy rate, subtracting the data length of the compressed data from the preset block data length to obtain a filling data length;
acquiring a content to be filled with a filling data length, and integrating the compressed data with the content to be filled to obtain block data with a preset block data length;
the block data is stored in a data block of the storage medium.
In the embodiment of the invention, data to be compressed whose length is the preset to-be-compressed data length is acquired and compressed to obtain compressed data, and the data length of the compressed data is recorded; the ratio of that length to the preset block data length is calculated as the block occupancy rate; when the block occupancy rate meets the compression block occupancy rate condition, the data length of the compressed data is subtracted from the preset block data length to obtain a filling data length; content to be filled of the filling data length is acquired and integrated with the compressed data to obtain block data of the preset block data length; and the block data is stored in a data block of the storage medium. In this way, compressed data is turned into block data of the preset block data length, so that all stored block data has a consistent length. Moreover, the block occupancy rate, i.e., the proportion of the block data taken up by compressed data, is taken into account: the compressed data is integrated with the content to be filled only when the block occupancy rate meets the compression block occupancy rate condition.
Fig. 1 is a flowchart of a data processing method according to an embodiment of the present invention. Referring to fig. 1, a data processing method provided in an embodiment of the present invention may include:
S101, acquiring data to be compressed with the preset to-be-compressed data length.
The data processing method of the embodiment of the application can be implemented by electronic equipment, and specifically, the electronic equipment can be a personal computer, a smart phone, a server or the like. In one example, the electronic device is a collection of a plurality of servers in a cloud storage service.
The initial value of the preset to-be-compressed data length can be determined from actual requirements, empirical values, and the like. It can then be adjusted through the compression length prediction algorithm and the compression length increasing and reduction algorithms; this adjustment process is described later.
S102, compressing the data to be compressed to obtain compressed data, and recording the data length of the compressed data.
Compression algorithms fall mainly into lossy and lossless algorithms. Lossy algorithms compress unstructured data such as images and audio by discarding some data, shrinking the file at the cost of a tolerable loss of precision. Lossless algorithms reduce the space occupied by data by re-encoding repeated data without any loss of precision, and are widely used for transmitting and storing structured data in computing and communications. The embodiments of the present invention do not limit the compression method; any existing compression algorithm may be used.
In one implementation, a lossless compression algorithm such as Snappy, Gzip (the GNU file compression utility), or LZ4 may be used.
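As an illustration of S102 with one of the lossless algorithms named above, the sketch below uses Python's standard `gzip` module and also computes the compression ratio as this document defines it (compressed length divided by the length of the data to be compressed). The function name is hypothetical; Snappy and LZ4 would need third-party packages and are not shown.

```python
import gzip
from typing import Tuple


def compress_and_record(data: bytes) -> Tuple[bytes, int, float]:
    """Gzip-compress `data`; return (compressed, compressed_length, ratio)."""
    if not data:
        raise ValueError("nothing to compress")
    compressed = gzip.compress(data)
    length = len(compressed)             # recorded data length of compressed data
    return compressed, length, length / len(data)
```

Because the algorithm is lossless, `gzip.decompress(compressed)` returns the original bytes exactly.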
S103, calculating the ratio of the data length of the compressed data to the preset block data length to obtain the block occupancy rate.
The block occupancy rate may reflect the proportion of compressed data in block data, and may also be understood as reflecting the proportion of valid data in block data.
The preset block data length can be determined from actual requirements, empirical values, and the like. In one implementation, it is determined from the hardware performance of the storage medium, such as its read/write performance. For example, if the storage medium reads and writes best at a certain data size (e.g., 4 KB, 8 KB, or 16 KB), that size is chosen as the preset block data length.
S104, when the block occupancy rate meets the compression block occupancy rate condition, subtracting the data length of the compressed data from the preset block data length to obtain the filling data length.
The filling data length is the data length of the content to be filled that will be integrated with the compressed data.
The preset block data length is a data length of block data to be stored.
S105, acquiring content to be filled of the filling data length, and integrating the compressed data with the content to be filled to obtain block data of the preset block data length.
The content to be filled may include invalid blank fills and/or block metadata, among others. The block metadata may include any metadata corresponding to the block data, which is not limited by the embodiments of the present invention. In addition, the embodiment of the invention also does not limit the position of the content to be filled.
In this way, the embodiments of the invention ensure that the block data obtained by integrating the compressed data with the content to be filled has exactly the preset block data length.
When the block occupancy rate meets the compression block occupancy rate condition, the compression can be regarded as successful; the compressed data and the content to be filled are then integrated into block data of the preset block data length, which is the block data finally stored.
S106, storing the block data into a data block of the storage medium.
This step can also be understood as persisting the data to disk.
The storage medium may be a medium that implements persistent storage, such as a hard disk, a network disk, a database, and so forth.
In the embodiment of the invention, the block data can be stored in the form of data files, and different data files can be divided into different data blocks.
In the embodiment of the present invention, data mapping information for indicating the position of the block data may be stored, and the data mapping information may include an offset of the block data.
In the embodiment of the invention, the size of the data before compression may be added at the front of the valid compressed data area, i.e., ahead of the compressed data, and stored as metadata.
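One way to realize this metadata is a fixed-size length field in front of the compressed data. The 4-byte little-endian field, the function names, and the use of `zlib` with zero-byte fill below are assumptions for illustration; the source says only that the pre-compression size is stored at the front of the compressed data as metadata.

```python
import struct
import zlib

SIZE_FIELD = struct.Struct("<I")  # hypothetical 4-byte original-size header


def pack_block(data: bytes, block_len: int) -> bytes:
    """Prefix the original size, compress, and zero-fill to block_len."""
    payload = SIZE_FIELD.pack(len(data)) + zlib.compress(data)
    if len(payload) > block_len:
        raise ValueError("compressed data does not fit in one block")
    return payload + bytes(block_len - len(payload))


def unpack_block(block: bytes) -> bytes:
    """Read the size header, then decompress and verify the payload."""
    (orig_len,) = SIZE_FIELD.unpack_from(block, 0)
    data = zlib.decompressobj().decompress(block[SIZE_FIELD.size:])
    if len(data) != orig_len:
        raise ValueError("size header does not match decompressed data")
    return data
```

Knowing the original size up front lets a reader allocate the output buffer before decompressing and detect a corrupted block.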
In one implementation, the data mapping information may be organized as a skip list and stored in a file system.
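The lookup that such mapping information supports can be sketched with a sorted list and `bisect` standing in for the skip list; both keep keys ordered and can find the greatest key not above a target. Keying entries by the logical (uncompressed) start offset of each block is an assumption made for this example, as is the class name; the source says only that the mapping records the offset of the block data.

```python
import bisect
from typing import List, Tuple


class DataMap:
    """Ordered map: logical start offset -> file offset of the block data."""

    def __init__(self) -> None:
        self._starts: List[int] = []   # sorted logical start offsets
        self._offsets: List[int] = []  # block offsets, parallel to _starts

    def add(self, logical_start: int, block_offset: int) -> None:
        i = bisect.bisect_left(self._starts, logical_start)
        self._starts.insert(i, logical_start)
        self._offsets.insert(i, block_offset)

    def locate(self, logical_offset: int) -> Tuple[int, int]:
        """Return (logical_start, block_offset) of the block covering the offset."""
        i = bisect.bisect_right(self._starts, logical_offset) - 1
        if i < 0:
            raise KeyError(logical_offset)
        return self._starts[i], self._offsets[i]
```

A real skip list gives O(log n) inserts as well as lookups; the list-based stand-in has O(n) inserts but the same lookup semantics.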
According to the embodiment of the invention, block data of the preset block data length is obtained for the compressed data, so that all stored block data has a consistent length. Moreover, the block occupancy rate, i.e., the proportion of the block data taken up by compressed data, is taken into account: the compressed data is integrated with the content to be filled to obtain block data of the preset block data length only when the block occupancy rate meets the compression block occupancy rate condition.
In one implementation, the compression block occupancy condition may include that the block occupancy falls within a preset compression block occupancy range.
As shown in fig. 2, after calculating a ratio of the data length of the compressed data to the preset block data length to obtain the block occupancy rate, the method may include:
S107, judging whether the block occupancy rate falls within the preset compression block occupancy rate range.
If yes, that is, if the block occupancy rate falls within the preset compression block occupancy rate range, it is determined that the block occupancy rate satisfies the compression block occupancy rate condition, and step S104 is executed.
The preset compression block occupancy rate range may include an occupancy rate upper limit value and an occupancy rate lower limit value. The occupancy rate upper limit value is 1. The occupancy rate lower limit value may be determined from the storage space of the storage medium, the cache, the memory, and the like, or from the actual occupancy requirement.
The block occupancy rate falls within the preset compression block occupancy rate range when it is greater than or equal to the occupancy rate lower limit value and less than the occupancy rate upper limit value.
In another implementation, the compression block occupancy condition considers both the occupancy rate and the number of compression attempts. Specifically, the condition also covers the case where the occupancy rate is less than the occupancy lower limit value and the current number of compression attempts is greater than or equal to the maximum number of compression attempts.
Taking the number of compression attempts into account in addition to the occupancy rate prevents the compression cycle from becoming excessively long.
On the basis of the embodiment shown in fig. 1, and in order to make the occupancy rate satisfy the compression block occupancy condition as quickly as possible and thereby improve compression efficiency, in an alternative embodiment of the present invention, as shown in fig. 3, S101 may include:
S1011: acquire the original data.
The original data may include data carried in a write request sent by a user.
S1012: add the original data to the buffer to be compressed.
That is, the original data is copied to the buffer to be compressed.
S1013: when the data length in the buffer to be compressed is greater than or equal to the compression threshold, obtain the recorded compression ratio and calculate the preset length of data to be compressed from it using a compression length prediction algorithm.
The compression threshold may be determined based on actual requirements or empirical values, etc.
When the data length in the buffer to be compressed is less than the compression threshold, the process exits directly and waits for the next write request. The data in the next write request is added to the buffer and the check is repeated; once the data length is greater than or equal to the compression threshold, S1013 and the subsequent steps are executed.
Thus, the preset length of data to be compressed is calculated only once the buffered data reaches the compression threshold; data of that length is then taken from the buffer as the data to be compressed and passed to the subsequent compression steps. This avoids frequently compressing small amounts of data and reduces the consumption of computing resources.
Before the data to be compressed is obtained and compressed for the first time, the preset length of data to be compressed can be initialized. In subsequent compressions, the recorded compression ratio is obtained and the preset length is calculated from it by the compression length prediction algorithm. The recorded compression ratio may be that of the immediately preceding compression. It may also cover a recent batch of compressions, such as the previous N compressions, where N is determined by the requirements of the compression length prediction algorithm (for example, the previous 3, 5, or 11 compressions). In this case, a statistic of those ratios is calculated first, for example their average, or the average of the remaining ratios after the maximum and minimum are removed, and the preset length of data to be compressed is then calculated from this value by the compression length prediction algorithm.
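The "remove the maximum and minimum, then average" option above can be sketched with a bounded history of recent ratios. The class name and the default window of 5 are illustrative assumptions; the fallback to a plain mean when fewer than three ratios are recorded is also an assumption.

```python
from collections import deque

class RatioHistory:
    """Hypothetical record of the compression ratios of the last N compressions."""

    def __init__(self, window: int = 5):
        self.ratios = deque(maxlen=window)   # oldest ratio drops out automatically

    def record(self, compressed_len: int, input_len: int) -> None:
        # Compression ratio = compressed length / length of data to be compressed.
        self.ratios.append(compressed_len / input_len)

    def last(self) -> float:
        return self.ratios[-1]

    def trimmed_mean(self) -> float:
        """Mean after dropping one maximum and one minimum (plain mean if < 3 ratios)."""
        values = sorted(self.ratios)
        if len(values) >= 3:
            values = values[1:-1]
        return sum(values) / len(values)
```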
Compressing the data to be compressed to obtain compressed data counts as one compression. After each compression, the number of compression attempts is increased by 1 and the compression ratio of the current compression is recorded, where the compression ratio is the ratio of the data length of the compressed data to the data length of the data to be compressed.
The number of compression attempts may start at 0 and is increased by 1 for every compression performed.
Based on the compression ratio, the preset length of data to be compressed can be calculated by a formula [formula shown as an image in the original publication; not reproduced],
where $L_{predict}$ is the preset length of data to be compressed, $R_{last}$ is the compression ratio, $U_{min}$ is the occupancy lower limit, and $L_{block}$ is the preset block data length.
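The prediction formula itself is not legible in this text, so the following is only a hypothetical sketch of how $R_{last}$, $U_{min}$ and $L_{block}$ could be combined. It assumes the goal is a compressed size at the midpoint of the accepted range, i.e. $L_{predict} = \frac{1 + U_{min}}{2} \cdot L_{block} / R_{last}$. This is an illustrative stand-in, not the patent's formula.

```python
def predict_length(r_last: float, u_min: float, l_block: int) -> int:
    """HYPOTHETICAL compression length prediction (not the source's formula).

    Aims for a compressed size of ((1 + u_min) / 2) * l_block, assuming the
    next chunk compresses at the recorded ratio r_last.
    """
    if r_last <= 0:
        raise ValueError("compression ratio must be positive")
    target_compressed = (1.0 + u_min) / 2.0 * l_block
    return max(1, int(target_compressed / r_last))
```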
S1014: take data of the preset length from the buffer to be compressed as the data to be compressed.
In this embodiment, original data of varying lengths can be compressed into blocks of a fixed size, with block utilization kept within a certain range. The preset length of data to be compressed is calculated from the compression ratio recorded after the last compression, or from the ratios recorded over several previous compressions, and the calculation also takes the occupancy lower limit and the preset block data length into account. Data of this length is then taken from the buffer to be compressed and compressed, so that the block occupancy rate satisfies the compression block occupancy condition as quickly as possible. In other words, remembering compression ratios and predicting the length of data to be compressed reduces the number of compression attempts and improves overall compression efficiency.
On the basis of the embodiment shown in fig. 1, as shown in fig. 4, after determining whether the occupancy rate falls within the compression block occupancy range, the method may further include:
If the occupancy rate is greater than or equal to the occupancy upper limit value, step S108 is executed: obtain a new preset length of data to be compressed through a compression length reduction algorithm, and return to the step of taking data of the preset length from the buffer to be compressed, i.e., return to step S101.
The compression length reduction algorithm reduces the preset length of data to be compressed, for example by a preset proportion determined from actual requirements or empirical values. Specifically, the new preset length can be obtained by the formula $L_{predict1} = 98.5\% \times L_{predict}$, where $L_{predict1}$ is the new preset length of data to be compressed (after the reduction algorithm is executed) and $L_{predict}$ is the preset length before the update (before the reduction algorithm is executed).
In an implementation manner, if the occupancy rate is equal to the occupancy rate upper limit value, that is, the data length of the obtained compressed data is equal to the preset block data length, at this time, the compressed data may also be directly stored in the data block of the storage medium.
If the occupancy is less than the occupancy lower limit, step S109 is executed: it is determined whether the current number of compression attempts is greater than or equal to the maximum number of compression attempts.
When the occupancy is less than the occupancy lower limit value and the current number of compression attempts is greater than or equal to the maximum number of compression attempts, it is determined that the occupancy satisfies the compression occupancy condition, and step S104 is performed.
When the occupancy rate is less than the occupancy lower limit value and the current number of compression attempts is less than the maximum number of compression attempts, step S110 is executed: obtain a new preset length of data to be compressed through a compression length increasing algorithm, and return to the step of taking data of the preset length from the buffer to be compressed, i.e., return to step S101.
The compression length increasing algorithm increases the preset length of data to be compressed, for example by a preset proportion determined from actual needs or empirical values. Specifically, the new preset length can be obtained by the formula $L_{predict1} = 101.5\% \times L_{predict}$, where $L_{predict1}$ is the new preset length of data to be compressed (after the increasing algorithm is executed) and $L_{predict}$ is the preset length before the update (before the increasing algorithm is executed).
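The 98.5% and 101.5% adjustments can be written with integer arithmetic so the results are exact byte counts. Rounding down, and forcing a change of at least one byte so that very small lengths still move, are assumptions added for this sketch.

```python
def shrink_length(l_predict: int) -> int:
    """Compression length reduction: L_predict1 = 98.5% * L_predict (rounded down).

    Assumption: always shrink by at least 1 byte, never below 1.
    """
    return max(1, min(l_predict - 1, l_predict * 985 // 1000))

def grow_length(l_predict: int) -> int:
    """Compression length increase: L_predict1 = 101.5% * L_predict (rounded down).

    Assumption: always grow by at least 1 byte.
    """
    return max(l_predict + 1, l_predict * 1015 // 1000)
```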
In this embodiment, the length of data to be compressed is adjusted repeatedly: data of that length is taken from the buffer to be compressed and compressed, until the block occupancy rate satisfies the compression block occupancy condition. The compressed data is then integrated with the content to be filled to obtain block data of the preset block data length. This makes full use of each block's space while keeping all compressed data blocks the same size, improving storage space utilization.
FIG. 5 is a flow chart of a data processing method to which embodiments of the present invention are applied; fig. 6 is another flowchart of a data processing method according to an embodiment of the present invention.
Referring to fig. 5 and fig. 6, first initialize the preset block data length $L_{block}$, the compression threshold $\tau$, the occupancy lower limit $U_{min}$, the number of compression attempts $K$, and the maximum number of compression attempts $K_{max}$. For example: $L_{block} = 4096$ KB, $\tau = 1.2 \times L_{predict}$, $U_{min} = 95\%$, $K = 0$, and $K_{max} = 10$.
Obtain the data to be written (the original data) from the write request and copy it to the buffer to be compressed. Check whether the valid data length of the buffer reaches the compression threshold $\tau$; if it does, proceed to the next step, otherwise exit.
Obtain the compression ratio of the most recent compression (or batch of compressions) and compute the expected compression length $L_{predict}$ from it with the compression length prediction algorithm. The expected compression length in fig. 5 and fig. 6 corresponds to the preset length of data to be compressed in the embodiments above.
The expected compression length can then be adjusted repeatedly by the compression length increasing (ascending) or reduction (descending) algorithm according to the compression ratio and the preset block data length (i.e., the block size).
Take data of the expected compression length $L_{predict}$ from the buffer to be compressed, compress it with a compression algorithm, write the compressed data into a compression buffer, increase the number of compression attempts by 1, and record the compression ratio of this compression.
The compression may be performed by a compression engine and the compressed data may be written to a compression buffer.
Obtain the data length $L_{compressed}$ of the compressed data and calculate the ratio of $L_{compressed}$ to the preset block data length $L_{block}$ to obtain the block occupancy rate.
Then check whether the block occupancy rate of the current compressed block falls within the compression block occupancy interval $[U_{min}, 1)$, i.e., within the preset compression block occupancy range. If it does, the compression is recorded as successful and the integration-and-storage step is executed: the valid compressed data and invalid blank padding are integrated, and a block metadata region is added (the metadata content and its position within the block are not limited), forming a new compressed data block of length $L_{block}$. This block data is appended to the data file, and mapping information from the data entry to the compressed block, i.e., the data mapping information, is generated.
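The integration step can be sketched as follows: metadata, compressed data, and zero padding are concatenated into exactly $L_{block}$ bytes, with the padding length computed as the block length minus the bytes already used. The metadata placement (a header at the front, holding the original size and compressed length) and zero bytes as the fill content are assumptions; the source leaves both open. Here the header is counted as part of the compressed data when computing the padding.

```python
import struct

# Hypothetical block metadata region: <original size, compressed length>
BLOCK_HEADER = struct.Struct("<II")

def build_block(compressed: bytes, original_size: int, block_len: int) -> bytes:
    """Integrate metadata + compressed data + zero padding into block_len bytes."""
    used = BLOCK_HEADER.size + len(compressed)
    if used > block_len:
        raise ValueError("compressed data does not fit in the block")
    padding_len = block_len - used            # filling data length
    header = BLOCK_HEADER.pack(original_size, len(compressed))
    return header + compressed + bytes(padding_len)

def parse_block(block: bytes) -> tuple:
    """Recover (original_size, compressed_data) from a padded block."""
    original_size, comp_len = BLOCK_HEADER.unpack_from(block, 0)
    start = BLOCK_HEADER.size
    return original_size, block[start:start + comp_len]
```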
If the block occupancy rate is greater than 1, obtain a new expected compression length $L_{predict}$ through the expected compression length reduction algorithm, return to the step of taking data of length $L_{predict}$ from the buffer to be compressed, and continue the compression process with the new expected compression length.
If the block occupancy rate is equal to 1, either obtain a new expected compression length $L_{predict}$ through the expected compression length reduction algorithm and continue the compression process as above, or store the compressed data directly in the storage medium.
If the block occupancy rate is less than $U_{min}$ and the number of compression attempts is greater than or equal to the maximum number of compression attempts $K_{max}$, the compression is recorded as successful and the integration-and-storage step is executed.
If the block occupancy rate is less than $U_{min}$ and the number of compression attempts is less than $K_{max}$, obtain a new expected compression length $L_{predict}$ through the expected compression length increasing (ascending) algorithm, return to the step of taking data of length $L_{predict}$ from the buffer to be compressed, and continue the compression process with the new expected compression length.
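The loop in fig. 5 and fig. 6 can be sketched end to end. Several pieces are assumptions: `zlib` stands in for the compression engine (the source mentions LZ4 only as an example), the function name is hypothetical, the starting `l_predict` is supplied by the caller, and the sketch also stops growing once the whole buffer has been tried, which the source does not state. The occupancy-equals-1 case is treated like the greater-than-1 case.

```python
import zlib

def compress_into_block(buffer: bytes, l_predict: int, l_block: int,
                        u_min: float = 0.95, k_max: int = 10):
    """Trial-compress prefixes of `buffer` until the occupancy condition holds.

    Returns (compressed, consumed_len, attempts).
    """
    attempts = 0
    while True:
        chunk = buffer[:l_predict]
        compressed = zlib.compress(chunk)
        attempts += 1                              # one more compression attempt
        occupancy = len(compressed) / l_block
        if u_min <= occupancy < 1.0:
            return compressed, len(chunk), attempts
        if occupancy >= 1.0:
            if l_predict <= 1:
                raise ValueError("block too small to hold any compressed data")
            # Too large: reduce the expected length by 1.5 % (at least 1 byte).
            l_predict = max(1, min(l_predict - 1, l_predict * 985 // 1000))
        elif attempts >= k_max or len(chunk) == len(buffer):
            # Below U_min but out of attempts (or out of input): accept as is.
            return compressed, len(chunk), attempts
        else:
            # Too small: increase the expected length by 1.5 % (at least 1 byte).
            l_predict = max(l_predict + 1, l_predict * 1015 // 1000)
```

With highly compressible input, the loop typically runs until $K_{max}$ attempts and accepts a low-occupancy block. This is the case the attempt limit exists for.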
If the block occupancy rate satisfies the compression block occupancy condition, data integration is performed and the data is destaged, i.e., the obtained block data is stored in a storage medium that provides persistent storage, such as an external storage medium (e.g., a hard disk).
In this embodiment, by repeatedly trying original data of different lengths, all resulting block data (and thus the stored compressed data) have the same length, and the storage space of each block is used as fully as possible.
On the basis of the embodiment shown in fig. 1, in an alternative embodiment of the present invention, after storing block data in a data block of a storage medium, as shown in fig. 7, the method may further include:
S701: when a data access request is received, obtain the data mapping information.
The data mapping information is used to indicate the location of block data.
A data access request may also be understood as a read request.
S702, based on the data mapping information, searching block data from the cache.
If the cache has block data, executing S703; if no block data exists in the cache, S704 is performed.
In the embodiment of the present invention, after the block data is stored in the data block of the storage medium, a caching process may be performed to store the data block of the storage medium in the cache. Therefore, when a data access request is received, data can be searched from the cache, if the data to be searched does not exist in the cache, the data is acquired from the storage medium, and the acquired data is stored in the cache, so that the data can be directly searched from the cache when the data is accessed next time.
In the embodiment of the present invention, compressed data, that is, block data obtained in the embodiment shown in fig. 1, is stored in the cache, and the compressed data is directly cached, so that the utilization rate of the cache space can be further improved.
In addition, because all block data in this embodiment have the same size (i.e., the data lengths of the blocks are identical), block data can be stored in a contiguous address space without fragmentation and without garbage collection, which greatly reduces wasted cache resources and maintenance cost. Compression and caching with a uniform block size also reduce input/output (IO) overhead and greatly improve the data throughput of disks and memory devices.
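Because every block has the same length, a block's position in a data file follows directly from its number, which is why a contiguous address space suffices. A minimal sketch, assuming 0-based block numbers within a single file (the figures number blocks from 001, so a real layout may differ):

```python
def block_offset(block_number: int, block_len: int) -> int:
    """Byte offset of a fixed-size block inside a data file (0-based numbering)."""
    return block_number * block_len

def read_block(f, block_number: int, block_len: int) -> bytes:
    """Read one block from a binary file object by seeking to its offset."""
    f.seek(block_offset(block_number, block_len))
    return f.read(block_len)
```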
S703: parse the compressed data from the block data and decompress it to obtain the decompressed data.
The decompression method corresponds to the compression method used in S102. For example, if LZ4 is used for compression in S102, decompression is performed with the corresponding LZ4 decompression algorithm.
After the decompressed data is obtained, the decompressed data may be returned, for example, the decompressed data is returned to the client sending the data access request, and so on.
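A sketch of S703 under the same assumed block layout as before (a header holding the original size and compressed length, followed by compressed data and padding). `zlib` is used as a stand-in because it is in the Python standard library; with LZ4 the decode call would change accordingly. Checking the result against the stored pre-compression size is an optional safeguard assumed here.

```python
import struct
import zlib

HEADER = struct.Struct("<II")  # assumed layout: (original size, compressed length)

def decompress_block(block: bytes) -> bytes:
    """S703 sketch: parse compressed data out of a padded block and decompress it."""
    original_size, comp_len = HEADER.unpack_from(block, 0)
    payload = block[HEADER.size:HEADER.size + comp_len]
    data = zlib.decompress(payload)            # stand-in for the LZ4 decoder
    if len(data) != original_size:
        raise ValueError("size metadata does not match decompressed data")
    return data
```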
S704: acquire the block data from the storage medium according to the data mapping information, and store it in the cache.
In one implementation, the data mapping information may include a compressed block number of a data block in which the block data is located.
The corresponding block data can be located in the data file according to the compressed block number, read, and added to the cache using a buffer replacement algorithm, for example one based on the heat information of the block data. One block already in the cache can be selected at random and its heat information checked: if it is greater than a heat threshold, no replacement is performed; otherwise, the selected block is replaced with the newly read block. Because all block data have the same length, caching blocks with the buffer replacement algorithm does not trigger garbage collection.
Caches are widely used in storage systems such as databases: data from low-speed storage devices (e.g., hard disks) is kept in high-speed storage devices (e.g., caches), which reduces accesses to the low-speed devices and lowers access latency.
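The heat-based replacement described above can be sketched as a fixed-capacity cache. Using an access count as the "heat information" is an assumption, as are the class name and the rule that a newly cached block starts with heat 1. Because all blocks are the same size, a replaced entry's slot is reused exactly.

```python
import random

class HeatCache:
    """Fixed-capacity cache of equal-sized blocks with heat-based random replacement."""

    def __init__(self, capacity: int, heat_threshold: int, rng=None):
        self.capacity = capacity
        self.heat_threshold = heat_threshold
        self.blocks = {}   # block number -> block bytes
        self.heat = {}     # block number -> access count (assumed heat measure)
        self.rng = rng or random.Random()

    def get(self, block_no):
        if block_no in self.blocks:
            self.heat[block_no] += 1
            return self.blocks[block_no]
        return None

    def put(self, block_no, block: bytes) -> bool:
        """Try to cache a block read from storage; return True if it was cached."""
        if block_no in self.blocks:
            return True
        if len(self.blocks) < self.capacity:
            self.blocks[block_no] = block
            self.heat[block_no] = 1
            return True
        victim = self.rng.choice(list(self.blocks))   # randomly selected resident block
        if self.heat[victim] > self.heat_threshold:
            return False                               # hot block: skip replacement
        del self.blocks[victim], self.heat[victim]
        self.blocks[block_no] = block
        self.heat[block_no] = 1
        return True
```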
Fig. 8 is another flowchart of a data processing method according to an embodiment of the present invention. Referring to fig. 8, a read-write request layer, a compression decompression layer, a compression block cache layer, and a file system layer may be included.
In the embodiment of the present invention, a write request may be acquired, which may also be understood as a request for storing data, then, a data compression process is implemented through S101 to S104, and S105 is executed after compression, and specifically, a plurality of block data with the same data length may be stored in the file system layer. In particular, the file system layer may include a plurality of data files, such as data file 01, data file 02, … …, data file N, and so forth. Each data file comprises a plurality of data blocks for storing a plurality of Block data, such as Block 01-001, Block 01-002, Block 01-003, … … and Block 01-064 in the data file 01; the data file 02 stores Block 02-001, Block 02-002, Block 02-003, … … and Block 02-064; block N-001, Block N-002, Block N-003, … … and Block N-064 are stored in the data file N.
The block data stored in the file system layer may be cached to the compressed block cache layer through a caching process, which may also be referred to as a cache region. The space of the cache is limited, and part of the Block data stored in the file system layer is cached to a compressed Block cache layer, for example, the compressed Block cache layer caches Block 01-001, Block 02-003, … …, Block 05-008, and the like.
When a read request is obtained, block data can be searched from a cache, if the block data exists in the cache, compressed data is analyzed from the block data, and the compressed data is decompressed to obtain decompressed data; and if the block data does not exist in the cache, acquiring the block data from the storage medium according to the data mapping information, and storing the block data into the cache.
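The read path can be sketched as a single function. A plain dictionary stands in for the data mapping information (the source suggests a skip list) and another for the cache. `read_from_storage` and `decode` are hypothetical callables representing the file-system read and the S703 parse-and-decompress step.

```python
from typing import Callable, Dict, Optional

def read_entry(key: str,
               mapping: Dict[str, int],
               cache: Dict[int, bytes],
               read_from_storage: Callable[[int], bytes],
               decode: Callable[[bytes], bytes]) -> Optional[bytes]:
    """S701-S704 sketch: locate a block via the mapping, preferring the cache."""
    block_no = mapping.get(key)                  # S701: data mapping information
    if block_no is None:
        return None
    block = cache.get(block_no)                  # S702: search the cache
    if block is None:
        block = read_from_storage(block_no)      # S704: read from the storage medium
        cache[block_no] = block                  # ...and store it in the cache
    return decode(block)                         # S703: parse and decompress
```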
Corresponding to the data processing method provided in the foregoing embodiment, an embodiment of the present invention further provides a data processing apparatus, as shown in fig. 9, which may include:
a first obtaining module 901, configured to obtain to-be-compressed data with a preset to-be-compressed data length;
a compression module 902, configured to compress data to be compressed to obtain compressed data;
a first recording module 903, configured to record a data length of the compressed data;
a calculating module 904, configured to calculate a ratio of the data length of the compressed data to a preset block data length, so as to obtain a block occupancy rate;
an obtaining module 905, configured to subtract the data length of the compressed data from the preset block data length to obtain a padding data length when the block occupancy rate satisfies the compressed block occupancy rate condition;
a second obtaining module 906, configured to obtain content to be filled with a filling data length;
an integration module 907 for integrating the compressed data with the content to be filled to obtain block data with a preset block data length;
a storage module 908 for storing the block data in a data block of the storage medium.
Optionally, the first obtaining module 901 is specifically configured to obtain original data; adding original data to a buffer area to be compressed; when the data length in the buffer area to be compressed is larger than or equal to the compression threshold, acquiring the recorded compression ratio, and calculating to obtain the length of the preset data to be compressed according to the recorded compression ratio through a compression length prediction algorithm; and acquiring data with preset length of the data to be compressed from the data in the buffer area to be compressed to obtain the data to be compressed.
Optionally, as shown in fig. 10, the apparatus further includes:
the judging module 1001 is configured to, after calculating a ratio of a data length of the compressed data to a preset block data length to obtain a block occupancy rate, judge whether the block occupancy rate falls within a preset compression block occupancy rate range; and if so, determining that the block occupancy rate meets the condition of the compression block occupancy rate.
Optionally, the occupancy rate range of the compression block includes an occupancy rate upper limit value and an occupancy rate lower limit value;
the judging module 1001 is specifically configured to, if the occupancy rate is greater than or equal to the occupancy rate upper limit value, obtain a new preset data length to be compressed through a compression length reduction algorithm, and return the new preset data length to the obtaining module; if the occupancy rate is smaller than the occupancy rate lower limit value, judging whether the current compression attempt times are larger than or equal to the maximum compression attempt times; when the occupancy rate is less than the occupancy rate lower limit value and the current compression attempt times are more than or equal to the maximum compression attempt times, determining that the block occupancy rate meets the compression block occupancy rate condition; and when the occupancy rate is smaller than the occupancy rate lower limit value and the current compression attempt times are smaller than the maximum compression attempt times, obtaining a new preset data length to be compressed through a compression length increasing algorithm, and returning to the acquisition module.
Optionally, as shown in fig. 11, the apparatus further includes:
an increasing module 1101, configured to increase the number of compression attempts by 1 after compressing the data to be compressed to obtain compressed data;
the second recording module 1102 is configured to record a compression ratio corresponding to the current compression, where the compression ratio is a ratio of a data length of the compressed data to a data length of the data to be compressed.
Optionally, the calculating module 904 is specifically configured to calculate the preset length of data to be compressed from the compression ratio by a formula [formula shown as images in the original publication; not reproduced],
where $L_{predict}$ is the preset length of data to be compressed, $R_{last}$ is the compression ratio, $U_{min}$ is the occupancy lower limit, and $L_{block}$ is the preset block data length.
Optionally, the judging module 1001 is specifically configured to obtain a new preset length of data to be compressed by the formula $L_{predict1} = 98.5\% \times L_{predict}$ (compression length reduction), and to obtain a new preset length of data to be compressed by the formula $L_{predict1} = 101.5\% \times L_{predict}$ (compression length increase);
where $L_{predict1}$ is the new preset length of data to be compressed and $L_{predict}$ is the preset length of data to be compressed before the update.
Optionally, as shown in fig. 12, the apparatus further includes:
a third obtaining module 1201, configured to, after storing the block data into a data block of the storage medium, obtain data mapping information when receiving a data access request, the data mapping information being used to indicate a location of the block data;
a searching module 1202, configured to search block data from the cache based on the data mapping information;
a decompression module 1203, configured to parse compressed data from the block data and decompress the compressed data to obtain decompressed data if the block data exists in the cache;
and a cache module 1204, configured to, if no block data exists in the cache, obtain the block data from the storage medium according to the data mapping information, and store the block data in the cache.
The data processing device provided by the embodiment of the invention is a device applying the data processing method, so that all the embodiments of the data processing method are suitable for the device and can achieve the same or similar beneficial effects.
An embodiment of the present invention further provides a data processing apparatus, as shown in fig. 13, including a processor 1301, a communication interface 1302, a memory 1303, and a communication bus 1304, where the processor 1301, the communication interface 1302, and the memory 1303 complete mutual communication through the communication bus 1304.
A memory 1303 for storing a computer program;
the processor 1301 is configured to implement the method steps of the data processing method when executing the program stored in the memory 1303.
The communication bus mentioned in the above data processing apparatus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the data processing device and other devices.
The memory may include random access memory (RAM) or non-volatile memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In a further embodiment provided by the present invention, a computer-readable storage medium is further provided, in which a computer program is stored, which, when being executed by a processor, implements the method steps of the data processing method in the above-mentioned embodiments.
In a further embodiment provided by the present invention, there is also provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method steps of the data processing method in the above-described embodiment.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The embodiments in this specification are described in an interrelated manner: identical or similar parts of the embodiments may be cross-referenced, and each embodiment focuses on its differences from the others. In particular, the apparatus, device, computer-readable storage medium, and computer program product embodiments are substantially similar to the method embodiments and are therefore described only briefly; for relevant details, refer to the corresponding description of the method embodiments.
The above descriptions are merely preferred embodiments of the present invention and are not intended to limit its scope. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within its protection scope.

Claims (11)

1. A data processing method, comprising:
acquiring data to be compressed with a preset data length to be compressed;
compressing the data to be compressed to obtain compressed data, and recording the data length of the compressed data;
calculating the ratio of the data length of the compressed data to a preset block data length to obtain a block occupancy rate;
when the block occupancy rate meets a compression block occupancy rate condition, subtracting the data length of the compressed data from the preset block data length to obtain a filling data length;
acquiring content to be filled having the filling data length, and combining the compressed data with the content to be filled to obtain block data having the preset block data length;
and storing the block data into a data block of a storage medium.
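The steps of claim 1 can be illustrated with a short Python sketch. It assumes zlib as the compressor (the claims do not name one), a 4 KiB preset block data length, zero bytes as the content to be filled, and a simplified occupancy condition (the compressed data must fit in one block and reach an optional lower bound); `pack_block` and its parameters are hypothetical.

```python
import zlib

def pack_block(data: bytes, block_len: int = 4096, u_min: float = 0.0):
    """Hypothetical sketch of claim 1: compress, measure occupancy, pad to a fixed block.

    Returns (block, occupancy); block is None when the occupancy condition fails.
    """
    compressed = zlib.compress(data)            # zlib stands in for the unspecified codec
    occupancy = len(compressed) / block_len     # block occupancy rate
    # Assumed condition: the compressed data must fit in one block and reach u_min.
    if not (u_min <= occupancy <= 1.0):
        return None, occupancy
    fill_len = block_len - len(compressed)      # filling data length
    block = compressed + bytes(fill_len)        # zero bytes as the content to be filled (assumption)
    return block, occupancy
```

Zero padding is workable here because a zlib stream marks its own end, so a reader can discard the trailing fill; with a codec that lacks an end marker, the compressed length would have to be stored alongside the block.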
2. The method according to claim 1, wherein acquiring the data to be compressed with the preset data length to be compressed comprises:
acquiring original data;
adding the original data to a buffer area to be compressed;
when the length of the data in the buffer area to be compressed is greater than or equal to a compression threshold, acquiring a recorded compression ratio, and calculating the preset data length to be compressed from the recorded compression ratio through a compression length prediction algorithm;
and acquiring data of the preset data length to be compressed from the data in the buffer area to be compressed to obtain the data to be compressed.
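A minimal sketch of the buffering in claim 2, assuming an in-memory byte buffer. `CompressBuffer` and its methods are hypothetical, and the prediction formula of claim 6 is not reproduced here; the predictor is passed in as a callable so that any formula can be plugged in.

```python
class CompressBuffer:
    """Hypothetical buffer area to be compressed (claim 2); all names are illustrative."""

    def __init__(self, threshold: int, predict_len):
        self.threshold = threshold      # compression threshold in bytes (assumed value)
        self.predict_len = predict_len  # stands in for the compression length prediction algorithm
        self.last_ratio = 1.0           # recorded compression ratio; 1.0 before any compression (assumption)
        self.buf = bytearray()

    def add(self, raw: bytes) -> None:
        """Append original data to the buffer."""
        self.buf += raw

    def predicted_length(self):
        """Return the preset data length to be compressed, or None below the threshold."""
        if len(self.buf) < self.threshold:
            return None
        return max(1, min(self.predict_len(self.last_ratio), len(self.buf)))

    def peek(self, length: int) -> bytes:
        """Copy `length` bytes without consuming them, so a retry can re-read them."""
        return bytes(self.buf[:length])

    def consume(self, length: int) -> None:
        """Drop bytes once their block has been stored."""
        del self.buf[:length]
```

Keeping `peek` separate from `consume` lets the retry step of claim 4 read the same prefix again with a new length; data leaves the buffer only after its block has been stored.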
3. The method of claim 2, wherein after calculating the ratio of the data length of the compressed data to the preset block data length to obtain the block occupancy rate, the method further comprises:
determining whether the block occupancy rate falls within a preset compression block occupancy rate range;
and if so, determining that the block occupancy rate meets the compression block occupancy rate condition.
4. The method of claim 3, wherein the compression block occupancy rate range includes an occupancy upper limit and an occupancy lower limit;
after determining whether the block occupancy rate falls within the compression block occupancy rate range, the method further comprises:
if the block occupancy rate is greater than or equal to the occupancy upper limit, obtaining a new preset data length to be compressed through a compression length reduction algorithm, and returning to the step of acquiring data of the preset data length to be compressed from the data in the buffer area to be compressed;
if the block occupancy rate is less than the occupancy lower limit, determining whether the current number of compression attempts is greater than or equal to the maximum number of compression attempts;
when the block occupancy rate is less than the occupancy lower limit and the current number of compression attempts is greater than or equal to the maximum number of compression attempts, determining that the block occupancy rate meets the compression block occupancy rate condition;
and when the block occupancy rate is less than the occupancy lower limit and the current number of compression attempts is less than the maximum number of compression attempts, obtaining a new preset data length to be compressed through a compression length increasing algorithm, and returning to the step of acquiring data of the preset data length to be compressed from the data in the buffer area to be compressed.
5. The method according to claim 4, wherein after compressing the data to be compressed to obtain the compressed data, the method further comprises:
increasing the number of compression attempts by 1, and recording the compression ratio of the current compression, the compression ratio being the ratio of the data length of the compressed data to the data length of the data to be compressed.
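Claims 3 to 5, together with the adjustment factors of claim 7, describe a bounded search for a chunk length whose compressed size fits the occupancy range. The sketch below assumes zlib, rounds the 98.5% and 101.5% updates down and up respectively so that every retry changes the length, and adds two guards that the claims do not state: it stops growing once the whole buffer is used, and it raises an error if even one byte cannot fit. `compress_to_block` is a hypothetical name.

```python
import math
import zlib

def compress_to_block(buffer: bytes, l_predict: int, l_block: int,
                      u_min: float, u_max: float, max_attempts: int):
    """Hypothetical sketch of claims 3 to 5 (with the claim 7 factors).

    Returns (chunk_len, compressed, attempts, last_ratio).
    """
    if not buffer:
        raise ValueError("buffer is empty")
    attempts = 0
    while True:
        length = max(1, min(l_predict, len(buffer)))
        compressed = zlib.compress(buffer[:length])
        attempts += 1                               # claim 5: count this attempt
        last_ratio = len(compressed) / length       # claim 5: record the compression ratio
        occupancy = len(compressed) / l_block       # block occupancy rate
        if occupancy >= u_max:
            # Too full: shrink to 98.5% (claim 7); floor guarantees the length drops.
            if length == 1:
                raise ValueError("block too small for even one byte")
            l_predict = math.floor(length * 0.985)
        elif occupancy < u_min and attempts < max_attempts and length < len(buffer):
            # Too empty: grow to 101.5% (claim 7); ceil guarantees the length rises.
            l_predict = math.ceil(length * 1.015)
        else:
            # In range, attempts exhausted, or (assumption) no more buffered data.
            return length, compressed, attempts, last_ratio
```

The returned `last_ratio` is the kind of value claim 2 would record for the next length prediction.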
6. The method according to claim 2, wherein calculating the preset data length to be compressed from the compression ratio through the compression length prediction algorithm comprises:
calculating the preset data length to be compressed from the compression ratio by the formula [Figure FDA0003228319720000021];
wherein $L_{predict}$ is the preset data length to be compressed, $R_{last}$ is the compression ratio, $U_{min}$ is the occupancy lower limit, and $L_{block}$ is the preset block data length.
7. The method according to claim 4, wherein obtaining the new preset data length to be compressed through the compression length reduction algorithm comprises:
obtaining the new preset data length to be compressed by the formula $L_{predict1} = 98.5\% \times L_{predict}$;
and obtaining the new preset data length to be compressed through the compression length increasing algorithm comprises:
obtaining the new preset data length to be compressed by the formula $L_{predict1} = 101.5\% \times L_{predict}$;
wherein $L_{predict1}$ is the new preset data length to be compressed, and $L_{predict}$ is the preset data length to be compressed before the update.
8. The method of claim 1, wherein after storing the block data into the data block of the storage medium, the method further comprises:
when a data access request is received, acquiring data mapping information, the data mapping information indicating the location of the block data;
searching a cache for the block data based on the data mapping information;
if the block data exists in the cache, parsing the compressed data from the block data, and decompressing the compressed data to obtain decompressed data;
and if the block data does not exist in the cache, acquiring the block data from the storage medium according to the data mapping information, and storing the block data into the cache.
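The read path of claim 8 can be sketched as follows, assuming the storage medium is modelled as a byte string, the data mapping information is an (offset, length) pair, and blocks were written by zlib with zero padding. `BlockReader` and `BlockLocation` are hypothetical names. The claim does not state what happens after a cache miss; in this sketch the freshly cached block is then parsed in the same way as a hit.

```python
import zlib
from dataclasses import dataclass

@dataclass(frozen=True)
class BlockLocation:
    """Hypothetical data mapping information: where a block lives on the medium."""
    offset: int
    length: int

class BlockReader:
    def __init__(self, medium: bytes, mapping: dict):
        self.medium = medium    # stands in for the storage medium (assumption)
        self.mapping = mapping  # request key -> BlockLocation
        self.cache = {}         # offset -> cached block bytes

    def read(self, key) -> bytes:
        loc = self.mapping[key]                  # acquire data mapping information
        block = self.cache.get(loc.offset)       # search the cache for the block
        if block is None:                        # miss: fetch from the medium and cache it
            block = self.medium[loc.offset:loc.offset + loc.length]
            self.cache[loc.offset] = block
        # Parse and decompress: the zlib stream ends before the padding,
        # which the decompressor leaves in its unused_data attribute.
        return zlib.decompressobj().decompress(block)
```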
9. A data processing apparatus, comprising:
a first acquisition module, configured to acquire data to be compressed with a preset data length to be compressed;
a compression module, configured to compress the data to be compressed to obtain compressed data;
a first recording module, configured to record the data length of the compressed data;
a calculation module, configured to calculate the ratio of the data length of the compressed data to a preset block data length to obtain a block occupancy rate;
an obtaining module, configured to, when the block occupancy rate meets a compression block occupancy rate condition, subtract the data length of the compressed data from the preset block data length to obtain a filling data length;
a second acquisition module, configured to acquire content to be filled having the filling data length;
an integration module, configured to combine the compressed data and the content to be filled to obtain block data having the preset block data length;
and a storage module, configured to store the block data into a data block of a storage medium.
10. A data processing device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement the method steps of any one of claims 1 to 8 when executing the program stored in the memory.
11. A computer-readable storage medium, wherein a computer program is stored in the computer-readable storage medium, and the computer program, when executed by a processor, implements the method steps of any one of claims 1 to 8.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110979093.7A CN113726341B (en) 2021-08-25 2021-08-25 Data processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110979093.7A CN113726341B (en) 2021-08-25 2021-08-25 Data processing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113726341A true CN113726341A (en) 2021-11-30
CN113726341B CN113726341B (en) 2023-09-01

Family

ID=78677684

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110979093.7A Active CN113726341B (en) 2021-08-25 2021-08-25 Data processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113726341B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1247669A (en) * 1996-12-18 2000-03-15 汤姆森消费电子有限公司 Efficient fixed-length block compression and decompression
JP2003219188A (en) * 2002-01-25 2003-07-31 Ricoh Co Ltd Image compression processor
US20130321182A1 (en) * 2012-05-31 2013-12-05 International Business Machines Corporation Compressing and decompressing signal data
CN105103137A (en) * 2013-03-15 2015-11-25 西部数据技术公司 Compression and formatting of data for data storage systems
CN106648469A (en) * 2016-12-29 2017-05-10 华为技术有限公司 Method and device for processing cache data and storage controller
CN109062502A (en) * 2018-07-10 2018-12-21 郑州云海信息技术有限公司 A kind of data compression method, device, equipment and computer readable storage medium
CN109683825A (en) * 2018-12-24 2019-04-26 广东浪潮大数据研究有限公司 A kind of storage system online data compression method, device and equipment
CN109690681A (en) * 2016-06-24 2019-04-26 华为技术有限公司 Handle method, storage device, solid state hard disk and the storage system of data
CN109981108A (en) * 2017-12-27 2019-07-05 杭州海康威视数字技术股份有限公司 Data compression method, decompression method, device and equipment
CN110557124A (en) * 2018-05-30 2019-12-10 华为技术有限公司 Data compression method and device
CN110784225A (en) * 2018-07-31 2020-02-11 华为技术有限公司 Data compression method, data decompression method, related device, electronic equipment and system
CN113301123A (en) * 2021-04-30 2021-08-24 阿里巴巴新加坡控股有限公司 Data stream processing method, device and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114510362A (en) * 2022-02-18 2022-05-17 歌尔股份有限公司 Data caching processing method, device, equipment and storage medium
CN117220686A (en) * 2023-09-18 2023-12-12 青岛展诚科技有限公司 Parasitic parameter compression and extraction system and method
CN117220686B (en) * 2023-09-18 2024-02-23 青岛展诚科技有限公司 Parasitic parameter compression and extraction system and method

Also Published As

Publication number Publication date
CN113726341B (en) 2023-09-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant