CN112306974A - Data processing method, device, equipment and storage medium - Google Patents
- Publication number: CN112306974A
- Application number: CN201910697041.3A
- Authority
- CN
- China
- Prior art keywords
- data
- target
- compression
- log file
- data block
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/174—Redundancy elimination performed by the file system
- G06F16/1744—Redundancy elimination performed by the file system using compression, e.g. sparse files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/174—Redundancy elimination performed by the file system
- G06F16/1748—De-duplication implemented within the file system, e.g. based on file segments
- G06F16/1752—De-duplication implemented within the file system, e.g. based on file segments based on file chunks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/1805—Append-only file systems, e.g. using logs or journals to store data
- G06F16/1815—Journaling file systems
Abstract
The invention discloses a data processing method, which comprises the following steps: when a trigger condition for writing target data is reached, determining a current service scenario; determining a target compression algorithm for the target data according to the service scenario; compressing the target data by using the target compression algorithm to obtain data to be written corresponding to the target data; and writing the data to be written into a target log file. By applying the technical scheme provided by the embodiment of the invention, different compression algorithms are selected for different service scenarios to compress the data to be written: for some data the compression rate can be sacrificed to ensure compression performance, while for other data compression performance can be sacrificed to ensure the compression rate, thereby achieving a balance between compression rate and compression performance. The invention also discloses a data processing apparatus, device and storage medium with corresponding technical effects.
Description
Technical Field
The present invention relates to the field of computer application technologies, and in particular, to a data processing method, apparatus, device, and storage medium.
Background
With the rapid development of computer and internet technology, business data volumes across industries keep growing, and so does the demand for data-center storage capacity. Meanwhile, real-time interactive applications such as cloud computing, online payment and mobile social networking place ever higher requirements on the access performance and latency of data-center storage systems. Accordingly, data centers are gradually shifting from storage systems built on conventional mechanical hard disks to full flash memory schemes. A full flash memory system can provide higher IOPS (Input/Output Operations Per Second), lower IO (Input/Output) access latency and higher access throughput, and is often applied in scenarios such as financial securities, online trading, e-commerce and online ticketing.
However, full flash memory systems are also expensive, comparatively small in capacity and limited in write/erase cycles. Therefore, in a full flash memory system, data is usually compressed before being stored, so as to save storage space and reduce the effective cost per GB. Existing schemes, however, pursue a high compression ratio across the board, which degrades compression performance for some data.
Disclosure of Invention
The invention aims to provide a data processing method, apparatus, device and storage medium that achieve a balance between compression rate and compression performance.
In order to solve the technical problems, the invention provides the following technical scheme:
a method of data processing, comprising:
when a trigger condition for writing target data is reached, determining a current service scene;
determining a target compression algorithm aiming at the target data according to the service scene;
compressing the target data by using the target compression algorithm to obtain data to be written corresponding to the target data;
and writing the data to be written into a target log file.
In a specific embodiment of the present invention, the compressing the target data by using the target compression algorithm to obtain data to be written corresponding to the target data includes:
compressing the target data by using the target compression algorithm to obtain a compression processing result;
and determining the data to be written corresponding to the target data according to the compression processing result.
In a specific embodiment of the present invention, the determining, according to the compression processing result, data to be written corresponding to the target data includes:
determining a compression rate for compressing the target data according to the compression processing result;
if the compression rate is higher than a preset first compression rate threshold, determining compressed data obtained after the target data is compressed as the data to be written;
otherwise, determining the target data as the data to be written.
In one embodiment of the present invention, the method further comprises:
when a data block to be read is to be read, querying a physical address of the data block to be read;
reading the data block to be read from the corresponding log file at the queried physical address, wherein the header of each data block in each log file contains at least a compression parameter field;
and decompressing the data block to be read according to the compression parameter field contained in its header, to obtain the original data corresponding to the data block to be read.
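As a minimal sketch of the read path above, assuming a hypothetical one-byte header flag in place of the patent's unspecified compression parameter field (0 means the block is stored uncompressed, 1 means zlib-compressed):

```python
import zlib

def make_block(data: bytes, compress: bool) -> bytes:
    """Prepend the hypothetical flag byte when a block is written into a log file."""
    return bytes([1]) + zlib.compress(data) if compress else bytes([0]) + data

def read_block(log: bytes, offset: int, length: int) -> bytes:
    """Read one block at its physical (offset, length) and decompress per its header flag."""
    block = log[offset:offset + length]
    flag, payload = block[0], block[1:]
    return zlib.decompress(payload) if flag == 1 else payload
```

Because the flag travels with each block, the reader needs no external knowledge of whether a given block was compressed, which is the point of the compression parameter field.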
In a specific embodiment of the present invention, the compressing the target data by using the target compression algorithm, where the target data is initially written data, includes:
determining a first partition granularity for partitioning the target data according to the data volume of the target data;
dividing the target data into a plurality of first data blocks using the first division granularity;
and respectively compressing each first data block by using the target compression algorithm.
In an embodiment of the present invention, after the compressing each first data block by using the target compression algorithm, the method further includes:
if the compression rate of compressing each first data block is not higher than a preset second compression rate threshold, dividing the target data into a plurality of second data blocks using a second division granularity, wherein the second division granularity is smaller than the first division granularity;
and compressing each second data block by using the target compression algorithm.
In a specific embodiment of the present invention, before determining a current business scenario when the trigger condition for writing target data is reached, the method further includes:
when a garbage collection triggering condition is met, determining a log file to be collected;
respectively determining whether each effective data block in the log file to be recovered is cold data;
and taking the valid data block determined as cold data as the target data.
In a specific embodiment of the present invention, after the log file to be recovered is determined when the garbage collection triggering condition is reached, the method further includes:
and writing each effective data block determined as hot data in the log file to be recovered into a new log file.
In a specific embodiment of the present invention, the determining, when the garbage collection triggering condition is reached, a log file to be collected includes:
and when the garbage collection triggering condition is reached, determining the log file of which the ratio of the data quantity of the invalid data block to the data quantity of the valid data block is greater than a set ratio threshold value as the log file to be collected.
In one embodiment of the present invention, the method further comprises:
when a third data block in the target log file is updated, writing the updated third data block into a new log file;
marking the third data block in the target log file as an invalid data block.
A data processing apparatus comprising:
the service scene determining module is used for determining the current service scene when the trigger condition for writing target data is reached;
a compression algorithm determining module, configured to determine a target compression algorithm for the target data according to the service scenario;
the data obtaining module is used for compressing the target data by using the target compression algorithm to obtain data to be written corresponding to the target data;
and the data writing module is used for writing the data to be written into a target log file.
A data processing apparatus comprising:
a memory for storing a computer program;
a processor for implementing the steps of any of the above data processing methods when executing the computer program.
A computer-readable storage medium, having stored thereon a computer program which, when being executed by a processor, carries out the steps of the data processing method of any of the preceding claims.
By applying the technical scheme provided by the embodiment of the invention, when the trigger condition for writing the target data is reached, the current service scenario is determined, a target compression algorithm for the target data is determined according to the service scenario, the target compression algorithm is used to compress the target data to obtain the data to be written corresponding to the target data, and the data to be written is written into the target log file. Different compression algorithms are selected for different service scenarios to compress the data to be written: for some data the compression rate can be sacrificed to ensure compression performance, while for other data compression performance can be sacrificed to ensure the compression rate, so that a balance between compression rate and compression performance can be achieved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a block diagram of a data processing system according to an embodiment of the present invention;
FIG. 2 is a flow chart of a data processing method according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention.
Detailed Description
The core of the invention is to provide a data processing method suited to a full flash memory system that can, of course, also be applied to other storage systems. In the embodiment of the invention, a target compression algorithm for the target data to be written is determined based on the current service scenario; the target compression algorithm is used to compress the target data, the actual data to be written is determined, and the data to be written is written into a target log file. A compression mode suited to the data being written into the log file is selected based on the service scenario: for some data the compression rate can be sacrificed to ensure compression performance, while for other data the compression rate is preserved, achieving a balance between compression rate and compression performance.
An All-Flash Array (AFA) system is a self-contained storage array or appliance composed entirely of solid-state storage media (typically NAND flash), containing no Hard Disk Drives (HDDs); it can be used to accelerate environments that otherwise contain disk arrays, or to replace conventional hard-disk storage arrays entirely. Full flash memory systems can provide higher IOPS performance, lower latency and higher throughput. NAND flash, however, faces the challenges of a high cost per GB and limited write/erase cycles. Therefore, a full flash memory system can adopt data reduction technologies such as data deduplication, thin provisioning and data compression to keep the system running well and reduce the effective cost per GB. Typically, a full flash memory system first performs data deduplication to eliminate duplicate data, and then performs further byte-level compression on the non-duplicate data. The scenarios that require compression in a full flash memory system are mainly: data entering the storage system for the first time (a real-time write operation), and cold-data storage during garbage collection.
The log file system provides efficient random write performance for a full flash memory system. The log file system, also called a log-structured file system, is a log-type storage system supporting redirect-on-write: service data is written into the system as appends, an address mapping from logical address to physical address is added or updated in the metadata center, and valid data is referenced through that address mapping. Data no longer referenced by the address mapping is marked as invalid and is reclaimed by a background garbage collection mechanism to release storage space. In the log-structured file system, the basic unit of data storage is a log file, also called a data segment. When storage space runs low, log files can be sorted by utilization rate, log files with low utilization are reclaimed, and the valid data in them is copied to new log files.
In order that those skilled in the art will better understand the disclosure, the invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, which is an overall framework diagram of a data processing system according to an embodiment of the present invention, the data processing system may include a metadata center 110, a metadata cache (cache) center 120, a compression engine 130, and a log storage center 140.
The metadata center 110 is configured to store metadata information and provide indexing and addressing services for data read/write requests. The metadata cache center 120 may be used to cache metadata information, reducing the overhead of accessing the metadata center. The compression engine 130 includes a compression buffer unit, a service scenario recognition unit, a variable-length granularity division unit, a compression format output unit and a compression pipeline. When data is to be written, it first reaches the compression buffer unit. The current service scenario is identified by the service scenario recognition unit and a compression algorithm is determined; the data is divided by the variable-length granularity division unit; the divided data blocks are compressed through the compression pipeline using the determined compression algorithm; and the compression format output unit adds to the compressed data a header containing the set compression parameter fields. The compressed data is then written into a log file of the log storage center 140, and file attribute information such as the size of the compressed data is updated. Subsequently, the access frequency of each data block and its storage time in the system can be recorded in the log storage center 140.
The compression pipeline can provide a batch compression service and maintain the running context of data block compression/decompression, improving throughput, making maximum use of limited computing resources and reducing compression time overhead.
Data compressed by the compression pipeline, as well as uncompressed original data, is organized into a set fixed format, for example a header carrying the algorithm name, the compressed length and a compressed-data flag, so that other systems or working units can correctly interpret the data in a log file regardless of whether it is compressed.
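A fixed header of this kind could be sketched as follows; the field names and widths here are assumptions for illustration, not the patent's actual layout:

```python
import struct

# Hypothetical fixed-width header: 8-byte algorithm name (NUL-padded),
# 4-byte compressed length, 1-byte compressed-data flag, little-endian.
HEADER = struct.Struct("<8sIB")

def pack_header(algo: str, comp_len: int, is_compressed: bool) -> bytes:
    """Serialize the self-description fields prepended to each data block."""
    return HEADER.pack(algo.encode().ljust(8, b"\0"), comp_len, int(is_compressed))

def unpack_header(raw: bytes):
    """Parse the header back into (algorithm name, compressed length, flag)."""
    algo, comp_len, flag = HEADER.unpack_from(raw)
    return algo.rstrip(b"\0").decode(), comp_len, bool(flag)
```

Keeping the header fixed-width lets a reader locate the payload without any out-of-band metadata.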
The compressed data can be collated in batches and flushed to storage; the flush request is pushed to the metadata cache center 120, and the metadata information is recorded. Write-back success may be returned once the relevant metadata information has been updated in the metadata cache center 120, after which the metadata cache center 120 updates the address mapping information to the metadata center 110 in the background.
The entire compression engine 130 and metadata cache center 120 run in memory. The log storage center 140 may include a log header area and a log data area, which are stored in an underlying solid state storage SSD. The log header area can store data self-description information, and the log data area stores compressed or uncompressed data.
Referring to fig. 2, there is shown a flowchart for implementing a data processing method according to an embodiment of the present invention, where the method may include the following steps:
s210: and when a trigger condition for writing target data is reached, determining the current service scene.
During business operation there may be a need to write data, for example new data to be written or data to be migrated. When a write request for target data is received from a user or another system, the trigger condition for writing the target data can be considered reached.
When a trigger condition for writing target data is reached, a current business scenario may be determined. As described above, there are two main types of service scenarios, one is a real-time write scenario, and the other is a cold data storage scenario during garbage collection. Of course, more scenes can be distinguished according to actual needs.
S220: and determining a target compression algorithm aiming at the target data according to the service scene.
Different service scenarios impose different requirements on data access latency, compression rate and so on. For example, in a real-time write scenario the data is entering the system for the first time: it is new data with a high probability of being accessed again, so the compression ratio can be traded off appropriately, and a compression algorithm with high compression and decompression performance is adopted to minimize the impact on read/write latency; for example, compression algorithms such as LZ4, Snappy, ZSTD and Intel ISA-L can be selected. In a cold-data storage scenario during garbage collection, the data access frequency has dropped, so compression and decompression speed can be sacrificed appropriately, and a compression algorithm with a high compression ratio, such as zlib deflate-6 or Def9_128_SW, is selected to minimize storage overhead.
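The scenario-driven selection can be sketched as below. The specific algorithms named above (LZ4, Snappy, ZSTD, Intel ISA-L, zlib deflate-6, Def9_128_SW) are not assumed to be available, so both paths use stdlib zlib as a stand-in, at level 1 for the fast path and level 6 for the high-ratio path:

```python
import zlib

def select_compressor(scenario: str):
    """Return a compression function appropriate to the current service scenario."""
    if scenario == "realtime_write":       # favor speed, accept a lower ratio
        return lambda data: zlib.compress(data, 1)
    if scenario == "gc_cold_storage":      # favor ratio, accept slower compression
        return lambda data: zlib.compress(data, 6)
    raise ValueError(f"unknown service scenario: {scenario}")
```

The scenario string is the only input: the write path stays identical, and only the compressor swaps underneath.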
S230: and compressing the target data by using a target compression algorithm to obtain the data to be written corresponding to the target data.
After the target compression algorithm for the target data is determined, the target compression algorithm can be used for compressing the target data to obtain the data to be written corresponding to the target data.
In one embodiment of the present invention, step S230 may include the following steps:
the method comprises the following steps: compressing the target data by using a target compression algorithm to obtain a compression processing result;
step two: and determining the data to be written corresponding to the target data according to the compression processing result.
For convenience of description, the above two steps are combined for illustration.
After the target compression algorithm is determined, the target compression algorithm is used for compressing the target data, compressed data can be obtained, and a compression processing result is obtained. The compression processing result may include information of the data amount of the compressed data, the compression rate at which the target data is subjected to the compression processing, and the like. According to the compression processing result, the data to be written corresponding to the target data can be determined.
Specifically, the compression rate of the compression processing on the target data may be determined from the compression processing result. The compression rate can be characterized as the ratio of the data amount of the target data to the data amount of the compressed data. If the compression rate is higher than a preset first compression rate threshold, the compression effect is good, and the compressed data obtained after compressing the target data can be determined as the data to be written; if the compression rate is not higher than the first compression rate threshold, the compression effect is mediocre, and the target data itself can be determined as the data to be written.
The first threshold value of the compression rate may be set and adjusted according to actual situations, which is not limited in the embodiment of the present invention.
Generally speaking, a best-effort compression strategy is adopted: if the compression rate is too low, the original data is kept to preserve access efficiency; otherwise, the compressed data is written to save storage space.
S240: and writing the data to be written into the target log file.
After the data to be written corresponding to the target data is obtained, it can be written into the target log file. The size of a single log file is configurable, for example 64MB; after a log file is full, a new log file is created for writing. If all current log files are full, a target log file is created first and the data to be written is then written into it; if a log file is not yet full, that log file is determined as the target log file and the data to be written is written into it.
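The rotation logic can be sketched with an in-memory stand-in for the log storage center; the (file index, offset) return value is a hypothetical stand-in for the physical address recorded in metadata:

```python
class LogStore:
    """Append-only log files with a configurable size cap (64 MB in the text)."""

    def __init__(self, segment_size: int = 64 * 1024 * 1024):
        self.segment_size = segment_size
        self.segments = [bytearray()]      # in-memory stand-in for log files

    def append(self, block: bytes):
        """Append a block, creating a new log file once the current one is full.
        Returns (file_index, offset) as the block's physical address."""
        if len(self.segments[-1]) + len(block) > self.segment_size:
            self.segments.append(bytearray())
        seg = self.segments[-1]
        offset = len(seg)
        seg.extend(block)
        return len(self.segments) - 1, offset
```

Because writes only ever append, blocks are never overwritten in place; updates land in a new file and the old copy is simply marked invalid for garbage collection.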
By applying the method provided by the embodiment of the invention, when the trigger condition for writing the target data is reached, the current service scene is determined, the target compression algorithm for the target data is determined according to the service scene, the target compression algorithm is used for compressing the target data, the data to be written corresponding to the target data is obtained, and the data to be written is written into the target log file. Different compression algorithms are selected for different service scenes to compress the data to be written, the compression rate can be sacrificed for some data to ensure the compression performance, the compression rate can be ensured for some data to sacrifice the compression performance, and the balance of the compression rate and the compression performance can be realized.
In practical application, when the target compression algorithm is used to compress the target data, the target data may be divided into a plurality of data blocks and each data block compressed separately. For each data block, if its compression rate is low, the original data block can be written directly; if its compression rate is high, the compressed data block obtained after compression can be written into the log file. Whether the compression rate is high or low can be determined by comparing it against a compression rate threshold.
In an embodiment of the present invention, the target data is the initial write data, and the step S230 may include the following steps:
the first step is as follows: determining a first division granularity for dividing the target data according to the data volume of the target data;
the second step is that: dividing target data into a plurality of first data blocks using a first division granularity;
the third step: and respectively compressing each first data block by using a target compression algorithm.
For convenience of description, the embodiment of the present invention combines the above three steps.
After the target compression algorithm is determined, a first division granularity for dividing the target data may be determined according to the data amount of the target data. Specifically, the largest granularity among the selectable division granularities may be chosen. For example, if all division granularities are 32KB, 24KB, 16KB and 8KB, and all of them are selectable given the data amount of the target data, 32KB is determined as the first division granularity; if only 24KB, 16KB and 8KB are selectable, 24KB is determined as the first division granularity.
The target data may be divided into a plurality of first data blocks using a first division granularity, and then each first data block may be compressed using a target compression algorithm. According to the compression processing result, the data to be written can be determined. For example, each first data block after compression is determined as data to be written, or each original first data block before compression is determined as data to be written.
In an embodiment of the present invention, after each first data block is compressed by using the target compression algorithm, if a compression rate of the compression processing on each first data block is not higher than a preset second compression rate threshold, the target data may be divided into a plurality of second data blocks by using a second division granularity, and each second data block is compressed by using the target compression algorithm. The second partition size is smaller than the first partition size.
After each first data block is compressed, if the compression rate of each first data block is not higher than the preset second compression rate threshold, the compression effect is mediocre. In this case, a second division granularity may be selected, the target data divided into a plurality of second data blocks using the second division granularity, and each second data block compressed using the target compression algorithm. Likewise, if the compression rate of compressing each second data block is not higher than the preset second compression rate threshold, a third division granularity, smaller than the second division granularity, may be selected; the target data is divided using the third division granularity, and each resulting data block is compressed using the target compression algorithm.
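The granularity fallback can be sketched as below. The granularity set follows the text's 32KB/24KB/16KB/8KB example; the 1.5 threshold is an assumption standing in for the second compression rate threshold:

```python
import zlib

def compress_with_fallback(data: bytes,
                           granularities=(32768, 24576, 16384, 8192),
                           ratio_threshold: float = 1.5):
    """Compress at the largest granularity first; retry with the next smaller
    granularity only when every block's compression rate fails the threshold."""
    for gran in granularities:
        blocks = [data[i:i + gran] for i in range(0, len(data), gran)]
        compressed = [zlib.compress(b) for b in blocks]
        # keep this granularity unless no block cleared the threshold
        if any(len(b) / len(c) > ratio_threshold
               for b, c in zip(blocks, compressed)):
            return gran, compressed
    return granularities[-1], compressed   # smallest granularity as last resort
```

Larger blocks give the compressor more context and so a better ratio when the data is compressible; shrinking the granularity is only worth trying when the large blocks uniformly failed.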
In practical application, a partition size set may be preset, where the partition size set may include N partition sizes, and the N partition sizes are arranged in the partition size set according to a size order. If the set of partition granularities is {32KB,24KB,16KB,8KB }, wherein 32KB,24KB,16KB,8KB are all partition granularities and are arranged in the set of partition granularities in the order of decreasing size.
When the target data is compressed, the method can further comprise the following steps:
The first step: determining the ith division granularity in the division granularity set as the target division granularity according to the data amount of the target data, wherein the division granularity set comprises N successively decreasing division granularities, and both i and N are positive integers;
The second step: dividing the target data into a plurality of target data blocks at the target division granularity;
The third step: compressing each target data block with the target compression algorithm.
After the target compression algorithm is determined, the ith division granularity in the set may be determined as the target division granularity according to the data amount of the target data. Specifically, the largest granularity that does not exceed the data amount of the target data may be selected; that is, large-block division is tried first, and the block size is decreased step by step as the actual data length requires. For example, if the data amount of the target data is 128 KB, the 1st granularity in the set, 32 KB, is determined as the target division granularity; if the data amount is 30 KB, the 2nd granularity, 24 KB, is determined as the target division granularity.
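For illustration only, the closest-without-exceeding selection rule just described can be sketched as follows; the granularity values mirror the example set, and the function name is an assumption:

```python
# Hypothetical sketch of the target-granularity selection rule described above.
GRANULARITIES = [32 * 1024, 24 * 1024, 16 * 1024, 8 * 1024]  # N = 4, descending

def pick_granularity(data_len: int) -> int:
    """Return the largest granularity not exceeding the data amount;
    fall back to the smallest granularity for very small data."""
    for g in GRANULARITIES:
        if g <= data_len:
            return g
    return GRANULARITIES[-1]
```

Under this rule, 128 KB of data selects 32 KB (the 1st granularity) and 30 KB selects 24 KB (the 2nd), matching the examples above.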
The target data is divided into a plurality of target data blocks at the target division granularity, and each target data block is then compressed with the target compression algorithm. The data to be written can be determined from the compression processing result; for example, each compressed data block is determined as the data to be written.
In practical applications, the target data may first be cut into small data blocks at the minimum division granularity; when the data later needs to be divided at some larger granularity, the corresponding number of small blocks are simply concatenated to form blocks of that size. For example, if 64 KB of target data is cut into 8 small blocks of 8 KB at the 8 KB division granularity, and the target division granularity is determined to be 32 KB, the first 4 and the last 4 small blocks are combined to form 2 blocks of 32 KB.
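The pre-cutting and regrouping scheme just described might be sketched as follows, assuming an 8 KB minimum granularity and hypothetical function names:

```python
MIN_GRAN = 8 * 1024  # minimum division granularity (8 KB)

def pre_split(data: bytes) -> list:
    """Cut the target data into small blocks at the minimum granularity."""
    return [data[i:i + MIN_GRAN] for i in range(0, len(data), MIN_GRAN)]

def regroup(small_blocks: list, target_gran: int) -> list:
    """Concatenate consecutive small blocks into blocks of the target granularity."""
    per_block = target_gran // MIN_GRAN
    return [b"".join(small_blocks[i:i + per_block])
            for i in range(0, len(small_blocks), per_block)]
```

For 64 KB of data, `pre_split` yields 8 small 8 KB blocks, and `regroup` with a 32 KB target granularity combines them into 2 blocks of 32 KB, as in the example above.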
Of course, instead of regrouping pre-cut small blocks, after the target division granularity is determined the target data may also be divided directly into a plurality of target data blocks at that granularity.
In one embodiment of the present invention, after each target data block is compressed, if the compression rate of the compression processing on the target data blocks is not higher than a preset second compression rate threshold, let i = i + 1 and repeat the step of determining the ith division granularity in the division granularity set as the target division granularity, until i = N or the compression rate of the compression processing on the target data blocks is higher than the second compression rate threshold.
After the target data blocks are compressed, if the compression rate achieved on them is not higher than the preset second compression rate threshold, the compression effect is mediocre; i may then be incremented, the next division granularity determined as the target division granularity, the target data re-divided into a plurality of target data blocks at that granularity, and each target data block compressed with the target compression algorithm. This repeats until i = N, at which point the last, and smallest, division granularity in the set is used. If the compression rate achieved on the target data blocks is still not higher than the preset second compression rate threshold, compression of the target data blocks may be abandoned, and the blocks written directly into the target log file as the data to be written.
Of course, in any round of this repetition, if the compression rate achieved on the target data blocks is higher than the preset second compression rate threshold, no further division or compression at the next granularity is performed, and the corresponding compressed data blocks are written into the target log file as the data to be written.
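The coarse-to-fine retry loop of the preceding paragraphs could look roughly like the following sketch, with zlib standing in for the target compression algorithm and the compression rate taken as original size divided by compressed size:

```python
import zlib

def compress_adaptively(data: bytes, granularities: list, threshold: float):
    """Try each division granularity from largest to smallest; stop as soon as
    the overall compression rate exceeds the threshold.  Returns the blocks to
    write: compressed blocks on success, raw blocks if every granularity fails."""
    blocks = [data]
    for gran in granularities:
        blocks = [data[i:i + gran] for i in range(0, len(data), gran)]
        compressed = [zlib.compress(b) for b in blocks]
        rate = len(data) / sum(len(c) for c in compressed)
        if rate > threshold:
            return compressed        # good enough: write compressed blocks
    return blocks                    # compression abandoned: write raw blocks
```

Highly compressible data clears the threshold at the first (largest) granularity, while incompressible data falls through every granularity and is written uncompressed, mirroring the fallback described above.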
The compression rate achieved on the target data blocks can be expressed either as the mean of the per-block compression rates, or as the ratio of the total data amount of the target data blocks to the total data amount of the compressed blocks obtained from them. The second compression rate threshold can be set and adjusted according to actual conditions, and may be the same as or different from the first compression rate threshold.
In an embodiment of the present invention, before the current service scenario is determined when the trigger condition for writing the target data is reached, the method may further include the following steps:
Step one: when a garbage collection trigger condition is reached, determining the log file to be recovered;
Step two: respectively determining whether each valid data block in the log file to be recovered is cold data;
Step three: taking the valid data blocks determined to be cold data as target data.
For convenience of description, the above three steps are combined for illustration.
In practical applications, as the service runs, log files accumulate and occupy more and more storage space. The garbage collection trigger condition may be considered reached when the occupancy of the storage space reaches a set occupancy threshold, when a garbage collection instruction is received, or when a set time interval elapses.
When the garbage collection trigger condition is reached, the log file to be recovered can be determined. It may be specified manually, or determined from the data in the log files. Specifically, a log file in which the ratio of the data amount of invalid data blocks to that of valid data blocks is greater than a set ratio threshold may be determined as a log file to be recovered. That is, for each log file, if that ratio exceeds the set ratio threshold, the log file may be determined as a log file to be recovered.
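Assuming per-file counters of invalid and valid bytes are tracked, the selection rule above might be sketched as:

```python
def files_to_reclaim(log_files: dict, ratio_threshold: float) -> list:
    """log_files maps a file name to (invalid_bytes, valid_bytes); this record
    shape is an assumption for illustration.  A file qualifies for recovery
    when invalid_bytes / valid_bytes exceeds the threshold."""
    selected = []
    for name, (invalid, valid) in log_files.items():
        if valid == 0 or invalid / valid > ratio_threshold:
            selected.append(name)
    return selected
```

A file holding only invalid data (valid bytes = 0) is always selected, since its ratio is effectively unbounded.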
As for the invalid data blocks in a log file, they may be designated manually; alternatively, when a third data block in the target log file is updated, the updated third data block is written into a new log file and the original third data block in the target log file is marked as an invalid data block.
It can be understood that a log file to be recovered may contain both invalid and valid data blocks, and the valid data blocks are still in use and cannot simply be discarded. For each valid data block in the log file to be recovered, it can be determined whether the block is cold data or hot data; that is, the data blocks in the log file to be recovered are separated into cold and hot data, with each valid data block examined in turn.
Specifically, the age and access frequency of each valid data block may be tracked and used for cold/hot separation. The compression engine may record the time each data block entered the system, as well as the number of times the block has been accessed. Among blocks whose age exceeds a set time threshold, such as 15 or 30 days, the blocks may be sorted by access count or access frequency, and those with low counts or frequencies treated as cold data. The proportion of cold to hot data is adjustable; for example, 20% of the data may be hot and 80% cold.
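A rough sketch of this age-then-frequency cold/hot separation, assuming each block is tracked as a hypothetical (block_id, entry_time, access_count) record:

```python
def split_cold_hot(blocks, now, age_threshold, cold_fraction=0.8):
    """Among blocks older than age_threshold, mark the least-accessed
    cold_fraction as cold; everything else is treated as hot."""
    aged = [b for b in blocks if now - b[1] > age_threshold]
    aged.sort(key=lambda b: b[2])                  # fewest accesses first
    n_cold = int(len(aged) * cold_fraction)
    cold = {b[0] for b in aged[:n_cold]}
    hot = {b[0] for b in blocks} - cold
    return cold, hot
```

The default 0.8 mirrors the 80/20 split mentioned above; young blocks are never classified as cold regardless of access count.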
A valid data block determined to be cold data may be taken as target data. That is, for each valid data block, if it is determined to be cold data, it may be determined as target data, the trigger condition for writing target data is considered reached, and the operations of step S210 to step S240 are performed. In other words, during data migration the cold data is recompressed with a compression algorithm of high compression rate and written into the target log file, saving storage space. In this case, the target log file may be a newly created log file.
After the log file to be recovered is determined when the garbage collection trigger condition is reached, each valid data block in it that is determined to be hot data can be written into a new log file. That is, for each valid data block in the log file to be recovered, if it is determined to be hot data, it may be written into a new log file.
Since migration changes data storage locations, the metadata information of the affected data blocks must ultimately be updated in the metadata center after migration.
After the valid data blocks have been migrated out of the log file to be recovered, the file itself can be recovered. For example, it may be moved to a set location and deleted only after a set retention period, so that it can still be restored within that period if needed, or it may be deleted directly.
In one embodiment of the invention, the method may further comprise the following steps:
The first step: when a data block needs to be read, querying the physical address of the data block to be read;
The second step: reading the data block to be read from the corresponding log file according to the physical address obtained by the query, wherein the header of each data block in each log file contains at least a compression parameter field;
The third step: decompressing the data block to be read according to the compression parameter field contained in the header of the data block to be read, so as to obtain the original data corresponding to the data block to be read.
For convenience of description, the above three steps are combined for illustration.
In the embodiment of the present invention, the header of each data block in each log file contains at least a compression parameter field, such as a compression algorithm field, a compressed length field, and a compressed-data flag field. The data blocks in a log file may be compressed or uncompressed data blocks.
When a data block is to be read, its physical address can be queried first. Specifically, the corresponding physical address may be looked up in the metadata cache center according to the logical address of the data block to be read; if it is not found there, it is queried in the metadata center.
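The two-level address lookup just described might be sketched as follows, with plain dictionaries standing in for the metadata cache center and the metadata center; back-filling the cache on a miss is an added assumption, not stated in the text:

```python
def resolve_physical_address(logical_addr, cache: dict, metadata_center: dict):
    """Look up the physical address in the metadata cache center first;
    on a miss, fall back to the metadata center and populate the cache."""
    phys = cache.get(logical_addr)
    if phys is None:
        phys = metadata_center.get(logical_addr)
        if phys is not None:
            cache[logical_addr] = phys   # assumed back-fill for later reads
    return phys
```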
From the physical address obtained by the query, the log file containing the data block to be read can be located, and the block read from it. The data block to be read is then decompressed according to the compression parameter field contained in its header, yielding the original data corresponding to the data block to be read.
When a read request for a piece of data is received, the data blocks corresponding to that data may first be determined; each block is read by the above method, and after decompression processing, the resulting pieces of original data are combined and returned to the requester.
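The per-block header and read path can be illustrated with a hypothetical layout; the 6-byte header (algorithm id, compressed flag, payload length) is an assumed format for the sketch, not the patent's actual one:

```python
import struct
import zlib

# Hypothetical header: algorithm id (1 B), compressed flag (1 B),
# payload length (4 B, little-endian).
HEADER = struct.Struct("<BBI")

def pack_block(data: bytes, algo_id: int = 1, compress: bool = True) -> bytes:
    """Build a stored block: compression parameter header + payload."""
    payload = zlib.compress(data) if compress else data
    return HEADER.pack(algo_id, int(compress), len(payload)) + payload

def read_block(buf: bytes, offset: int = 0) -> bytes:
    """Parse the header at offset and return the original data."""
    algo_id, compressed, length = HEADER.unpack_from(buf, offset)
    payload = buf[offset + HEADER.size: offset + HEADER.size + length]
    return zlib.decompress(payload) if compressed else payload
```

Because the compressed flag travels in the header, the reader can transparently handle both compressed blocks and blocks that were written raw after compression was abandoned.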
In the embodiment of the invention, applying service-scenario-aware variable-length compression to the data in an all-flash storage system reduces service read/write network traffic and storage traffic, and lowers the per-GB storage cost of the all-flash system.
Corresponding to the above method embodiment, the embodiment of the present invention further provides a data processing apparatus, and a data processing apparatus described below and a data processing method described above may be referred to in correspondence with each other.
Referring to fig. 3, the apparatus includes:
a service scenario determining module 310, configured to determine a current service scenario when a trigger condition for writing target data is met;
a compression algorithm determining module 320, configured to determine a target compression algorithm for the target data according to the service scenario;
the data obtaining module 330 is configured to perform compression processing on the target data by using a target compression algorithm to obtain data to be written corresponding to the target data;
and the data writing module 340 is configured to write the data to be written into the target log file.
With the apparatus provided by the embodiment of the invention, when the trigger condition for writing target data is reached, the current service scenario is determined; a target compression algorithm for the target data is determined according to the service scenario; the target data is compressed with that algorithm to obtain the data to be written; and the data to be written is written into the target log file. Different compression algorithms are selected for different service scenarios: for some data, compression rate can be sacrificed to guarantee compression performance, while for other data, performance can be sacrificed to guarantee compression rate, striking a balance between the two.
In an embodiment of the present invention, the data obtaining module 330 is specifically configured to:
compressing the target data by using a target compression algorithm to obtain a compression processing result;
and determining the data to be written corresponding to the target data according to the compression processing result.
In an embodiment of the present invention, the data obtaining module 330 is specifically configured to:
determining a compression rate for compressing the target data according to the compression processing result;
if the compression rate is higher than a preset first compression rate threshold, determining the compressed data obtained after the target data is compressed as the data to be written;
otherwise, the target data is determined as the data to be written.
In an embodiment of the present invention, the target data is first-time write data, and the data obtaining module 330 is specifically configured to:
determining a first division granularity for dividing the target data according to the data volume of the target data;
dividing target data into a plurality of first data blocks using a first division granularity;
and respectively compressing each first data block by using a target compression algorithm.
In an embodiment of the present invention, the data obtaining module 330 is further configured to:
if the compression rate of each first data block is not higher than a preset second compression rate threshold, dividing the target data into a plurality of second data blocks by using a second division granularity, wherein the second division granularity is smaller than the first division granularity;
and respectively compressing each second data block by using a target compression algorithm.
In an embodiment of the present invention, the apparatus further includes a data determining module, configured to:
before the current service scenario is determined when the trigger condition for writing target data is reached, determine the log file to be recovered when a garbage collection trigger condition is reached;
respectively determine whether each valid data block in the log file to be recovered is cold data;
and take the valid data blocks determined to be cold data as target data.
In an embodiment of the present invention, the data writing module 340 is further configured to:
after the log file to be recovered is determined when the garbage collection trigger condition is reached, write each valid data block in the log file to be recovered that is determined to be hot data into a new log file.
In an embodiment of the present invention, the data determining module is specifically configured to:
and when the garbage collection trigger condition is reached, determine a log file in which the ratio of the data amount of invalid data blocks to that of valid data blocks is greater than a set ratio threshold as the log file to be recovered.
In an embodiment of the present invention, the apparatus further includes a data marking module, configured to:
when the third data block in the target log file is updated, writing the updated third data block into a new log file;
and mark the third data block in the target log file as an invalid data block.
In an embodiment of the present invention, the apparatus further includes a data reading module, configured to:
when a data block to be read is to be read, inquiring the physical address of the data block to be read;
reading the data block to be read from the corresponding log file according to the physical address obtained by the query, wherein the header of each data block in each log file contains at least a compression parameter field;
and decompressing the data block to be read according to the compression parameter field contained in the header of the data block to be read, so as to obtain the original data corresponding to the data block to be read.
Corresponding to the above method embodiments, an embodiment of the present invention further provides a data processing device, as shown in fig. 4, the device including:
a memory 410 for storing a computer program;
the processor 420 is configured to implement the steps of the data processing method when executing the computer program.
Corresponding to the above method embodiments, the present invention further provides a computer readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the data processing method.
The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), Read-Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The principle and the implementation of the present invention are explained in the present application by using specific examples, and the above description of the embodiments is only used to help understanding the technical solution and the core idea of the present invention. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.
Claims (13)
1. A data processing method, comprising:
when a trigger condition for writing target data is reached, determining a current service scene;
determining a target compression algorithm aiming at the target data according to the service scene;
compressing the target data by using the target compression algorithm to obtain data to be written corresponding to the target data;
and writing the data to be written into a target log file.
2. The method according to claim 1, wherein the compressing the target data by using the target compression algorithm to obtain data to be written corresponding to the target data comprises:
compressing the target data by using the target compression algorithm to obtain a compression processing result;
and determining the data to be written corresponding to the target data according to the compression processing result.
3. The method according to claim 2, wherein the determining, according to the compression processing result, data to be written corresponding to the target data comprises:
determining a compression rate for compressing the target data according to the compression processing result;
if the compression rate is higher than a preset first compression rate threshold, determining compressed data obtained after the target data is compressed as the data to be written;
otherwise, determining the target data as the data to be written.
4. The method of claim 1, further comprising:
when a data block to be read is to be read, inquiring the physical address of the data block to be read;
reading the data block to be read from the corresponding log file according to the physical address obtained by the query, wherein the header of each data block in each log file contains at least a compression parameter field;
and decompressing the data block to be read according to the compression parameter field contained in the header of the data block to be read, to obtain the original data corresponding to the data block to be read.
5. The method according to any one of claims 1 to 4, wherein the target data is initially written data, and the compressing the target data by using the target compression algorithm comprises:
determining a first division granularity for dividing the target data according to the data amount of the target data;
dividing the target data into a plurality of first data blocks using the first division granularity;
and respectively compressing each first data block by using the target compression algorithm.
6. The method according to claim 5, further comprising, after said compressing each of said first data blocks using said target compression algorithm, respectively:
if the compression rate of the compression processing on each first data block is not higher than a preset second compression rate threshold, dividing the target data into a plurality of second data blocks by using a second division granularity, wherein the second division granularity is smaller than the first division granularity;
and compressing each second data block by using the target compression algorithm.
7. The method according to any one of claims 1 to 4, wherein before the current service scenario is determined when the trigger condition for writing target data is reached, the method further comprises:
when a garbage collection triggering condition is reached, determining a log file to be recovered;
respectively determining whether each valid data block in the log file to be recovered is cold data;
and taking the valid data block determined as cold data as the target data.
8. The method according to claim 7, wherein after the log file to be recovered is determined when the garbage collection triggering condition is reached, the method further comprises:
and writing each valid data block in the log file to be recovered that is determined to be hot data into a new log file.
9. The method of claim 7, wherein determining the log file to be recovered when the garbage collection triggering condition is reached comprises:
and when the garbage collection triggering condition is reached, determining a log file in which the ratio of the data amount of invalid data blocks to that of valid data blocks is greater than a set ratio threshold as the log file to be recovered.
10. The method of claim 9, further comprising:
when a third data block in the target log file is updated, writing the updated third data block into a new log file;
marking the third data block in the target log file as an invalid data block.
11. A data processing apparatus, comprising:
the service scene determining module is used for determining the current service scene when the trigger condition for writing target data is reached;
a compression algorithm determining module, configured to determine a target compression algorithm for the target data according to the service scenario;
the data obtaining module is used for compressing the target data by using the target compression algorithm to obtain data to be written corresponding to the target data;
and the data writing module is used for writing the data to be written into a target log file.
12. A data processing device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the data processing method according to any one of claims 1 to 10 when executing the computer program.
13. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the data processing method according to any one of claims 1 to 10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910697041.3A CN112306974B (en) | 2019-07-30 | Data processing method, device, equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112306974A true CN112306974A (en) | 2021-02-02 |
CN112306974B CN112306974B (en) | 2024-10-22 |
Patent Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102291773A (en) * | 2011-07-18 | 2011-12-21 | 电信科学技术研究院 | Data compression method and equipment |
US20130097210A1 (en) * | 2011-10-17 | 2013-04-18 | International Business Machines Corporation | Efficient garbage collection in a compressed journal file |
US20140156609A1 (en) * | 2011-12-06 | 2014-06-05 | International Business Machines Corporation | Database table compression |
CN102609360A (en) * | 2012-01-12 | 2012-07-25 | 华为技术有限公司 | Data processing method, data processing device and data processing system |
WO2015024160A1 (en) * | 2013-08-19 | 2015-02-26 | 华为技术有限公司 | Data object processing method and device |
CN105051724A (en) * | 2013-08-19 | 2015-11-11 | 华为技术有限公司 | Data object processing method and device |
US20170177602A1 (en) * | 2015-12-16 | 2017-06-22 | International Business Machines Corporation | Compressed data layout with variable group size |
CN109697025A (en) * | 2017-10-20 | 2019-04-30 | 株式会社日立制作所 | The storage medium of storage device, data managing method and data administrator |
CN109756536A (en) * | 2017-11-03 | 2019-05-14 | 株洲中车时代电气股份有限公司 | A kind of method, apparatus and system of data transmission |
CN108427538A (en) * | 2018-03-15 | 2018-08-21 | 深信服科技股份有限公司 | Storage data compression method, device and the readable storage medium storing program for executing of full flash array |
CN109325006A (en) * | 2018-08-23 | 2019-02-12 | 郑州云海信息技术有限公司 | A kind of method and apparatus for compressing the method and apparatus stored, decompression downloading |
CN109802684A (en) * | 2018-12-26 | 2019-05-24 | 华为技术有限公司 | The method and apparatus for carrying out data compression |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022217517A1 (en) * | 2021-04-14 | 2022-10-20 | Huawei Technologies Co., Ltd. | Storage control device and method executed in storage control device |
CN114422608A (en) * | 2021-05-17 | 2022-04-29 | Shenzhen Xishima Data Technology Co., Ltd. | Data transmission method, device and equipment |
CN114422608B (en) * | 2021-05-17 | 2024-01-26 | Shenzhen Xishima Data Technology Co., Ltd. | Data transmission method, device and equipment |
CN113326001A (en) * | 2021-05-20 | 2021-08-31 | Ruiche (Hangzhou) Technology Co., Ltd. | Data processing method, device, apparatus, system, medium, and program |
CN113885787A (en) * | 2021-06-08 | 2022-01-04 | Honor Device Co., Ltd. | Memory management method and electronic device |
CN113885787B (en) * | 2021-06-08 | 2022-12-13 | Honor Device Co., Ltd. | Memory management method and electronic device |
CN113676727A (en) * | 2021-08-18 | 2021-11-19 | Shenzhen Langqiang Technology Co., Ltd. | Wi-Fi-based ultra-high-definition video transmitting and receiving method and device |
Similar Documents
Publication | Title |
---|---|
US9612774B2 (en) | Metadata structures for low latency and high throughput inline data compression |
US10031675B1 (en) | Method and system for tiering data |
CN108733306B (en) | File merging method and device |
CN113176857A (en) | Massive small-file access optimization method, device, equipment and storage medium |
CN107209714A (en) | Distributed storage system and control method of distributed storage system |
CN110532201B (en) | Metadata processing method and device |
CN111475507B (en) | Key-value data indexing method for workload-adaptive single-layer LSMT |
CN112684975A (en) | Data storage method and device |
US20210141721A1 (en) | System and method for facilitating efficient utilization of NAND flash memory |
US9378214B2 (en) | Method and system for hash key memory reduction |
CN114564457B (en) | Storage space optimization method and system for database files |
US11327929B2 (en) | Method and system for reduced data movement compression using in-storage computing and a customized file system |
CN115756312A (en) | Data access system, data access method, and storage medium |
CN107423425B (en) | Method for fast storage and query of data in K/V format |
US20240231657A1 (en) | Data processing method and storage system |
CN111984651A (en) | Columnar storage method, device and equipment based on persistent memory |
US20210117132A1 (en) | Deep data-compression |
CN111949222B (en) | Method for data migration during garbage collection in an all-flash disk array |
CN115480692A (en) | Data compression method and device |
CN112306974B (en) | Data processing method, device, equipment and storage medium |
CN108334457B (en) | IO processing method and device |
CN112306974A (en) | Data processing method, device, equipment and storage medium |
CN112000289B (en) | Data management method for an all-flash storage server system and related components |
CN115878308A (en) | Resource scheduling method and device |
CN110795034B (en) | Data migration method, device and equipment for a storage system, and readable storage medium |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |