CN112988685A - Data compression and decompression method and device - Google Patents


Info

Publication number
CN112988685A
Authority
CN
China
Prior art keywords
data
file
programmable device
compressed
field
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110350481.9A
Other languages
Chinese (zh)
Inventor
杨俊�
李嘉树
卢冕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
4Paradigm Beijing Technology Co Ltd
Original Assignee
4Paradigm Beijing Technology Co Ltd
Application filed by 4Paradigm Beijing Technology Co Ltd
Priority to CN202110350481.9A
Publication of CN112988685A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G06F16/1744 Redundancy elimination performed by the file system using compression, e.g. sparse files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Stored Programmes (AREA)

Abstract

A data compression and decompression method and an apparatus therefor are provided. The data compression method comprises the following steps: receiving, by a configured programmable device, data to be compressed from a database; compressing, by the programmable device, the data to be compressed to obtain compressed data; and writing, by the programmable device, the compressed data to a file in persistent storage.

Description

Data compression and decompression method and device
Technical Field
The present application relates generally to the field of data processing, and more particularly, to data compression and decompression methods and apparatuses thereof.
Background
At present, most compression schemes for databases that store data on persistent storage devices compress the data on the CPU inside the database system, as shown in FIG. 1. In this arrangement, users cannot change the compression algorithm without modifying the database system code, and the CPU resources spent on compression cannot be freed. Compression occupies a large amount of CPU computing resources, which degrades the performance of the entire system.
In addition, in the system shown in FIG. 1, the user may instead leave the database system unchanged and perform compression inside a modified file system. In practice, however, this approach restricts the database system to operating systems that provide that file system. It also makes it difficult to change the compression algorithm, because modifying the operating system requires the highest privileges and is very costly.
Disclosure of Invention
Exemplary embodiments of the present invention provide a data compression and decompression method and an apparatus therefor to solve at least the above-mentioned problems of the prior art.
According to an exemplary embodiment of the present invention, there is provided a data compression method including: receiving, by a configured programmable device, data to be compressed from a database; compressing, by the programmable device, the data to be compressed to obtain compressed data; and writing, by the programmable device, the compressed data to a file in persistent storage.
Optionally, the step of receiving, by the programmable device, the data to be compressed from the database may include: receiving, by the programmable device, the data to be compressed from the database in predetermined units through a POSIX file system interface.
Optionally, the step of writing, by the programmable device, the compressed data into the file in persistent storage may comprise: writing, by the programmable device, the compressed data into the file through a POSIX file system interface.
Optionally, the step of writing, by the programmable device, the compressed data into the file in persistent storage may comprise: retrieving, from a lookup table of the file, a location field and a size field corresponding to the data to be compressed, according to information about the location in the file that the POSIX file system interface needs to access and the unit used when the database accesses the file; and writing the compressed data into the file according to the location field and the size field.
Optionally, the step of writing the compressed data into the file according to the location field and the size field may include: if the values of the location field and the size field are initial values, setting the value of the location field to the end-of-file location, setting the value of the size field to the size of the compressed data, and writing the compressed data starting from the location in the file indicated by the location field; and if those values are not initial values, determining final values of the size field and the location field according to a comparison between the value of the size field and the size of the compressed data, and writing the compressed data starting from the location in the file indicated by the location field.
Optionally, the step of determining the final values of the size field and the location field according to the comparison between the value of the size field and the size of the compressed data may include: if the value of the size field is greater than or equal to the size of the compressed data, updating the value of the size field to the size of the compressed data; and if the value of the size field is smaller than the size of the compressed data, updating the value of the location field to the end-of-file location and updating the value of the size field to the size of the compressed data.
According to an exemplary embodiment of the present invention, there is provided a data decompression method, which may include: reading, by a configured programmable device, compressed data from a file in persistent storage; decompressing, by the programmable device, the read compressed data to obtain decompressed data; and providing, by the programmable device, the decompressed data to a database.
Optionally, the step of reading, by the programmable device, compressed data from the file in persistent storage may comprise: reading, by the programmable device, the compressed data from the file in persistent storage through a POSIX file system interface.
Optionally, the step of providing, by the programmable device, the decompressed data to a database may comprise: providing, by the programmable device, the decompressed data to a database through a POSIX file system interface.
Optionally, the step of reading, by the programmable device, the compressed data from the file in the persistent storage may comprise: retrieving a location field and a size field corresponding to the compressed data from a lookup table of the file according to information about a location in the file that a POSIX file system interface needs to access and a unit used when the database accesses the file; reading the compressed data from the file according to the location field and the size field.
According to an exemplary embodiment of the present invention, there is provided a data compression apparatus, including a programmable device configured to: receive data to be compressed from a database; compress the data to be compressed to obtain compressed data; and write the compressed data to a file in persistent storage.
Optionally, the programmable device may be configured to receive the data to be compressed from the database in predetermined units through a POSIX file system interface.
Optionally, the programmable device may be configured to write the compressed data into the file through a POSIX file system interface.
Optionally, the programmable device may be configured to write the compressed data to the file in persistent storage by: retrieving, from a lookup table of the file, a location field and a size field corresponding to the data to be compressed, according to information about the location in the file that the POSIX file system interface needs to access and the unit used when the database accesses the file; and writing the compressed data into the file according to the location field and the size field.
Optionally, the programmable device may be configured to write the compressed data into the file according to the location field and the size field by: if the values of the location field and the size field are initial values, setting the value of the location field to the end-of-file location, setting the value of the size field to the size of the compressed data, and writing the compressed data starting from the location in the file indicated by the location field; and if those values are not initial values, determining final values of the size field and the location field according to a comparison between the value of the size field and the size of the compressed data, and writing the compressed data starting from the location in the file indicated by the location field.
Optionally, the programmable device may be configured to determine the final values of the size field and the location field by: if the value of the size field is greater than or equal to the size of the compressed data, updating the value of the size field to the size of the compressed data; and if the value of the size field is smaller than the size of the compressed data, updating the value of the location field to the end-of-file location and updating the value of the size field to the size of the compressed data.
According to an exemplary embodiment of the present invention, there is provided a data decompression apparatus, which may include a programmable device configured to: read compressed data from a file in persistent storage; decompress the read compressed data to obtain decompressed data; and provide the decompressed data to the database.
Optionally, the programmable device may be configured to read the compressed data from the file in persistent storage through a POSIX file system interface.
Optionally, the programmable device may be configured to provide the decompressed data to a database through a POSIX file system interface.
Optionally, the programmable device may be configured to read compressed data from a file in persistent storage by: retrieving a location field and a size field corresponding to the compressed data from a lookup table of the file according to information about a location in the file that a POSIX file system interface needs to access and a unit used when the database accesses the file; reading the compressed data from the file according to the location field and the size field.
With the data compression and decompression method and apparatus according to the exemplary embodiments of the present application, compression and decompression are performed by the programmable device. This increases the effective storage capacity of the hard disk without consuming the CPU computing resources used by the database system, thereby improving overall performance. In addition, the method and apparatus redirect the database system's calls to the POSIX file system interface so that they are taken over by the programmable device: after compressing the data (on a file write) or decompressing it (on a file read), the programmable device reads and writes the files in the downstream file system through the POSIX file system interface. Compression and decompression that were originally performed by the CPU can therefore be carried out by the programmable device without modifying either the upstream database system or the downstream file system, which frees CPU resources and makes it easy to change the compression algorithm. In other words, the method and apparatus can use a programmable device to losslessly compress and decompress all data of any disk-based database, transparently to both the database system and the file system.
In addition, the data compression and decompression method and apparatus according to the exemplary embodiments of the present application are applicable to any database that reads and writes data through a POSIX file system interface and to any POSIX file system. Because they rely only on the POSIX file system interface, they are also portable across different operating systems.
Additional aspects and/or advantages of the present general inventive concept will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the general inventive concept.
Drawings
These and/or other aspects and advantages of the present application will become more apparent and more readily appreciated from the following detailed description of the embodiments of the present application, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a diagram illustrating a prior art computer architecture;
FIG. 2 is a diagram illustrating a computer architecture according to an exemplary embodiment of the present application;
FIG. 3 is a flowchart illustrating a data compression method according to an exemplary embodiment of the present application;
FIG. 4 is a flowchart illustrating a data decompression method according to an exemplary embodiment of the present application;
FIG. 5 is a block diagram illustrating a data compression apparatus according to an exemplary embodiment of the present application;
FIG. 6 is a block diagram illustrating a data decompression apparatus according to an exemplary embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings. The embodiments are described below in order to explain the present invention by referring to the figures.
To facilitate understanding of the present application, the general inventive concept is described first. It adds a programmable device to the conventional computer architecture shown in FIG. 1; the programmable device compresses data on behalf of the database system before the data are written to a file in the file system, and performs the write of the compressed data to that file. As shown in FIG. 2, the upstream database system does not need to be modified: it still reads and writes data from and to files of the file system on the hard disk through the POSIX file system interface (i.e., the POSIX file system API), but the configured programmable device takes over the execution of these API calls. For a write to a file in the file system, the programmable device first receives the data from the database system, compresses the received data to generate compressed data, and then writes the compressed data to the file in the downstream file system through the POSIX file system API. For a read from a file in the file system, the programmable device decompresses the compressed data stored in the file system and returns the decompressed original data to the database system. The program that performs these operations can be fixed into the programmable device by programming (burning) it. Next, a data compression method according to an exemplary embodiment of the present application is described with reference to FIGS. 2 and 3.
Fig. 3 is a flowchart illustrating a data compression method according to an exemplary embodiment of the present application.
As shown in FIG. 3, in step S310, the configured programmable device receives data to be compressed from a database. The database saves all data that must be persisted to the persistent storage device by calling the file operation interface provided by the POSIX file system, which ensures that the data are recoverable. The programmable device may be a field-programmable gate array (FPGA), a complex programmable logic device (CPLD), a generic array logic (GAL) device, or the like. After it is programmed according to the logic of the present application, it can be inserted directly into a PCIe slot of a conventional computer and communicate with devices such as the computer's central processing unit (CPU).
The step of receiving, by the programmable device, the data to be compressed from the database may comprise: receiving, by the programmable device, the data to be compressed from the database in predetermined units through a POSIX file system interface.
Specifically, as shown in FIG. 2, when the database system writes to a file in the file system, it calls the POSIX file system interface to write the data. In this process, the programmable device takes over the execution of the POSIX file system interface and obtains the data to be compressed from the database system through that interface in fixed-size pages.
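The page-granular hand-off described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the device's actual logic: the helper name `split_into_pages` is hypothetical, and it assumes that every request from the database is aligned to the fixed page size M, rejecting unaligned requests rather than handling them.

```python
from typing import List, Tuple


def split_into_pages(offset: int, data: bytes, page_size: int) -> List[Tuple[int, bytes]]:
    """Hypothetical helper: split a page-aligned write into (PageID, page) pairs.

    Assumes the database always writes whole fixed-size pages of
    `page_size` bytes starting at a multiple of `page_size`.
    """
    if offset % page_size != 0 or len(data) % page_size != 0:
        raise ValueError("request is not aligned to the page size")
    first_page = offset // page_size  # PageID of the first page in the request
    return [
        (first_page + i, data[i * page_size:(i + 1) * page_size])
        for i in range(len(data) // page_size)
    ]
```

For example, a 16384-byte write at byte offset 8192 with 8192-byte pages is split into pages 1 and 2, each of which can then be compressed independently.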
In step S320, the programmable device compresses the data to be compressed to obtain compressed data.
In step S330, the programmable device writes the compressed data to a file in a persistent storage device, which is a persistent block device (e.g., a conventional hard disk drive (HDD) or a solid-state drive (SSD)).
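The description does not name a compression algorithm for step S320. The sketch below uses Python's standard `zlib` module purely as a stand-in for whatever algorithm is programmed into the device, and assumes an 8192-byte page; the function names are hypothetical. It illustrates why a per-page lookup table is needed: the compressed size of a page depends on its content and is generally smaller than the page itself.

```python
import zlib

PAGE_SIZE = 8192  # assumed page size M, matching the 8 KB example


def compress_page(page: bytes, level: int = 6) -> bytes:
    """Compress one full page; zlib is only a stand-in for the device's algorithm."""
    if len(page) != PAGE_SIZE:
        raise ValueError("expected a full page")
    return zlib.compress(page, level)


def decompress_page(blob: bytes) -> bytes:
    """Reverse of compress_page; checks that a full page comes back."""
    page = zlib.decompress(blob)
    if len(page) != PAGE_SIZE:
        raise ValueError("decompressed data is not a full page")
    return page
```

Compressing a page of repetitive rows, such as `(b"row-data," * 1000)[:PAGE_SIZE]`, yields far fewer than 8192 bytes, and `decompress_page` restores the original page exactly.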
In particular, the step of writing, by the programmable device, the compressed data to the file in persistent storage may comprise: retrieving, from a lookup table of the file, a location field and a size field corresponding to the data to be compressed, according to information about the location in the file that the POSIX file system interface needs to access and the unit used when the database accesses the file; and writing the compressed data into the file according to the location field and the size field.
Specifically, the file system maintains a lookup table for each file. The table records, for each requested data page, the location of its compressed data in the file and the length of that compressed data, so that the programmable device can correctly write compressed data into the file and correctly read it back for decompression. Each row of the lookup table has a row number (PageID), numbered from 0, that represents the serial number of the original data page; the row number itself does not need to be stored. Each row has two fields: the first field (the location field) gives, in bytes, the starting location of the page's compressed data within the file (initial value -1); the second field (the size field) gives the size of the compressed data in bytes (initial value -1). In addition, the fixed-size unit in which the database accesses the file (denoted M), i.e., the unit used by the database to access the file, is saved at the beginning of the file. This is needed because the POSIX file system interface usually indicates in bytes the location in the file to be accessed (denoted K), i.e., the information about the location in the file that the POSIX file system interface needs to access. This location corresponds to a row (PageID) of the lookup table and can be calculated by equation 1 below:
$$\mathrm{PageID} = \left\lfloor \frac{K}{M} \right\rfloor \qquad (1)$$
That is, PageID equals K divided by M, rounded down. Accordingly, the location field and the size field corresponding to the data to be compressed can be retrieved from the lookup table of the file using equation 1. For example, Table 1 shows an example lookup table: if PageID = ⌊K/M⌋ = 1, the location field retrieved for PageID = 1 is 4285 and the size field is 3876.
[Table 1]
PageID | Location field (bytes) | Size field (bytes)
0 | 0 | 4285
1 | 4285 | 3876
2 | 8161 | 3958
3 | 12119 | 4007
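Equation 1 and the lookup in Table 1 can be expressed in a few lines of Python. In this sketch (an assumption, not the stored on-disk format) the lookup table is an in-memory list whose index is the PageID, so the row number is implicit and not stored, as the description states; integer floor division `//` implements the rounding down in equation 1. The final loop checks that in Table 1 each page's compressed data starts exactly where the previous page's ends.

```python
from typing import List, Tuple

# Table 1: index = PageID (implicit, not stored); entry = (location, size) in bytes.
TABLE_1: List[Tuple[int, int]] = [
    (0, 4285),
    (4285, 3876),
    (8161, 3958),
    (12119, 4007),
]


def page_id(k: int, m: int) -> int:
    """Equation 1: PageID = floor(K / M), with K the byte offset and M the page size."""
    return k // m


def lookup(table: List[Tuple[int, int]], k: int, m: int) -> Tuple[int, int]:
    """Return the (location field, size field) row for byte offset k."""
    return table[page_id(k, m)]


# Pages in Table 1 are stored back to back: each starts where the previous one ends.
for prev, cur in zip(TABLE_1, TABLE_1[1:]):
    assert cur[0] == prev[0] + prev[1]
```

With M = 8192, `lookup(TABLE_1, 8192, 8192)` returns `(4285, 3876)`, matching the row for PageID = 1.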
Writing the compressed data into the file according to the location field and the size field may include: if the values of the location field and the size field are initial values, setting the value of the location field to the end-of-file location, setting the value of the size field to the size of the compressed data, and writing the compressed data starting from the location in the file indicated by the location field; and if those values are not initial values, determining final values of the size field and the location field according to a comparison between the value of the size field and the size of the compressed data, and writing the compressed data starting from the location in the file indicated by the location field.
This is explained in detail below by way of example. Assume that the database reads and writes a file in fixed 8192-byte (8 KB) pages (i.e., M = 8192 bytes). When the database writes an 8 KB (8 × 1024 bytes) data page at byte offset 8192 of the file through the POSIX file system interface, the programmable device uses equation 1 to calculate PageID = ⌊8192/8192⌋ = 1 and looks up the row with PageID = 1 in the lookup table to determine the values of the location field and the size field.
At this point there are two possible cases. In the first case, both the location field and the size field hold their initial values, which means this data page is being written for the first time. The value of the location field is therefore set to the end-of-file location and the value of the size field is set to the size of the compressed data page, after which the programmable device writes the compressed data starting from the location in the file indicated by the location field. For example, assume that the location field and the size field for PageID = 0 are 0 and 4285, respectively, and that both fields for PageID = 1 still hold their initial values before the current data page is written; in other words, the end of the file is at byte 4285. When the current data page is written, the location field for PageID = 1 is set to 4285, the size field for PageID = 1 is set to the size of the compressed data page, and the compressed data is written starting from the location in the file indicated by the location field.
In the second case, neither the location field nor the size field holds its initial value, which means the data page is being updated (overwritten). The final values of the size field and the location field must then be determined by comparing the value of the size field with the size of the newly compressed data of the current page. Specifically, this step may include: if the value of the size field is greater than or equal to the size of the compressed data, updating the value of the size field to the size of the compressed data; and if the value of the size field is smaller than the size of the compressed data, updating the value of the location field to the end-of-file location and updating the value of the size field to the size of the compressed data.
For example, suppose the location field and the size field for PageID = 1 are 4285 and 3876, respectively, as in Table 1, and that compressing the new 8 KB data page yields size_new bytes. If 3876 ≥ size_new, the programmable device updates the size field for PageID = 1 to size_new and leaves the location field for PageID = 1 unchanged. If 3876 < size_new, the 3876 bytes of space starting at the location recorded for PageID = 1 cannot hold the compressed data of the current page, so the programmable device updates the location field for PageID = 1 to the end-of-file location and updates the size field to size_new. The space at the original location is thereby freed and can be reclaimed and reused with any existing reclamation and reuse method. Finally, the compressed data is written starting from the location in the file indicated by the location field.
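The two write cases can be combined into one routine. The Python sketch below is hypothetical: `CompressedFile` models the file body as an in-memory `bytearray` and the per-file lookup table as a list of `[location, size]` rows initialised to -1. It assumes locations are measured from the start of the compressed-data region and ignores the header that stores M. Freed space is simply left in place, since the description defers reclamation to existing methods.

```python
from typing import List

INITIAL = -1  # initial value of the location and size fields


class CompressedFile:
    """Hypothetical in-memory model of one file of compressed pages.

    `data` stands in for the file body and `table` for its lookup table:
    row index = PageID, row = [location field, size field], both in bytes.
    """

    def __init__(self, num_pages: int) -> None:
        self.data = bytearray()
        self.table: List[List[int]] = [[INITIAL, INITIAL] for _ in range(num_pages)]

    def write_compressed(self, pid: int, blob: bytes) -> None:
        """Write one compressed page following the two cases in the description."""
        row = self.table[pid]
        location, size = row
        if location == INITIAL and size == INITIAL:
            # Case 1: first write of this page -> place it at the end of the file.
            row[0], row[1] = len(self.data), len(blob)
        elif size >= len(blob):
            # Case 2, old slot large enough: keep the location, record the new size.
            row[1] = len(blob)
        else:
            # Case 2, old slot too small: move the page to the end of the file.
            # The old slot becomes free space, left for a separate reclamation step.
            row[0], row[1] = len(self.data), len(blob)
        start, end = row[0], row[0] + len(blob)
        if end > len(self.data):
            # Grow the file so the slot [start, end) exists before writing into it.
            self.data.extend(bytes(end - len(self.data)))
        self.data[start:end] = blob
```

Replaying the example with only pages 0 and 1 present: writing 4285 and 3876 bytes reproduces the first two rows of Table 1; rewriting page 1 with a smaller blob keeps location 4285, while a blob larger than 3876 bytes moves it to offset 8161, the end of the file.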
Further, the programmable device writes the compressed data to the file in persistent storage through the POSIX file system interface.
Fig. 4 is a flowchart illustrating a data decompression method according to an exemplary embodiment of the present application.
In step S410, the configured programmable device reads compressed data from the file in persistent storage. This step may comprise: reading, by the programmable device, the compressed data from the file in persistent storage through a POSIX file system interface. Specifically, as shown in FIG. 2, when the database system reads data from a file in the file system, it calls the POSIX file system interface to read from the file system. In this process, the programmable device takes over the execution of the POSIX file system interface and reads the compressed data from the file system through that interface, one fixed-size page at a time, matching the page units requested by the database.
In particular, the step of reading, by the programmable device, compressed data from the file in persistent storage may include: retrieving, from a lookup table of the file, a location field and a size field corresponding to the compressed data, according to information about the location in the file that the POSIX file system interface needs to access and the unit used when the database accesses the file; and reading the compressed data from the file according to the location field and the size field. Since the information about the location in the file that the POSIX file system interface needs to access and the unit used by the database to access the file have been described in detail above, they are not described again. An example follows.
For example, when the database system reads an 8 KB data page at byte offset 16384 of a file through the POSIX file system interface, the programmable device first calculates the row number in the lookup table, PageID = ⌊16384/8192⌋ = 2, from the location in the file that the POSIX file system interface needs to access (K = 16384) and the unit used when the database accesses the file (M = 8192). It then retrieves the location field and the size field for the compressed data of that page from the lookup table using PageID = 2; according to Table 1 above, these are 8161 and 3958, respectively. The programmable device then reads 3958 bytes of compressed data starting at the location in the file indicated by the location field.
Thereafter, in step S420, the programmable device decompresses the read compressed data to obtain decompressed data. It does so by reversing the compression process, which recovers the original 8 KB data page.
The decompressed data is then provided to a database by the programmable device in step S430. In particular, the programmable device may provide the decompressed data to the database through the POSIX file system interface, i.e., return the original 8KB page of data to the upstream database, thereby completing the read operation.
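The read path of steps S410 to S430 can be sketched in Python as follows. This is an illustrative sketch, not the device's implementation: `zlib` stands in for the unspecified compression algorithm, `build_file` and `read_page` are hypothetical helpers (the first lays compressed pages out back to back, as in Table 1), and the 8192-byte page size is an assumption matching the example.

```python
import zlib
from typing import List, Tuple

M = 8192  # assumed page size used by the database


def build_file(pages: List[bytes]) -> Tuple[bytes, List[Tuple[int, int]]]:
    """Hypothetical helper: store compressed pages back to back with a lookup table."""
    body = bytearray()
    table: List[Tuple[int, int]] = []
    for page in pages:
        blob = zlib.compress(page)
        table.append((len(body), len(blob)))  # (location field, size field)
        body += blob
    return bytes(body), table


def read_page(body: bytes, table: List[Tuple[int, int]], k: int, m: int = M) -> bytes:
    """S410-S430: PageID -> (location, size) -> compressed bytes -> original page."""
    location, size = table[k // m]  # equation 1, then the table lookup
    if location == -1:
        raise KeyError("page has never been written")
    blob = body[location:location + size]  # read exactly `size` compressed bytes
    return zlib.decompress(blob)  # returned to the database as the original page
```

A read at byte offset 16384 resolves to PageID 2, fetches that page's compressed bytes, and returns the full 8192-byte page to the caller.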
Fig. 5 is a block diagram illustrating a data compression apparatus 500 according to an exemplary embodiment of the present application.
As shown in FIG. 5, the data compression apparatus 500 may include a programmable device 510. The programmable device 510 may be a field-programmable gate array (FPGA), a complex programmable logic device (CPLD), a generic array logic (GAL) device, or the like. After it is programmed according to the logic of the present application, it can be inserted directly into a PCIe slot of a conventional computer and communicate with devices such as the computer's central processing unit (CPU).
Programmable device 510 may be configured to receive data to be compressed from a database. In particular, the programmable device 510 may be configured to receive the data to be compressed from the database in predetermined units through the POSIX file system interface.
Specifically, as shown in fig. 2, when the database system performs a write operation on a file in the file system, it calls the POSIX file system interface to write data to the file system. In this process, the programmable device 510 may take over execution of the POSIX file system interface and obtain the data to be compressed from the database system through that interface in fixed-size pages.
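Receiving data "in fixed-size pages" amounts to splitting each intercepted write into page-sized units, each keyed by its PageID. The sketch below assumes that the database issues page-aligned writes whose length is a multiple of the page size. The function name `split_into_pages` and the 8192-byte page size are hypothetical choices for illustration.

```python
from typing import Iterator, Tuple

PAGE_SIZE = 8192  # hypothetical fixed page size used by the database


def split_into_pages(offset: int, buf: bytes,
                     page_size: int = PAGE_SIZE) -> Iterator[Tuple[int, bytes]]:
    """Yield (PageID, page bytes) for a page-aligned write of buf at byte offset K."""
    if offset % page_size or len(buf) % page_size:
        raise ValueError("write is not aligned to the page size")
    first = offset // page_size
    for i in range(len(buf) // page_size):
        # Each page is compressed and placed independently, keyed by its PageID.
        yield first + i, buf[i * page_size:(i + 1) * page_size]
```

For instance, a 16KB write at offset 16384 yields PageID 2 and PageID 3. Each of these pages then gets its own lookup-table row.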
Programmable device 510 may be configured to compress the data to be compressed to obtain compressed data and write the compressed data to a file in persistent storage.
Specifically, the programmable device 510 may be configured to write the compressed data to a file in persistent storage as follows. It first retrieves, from a lookup table of the file, a location field and a size field corresponding to the data to be compressed. This lookup uses the position in the file that the POSIX file system interface needs to access and the unit the database uses when accessing the file. It then writes the compressed data into the file according to the location field and the size field. This information, and how it is used to retrieve the location field and the size field from the lookup table, has been described in detail above with reference to figs. 3 and 4, so it is not repeated here.
The programmable device 510 may be configured to write the compressed data into the file according to the location field and the size field as follows. If the values of the location field and the size field are initial values, it sets the location field to the end-of-file position, sets the size field to the size of the compressed data, and writes the compressed data starting from the position in the file indicated by the location field. If they are not initial values, it determines the final values of the size field and the location field by comparing the value of the size field with the size of the compressed data, and then writes the compressed data starting from the position in the file indicated by the location field.
Specifically, the programmable device 510 may be configured to determine the final values of the size field and the location field as follows. If the value of the size field is greater than or equal to the size of the compressed data, it updates the size field to the size of the compressed data, leaving the location field unchanged. If the value of the size field is smaller than the size of the compressed data, it updates the location field to the end-of-file position and the size field to the size of the compressed data. This has already been described in detail above with reference to fig. 4, so it is not repeated here.
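The placement rule for the location and size fields can be written as a small pure function. This is a sketch under stated assumptions. The patent does not specify the "initial values", so the sentinel `INITIAL = (-1, -1)` is hypothetical, and so is the name `place_page`. The function returns the final (location, size) pair, and the compressed data is then written at the returned location.

```python
from typing import Tuple

INITIAL = (-1, -1)  # hypothetical "initial values" marking a page never written


def place_page(entry: Tuple[int, int], compressed_size: int,
               file_end: int) -> Tuple[int, int]:
    """Return the final (location, size) fields for a newly compressed page."""
    location, size = entry
    if entry == INITIAL:
        # First write of this page: append at the end of the file.
        return file_end, compressed_size
    if size >= compressed_size:
        # New data fits in the old slot: overwrite in place, update the size field.
        return location, compressed_size
    # New data is larger than the old slot: relocate to the end of the file.
    return file_end, compressed_size
```

Under this rule, the size field always records the size of the latest compressed data. If a page shrinks, the unused tail of its old slot is therefore forgotten, and a later, larger write of the same page moves it to the end of the file.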
Further, when programmable device 510 writes the compressed data to the file in persistent storage, programmable device 510 may write the compressed data to the file through a POSIX file system interface.
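At the file system level, writing "starting from the location indicated by the location field" corresponds to a positioned write, such as POSIX `pwrite()`. The sketch below uses Python's `os.pwrite` wrapper, which is Unix-only. The helper name `write_compressed` and the scratch file are illustrative assumptions. The patent only states that the write goes through the POSIX file system interface.

```python
import os
import tempfile


def write_compressed(fd: int, location: int, compressed: bytes) -> None:
    """Write compressed bytes at a file offset with POSIX pwrite() (Unix only)."""
    written = 0
    while written < len(compressed):
        # pwrite() may write fewer bytes than requested, so keep going until done.
        n = os.pwrite(fd, compressed[written:], location + written)
        if n == 0:
            raise OSError("pwrite() made no progress")
        written += n


# Usage: write 21 bytes at the location field's value (8161) in a scratch file.
with tempfile.TemporaryFile() as tmp:
    write_compressed(tmp.fileno(), 8161, b"compressed page bytes")
    stored = os.pread(tmp.fileno(), 21, 8161)
    file_size = os.fstat(tmp.fileno()).st_size
```

A positioned write leaves the file's current offset untouched. This matters when the same descriptor serves several pages whose locations are scattered throughout the file.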
Fig. 6 is a block diagram illustrating a data decompression apparatus 600 according to an exemplary embodiment of the present application.
As shown in fig. 6, the data decompression apparatus 600 may include a programmable device 610. The programmable device 610 may be the same as or similar to the programmable device 510 in fig. 5. That is, it may be a field-programmable gate array (FPGA), a complex programmable logic device (CPLD), a generic array logic (GAL) device, or the like. Once programmed according to the logic of the present application, such a device can be plugged directly into a PCIe slot of a conventional computer and communicate with components such as the computer's central processing unit (CPU).
The programmable device 610 may be configured to read compressed data from a file in persistent storage. Specifically, as shown in fig. 2, when the database system reads data from a file in the file system, it calls the POSIX file system interface to perform a read operation on the file system. During this process, the programmable device 610 takes over execution of the POSIX file system interface and obtains the compressed data from the file system through that interface in fixed-size pages.
Specifically, the programmable device 610 may be configured to read the compressed data from a file in persistent storage as follows. It first retrieves a location field and a size field corresponding to the compressed data from a lookup table of the file. This lookup uses the position in the file that the POSIX file system interface needs to access and the unit the database uses when accessing the file. It then reads the compressed data from the file according to the location field and the size field. This has already been described in detail above with reference to fig. 4, so it is not repeated here.
The programmable device 610 may be configured to decompress the read compressed data by reversing the compression process, obtaining the decompressed data, i.e., the original data page.
Programmable device 610 may be configured to provide the decompressed data to a database. In particular, programmable device 610 may provide the decompressed data to the database through the POSIX file system interface, i.e., return the original data page to the upstream database, thereby completing the read operation.
In the above description, the programmable device may be configured with a program that performs the above-described methods of the present application. Further, the data compression apparatus 500 and the data decompression apparatus 600 are described separately with reference to figs. 5 and 6, but they may also be implemented by a single programmable device. In other words, one programmable device may be configured to perform both the functions of the programmable device 510 in the data compression apparatus 500 of fig. 5 and the functions of the programmable device 610 in the data decompression apparatus 600 of fig. 6. This single device may be integrated into the programmable device-based compression engine shown in fig. 2. Alternatively, the data compression apparatus 500 of fig. 5 and the data decompression apparatus 600 of fig. 6 may each be integrated separately into the programmable device-based compression engine shown in fig. 2.
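A single device that performs both the write (compress) and read (decompress) paths can be sketched as one object that owns the lookup table. This sketch rests on several assumptions that the patent does not specify: zlib as the codec, an in-memory dict as the lookup table (the table's on-disk format is not addressed), and single-page, page-aligned accesses. The class `CompressionEngine` and its methods are hypothetical names.

```python
import os
import tempfile
from typing import Dict, Tuple
import zlib

PAGE_SIZE = 8192  # hypothetical database page size M


class CompressionEngine:
    """Sketch: one device handling both the compress-on-write and decompress-on-read paths."""

    def __init__(self, fd: int, page_size: int = PAGE_SIZE):
        self.fd = fd
        self.page_size = page_size
        self.table: Dict[int, Tuple[int, int]] = {}  # PageID -> (location, size)

    def write_page(self, offset: int, page: bytes) -> None:
        page_id = offset // self.page_size
        compressed = zlib.compress(page)
        entry = self.table.get(page_id)
        if entry is None or entry[1] < len(compressed):
            # First write, or too big for the old slot: append at end of file.
            location = os.fstat(self.fd).st_size
        else:
            location = entry[0]  # fits: overwrite in place
        self.table[page_id] = (location, len(compressed))
        if os.pwrite(self.fd, compressed, location) != len(compressed):
            raise OSError("short write")

    def read_page(self, offset: int) -> bytes:
        location, size = self.table[offset // self.page_size]
        return zlib.decompress(os.pread(self.fd, size, location))


# Usage: two pages, then an incompressible rewrite of page 0 that must relocate.
with tempfile.TemporaryFile() as tmp:
    engine = CompressionEngine(tmp.fileno())
    page0 = b"a" * PAGE_SIZE          # compresses to a few dozen bytes
    page1 = bytes(range(256)) * 32    # another 8KB page
    engine.write_page(0, page0)
    engine.write_page(PAGE_SIZE, page1)
    first_loc = engine.table[0][0]
    new_page0 = os.urandom(PAGE_SIZE)  # random bytes barely compress
    engine.write_page(0, new_page0)
    moved_loc = engine.table[0][0]
    ok0 = engine.read_page(0) == new_page0
    ok1 = engine.read_page(PAGE_SIZE) == page1
```

Because both paths share one lookup table, the read path always sees the location and size most recently recorded by the write path. This is one practical argument for combining the two functions in a single device.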
The data compression and decompression methods and apparatuses according to the exemplary embodiments of the present application described above use a programmable device to perform data compression and decompression. They can therefore save storage space on the persistent storage device without consuming the CPU computing resources used by the database system, so the database system retains its full performance. In addition, the database system's calls to the POSIX file system interface can be redirected so that they are taken over by the programmable device. After the programmable device compresses the data (when writing a file) or decompresses it (when reading a file), it reads and writes the files in the downstream file system through the POSIX file system interface. The methods and apparatuses can thus release CPU resources without any modification to either the upstream database system or the downstream file system, and they make it easy to change the compression algorithm. In other words, a programmable device can losslessly compress and decompress all data of any disk-based database transparently, without either the database system or the file system being aware of it.
In addition, the data compression and decompression methods and apparatuses according to the exemplary embodiments of the present application are applicable to any database that reads and writes data through the POSIX file system interface and to any POSIX file system. Because they rely only on the POSIX file system interface, they are also portable across different operating systems.
While exemplary embodiments of the invention have been described above, it should be understood that the above description is illustrative only and not exhaustive, and that the invention is not limited to the exemplary embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. Therefore, the protection scope of the present invention should be subject to the scope of the claims.

Claims (10)

1. A data compression method, the data compression method comprising:
receiving, by the configured programmable device, data to be compressed from a database;
compressing the data to be compressed by the programmable device to obtain compressed data;
writing, by the programmable device, the compressed data to a file in persistent storage.
2. A method of data compression as claimed in claim 1 in which the step of receiving the data to be compressed by the programmable device from a database comprises:
receiving, by the programmable device, the data to be compressed from the database in a predetermined unit through a POSIX file system interface.
3. A method of data compression as claimed in claim 1 in which the step of writing the compressed data by the programmable device to the file in persistent storage comprises:
writing, by the programmable device, the compressed data into the file through a POSIX file system interface.
4. A method of data compression as claimed in claim 1 in which the step of writing the compressed data by the programmable device to a file in persistent storage comprises:
retrieving, from a lookup table of the file, a location field and a size field corresponding to the data to be compressed, according to information about the position in the file that a POSIX file system interface needs to access and the unit used when the database accesses the file;
writing the compressed data into the file according to the location field and the size field.
5. A method of data compression as claimed in claim 4 in which writing the compressed data to the file in dependence on the location field and the size field comprises:
setting the value of the location field to the location of the end of the file, setting the value of the size field to the size of the compressed data, and writing the compressed data starting from the location in the file indicated by the value of the location field, if the values of the location field and the size field are initial values;
if the values of the location field and the size field are not initial values, final values of the size field and the location field are determined according to a comparison result between the value of the size field and the size of the compressed data, and the compressed data is written starting from a location in the file indicated by the value of the location field.
6. The data compression method of claim 5, wherein determining final values of the size field and the location field according to a comparison result between the value of the size field and the size of the compressed data comprises:
updating the value of the size field to the size of the compressed data if the value of the size field is greater than or equal to the size of the compressed data;
updating the value of the location field to the position of the end of the file, and updating the value of the size field to the size of the compressed data, if the value of the size field is smaller than the size of the compressed data.
7. A method of data decompression, the method of data decompression comprising:
reading, by the configured programmable device, the compressed data from the file in the persistent storage;
decompressing, by the programmable device, the read compressed data to obtain decompressed data;
providing, by the programmable device, the decompressed data to a database.
8. The data decompression method of claim 7, wherein reading, by the programmable device, compressed data from the file in persistent storage comprises:
reading, by the programmable device, the compressed data from the file in persistent storage through a POSIX file system interface.
9. An apparatus for data compression, the apparatus comprising: a programmable device configured to:
receive data to be compressed from a database;
compress the data to be compressed to obtain compressed data; and
write the compressed data to a file in persistent storage.
10. A data decompression apparatus, the data decompression apparatus comprising: a programmable device configured to:
read compressed data from a file in persistent storage;
decompress the read compressed data to obtain decompressed data; and
provide the decompressed data to a database.
CN202110350481.9A 2021-03-31 2021-03-31 Data compression and decompression method and device Pending CN112988685A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110350481.9A CN112988685A (en) 2021-03-31 2021-03-31 Data compression and decompression method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110350481.9A CN112988685A (en) 2021-03-31 2021-03-31 Data compression and decompression method and device

Publications (1)

Publication Number Publication Date
CN112988685A true CN112988685A (en) 2021-06-18

Family

ID=76338739

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110350481.9A Pending CN112988685A (en) 2021-03-31 2021-03-31 Data compression and decompression method and device

Country Status (1)

Country Link
CN (1) CN112988685A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination