CN117111845A - Data compression method, device, equipment and storage medium
- Publication number
- CN117111845A (application CN202311053461.0A)
- Authority
- CN
- China
- Prior art keywords
- compression
- target
- map output
- intermediate data
- format
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G: PHYSICS
- G06: COMPUTING; CALCULATING OR COUNTING
- G06F: ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601: Interfaces specially adapted for storage systems
- G06F3/0602: Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061: Improving I/O performance
- G06F3/0628: Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638: Organizing or formatting or addressing of data
- G06F3/064: Management of blocks
- G06F3/0655: Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
- G06F3/0668: Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/067: Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The present disclosure provides a data compression method, apparatus, device, and storage medium. The method includes: in response to a compression request for target intermediate data output in the Map stage, acquiring the compression parameters corresponding to the target intermediate data from a pre-configured file and taking them as first compression parameters; determining a target Map output compression format from the multiple Map output compression formats configured in the first compression parameters; and compressing the target intermediate data based on the target Map output compression format. Because the compression parameters are read from the pre-configured file and the target Map output compression format is chosen from the several formats they configure, the target intermediate data can be compressed in more than one way, which improves the user experience.
Description
Technical Field
The present disclosure relates to the field of data processing, and in particular, to a data compression method, apparatus, device, and storage medium.
Background
With the advent of the big data era, data volumes have grown rapidly, and MapReduce technology has been widely used and developed.
In MapReduce, compressing the intermediate data output in the Map stage improves network transmission efficiency.
However, the intermediate data output in the Map stage can currently be compressed in only a single way, which cannot meet users' needs. How to offer more ways of compressing the intermediate data output in the Map stage, and thereby improve the user experience, is therefore a technical problem to be solved.
Disclosure of Invention
In order to solve the above technical problems, an embodiment of the present disclosure provides a data compression method.
In a first aspect, the present disclosure provides a data compression method, the method comprising:
in response to a compression request for target intermediate data output in the Map stage, acquiring compression parameters corresponding to the target intermediate data from a pre-configured file, and taking the compression parameters corresponding to the target intermediate data as first compression parameters, wherein a plurality of Map output compression formats are configured in the first compression parameters;
determining a target Map output compression format from the plurality of Map output compression formats configured in the first compression parameters; and
compressing the target intermediate data based on the target Map output compression format to obtain compressed target intermediate data, wherein the compressed target intermediate data is to be transmitted to the Reduce stage.
In an optional implementation manner, after compressing the target intermediate data based on the target Map output compression format to obtain compressed target intermediate data, the method further includes:
in response to a compression request for target result data output by the Reduce stage, acquiring compression parameters corresponding to the target result data from the pre-configured file, and taking the compression parameters corresponding to the target result data as second compression parameters, wherein the target result data is obtained by processing the compressed target intermediate data in the Reduce stage, and a plurality of Reduce output compression formats are configured in the second compression parameters;
determining a target Reduce output compression format from the plurality of Reduce output compression formats configured in the second compression parameters; and
compressing the target result data based on the target Reduce output compression format to obtain compressed target processing result data, wherein the compressed target processing result data is to be transmitted to a storage system.
In an optional implementation manner, before determining the target Map output compression format from the multiple Map output compression formats configured in the first compression parameter, the method further includes:
displaying a plurality of Map output compression formats configured in the first compression parameters;
accordingly, the determining the target Map output compression format from the multiple Map output compression formats configured in the first compression parameter includes:
in response to a selection operation on the plurality of Map output compression formats configured in the first compression parameters, taking the Map output compression format corresponding to the selection operation as the target Map output compression format.
In an optional implementation manner, the determining the target Map output compression format from the multiple Map output compression formats configured in the first compression parameter includes:
determining the target Map output compression format from the plurality of Map output compression formats configured in the first compression parameters according to the arrangement order of those formats.
In an optional implementation manner, the determining the target Map output compression format from the multiple Map output compression formats configured in the first compression parameter includes:
determining the target Map output compression format from the plurality of Map output compression formats configured in the first compression parameters based on a preset order.
In an alternative embodiment, the plurality of Map output compression formats includes the Snappy format and the LZO format.
In an alternative embodiment, the plurality of Reduce output compression formats includes the LZO format and the Bzip2 format.
In a second aspect, the present disclosure provides a data compression apparatus, the apparatus comprising:
a first acquisition module, configured to, in response to a compression request for target intermediate data output in the Map stage, acquire compression parameters corresponding to the target intermediate data from a pre-configured file, and take the compression parameters corresponding to the target intermediate data as first compression parameters, wherein a plurality of Map output compression formats are configured in the first compression parameters;
a first determining module, configured to determine a target Map output compression format from the plurality of Map output compression formats configured in the first compression parameters; and
a first compression module, configured to compress the target intermediate data based on the target Map output compression format to obtain compressed target intermediate data, wherein the compressed target intermediate data is to be transmitted to the Reduce stage.
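The three modules above can be pictured as three pluggable callables wired together by one request handler. The following is only an illustrative sketch: the class name, field names, and the use of `zlib` as a stand-in codec are assumptions, not part of the disclosure.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DataCompressionApparatus:
    """Hypothetical wiring of the three modules of the second aspect."""
    acquire: Callable[[], List[str]]         # first acquisition module
    determine: Callable[[List[str]], str]    # first determining module
    compress: Callable[[bytes, str], bytes]  # first compression module

    def handle_compression_request(self, target_intermediate_data: bytes) -> bytes:
        # Acquire the configured formats, pick one, then compress with it.
        formats = self.acquire()
        target_format = self.determine(formats)
        return self.compress(target_intermediate_data, target_format)


# zlib is only a stand-in codec so that the sketch runs with the standard library.
apparatus = DataCompressionApparatus(
    acquire=lambda: ["zlib"],
    determine=lambda formats: formats[0],
    compress=lambda data, fmt: {"zlib": zlib.compress}[fmt](data),
)
blob = apparatus.handle_compression_request(b"key\t1\n" * 100)
```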
In a third aspect, the present disclosure provides a computer-readable storage medium storing instructions which, when run on a terminal device, cause the terminal device to implement the above method.
In a fourth aspect, the present disclosure provides a data compression device comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the above method when executing the computer program.
In a fifth aspect, the present disclosure provides a computer program product comprising a computer program/instructions which, when executed by a processor, implement the above method.
Compared with the prior art, the technical scheme provided by the embodiment of the disclosure has at least the following advantages:
The embodiment of the disclosure provides a data compression method. In response to a compression request for target intermediate data output in the Map stage, compression parameters corresponding to the target intermediate data are acquired from a pre-configured file and taken as first compression parameters, in which a plurality of Map output compression formats are configured. A target Map output compression format is then determined from those formats, and the target intermediate data is compressed based on it to obtain compressed target intermediate data, which is to be transmitted to the Reduce stage. Because the target Map output compression format is chosen from the several formats configured in the pre-configured file, the target intermediate data can be compressed in more than one way, which improves the user experience.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
In order to more clearly illustrate the embodiments of the present disclosure or the solutions in the prior art, the drawings that are required for the description of the embodiments or the prior art will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a flowchart of a data compression method provided by an embodiment of the present disclosure;
FIG. 2 is a flowchart of another data compression method provided by an embodiment of the present disclosure;
FIG. 3 is a flowchart of yet another data compression method provided by an embodiment of the present disclosure;
FIG. 4 is a flowchart of yet another data compression method provided by an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of a data compression device provided by an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of a data compression device provided by an embodiment of the present disclosure.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, a further description of aspects of the present disclosure will be provided below. It should be noted that, without conflict, the embodiments of the present disclosure and features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced otherwise than as described herein; it will be apparent that the embodiments in the specification are only some, but not all, embodiments of the disclosure.
To offer more ways of compressing the target intermediate data, and thereby improve the user experience, the embodiment of the disclosure provides a data compression method.
Specifically, in response to a compression request for target intermediate data output in the Map stage, compression parameters corresponding to the target intermediate data are acquired from a pre-configured file and taken as first compression parameters, in which multiple Map output compression formats are configured. A target Map output compression format is then determined from those formats, and the target intermediate data is compressed based on it to obtain compressed target intermediate data, which is to be transmitted to the Reduce stage. Because the target Map output compression format is chosen from the several formats configured in the pre-configured file, the target intermediate data can be compressed in more than one way, which improves the user experience.
Based on this, an embodiment of the present disclosure provides a data compression method, referring to fig. 1, which is a flowchart of the data compression method provided by the embodiment of the present disclosure, where the method includes:
S101: In response to a compression request for target intermediate data output in the Map stage, acquire the compression parameters corresponding to the target intermediate data from a pre-configured file, and take them as first compression parameters.
Wherein, a plurality of Map output compression formats are configured in the first compression parameters.
The data compression method provided by the embodiment of the disclosure can be applied to Hadoop, a distributed computing framework.
In the embodiment of the disclosure, multiple Map output compression formats may be configured in the first compression parameter, so that the target intermediate data may be compressed based on the multiple Map output compression formats.
In this embodiment of the present disclosure, the target intermediate data may be any intermediate data output in the Map stage. Specifically, after a compression request for the target intermediate data output in the Map stage is received, the compression parameters corresponding to the target intermediate data are acquired from the pre-configured file, so that the target intermediate data can be compressed based on the multiple Map output compression formats in those compression parameters.
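As a rough illustration of step S101, the sketch below reads the compression parameters for one stage from a pre-configured file. The disclosure does not specify the file layout, so the JSON structure, the key names (`map_output`, `reduce_output`, `formats`), and the function name are all assumptions made for the example.

```python
import json

# Hypothetical content of the pre-configured file; the layout is an assumption.
PRECONFIGURED_FILE = """
{
  "map_output":    {"formats": ["snappy", "lzo"]},
  "reduce_output": {"formats": ["lzo", "bzip2"]}
}
"""


def get_compression_parameters(config_text: str, stage: str) -> dict:
    """Return the compression parameters configured for one stage."""
    config = json.loads(config_text)
    params = config[stage]
    if not params.get("formats"):
        raise ValueError(f"no compression formats configured for {stage}")
    return params


# Parameters acquired for a Map-stage compression request
# (the "first compression parameters" in the text).
first_compression_parameters = get_compression_parameters(
    PRECONFIGURED_FILE, "map_output"
)
print(first_compression_parameters["formats"])
```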
In an alternative embodiment, the plurality of Map output compression formats may be the Snappy format and the LZO format.
Specifically, Snappy is a high-speed compression and decompression format that provides fast and efficient data compression and decompression.
LZO is a data compression algorithm that focuses on compression and decompression speed.
In the embodiment of the present disclosure, the multiple Map output compression formats may also include other compression formats, which is not limited herein.
After the Map stage finishes, the compressed target intermediate data is transmitted over the network to each node of the Reduce stage, and compressing the target intermediate data output by the Map stage adds central processing unit (CPU) load. To keep that added CPU load low, the multiple Map output compression formats may be the Snappy format and the LZO format, both of which compress and decompress quickly.
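For orientation, stock Hadoop MapReduce turns on Map output compression through two job properties, each of which takes a single codec; the list of several candidate formats is what this disclosure adds on top. The sketch below maps a chosen format name to those properties. The property names and codec class names are the ones used by Hadoop and the separately installed hadoop-lzo library; the function name and the short format names are assumptions for illustration.

```python
# Hadoop MapReduce job properties controlling Map output compression.
MAP_COMPRESS_FLAG = "mapreduce.map.output.compress"
MAP_COMPRESS_CODEC = "mapreduce.map.output.compress.codec"

# SnappyCodec ships with Hadoop; LzoCodec comes from the hadoop-lzo library,
# which has to be installed separately.
CODEC_CLASSES = {
    "snappy": "org.apache.hadoop.io.compress.SnappyCodec",
    "lzo": "com.hadoop.compression.lzo.LzoCodec",
}


def map_output_properties(target_format: str) -> dict:
    """Translate a chosen target Map output format into job properties."""
    return {
        MAP_COMPRESS_FLAG: "true",
        MAP_COMPRESS_CODEC: CODEC_CLASSES[target_format],
    }


for name, value in map_output_properties("snappy").items():
    print(f"{name}={value}")
```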
S102: Determine a target Map output compression format from the plurality of Map output compression formats configured in the first compression parameters.
The target Map output compression format may be any one of a plurality of Map output compression formats configured in the first compression parameter, and the embodiments of the present disclosure are not limited in any way.
Specifically, the target Map output compression format may be determined in any of the following ways.
In an alternative implementation manner, in response to a compression request for target intermediate data output in the Map stage, after the compression parameters corresponding to the target intermediate data are acquired from the pre-configured file, the multiple Map output compression formats configured in the first compression parameters are displayed; then, in response to a selection operation on those formats, the Map output compression format corresponding to the selection operation is taken as the target Map output compression format.
In other words, once the compression request is received and the first compression parameters are acquired from the pre-configured file, the Map output compression formats they configure are shown to the user, and whichever format the user selects becomes the target Map output compression format.
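The display-then-select flow described above might look like the following sketch. How the formats are displayed and how the selection operation arrives are not specified in the disclosure, so the numbered menu and both function names are assumptions.

```python
def display_formats(configured_formats):
    """Build the numbered menu shown to the user."""
    return [f"{i}. {fmt}" for i, fmt in enumerate(configured_formats, start=1)]


def select_by_operation(configured_formats, selected_number):
    """Map the user's selection (a menu number) to the target format."""
    if not 1 <= selected_number <= len(configured_formats):
        raise ValueError("selection is not one of the displayed formats")
    return configured_formats[selected_number - 1]


formats = ["snappy", "lzo"]          # taken from the first compression parameters
menu = display_formats(formats)      # what would be displayed
target = select_by_operation(formats, 2)  # the user picks entry 2
print(menu, target)
```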
In another alternative embodiment, the target Map output compression format is determined from the plurality of Map output compression formats configured in the first compression parameter according to the arrangement order of the plurality of Map output compression formats configured in the first compression parameter.
The arrangement order of the multiple Map output compression formats may be the order in which the formats were added.
In the embodiment of the disclosure, according to the arrangement order of the multiple Map output compression formats configured in the first compression parameters, the Map output compression format that was added first is used as the target Map output compression format.
In yet another alternative embodiment, the target Map output compression format is determined from the plurality of Map output compression formats configured in the first compression parameter based on a preset order.
The preset order may be set as needed, which is not limited herein.
In the embodiment of the disclosure, among the Map output compression formats configured in the compression parameters corresponding to the target intermediate data, the one that ranks first in the preset order may be used as the target Map output compression format.
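The two non-interactive strategies above (arrangement order and preset order) can be sketched as follows. The function names and the representation of the preset order as a ranked list of format names are assumptions for illustration.

```python
def select_by_arrangement_order(configured_formats):
    """Pick the format that was added first to the compression parameters."""
    if not configured_formats:
        raise ValueError("no Map output compression formats configured")
    return configured_formats[0]


def select_by_preset_order(configured_formats, preset_order):
    """Pick the configured format that ranks highest in the preset order."""
    for fmt in preset_order:
        if fmt in configured_formats:
            return fmt
    raise ValueError("none of the configured formats appears in the preset order")


configured = ["snappy", "lzo"]  # order in which the formats were added
print(select_by_arrangement_order(configured))
print(select_by_preset_order(configured, ["lzo", "snappy"]))
```

With the same configured list, the two strategies can disagree: the first returns the earliest-added format, the second follows whatever ranking the preset order expresses.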
For ease of understanding, consider the following scenario:
Assume that the plurality of Map output compression formats includes the Snappy format and the LZO format. After a compression request for target intermediate data output in the Map stage is received, the compression parameters corresponding to the target intermediate data are acquired from the pre-configured file, and a prompt asking whether to designate a compression mode for the target intermediate data is displayed. If it is determined that the compression mode is to be designated, the Snappy format is taken as the target Map output compression format; if it is determined that the compression mode is not to be designated, the LZO format is taken as the target Map output compression format.
It should be noted that the above ways of determining the target Map output compression format are described only as examples to aid understanding; the embodiments of the present disclosure do not limit how the target Map output compression format is determined.
S103: Compress the target intermediate data based on the target Map output compression format to obtain compressed target intermediate data.
The compressed target intermediate data is to be transmitted to the Reduce stage.
In the embodiment of the disclosure, after the target Map output compression format is determined, the target intermediate data is compressed based on that format, which improves network transmission efficiency.
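Step S103 can be sketched as a lookup in a codec table followed by a compress call. Python's standard library has no Snappy or LZO bindings, so `zlib` and `bz2` are used here purely as stand-in codecs; the tab-separated record encoding and the function names are likewise assumptions.

```python
import bz2
import zlib

# Stand-in codecs: the text names Snappy and LZO, which are not in the
# Python standard library, so zlib and bz2 take their place in this sketch.
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
}


def compress_intermediate(records, target_format):
    """Serialize (key, value) string pairs and compress them for the shuffle."""
    compress, _ = CODECS[target_format]
    payload = "\n".join(f"{k}\t{v}" for k, v in records).encode("utf-8")
    return compress(payload)


def decompress_intermediate(blob, target_format):
    """Inverse operation, as the Reduce side would perform it."""
    _, decompress = CODECS[target_format]
    text = decompress(blob).decode("utf-8")
    return [tuple(line.split("\t")) for line in text.split("\n")]


records = [("word", "1")] * 500
blob = compress_intermediate(records, "zlib")
print(len(blob), "compressed bytes")
```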
In the data compression method provided by the embodiment of the disclosure, in response to a compression request for target intermediate data output in the Map stage, compression parameters corresponding to the target intermediate data are acquired from a pre-configured file and taken as first compression parameters, in which a plurality of Map output compression formats are configured. A target Map output compression format is then determined from those formats, and the target intermediate data is compressed based on it to obtain compressed target intermediate data, which is to be transmitted to the Reduce stage. Because the target Map output compression format is chosen from the several formats configured in the pre-configured file, the target intermediate data can be compressed in more than one way, which improves the user experience.
On the basis of the foregoing embodiments, the embodiments of the present disclosure further provide a data compression method, referring to fig. 2, which is a flowchart of another data compression method provided by the embodiments of the present disclosure, where the method includes:
S201: In response to a compression request for target intermediate data output in the Map stage, acquire the compression parameters corresponding to the target intermediate data from a pre-configured file, and take them as first compression parameters.
Wherein, a plurality of Map output compression formats are configured in the first compression parameters.
S202: Determine a target Map output compression format from the plurality of Map output compression formats configured in the first compression parameters.
S203: Compress the target intermediate data based on the target Map output compression format to obtain compressed target intermediate data.
The compressed target intermediate data is to be transmitted to the Reduce stage.
It should be noted that steps S201 to S203 are the same as steps S101 to S103 described above; refer to the description of steps S101 to S103 for details.
S204: In response to a compression request for target result data output by the Reduce stage, acquire the compression parameters corresponding to the target result data from the pre-configured file, and take them as second compression parameters.
The target result data is obtained by processing the compressed target intermediate data in the Reduce stage, and a plurality of Reduce output compression formats are configured in the second compression parameters.
In the embodiment of the disclosure, the multiple Reduce output compression formats of the second compression parameter configuration may be the same as the multiple Map output compression formats of the first compression parameter configuration, or may be different from the multiple Map output compression formats of the first compression parameter configuration.
In the embodiment of the disclosure, multiple Reduce output compression formats may be configured in the second compression parameter, so that the target result data may be compressed based on the multiple Reduce output compression formats.
In this embodiment of the present disclosure, the target result data may be any result data output by the Reduce stage. Specifically, after a compression request for the target result data output by the Reduce stage is received, the compression parameters corresponding to the target result data are acquired from the pre-configured file, so that the target result data can be compressed based on the multiple Reduce output compression formats in those compression parameters.
In an alternative embodiment, if the target result data is to be fed into a subsequent MapReduce job, the multiple Reduce output compression formats may be the LZO format and the Bzip2 format.
Specifically, LZO is a data compression algorithm that focuses on compression and decompression speed.
Bzip2 is a data compression algorithm that achieves a higher compression ratio than other compression algorithms (e.g., LZO) but compresses and decompresses more slowly.
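The ratio-versus-speed trade-off described above can be observed with a small measurement. LZO is not available in the Python standard library, so `zlib` (Deflate) stands in as the faster codec and `bz2` represents Bzip2; the sample data and function name are assumptions, and the actual numbers depend on the data and the machine.

```python
import bz2
import time
import zlib


def measure(name, compress, decompress, data):
    """Compress once and report compression ratio and elapsed time."""
    start = time.perf_counter()
    blob = compress(data)
    elapsed = time.perf_counter() - start
    if decompress(blob) != data:
        raise RuntimeError(f"{name} did not round-trip")
    return {"codec": name, "ratio": len(data) / len(blob), "seconds": elapsed}


# Synthetic, repetitive key/value records standing in for Reduce output.
sample = b"".join(b"user%05d\tclick\t%d\n" % (i % 500, i) for i in range(20000))

results = [
    measure("zlib (Deflate, stand-in for a fast codec)", zlib.compress, zlib.decompress, sample),
    measure("bz2 (Bzip2)", bz2.compress, bz2.decompress, sample),
]
for row in results:
    print(f"{row['codec']}: ratio {row['ratio']:.1f}, {row['seconds'] * 1000:.1f} ms")
```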
In the embodiment of the disclosure, if the target result data is to be fed into a subsequent MapReduce job, whether the compressed target processing result data can be split into input splits must be considered, so the multiple Reduce output compression formats may be the LZO format and the Bzip2 format.
In another alternative embodiment, if the target result data is not to be fed into a subsequent MapReduce job, the multiple Reduce output compression formats may be the Bzip2 format and the Gzip format.
Specifically, Gzip is a data compression format that uses the lossless Deflate algorithm for data compression.
In the embodiment of the present disclosure, the multiple Reduce output compression formats may also include other compression formats, and the embodiment of the present disclosure is not limited in any way herein.
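The two alternatives above amount to choosing the candidate Reduce output formats according to whether the result feeds another MapReduce job. A minimal sketch, with the function name and the boolean flag as assumptions:

```python
def reduce_output_candidates(feeds_next_mapreduce):
    """Return the candidate Reduce output formats for the two cases in the text."""
    if feeds_next_mapreduce:
        # The output will be read by another job, so splitting matters.
        return ["lzo", "bzip2"]
    # Final output only needs to be stored.
    return ["bzip2", "gzip"]


print(reduce_output_candidates(True))
print(reduce_output_candidates(False))
```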
S205: Determine a target Reduce output compression format from the plurality of Reduce output compression formats configured in the second compression parameters.
Specifically, the target Reduce output compression format is determined in a manner similar to the target Map output compression format in the above embodiment; refer to that description for details, which are not repeated here.
S206: Compress the target result data based on the target Reduce output compression format to obtain compressed target processing result data.
The target processing result data is used for being transmitted to the storage system.
In an embodiment of the present disclosure, the storage system may be the Hadoop Distributed File System (HDFS), a distributed storage system.
In the embodiment of the disclosure, after the target Reduce output compression format is determined, the compression of the target result data is realized based on the target Reduce output compression format, so that the storage space of the storage system is saved.
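As a rough illustration of step S206, the sketch below compresses stand-in result data with whichever target format was chosen. It is not the disclosed implementation: the names `CODECS`, `compress_result` and `decompress_result` are hypothetical, and only Bzip2 and Gzip are covered because they ship with the Python standard library (lzo would require a third-party package).

```python
import bz2
import gzip

# Codecs available in the standard library, keyed by format name.
# lzo is deliberately absent: it is not part of the standard library.
CODECS = {
    "bzip2": (bz2.compress, bz2.decompress),
    "gzip": (gzip.compress, gzip.decompress),
}


def compress_result(data: bytes, target_format: str) -> bytes:
    """Compress Reduce output with the chosen target format."""
    if target_format not in CODECS:
        raise ValueError(f"unsupported format: {target_format}")
    compress, _ = CODECS[target_format]
    return compress(data)


def decompress_result(blob: bytes, target_format: str) -> bytes:
    """Reverse compress_result for the same target format."""
    if target_format not in CODECS:
        raise ValueError(f"unsupported format: {target_format}")
    _, decompress = CODECS[target_format]
    return decompress(blob)


result = b"k1\t42\n" * 1000  # stand-in for target result data
packed = compress_result(result, "bzip2")
restored = decompress_result(packed, "bzip2")
print(len(result), "bytes before,", len(packed), "bytes after")
print("round trip ok:", restored == result)
```

Keeping the codecs in a lookup table means a newly configured format only needs one extra entry, which matches the idea of choosing among several configured formats.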
In the data compression method provided by the embodiment of the disclosure, in response to a compression request for target intermediate data output in the Map stage, compression parameters corresponding to the target intermediate data are obtained from a pre-configured file and used as first compression parameters, in which multiple Map output compression formats are configured. A target Map output compression format is determined from the multiple Map output compression formats configured in the first compression parameters, and the target intermediate data is compressed based on the target Map output compression format to obtain compressed target intermediate data, which is used for being transmitted to the Reduce stage. In response to a compression request for target result data output in the Reduce stage, compression parameters corresponding to the target result data are obtained from the pre-configured file and used as second compression parameters; the target result data is obtained by processing the compressed target intermediate data in the Reduce stage, and multiple Reduce output compression formats are configured in the second compression parameters. A target Reduce output compression format is determined from the multiple Reduce output compression formats configured in the second compression parameters, and the target result data is compressed based on the target Reduce output compression format to obtain compressed target processing result data, which is used for being transmitted to a storage system.
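The disclosure does not specify the layout of the pre-configured file. Purely as an illustration, the sketch below assumes a JSON file with one entry per stage; the key names `map_output`, `reduce_output` and `formats`, and the function `load_compression_parameters`, are all hypothetical.

```python
import json
from typing import Dict, List

# Hypothetical content of the pre-configured file. The source does not
# define a file format or key names; JSON is assumed for this sketch.
PRECONFIGURED_FILE = """
{
  "map_output":    {"formats": ["snappy", "lzo"]},
  "reduce_output": {"formats": ["lzo", "bzip2"]}
}
"""


def load_compression_parameters(text: str, stage: str) -> Dict[str, List[str]]:
    """Return the compression parameters configured for one stage."""
    config = json.loads(text)
    if stage not in config:
        raise KeyError(f"no compression parameters for stage: {stage}")
    params = config[stage]
    if not params.get("formats"):
        raise ValueError(f"no compression formats configured for {stage}")
    return params


# First compression parameters (Map output) and second (Reduce output).
first_params = load_compression_parameters(PRECONFIGURED_FILE, "map_output")
second_params = load_compression_parameters(PRECONFIGURED_FILE, "reduce_output")
print(first_params["formats"])
print(second_params["formats"])
```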
Therefore, according to the embodiment of the disclosure, the compression parameters corresponding to the target intermediate data are obtained from the pre-configured file, the target Map output compression format is determined based on the compression parameters corresponding to the target intermediate data, and further the compression of the target intermediate data is achieved based on the target Map output compression format, so that the compression mode of the target intermediate data is enriched, and the use experience of a user is improved.
In addition, the embodiment of the disclosure can further obtain the compression parameters corresponding to the target result data from the pre-configured file, and determine the target Reduce output compression format based on the compression parameters corresponding to the target result data, so as to compress the target result data based on the target Reduce output compression format, enrich the compression mode of the target result data, and improve the use experience of users.
On the basis of the foregoing embodiments, the embodiments of the present disclosure further provide a data compression method, referring to fig. 3, which is a flowchart of yet another data compression method provided by the embodiments of the present disclosure, where the method includes:
First, target original data is determined and sent to the Map stage, where it is processed to obtain target intermediate data. After a compression request for the target intermediate data output in the Map stage is received, compression parameters corresponding to the target intermediate data are obtained from a pre-configured file, and whether to specify the compression mode of the target intermediate data is displayed. If it is determined that the compression mode of the target intermediate data is specified, the snappy format is taken as the target Map output compression format; if it is determined that the compression mode is not specified, the lzo format is taken as the target Map output compression format. The target intermediate data is then compressed based on the target Map output compression format to obtain compressed target intermediate data, and the compressed target intermediate data is transmitted to the Reduce stage.
In the Reduce stage, the compressed target intermediate data is processed to obtain target result data. After a compression request for the target result data output in the Reduce stage is received, compression parameters corresponding to the target result data are obtained from the pre-configured file, and whether to specify the compression mode of the target result data is displayed. If it is determined that the compression mode of the target result data is specified, the lzo format is taken as the target Reduce output compression format; if it is determined that the compression mode is not specified, the Bzip2 format is taken as the target Reduce output compression format. The target result data is then compressed based on the target Reduce output compression format to obtain compressed target result data, and the compressed target result data is transmitted to a storage system.
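The branching in the fig. 3 flow can be sketched as two small functions. This is only a reading of the text above; the function names and the `mode_specified` flag are assumptions introduced for the example.

```python
def choose_map_format(mode_specified: bool) -> str:
    """Target Map output format in the fig. 3 flow (hypothetical helper)."""
    # Specified compression mode -> snappy; otherwise fall back to lzo.
    return "snappy" if mode_specified else "lzo"


def choose_reduce_format(mode_specified: bool) -> str:
    """Target Reduce output format in the fig. 3 flow (hypothetical helper)."""
    # Specified compression mode -> lzo; otherwise fall back to Bzip2.
    return "lzo" if mode_specified else "bzip2"


for specified in (True, False):
    print(specified, choose_map_format(specified), choose_reduce_format(specified))
```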
Therefore, according to the embodiment of the disclosure, the compression parameters corresponding to the target intermediate data are obtained from the pre-configured file, the target Map output compression format is determined based on the compression parameters corresponding to the target intermediate data, and further the compression of the target intermediate data is achieved based on the target Map output compression format, so that the compression mode of the target intermediate data is enriched, and the use experience of a user is improved.
In addition, the embodiment of the disclosure can further obtain the compression parameters corresponding to the target result data from the pre-configured file, and determine the target Reduce output compression format based on the compression parameters corresponding to the target result data, so as to compress the target result data based on the target Reduce output compression format, enrich the compression mode of the target result data, and improve the use experience of users.
On the basis of the foregoing embodiments, the embodiments of the present disclosure further provide a data compression method, referring to fig. 4, which is a flowchart of yet another data compression method provided by the embodiments of the present disclosure, where the method includes:
First, target data is determined and split by logical slicing into a plurality of data blocks (such as data block 1, data block 2, data block 3, ..., data block N in fig. 4), where each data block corresponds to one Map task. For ease of understanding, Map task 1 is described as an example.
Specifically, Map task 1 is processed in the Map stage to obtain target intermediate data. After a compression request for the target intermediate data output in the Map stage is received, compression parameters corresponding to the target intermediate data (i.e., the first compression parameters) are obtained from a pre-configured file, and the multiple Map output compression formats configured in the first compression parameters are displayed. A selected operation for the displayed Map output compression formats is then received, and the Map output compression format corresponding to the selected operation is taken as the target Map output compression format. The target intermediate data is compressed based on the target Map output compression format to obtain compressed target intermediate data, and the compressed target intermediate data is transmitted to the Reduce stage through the Shuffle.
Assume that the key value corresponding to Map task 1 is k1 and the key value corresponding to Reduce task 1 is also k1. In the Reduce stage, the compressed target intermediate data is processed by Reduce task 1 to obtain target result data. After a compression request for the target result data output in the Reduce stage is received, compression parameters corresponding to the target result data (i.e., the second compression parameters) are obtained from the pre-configured file, and the multiple Reduce output compression formats configured in the second compression parameters are displayed. A selected operation for the displayed Reduce output compression formats is then received, and the Reduce output compression format corresponding to the selected operation is taken as the target Reduce output compression format. The target result data is compressed based on the target Reduce output compression format to obtain compressed target result data, and the compressed target result data is transmitted to a storage system such as HDFS.
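Two steps of the fig. 4 flow lend themselves to a short sketch: splitting the target data into blocks, and resolving a selected operation against the displayed formats. The fixed `block_size`, the index-based selection, and both function names are assumptions for illustration; the disclosure does not say how slicing or selection is implemented.

```python
from typing import List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Split target data into fixed-size blocks, one per Map task."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def select_format(displayed: List[str], selected_index: int) -> str:
    """Map a selected operation (here: an index) to a displayed format."""
    if not 0 <= selected_index < len(displayed):
        raise IndexError("selection is outside the displayed formats")
    return displayed[selected_index]


blocks = split_into_blocks(b"abcdefghij", 4)
print(len(blocks), "blocks:", blocks)

map_formats = ["snappy", "lzo"]  # formats configured in the first parameters
print("selected:", select_format(map_formats, 1))
```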
Therefore, according to the embodiment of the disclosure, the compression parameters corresponding to the target intermediate data are obtained from the pre-configured file, the target Map output compression format is determined based on the compression parameters corresponding to the target intermediate data, and further the compression of the target intermediate data is achieved based on the target Map output compression format, so that the compression mode of the target intermediate data is enriched, and the use experience of a user is improved.
In addition, the embodiment of the disclosure can further obtain the compression parameters corresponding to the target result data from the pre-configured file, and determine the target Reduce output compression format based on the compression parameters corresponding to the target result data, so as to compress the target result data based on the target Reduce output compression format, enrich the compression mode of the target result data, and improve the use experience of users.
Based on the above method embodiments, the present disclosure further provides a data compression device, and referring to fig. 5, a schematic structural diagram of the data compression device provided in the embodiments of the present disclosure is provided, where the device includes:
the first obtaining module 501 is configured to obtain, in response to a compression request for target intermediate data output in a Map mapping stage, a compression parameter corresponding to the target intermediate data from a pre-configured file, and take the compression parameter corresponding to the target intermediate data as a first compression parameter; wherein, a plurality of Map output compression formats are configured in the first compression parameters;
a first determining module 502, configured to determine a target Map output compression format from a plurality of Map output compression formats configured in the first compression parameter;
a first compression module 503, configured to compress the target intermediate data based on the target Map output compression format, to obtain compressed target intermediate data; the compressed target intermediate data is used for being transmitted to a Reduce stage.
In an alternative embodiment, the apparatus further comprises:
the second acquisition module is used for responding to a compression request of target result data output by the Reduce stage, acquiring compression parameters corresponding to the target result data from the preconfigured file, and taking the compression parameters corresponding to the target result data as second compression parameters; the target result data is obtained by processing the compressed target intermediate data in the Reduce stage, and a plurality of Reduce output compression formats are configured in the second compression parameters;
the second determining module is used for determining a target Reduce output compression format from a plurality of Reduce output compression formats configured in the second compression parameters;
the second compression module is used for compressing the target result data based on the target Reduce output compression format to obtain compressed target processing result data; the target processing result data is used for being transmitted to a storage system.
In an alternative embodiment, the apparatus further comprises:
the display module is used for displaying a plurality of Map output compression formats configured in the first compression parameters;
accordingly, the first determining module 502 is specifically configured to:
responding to a selected operation of multiple Map output compression formats configured in the first compression parameters, and taking the Map output compression format corresponding to the selected operation as a target Map output compression format.
In an alternative embodiment, the first determining module 502 is specifically configured to:
and determining a target Map output compression format from the plurality of Map output compression formats configured in the first compression parameters according to the arrangement sequence of the plurality of Map output compression formats configured in the first compression parameters.
In an alternative embodiment, the first determining module 502 is specifically configured to:
and determining a target Map output compression format from a plurality of Map output compression formats configured in the first compression parameters based on a preset sequence.
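The two automatic strategies above (arrangement order and preset sequence) could be sketched as follows. The text does not define either rule precisely, so this is one plausible reading: "arrangement order" is taken to mean the first configured format, and "preset sequence" the first format of a separate preference list that is also configured. Both function names are hypothetical.

```python
from typing import List


def pick_by_arrangement(configured: List[str]) -> str:
    """Assumed reading: take the first format as listed in the parameters."""
    if not configured:
        raise ValueError("no formats configured")
    return configured[0]


def pick_by_preset_order(configured: List[str], preset_order: List[str]) -> str:
    """Assumed reading: take the first preset format that is configured."""
    for fmt in preset_order:
        if fmt in configured:
            return fmt
    raise ValueError("none of the preset formats is configured")


configured = ["snappy", "lzo"]
print(pick_by_arrangement(configured))
print(pick_by_preset_order(configured, ["gzip", "lzo", "snappy"]))
```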
In an alternative embodiment, the plurality of Map output compression formats includes a snappy format and a lzo format.
In an alternative embodiment, the plurality of Reduce output compression formats includes lzo format and Bzip2 format.
In the data compression device provided by the embodiment of the disclosure, in response to a compression request for target intermediate data output in a Map stage, compression parameters corresponding to the target intermediate data are obtained from a pre-configured file and used as first compression parameters, in which a plurality of Map output compression formats are configured. The target Map output compression format is then determined from the plurality of Map output compression formats configured in the first compression parameters, and the target intermediate data is compressed based on the target Map output compression format to obtain compressed target intermediate data, which is used for being transmitted to a Reduce stage. Therefore, according to the embodiment of the disclosure, the compression parameters corresponding to the target intermediate data are obtained from the pre-configured file, the target Map output compression format is determined based on the compression parameters corresponding to the target intermediate data, and further the compression of the target intermediate data is achieved based on the target Map output compression format, so that the compression mode of the target intermediate data is enriched, and the use experience of a user is improved.
In addition to the above methods and apparatuses, the embodiments of the present disclosure further provide a computer readable storage medium in which instructions are stored; when the instructions are run on a terminal device, they cause the terminal device to implement the data compression method according to the embodiments of the present disclosure.
The disclosed embodiments also provide a computer program product comprising computer programs/instructions which, when executed by a processor, implement the data compression method according to the disclosed embodiments.
In addition, the embodiment of the present disclosure further provides a data compression device, as shown in fig. 6, which may include:
a processor 601, a memory 602, an input device 603 and an output device 604. The number of processors 601 in the data compression device may be one or more, one processor being an example in fig. 6. In some embodiments of the present disclosure, the processor 601, memory 602, input device 603, and output device 604 may be connected by a bus or other means, with the bus connection being exemplified in fig. 6.
The memory 602 may be used to store software programs and modules, and the processor 601 performs various functional applications of the data compression device and data processing by executing the software programs and modules stored in the memory 602. The memory 602 may primarily include a storage program area and a storage data area, wherein the storage program area may store an operating system, application programs required for at least one function, and the like. In addition, the memory 602 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. The input means 603 may be used to receive input numeric or character information and to generate signal inputs related to user settings and function control of the data compression device.
In particular, in this embodiment, the processor 601 loads executable files corresponding to the processes of one or more application programs into the memory 602 according to the following instructions, and the processor 601 executes the application programs stored in the memory 602, so as to implement the various functions of the data compression device.
It should be noted that in this document, relational terms such as "first" and "second" and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing is merely a specific embodiment of the disclosure to enable one skilled in the art to understand or practice the disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown and described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A method of data compression, the method comprising:
responding to a compression request of target intermediate data output in a Map mapping stage, acquiring compression parameters corresponding to the target intermediate data from a pre-configured file, and taking the compression parameters corresponding to the target intermediate data as first compression parameters; wherein, a plurality of Map output compression formats are configured in the first compression parameters;
determining a target Map output compression format from a plurality of Map output compression formats configured in the first compression parameters;
compressing the target intermediate data based on the target Map output compression format to obtain compressed target intermediate data; the compressed target intermediate data is used for being transmitted to a Reduce stage.
2. The method of claim 1, wherein after the compressing the target intermediate data based on the target Map output compression format to obtain compressed target intermediate data, the method further comprises:
responding to a compression request of target result data output by the Reduce stage, acquiring compression parameters corresponding to the target result data from the preconfigured file, and taking the compression parameters corresponding to the target result data as second compression parameters; the target result data is obtained by processing the compressed target intermediate data in the Reduce stage, and a plurality of Reduce output compression formats are configured in the second compression parameters;
determining a target Reduce output compression format from a plurality of Reduce output compression formats configured in the second compression parameters;
compressing the target result data based on the target Reduce output compression format to obtain compressed target processing result data; the target processing result data is used for being transmitted to a storage system.
3. The method of claim 1, wherein before the determining the target Map output compression format from the plurality of Map output compression formats configured in the first compression parameter, the method further comprises:
displaying a plurality of Map output compression formats configured in the first compression parameters;
accordingly, the determining the target Map output compression format from the multiple Map output compression formats configured in the first compression parameter includes:
responding to a selected operation of multiple Map output compression formats configured in the first compression parameters, and taking the Map output compression format corresponding to the selected operation as a target Map output compression format.
4. The method of claim 1, wherein determining the target Map output compression format from the plurality of Map output compression formats configured in the first compression parameter comprises:
and determining a target Map output compression format from the plurality of Map output compression formats configured in the first compression parameters according to the arrangement sequence of the plurality of Map output compression formats configured in the first compression parameters.
5. The method of claim 1, wherein determining the target Map output compression format from the plurality of Map output compression formats configured in the first compression parameter comprises:
and determining a target Map output compression format from a plurality of Map output compression formats configured in the first compression parameters based on a preset sequence.
6. The method of claim 1, wherein the plurality of Map output compression formats includes a snappy format and a lzo format.
7. The method of claim 2, wherein the plurality of Reduce output compression formats comprises a lzo format and a Bzip2 format.
8. A data compression apparatus, the apparatus comprising:
the first acquisition module is used for responding to a compression request of target intermediate data output in a Map mapping stage, acquiring compression parameters corresponding to the target intermediate data from a pre-configured file, and taking the compression parameters corresponding to the target intermediate data as first compression parameters; wherein, a plurality of Map output compression formats are configured in the first compression parameters;
the first determining module is used for determining a target Map output compression format from a plurality of Map output compression formats configured in the first compression parameters;
the first compression module is used for compressing the target intermediate data based on the target Map output compression format to obtain compressed target intermediate data; the compressed target intermediate data is used for being transmitted to a Reduce stage.
9. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein instructions, which when run on a terminal device, cause the terminal device to implement the method of any of claims 1-7.
10. A data compression apparatus, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the method of any one of claims 1-7 when the computer program is executed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311053461.0A CN117111845A (en) | 2023-08-18 | 2023-08-18 | Data compression method, device, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117111845A (en) | 2023-11-24 |
Family
ID=88808522
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311053461.0A Pending CN117111845A (en) | 2023-08-18 | 2023-08-18 | Data compression method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117111845A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105302494A (en) * | 2015-11-19 | 2016-02-03 | 浪潮(北京)电子信息产业有限公司 | Compression strategy selecting method and device |
CN111930731A (en) * | 2020-07-28 | 2020-11-13 | 苏州亿歌网络科技有限公司 | Data dump method, device, equipment and storage medium |
CN112925821A (en) * | 2021-02-07 | 2021-06-08 | 江西理工大学 | MapReduce-based parallel frequent item set incremental data mining method |
CN114610792A (en) * | 2022-03-09 | 2022-06-10 | 树根互联股份有限公司 | Data processing method, device and system and industrial equipment |
CN115442024A (en) * | 2022-09-05 | 2022-12-06 | 哈尔滨理工大学 | Chaos-based MapReduce data compression information protection method |
Non-Patent Citations (1)
Title |
---|
王冬: "配电网时间序列数据的云计算集群快速压缩模型", 《2017年江西省电机工程学会年会论文集》, 31 December 2017 (2017-12-31), pages 299 - 303 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||