CN117111845A - Data compression method, device, equipment and storage medium - Google Patents

Data compression method, device, equipment and storage medium

Publication number
CN117111845A
Authority
CN
China
Prior art keywords
compression
target
map output
intermediate data
format
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311053461.0A
Other languages
Chinese (zh)
Inventor
贾晓露
孙璐
张旭
杨文凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongdian Cloud Computing Technology Co ltd
Original Assignee
Zhongdian Cloud Computing Technology Co ltd
Application filed by Zhongdian Cloud Computing Technology Co ltd
Priority to CN202311053461.0A
Publication of CN117111845A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The present disclosure provides a data compression method, apparatus, device, and storage medium. The method includes: in response to a compression request for target intermediate data output in a Map stage, acquiring compression parameters corresponding to the target intermediate data from a pre-configured file and taking them as first compression parameters; determining a target Map output compression format from a plurality of Map output compression formats configured in the first compression parameters; and compressing the target intermediate data based on the target Map output compression format. Because the target Map output compression format is chosen from the plurality of formats configured in the pre-configured file, the compression modes available for the target intermediate data are enriched and the user experience is improved.

Description

Data compression method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of data processing, and in particular, to a data compression method, apparatus, device, and storage medium.
Background
With the advent of the big data era, data volumes have grown rapidly, and MapReduce technology has been widely used and developed.
In the Map stage of MapReduce, network transmission efficiency can be improved by compressing the intermediate data output in the Map stage.
However, only a single compression mode is available for the intermediate data output in the Map stage, which cannot meet users' needs. How to enrich the compression modes for the intermediate data output in the Map stage, and thereby improve the user experience, is therefore a technical problem to be solved.
Disclosure of Invention
In order to solve the above technical problems, an embodiment of the present disclosure provides a data compression method.
In a first aspect, the present disclosure provides a data compression method, the method comprising:
in response to a compression request for target intermediate data output in a Map stage, acquiring compression parameters corresponding to the target intermediate data from a pre-configured file, and taking the compression parameters corresponding to the target intermediate data as first compression parameters, wherein a plurality of Map output compression formats are configured in the first compression parameters;
determining a target Map output compression format from the plurality of Map output compression formats configured in the first compression parameters;
compressing the target intermediate data based on the target Map output compression format to obtain compressed target intermediate data, wherein the compressed target intermediate data is to be transmitted to a Reduce stage.
In an optional implementation manner, after compressing the target intermediate data based on the target Map output compression format to obtain compressed target intermediate data, the method further includes:
in response to a compression request for target result data output by the Reduce stage, acquiring compression parameters corresponding to the target result data from the pre-configured file, and taking the compression parameters corresponding to the target result data as second compression parameters, wherein the target result data is obtained by processing the compressed target intermediate data in the Reduce stage, and a plurality of Reduce output compression formats are configured in the second compression parameters;
determining a target Reduce output compression format from the plurality of Reduce output compression formats configured in the second compression parameters;
compressing the target result data based on the target Reduce output compression format to obtain compressed target result data, wherein the compressed target result data is to be transmitted to a storage system.
In an optional implementation manner, before determining the target Map output compression format from the plurality of Map output compression formats configured in the first compression parameters, the method further includes:
displaying the plurality of Map output compression formats configured in the first compression parameters;
accordingly, the determining the target Map output compression format from the plurality of Map output compression formats configured in the first compression parameters includes:
in response to a selection operation on the plurality of Map output compression formats configured in the first compression parameters, taking the Map output compression format corresponding to the selection operation as the target Map output compression format.
In an optional implementation manner, the determining the target Map output compression format from the plurality of Map output compression formats configured in the first compression parameters includes:
determining the target Map output compression format from the plurality of Map output compression formats configured in the first compression parameters according to the arrangement order of those formats.
In an optional implementation manner, the determining the target Map output compression format from the plurality of Map output compression formats configured in the first compression parameters includes:
determining the target Map output compression format from the plurality of Map output compression formats configured in the first compression parameters based on a preset order.
In an alternative embodiment, the plurality of Map output compression formats includes the snappy format and the lzo format.
In an alternative embodiment, the plurality of Reduce output compression formats includes the lzo format and the Bzip2 format.
In a second aspect, the present disclosure provides a data compression apparatus, the apparatus comprising:
a first acquisition module, configured to, in response to a compression request for target intermediate data output in a Map stage, acquire compression parameters corresponding to the target intermediate data from a pre-configured file, and take the compression parameters corresponding to the target intermediate data as first compression parameters, wherein a plurality of Map output compression formats are configured in the first compression parameters;
a first determining module, configured to determine a target Map output compression format from the plurality of Map output compression formats configured in the first compression parameters;
a first compression module, configured to compress the target intermediate data based on the target Map output compression format to obtain compressed target intermediate data, wherein the compressed target intermediate data is to be transmitted to a Reduce stage.
In a third aspect, the present disclosure provides a computer readable storage medium having instructions stored therein, which when run on a terminal device, cause the terminal device to implement the above-described method.
In a fourth aspect, the present disclosure provides a data compression device comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the above-described method when executing the computer program.
In a fifth aspect, the present disclosure provides a computer program product comprising computer programs/instructions which when executed by a processor implement the above-described method.
Compared with the prior art, the technical scheme provided by the embodiment of the disclosure has at least the following advantages:
The embodiment of the disclosure provides a data compression method. In response to a compression request for target intermediate data output in a Map stage, compression parameters corresponding to the target intermediate data are acquired from a pre-configured file and taken as first compression parameters, in which a plurality of Map output compression formats are configured. A target Map output compression format is then determined from the plurality of Map output compression formats configured in the first compression parameters, and the target intermediate data is compressed based on the target Map output compression format to obtain compressed target intermediate data, which is to be transmitted to a Reduce stage. Because the target Map output compression format is determined from the compression parameters acquired from the pre-configured file, and the target intermediate data is compressed in that format, the compression modes available for the target intermediate data are enriched and the user experience is improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
In order to more clearly illustrate the embodiments of the present disclosure or the solutions in the prior art, the drawings that are required for the description of the embodiments or the prior art will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a flowchart of a data compression method according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of another data compression method provided by an embodiment of the present disclosure;
FIG. 3 is a flow chart of yet another data compression method provided by an embodiment of the present disclosure;
FIG. 4 is a flow chart of yet another data compression method provided by an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of a data compression device according to an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of a data compression device according to an embodiment of the present disclosure.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, a further description of aspects of the present disclosure will be provided below. It should be noted that, without conflict, the embodiments of the present disclosure and features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced otherwise than as described herein; it will be apparent that the embodiments in the specification are only some, but not all, embodiments of the disclosure.
In order to enrich the compression mode of target intermediate data so as to improve the use experience of users, the embodiment of the disclosure provides a data compression method.
Specifically, in response to a compression request for target intermediate data output in a Map stage, compression parameters corresponding to the target intermediate data are acquired from a pre-configured file and taken as first compression parameters, in which a plurality of Map output compression formats are configured. A target Map output compression format is then determined from the plurality of Map output compression formats configured in the first compression parameters, and the target intermediate data is compressed based on the target Map output compression format to obtain compressed target intermediate data, which is to be transmitted to a Reduce stage. Because the target Map output compression format is determined from the compression parameters acquired from the pre-configured file, and the target intermediate data is compressed in that format, the compression modes available for the target intermediate data are enriched and the user experience is improved.
Based on this, an embodiment of the present disclosure provides a data compression method, referring to fig. 1, which is a flowchart of the data compression method provided by the embodiment of the present disclosure, where the method includes:
S101: In response to a compression request for target intermediate data output in the Map stage, acquire compression parameters corresponding to the target intermediate data from a pre-configured file, and take the compression parameters corresponding to the target intermediate data as first compression parameters.
A plurality of Map output compression formats are configured in the first compression parameters.
The data compression method provided by the embodiment of the disclosure can be applied to the Hadoop distributed computing framework.
In the embodiment of the disclosure, a plurality of Map output compression formats may be configured in the first compression parameters, so that the target intermediate data may be compressed based on any of those formats.
In this embodiment of the present disclosure, the target intermediate data may be any intermediate data output in the Map stage. Specifically, after a compression request for the target intermediate data output in the Map stage is received, the compression parameters corresponding to the target intermediate data are acquired from the pre-configured file, so that the target intermediate data can be compressed based on the plurality of Map output compression formats configured in those compression parameters.
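As an illustration of step S101, the following is a minimal sketch of reading the compression parameters for one stage from a pre-configured file. The disclosure does not specify the file layout, so the JSON structure, the key names map_output, reduce_output and formats, and the function name load_compression_parameters are all assumptions made for this example.

```python
import json

# Hypothetical content of the pre-configured file. The key names are
# assumptions made for this sketch and are not taken from the disclosure.
PRECONFIGURED_FILE_TEXT = """
{
  "map_output": {"formats": ["snappy", "lzo"]},
  "reduce_output": {"formats": ["lzo", "bzip2"]}
}
"""


def load_compression_parameters(config_text, stage):
    """Return the compression parameters configured for one stage."""
    config = json.loads(config_text)
    if stage not in config:
        raise KeyError(f"no compression parameters configured for {stage!r}")
    params = config[stage]
    if not params.get("formats"):
        raise ValueError(f"no compression formats configured for {stage!r}")
    return params


# First compression parameters: those for the Map-stage intermediate data.
first_compression_parameters = load_compression_parameters(
    PRECONFIGURED_FILE_TEXT, "map_output"
)
print(first_compression_parameters["formats"])  # ['snappy', 'lzo']
```

Keeping the formats as an ordered list means the same structure can serve a later choice by arrangement order or by a preset order.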
In an alternative embodiment, the plurality of Map output compression formats may include the snappy format and the lzo format.
Specifically, snappy is a high-speed compression and decompression format that provides fast and efficient data compression and decompression.
lzo is a data compression algorithm that focuses on the speed of compression and decompression.
In the embodiment of the present disclosure, the plurality of Map output compression formats may also include other compression formats, and the embodiment of the present disclosure is not limited in this respect.
After the Map stage finishes, the compressed target intermediate data needs to be transmitted over the network to each node of the Reduce stage, and compressing the target intermediate data output by the Map stage increases central processing unit (CPU) consumption. Therefore, to limit CPU consumption, the plurality of Map output compression formats may be the snappy format and the lzo format, both of which compress and decompress quickly.
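The trade-off between compression speed and compression ratio described above can be measured with a small harness such as the sketch below. snappy and lzo are not part of the Python standard library, so this example uses zlib at level 1 as a stand-in for a speed-oriented codec and bz2 as a ratio-oriented codec; this substitution, the sample data, and the function name measure are assumptions of the example. Timings depend on the machine, so no expected numbers are shown.

```python
import bz2
import time
import zlib


def measure(name, compress, decompress, data):
    """Time one compress/decompress round trip and report the size ratio."""
    start = time.perf_counter()
    blob = compress(data)
    compress_seconds = time.perf_counter() - start

    start = time.perf_counter()
    restored = decompress(blob)
    decompress_seconds = time.perf_counter() - start

    if restored != data:
        raise ValueError(f"{name} did not round-trip the data")
    return {
        "codec": name,
        "ratio": len(blob) / len(data),  # smaller means better compression
        "compress_seconds": compress_seconds,
        "decompress_seconds": decompress_seconds,
    }


# Synthetic key/value lines standing in for Map-stage intermediate data.
sample = "".join(f"user{i % 100}\t{i * 7 % 13}\n" for i in range(20000)).encode("utf-8")

results = [
    measure("zlib level 1", lambda d: zlib.compress(d, 1), zlib.decompress, sample),
    measure("bz2", bz2.compress, bz2.decompress, sample),
]
for row in results:
    print(row)
```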
S102: Determine a target Map output compression format from the plurality of Map output compression formats configured in the first compression parameters.
The target Map output compression format may be any one of the plurality of Map output compression formats configured in the first compression parameters, and the embodiments of the present disclosure are not limited in this respect.
Specifically, the target Map output compression format may be determined in any of the following ways:
In an alternative implementation manner, in response to a compression request for target intermediate data output in the Map stage, after the compression parameters corresponding to the target intermediate data are acquired from the pre-configured file, the plurality of Map output compression formats configured in the first compression parameters are displayed; then, in response to a selection operation on the displayed Map output compression formats, the Map output compression format corresponding to the selection operation is taken as the target Map output compression format.
In the embodiment of the disclosure, after a compression request for the target intermediate data output in the Map stage is received, the compression parameters corresponding to the target intermediate data (that is, the first compression parameters) are acquired from the pre-configured file, and the plurality of Map output compression formats configured in them are displayed. When a selection operation on the displayed Map output compression formats is received, the Map output compression format corresponding to the selection operation is taken as the target Map output compression format.
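The display-then-select embodiment can be sketched as follows. The numbered text menu, the 1-based selection index, and the function names display_formats and select_format are assumptions of this example; the disclosure does not describe how the formats are displayed or how the selection operation is captured.

```python
def display_formats(formats):
    """Build the numbered menu lines shown to the user."""
    return [f"{index}. {name}" for index, name in enumerate(formats, start=1)]


def select_format(formats, selected_index):
    """Return the format picked by a 1-based selection operation."""
    if not 1 <= selected_index <= len(formats):
        raise ValueError(f"selection {selected_index} is out of range")
    return formats[selected_index - 1]


map_formats = ["snappy", "lzo"]  # formats configured in the first compression parameters
for line in display_formats(map_formats):
    print(line)

# A selection operation choosing the second displayed entry.
target_format = select_format(map_formats, 2)
print(target_format)  # lzo
```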
In another alternative embodiment, the target Map output compression format is determined from the plurality of Map output compression formats configured in the first compression parameter according to the arrangement order of the plurality of Map output compression formats configured in the first compression parameter.
The arrangement order of the plurality of Map output compression formats may be the order in which the Map output compression formats were added.
In the embodiment of the disclosure, according to the arrangement order of the plurality of Map output compression formats configured in the first compression parameters, the Map output compression format that was added first is used as the target Map output compression format.
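A minimal sketch of selection by arrangement order, assuming the configured formats are held in a list whose order is the order in which they were added (the function name first_added_format is hypothetical):

```python
def first_added_format(formats_in_added_order):
    """Return the Map output compression format that was added first."""
    if not formats_in_added_order:
        raise ValueError("no Map output compression formats are configured")
    return formats_in_added_order[0]


configured = []
configured.append("snappy")  # added first
configured.append("lzo")     # added second

target_format = first_added_format(configured)
print(target_format)  # snappy
```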
In yet another alternative embodiment, the target Map output compression format is determined from the plurality of Map output compression formats configured in the first compression parameter based on a preset order.
The preset sequence may be set as needed, and the embodiments of the present disclosure are not limited herein.
In the embodiment of the disclosure, among the Map output compression formats in the compression parameters corresponding to the target intermediate data, the format that ranks first in the preset order may be used as the target Map output compression format.
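Selection by a preset order can be sketched as below. Here the preset order is modelled as a priority list that is independent of the order in which the formats appear in the configuration; this modelling and the function name select_by_preset_order are assumptions of the example.

```python
def select_by_preset_order(configured_formats, preset_order):
    """Return the configured format that ranks first in the preset order."""
    for name in preset_order:
        if name in configured_formats:
            return name
    raise ValueError("none of the configured formats appears in the preset order")


configured = ["snappy", "lzo"]          # as listed in the first compression parameters
preset_order = ["lzo", "snappy", "gzip"]  # hypothetical preset priority

target_format = select_by_preset_order(configured, preset_order)
print(target_format)  # lzo
```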
For ease of understanding, an example scenario is given below:
Assume that the plurality of Map output compression formats include the snappy format and the lzo format. After a compression request for the target intermediate data output in the Map stage is received, the compression parameters corresponding to the target intermediate data are acquired from the pre-configured file, and a prompt is displayed asking whether to designate a compression mode for the target intermediate data. If it is determined that a compression mode is to be designated, the snappy format is taken as the target Map output compression format; if it is determined that no compression mode is to be designated, the lzo format is taken as the target Map output compression format.
It should be noted that the above method for determining the target Map output compression format for the target intermediate data is described merely as an example to facilitate understanding, and the embodiments of the present disclosure do not limit the method for determining the target Map output compression format.
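The example scenario above (snappy when the user chooses to designate a compression mode, lzo otherwise) reduces to a single branch. In the sketch below, the boolean argument stands for the user's answer to the displayed prompt, and the function name choose_map_format is hypothetical.

```python
def choose_map_format(designate_mode):
    """Mirror the example scenario: snappy if a mode is designated, else lzo."""
    return "snappy" if designate_mode else "lzo"


print(choose_map_format(True))   # snappy
print(choose_map_format(False))  # lzo
```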
S103: Compress the target intermediate data based on the target Map output compression format to obtain compressed target intermediate data.
The compressed target intermediate data is to be transmitted to the Reduce stage.
In the embodiment of the disclosure, after the target Map output compression format is determined, the target intermediate data is compressed based on that format, which improves network transmission efficiency.
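One way to dispatch on the target format in step S103 is a small codec registry, sketched below. Because snappy and lzo require third-party packages, the registry in this example holds only standard-library codecs (Deflate via zlib, gzip, and bzip2) as stand-ins; the registry layout and the function names compress_intermediate and decompress_intermediate are assumptions of the example.

```python
import bz2
import gzip
import zlib

# Format name -> (compress function, decompress function).
# Standard-library stand-ins only; snappy and lzo are not included here.
CODECS = {
    "deflate": (lambda data: zlib.compress(data, 1), zlib.decompress),
    "gzip": (gzip.compress, gzip.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
}


def compress_intermediate(data, target_format):
    """Compress Map-stage output bytes with the chosen target format."""
    if target_format not in CODECS:
        raise ValueError(f"unsupported compression format: {target_format}")
    compress, _ = CODECS[target_format]
    return compress(data)


def decompress_intermediate(blob, target_format):
    """Reverse compress_intermediate on the Reduce side."""
    if target_format not in CODECS:
        raise ValueError(f"unsupported compression format: {target_format}")
    _, decompress = CODECS[target_format]
    return decompress(blob)


# Synthetic key/value records standing in for target intermediate data.
intermediate = b"".join(b"word%d\t1\n" % (i % 50) for i in range(2000))

compressed = compress_intermediate(intermediate, "deflate")
restored = decompress_intermediate(compressed, "deflate")
print(len(compressed) < len(intermediate), restored == intermediate)  # True True
```

The Reduce side must know which format was used, so in practice the chosen format name would travel with the data or be read from the same configuration.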
In the data compression method provided by the embodiment of the disclosure, in response to a compression request for target intermediate data output in a Map stage, compression parameters corresponding to the target intermediate data are acquired from a pre-configured file and taken as first compression parameters, in which a plurality of Map output compression formats are configured. A target Map output compression format is then determined from the plurality of Map output compression formats configured in the first compression parameters, and the target intermediate data is compressed based on the target Map output compression format to obtain compressed target intermediate data, which is to be transmitted to a Reduce stage. Because the target Map output compression format is determined from the compression parameters acquired from the pre-configured file, and the target intermediate data is compressed in that format, the compression modes available for the target intermediate data are enriched and the user experience is improved.
On the basis of the foregoing embodiments, the embodiments of the present disclosure further provide a data compression method. Referring to FIG. 2, which is a flowchart of another data compression method provided by the embodiments of the present disclosure, the method includes:
S201: In response to a compression request for target intermediate data output in the Map stage, acquire compression parameters corresponding to the target intermediate data from a pre-configured file, and take the compression parameters corresponding to the target intermediate data as first compression parameters.
A plurality of Map output compression formats are configured in the first compression parameters.
S202: Determine a target Map output compression format from the plurality of Map output compression formats configured in the first compression parameters.
S203: Compress the target intermediate data based on the target Map output compression format to obtain compressed target intermediate data.
The compressed target intermediate data is to be transmitted to the Reduce stage.
It should be noted that steps S201 to S203 are the same as steps S101 to S103 described above; for details, refer to the description of steps S101 to S103.
S204: In response to a compression request for target result data output by the Reduce stage, acquire compression parameters corresponding to the target result data from the pre-configured file, and take the compression parameters corresponding to the target result data as second compression parameters.
The target result data is obtained by processing the compressed target intermediate data in the Reduce stage, and a plurality of Reduce output compression formats are configured in the second compression parameters.
In the embodiment of the disclosure, the plurality of Reduce output compression formats configured in the second compression parameters may be the same as, or different from, the plurality of Map output compression formats configured in the first compression parameters.
In the embodiment of the disclosure, a plurality of Reduce output compression formats may be configured in the second compression parameters, so that the target result data may be compressed based on any of those formats.
In this embodiment of the present disclosure, the target result data may be any result data output by the Reduce stage. Specifically, after a compression request for the target result data output by the Reduce stage is received, the compression parameters corresponding to the target result data are acquired from the pre-configured file, so that the target result data can be compressed based on the plurality of Reduce output compression formats configured in those compression parameters.
In an alternative embodiment, if the target result data also needs to enter a subsequent MapReduce job, the plurality of Reduce output compression formats may be the lzo format and the Bzip2 format.
Specifically, lzo is a data compression algorithm that focuses on the speed of compression and decompression.
Bzip2 is a data compression algorithm that achieves a higher compression ratio than other compression algorithms (e.g., lzo) but compresses and decompresses more slowly.
In the embodiment of the disclosure, if the target result data needs to enter a subsequent MapReduce job, whether the compressed target result data supports splitting must be considered, so the plurality of Reduce output compression formats may be the lzo format and the Bzip2 format.
In another alternative embodiment, if the target result data does not need to enter a subsequent MapReduce job, the plurality of Reduce output compression formats may be the Bzip2 format and the Gzip format.
Specifically, Gzip is a data compression format that uses the lossless Deflate algorithm.
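The two embodiments above amount to choosing the candidate Reduce output formats according to whether the result feeds a subsequent MapReduce job. The sketch below encodes that rule; the splittability table reflects how these formats are commonly characterised for Hadoop input files (bzip2 splittable, lzo splittable once indexed, gzip not splittable) and is included here as an assumption, as are the names SPLITTABLE and candidate_reduce_formats.

```python
# Assumed splittability of compressed files when read as MapReduce input.
# lzo is usually described as splittable only after an index is built.
SPLITTABLE = {"lzo": True, "bzip2": True, "gzip": False}


def candidate_reduce_formats(feeds_next_job):
    """Return the Reduce output formats to configure for the two cases."""
    if feeds_next_job:
        # Output becomes input of another job, so splitting matters.
        return ["lzo", "bzip2"]
    # Final output: splitting is not required.
    return ["bzip2", "gzip"]


chained = candidate_reduce_formats(True)
final = candidate_reduce_formats(False)
print(chained, all(SPLITTABLE[name] for name in chained))  # ['lzo', 'bzip2'] True
print(final)  # ['bzip2', 'gzip']
```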
In the embodiment of the present disclosure, the plurality of Reduce output compression formats may also include other compression formats, and the embodiment of the present disclosure is not limited in this respect.
S205: Determine a target Reduce output compression format from the plurality of Reduce output compression formats configured in the second compression parameters.
Specifically, the method for determining the target Reduce output compression format is similar to the method for determining the target Map output compression format in the above embodiment; refer to that description, which is not repeated here.
S206: and compressing the target result data based on the target Reduce output compression format to obtain compressed target processing result data.
The target processing result data is used for being transmitted to the storage system.
In an embodiment of the present disclosure, the storage system may be a distributed storage system HDFS.
In the embodiment of the disclosure, after the target Reduce output compression format is determined, the compression of the target result data is realized based on the target Reduce output compression format, so that the storage space of the storage system is saved.
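As a rough illustration of step S206, the sketch below compresses result data with whichever format has been selected. It is not the implementation of the disclosure: the function name and the registry are hypothetical, and only Bzip2 and Gzip are registered because they ship with the Python standard library (lzo would need a third-party binding).

```python
import bz2
import gzip

# Hypothetical registry mapping a format name to a standard-library compressor.
# lzo is deliberately absent: it is not available in the Python standard library.
_COMPRESSORS = {
    "bzip2": bz2.compress,
    "gzip": gzip.compress,
}


def compress_result(data: bytes, target_format: str) -> bytes:
    """Compress the target result data with the target Reduce output compression format."""
    try:
        compress = _COMPRESSORS[target_format.lower()]
    except KeyError:
        raise ValueError(f"no codec registered for format {target_format!r}") from None
    return compress(data)
```

With repetitive input, the compressed payload is noticeably smaller than the original, which is the storage saving the text refers to.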
In the data compression method provided by the embodiment of the disclosure, in response to a compression request for target intermediate data output by the Map stage, compression parameters corresponding to the target intermediate data are obtained from a pre-configured file and used as first compression parameters, in which multiple Map output compression formats are configured. A target Map output compression format is determined from the multiple Map output compression formats configured in the first compression parameters, and the target intermediate data is compressed based on the target Map output compression format to obtain compressed target intermediate data, which is to be transmitted to the Reduce stage. In response to a compression request for target result data output by the Reduce stage, compression parameters corresponding to the target result data are obtained from the pre-configured file and used as second compression parameters, in which multiple Reduce output compression formats are configured; the target result data is obtained by processing the compressed target intermediate data in the Reduce stage. A target Reduce output compression format is determined from the multiple Reduce output compression formats configured in the second compression parameters, and the target result data is compressed based on the target Reduce output compression format to obtain compressed target processing result data, which is to be transmitted to a storage system.
Therefore, according to the embodiment of the disclosure, the compression parameters corresponding to the target intermediate data are obtained from the pre-configured file, the target Map output compression format is determined based on the compression parameters corresponding to the target intermediate data, and further the compression of the target intermediate data is achieved based on the target Map output compression format, so that the compression mode of the target intermediate data is enriched, and the use experience of a user is improved.
In addition, the embodiment of the disclosure can further obtain the compression parameters corresponding to the target result data from the pre-configured file, and determine the target Reduce output compression format based on the compression parameters corresponding to the target result data, so as to compress the target result data based on the target Reduce output compression format, which enriches the compression modes available for the target result data and improves the user experience.
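The disclosure does not specify the layout of the pre-configured file. Purely as an assumption for illustration, the sketch below stores the first and second compression parameters as two JSON sections and looks them up by stage; every key name and the function name are hypothetical.

```python
import json

# Hypothetical layout of the pre-configured file; every key name is an assumption.
EXAMPLE_CONFIG = """
{
  "map_output":    {"formats": ["snappy", "lzo"]},
  "reduce_output": {"formats": ["lzo", "bzip2"]}
}
"""


def load_compression_parameters(config_text: str, stage: str) -> dict:
    """Return the compression parameters configured for 'map_output' or 'reduce_output'."""
    config = json.loads(config_text)
    if stage not in config:
        raise KeyError(f"no compression parameters configured for {stage!r}")
    params = config[stage]
    if not params.get("formats"):
        raise ValueError(f"{stage!r} must configure at least one format")
    return params


# First compression parameters (Map output) and second compression parameters (Reduce output).
first_params = load_compression_parameters(EXAMPLE_CONFIG, "map_output")
second_params = load_compression_parameters(EXAMPLE_CONFIG, "reduce_output")
```

Keeping both parameter sets in one file mirrors the text, where the same pre-configured file is consulted for the Map stage and the Reduce stage.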
On the basis of the foregoing embodiments, the embodiments of the present disclosure further provide a data compression method, referring to fig. 3, which is a flowchart of yet another data compression method provided by the embodiments of the present disclosure, where the method includes:
First, target original data is determined and sent to the Map stage, where it is processed to obtain target intermediate data. After a compression request for the target intermediate data output by the Map stage is received, the compression parameters corresponding to the target intermediate data are obtained from the pre-configured file, and a prompt is displayed asking whether to specify the compression mode of the target intermediate data. If it is determined that the compression mode is specified, the snappy format is taken as the target Map output compression format; if it is determined that the compression mode is not specified, the lzo format is taken as the target Map output compression format. The target intermediate data is then compressed based on the target Map output compression format to obtain compressed target intermediate data, which is transmitted to the Reduce stage.
In the Reduce stage, the compressed target intermediate data is processed to obtain target result data. After a compression request for the target result data output by the Reduce stage is received, the compression parameters corresponding to the target result data are obtained from the pre-configured file, and a prompt is displayed asking whether to specify the compression mode of the target result data. If it is determined that the compression mode is specified, the lzo format is taken as the target Reduce output compression format; if it is determined that the compression mode is not specified, the Bzip2 format is taken as the target Reduce output compression format. The target result data is then compressed based on the target Reduce output compression format to obtain compressed target result data, which is transmitted to the storage system.
Therefore, according to the embodiment of the disclosure, the compression parameters corresponding to the target intermediate data are obtained from the pre-configured file, the target Map output compression format is determined based on the compression parameters corresponding to the target intermediate data, and further the compression of the target intermediate data is achieved based on the target Map output compression format, so that the compression mode of the target intermediate data is enriched, and the use experience of a user is improved.
In addition, the embodiment of the disclosure can further obtain the compression parameters corresponding to the target result data from the pre-configured file, and determine the target Reduce output compression format based on the compression parameters corresponding to the target result data, so as to compress the target result data based on the target Reduce output compression format, which enriches the compression modes available for the target result data and improves the user experience.
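The two branch decisions of the Fig. 3 flow reduce to a small lookup. In the sketch below, the format pairs are taken from the description of Fig. 3, while the table name, function name, and stage labels are hypothetical.

```python
# (format used when the compression mode is specified, format used when it is not),
# as described for Fig. 3. The names of the table and function are assumptions.
FIG3_FORMATS = {
    "map": ("snappy", "lzo"),
    "reduce": ("lzo", "bzip2"),
}


def fig3_target_format(stage: str, mode_specified: bool) -> str:
    """Return the target output compression format for the given stage."""
    when_specified, when_not_specified = FIG3_FORMATS[stage]
    return when_specified if mode_specified else when_not_specified
```

For instance, `fig3_target_format("map", False)` falls back to the lzo format, matching the unspecified branch of the Map stage.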
On the basis of the foregoing embodiments, the embodiments of the present disclosure further provide a data compression method, referring to fig. 4, which is a flowchart of yet another data compression method provided by the embodiments of the present disclosure, where the method includes:
First, target data is determined and split into a plurality of data blocks through logical slicing (such as data block 1, data block 2, data block 3, ..., data block N in Fig. 4), where each data block corresponds to one Map task. For ease of understanding, Map task 1 is taken as an example below.
Specifically, Map task 1 is processed in the Map stage to obtain target intermediate data. After a compression request for the target intermediate data output by the Map stage is received, the compression parameters corresponding to the target intermediate data (i.e., the first compression parameters) are obtained from the pre-configured file, and the multiple Map output compression formats configured in the first compression parameters are displayed. A selection operation on the displayed Map output compression formats is then received, and the Map output compression format corresponding to the selection operation is taken as the target Map output compression format. The target intermediate data is compressed based on the target Map output compression format to obtain compressed target intermediate data, which is transmitted to the Reduce stage through the Shuffle process.
Assuming that the key corresponding to Map task 1 is k1 and the key corresponding to Reduce task 1 is also k1, the compressed target intermediate data is processed by Reduce task 1 in the Reduce stage to obtain target result data. After a compression request for the target result data output by the Reduce stage is received, the compression parameters corresponding to the target result data (i.e., the second compression parameters) are obtained from the pre-configured file, and the multiple Reduce output compression formats configured in the second compression parameters are displayed. A selection operation on the displayed Reduce output compression formats is then received, and the Reduce output compression format corresponding to the selection operation is taken as the target Reduce output compression format. The target result data is compressed based on the target Reduce output compression format to obtain compressed target result data, which is transmitted to a storage system such as HDFS.
Therefore, according to the embodiment of the disclosure, the compression parameters corresponding to the target intermediate data are obtained from the pre-configured file, the target Map output compression format is determined based on the compression parameters corresponding to the target intermediate data, and further the compression of the target intermediate data is achieved based on the target Map output compression format, so that the compression mode of the target intermediate data is enriched, and the use experience of a user is improved.
In addition, the embodiment of the disclosure can further obtain the compression parameters corresponding to the target result data from the pre-configured file, and determine the target Reduce output compression format based on the compression parameters corresponding to the target result data, so as to compress the target result data based on the target Reduce output compression format, which enriches the compression modes available for the target result data and improves the user experience.
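The Fig. 4 flow can be imitated in a single process with a toy word count. The sketch below is an assumption-laden illustration, not the disclosed system: all function names are hypothetical, the selection operations are modeled as callbacks, and Gzip and Bzip2 from the Python standard library stand in for the snappy and lzo formats, which the standard library does not provide.

```python
import bz2
import gzip
from collections import defaultdict

# Stand-in codecs (assumption): (compress, decompress) pairs from the standard library.
CODECS = {
    "gzip": (gzip.compress, gzip.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
}


def split_into_blocks(lines, block_size):
    """Logical slicing: each block becomes the input of one Map task."""
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]


def map_task(block, map_format):
    """Emit 'word<TAB>1' records and compress them as the target intermediate data."""
    records = "\n".join(f"{word}\t1" for line in block for word in line.split())
    return CODECS[map_format][0](records.encode("utf-8"))


def shuffle(compressed_outputs, map_format):
    """Decompress every Map output and group the values by key."""
    grouped = defaultdict(list)
    for payload in compressed_outputs:
        text = CODECS[map_format][1](payload).decode("utf-8")
        for record in text.splitlines():
            key, value = record.split("\t")
            grouped[key].append(int(value))
    return grouped


def reduce_task(grouped, reduce_format):
    """Sum the values per key and compress the target result data."""
    result = "\n".join(f"{key}\t{sum(values)}" for key, values in sorted(grouped.items()))
    return CODECS[reduce_format][0](result.encode("utf-8"))


def run_job(lines, select_map_format, select_reduce_format, block_size=2):
    """Run the toy job; the two callbacks model the user's selection operations."""
    map_format = select_map_format(["gzip", "bzip2"])        # stand-in candidate list
    reduce_format = select_reduce_format(["bzip2", "gzip"])  # stand-in candidate list
    blocks = split_into_blocks(lines, block_size)
    intermediate = [map_task(block, map_format) for block in blocks]
    grouped = shuffle(intermediate, map_format)
    return reduce_format, reduce_task(grouped, reduce_format)
```

A caller might run `run_job(["a b a", "b c", "a"], lambda formats: formats[0], lambda formats: formats[0])`, which picks the first displayed format at each stage and returns the chosen Reduce format together with the compressed result.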
Based on the above method embodiments, the present disclosure further provides a data compression device, and referring to fig. 5, a schematic structural diagram of the data compression device provided in the embodiments of the present disclosure is provided, where the device includes:
a first obtaining module 501, configured to obtain, in response to a compression request for target intermediate data output by the Map stage, compression parameters corresponding to the target intermediate data from a pre-configured file, and take the compression parameters corresponding to the target intermediate data as first compression parameters; wherein a plurality of Map output compression formats are configured in the first compression parameters;
a first determining module 502, configured to determine a target Map output compression format from the plurality of Map output compression formats configured in the first compression parameters;
a first compression module 503, configured to compress the target intermediate data based on the target Map output compression format to obtain compressed target intermediate data; the compressed target intermediate data is to be transmitted to the Reduce stage.
In an alternative embodiment, the apparatus further comprises:
the second acquisition module is used for responding to a compression request of target result data output by the Reduce stage, acquiring compression parameters corresponding to the target result data from the preconfigured file, and taking the compression parameters corresponding to the target result data as second compression parameters; the target result data is obtained by processing the compressed target intermediate data in the Reduce stage, and a plurality of Reduce output compression formats are configured in the second compression parameters;
the second determining module is used for determining a target Reduce output compression format from a plurality of Reduce output compression formats configured in the second compression parameters;
the second compression module is used for compressing the target result data based on the target Reduce output compression format to obtain compressed target processing result data; the target processing result data is used for being transmitted to a storage system.
In an alternative embodiment, the apparatus further comprises:
the display module is used for displaying a plurality of Map output compression formats configured in the first compression parameters;
accordingly, the first determining module 502 is specifically configured to:
responding to a selected operation of multiple Map output compression formats configured in the first compression parameters, and taking the Map output compression format corresponding to the selected operation as a target Map output compression format.
In an alternative embodiment, the first determining module 502 is specifically configured to:
determining a target Map output compression format from the plurality of Map output compression formats configured in the first compression parameters according to the arrangement order of the plurality of Map output compression formats configured in the first compression parameters.
In an alternative embodiment, the first determining module 502 is specifically configured to:
determining a target Map output compression format from the plurality of Map output compression formats configured in the first compression parameters based on a preset order.
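The three ways in which the first determining module 502 may pick the target format can be sketched as three small functions. The interpretation is an assumption: "arrangement order" is read here as taking the first format listed in the first compression parameters, and "preset order" as a separately defined priority list. All function and parameter names are hypothetical.

```python
from typing import Sequence


def select_by_operation(configured: Sequence[str], selected: str) -> str:
    """Use the format picked by the user's selection operation."""
    if selected not in configured:
        raise ValueError(f"{selected!r} is not one of the configured formats")
    return selected


def select_by_arrangement_order(configured: Sequence[str]) -> str:
    """Use the format listed first in the compression parameters (assumed reading)."""
    if not configured:
        raise ValueError("no formats configured")
    return configured[0]


def select_by_preset_order(configured: Sequence[str], preset_order: Sequence[str]) -> str:
    """Use the first format of a preset priority list that is also configured (assumed reading)."""
    for fmt in preset_order:
        if fmt in configured:
            return fmt
    raise ValueError("none of the preset formats is configured")
```

With the configured formats `["snappy", "lzo"]`, the arrangement-order strategy returns snappy, while a preset order of `["lzo", "snappy"]` returns lzo.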
In an alternative embodiment, the plurality of Map output compression formats includes a snappy format and a lzo format.
In an alternative embodiment, the plurality of Reduce output compression formats includes lzo format and Bzip2 format.
In the data compression device provided by the embodiment of the disclosure, in response to a compression request for target intermediate data output by the Map stage, compression parameters corresponding to the target intermediate data are obtained from a pre-configured file and used as first compression parameters, in which a plurality of Map output compression formats are configured. A target Map output compression format is then determined from the plurality of Map output compression formats configured in the first compression parameters, and the target intermediate data is compressed based on the target Map output compression format to obtain compressed target intermediate data, which is to be transmitted to the Reduce stage. Therefore, according to the embodiment of the disclosure, the compression parameters corresponding to the target intermediate data are obtained from the pre-configured file, the target Map output compression format is determined based on those compression parameters, and the target intermediate data is compressed based on the target Map output compression format, which enriches the compression modes available for the target intermediate data and improves the user experience.
In addition to the above methods and apparatuses, the embodiments of the present disclosure further provide a computer readable storage medium, where instructions are stored, when the instructions are executed on a terminal device, to cause the terminal device to implement the data compression method according to the embodiments of the present disclosure.
The disclosed embodiments also provide a computer program product comprising computer programs/instructions which, when executed by a processor, implement the data compression method according to the disclosed embodiments.
In addition, the embodiment of the present disclosure further provides a data compression device, as shown in fig. 6, which may include:
a processor 601, a memory 602, an input device 603 and an output device 604. The number of processors 601 in the data compression device may be one or more, one processor being an example in fig. 6. In some embodiments of the present disclosure, the processor 601, memory 602, input device 603, and output device 604 may be connected by a bus or other means, with the bus connection being exemplified in fig. 6.
The memory 602 may be used to store software programs and modules, and the processor 601 performs various functional applications of the data compression device and data processing by executing the software programs and modules stored in the memory 602. The memory 602 may primarily include a storage program area and a storage data area, wherein the storage program area may store an operating system, application programs required for at least one function, and the like. In addition, the memory 602 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. The input device 603 may be used to receive input numeric or character information and to generate signal inputs related to user settings and function control of the data compression device.
In particular, in this embodiment, the processor 601 loads executable files corresponding to the processes of one or more application programs into the memory 602 according to the following instructions, and the processor 601 executes the application programs stored in the memory 602, so as to implement the various functions of the data compression device.
It should be noted that in this document, relational terms such as "first" and "second" and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing is merely a specific embodiment of the disclosure to enable one skilled in the art to understand or practice the disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown and described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method of data compression, the method comprising:
in response to a compression request for target intermediate data output by a Map stage, acquiring compression parameters corresponding to the target intermediate data from a pre-configured file, and taking the compression parameters corresponding to the target intermediate data as first compression parameters; wherein a plurality of Map output compression formats are configured in the first compression parameters;
determining a target Map output compression format from the plurality of Map output compression formats configured in the first compression parameters;
compressing the target intermediate data based on the target Map output compression format to obtain compressed target intermediate data; the compressed target intermediate data is to be transmitted to a Reduce stage.
2. The method of claim 1, wherein after the compressing the target intermediate data based on the target Map output compression format to obtain compressed target intermediate data, the method further comprises:
responding to a compression request of target result data output by the Reduce stage, acquiring compression parameters corresponding to the target result data from the preconfigured file, and taking the compression parameters corresponding to the target result data as second compression parameters; the target result data is obtained by processing the compressed target intermediate data in the Reduce stage, and a plurality of Reduce output compression formats are configured in the second compression parameters;
determining a target Reduce output compression format from a plurality of Reduce output compression formats configured in the second compression parameters;
compressing the target result data based on the target Reduce output compression format to obtain compressed target processing result data; the target processing result data is used for being transmitted to a storage system.
3. The method of claim 1, wherein before the determining the target Map output compression format from the plurality of Map output compression formats configured in the first compression parameters, the method further comprises:
displaying a plurality of Map output compression formats configured in the first compression parameters;
accordingly, the determining the target Map output compression format from the multiple Map output compression formats configured in the first compression parameter includes:
responding to a selected operation of multiple Map output compression formats configured in the first compression parameters, and taking the Map output compression format corresponding to the selected operation as a target Map output compression format.
4. The method of claim 1, wherein determining the target Map output compression format from the plurality of Map output compression formats configured in the first compression parameter comprises:
determining a target Map output compression format from the plurality of Map output compression formats configured in the first compression parameters according to the arrangement order of the plurality of Map output compression formats configured in the first compression parameters.
5. The method of claim 1, wherein determining the target Map output compression format from the plurality of Map output compression formats configured in the first compression parameter comprises:
determining a target Map output compression format from the plurality of Map output compression formats configured in the first compression parameters based on a preset order.
6. The method of claim 1, wherein the plurality of Map output compression formats includes a snappy format and a lzo format.
7. The method of claim 2, wherein the plurality of Reduce output compression formats comprises a lzo format and a Bzip2 format.
8. A data compression apparatus, the apparatus comprising:
a first acquisition module, configured to acquire, in response to a compression request for target intermediate data output by a Map stage, compression parameters corresponding to the target intermediate data from a pre-configured file, and take the compression parameters corresponding to the target intermediate data as first compression parameters; wherein a plurality of Map output compression formats are configured in the first compression parameters;
a first determining module, configured to determine a target Map output compression format from the plurality of Map output compression formats configured in the first compression parameters;
a first compression module, configured to compress the target intermediate data based on the target Map output compression format to obtain compressed target intermediate data; the compressed target intermediate data is to be transmitted to a Reduce stage.
9. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein instructions, which when run on a terminal device, cause the terminal device to implement the method of any of claims 1-7.
10. A data compression apparatus, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the method of any one of claims 1-7 when the computer program is executed.
CN202311053461.0A 2023-08-18 2023-08-18 Data compression method, device, equipment and storage medium Pending CN117111845A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311053461.0A CN117111845A (en) 2023-08-18 2023-08-18 Data compression method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117111845A true CN117111845A (en) 2023-11-24

Family

ID=88808522

Country Status (1)

Country Link
CN (1) CN117111845A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination