WO2022021710A1 - Data dump method and apparatus, device, and storage medium

Info

Publication number: WO2022021710A1
Application number: PCT/CN2020/132377
Authority: WO
Inventors: 宋大伟; 丁静
Original assignee: 苏州亿歌网络科技有限公司
Priority date: 2020-07-28
Filing date: 2020-11-27
Publication date: 2022-02-03
Also published as: CN111930731A

Abstract

A Map/Reduce architecture-based data dump method and apparatus, a device, and a storage medium. The method comprises: a Map side obtains target data and shuffles the target data according to a target data grading framework, wherein the target data is stored in row-based storage form (S110); a Reduce side aggregates and stores the shuffled target data in columnar storage form according to a target data hierarchy (S120). With this method, the dumped data is stored in columnar storage form, and multiple outputs matching the target data hierarchy can be produced, thereby improving data query and usage efficiency.

Description

Data dump method, apparatus, device, and storage medium

This application claims the priority of the Chinese patent application with application number 202010740395.4 filed with the China Patent Office on July 28, 2020, the entire contents of which are incorporated herein by reference.

Technical field

The embodiments of the present application relate to the technical field of databases, and in particular, to a data dumping method, apparatus, device, and storage medium based on a Map/Reduce architecture.

Background art

A database is a "warehouse that organizes, stores, and manages data according to a data structure". It offers large storage capacity and can hold millions, tens of millions, or hundreds of millions of records, which makes it suitable for various kinds of business data, such as game platform data and medical system data. However, as the amount of data stored in a database keeps growing, improving the efficiency of data query and use has become an urgent problem.

SUMMARY OF THE INVENTION

Embodiments of the present application provide a data dumping method, apparatus, device, and storage medium based on a Map/Reduce architecture, so as to improve data query and use efficiency.

In a first aspect, an embodiment of the present application provides a data dump method based on a Map/Reduce architecture, including:

The Map side obtains the target data and shuffles the target data according to the target data grading framework, wherein the target data is stored in row-based storage form;

The Reduce side aggregates and stores the shuffled target data in the form of columnar storage according to the hierarchical structure of the target data.

Further, the Map side shuffling the target data according to the target data grading framework includes:

The Map side parses the key of the target data;

The Map side sorts the keys of the target data according to the target data grading framework to obtain the shuffled target data.

Further, the Map side sorting the keys of the target data according to the target data grading framework includes:

The Map side sorts the hash values corresponding to the keys of the target data according to the target data grading framework.

Further, the Reduce side aggregates and stores the shuffled target data in the form of columnar storage according to the target data hierarchical structure, including:

The Reduce side obtains a preconfigured data hierarchy as the target data hierarchy;

The Reduce side aggregates and stores the shuffled target data in the form of columnar storage according to the target data hierarchical structure.

Optionally, the columnar storage form includes parquet columnar storage.

Further, the Reduce side aggregates and stores the shuffled target data according to the hierarchical structure of the target data and the Parquet columnar storage specification.

In a second aspect, an embodiment of the present application further provides a data dump device based on a Map/Reduce architecture, including:

The Map-side processing module is used for the Map-side to obtain target data, and shuffle the target data according to the target data grading framework; wherein, the target data is stored in the form of row storage;

The processing module on the Reduce side is used for the Reduce side to aggregate and store the shuffled target data in the form of columnar storage according to the hierarchical structure of the target data.

Further, the Map-side processing module includes: a key parsing unit and a data shuffling unit, wherein,

The key parsing unit is used for the Map side to parse the key of the target data;

The data shuffling unit is used for the Map side to sort the keys of the target data according to the target data grading framework to obtain the shuffled target data.

Further, the data shuffling unit is specifically configured for the Map side to sort the hash values corresponding to the keys of the target data according to the target data grading framework.

Further, the Reduce side processing module includes: a target data hierarchical structure acquisition unit and a data summary storage unit, wherein,

The target data hierarchical structure acquisition unit is used for the Reduce side to acquire a pre-configured data hierarchical structure as the target data hierarchical structure;

The data summary storage unit is used for the Reduce side to summarize and store the shuffled target data in the form of columnar storage according to the hierarchical structure of the target data.

Optionally, the columnar storage form includes parquet columnar storage.

Further, the Reduce side processing module is specifically used for the Reduce side to aggregate and store the shuffled target data according to the target data hierarchical structure and the specification of parquet columnar storage.

In a third aspect, an embodiment of the present application further provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the data dump method based on the Map/Reduce architecture described in any embodiment of the present application.

In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the data dump method based on the Map/Reduce architecture described in any embodiment of the present application.

In the technical solution provided by the embodiments of the present application, the Map side obtains data stored in row-based form and shuffles it according to the target data grading framework; the Reduce side reads the shuffled data and then aggregates and stores it in columnar form according to the target data hierarchical structure. The dumped data is thus not only stored in columnar form but also achieves a multi-output effect that matches the target data hierarchical structure, which improves data query and usage efficiency.

Description of drawings

FIG. 1 is a flowchart of a data dump method based on the Map/Reduce architecture in Embodiment 1 of the present application;

FIG. 2 is a flowchart of a data dump method based on the Map/Reduce architecture in Embodiment 2 of the present application;

FIG. 3 is a flowchart of a data dump method based on the Map/Reduce architecture in Embodiment 3 of the present application;

FIG. 4 is a schematic structural diagram of a data dump device based on the Map/Reduce architecture in Embodiment 4 of the present application;

FIG. 5 is a schematic diagram of a hardware structure of a computer device in Embodiment 5 of the present application.

Detailed description

The present application will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present application, but not to limit the present application. In addition, it should be noted that, for the convenience of description, the drawings only show some but not all the structures related to the present application.

Before discussing the exemplary embodiments in greater detail, it should be mentioned that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart depicts the operations (or steps) as a sequential process, many of the operations may be performed in parallel, concurrently, or simultaneously. In addition, the order of the operations can be rearranged. A process may be terminated when its operations are complete, but may also have additional steps not included in the figures. The processes may correspond to methods, functions, procedures, subroutines, subprograms, and the like.

Embodiment 1

FIG. 1 is a flowchart of a data dump method based on the Map/Reduce architecture provided in Embodiment 1 of the present application. The method is applicable to dumping specific business data (for example, game platform data) to improve its query and usage efficiency. The method may be executed by the data dump device based on the Map/Reduce architecture provided by the embodiments of the present application; the device may be implemented in software and/or hardware, and may generally be integrated in a processor.

As shown in FIG. 1 , the data dump method based on the Map/Reduce architecture provided in this embodiment specifically includes:

S110. The Map side acquires the target data and shuffles the target data according to the target data grading framework, wherein the target data is stored in row-based storage form.

Target data refers to data stored in a row-based database that needs to be dumped in order to improve data query and usage efficiency. Typically, the target data may be offline data stored in a row-based database within a set period of time (for example, within one month). A row-based database stores data row by row; it is good at random read operations but is not well suited to big data analysis. Traditional databases such as SQL Server, Oracle, and MySQL are all row-based databases.

A database presents data as two-dimensional tables of rows and columns, but physically stores it as one-dimensional strings. Table 1 below is used as a simplified example.

Table 1

EmpId	Lastname	Firstname	Salary
1	Smith	Joe	40000
2	Jones	Mary	50000
3	Johnson	Cathy	44000

A row-based database stores the data values in one row together, and then stores the data in the next row, and so on, that is:

1, Smith, Joe, 40000; 2, Jones, Mary, 50000; 3, Johnson, Cathy, 44000;

Suppose the salaries of all personnel are to be analyzed. Every row must be read, the data for the "Salary" field must be selected from each row, and only then can the analysis be performed. When Table 1 holds a huge amount of data and the analysis needs only the "Salary" column, reading all of the data in Table 1 seriously degrades query efficiency.
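The cost described above can be illustrated with a minimal Python sketch. It lays Table 1 out row by row, as in the one-dimensional string shown earlier, and then extracts the Salary field. The function names `serialize_rows` and `read_salaries_from_rows` are hypothetical and are not part of the application; a real row-based database uses a binary page layout rather than text.

```python
# Table 1 as (EmpId, Lastname, Firstname, Salary) records.
ROWS = [
    (1, "Smith", "Joe", 40000),
    (2, "Jones", "Mary", 50000),
    (3, "Johnson", "Cathy", 44000),
]


def serialize_rows(rows):
    """Lay the table out row by row as a one-dimensional string."""
    return " ".join(", ".join(str(v) for v in row) + ";" for row in rows)


def read_salaries_from_rows(serialized):
    """Extract Salary; every field of every row has to be parsed first."""
    salaries = []
    fields_touched = 0
    for chunk in serialized.split(";"):
        if not chunk.strip():
            continue  # skip the empty piece after the last ';'
        fields = [f.strip() for f in chunk.split(",")]
        fields_touched += len(fields)
        salaries.append(int(fields[3]))
    return salaries, fields_touched


row_layout = serialize_rows(ROWS)
salaries, fields_touched = read_salaries_from_rows(row_layout)
print(row_layout)
print(salaries, "fields parsed:", fields_touched)
```

In this sketch all twelve fields are parsed even though only three salary values are needed, which is the overhead the paragraph above attributes to row-based storage.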

The target data grading framework may refer to a general data grading framework under the Map/Reduce architecture based on HDFS (Hadoop Distributed File System), or to a custom data grading framework under the HDFS-based Map/Reduce architecture. Typically, a custom data grading framework can match the target data hierarchy described below, that is, the ordering of the shuffled data matches the target data hierarchy. Specifically, the shuffling of the target data is carried out under the target data grading framework.

In a cluster environment like Hadoop, most Map tasks and Reduce tasks are executed on different nodes. After each Map task is executed, an output file is generated and the data is pulled by the Reduce task, that is, the shuffled target data is pulled.

S120. The Reduce side aggregates and stores the shuffled target data in a columnar storage form according to the target data hierarchical structure.

Taking the above Table 1 as an example, columnar storage stores the data values in one column together, and then stores the data in the next column, and so on, that is:

1, 2, 3; Smith, Jones, Johnson; Joe, Mary, Cathy; 40000, 50000, 44000;

In a column-based database, the column is the basic logical storage unit, and the data of a column is stored contiguously on the storage medium. Suppose again that the salaries of all personnel are to be analyzed: only one column needs to be read, namely the data for the "Salary" field, before the analysis is performed. When Table 1 holds a large amount of data, the query response time is significantly reduced. Data can be searched efficiently within a column without maintaining an index (any column can serve as an index), which minimizes irrelevant input and output during the query and avoids a full table scan.
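As a minimal sketch of the columnar layout, the following Python snippet pivots Table 1 into one contiguous list per column and then reads only the Salary column. The names `to_columns` and `serialize_columns` are hypothetical helpers introduced for illustration only.

```python
COLUMN_NAMES = ("EmpId", "Lastname", "Firstname", "Salary")
ROWS = [
    (1, "Smith", "Joe", 40000),
    (2, "Jones", "Mary", 50000),
    (3, "Johnson", "Cathy", 44000),
]


def to_columns(names, rows):
    """Pivot row records into one contiguous list of values per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def serialize_columns(columns):
    """Lay the table out column by column as a one-dimensional string."""
    return " ".join(
        ", ".join(str(v) for v in values) + ";" for values in columns.values()
    )


columns = to_columns(COLUMN_NAMES, ROWS)
column_layout = serialize_columns(columns)

# Only the Salary column is touched; the other three columns are never read.
salary_column = columns["Salary"]
total_salary = sum(salary_column)
print(column_layout)
print(salary_column, total_salary)
```

Here the salary analysis touches three values instead of every field of every row.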

The target data hierarchical structure refers to a data hierarchy that matches the business analysis of the target business data (for example, game platform business data or medical system data). Taking game platform business data as an example, the first data layer may be the game name, the second data layer the event type (such as login events, consumption events, and interactive events), and the third data layer the date. As another example, the first data layer may be the game name, the second the user ID, the third the event type, and the fourth the date.

Typically, the data hierarchy is matched to business analysis and can be configured in real time. The user can configure the data hierarchy in real time according to actual big data analysis requirements or business analysis requirements, so that the data aggregated and output by the Reduce side matches the configured data hierarchy.
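One way to picture a configurable data hierarchy is as an ordered list of field names, from the first data layer to the last. The sketch below encodes the two example hierarchies from the text and derives a per-record key from whichever hierarchy is selected. The field names (`game`, `user_id`, `event_type`, `date`), the configuration names, and the function `hierarchy_key` are all assumptions made for illustration; the application does not specify how the hierarchy is represented.

```python
# Two hypothetical hierarchy configurations, mirroring the examples in the text.
HIERARCHIES = {
    "by_event": ("game", "event_type", "date"),
    "by_user": ("game", "user_id", "event_type", "date"),
}


def hierarchy_key(record, hierarchy):
    """Return the record's value at each data layer, first layer first."""
    return tuple(record[level] for level in hierarchy)


# An illustrative record; the values are made up.
record = {
    "game": "game_a",
    "user_id": "u42",
    "event_type": "login",
    "date": "2020-07-28",
    "duration_s": 35,
}

key_by_event = hierarchy_key(record, HIERARCHIES["by_event"])
key_by_user = hierarchy_key(record, HIERARCHIES["by_user"])
print(key_by_event)
print(key_by_user)
```

Switching the selected configuration changes how the same record is classified, without changing the record itself.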

Furthermore, the Reduce side aggregating and storing the shuffled target data in columnar storage form according to the target data hierarchical structure may specifically be as follows:

Multiple data hierarchical structures can be pre-configured and stored. The Reduce side obtains the data hierarchical structure selected by the user as the target data hierarchical structure, and then aggregates and stores the shuffled target data in columnar storage form according to that structure.

As a specific implementation manner, the columnar storage form may be parquet columnar storage.

Parquet is a columnar storage format capable of efficiently storing nested data. A Parquet file consists of a file header, one or more file blocks that follow it, and a file footer at the end. The file header contains only the marker that identifies the file as a Parquet file. Each file block stores one row group, a row group consists of column chunks, and each column chunk stores the data of one column. The data in each column chunk is organized in units of pages.
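The nesting of file, row group, column chunk, and page can be sketched as a toy in-memory model. This is not the real Parquet encoding and does not read or write Parquet files; the class names, `build_file`, and the parameters `rows_per_group` and `values_per_page` are hypothetical and chosen only to make the structure visible.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

MAGIC = b"PAR1"  # 4-byte marker that identifies a Parquet file


@dataclass
class ColumnChunk:
    name: str
    pages: List[List[Any]] = field(default_factory=list)  # values split into pages


@dataclass
class RowGroup:
    column_chunks: Dict[str, ColumnChunk] = field(default_factory=dict)


@dataclass
class ToyParquetFile:
    header: bytes = MAGIC
    row_groups: List[RowGroup] = field(default_factory=list)
    footer: Dict[str, Any] = field(default_factory=dict)  # schema and other metadata


def build_file(names, rows, rows_per_group=2, values_per_page=1):
    """Split rows into row groups, each column of a group into paged chunks."""
    out = ToyParquetFile(footer={"schema": list(names), "num_rows": len(rows)})
    for start in range(0, len(rows), rows_per_group):
        group_rows = rows[start:start + rows_per_group]
        group = RowGroup()
        for i, name in enumerate(names):
            values = [row[i] for row in group_rows]
            pages = [
                values[p:p + values_per_page]
                for p in range(0, len(values), values_per_page)
            ]
            group.column_chunks[name] = ColumnChunk(name, pages)
        out.row_groups.append(group)
    return out


toy = build_file(
    ("EmpId", "Lastname", "Firstname", "Salary"),
    [(1, "Smith", "Joe", 40000), (2, "Jones", "Mary", 50000), (3, "Johnson", "Cathy", 44000)],
)
print(len(toy.row_groups), toy.row_groups[0].column_chunks["Salary"].pages)
```

With the small sizes assumed here, the three rows of Table 1 fall into two row groups, and each Salary chunk holds its values one per page.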

Further, the Reduce side aggregates and stores the shuffled target data in the form of columnar storage according to the hierarchical structure of the target data, which may be specifically:

After pulling the shuffled data, the Reduce side collects and stores the shuffled target data according to the hierarchical structure of the target data that matches the business analysis and according to the parquet columnar storage specification (schema). That is, the data at each level is specified and stored, and the target data can be displayed to the user in the form of multiple outputs. For example, the data of a certain event of a certain game is designated to be stored, the data of a certain event of a certain game of a certain day is designated to be stored, and so on. Furthermore, when performing big data analysis, for example, when analyzing the data of a certain event of a certain game on a certain day, the data can be quickly obtained at the designated storage location, thereby improving the efficiency of data query and use.

When the Reduce side aggregates and stores the shuffled target data according to the Parquet columnar storage specification, it specifically sets the suffix of the generated data files to the Parquet file suffix, compresses the data according to the compression method of Parquet columnar storage (for example, its compression ratio), and so on.
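The Reduce-side behaviour described above can be sketched as follows: records are grouped by their position in the hierarchy, each group is pivoted into columns, and each group is assigned its own output location ending in the Parquet suffix. This is a sketch under stated assumptions, not the application's implementation: the hierarchy, the `/dump` root, the directory-per-layer path scheme, and the function names are hypothetical, and the columns are kept in memory rather than encoded and compressed by a real Parquet writer.

```python
from collections import defaultdict
from pathlib import PurePosixPath

HIERARCHY = ("game", "event_type", "date")  # assumed preconfigured hierarchy
PARQUET_SUFFIX = ".parquet"


def output_path(root, key):
    """One directory per data layer; the last layer names the output file."""
    return str(PurePosixPath(root, *key[:-1], key[-1] + PARQUET_SUFFIX))


def reduce_to_columns(records, hierarchy=HIERARCHY, root="/dump"):
    """Group records by hierarchy key and pivot each group into columns."""
    grouped = defaultdict(list)
    for record in records:
        grouped[tuple(record[level] for level in hierarchy)].append(record)

    outputs = {}
    for key, group in grouped.items():
        # Fields that are not hierarchy layers become the stored columns.
        names = [n for n in group[0] if n not in hierarchy]
        outputs[output_path(root, key)] = {n: [r[n] for r in group] for n in names}
    return outputs


# Illustrative records; the values are made up.
records = [
    {"game": "game_a", "event_type": "login", "date": "2020-07-28", "user_id": "u1", "amount": 0},
    {"game": "game_a", "event_type": "pay", "date": "2020-07-28", "user_id": "u1", "amount": 30},
    {"game": "game_a", "event_type": "login", "date": "2020-07-28", "user_id": "u2", "amount": 0},
]
outputs = reduce_to_columns(records)
for path, cols in outputs.items():
    print(path, cols)
```

An analysis of the login events of `game_a` on one day then only needs the single output at the corresponding location, which is the multi-output effect the text describes.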

In the technical solution provided by the embodiments of the present application, the Map side obtains data stored in row-based form and shuffles it according to the target data grading framework; the Reduce side reads the shuffled data and then aggregates and stores it in columnar form according to the target data hierarchical structure. The dumped data is thus not only stored in columnar form but also achieves a multi-output effect that matches the target data hierarchical structure, which improves data query and usage efficiency.

Embodiment 2

FIG. 2 is a flowchart of a data dump method based on a Map/Reduce architecture provided in Embodiment 2 of the present application. This embodiment is optimized on the basis of the above embodiment, wherein the Map side shuffling the target data according to the target data grading framework is specifically:

The Map side parses the keys of the target data; the Map side sorts the keys of the target data according to the target data grading framework to obtain the shuffled target data.

As shown in Figure 2, the data dump method based on the Map/Reduce architecture provided by this embodiment specifically includes:

S210. The Map side obtains target data, where the target data is stored in a row-based storage format.

S220. The Map side parses the key of the target data.

A key-value store is the simplest form of organization for a database. After the Map task runs, the target data is parsed into key-value pairs, from which the key and the value of each data item can be obtained.

S230. The Map side sorts the keys of the target data according to the target data grading framework.

A lookup by key quickly finds the value corresponding to that key, so sorting the keys according to the target data grading framework achieves the shuffling and sorting of the target data. Consequently, by the time the data reaches the Reduce side, it has already been sorted by key based on the target data grading framework.

Specifically, the key ordering can depend on the wrapper type of the key. For example, if the key is of type IntWritable, which wraps an int, the keys can be sorted by numerical value according to the target data grading framework. If the key is of type Text, which wraps a String, the keys can be sorted in lexicographic order according to the target data grading framework.
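The difference between the two orderings is easy to see on a small example. The Python sketch below uses plain `int` and `str` values as stand-ins for IntWritable and Text keys; it only illustrates numeric versus lexicographic ordering and does not use Hadoop's own comparators.

```python
int_keys = [10, 9, 2]          # stand-ins for IntWritable keys
text_keys = ["10", "9", "2"]   # stand-ins for Text keys holding the same digits

sorted_ints = sorted(int_keys)    # compared by numerical value
sorted_texts = sorted(text_keys)  # compared character by character

print(sorted_ints)   # [2, 9, 10]
print(sorted_texts)  # ['10', '2', '9']
```

The same logical keys therefore reach the Reduce side in a different order depending on the key type chosen on the Map side.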

As a specific implementation, the Map side sorting the keys of the target data according to the target data grading framework may also be specifically: the Map side sorts the hash values corresponding to the keys of the target data according to the target data grading framework.

The key of each data item is passed through a hash function to obtain the hash value corresponding to that key. After the hash values are sorted, tasks with the same hash value can be assigned to the same node for processing. That is, sorting by hash value groups together key-value pairs that share a key and ensures that all key-value pairs for a single key are sent to the same Reducer and processed in one Reduce function call. Alternatively, the key-value pairs of a single key (<key,value_1>, <key,value_2>, ..., <key,value_n>) can be combined into <key,(value_1,value_2,...,value_n)> and passed as the input parameter of one Reduce function call.
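A minimal sketch of hash-based partitioning and grouping is shown below. It assigns each key-value pair to a reducer by the hash of its key, sorts each reducer's pairs by hash value, and merges the values of each key into one list. CRC32 is used here only because it is deterministic across runs; it is not the hash function named by the application or used by Hadoop. `NUM_REDUCERS` and the function names are likewise assumptions.

```python
import zlib
from collections import defaultdict
from itertools import groupby

NUM_REDUCERS = 2  # assumed number of Reduce tasks


def stable_hash(key):
    """Deterministic hash of a string key (illustrative choice: CRC32)."""
    return zlib.crc32(key.encode("utf-8"))


def partition(pairs, num_reducers=NUM_REDUCERS):
    """Send every pair to the reducer selected by the hash of its key."""
    buckets = defaultdict(list)
    for key, value in pairs:
        buckets[stable_hash(key) % num_reducers].append((key, value))
    return buckets


def group_sorted(bucket):
    """Sort by (hash, key), then merge <key, v1>, <key, v2> into <key, [v1, v2]>."""
    ordered = sorted(bucket, key=lambda kv: (stable_hash(kv[0]), kv[0]))
    return [
        (key, [value for _, value in items])
        for key, items in groupby(ordered, key=lambda kv: kv[0])
    ]


pairs = [("login", 1), ("pay", 2), ("login", 3), ("chat", 4), ("pay", 5)]
buckets = partition(pairs)
grouped = {key: values for bucket in buckets.values() for key, values in group_sorted(bucket)}
print(grouped)
```

Because the partition depends only on the key, both `login` pairs land on the same reducer, and each key's values arrive as one list for a single Reduce call.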

S240. The Reduce side pulls the sorted target data, and acquires a preconfigured data hierarchy as the target data hierarchy.

S250. The Reduce end aggregates and stores the sorted target data in a columnar storage form according to the target data hierarchical structure.

For details that are not explained in this embodiment, please refer to the foregoing embodiments, which will not be repeated here.

In the above technical solution, data dumping is implemented based on the Map/Reduce architecture, so that data originally stored in row-based form is converted into data stored in columnar form, and the dumped data can be output in multiple forms that match the pre-configured data hierarchical structure. This makes big data analysis convenient for users and improves data query and usage efficiency. In addition, implementing the data dump on the Map/Reduce architecture improves the stability of the data dump processing.

Embodiment 3

FIG. 3 is a flowchart of a data dump method based on a Map/Reduce architecture provided by Embodiment 3 of the present application. On the basis of the foregoing embodiment, this embodiment provides a specific implementation manner.

As shown in FIG. 3 , the data dump method based on the Map/Reduce architecture provided by this embodiment specifically includes:

S310. The Map side obtains target data, where the target data is stored in a row-based storage format.

S320. The Map side parses the key of the target data.

S330. The Map side sorts the hash values corresponding to the keys of the target data according to the target data grading framework.

S340. The Reduce side pulls the sorted target data, and acquires a preconfigured data hierarchy as the target data hierarchy.

S350. The Reduce side aggregates and stores the sorted target data according to the hierarchical structure of the target data and the specification of parquet columnar storage.

In the above technical solution, data dumping is implemented based on the Map/Reduce architecture, so that data originally stored in row-based form is converted into data stored in columnar form, and the dumped data can be output in multiple forms that match the pre-configured data hierarchical structure. This makes big data analysis convenient for users and improves data query and usage efficiency. In addition, implementing the data dump on the Map/Reduce architecture improves the stability of the data dump processing.

Embodiment 4

FIG. 4 is a schematic structural diagram of a data dump device based on the Map/Reduce architecture provided in Embodiment 4 of the present application. The device is applicable to dumping specific business data (for example, game platform data) to improve its query and usage efficiency. The device can be implemented in software and/or hardware, and can generally be integrated in a processor.

As shown in FIG. 4 , the data dumping device based on the Map/Reduce architecture specifically includes: a Map-side processing module 410 and a Reduce-side processing module 420, wherein,

The Map-side processing module 410 is used for the Map side to obtain the target data and shuffle the target data according to the target data grading framework, wherein the target data is stored in row-based storage form;

The Reduce side processing module 420 is used for the Reduce side to aggregate and store the shuffled target data in the form of columnar storage according to the hierarchical structure of the target data.

In the technical solution provided by the embodiments of the present application, the Map side obtains data stored in row-based form and shuffles it according to the target data grading framework; the Reduce side reads the shuffled data and then aggregates and stores it in columnar form according to the target data hierarchical structure. The dumped data is thus not only stored in columnar form but also achieves a multi-output effect that matches the target data hierarchical structure, which improves data query and usage efficiency.

Optionally, the columnar storage form includes parquet columnar storage.

The above data dump device based on the Map/Reduce architecture can execute the data dump method based on the Map/Reduce architecture provided by any embodiment of the present application, and has the functional modules and beneficial effects corresponding to executing that method.

Embodiment 5

FIG. 5 is a schematic structural diagram of a computer device according to Embodiment 5 of the present application. Figure 5 shows a block diagram of an exemplary computer device 12 suitable for use in implementing embodiments of the present application. The computer device 12 shown in FIG. 5 is only an example, and should not impose any limitations on the functions and scope of use of the embodiments of the present application.

As shown in FIG. 5, computer device 12 takes the form of a general-purpose computing device. Components of computer device 12 may include, but are not limited to, one or more processors or processing units 16 , system memory 28 , and a bus 18 connecting various system components including system memory 28 and processing unit 16 .

Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.

Computer device 12 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by computer device 12, including both volatile and nonvolatile media, removable and non-removable media.

System memory 28 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. Computer device 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in FIG. 5, commonly referred to as a "hard disk drive"). Although not shown in FIG. 5, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk") and an optical disc drive for reading from and writing to a removable non-volatile optical disc (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 through one or more data media interfaces. System memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to perform the functions of the various embodiments of the present application.

A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in system memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods of the embodiments described herein.

Computer device 12 may also communicate with one or more external devices 14 (e.g., a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with computer device 12, and/or with any device (e.g., a network card, a modem, etc.) that enables computer device 12 to communicate with one or more other computing devices. Such communication may take place through input/output (I/O) interface 22. Computer device 12 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through network adapter 20. As shown, network adapter 20 communicates with the other modules of computer device 12 via bus 18. It should be understood that, although not shown in FIG. 5, other hardware and/or software modules may be used in conjunction with computer device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

The processing unit 16 executes various functional applications and data processing by running the programs stored in the system memory 28, for example, implementing the data dump method based on the Map/Reduce architecture provided by the embodiments of the present application. That is, when the processing unit executes the program, the following is implemented: the Map side obtains the target data and shuffles the target data according to the target data grading framework, wherein the target data is stored in row-based storage form; the Reduce side aggregates and stores the shuffled target data in columnar storage form according to the hierarchical structure of the target data.

Embodiment 6

Embodiment 6 of the present application provides a computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, it implements the data dump method based on the Map/Reduce architecture as provided by the embodiments of the present application. That is, when the program is executed by the processor, the following is implemented: the Map side obtains the target data and shuffles the target data according to the target data grading framework, wherein the target data is stored in row-based storage form; the Reduce side aggregates and stores the shuffled target data in columnar storage form according to the hierarchical structure of the target data.

Any combination of one or more computer-readable media may be employed. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.

A computer-readable signal medium may include a propagated data signal in baseband or as part of a carrier wave, with computer-readable program code embodied thereon. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can transmit, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any suitable medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out the operations of the present application may be written in one or more programming languages, including object-oriented programming languages (such as Java, Smalltalk, and C++) and conventional procedural programming languages (such as the "C" language or similar programming languages). The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

Claims

1. A data dump method based on a Map/Reduce architecture, comprising:

a Map side obtaining target data and shuffling the target data according to a target data grading framework, wherein the target data is stored in a row storage form; and

a Reduce side aggregating and storing the shuffled target data in a columnar storage form according to a target data hierarchical structure.

2. The method according to claim 1, wherein the Map side shuffling the target data according to the target data grading framework comprises:

the Map side parsing keys of the target data; and

the Map side sorting the keys of the target data according to the target data grading framework to obtain the shuffled target data.

3. The method according to claim 2, wherein the Map side sorting the keys of the target data according to the target data grading framework comprises:

the Map side sorting hash values corresponding to the keys of the target data according to the target data grading framework.

4. The method according to claim 1, wherein the Reduce side aggregating and storing the shuffled target data in the columnar storage form according to the target data hierarchical structure comprises:

the Reduce side obtaining a preconfigured data hierarchical structure as the target data hierarchical structure; and

the Reduce side aggregating and storing the shuffled target data in the columnar storage form according to the target data hierarchical structure.

5. The method according to any one of claims 1-4, wherein the columnar storage form comprises parquet columnar storage.

6. The method according to claim 5, wherein the Reduce side aggregating and storing the shuffled target data in the columnar storage form according to the target data hierarchical structure comprises:

the Reduce side aggregating and storing the shuffled target data according to the target data hierarchical structure and the specification of parquet columnar storage.

7. A data dump apparatus based on a Map/Reduce architecture, comprising:

a Map-side processing module, configured for a Map side to obtain target data and shuffle the target data according to a target data grading framework, wherein the target data is stored in a row storage form; and

a Reduce-side processing module, configured for a Reduce side to aggregate and store the shuffled target data in a columnar storage form according to a target data hierarchical structure.

8. The apparatus according to claim 7, wherein the Map-side processing module comprises a key parsing unit and a data shuffling unit, wherein:

the key parsing unit is configured for the Map side to parse keys of the target data; and

the data shuffling unit is configured for the Map side to sort the keys of the target data according to the target data grading framework to obtain the shuffled target data.

9. The apparatus according to claim 8, wherein the data shuffling unit is specifically configured for the Map side to sort hash values corresponding to the keys of the target data according to the target data grading framework.

10. The apparatus according to claim 7, wherein the Reduce-side processing module comprises a target data hierarchical structure acquisition unit and a data summary storage unit, wherein:

the target data hierarchical structure acquisition unit is configured for the Reduce side to acquire a preconfigured data hierarchical structure as the target data hierarchical structure; and

the data summary storage unit is configured for the Reduce side to aggregate and store the shuffled target data in the columnar storage form according to the target data hierarchical structure.

11. A computer device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the method according to any one of claims 1-6.

12. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1-6.