WO2021068351A1 - Cloud-storage-based data transmission method and apparatus, and computer device


Info

Publication number
WO2021068351A1
WO2021068351A1 (PCT/CN2019/118401)
Authority
WO
WIPO (PCT)
Prior art keywords
data
partition
database
partitions
sorted
Application number
PCT/CN2019/118401
Other languages
French (fr)
Chinese (zh)
Inventor
邓煜
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Application filed by Ping An Technology (Shenzhen) Co., Ltd. (平安科技(深圳)有限公司)
Publication of WO2021068351A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/221 Column-oriented storage; Management thereof
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G06F16/24554 Unary operations; Data partitioning operations
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/278 Data partitioning, e.g. horizontal or vertical partitioning
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed in the present application are a cloud-storage-based data transmission method and apparatus, a computer device, and a storage medium. The method comprises: receiving full data uploaded by a Hive database and storing it; obtaining the number of pre-partitions in an HBase database; partitioning the full data according to the number of pre-partitions and the row key of each piece of data in the full data to obtain corresponding partition data; sorting each piece of partition data in ascending order by column and row key to obtain corresponding sorted partition data; and sending each piece of sorted partition data to the corresponding partition server of the HBase database for storage.

Description

Cloud-storage-based data transmission method, apparatus, and computer device
This application claims priority to the Chinese patent application No. 201910969811.5, filed with the Chinese Patent Office on October 12, 2019 and entitled "Cloud-storage-based data transmission method, apparatus, and computer device", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of cloud storage technology, and in particular to a cloud-storage-based data transmission method, apparatus, computer device, and storage medium.
Background
At present, when data in a Hive database (Hive is a data warehouse tool that can map structured data files to database tables) is written into HBase (HBase is a distributed, column-oriented open-source database), offline batch writing or streaming writing is generally used. However, both of these approaches write data into HBase with the put method (put is one of the data insertion methods in HBase). When data is inserted with put instructions, it is sorted while being inserted, which affects the data processing efficiency of the HBase cluster and leads to low data writing efficiency.
Summary of the Invention
Embodiments of the present application provide a cloud-storage-based data transmission method, apparatus, computer device, and storage medium, aiming to solve the problem in the prior art that data is written into HBase with the put method, where data inserted with put instructions is sorted while being inserted, which affects the data processing efficiency of the HBase cluster and results in low data writing efficiency.
In a first aspect, an embodiment of the present application provides a cloud-storage-based data transmission method, which includes:
receiving full data uploaded by a Hive database and storing it, where the Hive database is a data-warehouse-style database;
obtaining the number of pre-partitions in an HBase database, where the HBase database is a distributed open-source database, and each pre-partition in the HBase database corresponds to one partition server;
partitioning the full data according to the number of pre-partitions and the row key of each piece of data in the full data to obtain corresponding partition data, where the total number of partitions of the partition data is equal to the number of pre-partitions, and each piece of partition data uniquely corresponds to one partition server;
sorting each piece of partition data in ascending order by column and row key to obtain corresponding sorted partition data; and
sending each piece of sorted partition data to the corresponding partition server of the HBase database for storage.
In a second aspect, an embodiment of the present application provides a cloud-storage-based data transmission apparatus, which includes:
a receiving unit, configured to receive full data uploaded by a Hive database and store it, where the Hive database is a data-warehouse-style database;
a partition number obtaining unit, configured to obtain the number of pre-partitions in an HBase database, where the HBase database is a distributed open-source database, and each pre-partition in the HBase database corresponds to one partition server;
a partitioning unit, configured to partition the full data according to the number of pre-partitions and the row key of each piece of data in the full data to obtain corresponding partition data, where the total number of partitions of the partition data is equal to the number of pre-partitions, and each piece of partition data uniquely corresponds to one partition server;
a sorting unit, configured to sort each piece of partition data in ascending order by column and row key to obtain corresponding sorted partition data; and
a transmission unit, configured to send each piece of sorted partition data to the corresponding partition server of the HBase database for storage.
In a third aspect, an embodiment of the present application further provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the computer program, implements the cloud-storage-based data transmission method described in the first aspect.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium that stores a computer program which, when executed by a processor, causes the processor to perform the cloud-storage-based data transmission method described in the first aspect.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present application more clearly, the accompanying drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below illustrate some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an application scenario of the cloud-storage-based data transmission method provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of the cloud-storage-based data transmission method provided by an embodiment of the present application;
FIG. 3 is a schematic sub-flowchart of the cloud-storage-based data transmission method provided by an embodiment of the present application;
FIG. 4 is another schematic sub-flowchart of the cloud-storage-based data transmission method provided by an embodiment of the present application;
FIG. 5 is another schematic sub-flowchart of the cloud-storage-based data transmission method provided by an embodiment of the present application;
FIG. 6 is another schematic sub-flowchart of the cloud-storage-based data transmission method provided by an embodiment of the present application;
FIG. 7 is a schematic block diagram of the cloud-storage-based data transmission apparatus provided by an embodiment of the present application;
FIG. 8 is a schematic block diagram of a subunit of the cloud-storage-based data transmission apparatus provided by an embodiment of the present application;
FIG. 9 is a schematic block diagram of another subunit of the cloud-storage-based data transmission apparatus provided by an embodiment of the present application;
FIG. 10 is a schematic block diagram of another subunit of the cloud-storage-based data transmission apparatus provided by an embodiment of the present application;
FIG. 11 is a schematic block diagram of another subunit of the cloud-storage-based data transmission apparatus provided by an embodiment of the present application;
FIG. 12 is a schematic block diagram of the computer device provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are only some rather than all of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
It should be understood that, when used in this specification and the appended claims, the terms "comprise" and "include" indicate the presence of the described features, integers, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or combinations thereof.
It should also be understood that the terms used in the specification of this application are only for the purpose of describing specific embodiments and are not intended to limit the application. As used in the specification of this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" used in the specification and appended claims of this application refers to and includes any and all possible combinations of one or more of the associated listed items.
Please refer to FIG. 1 and FIG. 2. FIG. 1 is a schematic diagram of an application scenario of the cloud-storage-based data transmission method provided by an embodiment of the present application, and FIG. 2 is a schematic flowchart of that method. The cloud-storage-based data transmission method is applied in a server and is executed by application software installed in the server.
As shown in FIG. 2, the method includes steps S110 to S150.
S110. Receive the full data uploaded by the Hive database and store it; the Hive database is a data-warehouse-style database.
In this embodiment, the technical solution is described from the perspective of the cloud computing platform. The cloud computing platform in this application is specifically Spark, a fast, general-purpose computing engine designed for large-scale data processing. Spark enables in-memory distributed datasets; in addition to providing interactive queries, it can also optimize iterative workloads.
After the cloud computing platform receives the full data uploaded by the Hive database, it creates a logical dataframe (a dataframe is a collection of rows of a dataset; a dataset is a new interface added in Spark 1.6+) for physical storage (physical storage combines memory and disk).
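A minimal sketch of this step in Scala for Spark is given below; the application name and the Hive table name are illustrative assumptions, not taken from the application.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Read the full data from a Hive table into a DataFrame and cache it so that
// it is kept in memory and spilled to disk when memory runs short.
val spark = SparkSession.builder()
  .appName("HiveToHBaseBulkLoad")    // illustrative application name
  .enableHiveSupport()
  .getOrCreate()

val fullData = spark.table("source_db.full_data_table")  // hypothetical Hive table
fullData.persist(StorageLevel.MEMORY_AND_DISK)           // memory-plus-disk storage, as described above
```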
S120. Obtain the number of pre-partitions in the HBase database; the HBase database is a distributed open-source database, and each pre-partition in the HBase database corresponds to one partition server.
In this embodiment, after the storage of the full data is completed on the cloud computing platform, in order to know how many partitions the full data should subsequently be divided into for storage, the number of pre-partitions first needs to be obtained from the HBase database.
The HBase database is a distributed open-source database, and each pre-partition in the HBase database corresponds to one partition server. HBase is a highly reliable, high-performance, column-oriented, scalable distributed storage system based on Hadoop; with HBase, a large-scale structured storage cluster can be built on inexpensive commodity servers.
In an embodiment, as shown in FIG. 3, step S120 includes:
S121. Send an RPC request to the HBase database, where the RPC request is a remote procedure call protocol request;
S122. Receive the meta-information sent by the HBase database according to the RPC request, and obtain the number of pre-partitions from the meta-information.
In this embodiment, after the storage of the full data is completed on the cloud computing platform, the cloud computing platform initiates an RPC request (an RPC request is a remote procedure call protocol request, a way of requesting a service from a remote computer program over a network) to access the zk meta-information of the HBase database (i.e., the ZooKeeper meta-information; ZooKeeper is a distributed, open-source coordination service for distributed applications). The partition information of the pre-created HBase tables is already stored in the zk meta-information, so the number of pre-partitions in the HBase database can be learned from it. By knowing the number of pre-partitions in the HBase database, the full data can be accurately divided into the same number of partitions.
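A minimal sketch of how the number of pre-split regions could be read through the HBase client API (which resolves the table's region layout via ZooKeeper) follows; the ZooKeeper quorum address and table name are illustrative assumptions.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.ConnectionFactory

// Query the region layout of the pre-split target table; the number of start
// keys equals the number of pre-partitions (regions).
val hbaseConf = HBaseConfiguration.create()
hbaseConf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com")  // hypothetical quorum

val connection = ConnectionFactory.createConnection(hbaseConf)
try {
  val regionLocator = connection.getRegionLocator(TableName.valueOf("target_table"))  // hypothetical table
  val numPrePartitions = regionLocator.getStartKeys.length  // one start key per region
  println(s"pre-partitions: $numPrePartitions")
} finally {
  connection.close()
}
```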
S130. Partition the full data according to the number of pre-partitions and the row key of each piece of data in the full data to obtain the corresponding partition data; the total number of partitions of the partition data is equal to the number of pre-partitions, and each piece of partition data uniquely corresponds to one partition server.
In this embodiment, the full data stored in the dataframe on the cloud computing platform is scattered, piece by piece, into the corresponding partitions according to the HexStringSplit pre-partitioning method. HexStringSplit is a pre-splitting method suitable for row keys prefixed with hexadecimal strings.
In an embodiment, as shown in FIG. 4, step S130 includes:
S131. Obtain the row key corresponding to each piece of data in the full data;
S132. Generate a corresponding hash value for the row key of each piece of data using the MD5 algorithm or the SHA-256 algorithm;
S133. Take the hash value corresponding to each row key modulo the number of pre-partitions to obtain a remainder corresponding to each row key;
S134. Store the data corresponding to each row key in the partition corresponding to that row key's remainder, so as to obtain the corresponding partition data.
In this embodiment, each piece of data in Spark corresponds to a row key (i.e., a rowkey). The row key of each piece of data is obtained first, so that after the corresponding processing the data can be assigned to the corresponding region.
When the row key of each piece of data is then processed with the MD5 or SHA-256 algorithm, the corresponding hash value is generated. MD5 is a widely used cryptographic hash function that produces a 128-bit (16-byte) hash value and is used to ensure that information is transmitted completely and consistently. SHA-256 is a secure hash algorithm that computes a fixed-length string (also called a message digest) for a digital message. Generating hash values from the row keys with MD5 or SHA-256 and scattering the data into the corresponding partitions means that data whose row keys leave the same remainder is placed in the same partition. In this way, fast and effective division of the full data is achieved.
Since each pre-partition in the HBase database corresponds to one partition server and each piece of partition data uniquely corresponds to one partition server, the correspondence between partition data and partition servers can be set in advance, for example partition 1 corresponds to partition server 1, ..., and partition N corresponds to partition server N. Once the correspondence between each piece of partition data and its partition server is known, directed storage can be achieved in subsequent data storage, which improves storage efficiency.
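One way the hash-modulo assignment described above could be expressed as a Spark partitioner is sketched below; the use of MD5 and the modulo over the number of pre-partitions follow the description, while the class and variable names are illustrative assumptions.

```scala
import java.math.BigInteger
import java.security.MessageDigest
import org.apache.spark.Partitioner

// Hash each row key with MD5 and take the remainder modulo the number of
// pre-partitions, so that rows whose hashes leave the same remainder are
// placed in the same partition.
class RowKeyHashPartitioner(numPrePartitions: Int) extends Partitioner {
  override def numPartitions: Int = numPrePartitions

  override def getPartition(key: Any): Int = {
    val rowKeyBytes = key.toString.getBytes("UTF-8")
    val digest = MessageDigest.getInstance("MD5").digest(rowKeyBytes)
    // Interpret the digest as a non-negative integer before taking the modulus.
    new BigInteger(1, digest).mod(BigInteger.valueOf(numPrePartitions)).intValue()
  }
}

// Usage on a pair RDD keyed by row key, e.g. RDD[(String, Row)]:
// val partitioned = keyedRdd.partitionBy(new RowKeyHashPartitioner(numPrePartitions))
```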
S140. Sort each piece of partition data in ascending order by column and row key to obtain the corresponding sorted partition data.
In this embodiment, after the full data is partitioned on the cloud computing platform according to the number of pre-partitions, each piece of partition data still needs to be sorted; once sorting is completed, the data is sent to the HBase database and can be stored quickly. When sorting each piece of partition data, the column values and row key values can be used as the sort keys.
In an embodiment, as shown in FIG. 5, step S140 includes:
S141. Within each piece of partition data, obtain the data having the same row key, and sort the data having the same row key in ascending order by column to obtain the first sorted partition data corresponding to each piece of partition data;
S142. Sort each piece of first sorted partition data in ascending order by row key to obtain the sorted partition data corresponding to each piece of first sorted partition data.
In this embodiment, within each piece of partition data, the data with the same row key value is first grouped together, and within each group the data is sorted in ascending order by column value, yielding the first sorted partition data. The first sorted partition data obtained after this initial sorting can then be sorted in ascending order by row key, yielding the sorted partition data corresponding to each piece of first sorted partition data. It can be seen that after sorting each piece of partition data by column and row key, the data can be stored in a more regular manner.
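The two-level ascending sort (row key, then column) can be combined with the partition assignment in a single shuffle. The sketch below assumes a pair RDD keyed by (rowKey, columnQualifier) and reuses the RowKeyHashPartitioner sketched earlier; all names are illustrative.

```scala
import org.apache.spark.Partitioner
import org.apache.spark.rdd.RDD

// kvRdd: RDD[((String, String), Array[Byte])] keyed by (rowKey, columnQualifier).
def sortForBulkLoad(kvRdd: RDD[((String, String), Array[Byte])],
                    numPrePartitions: Int): RDD[((String, String), Array[Byte])] = {
  // Partition only by the row key so that all columns of a row stay together.
  val byRowKey = new Partitioner {
    private val inner = new RowKeyHashPartitioner(numPrePartitions)
    override def numPartitions: Int = inner.numPartitions
    override def getPartition(key: Any): Int = key match {
      case (rowKey, _) => inner.getPartition(rowKey)
      case other       => inner.getPartition(other)
    }
  }
  // Tuple keys are compared lexicographically: ascending by row key, then by
  // column qualifier, which matches the ordering described above.
  kvRdd.repartitionAndSortWithinPartitions(byRowKey)
}
```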
S150. Send each piece of sorted partition data to the corresponding partition server of the HBase database for storage.
In this embodiment, after the sorting of each piece of partition data is completed and the corresponding sorted partition data is obtained, the data can be sent directly to the HBase database for storage. There is no longer any need to sort while inserting, as happens when data is inserted with put instructions, which affects the data processing efficiency of the HBase cluster. The partitioned and sorted data is stored directly in the HBase database, requiring only direct storage, which improves storage efficiency.
In an embodiment, as shown in FIG. 6, step S150 includes:
S151. Input each piece of sorted partition data into the local HDFS layer, so as to convert each piece of sorted partition data into a corresponding data file; the HDFS layer is a distributed file system layer;
S152. Send the data file to the corresponding partition server of the HBase database for storage.
In this embodiment, the bottom layer of the cloud computing platform (i.e., Spark) is the HDFS layer used for storing data. Each piece of sorted partition data is input into the HDFS layer, and the HDFS layer converts each piece of sorted partition data into a data file. The data file is specifically an HFile; an HFile contains 7 kinds of blocks, which by block type are:
a) data block, which stores key-value data (i.e., key-value pair data); the default size of a data block is generally 64 KB;
b) data index block, which stores the index of the data blocks; the index can be multi-level, and the intermediate and leaf indexes are generally distributed throughout the HFile;
c) bloom filter block, which stores the values of the bloom filter;
d) meta data block; there can be multiple meta data blocks, and they are distributed contiguously;
e) meta data index, which is the index of the meta data;
f) file-info block, which records some information about the file, such as the largest key in the HFile, the average key length, the HFile creation timestamp, the encoding used by the data blocks, and so on;
g) trailer block, which every HFile has; the trailer length may differ between HFile versions (there are three versions, V1, V2 and V3, with little difference between V2 and V3), but all HFile trailers of the same version have the same length, and the last 4 bytes of the trailer are always the version information.
It can be seen that each piece of sorted partition data is stored in the local HDFS layer, and it is stored by being converted into HFiles.
Once each piece of sorted partition data has been converted into HFiles at the HDFS layer, the HFiles corresponding to each piece of sorted partition data can be sent to the corresponding partition server of the HBase database. The partition server of the HBase database then uses the Bulkload scheme (i.e., the bulk loading scheme) to write the HFiles into the HBase database. The advantages of Bulkload are that the import process does not occupy region resources, massive amounts of data can be imported quickly, and memory is saved.
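The sketch below shows how HFile generation and the subsequent bulk load could be wired together. hfileRdd is assumed to be an already-sorted RDD[(ImmutableBytesWritable, KeyValue)]; the staging path and table name are illustrative. Note that, depending on the HBase version, LoadIncrementalHFiles lives in org.apache.hadoop.hbase.tool (2.x) or org.apache.hadoop.hbase.mapreduce (1.x).

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.{HBaseConfiguration, KeyValue, TableName}
import org.apache.hadoop.hbase.client.ConnectionFactory
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2
import org.apache.hadoop.hbase.tool.LoadIncrementalHFiles
import org.apache.hadoop.mapreduce.Job

val hbaseConf = HBaseConfiguration.create()
val conn = ConnectionFactory.createConnection(hbaseConf)
val tableName = TableName.valueOf("target_table")   // hypothetical table
val table = conn.getTable(tableName)
val regionLocator = conn.getRegionLocator(tableName)

// Configure the output format with the table's region boundaries so that the
// generated HFiles line up with the pre-partitions.
val job = Job.getInstance(hbaseConf)
HFileOutputFormat2.configureIncrementalLoad(job, table, regionLocator)

// Write the sorted key-values out as HFiles on HDFS.
hfileRdd.saveAsNewAPIHadoopFile(
  "/tmp/hfile_staging",                              // hypothetical staging directory
  classOf[ImmutableBytesWritable],
  classOf[KeyValue],
  classOf[HFileOutputFormat2],
  job.getConfiguration)

// Ask the region servers to adopt the generated HFiles (bulk load).
new LoadIncrementalHFiles(hbaseConf)
  .doBulkLoad(new Path("/tmp/hfile_staging"), conn.getAdmin, table, regionLocator)
```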
In an embodiment, after step S150 the method further includes:
if it is detected that a data transmission error message sent by the HBase database has been received, locating and obtaining a data transmission interruption point in each piece of sorted partition data according to the log file corresponding to the data transmission error message;
sending the data located after the data transmission interruption point of each piece of sorted partition data to the corresponding partition server of the HBase database for storage.
In this embodiment, if a transmission interruption occurs while each piece of sorted partition data is being sent to the HBase database for storage, the data transmission error message sent by the HBase database can be received, and the data transmission interruption point can be located in each piece of sorted partition data according to the log file corresponding to the error message. After the interruption point is obtained, data transmission can resume from the data following that interruption point, ensuring that normal transmission can be restored after an exception occurs.
This method completes the sorting process in the cloud before the full data is written into the HBase database, which improves the efficiency of writing data into the HBase database.
An embodiment of the present application further provides a cloud-storage-based data transmission apparatus, which is configured to perform any embodiment of the foregoing cloud-storage-based data transmission method. Specifically, please refer to FIG. 7, which is a schematic block diagram of the cloud-storage-based data transmission apparatus provided by an embodiment of the present application. The cloud-storage-based data transmission apparatus 100 may be configured in a server.
As shown in FIG. 7, the cloud-storage-based data transmission apparatus 100 includes a receiving unit 110, a partition number obtaining unit 120, a partitioning unit 130, a sorting unit 140, and a transmission unit 150.
The receiving unit 110 is configured to receive the full data uploaded by the Hive database and store it; the Hive database is a data-warehouse-style database.
In this embodiment, the technical solution is described from the perspective of the cloud computing platform. The cloud computing platform in this application is specifically Spark, a fast, general-purpose computing engine designed for large-scale data processing. Spark enables in-memory distributed datasets; in addition to providing interactive queries, it can also optimize iterative workloads.
After the cloud computing platform receives the full data uploaded by the Hive database, it creates a logical dataframe (a dataframe is a collection of rows of a dataset; a dataset is a new interface added in Spark 1.6+) for physical storage (physical storage combines memory and disk).
The partition number obtaining unit 120 is configured to obtain the number of pre-partitions in the HBase database; the HBase database is a distributed open-source database, and each pre-partition in the HBase database corresponds to one partition server.
In this embodiment, after the storage of the full data is completed on the cloud computing platform, in order to know how many partitions the full data should subsequently be divided into for storage, the number of pre-partitions first needs to be obtained from the HBase database.
The HBase database is a distributed open-source database, and each pre-partition in the HBase database corresponds to one partition server. HBase is a highly reliable, high-performance, column-oriented, scalable distributed storage system based on Hadoop; with HBase, a large-scale structured storage cluster can be built on inexpensive commodity servers.
In an embodiment, as shown in FIG. 8, the partition number obtaining unit 120 includes:
a request sending unit 121, configured to send an RPC request to the HBase database, where the RPC request is a remote procedure call protocol request;
a meta-information parsing unit 122, configured to receive the meta-information sent by the HBase database according to the RPC request and obtain the number of pre-partitions from the meta-information.
In this embodiment, after the storage of the full data is completed on the cloud computing platform, the cloud computing platform initiates an RPC request (an RPC request is a remote procedure call protocol request, a way of requesting a service from a remote computer program over a network) to access the zk meta-information of the HBase database (i.e., the ZooKeeper meta-information; ZooKeeper is a distributed, open-source coordination service for distributed applications). The partition information of the pre-created HBase tables is already stored in the zk meta-information, so the number of pre-partitions in the HBase database can be learned from it. By knowing the number of pre-partitions in the HBase database, the full data can be accurately divided into the same number of partitions.
The partitioning unit 130 is configured to partition the full data according to the number of pre-partitions and the row key of each piece of data in the full data to obtain the corresponding partition data; the total number of partitions of the partition data is equal to the number of pre-partitions, and each piece of partition data uniquely corresponds to one partition server.
In this embodiment, the full data stored in the dataframe on the cloud computing platform is scattered, piece by piece, into the corresponding partitions according to the HexStringSplit pre-partitioning method. HexStringSplit is a pre-splitting method suitable for row keys prefixed with hexadecimal strings.
In an embodiment, as shown in FIG. 9, the partitioning unit 130 includes:
a row key obtaining unit 131, configured to obtain the row key corresponding to each piece of data in the full data;
a hashing unit 132, configured to generate a corresponding hash value for the row key of each piece of data using the MD5 algorithm or the SHA-256 algorithm;
a modulo operation unit 133, configured to take the hash value corresponding to each row key modulo the number of pre-partitions to obtain a remainder corresponding to each row key;
a data partitioning unit 134, configured to store the data corresponding to each row key in the partition corresponding to that row key's remainder, so as to obtain the corresponding partition data.
In this embodiment, each piece of data in Spark corresponds to a row key (i.e., a rowkey). The row key of each piece of data is obtained first, so that after the corresponding processing the data can be assigned to the corresponding region.
When the row key of each piece of data is then processed with the MD5 or SHA-256 algorithm, the corresponding hash value is generated. MD5 is a widely used cryptographic hash function that produces a 128-bit (16-byte) hash value and is used to ensure that information is transmitted completely and consistently. SHA-256 is a secure hash algorithm that computes a fixed-length string (also called a message digest) for a digital message. Generating hash values from the row keys with MD5 or SHA-256 and scattering the data into the corresponding partitions means that data whose row keys leave the same remainder is placed in the same partition. In this way, fast and effective division of the full data is achieved.
Since each pre-partition in the HBase database corresponds to one partition server and each piece of partition data uniquely corresponds to one partition server, the correspondence between partition data and partition servers can be set in advance, for example partition 1 corresponds to partition server 1, ..., and partition N corresponds to partition server N. Once the correspondence between each piece of partition data and its partition server is known, directed storage can be achieved in subsequent data storage, which improves storage efficiency.
The sorting unit 140 is configured to sort each piece of partition data in ascending order by column and row key to obtain the corresponding sorted partition data.
In this embodiment, after the full data is partitioned on the cloud computing platform according to the number of pre-partitions, each piece of partition data still needs to be sorted; once sorting is completed, the data is sent to the HBase database and can be stored quickly. When sorting each piece of partition data, the column values and row key values can be used as the sort keys.
In an embodiment, as shown in FIG. 10, the sorting unit 140 includes:
a first sorting unit 141, configured to, within each piece of partition data, obtain the data having the same row key and sort the data having the same row key in ascending order by column, so as to obtain the first sorted partition data corresponding to each piece of partition data;
a second sorting unit 142, configured to sort each piece of first sorted partition data in ascending order by row key to obtain the sorted partition data corresponding to each piece of first sorted partition data.
In this embodiment, within each piece of partition data, the data with the same row key value is first grouped together, and within each group the data is sorted in ascending order by column value, yielding the first sorted partition data. The first sorted partition data obtained after this initial sorting can then be sorted in ascending order by row key, yielding the sorted partition data corresponding to each piece of first sorted partition data. It can be seen that after sorting each piece of partition data by column and row key, the data can be stored in a more regular manner.
The transmission unit 150 is configured to send each piece of sorted partition data to the corresponding partition server of the HBase database for storage.
In this embodiment, after the sorting of each piece of partition data is completed and the corresponding sorted partition data is obtained, the data can be sent directly to the HBase database for storage. There is no longer any need to sort while inserting, as happens when data is inserted with put instructions, which affects the data processing efficiency of the HBase cluster. The partitioned and sorted data is stored directly in the HBase database, requiring only direct storage, which improves storage efficiency.
In an embodiment, as shown in FIG. 11, the transmission unit 150 includes:
an underlying storage unit 151, configured to input each piece of sorted partition data into the local HDFS layer, so as to convert each piece of sorted partition data into a corresponding data file, where the HDFS layer is a distributed file system layer;
a data sending unit 152, configured to send the data file to the corresponding partition server of the HBase database for storage.
In this embodiment, the bottom layer of the cloud computing platform (i.e., Spark) is the HDFS layer used for storing data. Each piece of sorted partition data is input into the HDFS layer, and the HDFS layer converts each piece of sorted partition data into a data file. It can be seen that each piece of sorted partition data is stored in the local HDFS layer, and it is stored by being converted into HFiles.
Once each piece of sorted partition data has been converted into HFiles at the HDFS layer, the HFiles corresponding to each piece of sorted partition data can be sent to the corresponding partition server of the HBase database. The partition server of the HBase database then uses the Bulkload scheme (i.e., the bulk loading scheme) to write the HFiles into the HBase database. The advantages of Bulkload are that the import process does not occupy region resources, massive amounts of data can be imported quickly, and memory is saved.
In an embodiment, the cloud-storage-based data transmission apparatus 100 further includes:
an interruption point obtaining unit, configured to, if it is detected that a data transmission error message sent by the HBase database has been received, locate and obtain a data transmission interruption point in each piece of sorted partition data according to the log file corresponding to the data transmission error message;
a data transmission recovery unit, configured to send the data located after the data transmission interruption point of each piece of sorted partition data to the corresponding partition server of the HBase database for storage.
In this embodiment, if a transmission interruption occurs while each piece of sorted partition data is being sent to the HBase database for storage, the data transmission error message sent by the HBase database can be received, and the data transmission interruption point can be located in each piece of sorted partition data according to the log file corresponding to the error message. After the interruption point is obtained, data transmission can resume from the data following that interruption point, ensuring that normal transmission can be restored after an exception occurs.
This apparatus completes the sorting process in the cloud before the full data is written into the HBase database, which improves the efficiency of writing data into the HBase database.
The above cloud-storage-based data transmission apparatus may be implemented in the form of a computer program, and the computer program may run on a computer device as shown in FIG. 12.
Please refer to FIG. 12, which is a schematic block diagram of the computer device provided by an embodiment of the present application. The computer device 500 is a server, and the server may be an independent server or a server cluster composed of multiple servers.
Referring to FIG. 12, the computer device 500 includes a processor 502, a memory, and a network interface 505 connected via a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032. When the computer program 5032 is executed, it can cause the processor 502 to perform the cloud-storage-based data transmission method.
The processor 502 is configured to provide computing and control capabilities and support the operation of the entire computer device 500.
The internal memory 504 provides an environment for running the computer program 5032 in the non-volatile storage medium 503. When the computer program 5032 is executed by the processor 502, it can cause the processor 502 to perform the cloud-storage-based data transmission method.
The network interface 505 is used for network communication, such as providing transmission of data information. Those skilled in the art can understand that the structure shown in FIG. 12 is only a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device 500 to which the solution is applied; the specific computer device 500 may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
The processor 502 is configured to run the computer program 5032 stored in the memory, so as to implement the cloud-storage-based data transmission method disclosed in the embodiments of the present application.
Those skilled in the art can understand that the embodiment of the computer device shown in FIG. 12 does not constitute a limitation on the specific configuration of the computer device; in other embodiments, the computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components. For example, in some embodiments, the computer device may include only a memory and a processor. In such embodiments, the structures and functions of the memory and the processor are consistent with the embodiment shown in FIG. 12 and are not repeated here.
It should be understood that, in the embodiments of the present application, the processor 502 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
Another embodiment of the present application provides a computer-readable storage medium. The computer-readable storage medium may be a non-volatile computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the cloud-storage-based data transmission method disclosed in the embodiments of the present application.
The storage medium is a physical, non-transitory storage medium, for example a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, an optical disc, or another physical storage medium capable of storing program code.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the devices, apparatuses, and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
The above are only specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any person familiar with this technical field can easily conceive of various equivalent modifications or replacements within the technical scope disclosed in the present application, and such modifications or replacements shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (20)

  1. A cloud-storage-based data transmission method, comprising:
    receiving full data uploaded by a Hive database and storing it, wherein the Hive database is a data-warehouse-style database;
    obtaining the number of pre-partitions in an HBase database, wherein the HBase database is a distributed open-source database, and each pre-partition in the HBase database corresponds to one partition server;
    partitioning the full data according to the number of pre-partitions and the row key of each piece of data in the full data to obtain corresponding partition data, wherein the total number of partitions of the partition data is equal to the number of pre-partitions, and each piece of partition data uniquely corresponds to one partition server;
    sorting each piece of partition data in ascending order by column and row key to obtain corresponding sorted partition data; and
    sending each piece of sorted partition data to the corresponding partition server of the HBase database for storage.
  2. The cloud-storage-based data transmission method according to claim 1, wherein the obtaining the number of pre-partitions in the HBase database comprises:
    sending an RPC request to the HBase database, wherein the RPC request is a remote procedure call protocol request;
    receiving meta-information sent by the HBase database according to the RPC request, and obtaining the number of pre-partitions from the meta-information.
  3. The cloud-storage-based data transmission method according to claim 1, wherein the partitioning the full data according to the number of pre-partitions and the row key of each piece of data in the full data to obtain corresponding partition data comprises:
    obtaining the row key corresponding to each piece of data in the full data;
    generating a corresponding hash value for the row key of each piece of data using the MD5 algorithm or the SHA-256 algorithm;
    taking the hash value corresponding to each row key modulo the number of pre-partitions to obtain a remainder corresponding to each row key;
    storing the data corresponding to each row key in the partition corresponding to that row key's remainder, so as to obtain the corresponding partition data.
  4. The cloud-storage-based data transmission method according to claim 1, wherein the sorting each piece of partitioned data in ascending order by column and then by row key to obtain the corresponding sorted partitioned data comprises:
    obtaining, within each piece of partitioned data, the data having the same row key, and sorting the data having the same row key in ascending order by column, to obtain first sorted partitioned data corresponding to each piece of partitioned data;
    sorting each piece of first sorted partitioned data in ascending order by row key, to obtain sorted partitioned data corresponding to each piece of first sorted partitioned data.
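
The two-pass ordering in claim 4 can be sketched as below; it relies on Python's sort being stable, so the later sort by row key preserves the column order already established within each row key.

    def sort_partition(rows):
        """rows: list of (row_key, column, value) belonging to one partition."""
        first_sorted = sorted(rows, key=lambda r: r[1])   # first sorted partitioned data: by column
        return sorted(first_sorted, key=lambda r: r[0])   # then by row key; stable sort keeps column order
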
  5. The cloud-storage-based data transmission method according to claim 1, wherein the sending each piece of sorted partitioned data to the corresponding partition server of the HBase database for storage comprises:
    inputting each piece of sorted partitioned data into a local HDFS layer, so as to convert each piece of sorted partitioned data into a corresponding data file, wherein the HDFS layer is a distributed file system layer;
    sending the data files to the corresponding partition servers of the HBase database for storage.
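
A stand-in sketch for claim 5 that writes each sorted partition to a plain local file; an actual pipeline would emit HBase store files through the HDFS layer before handing them to the partition servers, so the file format and naming here are assumptions for illustration only.

    import os

    def write_partition_files(sorted_partitions, out_dir):
        """Converts each sorted partition into its own data file and returns the file paths."""
        os.makedirs(out_dir, exist_ok=True)
        paths = {}
        for idx, rows in sorted_partitions.items():
            path = os.path.join(out_dir, f"partition-{idx:05d}.dat")
            with open(path, "w", encoding="utf-8") as f:
                for row_key, column, value in rows:
                    f.write(f"{row_key}\t{column}\t{value}\n")
            paths[idx] = path
        return paths
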
  6. The cloud-storage-based data transmission method according to claim 1, wherein after the sending each piece of sorted partitioned data to the corresponding partition server of the HBase database for storage, the method further comprises:
    if it is detected that a data transmission error message sent by the HBase database has been received, locating a data transmission interruption point in each piece of sorted partitioned data according to the log file corresponding to the data transmission error message;
    sending the data located after the data transmission interruption point of each piece of sorted partitioned data to the corresponding partition server of the HBase database for storage.
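
Resuming after a failed transfer, as in claim 6, reduces to re-sending only the tail of each sorted partition. The sketch assumes the error log has already been parsed into a per-partition count of rows stored successfully (the parsing depends on the log format and is not shown), and reuses the hypothetical send_to_partition_server helper from the claim 1 sketch.

    def resend_from_interruption(sorted_partitions, interruption_points, hbase_client):
        """interruption_points: {partition index: rows already stored}, derived from the error log."""
        for idx, rows in sorted_partitions.items():
            offset = interruption_points.get(idx, 0)
            remaining = rows[offset:]          # only the data after the interruption point
            if remaining:
                send_to_partition_server(hbase_client, idx, remaining)
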
  7. The cloud-storage-based data transmission method according to claim 1, wherein the receiving the full data uploaded by the Hive database and storing the full data comprises:
    generating a dataframe to physically store the full data, wherein the dataframe is a matrix-style data table.
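
One way to realise the dataframe of claim 7 is a Spark DataFrame read from Hive and kept materialised; the application name and table name below are assumptions for illustration.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("hive-to-hbase").enableHiveSupport().getOrCreate()
    full_df = spark.sql("SELECT row_key, col, value FROM ods.full_table")  # full data from Hive
    full_df.persist()  # cache the dataframe so later partitioning and sorting reuse it
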
  8. The cloud-storage-based data transmission method according to claim 7, wherein the partitioning the full data according to the number of pre-partitions and the row key of each piece of data in the full data to obtain the corresponding partitioned data comprises:
    scattering each piece of the full data stored in the dataframe into its corresponding partition according to the HexStringSplit pre-partitioning scheme, wherein the HexStringSplit pre-partitioning scheme is a pre-splitting scheme used for data whose row keys take hexadecimal strings as prefixes.
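
A simplified stand-in for the HexStringSplit scheme of claim 8: split the hexadecimal key space into equal ranges and route each row by its hex row-key prefix. In HBase the split points are produced when the table is pre-split with HexStringSplit; the 8-character boundary width and lowercase fixed-width prefixes are assumptions made here.

    def hexstringsplit_boundaries(num_pre_partitions, width=8):
        """Evenly spaced split points over the hexadecimal key space."""
        step = 16 ** width // num_pre_partitions
        return [format(i * step, f"0{width}x") for i in range(1, num_pre_partitions)]

    def assign_partition(row_key_hex_prefix, boundaries):
        """Partition index whose key range contains the lowercase, fixed-width hex prefix."""
        for i, boundary in enumerate(boundaries):
            if row_key_hex_prefix < boundary:
                return i
        return len(boundaries)  # falls into the last partition
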
  9. A cloud-storage-based data transmission apparatus, comprising:
    a receiving unit, configured to receive full data uploaded by a Hive database and store the full data, wherein the Hive database is a data-warehouse-style database;
    a partition number acquiring unit, configured to acquire the number of pre-partitions in an HBase database, wherein the HBase database is a distributed open-source database, and each pre-partition in the HBase database corresponds to one partition server;
    a partitioning unit, configured to partition the full data according to the number of pre-partitions and the row key of each piece of data in the full data, so as to obtain corresponding partitioned data, wherein the total number of partitions of the partitioned data is equal to the number of pre-partitions, and each piece of partitioned data uniquely corresponds to one partition server;
    a sorting unit, configured to sort each piece of partitioned data in ascending order, by column and then by row key, to obtain corresponding sorted partitioned data; and
    a transmission unit, configured to send each piece of sorted partitioned data to the corresponding partition server of the HBase database for storage.
  10. The cloud-storage-based data transmission apparatus according to claim 9, wherein the partition number acquiring unit comprises:
    a request sending unit, configured to send an RPC request to the HBase database, wherein the RPC request is a remote procedure call protocol request;
    a meta-information parsing unit, configured to receive meta-information sent by the HBase database in response to the RPC request and obtain the number of pre-partitions from the meta-information.
  11. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the following steps:
    receiving full data uploaded by a Hive database and storing the full data, wherein the Hive database is a data-warehouse-style database;
    acquiring the number of pre-partitions in an HBase database, wherein the HBase database is a distributed open-source database, and each pre-partition in the HBase database corresponds to one partition server;
    partitioning the full data according to the number of pre-partitions and the row key of each piece of data in the full data, so as to obtain corresponding partitioned data, wherein the total number of partitions of the partitioned data is equal to the number of pre-partitions, and each piece of partitioned data uniquely corresponds to one partition server;
    sorting each piece of partitioned data in ascending order, by column and then by row key, to obtain corresponding sorted partitioned data; and
    sending each piece of sorted partitioned data to the corresponding partition server of the HBase database for storage.
  12. The computer device according to claim 11, wherein the acquiring the number of pre-partitions in the HBase database comprises:
    sending an RPC request to the HBase database, wherein the RPC request is a remote procedure call protocol request;
    receiving meta-information sent by the HBase database in response to the RPC request, and obtaining the number of pre-partitions from the meta-information.
  13. The computer device according to claim 11, wherein the partitioning the full data according to the number of pre-partitions and the row key of each piece of data in the full data to obtain the corresponding partitioned data comprises:
    obtaining the row key corresponding to each piece of data in the full data;
    generating a hash value corresponding to the row key of each piece of data by means of the MD5 algorithm or the SHA-256 algorithm;
    taking the hash value corresponding to each row key modulo the number of pre-partitions to obtain a remainder corresponding to each row key;
    storing the data corresponding to each row key into the partition corresponding to the remainder of that row key, so as to obtain the corresponding partitioned data.
  14. The computer device according to claim 11, wherein the sorting each piece of partitioned data in ascending order by column and then by row key to obtain the corresponding sorted partitioned data comprises:
    obtaining, within each piece of partitioned data, the data having the same row key, and sorting the data having the same row key in ascending order by column, to obtain first sorted partitioned data corresponding to each piece of partitioned data;
    sorting each piece of first sorted partitioned data in ascending order by row key, to obtain sorted partitioned data corresponding to each piece of first sorted partitioned data.
  15. The computer device according to claim 11, wherein the sending each piece of sorted partitioned data to the corresponding partition server of the HBase database for storage comprises:
    inputting each piece of sorted partitioned data into a local HDFS layer, so as to convert each piece of sorted partitioned data into a corresponding data file, wherein the HDFS layer is a distributed file system layer;
    sending the data files to the corresponding partition servers of the HBase database for storage.
  16. The computer device according to claim 11, wherein after the sending each piece of sorted partitioned data to the corresponding partition server of the HBase database for storage, the steps further comprise:
    if it is detected that a data transmission error message sent by the HBase database has been received, locating a data transmission interruption point in each piece of sorted partitioned data according to the log file corresponding to the data transmission error message;
    sending the data located after the data transmission interruption point of each piece of sorted partitioned data to the corresponding partition server of the HBase database for storage.
  17. The computer device according to claim 11, wherein the receiving the full data uploaded by the Hive database and storing the full data comprises:
    generating a dataframe to physically store the full data, wherein the dataframe is a matrix-style data table.
  18. The computer device according to claim 17, wherein the partitioning the full data according to the number of pre-partitions and the row key of each piece of data in the full data to obtain the corresponding partitioned data comprises:
    scattering each piece of the full data stored in the dataframe into its corresponding partition according to the HexStringSplit pre-partitioning scheme, wherein the HexStringSplit pre-partitioning scheme is a pre-splitting scheme used for data whose row keys take hexadecimal strings as prefixes.
  19. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the following operations:
    receiving full data uploaded by a Hive database and storing the full data, wherein the Hive database is a data-warehouse-style database;
    acquiring the number of pre-partitions in an HBase database, wherein the HBase database is a distributed open-source database, and each pre-partition in the HBase database corresponds to one partition server;
    partitioning the full data according to the number of pre-partitions and the row key of each piece of data in the full data, so as to obtain corresponding partitioned data, wherein the total number of partitions of the partitioned data is equal to the number of pre-partitions, and each piece of partitioned data uniquely corresponds to one partition server;
    sorting each piece of partitioned data in ascending order, by column and then by row key, to obtain corresponding sorted partitioned data; and
    sending each piece of sorted partitioned data to the corresponding partition server of the HBase database for storage.
  20. The computer-readable storage medium according to claim 19, wherein the acquiring the number of pre-partitions in the HBase database comprises:
    sending an RPC request to the HBase database, wherein the RPC request is a remote procedure call protocol request;
    receiving meta-information sent by the HBase database in response to the RPC request, and obtaining the number of pre-partitions from the meta-information.
PCT/CN2019/118401 2019-10-12 2019-11-14 Cloud-storage-based data transmission method and apparatus, and computer device WO2021068351A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910969811.5 2019-10-12
CN201910969811.5A CN111090645B (en) 2019-10-12 2019-10-12 Cloud storage-based data transmission method and device and computer equipment

Publications (1)

Publication Number Publication Date
WO2021068351A1 true WO2021068351A1 (en) 2021-04-15

Family

ID=70392992

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/118401 WO2021068351A1 (en) 2019-10-12 2019-11-14 Cloud-storage-based data transmission method and apparatus, and computer device

Country Status (2)

Country Link
CN (1) CN111090645B (en)
WO (1) WO2021068351A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113177090A (en) * 2021-04-30 2021-07-27 中国邮政储蓄银行股份有限公司 Data processing method and device
CN113535856A (en) * 2021-07-29 2021-10-22 上海哔哩哔哩科技有限公司 Data synchronization method and system
CN113568966A (en) * 2021-07-29 2021-10-29 上海哔哩哔哩科技有限公司 Data processing method and system used between ODS layer and DW layer
CN114925123A (en) * 2022-04-24 2022-08-19 杭州悦数科技有限公司 Data transmission method between distributed graph database and graph computing system
CN115801787A (en) * 2023-01-29 2023-03-14 智道网联科技(北京)有限公司 Method and device for transmitting road end data, electronic equipment and storage medium

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113312414B (en) * 2020-07-30 2023-12-26 阿里巴巴集团控股有限公司 Data processing method, device, equipment and storage medium
CN112395345A (en) * 2020-12-04 2021-02-23 江苏苏宁云计算有限公司 HBase full data import method and device, computer equipment and storage medium
CN112905854A (en) * 2021-03-05 2021-06-04 北京中经惠众科技有限公司 Data processing method and device, computing equipment and storage medium
CN113096284B (en) * 2021-03-19 2022-08-30 福建新大陆通信科技股份有限公司 CTID access control authorization information verification method
CN116049197B (en) * 2023-03-07 2023-06-30 中船奥蓝托无锡软件技术有限公司 HBase-based data equilibrium storage method
CN116719822B (en) * 2023-08-10 2023-12-22 深圳市连用科技有限公司 Method and system for storing massive structured data

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10055458B2 (en) * 2015-07-30 2018-08-21 Futurewei Technologies, Inc. Data placement control for distributed computing environment
US9923960B2 (en) * 2015-08-18 2018-03-20 Salesforce.Com, Inc. Partition balancing in an on-demand services environment
US10216740B2 (en) * 2016-03-31 2019-02-26 Acronis International Gmbh System and method for fast parallel data processing in distributed storage systems
CN106503058B (en) * 2016-09-27 2019-01-18 华为技术有限公司 A kind of data load method, terminal and computing cluster

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150310082A1 (en) * 2014-04-24 2015-10-29 Luke Qing Han Hadoop olap engine
CN106970929A (en) * 2016-09-08 2017-07-21 阿里巴巴集团控股有限公司 Data lead-in method and device
WO2019161679A1 (en) * 2018-02-26 2019-08-29 众安信息技术服务有限公司 Data processing method and device for use in online analytical processing
US20190303494A1 (en) * 2018-03-30 2019-10-03 American Express Travel Related Services Company, Inc. Node linkage in entity graphs

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
GAO JINBIAO ,: "Hive and Hbase Based on Research on Hadoop Distributed File System", INDUSTRIAL CONTROL COMPUTER, no. 12, 1 January 2015 (2015-01-01), pages 44 - 47, XP055800529 *
LIANG CHEN, JING QIU, ZHU YOU-CHAN, GU XUE-PING: "Cloud computing architecture oriented household Internet of things", APPLICATION RESEARCH OF COMPUTERS, vol. 30, no. 12, 1 December 2013 (2013-12-01), pages 3686 - 3689, XP055800531 *
LIU BOWEI, HUANG RUIZHANG: "HBase-Based Storage System for Financial Time Series Data", CHINA SCIENCE PAPER, vol. 11, no. 20, 1 October 2016 (2016-10-01), pages 2387 - 2392, XP055800522 *
LIU XING-PING, LUO XIANG-YUN, YANG HAI: "Querying Research on Efficient Traffic Data Cloud-Indexing Technology Based on HBase", CONTROL ENGINEERING OF CHINA, vol. 23, no. 4, 1 April 2016 (2016-04-01), pages 560 - 564, XP055800517, ISSN: 1671-7848, DOI: 10.14107/j.cnki.kzgc.150049 *
TAN JIE-QING, MAO XI-JUN: "The Structures of Hadoop Cloud Computing Infrastructure and the Integrated Application of HBase and Hive", GUIZHOU SCIENCE, vol. 31, no. 5, 1 January 2013 (2013-01-01), pages 32 - 35, XP055800519 *
TANG CHANG-CHENG, FENG YANG, DONG DAI, MING-MING SUN, XUE-HAI ZHOU: "Research of Data Durable and Available Base on HBase", COMPUTER SYSTEMS & APPLICATIONS, vol. 22, no. 10, 1 January 2013 (2013-01-01), pages 175 - 180, XP055800534 *

Also Published As

Publication number Publication date
CN111090645B (en) 2024-03-01
CN111090645A (en) 2020-05-01

Similar Documents

Publication Publication Date Title
WO2021068351A1 (en) Cloud-storage-based data transmission method and apparatus, and computer device
US9792306B1 (en) Data transfer between dissimilar deduplication systems
US20140304525A1 (en) Key/value storage device and method
US10733061B2 (en) Hybrid data storage system with private storage cloud and public storage cloud
US9426219B1 (en) Efficient multi-part upload for a data warehouse
US10938961B1 (en) Systems and methods for data deduplication by generating similarity metrics using sketch computation
US20140195575A1 (en) Data file handling in a network environment and independent file server
US10762051B1 (en) Reducing hash collisions in large scale data deduplication
US10289310B2 (en) Hybrid data storage system with private storage cloud and public storage cloud
US11422891B2 (en) Global storage solution with logical cylinders and capsules
EP3716580B1 (en) Cloud file transfers using cloud file descriptors
US20200065306A1 (en) Bloom filter partitioning
AU2014353667A1 (en) A method of generating a reference index data structure and method for finding a position of a data pattern in a reference data structure
US20220043778A1 (en) System and method for data compaction and security with extended functionality
US11831343B2 (en) System and method for data compression with encryption
US20220156233A1 (en) Systems and methods for sketch computation
WO2016029441A1 (en) File scanning method and apparatus
US20230283292A1 (en) System and method for data compaction and security with extended functionality
US20210191640A1 (en) Systems and methods for data segment processing
US10152269B2 (en) Method and system for preserving branch cache file data segment identifiers upon volume replication
US20240106457A1 (en) System and method for data compression and encryption using asymmetric codebooks
WO2014165451A2 (en) Key/value storage device and method
CN114254045A (en) Data storage method, device and equipment based on block chain and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19948487

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19948487

Country of ref document: EP

Kind code of ref document: A1