CN117992422A - Data transmission method, device and computer readable storage medium - Google Patents

Publication number
CN117992422A
Authority
CN
China
Prior art keywords
data
storage
cloud storage
key
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211333489.5A
Other languages
Chinese (zh)
Inventor
李清炳
郭保江
吕侣
毛琦
贺晋如
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xingyin Information Technology Shanghai Co ltd
Original Assignee
Xingyin Information Technology Shanghai Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xingyin Information Technology Shanghai Co ltd filed Critical Xingyin Information Technology Shanghai Co ltd
Priority to CN202211333489.5A priority Critical patent/CN117992422A/en
Publication of CN117992422A publication Critical patent/CN117992422A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure relates to a data transmission method, a data transmission device and a computer readable storage medium, and relates to the technical field of cloud storage. The method of the present disclosure comprises: reading data to be migrated from a first cloud storage cluster; converting the format of the data to be migrated into a data storage format according to the data storage format in the second cloud storage cluster; writing the converted data to be migrated into a plurality of storage files conforming to the storage file form according to the storage file form in the second cloud storage cluster, and storing the plurality of storage files into the first cloud storage cluster; and the plurality of storage files are distributed and transmitted to the second cloud storage cluster from the first cloud storage cluster, so that the plurality of storage files are stored to corresponding storage nodes in the second cloud storage cluster in parallel.

Description

Data transmission method, device and computer readable storage medium
Technical Field
The disclosure relates to the technical field of cloud storage, and in particular relates to a data transmission method, a data transmission device and a computer readable storage medium.
Background
With the development of internet technology, data volumes have grown explosively. Cloud storage technology is the dominant approach to mass data storage.
The data generated by the network platform may be stored across clouds, for example, core data is stored in one cloud storage cluster, and business applications are deployed in another cloud storage cluster. Such data can present a problem of cross-cloud access.
Currently, some cloud storage clusters provide a client SDK (software development kit), interfaces, or the like for reading and writing the data of another cloud storage cluster.
Disclosure of Invention
The inventors found that different cloud storage clusters may store data in different ways. At present, cross-cloud data transmission and storage is performed through the client SDK, interfaces, or the like provided by a cloud storage cluster. When a large amount of data is read and written in this way, the storage format of the data has to be converted in real time, which requires a large amount of server resources, consumes resources excessively, and reduces the transmission efficiency of the data.
One technical problem to be solved by the present disclosure is: how to improve the efficiency of cross-cloud data transmission and reduce the consumption of resources.
According to some embodiments of the present disclosure, a data transmission method is provided, including: reading data to be migrated from a first cloud storage cluster; converting the format of the data to be migrated into a data storage format according to the data storage format in the second cloud storage cluster; writing the converted data to be migrated into a plurality of storage files conforming to the storage file form according to the storage file form in the second cloud storage cluster, and storing the plurality of storage files into the first cloud storage cluster; and the plurality of storage files are distributed and transmitted to the second cloud storage cluster from the first cloud storage cluster, so that the plurality of storage files are stored to corresponding storage nodes in the second cloud storage cluster in parallel.
In some embodiments, according to a data storage format in the second cloud storage cluster, converting the format of the data to be migrated to the data storage format comprises: converting each piece of data into a hash structure key value pair aiming at each piece of data in the data to be migrated, wherein the hash structure key value pair corresponding to each piece of data comprises: a primary key and one or more member key value pairs; according to a key format in a data storage format in the second cloud storage cluster, encoding keys in key value pairs of a hash structure corresponding to each piece of data to obtain a plurality of encoded key value pairs corresponding to the data to be migrated; dividing the plurality of encoded key value pairs into a plurality of partitions according to the encoded keys in each encoded key value pair, and fully ordering the encoded key value pairs in each partition.
In some embodiments, each piece of data corresponds to a row of data in the data table, and converting each piece of data into a hash-structured key-value pair comprises: for the row of data corresponding to each piece of data, extracting the primary key field value of the row as the primary key, and extracting the field name and field value of each column other than the primary key column as a member key-value pair.
In some embodiments, encoding the keys in the key value pair of the hash structure corresponding to each piece of data according to the key format in the data storage format in the second cloud storage cluster comprises: and according to the key format in the data storage format in the second cloud storage cluster, encoding the combination of the primary key and the member key in the key value pair of the hash structure corresponding to each piece of data.
In some embodiments, the encoded keys of each encoded key-value pair comprise a combination of an encoded primary key and a member key, and dividing the plurality of encoded key-value pairs into a plurality of partitions based on the encoded keys of each encoded key-value pair comprises: hash-modulo the combination of the encoded primary key and the member key for each encoded key value pair to obtain a remainder value corresponding to the combination of the primary key and the member key; and dividing the plurality of encoded key value pairs into a plurality of partitions according to the remainder value corresponding to each combination of the primary key and the member key.
In some embodiments, ordering the respective encoded key-value pairs in each partition includes: for each partition, sorting the encoded key-value pairs by the bytes of their encoded keys.
In some embodiments, writing the converted data to be migrated to a plurality of storage files conforming to the storage file form according to the storage file form in the second cloud storage cluster includes: and writing each coded key value pair in each partition into one storage file conforming to the storage file form according to the storage file form in the second cloud storage cluster.
In some embodiments, the distributed transmission of the plurality of storage files from the first cloud storage cluster to the second cloud storage cluster comprises: reading the plurality of storage files from the first cloud storage cluster; dividing the storage files into a plurality of blocks according to the sizes of the storage files, wherein the size of each block is within a preset range; and starting a plurality of tasks to copy and transmit the blocks to the second cloud storage cluster in parallel.
In some embodiments, concurrently storing the plurality of storage files to respective storage nodes in the second cloud storage cluster comprises: under the condition that a plurality of storage files are transmitted to an intermediate storage space in a second cloud storage cluster, a management service module corresponding to each storage node acquires the storage file corresponding to the storage node from the intermediate storage space in a concurrent mode; and loading the acquired storage files from the corresponding management service modules in a remote procedure call mode by each storage node.
In some embodiments, reading data to be migrated from the first cloud storage cluster includes: storing the data to be migrated as a file in the first cloud storage cluster, and mapping the file to a data table through Hive, wherein each row of the data table stores one piece of the data to be migrated; and reading the data in the data table row by row.
In some embodiments, storing the plurality of storage files to the first cloud storage cluster comprises: and slicing each storage file according to a preset storage value, and storing the storage file into the first cloud storage cluster.
In some embodiments, the number of encoded key-value pairs for each piece of data is the number of columns for that piece of data minus 1.
According to other embodiments of the present disclosure, there is provided a data transmission apparatus including: the reading module is used for reading data to be migrated from the first cloud storage cluster; the conversion module is used for converting the format of the data to be migrated into a data storage format according to the data storage format in the second cloud storage cluster; the writing module is used for writing the converted data to be migrated into a plurality of storage files conforming to the storage file form according to the storage file form in the second cloud storage cluster, and storing the storage files into the first cloud storage cluster; and the transmission module is used for carrying out distributed transmission on the plurality of storage files from the first cloud storage cluster to the second cloud storage cluster so as to store the plurality of storage files to corresponding storage nodes in the second cloud storage cluster in parallel.
According to still further embodiments of the present disclosure, there is provided a data transmission apparatus including: a processor; and a memory coupled to the processor for storing instructions that, when executed by the processor, cause the processor to perform the data transmission method of any of the embodiments described above.
According to still further embodiments of the present disclosure, a non-transitory computer-readable storage medium is provided, on which a computer program is stored, wherein the program, when executed by a processor, implements the data transmission method of any of the foregoing embodiments.
In the method, data to be migrated is read from a first cloud storage cluster, converted according to the data storage format and the storage file form in a second cloud storage cluster, and stored into a plurality of storage files conforming to that storage file form. The plurality of storage files are then transmitted to the second cloud storage cluster in a distributed manner and stored concurrently in the corresponding storage nodes of the second cloud storage cluster. Because the data to be migrated is converted offline into the data storage format and storage file form before the cross-cloud read and write takes place, the second cloud storage cluster does not need to convert the data in real time during transmission. This reduces the occupation and consumption of resources, saves resource cost in the second cloud storage cluster, and improves the cross-cloud transmission rate of the data to be migrated. In addition, distributed transmission further improves the speed of cross-cloud transmission.
Other features of the present disclosure and its advantages will become apparent from the following detailed description of exemplary embodiments of the disclosure, which proceeds with reference to the accompanying drawings.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present disclosure, and other drawings may be obtained according to these drawings without inventive effort to a person of ordinary skill in the art.
Fig. 1 illustrates a flow diagram of a data transmission method of some embodiments of the present disclosure.
Fig. 2 illustrates a schematic diagram of a data transmission method of some embodiments of the present disclosure.
Fig. 3 illustrates a schematic structural diagram of a data transmission device of some embodiments of the present disclosure.
Fig. 4 illustrates a schematic structure of a data transmission apparatus of other embodiments of the present disclosure.
Fig. 5 shows a schematic structural diagram of a data transmission device according to still other embodiments of the present disclosure.
Detailed Description
The following description of the technical solutions in the embodiments of the present disclosure will be made clearly and completely with reference to the accompanying drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are only some embodiments of the present disclosure, not all embodiments. The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses. Based on the embodiments in this disclosure, all other embodiments that a person of ordinary skill in the art would obtain without making any inventive effort are within the scope of protection of this disclosure.
The present disclosure provides a data transmission method, which may be applied to a scenario of data cross-cloud access, and is described below with reference to fig. 1-2.
Fig. 1 is a flow chart of some embodiments of the disclosed data transmission method. As shown in fig. 1, the method of this embodiment includes: steps S102 to S108.
In step S102, data to be migrated is read from a first cloud storage cluster.
For example, as shown in fig. 2, the operation such as reading the data to be migrated may be performed by HQL provided by Hive.
In some embodiments, the data to be migrated is stored as a file in the first cloud storage cluster, and the file is mapped to a data table through Hive, where each row of the data table stores one piece of the data to be migrated; the data in the data table is read row by row. Further, the reading and similar operations may be performed through HQL.
The data to be migrated in the first cloud storage cluster may be stored in a file in the form of object storage or the like, and is not limited to the illustrated example. The data to be migrated may be mapped from the file to the data table by Hive. For example, Hive's User data table is shown in Table 1, where each row of the data table represents one piece of data, corresponding to the record of one user.
TABLE 1
userid   name    address   hobbies     cellphone
uid9527  jack    beijing   basketball  13212345678
uid2049  yunzhe  shanghai  badminton   15712345678
In step S104, the format of the data to be migrated is converted into a data storage format according to the data storage format in the second cloud storage cluster.
In some embodiments, for each piece of data in the data to be migrated, converting each piece of data into a hash-structured key-value pair, where the hash-structured key-value pair corresponding to each piece of data includes: a primary key and one or more member key value pairs; according to a key format in a data storage format in the second cloud storage cluster, encoding keys in key value pairs of a hash structure corresponding to each piece of data to obtain a plurality of encoded key value pairs corresponding to the data to be migrated; dividing the plurality of encoded key value pairs into a plurality of partitions according to the encoded keys in each encoded key value pair, and fully ordering the encoded key value pairs in each partition.
In some embodiments, each piece of data corresponds to a row of data in the data table. For each row, the value of the primary key field is extracted as the primary key (PRIMARY KEY), and the field name and field value of each column other than the primary key column are extracted as a member key-value pair (Member KV). For example, for the row of Table 1 whose userid is uid9527, uid9527 is the primary key, name is a member key, jack is the corresponding value, and so on.
A data conversion engine, which defines the mapping from the data to be migrated to hash-structured key-value pairs, may be implemented in HQL. For example, the columns of each row listed in the SELECT clause are converted into a hash structure (K1, map(K11, V1, K12, V2, ...)), where K1 represents the primary key, K11, K12, ... represent member keys, and V1, V2, ... represent values. For example, the data conversion statement corresponding to Table 1 is as follows.
select userid as k, map('name', COALESCE(name, ''), 'address', COALESCE(address, ''), 'hobbies', COALESCE(hobbies, ''), 'cellphone', COALESCE(cellphone, '')) as v
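The row-to-hash conversion described above can also be sketched outside Hive. The following Python sketch is only an illustration under stated assumptions: the function name row_to_hash and the dictionary representation of a row are hypothetical and do not appear in the disclosure. NULL values are replaced by empty strings, mirroring the COALESCE calls in the statement.

```python
from typing import Dict, Optional, Tuple


def row_to_hash(row: Dict[str, Optional[str]],
                primary_key_field: str) -> Tuple[str, Dict[str, str]]:
    """Convert one table row into (primary key, {member key: value}).

    Hypothetical helper for illustration; not part of the disclosure.
    """
    primary_key = row[primary_key_field]
    if primary_key is None:
        raise ValueError("primary key field must not be NULL")
    # Every column other than the primary key column becomes a member pair.
    # None is mapped to "" in the same way COALESCE(col, '') does in HQL.
    members = {
        name: (value if value is not None else "")
        for name, value in row.items()
        if name != primary_key_field
    }
    return primary_key, members


row = {"userid": "uid9527", "name": "jack", "address": "beijing",
       "hobbies": "basketball", "cellphone": "13212345678"}
key, members = row_to_hash(row, "userid")
print(key, members)
```

A row with five columns yields one primary key and four member key-value pairs.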
Further, in some embodiments, the combination of the primary key and the member key in the key value pair of the hash structure corresponding to each piece of data is encoded according to a key format in the data storage format in the second cloud storage cluster. The encoding scheme is kept consistent with the data encoding mode of the storage cluster on the second cloud.
If the key needs to be encoded in the second cloud storage cluster, the combination of the primary key and the member key in the hash-structured key-value pair corresponding to each piece of data is encoded. If the value needs to be encoded in the second cloud storage cluster, the value in the hash-structured key-value pair corresponding to each piece of data is encoded. The specific processing depends on the actual situation.
For example, to fit the key format of the RocksDB engine, the combinations K1K11, K1K12, ... are encoded by a UDF (User-Defined Function) to obtain encodekey (the encoded key). For example, the encoding is performed by a hashencode method.
The number of encoded key value pairs corresponding to each piece of data is the number of columns corresponding to the piece of data minus 1. For example, the second row of data in Table 1 corresponds to 4 encoded key-value pairs. The encoded key value pairs may also be stored in a temporary table built in Hive.
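As an illustration of encoding the combination of primary key and member key, the sketch below uses a length-prefixed layout. This layout is purely an assumption chosen for the example; the disclosure does not specify the byte format produced by hashencode, and the real format must match whatever the second cloud storage cluster expects. The function names encode_key and encode_row are hypothetical.

```python
import struct
from typing import Dict, List, Tuple


def encode_key(primary_key: str, member_key: str) -> bytes:
    """Encode (primary key, member key) as one composite key.

    Assumed layout: 4-byte big-endian length of the primary key, then the
    primary key bytes, then the member key bytes. Illustrative only.
    """
    pk = primary_key.encode("utf-8")
    mk = member_key.encode("utf-8")
    return struct.pack(">I", len(pk)) + pk + mk


def encode_row(primary_key: str,
               members: Dict[str, str]) -> List[Tuple[bytes, bytes]]:
    """Produce one encoded key-value pair per member of the hash structure."""
    return [(encode_key(primary_key, name), value.encode("utf-8"))
            for name, value in members.items()]


pairs = encode_row("uid9527", {"name": "jack", "address": "beijing",
                               "hobbies": "basketball",
                               "cellphone": "13212345678"})
print(len(pairs))  # one pair per non-primary-key column
```

The length prefix keeps composite keys unambiguous when one primary key is a prefix of another, which is one reason such a layout might be chosen.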
Further, in some embodiments, the encoded key of each encoded key-value pair includes the encoded combination of a primary key and a member key. For each encoded key-value pair, a hash of this encoded combination is taken modulo the number of partitions to obtain a remainder value corresponding to the combination, and the plurality of encoded key-value pairs are divided into a plurality of partitions according to these remainder values. Partitioning the encoded key-value pairs by encoded key allows load balancing in the subsequent storage and transmission processes.
Further, in some embodiments, within each partition the encoded key-value pairs are sorted by the bytes of their encoded keys.
For example, dividing the encoded key-value pairs into partitions and sorting them may be accomplished by the following statement, where partitioning (partition by) is a mechanism provided by Hive.
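The hash-modulo partitioning step can be sketched as follows. The choice of CRC32 as the hash function and the names partition_index and partition_pairs are assumptions made for this illustration; the disclosure does not name a specific hash function.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[bytes, bytes]


def partition_index(encoded_key: bytes, num_partitions: int) -> int:
    """Hash the encoded key and take the remainder modulo the partition count.

    CRC32 is an assumed stand-in for whatever hash the real job uses.
    """
    return zlib.crc32(encoded_key) % num_partitions


def partition_pairs(pairs: Iterable[Pair],
                    num_partitions: int) -> Dict[int, List[Pair]]:
    """Group encoded key-value pairs by the remainder of their key hash."""
    partitions: Dict[int, List[Pair]] = defaultdict(list)
    for key, value in pairs:
        partitions[partition_index(key, num_partitions)].append((key, value))
    return dict(partitions)


sample = [(b"uid9527name", b"jack"), (b"uid9527address", b"beijing"),
          (b"uid2049name", b"yunzhe")]
print(partition_pairs(sample, 4))
```

Because the partition depends only on the encoded key, the same key always lands in the same partition, and a well-distributed hash spreads keys roughly evenly.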
select encode_key,encode_value,row_number()
over(partition by partition_key order by encode_key)as rn
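The effect of the window function above on a single partition can be mirrored in a few lines of Python. This is a sketch for illustration only; sort_partition is a hypothetical name. Python compares bytes objects lexicographically by unsigned byte value, which matches the byte-wise ordering of encoded keys described in the text.

```python
from typing import List, Tuple


def sort_partition(pairs: List[Tuple[bytes, bytes]]
                   ) -> List[Tuple[int, bytes, bytes]]:
    """Sort one partition by encoded key bytes and number the rows from 1."""
    ordered = sorted(pairs, key=lambda kv: kv[0])
    return [(rn, key, value)
            for rn, (key, value) in enumerate(ordered, start=1)]


rows = sort_partition([(b"b", b"2"), (b"a\xff", b"1"), (b"a", b"0")])
for rn, key, value in rows:
    print(rn, key, value)
```

Keeping each partition fully ordered matters because sorted storage files such as SST files generally require keys to be written in ascending order.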
In step S106, according to the storage file format in the second cloud storage cluster, the converted data to be migrated is written into a plurality of storage files conforming to the storage file format, and the plurality of storage files are stored in the first cloud storage cluster.
In some embodiments, according to the storage file form in the second cloud storage cluster, each encoded key value pair in each partition is written into one storage file conforming to the storage file form.
For example, if the second cloud storage cluster stores data as SST files, each encoded key-value pair in each partition is written into an SST file. For example, as shown in FIG. 2, a recordWriter, HiveToSst, may be defined to write the partitioned and sorted encoded key-value pairs to local SST files in the Reducer stage of EMR (E-MapReduce), executed by Tez.
In some embodiments, each storage file is sliced according to a preset storage value and then stored to the first cloud storage cluster. For example, each storage file may be saved in slices of 256 MB (configurable). As shown in FIG. 2, for example, if the first cloud storage cluster uses the S3 cloud storage service, the SST files are uploaded to S3.
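One way to read the slicing step is that the sorted key-value pairs of a partition are cut into slices whenever the accumulated size would exceed the preset storage value, so that each slice can be saved as its own file. The sketch below follows that reading; it is an assumption, as are the name slice_pairs and the use of key length plus value length as the size estimate.

```python
from typing import List, Tuple

Pair = Tuple[bytes, bytes]

DEFAULT_SLICE_BYTES = 256 * 1024 * 1024  # 256 MB, configurable


def slice_pairs(pairs: List[Pair],
                max_bytes: int = DEFAULT_SLICE_BYTES) -> List[List[Pair]]:
    """Cut an ordered list of pairs into slices of at most max_bytes each.

    Slices are cut only at pair boundaries; a single pair larger than
    max_bytes still gets a slice of its own.
    """
    slices: List[List[Pair]] = []
    current: List[Pair] = []
    current_size = 0
    for key, value in pairs:
        size = len(key) + len(value)
        if current and current_size + size > max_bytes:
            slices.append(current)
            current, current_size = [], 0
        current.append((key, value))
        current_size += size
    if current:
        slices.append(current)
    return slices


demo = [(b"a", b"1111"), (b"b", b"2222"), (b"c", b"3333")]
print(slice_pairs(demo, max_bytes=10))
```

Since the input is already sorted, each slice covers a contiguous key range and the slices stay in key order.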
In the foregoing embodiment, the steps of converting the format of the data to be migrated into the data storage format, writing the converted data to be migrated into a plurality of storage files conforming to the storage file format, and storing the plurality of storage files in the first cloud storage cluster may all adopt a parallel processing manner.
In step S108, the plurality of storage files are distributed and transferred from the first cloud storage cluster to the second cloud storage cluster, so as to store the plurality of storage files to corresponding storage nodes in the second cloud storage cluster concurrently.
In some embodiments, a plurality of storage files are read from the first cloud storage cluster; the storage files are divided into a plurality of blocks according to the sizes of the storage files, where the size of each block is within a preset range; and a plurality of tasks are started to copy and transmit the blocks to the second cloud storage cluster in parallel.
Copying and transmitting the storage files in parallel by block balances the amount of data transmitted by each thread as far as possible and improves transmission efficiency. As shown in FIG. 2, the transmission may be performed by the distcp distributed transmission tool.
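The block-based parallel copy can be sketched as a grouping step followed by a thread pool. This is not how distcp is implemented; it is a simplified illustration, and make_blocks, copy_blocks, and the copy_block callback are hypothetical names. Files are grouped greedily, largest first, so that each block stays within an assumed upper size bound.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")


def make_blocks(file_sizes: Dict[str, int],
                max_block_bytes: int) -> List[List[str]]:
    """Group files into blocks whose total size stays within max_block_bytes.

    A file larger than max_block_bytes is placed in a block by itself.
    """
    blocks: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    by_size = sorted(file_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in by_size:
        if current and current_size + size > max_block_bytes:
            blocks.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        blocks.append(current)
    return blocks


def copy_blocks(blocks: List[List[str]],
                copy_block: Callable[[List[str]], T],
                max_workers: int = 4) -> List[T]:
    """Run one copy task per block in parallel and collect the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(copy_block, blocks))


sizes = {"a.sst": 300, "b.sst": 200, "c.sst": 100, "d.sst": 100}
blocks = make_blocks(sizes, max_block_bytes=400)
# The callback here only counts files; a real one would perform the transfer.
print(copy_blocks(blocks, copy_block=len))
```

Grouping by size rather than by file count keeps the work per task comparable, which is the balancing effect the text describes.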
In some embodiments, under the condition that a plurality of storage files are transmitted to an intermediate storage space in the second cloud storage cluster, a management service module corresponding to each storage node concurrently acquires the storage file corresponding to the storage node from the intermediate storage space; and loading the acquired storage files from the corresponding management service modules in a remote procedure call mode by each storage node.
The intermediate storage space is, for example, a COS (Cloud Object Storage) space. The plurality of storage files may be transferred into a COS bucket. The management service (ADMIN SERVICE) module serves as a sidecar (Sidecar) of the storage node. The system platform may trigger the cross-cloud data transmission and storage task and, after steps S102 to S106 are executed, transmit the plurality of storage files to the intermediate storage space in the second cloud storage cluster. The system platform can then notify each management service module to pull the storage files corresponding to its storage node from the intermediate storage space. The storage nodes concurrently load the acquired storage files from their corresponding management service modules by RPC (Remote Procedure Call).
As shown in FIG. 2, each storage node has one ADMIN SERVICE. The sorted SST files in S3 are transmitted to COS through the distributed transmission tool, each ADMIN SERVICE pulls the corresponding storage files from COS, and each storage node loads them into the cluster through an Ingest RPC.
The process of cross-cloud data transmission and storage in step S108 may be performed offline, and each storage node in the second cloud storage cluster may select a time for loading the corresponding storage file according to a situation such as a self load, for example, each storage node may load the corresponding storage file under a situation that the self I/O load is lower than a threshold value. The larger the scale of the second cloud storage cluster is, the better the data loading performance is, and the client can read the cluster data through the standard SDK.
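The load-aware loading described above can be sketched as a single scheduling pass on one storage node. Everything in this sketch is hypothetical: load_pass, the read_io_load callback, the ingest callback, and the use of a fractional load value are assumptions for illustration, since the disclosure does not define how load is measured or how loading is invoked.

```python
from typing import Callable, Iterable, List, Tuple


def load_pass(pending_files: Iterable[str],
              read_io_load: Callable[[], float],
              ingest: Callable[[str], None],
              threshold: float) -> Tuple[List[str], List[str]]:
    """Try to load each pending file once; defer it if the node is busy.

    A file is ingested only when the current I/O load is below the threshold.
    Returns (loaded files, deferred files); deferred files can be retried
    in a later pass.
    """
    loaded: List[str] = []
    deferred: List[str] = []
    for name in pending_files:
        if read_io_load() < threshold:
            ingest(name)
            loaded.append(name)
        else:
            deferred.append(name)
    return loaded, deferred


# Simulated load readings stand in for a real I/O metric.
readings = iter([0.2, 0.9, 0.5])
ingested: List[str] = []
done, later = load_pass(["1.sst", "2.sst", "3.sst"],
                        read_io_load=lambda: next(readings),
                        ingest=ingested.append,
                        threshold=0.6)
print(done, later)
```

Because each node decides independently, loading can be spread over time without coordinating the whole cluster.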
According to the method, data to be migrated is read from the first cloud storage cluster, converted according to the data storage format and the storage file form in the second cloud storage cluster, and stored into a plurality of storage files conforming to that storage file form. The plurality of storage files are then transmitted to the second cloud storage cluster in a distributed manner and stored concurrently in the corresponding storage nodes of the second cloud storage cluster. Because the data to be migrated is converted offline into the data storage format and storage file form before the cross-cloud read and write takes place, the second cloud storage cluster does not need to convert the data in real time during transmission. This reduces the occupation and consumption of resources, saves resource cost in the second cloud storage cluster, and improves the cross-cloud transmission rate of the data to be migrated. In addition, distributed transmission further improves the speed of cross-cloud transmission.
The scheme of the present disclosure can be applied to scenarios of cross-cloud transmission and access of mass data, and improves the performance of data transmission and storage. Compared with online writing through a client SDK, the method imports data into the second cloud storage cluster by offline data transmission. It can import massive data quickly, improves data transmission performance and speed several-fold, does not depend excessively on the machine resources of the second cloud storage cluster, saves memory, and greatly reduces the resource cost of the second cloud storage cluster.
In addition, transmitting data during off-peak periods reduces the impact on the read service for online traffic. For example, when data is backfilled across cloud environments, a sustained burst of write traffic can seriously degrade read performance and cause a long tail effect. The method can complete data transmission and storage quickly without affecting data reading.
The present disclosure also provides a data transmission apparatus, described below in connection with fig. 3.
Fig. 3 is a block diagram of some embodiments of the disclosed data transmission apparatus. As shown in fig. 3, the apparatus 30 of this embodiment includes: a reading module 310, a converting module 320, a writing module 330, a transmitting module 340.
The reading module 310 is configured to read data to be migrated from the first cloud storage cluster.
The conversion module 320 is configured to convert a format of data to be migrated into a data storage format according to a data storage format in the second cloud storage cluster.
In some embodiments, the conversion module 320 is configured to convert each piece of data to be migrated into a hash-structured key-value pair, where the hash-structured key-value pair corresponding to each piece of data includes: a primary key and one or more member key value pairs; according to a key format in a data storage format in the second cloud storage cluster, encoding keys in key value pairs of a hash structure corresponding to each piece of data to obtain a plurality of encoded key value pairs corresponding to the data to be migrated; dividing the plurality of encoded key value pairs into a plurality of partitions according to the encoded keys in each encoded key value pair, and fully ordering the encoded key value pairs in each partition.
In some embodiments, each piece of data corresponds to a row of data in the data table, and the conversion module 320 is configured to extract, for each row of data corresponding to each piece of data, a primary key field value of the row of data as a primary key, and extract, as each member key value pair, a field name and a field value of each column other than the column where the primary key field name is located.
In some embodiments, the conversion module 320 is configured to encode a combination of the primary key and the member key in the key value pair of the hash structure corresponding to each piece of data according to a key format in the data storage format in the second cloud storage cluster.
In some embodiments, the conversion module 320 is configured to hash-modulo the combination of the encoded primary key and the member key for each encoded key value pair to obtain a remainder value corresponding to the combination of the primary key and the member key; and dividing the plurality of encoded key value pairs into a plurality of partitions according to the remainder value corresponding to each combination of the primary key and the member key.
In some embodiments, the conversion module 320 is configured to sort, within each partition, the encoded key-value pairs by the bytes of their encoded keys.
In some embodiments, the number of encoded key-value pairs for each piece of data is the number of columns for that piece of data minus 1.
The writing module 330 is configured to write the converted data to be migrated into a plurality of storage files according to the storage file format in the second cloud storage cluster, and store the plurality of storage files to the first cloud storage cluster.
In some embodiments, the writing module 330 is configured to write, according to a storage file format in the second cloud storage cluster, each encoded key value pair in each partition into one storage file that conforms to the storage file format.
In some embodiments, the writing module 330 is configured to slice each storage file according to a preset storage value, and store the slice to the first cloud storage cluster.
The transmission module 340 is configured to perform distributed transmission on a plurality of storage files from the first cloud storage cluster to the second cloud storage cluster, so as to store the plurality of storage files to corresponding storage nodes in the second cloud storage cluster concurrently.
In some embodiments, the transmission module 340 is configured to read the plurality of storage files from the first cloud storage cluster; divide the plurality of storage files into a plurality of blocks according to the sizes of the storage files, wherein the size of each block is within a preset range; and start a plurality of tasks to copy and transmit the plurality of blocks to the second cloud storage cluster in parallel.
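A rough sketch of size-based grouping followed by parallel copy tasks is given below. It makes several assumptions: the grouping is a simple greedy pass that enforces only an upper size limit (a single file larger than the limit still forms its own block), a thread pool stands in for the distributed tasks, and `copy_block` is a hypothetical callable supplied by the caller rather than an API of either cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def group_into_blocks(file_sizes: Dict[str, int], max_block_size: int) -> List[List[str]]:
    # Greedily pack files (in name order) into blocks whose total size
    # does not exceed max_block_size.
    blocks: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in sorted(file_sizes.items()):
        if current and current_size + size > max_block_size:
            blocks.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        blocks.append(current)
    return blocks


def transfer_blocks(
    blocks: List[List[str]], copy_block: Callable[[List[str]], object], max_workers: int = 4
) -> List[object]:
    # Start one task per block and run the tasks in parallel.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(copy_block, blocks))


blocks = group_into_blocks({"a": 40, "b": 50, "c": 30, "d": 70}, max_block_size=100)
# blocks == [["a", "b"], ["c", "d"]]
```

Keeping block sizes similar means each copy task handles a comparable amount of data, which helps the parallel tasks finish at about the same time.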
In some embodiments, the transmission module 340 includes a transmission unit, acquisition units, and loading units; the acquisition units and the loading units may be disposed in the management service modules and the storage nodes in the second cloud storage cluster, respectively. The transmission unit is configured to transmit the plurality of storage files to an intermediate storage space in the second cloud storage cluster. The acquisition units are configured to concurrently acquire, from the intermediate storage space, the storage files corresponding to their storage nodes once the plurality of storage files have been transmitted to the intermediate storage space. The loading units are configured to concurrently load the acquired storage files from the corresponding management service modules by means of remote procedure calls.
The data transmission apparatuses in the embodiments of the present disclosure may each be implemented by various computing devices or computer systems, as described below in connection with Figs. 4 and 5.
Fig. 4 is a block diagram of some embodiments of the disclosed data transmission apparatus. As shown in fig. 4, the apparatus 40 of this embodiment includes: a memory 410 and a processor 420 coupled to the memory 410, the processor 420 being configured to perform the data transmission method in any of the embodiments of the present disclosure based on instructions stored in the memory 410.
The memory 410 may include, for example, system memory, fixed nonvolatile storage media, and the like. The system memory stores, for example, an operating system, application programs, a boot loader, a database, and other programs.
Fig. 5 is a block diagram of further embodiments of the data transmission apparatus of the present disclosure. As shown in Fig. 5, the apparatus 50 of this embodiment includes a memory 510 and a processor 520, which are similar to the memory 410 and the processor 420, respectively. The apparatus 50 may also include an input/output interface 530, a network interface 540, a storage interface 550, and the like. These interfaces 530, 540, 550, as well as the memory 510 and the processor 520, may be connected by a bus 560, for example. The input/output interface 530 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen. The network interface 540 provides a connection interface for various networking devices and may, for example, be connected to a database server or a cloud storage server. The storage interface 550 provides a connection interface for external storage devices such as SD cards and USB flash drives.
It should be noted that, in the technical solution of the present disclosure, the acquisition, storage, application, and other handling of users' personal information comply with the relevant laws and regulations and do not violate public order and good morals.
It will be appreciated by those skilled in the art that embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing describes merely preferred embodiments of the present disclosure and is not intended to limit the disclosure. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present disclosure shall fall within the scope of protection of the present disclosure.

Claims (15)

1. A data transmission method, comprising:
reading data to be migrated from a first cloud storage cluster;
converting the format of the data to be migrated into the data storage format according to the data storage format in the second cloud storage cluster;
writing the converted data to be migrated into a plurality of storage files conforming to the storage file form according to the storage file form in the second cloud storage cluster, and storing the plurality of storage files into the first cloud storage cluster;
and performing distributed transmission of the plurality of storage files from the first cloud storage cluster into the second cloud storage cluster, so as to concurrently store the plurality of storage files to corresponding storage nodes in the second cloud storage cluster.
2. The data transmission method according to claim 1, wherein the converting the format of the data to be migrated into the data storage format according to the data storage format in the second cloud storage cluster includes:
converting each piece of data in the data to be migrated into a key value pair of a hash structure, wherein the key value pair of the hash structure corresponding to each piece of data comprises: a primary key and one or more member key value pairs;
according to a key format in a data storage format in the second cloud storage cluster, encoding keys in a key value pair of a hash structure corresponding to each piece of data to obtain a plurality of encoded key value pairs corresponding to the data to be migrated;
dividing the plurality of encoded key value pairs into a plurality of partitions according to the encoded keys of each encoded key value pair, and fully ordering the encoded key value pairs in each partition.
3. The data transmission method according to claim 2, wherein each piece of data corresponds to one row of data in the data table, and the converting each piece of data into the key value pair of the hash structure includes:
extracting the primary key field value of the row of data corresponding to each piece of data as the primary key, and extracting the field names and field values of all columns other than the column of the primary key field as the member key value pairs.
4. The data transmission method according to claim 2, wherein the encoding the key in the key value pair of the hash structure corresponding to each piece of data according to the key format in the data storage format in the second cloud storage cluster includes:
encoding the combination of the primary key and the member key in the key value pair of the hash structure corresponding to each piece of data according to the key format in the data storage format in the second cloud storage cluster.
5. The data transmission method of claim 4, wherein the encoded keys of each encoded key value pair comprise a combination of encoded primary keys and member keys, and wherein dividing the plurality of encoded key value pairs into a plurality of partitions based on the encoded keys of each encoded key value pair comprises:
performing a hash modulo operation on the encoded combination of the primary key and the member key of each encoded key value pair to obtain a remainder value corresponding to the combination of the primary key and the member key;
and dividing the plurality of encoded key value pairs into a plurality of partitions according to the remainder value corresponding to each combination of the primary key and the member key.
6. The data transmission method of claim 2, wherein the ordering of the respective encoded key-value pairs in each partition comprises:
sorting the encoded key value pairs in each partition based on the bytes of the encoded key in each encoded key value pair.
7. The data transmission method according to claim 2, wherein the writing the converted data to be migrated into a plurality of storage files conforming to the storage file form according to the storage file form in the second cloud storage cluster includes:
writing the encoded key value pairs in each partition into one storage file conforming to the storage file form according to the storage file form in the second cloud storage cluster.
8. The data transmission method of claim 1, wherein the distributed transmission of the plurality of storage files from the first cloud storage cluster into the second cloud storage cluster comprises:
reading the plurality of storage files from the first cloud storage cluster;
dividing the plurality of storage files into a plurality of blocks according to the sizes of the storage files, wherein the size of each of the plurality of blocks is within a preset range;
and starting a plurality of tasks to copy and transmit the plurality of blocks to the second cloud storage cluster in parallel.
9. The data transmission method according to claim 1, wherein the concurrently storing the plurality of storage files to respective storage nodes in the second cloud storage cluster comprises:
under the condition that the plurality of storage files are transmitted to an intermediate storage space in the second cloud storage cluster, a management service module corresponding to each storage node acquires the storage file corresponding to the storage node from the intermediate storage space concurrently;
and each storage node loading the acquired storage file from the corresponding management service module by means of a remote procedure call.
10. The data transmission method according to claim 1, wherein the reading data to be migrated from the first cloud storage cluster includes:
mapping, through Hive, a file of the data to be migrated that is stored in the first cloud storage cluster into a data table, wherein each row of the data table stores one piece of data in the data to be migrated;
and reading the data in the data table row by row.
11. The data transmission method of claim 1, wherein the storing the plurality of storage files to the first cloud storage cluster comprises:
slicing each storage file according to a preset storage value, and storing the resulting slices into the first cloud storage cluster.
12. The data transmission method according to claim 2, wherein,
the number of encoded key value pairs corresponding to each piece of data is the number of columns corresponding to the piece of data minus 1.
13. A data transmission apparatus comprising:
a reading module, configured to read data to be migrated from a first cloud storage cluster;
a conversion module, configured to convert the format of the data to be migrated into the data storage format according to the data storage format in the second cloud storage cluster;
a writing module, configured to write the converted data to be migrated into a plurality of storage files conforming to the storage file form according to the storage file form in the second cloud storage cluster, and store the plurality of storage files into the first cloud storage cluster;
and a transmission module, configured to perform distributed transmission of the plurality of storage files from the first cloud storage cluster to the second cloud storage cluster, so as to concurrently store the plurality of storage files to corresponding storage nodes in the second cloud storage cluster.
14. A data transmission apparatus comprising:
a processor; and
a memory coupled to the processor for storing instructions that, when executed by the processor, cause the processor to perform the data transmission method of any of claims 1-12.
15. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the program when executed by a processor implements the steps of the method of any of claims 1-12.
CN202211333489.5A 2022-10-28 2022-10-28 Data transmission method, device and computer readable storage medium Pending CN117992422A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211333489.5A CN117992422A (en) 2022-10-28 2022-10-28 Data transmission method, device and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN117992422A true CN117992422A (en) 2024-05-07

Family

ID=90893896

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211333489.5A Pending CN117992422A (en) 2022-10-28 2022-10-28 Data transmission method, device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN117992422A (en)

Similar Documents

Publication Publication Date Title
CN103077197A (en) Data storing method and device
US20140101213A1 (en) Computer-readable recording medium, execution control method, and information processing apparatus
CN110347651A (en) Method of data synchronization, device, equipment and storage medium based on cloud storage
CN111159265B (en) ETL data migration method and system
CN113900810A (en) Distributed graph processing method, system and storage medium
CN106570153A (en) Data extraction method and system for mass URLs
CN112860412B (en) Service data processing method and device, electronic equipment and storage medium
CN106980618B (en) File storage method and system based on MongoDB distributed cluster architecture
US20220129430A1 (en) Optimizing storage and retrieval of compressed data
CN103810197A (en) Hadoop-based data processing method and system
CN116226139B (en) Distributed storage and processing method and system suitable for large-scale ocean data
CN103577604B (en) A kind of image index structure for Hadoop distributed environments
CN109271456A (en) Host data library file deriving method and device
CN106570152B (en) Mass extraction method and system for mobile phone numbers
CN110109751B (en) Distribution method and device of distributed graph cutting tasks and distributed graph cutting system
CN116842012A (en) Method, device, equipment and storage medium for storing Redis cluster in fragments
CN116760661A (en) Data storage method, apparatus, computer device, storage medium, and program product
US10083121B2 (en) Storage system and storage method
CN117992422A (en) Data transmission method, device and computer readable storage medium
CN115858322A (en) Log data processing method and device and computer equipment
CN113392131A (en) Data processing method and device and computer equipment
CN113836157A (en) Method and device for acquiring incremental data of database
CN114625474A (en) Container migration method and device, electronic equipment and storage medium
CN113760898A (en) Method and device for processing table connection operation
CN112800091A (en) Flow-batch integrated calculation control system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination