CN111427950B - Data transmitting and receiving method, and corresponding device, equipment and storage medium - Google Patents


Info

Publication number
CN111427950B
Authority
CN
China
Prior art keywords
data
source
target
tables
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010092301.7A
Other languages
Chinese (zh)
Other versions
CN111427950A (en
Inventor
戴建明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd
Priority to CN202010092301.7A
Publication of CN111427950A
Application granted
Publication of CN111427950B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/278 Data partitioning, e.g. horizontal or vertical partitioning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1095 Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data transmitting and receiving method and corresponding devices, equipment, and storage media. In the data transmitting method, a source server determines the table structure and data amount of the source data to be synchronized and determines a time window for transmitting the source data. According to the time window, the table structure, and the data amount, it splits the table structure to obtain N source sub-tables and N pieces of unit data mapped to them. It then sends a data synchronization request to a target server and sends the split data to the target server in parallel within the time window. The target server receives the split data in parallel and maps each of the N pieces of unit data, together with its source attribute parameters, to a partition table of the corresponding target table. The invention greatly shortens the time needed to synchronize data between the source server and the target server, so that the synchronization task can be completed within a designated time window.

Description

Data transmitting and receiving method, and corresponding device, equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data transmission method, a data reception method, a data transmission device, a data reception device, a computer device, and a computer readable storage medium.
Background
In the current big data era, big data technology is one of the core technologies of the industrial internet, and the volume of data stored in application systems is growing explosively. With massive amounts of data now being stored, data often needs to be migrated between different databases for various purposes, for example from an Oracle database to a Hive database.
Because Hive and Oracle store data differently, synchronization between them faces a performance bottleneck. In the prior art, data in an Oracle database can be synchronized into a Hive database by pulling it directly from Oracle with tools such as Sqoop, and the migration from Oracle into Hive must be completed within a designated time. With the prior art, the migration task can be completed successfully when the data volume is small, but when the data volume reaches the hundred-billion level, it is difficult to finish the synchronization within the designated time.
Disclosure of Invention
The invention provides a data sending method, a data receiving method, and corresponding devices, equipment, and media, to solve the problem that data synchronization is difficult to complete within a designated time when the volume of data to be migrated reaches the level of billions.
A data transmission method, comprising:
determining a table structure and a total data amount of source data to be synchronized;
determining a time window for transmitting the source data;
splitting the table structure and the source data according to the time window, the table structure of the source data, and the total data amount, to obtain N source sub-tables and unit data mapped to each source sub-table, where N is a positive integer greater than or equal to 2;
transmitting a data synchronization request to a target server, where the data synchronization request carries the number of source sub-tables and the data amount of the unit data stored in the source sub-tables, and instructs the target server to establish a target table for storing the unit data; and
sending the unit data of the N source sub-tables to the target server in parallel within the time window, so that the target server receives the N pieces of unit data in parallel in the same time period and maps each of them to a corresponding partition table in the target table, where the source sub-tables and the partition tables have different structures.
A data receiving method, comprising:
receiving a data synchronization request sent by a source server, where the data synchronization request carries the number of source sub-tables and the data amount of the unit data stored in the source sub-tables;
establishing a target table for storing the unit data according to the data amount of each piece of unit data, the number of source sub-tables, and the storage space condition of the source sub-tables, where the target table comprises a plurality of partition tables;
receiving, in parallel within a time window, N pieces of unit data and the source attribute parameters corresponding to each piece of unit data, where N is a positive integer greater than or equal to 2; and
mapping each of the N pieces of unit data, together with its source attribute parameters, into the corresponding partition table of the target table, where the partition tables have a different structure from the source sub-tables and the target table comprises the partition tables and an additional partition table; and processing first data corresponding to the source attribute parameters to obtain processed second data, and mapping the second data to the additional partition table.
A data transmission apparatus, comprising:
a receiving module, configured to receive a data synchronization instruction sent by a client and a time window sent by the target server, where the time window is the time for transmitting the source data to be synchronized;
a calculation module, configured to calculate the table structure and the total data amount of the source data;
a splitting module, configured to split the table structure and the source data according to the time window, the table structure of the source data, and the total data amount, to obtain N source sub-tables and unit data mapped to each source sub-table, where N is a positive integer greater than or equal to 2;
a request sending module, configured to send a data synchronization request to a target server, where the data synchronization request carries the number of source sub-tables and the data amount of the unit data stored in the source sub-tables, and instructs the target server to establish a target table for storing the unit data; and
a data sending module, configured to send the unit data of the N source sub-tables to the target server in parallel within the time window, so that the target server receives the N pieces of unit data in parallel in the same time period and maps each of them to a corresponding partition table in the target table, where the source sub-tables and the partition tables have different structures.
A data receiving apparatus, comprising:
a receiving module, configured to receive a data synchronization request and a time window request sent by the source server, where the data synchronization request carries the number of source sub-tables and the data amount of the unit data stored in the source sub-tables, and the time window is the time for transmitting the source data to be synchronized;
a table building module, configured to build a target table for storing the unit data according to the data amount of each piece of unit data, the number of source sub-tables, and the storage space condition of the source sub-tables, where the target table comprises a plurality of partition tables;
a data receiving module, configured to receive, in parallel within the time window, N pieces of unit data and the source attribute parameters corresponding to each piece of unit data, where N is a positive integer greater than or equal to 2; and
a processing module, configured to map each of the N pieces of unit data, together with its source attribute parameters, into the corresponding partition table of the target table, where the partition tables have a different structure from the source sub-tables, the target table comprises the partition tables and an additional partition table, and the additional partition table corresponds to the partition tables; the processing module is further configured to process first data corresponding to the source attribute parameters to obtain processed second data and to map the second data to the additional partition table.
A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the above data transmission method or of the above data reception method.
A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the above data transmission method or of the above data reception method.
The beneficial effects provided by the invention are as follows:
In the invention, the source server determines the table structure and data amount of the source data to be synchronized, determines the time window for transmitting it, and splits the table structure according to the time window, the table structure, and the data amount to obtain N source sub-tables and the unit data mapped to them. It then sends a data synchronization request to the target server and sends the split data to the target server in parallel within the time window. After receiving the data in parallel, the target server maps each of the N pieces of unit data, together with its source attribute parameters, to the partition table of the corresponding target table. This greatly shortens the time needed to synchronize data from the source server to the target server, so that the synchronization task can be completed within the designated time window.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments of the present invention will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an application environment of a data transmission method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method of data transmission in an embodiment of the invention;
FIG. 3 is a flowchart showing the implementation of step SA30 of the data transmission method according to an embodiment of the present invention;
FIG. 4 is a flow chart of a method of data reception in an embodiment of the invention;
FIG. 5 is a schematic diagram of a data transmission device according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a data receiving device according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a computer device in accordance with an embodiment of the invention.
Detailed Description
The invention provides a data synchronization method applied to a data synchronization system. The system comprises a source server, a target server, and a client: the client sends instructions to the servers over a network, the source server acts as the sender of the data to be synchronized, and the target server acts as its receiver. The client, also called the user side, is a program that corresponds to the server and provides local services to the user. The client may be installed on, but is not limited to, personal computers, notebook computers, smartphones, tablet computers, and portable wearable devices. Both the source server and the target server may be server clusters.
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The data synchronization method provided by the embodiments of the invention can be applied in the application environment shown in FIG. 1, in which the target server communicates with the source server over a network, and data is synchronized, calculated, split, and processed by the target server and the source server.
In one embodiment, to synchronize data from the source server to the target server, a data transmission method is provided with the source server in FIG. 1 as the execution body. As shown in FIG. 2, it includes the following steps:
SA10: the table structure of the source data to be synchronized and the total data amount are determined.
For example, the source server may be an Oracle server. The source server receives a data synchronization instruction sent by the client, selects the source data to be synchronized according to that instruction, and maps the stored data into a table structure.
In this embodiment, the source data to be synchronized is mapped into a table, and the source server identifies the table structure of the source data. The table structure includes columns and rows, where the columns are source attribute parameters such as "name" and "age"; the source attribute parameters vary with the service that the source data belongs to and are not enumerated here. The source server also calculates the total data amount of the source data to be synchronized.
SA20: a time window for transmitting the source data is determined.
A time window can be understood as a time period with a start time and an end time. In one implementation, the source server receives from the target server information indicating a transmission duration and determines the time window for transmitting the source data from that information. For example, if the information indicates a transmission duration of 1 hour, the source server determines the time window as running from time T1 to time Tn. In another implementation, the source server receives time information from the client that includes a start time and an end time, which together indicate the transmission duration.
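The two ways of fixing the window can be sketched in Python as follows. This is a minimal illustration, assuming the window is represented as a (start, end) pair of `datetime` values; the function names are hypothetical and do not come from the patent.

```python
from datetime import datetime, timedelta


def window_from_duration(t1: datetime, duration: timedelta) -> tuple[datetime, datetime]:
    """Target-server variant (hypothetical helper): only a transmission
    duration is supplied, so the source server anchors it at a start time T1."""
    if duration <= timedelta(0):
        raise ValueError("transmission duration must be positive")
    return t1, t1 + duration


def window_from_client(start: datetime, end: datetime) -> tuple[datetime, datetime]:
    """Client variant (hypothetical helper): explicit start and end times;
    the transmission duration is end - start."""
    if end <= start:
        raise ValueError("end time must be later than start time")
    return start, end


if __name__ == "__main__":
    # A 1-hour duration starting at an arbitrary example time.
    t1, tn = window_from_duration(datetime(2024, 1, 1, 1, 0), timedelta(hours=1))
    print(t1, tn)  # 2024-01-01 01:00:00 2024-01-01 02:00:00
```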
SA30: splitting the table structure and the source data according to the time window, the table structure of the source data and the total data volume to obtain N source sub-tables and unit data mapped to each source sub-table, wherein N is a positive integer greater than or equal to 2.
The table structure of the source data includes columns and rows; for example, it may be as shown in Table 1 below:
ID  Region     Cost  Revenue
1   Shanghai   60    100
2   Beijing    140   200
3   Shenzhen   150   300
TABLE 1
Within the time window, the source server splits Table 1 according to its structure and total data amount. For example, splitting by the source attribute parameters, that is, the columns "ID", "region", "cost", and "revenue", yields 4 source sub-tables and their unit data, where the unit data includes each attribute parameter and its corresponding data.
It should be noted that, in practical applications, a source data table may reach the billion level and have 10,000 source attribute parameters; Table 1 is merely an example for convenience of description and does not limit the present application.
SA40: transmitting a data synchronization request to the target server, where the data synchronization request carries the number of source sub-tables and the data amount of the unit data stored in them, and instructs the target server to build a target table for storing the unit data.
The source server sends the data synchronization request to the target server. For example, the request indicates that there are 4 source sub-tables and that the data amount of each source sub-table is less than 200. The request instructs the target server to establish a target storage area for storing the unit data, and the unit data is mapped into a target table in that storage area.
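One possible shape for this request is sketched below. The text only states that the request carries the number of source sub-tables and the data amount of each sub-table's unit data; the class name, field names, and JSON encoding are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DataSyncRequest:
    """Hypothetical wire format for the data synchronization request."""
    sub_table_count: int
    unit_data_amounts: dict  # source sub-table name -> data amount of its unit data

    def __post_init__(self) -> None:
        # The declared count must agree with the sub-tables actually listed.
        if self.sub_table_count != len(self.unit_data_amounts):
            raise ValueError("sub_table_count must equal the number of listed sub-tables")

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "DataSyncRequest":
        return cls(**json.loads(text))


if __name__ == "__main__":
    # Illustrative values: 4 source sub-tables, each with a data amount below 200.
    req = DataSyncRequest(4, {"src_1": 190, "src_2": 120, "src_3": 120, "src_4": 90})
    print(req.to_json())
```

The target server can rebuild the same object with `DataSyncRequest.from_json(...)` and read off N and the per-sub-table amounts before creating the target table.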
SA50: transmitting the unit data of the N source sub-tables to the target server in parallel within the time window, so that the target server receives the N pieces of unit data in parallel in the same time period and maps each of them into a corresponding partition table in the target table, where the source sub-tables and the partition tables have different structures.
In one possible implementation, the synchronization service of the source database periodically queries the database's synchronization record table, finds the source sub-tables to be sent, converts them into a predefined message format, and places them in the send queue. The source server then sends the unit data of the source sub-tables to the target server in parallel.
For example, the source server side includes a management process (Manager), an extraction process (Extract), and a transmission process (Pump) corresponding to the target server side. The Manager process controls the other processes, reports errors, and so on; the Extract process, configured with the names of the source partition tables to be synchronized, sends the unit data to the target server in data blocks over TCP/IP, using the IP address and port of the target end.
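The parallel send can be sketched with Python's standard `concurrent.futures` thread pool. This sketch does not model the Manager, Extract, or Pump processes; `send_unit_data` is a hypothetical stand-in for the real transport, and an in-memory dict plays the role of the target server.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor, wait

_lock = threading.Lock()


def send_unit_data(name: str, rows: list, target: dict) -> int:
    """Hypothetical transport stand-in: copies one sub-table's unit data into
    an in-memory dict. A real system would use TCP/IP, a queue, or a replication tool."""
    with _lock:
        target[name] = list(rows)
    return len(rows)


def send_in_parallel(sub_tables: dict, target: dict, window_seconds: float) -> dict:
    """Submit all N sub-tables at once and return the rows sent per sub-table.
    Raises TimeoutError if any transfer is unfinished when the window closes."""
    if not sub_tables:
        return {}
    deadline = time.monotonic() + window_seconds
    pool = ThreadPoolExecutor(max_workers=len(sub_tables))
    futures = {pool.submit(send_unit_data, name, rows, target): name
               for name, rows in sub_tables.items()}
    done, not_done = wait(futures, timeout=max(0.0, deadline - time.monotonic()))
    # Do not block on stragglers; cancel anything that has not started yet.
    pool.shutdown(wait=False, cancel_futures=True)
    if not_done:
        raise TimeoutError(f"{len(not_done)} sub-table(s) not sent within the window")
    # Iterate the dict (not the `done` set) so the result order is deterministic.
    return {name: fut.result() for fut, name in futures.items()}


if __name__ == "__main__":
    target: dict = {}
    sub_tables = {"src_1": [(1, "Shanghai"), (2, "Beijing")], "src_2": [(1, 60), (2, 140)]}
    print(send_in_parallel(sub_tables, target, window_seconds=5.0))  # {'src_1': 2, 'src_2': 2}
```

Threads suit I/O-bound transfers; one worker per sub-table mirrors the "N sub-tables in parallel" idea, though a real deployment would likely cap the pool size.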
In this embodiment, the source server determines the table structure and data amount of the source data to be synchronized and the time window for transmitting it, splits the source data into N source sub-tables and their unit data according to the time window, the table structure, and the data amount, sends a data synchronization request to the target server, and sends the split data in parallel within the time window. After receiving the data in parallel, the target server maps each piece of unit data and its source attribute parameters to the partition table of the corresponding target table. This greatly shortens the time needed to synchronize data between the source server and the target server, so that the synchronization task is completed within the designated time window.
In one embodiment, as shown in FIG. 3, step SA30 (splitting the table structure and the source data according to the time window, the table structure, and the data amount to obtain N source sub-tables and the unit data mapped to each) specifically includes the following steps:
SA310: determining, according to the current transmission rate, the target data amount to be synchronously transmitted within the time window.
To transmit the source data within the time window, the source server calculates the target data amount from the current transmission rate and the transmission duration indicated by the time window; the target data amount is the maximum amount of data that can be transmitted within that duration. For example, if the current transmission rate is 10 and the transmission duration is 20, the target data amount is 10 × 20 = 200. In this embodiment, units for the transmission rate, transmission duration, and target data amount are omitted for convenience of description.
SA320: determining the target number of source sub-tables according to the total data amount and the target data amount.
The theoretical number of source sub-tables is given by the ratio of the total data amount to the target data amount. For example, if the total data amount is 600 and the data amount that can be transmitted within the transmission duration is 200, then to finish transmitting the source data within the specified time window the theoretical number of source sub-tables is 600 / 200 = 3. The target number may be greater than or equal to this value, for example 3, 4, or 5: the more source sub-tables there are, the shorter the time needed to synchronize the source data to the target server. In this embodiment, 3 source sub-tables are used as the example.
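Step SA310 reduces to rate × duration. The sketch below derives the duration from the window's start and end timestamps; it assumes the rate is expressed per second, and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def target_data_amount(rate_per_second: float, start: datetime, end: datetime) -> float:
    """Maximum amount transferable in the window: rate x duration.
    Whatever unit the rate uses (rows/s, MB/s, ...) carries over to the result."""
    duration = (end - start).total_seconds()
    if rate_per_second <= 0 or duration <= 0:
        raise ValueError("rate and window duration must be positive")
    return rate_per_second * duration


if __name__ == "__main__":
    # The worked example from the text: rate 10, duration 20.
    t1 = datetime(2024, 1, 1, 1, 0, 0)
    print(target_data_amount(10, t1, t1 + timedelta(seconds=20)))  # 200.0
```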
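Step SA320 is a ceiling division, with the lower bound N ≥ 2 taken from the method's definition. In this minimal sketch, the optional `extra` parameter (a hypothetical name) lets the caller pick a larger N, as the text allows.

```python
def target_sub_table_count(total_amount: int, target_amount: int, extra: int = 0) -> int:
    """N = ceil(total / target), never below 2, plus any extra sub-tables
    the caller chooses to add to shorten synchronization further."""
    if total_amount <= 0 or target_amount <= 0 or extra < 0:
        raise ValueError("amounts must be positive and extra non-negative")
    theoretical = -(-total_amount // target_amount)  # integer ceiling division
    return max(2, theoretical) + extra


if __name__ == "__main__":
    print(target_sub_table_count(600, 200))           # 3
    print(target_sub_table_count(600, 200, extra=2))  # 5
```

Integer ceiling division avoids floating-point rounding when the amounts are large row counts.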
SA330: determining the first data amount corresponding to each of the P source attribute parameters, where P is a positive integer greater than or equal to 2.
In the table structure of the source data, the columns may include P source attribute parameters, each corresponding to a number of specific parameter values. Taking Table 1 as an example, the source attribute parameters are ID, region, cost, revenue, and so on, and the first data corresponding to the source attribute parameter "region" is "Shanghai", "Beijing", and "Shenzhen". Note that in Table 1 the first data amounts of the source attribute parameters happen to be the same; for example, "cost" has three values (60, 140, and 150) and "revenue" has three values (100, 200, and 300). In practical applications, however, the data amounts of different source attribute parameters may differ, as shown in Table 2 below:
ID  Region     Cost  Revenue
1   Shanghai   60    100
2   Beijing    140   200
3   Shenzhen   150   300
4   Guangzhou
5   Zhongshan
TABLE 2
As shown in Table 2, the first data amounts corresponding to the source attribute parameters differ (in this embodiment, the first data amount is the data amount of the first data). For example, in an actual application scenario, Guangzhou and Zhongshan have not reported "cost" or "revenue" data, so the first data amount of "region" differs from that of "cost". The source server determines the first data amount corresponding to each source attribute parameter.
SA340: splitting the P source attribute parameters according to the first data amount of each of them, to obtain N source sub-tables and the unit data mapped to each source sub-table, where the unit data of each source sub-table comprises at least one source attribute parameter and its corresponding first data, such that the data amount of each source sub-table is less than or equal to the target data amount.
For example, suppose step SA320 determined that the target number of source sub-tables is 3 and the target data amount is 200. The source server splits the P source attribute parameters according to their first data amounts: say the data amount of "ID" is 40, that of "region" is 150, that of "cost" is 80, and that of "revenue" is 80, and no split source sub-table may exceed 200. One possible split puts "ID" and "region" in the first source sub-table (40 + 150 = 190), "ID" and "cost" in the second (40 + 80 = 120), and "ID" and "revenue" in the third (40 + 80 = 120). "ID" is the source attribute parameter common to all three source sub-tables and can also be understood as their index.
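Counting the first data amount per source attribute parameter can be sketched as below. This assumes the data amount is measured as the number of non-empty values in a column; the patent does not fix the unit, and byte sizes would work the same way. Table 2 is encoded with `None` for the missing cells.

```python
TABLE_2 = [
    {"ID": 1, "region": "Shanghai", "cost": 60, "revenue": 100},
    {"ID": 2, "region": "Beijing", "cost": 140, "revenue": 200},
    {"ID": 3, "region": "Shenzhen", "cost": 150, "revenue": 300},
    {"ID": 4, "region": "Guangzhou", "cost": None, "revenue": None},
    {"ID": 5, "region": "Zhongshan", "cost": None, "revenue": None},
]


def first_data_amounts(rows: list) -> dict:
    """Return {source attribute parameter: number of non-empty values}."""
    counts: dict = {}
    for row in rows:
        for attr, value in row.items():
            counts.setdefault(attr, 0)  # keep columns even if every value is empty
            if value is not None:
                counts[attr] += 1
    return counts


if __name__ == "__main__":
    print(first_data_amounts(TABLE_2))  # {'ID': 5, 'region': 5, 'cost': 3, 'revenue': 3}
```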
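The text gives one valid split but no general rule. The sketch below shows one heuristic that reproduces it, chosen here as an assumption: a greedy balancing pass that takes attributes largest first and places each into the currently lightest sub-table, with every sub-table also carrying the shared index column.

```python
def split_attributes(amounts: dict, index: str, n: int, capacity: int) -> list:
    """Greedy balancing split (an assumed heuristic, not the patent's rule).
    Returns [(attributes, data amount)] for each non-empty sub-table."""
    if index not in amounts:
        raise KeyError(f"index column {index!r} missing")
    if n < 1:
        raise ValueError("n must be at least 1")
    tables = [{"attrs": [index], "amount": amounts[index]} for _ in range(n)]
    # Largest first; sorted() stays stable for ties even with reverse=True.
    others = sorted((a for a in amounts if a != index), key=amounts.get, reverse=True)
    for attr in others:
        lightest = min(tables, key=lambda t: t["amount"])
        # If the lightest sub-table cannot take it, no sub-table can.
        if lightest["amount"] + amounts[attr] > capacity:
            raise ValueError(f"{attr!r} does not fit; increase n or split by rows")
        lightest["attrs"].append(attr)
        lightest["amount"] += amounts[attr]
    # Drop sub-tables that received nothing beyond the index column.
    return [(t["attrs"], t["amount"]) for t in tables if len(t["attrs"]) > 1]


if __name__ == "__main__":
    amounts = {"ID": 40, "region": 150, "cost": 80, "revenue": 80}
    print(split_attributes(amounts, "ID", n=3, capacity=200))
    # [(['ID', 'region'], 190), (['ID', 'cost'], 120), (['ID', 'revenue'], 120)]
```

Balancing keeps the largest sub-table as small as possible, so the slowest parallel transfer finishes as early as possible.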
In another implementation, step SA340 splits the P source attribute parameters according to both the first data amount of each source attribute parameter and the association relationships among the P source attribute parameters, placing associated source attribute parameters in the same source sub-table to obtain N source sub-tables and the unit data mapped to each. For example, "cost" and "revenue" are associated ("revenue - cost = profit") and both have a data amount of 80, so "cost" and "revenue" can be placed in the same source sub-table.
The source server can thus automatically split the P source attribute parameters according to the first data amount of each to obtain N source sub-tables, so that different source data can be dynamically split into different numbers of source sub-tables.
In this embodiment, the source server determines the target data amount for the time window from the transmission rate, determines the target number of source sub-tables from the total data amount and the target data amount, and splits the source data by source attribute parameter. This refines the granularity of the data to be synchronized, shortens the time needed to synchronize data between the source server and the target server, and improves synchronization efficiency.
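Association-aware splitting can be sketched with a union-find over declared attribute pairs, so that transitively linked attributes land in the same group. This is a simplified illustration under stated assumptions: the association pairs are supplied by the caller, and each group, including a single unassociated attribute, gets its own sub-table with the index column.

```python
def group_associated(attrs: list, associations: list) -> list:
    """Union-find: attributes linked directly or transitively share a group."""
    parent = {a: a for a in attrs}

    def find(a: str) -> str:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for a, b in associations:
        parent[find(a)] = find(b)
    groups: dict = {}
    for a in attrs:
        groups.setdefault(find(a), []).append(a)
    return list(groups.values())


def split_by_association(amounts: dict, index: str, associations: list, capacity: int) -> list:
    """One sub-table per association group, each carrying the index column."""
    others = [a for a in amounts if a != index]
    tables = []
    for group in group_associated(others, associations):
        size = amounts[index] + sum(amounts[a] for a in group)
        if size > capacity:
            raise ValueError(f"group {group} ({size}) exceeds capacity {capacity}")
        tables.append(([index, *group], size))
    return tables


if __name__ == "__main__":
    amounts = {"ID": 40, "region": 150, "cost": 80, "revenue": 80}
    # "cost" and "revenue" are associated (revenue - cost = profit).
    print(split_by_association(amounts, "ID", [("cost", "revenue")], capacity=200))
    # [(['ID', 'region'], 190), (['ID', 'cost', 'revenue'], 200)]
```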
Further, after obtaining the N source sub-tables and the unit data mapped to each, the source server matches the unit data of each source sub-table against the source data to verify whether the split unit data is consistent with the source data. If it matches, the source server outputs information indicating that the split succeeded; if it does not match, it outputs error information prompting staff to find the cause.
The client can receive the success information or error information output by the source server. On receiving the success information, the client treats the split unit data as valid; on receiving the error information, the client can issue an error-lookup instruction so that the source server searches for the cause of the error.
In this embodiment, matching the unit data of the source sub-tables against the source data verifies whether the split succeeded, which ensures the accuracy and validity of the data.
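One way to perform this match is to join the sub-tables back together on their shared index column and compare the result with the source rows. The sketch below assumes rows are dicts and that "ID" is the index, as in the split example above; the function names and message strings are hypothetical.

```python
def reassemble(sub_tables: list, index: str) -> dict:
    """Join the unit data of all sub-tables back into rows keyed by the index."""
    rows: dict = {}
    for table in sub_tables:
        for record in table:
            rows.setdefault(record[index], {}).update(record)
    return rows


def verify_split(source_rows: list, sub_tables: list, index: str = "ID") -> str:
    """Return a success or error message, mirroring the two outcomes in the text."""
    expected = {row[index]: row for row in source_rows}
    if reassemble(sub_tables, index) == expected:
        return "split succeeded"
    return "split error: unit data does not match source data"


if __name__ == "__main__":
    table_1 = [
        {"ID": 1, "region": "Shanghai", "cost": 60, "revenue": 100},
        {"ID": 2, "region": "Beijing", "cost": 140, "revenue": 200},
        {"ID": 3, "region": "Shenzhen", "cost": 150, "revenue": 300},
    ]
    # Three sub-tables, each holding "ID" plus one other column.
    subs = [[{"ID": r["ID"], col: r[col]} for r in table_1] for col in ("region", "cost", "revenue")]
    print(verify_split(table_1, subs))  # split succeeded
```

The comparison catches dropped rows, dropped columns, and altered values, since any of them changes the reassembled dictionary.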
In one embodiment, before the source server sends the N pieces of unit data to the target server in parallel in step SA50, the method includes: creating N independent data synchronization tasks for the N pieces of split unit data, each task corresponding to the unit data of one source sub-table.
Specifically, Sqoop extraction tasks are created, one independent data synchronization task for each split source sub-table, so that when the independent tasks run concurrently, their performance impact on one another is reduced.
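A minimal sketch of building one independent Sqoop task per source sub-table is shown below. It uses the standard `sqoop import` arguments `--connect`, `--username`, `--password-file`, `--table`, `--columns`, `--target-dir`, and `--num-mappers`; the JDBC URL, user, password-file path, staging directory, and table and column names are all hypothetical placeholders. Each task is launched as its own OS process, and the builder itself runs nothing.

```python
import shlex
import subprocess

# Hypothetical connection details; replace with the real source database.
JDBC_URL = "jdbc:oracle:thin:@//oracle-host:1521/ORCL"


def build_sqoop_tasks(source_table: str, sub_tables: dict, jdbc_url: str = JDBC_URL,
                      user: str = "sync_user", password_file: str = "/user/sync/.pw",
                      base_dir: str = "/staging/sync") -> list:
    """One independent `sqoop import` command per source sub-table (column subset)."""
    tasks = []
    for name, columns in sub_tables.items():
        tasks.append([
            "sqoop", "import",
            "--connect", jdbc_url,
            "--username", user,
            "--password-file", password_file,
            "--table", source_table,
            "--columns", ",".join(columns),
            "--target-dir", f"{base_dir}/{name}",
            "--num-mappers", "1",
        ])
    return tasks


def launch_all(tasks: list) -> list:
    """Start every task as a separate process, then wait for all exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in tasks]
    return [p.wait() for p in procs]


if __name__ == "__main__":
    sub_tables = {"src_1": ["ID", "REGION"], "src_2": ["ID", "COST"], "src_3": ["ID", "REVENUE"]}
    for cmd in build_sqoop_tasks("SALES", sub_tables):
        print(shlex.join(cmd))
```

Separate processes keep one slow or failing extraction from stalling the others, which is the isolation the text is after.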
In an embodiment, as shown in fig. 4, the present invention further provides a data receiving method executed by the target server, which acts as the receiver of the data to be synchronized. The method specifically includes the following steps:
SB10: Receive a data synchronization request sent by the source server, where the request carries the number of source sub-tables and the data amount of the unit data stored in the source sub-tables.
The source sub-tables are obtained by the source server splitting the table structure of the source data according to the time window, the table structure of the source data and the total data amount of the source data. These are as described in the embodiments above and are not repeated here.
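The patent does not define a message format for the SB10 request. The sketch below assumes, hypothetically, a JSON payload with two fields (the class and field names are illustrative) and checks the two invariants stated in the text: N ≥ 2 and one data amount per source sub-table.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class SyncRequest:
    """Hypothetical wire format for the SB10 data synchronization request."""

    sub_table_count: int        # number of source sub-tables (N)
    unit_data_bytes: List[int]  # data amount of the unit data in each sub-table

    def __post_init__(self) -> None:
        if self.sub_table_count < 2:
            raise ValueError("N must be a positive integer >= 2")
        if len(self.unit_data_bytes) != self.sub_table_count:
            raise ValueError("one data amount is required per source sub-table")

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "SyncRequest":
        return cls(**json.loads(text))
```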
SB20: Establish a target table for storing the unit data according to the data amount of each unit data, the number of source sub-tables and the storage space condition of the source sub-tables, where the target table includes a plurality of partition tables.
Taking Hive as an example of the target server: each partition of the target table corresponds to a directory under the table's directory, and all data of that partition is stored in that directory. Partition tables can thus be thought of as folders created on the file system, with data of each category placed in its own folder. When a partition table is created, a keyword (the PARTITIONED BY clause in HiveQL) declares the table as partitioned by a given field, and all records with the same value of that field are stored in one partition. Partitioning can use several columns, so the data of one partition can be further partitioned by other columns.
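The directory layout described above can be illustrated with a short sketch. Hive stores a partitioned table's data under `<table directory>/<column>=<value>/...`, nesting one level per partition column. The table path and column names below are hypothetical, and real Hive also escapes special characters in partition values, which this sketch omits.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def partition_path(table_dir: str, partition_cols: Sequence[str], record: dict) -> str:
    """Directory for a record, following Hive's <table>/<col>=<value>/... layout."""
    parts = [f"{col}={record[col]}" for col in partition_cols]
    return "/".join([table_dir.rstrip("/")] + parts)


def group_by_partition(
    table_dir: str, partition_cols: Sequence[str], records: List[dict]
) -> Dict[str, List[dict]]:
    """Place all records with the same partition-column values in one directory."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for record in records:
        groups[partition_path(table_dir, partition_cols, record)].append(record)
    return dict(groups)
```

For example, a record with `region="east"` and `quarter="q1"` in a table at `/warehouse/sync_target` lands in `/warehouse/sync_target/region=east/quarter=q1`.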
The target server determines a target storage area for the unit data according to its own storage space condition, the data amount of each unit data and the number of source sub-tables, and maps the unit data into the target table in that target storage area.
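A minimal sketch of the storage check, assuming a simple rule that the target storage area must have room for the announced data amounts plus a safety margin. The `headroom` margin and the `part_i` partition names are hypothetical; the patent only says the decision depends on storage space, per-unit data amount and sub-table count.

```python
from typing import List


def plan_target_table(
    unit_data_bytes: List[int], sub_table_count: int, free_bytes: int, headroom: float = 0.2
) -> List[str]:
    """Check that the storage area can hold all unit data; name one partition per sub-table.

    `headroom` is a hypothetical safety margin on top of the announced data amounts.
    """
    if len(unit_data_bytes) != sub_table_count:
        raise ValueError("request must list one data amount per source sub-table")
    required = sum(unit_data_bytes) * (1 + headroom)
    if required > free_bytes:
        raise RuntimeError(f"need {required:.0f} bytes, only {free_bytes} free")
    return [f"part_{i}" for i in range(sub_table_count)]
```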
SB30: Within the time window, receive N unit data and the source attribute parameters corresponding to each unit data in parallel, where N is a positive integer greater than or equal to 2.
The target server receives the N unit data in parallel; each unit data is the data of one source sub-table, and the received source attribute parameters serve as the attribute parameters of the corresponding partition table. For example, the source attribute parameters of one of the partition tables include "cost".
SB40: Map each of the N unit data, together with its source attribute parameters, into the partition table of the corresponding target table, where the partition table has a different structure from the source sub-table, and the target table includes the partition table and an additional partition table corresponding to it; perform data processing on the first data corresponding to the source attribute parameters to obtain processed second data, and map the second data to the additional partition table.
Specifically, taking Hive (a data warehouse built on distributed storage) as an example, the target table includes partition tables, and each unit data is mapped to a partition table in the target storage area according to its data amount and its corresponding source attribute parameters. The target table is equivalent to a directory, and each partition table is equivalent to a subdirectory under it.
Each unit data is stored in its corresponding partition table, which includes the source attribute parameters.
For example, a partition table established in the target server is shown in table 3 below:
ID    Cost    Revenue
1     60      100
2     140     200
3     150     300
Table 3
Table 3 is merely an illustrative example and does not limit the present application.
The table structure of the source data in the source server differs from the table structure of the target table; the difference takes one of the following two forms:
In the first case, the target table is partitioned further by adding an additional partition table.
Specifically, the target server first receives a service processing instruction from the client.
Then, according to the service processing instruction, the target server performs data processing (decomposition, merging or association) on the first data corresponding to the source attribute parameters to obtain processed second data. For example, in Table 3 the first data for the source attribute parameter "cost" are 60, 140 and 150. If an application scenario requires the total cost of all areas, the first data of "cost" are merged, and the second data is the merged result, 350. In another scenario, if the first data in each ID row (e.g. 60) covers one quarter, it can be decomposed into a monthly average, here 20 per month. In yet another scenario, if there are at least two source attribute parameters and two of them (a first source attribute parameter and a second source attribute parameter) are associated, they can be processed jointly; association processing includes, but is not limited to, taking the difference, the sum or the ratio. Taking Table 3 as an example, "cost" is the first source attribute parameter, "revenue" is the second, and their association is revenue - cost = profit, giving profits of 40, 60 and 150.
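The three kinds of processing in the example can be checked against Table 3 with a few lines of Python. The helper names are illustrative only, and "decompose" assumes, as the text does, that each value is a quarterly figure split evenly over three months.

```python
from typing import Callable, List

# Table 3 as columns (illustrative values from the text).
cost = [60, 140, 150]
revenue = [100, 200, 300]


def merge(values: List[float]) -> float:
    """Merging: e.g. total cost across all areas."""
    return sum(values)


def decompose(value: float, parts: int = 3) -> float:
    """Decomposition: e.g. a quarterly figure spread evenly over its 3 months."""
    return value / parts


def associate(
    first: List[float], second: List[float], op: Callable[[float, float], float]
) -> List[float]:
    """Association: combine two related attribute parameters row by row."""
    return [op(a, b) for a, b in zip(first, second)]


total_cost = merge(cost)                               # 350
monthly = decompose(cost[0])                           # 20.0
profit = associate(revenue, cost, lambda r, c: r - c)  # [40, 60, 150]
```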
Finally, the target server establishes an additional partition table in the target table, whose attribute (the second attribute) corresponds to the second data, and maps the second data to it. The additional partition table can be understood as a column or row added to the partition table, changing its structure relative to the source partition table: for example, a column with the attribute parameter "profit", or a column with the attribute parameter "total cost", is added to Table 3.
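A minimal in-memory sketch of an additional partition table follows. It assumes, hypothetically, that the additional partition is stored separately, keyed by the same IDs as its parent partition, and linked to that parent, so the parent's original columns stay untouched while a combined row view still shows the new attribute. The class and method names are illustrative, not from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TargetTable:
    """Hypothetical in-memory target table: partition name -> columns (name -> values)."""

    partitions: Dict[str, Dict[str, list]] = field(default_factory=dict)
    parent_of: Dict[str, str] = field(default_factory=dict)  # additional -> parent

    def add_additional_partition(self, parent: str, attribute: str, second_data: list) -> str:
        """Store second data as an additional partition corresponding to `parent`."""
        ids = self.partitions[parent]["id"]
        if len(second_data) != len(ids):
            raise ValueError("second data must align row-for-row with the parent partition")
        name = f"{parent}__{attribute}"
        self.partitions[name] = {"id": list(ids), attribute: list(second_data)}
        self.parent_of[name] = parent
        return name

    def row_view(self, parent: str, row: int) -> Dict[str, object]:
        """One row of `parent` with the columns of its additional partitions attached."""
        view = {col: values[row] for col, values in self.partitions[parent].items()}
        for name, p in self.parent_of.items():
            if p == parent:
                view.update({c: v[row] for c, v in self.partitions[name].items()})
        return view
```

Using the Table 3 data, adding a "profit" partition with values 40, 60 and 150 makes row 2 read as ID 2, cost 140, revenue 200, profit 60.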
Second case: the target server determines the structure of the partition table from the unit data it actually receives. For example, suppose the table structure of the source data has 6 rows and, after splitting, one source sub-table holds only 4 rows (counting the header row). When the target server receives the table data of that source sub-table, the partition table it determines from the data actually received is laid out as in Table 3 above, with only those 4 rows, and the structure of the partition table differs from that of the source sub-table.
In this embodiment, the target server receives a data synchronization request from the source server carrying the number of source sub-tables and the data amount of the unit data stored in them; establishes a target table for the unit data according to the data amount of each unit data, the number of source sub-tables and its own storage space condition; receives the N unit data and their source attribute parameters in parallel within the time window; and maps each unit data and its source attribute parameters to a partition table of the corresponding target table. Because the N unit data are received in parallel in the same period and each is stored in a target table in the corresponding target storage area, the time needed to synchronize data between the source server and the target server is greatly shortened, and the synchronization task can be completed within the designated time window. Moreover, because the target table established in the target storage area contains multiple partition tables, the table data in those partitions is gathered in a single target table; once synchronization is complete, the source data can be used by subsequent tasks without a merge step, which improves the timeliness of data processing.
Further, the target server verifies the unit data of each source sub-table against the table data in the corresponding partition table; verification covers both the data amount and row/column-level content. This ensures that the data synchronized into each partition is complete and valid. As soon as the data synchronized into one partition passes verification, that partition's data in the current target storage area takes effect immediately, without waiting for the table data of all source sub-tables to finish synchronizing.
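The per-partition verification and immediate activation described above might look like the sketch below. It assumes, hypothetically, that a row count stands in for the data-amount check and that a SHA-256 digest over every column value of a row stands in for row/column-level verification (compared order-insensitively). The join-based encoding is adequate for a sketch but could collide if values contain the separator character.

```python
import hashlib
from typing import List, Sequence, Set

Row = Sequence[object]


def row_digest(row: Row) -> str:
    """Row/column-level fingerprint: every column value of the row contributes."""
    return hashlib.sha256("\x1f".join(map(str, row)).encode("utf-8")).hexdigest()


def verify_partition(source_rows: List[Row], partition_rows: List[Row]) -> bool:
    """Data-amount check (row count) followed by an order-insensitive row-level check."""
    if len(source_rows) != len(partition_rows):
        return False
    return sorted(map(row_digest, source_rows)) == sorted(map(row_digest, partition_rows))


class EffectivePartitions:
    """Tracks which partitions have passed verification and may be read immediately."""

    def __init__(self) -> None:
        self.effective: Set[str] = set()

    def on_partition_synced(
        self, name: str, source_rows: List[Row], partition_rows: List[Row]
    ) -> bool:
        if verify_partition(source_rows, partition_rows):
            self.effective.add(name)  # usable now, without waiting for other sub-tables
            return True
        return False
```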
In this embodiment, the table data in each partition table takes effect as soon as its synchronization is complete, so subsequent tasks can proceed without waiting for all table data of the source sub-tables to be synchronized, which improves the timeliness of downstream data processing.
It should be understood that the step numbers in the foregoing embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and the numbering should not limit the implementation of the embodiments of the present invention.
In an embodiment, the present invention further provides a data transmission device corresponding to the data transmission method in the above embodiment. As shown in fig. 5, the data transmission device includes a receiving module, a calculation module, a splitting module, a request sending module and a data sending module. The functional modules are described in detail as follows:
The receiving module is used for receiving a data synchronization instruction sent by the client and a time window sent by the target server, where the time window is the time for transmitting the source data to be synchronized.
The calculation module is used for calculating the table structure and total data amount of the source data.
The splitting module is used for splitting the table structure and the source data according to the time window, the table structure of the source data and the total data amount to obtain N source sub-tables and unit data mapped to each source sub-table, where N is a positive integer greater than or equal to 2.
The request sending module is used for sending a data synchronization request to the target server, where the data synchronization request carries the number of source sub-tables and the data amount of the unit data stored in the source sub-tables; the data synchronization request is used to instruct the target server to establish a target table for storing the unit data.
The data sending module is used for sending the unit data of the N source sub-tables to the target server in parallel within the time window, so that the target server receives the N unit data in parallel in the same time period and maps each of the N unit data to a corresponding partition table in the target table, where the source sub-tables and the partition tables differ in structure.
For specific limitations of the data transmission device, refer to the limitations of the data transmission method above; they are not repeated here. Each module of the data transmission device may be implemented wholly or partly in software, hardware or a combination of the two. The modules may be embedded in, or independent of, a processor of the server in hardware form, or stored in software form in a memory of the server, so that the processor can invoke them and perform the operations corresponding to each module.
In an embodiment, the present invention further provides a data receiving device corresponding to the data receiving method in the above embodiment. As shown in fig. 6, the data receiving device includes a receiving module, a table building module, a data receiving module and a processing module. The functional modules are described in detail as follows:
The receiving module is used for receiving a data synchronization request and a time window request sent by the source server, where the data synchronization request carries the number of source sub-tables and the data amount of unit data stored in the source sub-tables, and the time window is the time for transmitting the source data to be synchronized.
The table building module is used for building a target table for storing the unit data according to the data amount of each unit data, the number of source sub-tables and the storage space condition of the source sub-tables, where the target table includes a plurality of partition tables.
The data receiving module is used for receiving N unit data and the source attribute parameters corresponding to each unit data in parallel within the time window, where N is a positive integer greater than or equal to 2.
The processing module is used for mapping each of the N unit data and its source attribute parameters into a partition table of the corresponding target table, where the structure of the partition table differs from that of the source sub-table, and the target table includes the partition table and an additional partition table corresponding to it; the module is also used for performing data processing on the first data corresponding to the source attribute parameters to obtain processed second data and mapping the second data to the additional partition table.
For specific limitations of the data receiving device, refer to the limitations of the data receiving method above; they are not repeated here. Each module of the data receiving device may be implemented wholly or partly in software, hardware or a combination of the two. The modules may be embedded in, or independent of, a processor of the server in hardware form, or stored in software form in a memory of the server, so that the processor can invoke them and perform the operations corresponding to each module.
In one embodiment, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor implements the steps of the data transmission method and the steps of the data receiving method in the foregoing embodiments when executing the computer program, or implements the functions of each module of the data transmission apparatus and the functions of each module of the data receiving apparatus in the foregoing embodiments when executing the computer program, and are not repeated herein.
In one embodiment, a computer readable storage medium is provided, on which a computer program is stored, where the computer program when executed by a processor implements the steps of the data transmission method and the steps of the data receiving method in the above embodiments, or where the computer program when executed by a processor implements the functions of the modules of the data transmission device and the functions of the modules of the data receiving device in the above embodiments, which are not repeated herein.
Those skilled in the art will appreciate that all or part of the methods described above may be implemented by a computer program stored on a non-transitory computer-readable storage medium; when executed, the program may include the steps of the method embodiments above. Any reference to memory, storage, database or other media in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention.

Claims (8)

1. A data transmission method, comprising:
Determining a table structure of source data to be synchronized and total data quantity;
determining a time window for transmitting the source data;
Splitting the table structure and the source data according to the time window, the table structure of the source data and the total data amount to obtain N source sub-tables and unit data mapped to each source sub-table, wherein N is a positive integer greater than or equal to 2;
Transmitting a data synchronization request to a target server, wherein the data synchronization request carries the number of the source sub-tables and the data quantity of the unit data stored in the source sub-tables; the data synchronization request is used for instructing the target server to establish a target table for storing the unit data;
In the time window, the unit data of the N source sub-tables are sent to the target server in parallel, so that the target server receives the N unit data in parallel in the same time period, each unit data in the N unit data is mapped to a corresponding partition table in the target table, and the source sub-tables and the partition tables are different in structure;
splitting the table structure and the source data according to the time window, the table structure of the source data and the total data amount to obtain N source sub-tables and unit data mapped to each source sub-table, including:
determining a target data quantity to be synchronously transmitted in the time window according to the current transmission rate;
determining a target number of the source sub-table according to the total data amount and the target data amount;
determining a first data volume corresponding to each source attribute parameter in P source attribute parameters, wherein P is a positive integer greater than or equal to 2;
splitting the P source attribute parameters according to the first data volume corresponding to each source attribute parameter in the P source attribute parameters to obtain N source sub-tables and unit data mapped to each source sub-table, wherein the unit data of each source sub-table comprises at least one source attribute parameter and the first data corresponding to the source attribute parameter so that the data volume of each source sub-table is smaller than or equal to the target data volume;
Splitting the P source attribute parameters according to the first data volume corresponding to each source attribute parameter in the P source attribute parameters includes:
Splitting the P source attribute parameters according to the first data amount corresponding to each source attribute parameter in the P source attribute parameters and the association relation between the P source attribute parameters, and splitting the source attribute parameters with the association relation into the same source sub-table to obtain the N source sub-tables and unit data mapped to each source sub-table.
2. The data transmission method as claimed in claim 1, further comprising, before transmitting the N unit data to the target server in parallel: establishing N mutually independent data synchronization tasks for the N split unit data, wherein each data synchronization task corresponds to the unit data of one source sub-table.
3. The data transmission method according to claim 1 or 2, wherein the method further comprises:
matching the unit data of each source sub-table with the source data, and verifying whether the split unit data match the source data;
if the unit data is matched with the source data, outputting information for indicating that the splitting is successful; and if the unit data is not matched with the source data, outputting error reporting information for prompting a worker to find a reason.
4. A data receiving method, comprising:
Receiving a data synchronization request sent by a source server, wherein the data synchronization request carries the number of source sub-tables and the data volume of unit data stored in the source sub-tables;
establishing a target table for storing the unit data according to the data quantity of each unit data, the quantity of the source sub-tables and the storage space condition of the source sub-tables, wherein the target table comprises a plurality of partition tables;
Receiving N unit data and source attribute parameters corresponding to each unit data in parallel in a time window, wherein N is a positive integer greater than or equal to 2;
Mapping each unit data and the source attribute parameter in the N unit data into the partition table of the corresponding target table, wherein the partition table has a different structure from the source sub-table, and the target table comprises the partition table and an additional partition table, and the additional partition table corresponds to the partition table; performing data processing on the first data corresponding to the source attribute parameters to obtain processed second data, and mapping the second data to the additional partition table;
Determining a target data quantity to be synchronously transmitted in the time window according to the current transmission rate; determining the target number of the source sub-table according to the total data amount and the target data amount; determining a first data volume corresponding to each source attribute parameter in P source attribute parameters, wherein P is a positive integer greater than or equal to 2; splitting the P source attribute parameters according to the first data volume corresponding to each source attribute parameter in the P source attribute parameters to obtain N source sub-tables and unit data mapped to each source sub-table, wherein the unit data of each source sub-table comprises at least one source attribute parameter and the first data corresponding to the source attribute parameter so that the data volume of each source sub-table is smaller than or equal to the target data volume;
Splitting the P source attribute parameters according to the first data volume corresponding to each source attribute parameter in the P source attribute parameters includes:
Splitting the P source attribute parameters according to the first data amount corresponding to each source attribute parameter in the P source attribute parameters and the association relation between the P source attribute parameters, and splitting the source attribute parameters with the association relation into the same source sub-table to obtain the N source sub-tables and unit data mapped to each source sub-table.
5. A data transmission apparatus for implementing the method of claim 1, the apparatus comprising:
The receiving module is used for receiving a data synchronization instruction sent by the client and a time window sent by the target server, wherein the time window is the time for transmitting the source data to be synchronized;
the calculation module is used for calculating the table structure and total data quantity of the source data;
the splitting module is used for splitting the table structure and the source data according to the time window, the table structure of the source data and the total data amount to obtain N source sub-tables and unit data mapped to each source sub-table, wherein N is a positive integer greater than or equal to 2;
the request sending module is used for sending a data synchronization request to a target server, wherein the data synchronization request carries the number of the source sub-tables and the data volume of the unit data stored in the source sub-tables; the data synchronization request is used for instructing the target server to establish a target table for storing the unit data;
And the data sending module is used for sending the unit data of the N source sub-tables to the target server in parallel in the time window, so that the target server receives the N unit data in parallel in the same time period, each unit data in the N unit data is mapped to a corresponding partition table in the target table, and the source sub-table and the partition table are different in structure.
6. A data receiving device for implementing the method of claim 4, the device comprising:
The receiving module is used for receiving a data synchronization request and a time window request sent by the source server, wherein the data synchronization request carries the number of source sub-tables and the data amount of unit data stored in the source sub-tables, and the time window is the time for transmitting the source data to be synchronized;
the table building module is used for building a target table for storing the unit data according to the data amount of each unit data, the number of the source sub-tables and the storage space condition of the source sub-tables, wherein the target table comprises a plurality of partition tables;
The data receiving module is used for receiving N unit data and source attribute parameters corresponding to each unit data in parallel in the time window, wherein N is a positive integer greater than or equal to 2;
The processing module is used for mapping each unit data and the source attribute parameter in the N unit data into the partition table of the corresponding target table, wherein the structure of the partition table is different from that of the source sub-table, the target table comprises the partition table and an additional partition table, and the additional partition table corresponds to the partition table; the processing module is also used for carrying out data processing on the first data corresponding to the source attribute parameters to obtain processed second data, and mapping the second data to the additional partition table.
7. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the data transmission method according to any one of claims 1 to 3 or the data receiving method according to claim 4.
8. A computer-readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the data transmission method according to any one of claims 1 to 3 or the steps of the data reception method according to claim 4.
CN202010092301.7A 2020-02-14 2020-02-14 Data transmitting and receiving method, and corresponding device, equipment and storage medium Active CN111427950B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010092301.7A CN111427950B (en) 2020-02-14 2020-02-14 Data transmitting and receiving method, and corresponding device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010092301.7A CN111427950B (en) 2020-02-14 2020-02-14 Data transmitting and receiving method, and corresponding device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111427950A CN111427950A (en) 2020-07-17
CN111427950B true CN111427950B (en) 2024-08-02

Family

ID=71547064

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010092301.7A Active CN111427950B (en) 2020-02-14 2020-02-14 Data transmitting and receiving method, and corresponding device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111427950B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112463325B (en) * 2020-11-25 2024-05-24 政采云有限公司 Cloud native parameter mapping method, device, equipment and readable storage medium
CN115834654B (en) * 2023-02-22 2023-05-05 广东广宇科技发展有限公司 Efficient data transmission method based on multiple mapping

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104239580A (en) * 2014-10-13 2014-12-24 武汉大学 General single-field split data extraction method and device based on value-column mapping

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7624120B2 (en) * 2004-02-11 2009-11-24 Microsoft Corporation System and method for switching a data partition
US10540366B2 (en) * 2017-03-09 2020-01-21 Bank Of America Corporation Transforming data structures and data objects for migrating data between databases having different schemas


Also Published As

Publication number Publication date
CN111427950A (en) 2020-07-17


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant