CN111427950A - Data transmitting and receiving method, corresponding device, equipment and storage medium - Google Patents

Data transmitting and receiving method, corresponding device, equipment and storage medium

Info

Publication number
CN111427950A
Authority
CN
China
Prior art keywords
data
source
tables
target
unit data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010092301.7A
Other languages
Chinese (zh)
Other versions
CN111427950B (en
Inventor
戴建明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd filed Critical Ping An Life Insurance Company of China Ltd
Priority to CN202010092301.7A priority Critical patent/CN111427950B/en
Publication of CN111427950A publication Critical patent/CN111427950A/en
Application granted granted Critical
Publication of CN111427950B publication Critical patent/CN111427950B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2282Tablespace storage structures; Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/278Data partitioning, e.g. horizontal or vertical partitioning
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1095Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data sending and receiving method, a corresponding device, equipment and a storage medium. In the data sending method, a source server determines a table structure and a data volume of source data to be synchronized and determines a time window for transmitting the source data. The source server splits the table structure according to the time window, the table structure and the data volume of the source data to obtain N source sub-tables and N unit data mapped to the source sub-tables, sends a data synchronization request to a target server, and sends the split data to the target server in parallel within the time window. After receiving the split data in parallel, the target server maps each unit data of the N unit data, together with the source attribute parameters, to a partition table of a corresponding target table. The invention greatly shortens the time needed to synchronize data between the source server and the target server, so that the data synchronization task can be completed within the specified time window.

Description

Data transmitting and receiving method, corresponding device, equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data transmitting method, a data receiving method, a data transmitting device, a data receiving device, a computer device, and a computer-readable storage medium.
Background
In the current big data era, big data technology is one of the core technologies of the industrial internet, and the amount of data stored in various application systems is growing explosively. Faced with such massive stored data, it is often necessary to migrate data between different databases, for example, migrating data from an ORACLE database to a HIVE database.
In the prior art, because the HIVE database and the ORACLE database store data differently, synchronizing data between them runs into a performance bottleneck. For example, data in the ORACLE database can be synchronized into the HIVE database by fetching it directly from ORACLE with tools such as Sqoop, but the HIVE database needs the migration of the data in the ORACLE database to complete within a specified time. With the prior art, the migration task completes smoothly when the data size is small, but when the data size reaches hundreds of millions, it is difficult to complete the data synchronization within the specified time.
Disclosure of Invention
The invention provides a data sending method, a data receiving method, a corresponding device, equipment and a medium, which are used for solving the problem that data synchronization is difficult to complete in specified time when the data volume to be migrated reaches billions.
A data transmission method, comprising:
determining a table structure and total data volume of source data to be synchronized;
determining a time window for transmitting the source data;
splitting the table structure and the source data according to the time window, the table structure of the source data and the total data volume to obtain N source sub-tables and unit data mapped to each source sub-table, wherein N is a positive integer greater than or equal to 2;
sending a data synchronization request to a target server, wherein the data synchronization request carries the number of the source sub-tables and the data volume of unit data stored in the source sub-tables; the data synchronization request is used for instructing the target server to establish a target table for storing the unit data;
and in the time window, the unit data of the N source sub-tables are sent to the target server in parallel, so that the target server receives the N unit data in parallel in the same time period, each unit data in the N unit data is mapped to a corresponding partition table in the target table, and the structures of the source sub-tables and the partition tables are different.
A data receiving method, comprising:
receiving a data synchronization request sent by a source server, wherein the data synchronization request carries the number of source sub-tables and the data volume of unit data stored in the source sub-tables;
establishing a target table for storing the unit data according to the data volume of each unit data, the number of the source sub-tables and the storage space condition of the target table, wherein the target table comprises a plurality of partition tables;
in a time window, receiving N unit data and a source attribute parameter corresponding to each unit data in parallel, wherein N is a positive integer greater than or equal to 2;
mapping each unit data and the source attribute parameters in the N unit data into the partition table of the corresponding target table, wherein the structure of the partition table is different from that of the source partition table, and the target table comprises the partition table and an additional partition table; and performing data processing on the first data corresponding to the source attribute parameters to obtain processed second data, and mapping the second data to the additional partition table.
A data transmission apparatus, comprising:
the receiving module is used for receiving a data synchronization instruction sent by a client and a time window sent by a target server, wherein the time window is the time for transmitting source data to be synchronized;
a calculation module for calculating a table structure and a total data amount of the source data;
the splitting module is used for splitting the table structure and the source data according to the time window, the table structure of the source data and the total data volume to obtain N source sub-tables and unit data mapped to each source sub-table, wherein N is a positive integer greater than or equal to 2;
the request sending module is used for sending a data synchronization request to a target server, wherein the data synchronization request carries the number of the source sub-tables and the data volume of the unit data stored in the source sub-tables; the data synchronization request is used for instructing the target server to establish a target table for storing the unit data;
and the data sending module is used for sending the unit data of the N source sub-tables to the target server in parallel in the time window so that the target server receives the N unit data in parallel in the same time period, and mapping each unit data in the N unit data to a corresponding partition table in the target table, wherein the source sub-tables and the partition tables have different structures.
A data receiving device, comprising:
a receiving module, configured to receive a data synchronization request and a time window request sent by a source server, wherein the data synchronization request carries the number of source sub-tables and the data volume of the unit data stored in the source sub-tables, and the time window is the time for transmitting the source data to be synchronized;
the table building module is used for building a target table for storing the unit data according to the data volume of each unit data, the number of the source sub-tables and the storage space condition of the target table, and the target table comprises a plurality of sub-tables;
a data receiving module, configured to receive N unit data and a source attribute parameter corresponding to each unit data in parallel within the time window, where N is a positive integer greater than or equal to 2;
a processing module, configured to map each of the unit data and the source attribute parameters in the N unit data to the partition table of the corresponding target table, where a structure of the partition table is different from a structure of the source partition table, the target table includes the partition table and an additional partition table, and the additional partition table corresponds to the partition table; and the system is further used for performing data processing on the first data corresponding to the source attribute parameters to obtain processed second data, and mapping the second data to the additional partition table.
A computer device comprising a memory, a processor and a computer program stored in said memory and executable on said processor, characterized in that said processor implements the steps of the above-mentioned data transmission method or the steps of the above-mentioned data reception method when executing said computer program.
A computer-readable storage medium storing a computer program, wherein the computer program realizes the steps of the data transmission method or the steps of the data reception method when executed by a processor.
The beneficial effects provided by the invention are as follows:
in the invention, a source server determines a table structure and a data volume of source data to be synchronized, determines a time window for transmitting the source data, splits the table structure according to the time window, the table structure and the data volume of the source data to obtain N source sub-tables and unit data mapped to the source sub-tables, then sends a data synchronization request to a target server, and sends the split data to the target server in parallel in the time window, so that the target server receives the split data in parallel and maps each unit data and source attribute parameters in the N unit data to a corresponding partition table of the target table, thereby greatly shortening the time length for synchronizing the data between the source server and the target server, and further finishing a data synchronization task in the specified time window.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive labor.
FIG. 1 is a schematic diagram of an application environment of a data transmission method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a data transmission method according to an embodiment of the present invention;
FIG. 3 is a flow chart of an implementation of step SA30 of a data transmission method according to an embodiment of the present invention;
FIG. 4 is a flow chart of a data receiving method according to an embodiment of the present invention;
FIG. 5 is a diagram of a data transmission apparatus according to an embodiment of the present invention;
FIG. 6 is a diagram of a data receiving device according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a computer device according to an embodiment of the invention.
Detailed Description
The invention provides a data synchronization method applied to a data synchronization system. The data synchronization system comprises a source server, a target server and a client. The client sends instructions to the server through a network; the source server acts as the sender of the data to be synchronized, and the target server acts as the receiver of the synchronized data. The client, also called the user side, is a program that corresponds to the server and provides local services to the user. The client may be installed on, but is not limited to, various personal computers, laptops, smartphones, tablets, and portable wearable devices. Both the source server and the target server may be a server cluster.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The data synchronization method provided by the embodiment of the invention can be applied to the application environment shown in fig. 1, wherein the target server communicates with the source server through a network, and the target server and the source server perform operations such as synchronization, calculation, splitting, processing and the like on data.
In an embodiment, in order to synchronize data in a source server to a target server, a data transmission method is provided below with the source server applied in fig. 1 as an execution subject, as shown in fig. 2, and includes the specific steps of:
SA10: Determine the table structure and the total data volume of the source data to be synchronized.
For example, the source server may be an Oracle server. The source server receives a data synchronization instruction sent by the client and selects the source data to be synchronized according to that instruction. In the source server, the stored data is mapped into a table structure.
In this embodiment, the source data to be synchronized is described by taking data mapped into a table as an example. The source server identifies the table structure of the source data. The table structure includes columns and rows, and the columns are source attribute parameters, for example "name", "age", and the like. The source attribute parameters differ according to the service to which the source data corresponds and are not enumerated here. Further, the source server calculates the total data volume of the source data to be synchronized.
SA20: Determine a time window for transmitting the source data.
The time window can be understood as a time segment having a start time and an end time. In one implementation, the source server receives information sent by the target server that indicates the transmission duration, and determines the time window for transmitting the source data according to that information. For example, if the information indicates that the transmission duration is 1 hour, the source server determines that the time window runs from T1 to Tn according to the duration. In another implementation, the source server may instead receive time information sent by the client, where the time information includes a start time and an end time and thereby indicates the transmission duration.
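As a minimal sketch of the first implementation above, a window can be derived from a start time and a transmission duration. The function name `time_window` and the start time used here are hypothetical; the source only states that a 1-hour duration yields a window from T1 to Tn.

```python
from datetime import datetime, timedelta

def time_window(start: datetime, duration: timedelta):
    """Return (start, end) of a window lasting the given transmission duration."""
    if duration <= timedelta(0):
        raise ValueError("transmission duration must be positive")
    return start, start + duration

# Arbitrary start time chosen for illustration; the duration is the
# 1-hour example from the text.
t1, tn = time_window(datetime(2020, 1, 1, 1, 0), timedelta(hours=1))
print(t1, "->", tn)
```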
SA30: Split the table structure and the source data according to the time window, the table structure of the source data and the total data volume to obtain N source sub-tables and unit data mapped to each source sub-table, wherein N is a positive integer greater than or equal to 2.
The source data table structure includes columns and rows, for example, as illustrated in table 1 below:
ID    Region      Cost    Revenue
1     Shanghai    60      100
2     Beijing     140     200
3     Shenzhen    150     300
TABLE 1
In the time window, the source server splits the source data according to the structure of Table 1 and the total data size of Table 1. For example, it splits by the source attribute parameters "ID", "region", "cost", and "revenue" in the columns and obtains 4 source sub-tables and the unit data of the 4 source sub-tables, where the unit data includes each attribute parameter and its corresponding data.
It should be noted that, in practical applications, a source data table may hold up to billions of records and 10,000 source attribute parameters; Table 1 is only an example for convenience of description and is not intended to limit the present application.
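The column-wise split of Table 1 described above can be sketched as follows. This is only an illustration under the assumption that each source sub-table holds one source attribute parameter and its values; the name `split_by_column` and the in-memory row format are hypothetical.

```python
# Table 1 from the description, held as a list of row dictionaries.
TABLE_1 = [
    {"ID": 1, "region": "Shanghai", "cost": 60, "revenue": 100},
    {"ID": 2, "region": "Beijing", "cost": 140, "revenue": 200},
    {"ID": 3, "region": "Shenzhen", "cost": 150, "revenue": 300},
]

def split_by_column(rows):
    """Return one source sub-table per column: {attribute: [values, ...]}."""
    columns = list(rows[0].keys())
    return {col: [row[col] for row in rows] for col in columns}

sub_tables = split_by_column(TABLE_1)
for attribute, unit_data in sub_tables.items():
    print(attribute, unit_data)
```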
SA40: Send a data synchronization request to a target server, wherein the data synchronization request carries the number of source sub-tables and the data volume of the unit data stored in the source sub-tables; the data synchronization request is used to instruct the target server to establish a target table for storing the unit data.
The source server sends a data synchronization request to the target server, for example, the number of source sub-tables carried by the data synchronization request is 4, and the data volume of each source sub-table is less than 200. The data synchronization request is used to instruct the target server to establish a target storage area for storing the unit data, and to map the unit data into a target table in the target storage area.
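One possible shape for such a request is sketched below. The source specifies only what the request carries (the number of source sub-tables and the data volume of each); the class name `SyncRequest`, the field names, the JSON encoding, and the individual volumes are assumptions made for illustration.

```python
import json
from dataclasses import dataclass, asdict
from typing import List

@dataclass
class SyncRequest:
    sub_table_count: int          # number of source sub-tables
    unit_data_volumes: List[int]  # data volume of each source sub-table

    def __post_init__(self):
        # Each source sub-table must report exactly one volume.
        if self.sub_table_count != len(self.unit_data_volumes):
            raise ValueError("one data volume is required per source sub-table")

    def to_json(self) -> str:
        return json.dumps(asdict(self))

# 4 sub-tables, each with a volume below 200, as in the example above;
# the individual volumes are illustrative.
request = SyncRequest(sub_table_count=4, unit_data_volumes=[40, 150, 80, 80])
payload = request.to_json()
print(payload)
```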
SA50: In the time window, send the unit data of the N source sub-tables to the target server in parallel, so that the target server receives the N unit data in parallel in the same time period and maps each unit data in the N unit data to a corresponding partition table in the target table, wherein the structures of the source sub-tables and the partition tables are different.
In a possible implementation, the synchronization service of the source database periodically queries the synchronization record table of the database, finds the source sub-tables to be sent, converts them into a predefined message format, and places the messages in the queue to be sent. The source server then sends the unit data of the source sub-tables to the target server in parallel.
For example, the source server side includes a management process (Manager), an extraction process (Extract), and a transmission process (Pump) corresponding to the target server side. The Manager process controls the other processes, reports errors, and so on. The Extract process is configured with the names of the source tables that need synchronization and, according to the IP address and port of the target end, sends the unit data to be transmitted to the target server side in the form of data blocks over TCP/IP.
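The queue-based flow above (convert each source sub-table to a message, enqueue it, send in parallel) might look roughly like the sketch below. The JSON message format, the function names, and the in-memory stand-in for the network send are all assumptions; a real deployment would transmit data blocks over TCP/IP as described.

```python
import json
import queue
import threading

def to_message(name, unit_data):
    """Convert one source sub-table into a (hypothetical) JSON message."""
    return json.dumps({"sub_table": name, "unit_data": unit_data})

def drain(send_queue, sent, lock):
    """Worker: take messages off the queue until it is empty."""
    while True:
        try:
            message = send_queue.get_nowait()
        except queue.Empty:
            return
        # Stand-in for the network send: record which sub-table went out.
        with lock:
            sent.append(json.loads(message)["sub_table"])

def send_all(sub_tables, workers=2):
    send_queue = queue.Queue()
    for name, unit_data in sub_tables.items():
        send_queue.put(to_message(name, unit_data))
    sent, lock = [], threading.Lock()
    threads = [threading.Thread(target=drain, args=(send_queue, sent, lock))
               for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sorted(sent)

print(send_all({"region": ["Shanghai"], "cost": [60], "revenue": [100]}))
```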
In this embodiment, a source server determines a table structure and a data amount of source data to be synchronized, determines a time window for transmitting the source data, splits the table structure according to the time window, the table structure of the source data, and the data amount to obtain N source sub-tables and unit data mapped to the source sub-tables, then sends a data synchronization request to a target server, and sends the split data to the target server in parallel in the time window, so that the target server maps each unit data and source attribute parameter in the N unit data to a corresponding partition table of the target table after receiving the split data in parallel, which greatly shortens the time length for synchronizing data between the source server and the target server, and thus, the data synchronization task can be completed in the specified time window.
In an embodiment, as shown in fig. 3, in step SA30, the table structure and the source data are split according to the time window, the table structure and the data size to obtain N source sub-tables and the unit data mapped to each source sub-table, and the specific steps include:
SA310: Determine the target data volume to be synchronously transmitted in the time window according to the current transmission rate.
To transmit the source data within the time window, the source server calculates, from the current transmission rate and the transmission duration indicated by the time window, the target data volume that can be transmitted in that duration; the target data volume is the maximum data volume that can be transmitted in the transmission duration. For example, if the current transmission rate is 10 and the transmission duration is 20, the target data volume that can be transmitted in the transmission duration is 200. It should be noted that the transmission rate, the transmission duration, and the target data volume in this embodiment are only exemplary values for convenience of description.
SA320: Determine the target number of the source sub-tables according to the total data volume and the target data volume.
A theoretical number of source sub-tables is determined according to the ratio of the total data volume to the target data volume. For example, if the total data volume is 600 and the data volume that can be transmitted in the transmission duration is 200, then in order to finish transmitting the source data within the specified time window, the theoretical target number of source sub-tables is 3. The target number may be greater than or equal to this theoretical number, for example 3, 4, or 5; the greater the number of source sub-tables, the shorter the time needed to synchronize the source data to the target server. In this embodiment, the target number is described by taking 3 source sub-tables as an example.
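The arithmetic of steps SA310 and SA320 can be written out as a short sketch. The function names are hypothetical, and rounding the ratio up and enforcing a floor of 2 are assumptions: the source gives only the worked values (10 × 20 = 200 and 600 / 200 = 3) and the condition that N is at least 2.

```python
import math

def target_data_volume(rate, duration):
    """Maximum data volume one stream can transmit in the window (SA310)."""
    return rate * duration

def min_sub_table_count(total_volume, target_volume):
    """Theoretical number of source sub-tables (SA320).

    Rounds up so every sub-table fits the window; the floor of 2 reflects
    the requirement that N >= 2.
    """
    return max(2, math.ceil(total_volume / target_volume))

target = target_data_volume(rate=10, duration=20)
count = min_sub_table_count(total_volume=600, target_volume=target)
print(target, count)
```

Any count at or above the returned value also satisfies the window; the sketch returns only the lower bound.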
SA330: Determine a first data volume corresponding to each of P source attribute parameters, wherein P is a positive integer greater than or equal to 2.
The columns of the table structure of the source data may include P source attribute parameters, and each source attribute parameter corresponds to a plurality of specific parameter values. In Table 1 above, the source attribute parameters are "ID", "region", "cost", "revenue", and so on, where the first data corresponding to the source attribute parameter "region" are "Shanghai", "Beijing", and "Shenzhen". It should be noted that, in Table 1 above, the first data volume corresponding to each source attribute parameter is the same; for example, the first data corresponding to "cost" (60, 140, and 150) and the first data corresponding to "revenue" (100, 200, and 300) have the same data volume. In actual applications, however, the data volumes corresponding to different source attribute parameters may differ, as shown in Table 2 below:
ID    Region       Cost    Revenue
1     Shanghai     60      100
2     Beijing      140     200
3     Shenzhen     150     300
4     Guangzhou
5     Zhongshan
TABLE 2
As shown in Table 2 above, the first data volume (in this embodiment, the data volume of the first data) may differ between source attribute parameters. For example, in an actual application scenario, Guangzhou and Zhongshan have not reported their "cost" and "revenue" data, so the first data volume corresponding to "region" differs from the first data volume corresponding to "cost". The source server determines the first data volume corresponding to each source attribute parameter.
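A minimal sketch of determining the first data volume per source attribute parameter for Table 2 follows. It assumes that data volume is measured as the count of non-empty values in a column; the source does not fix a unit, and the name `first_data_volumes` is hypothetical.

```python
# Table 2 from the description; None marks values that were not reported.
TABLE_2 = [
    {"ID": 1, "region": "Shanghai", "cost": 60, "revenue": 100},
    {"ID": 2, "region": "Beijing", "cost": 140, "revenue": 200},
    {"ID": 3, "region": "Shenzhen", "cost": 150, "revenue": 300},
    {"ID": 4, "region": "Guangzhou", "cost": None, "revenue": None},
    {"ID": 5, "region": "Zhongshan", "cost": None, "revenue": None},
]

def first_data_volumes(rows):
    """Count the non-empty values of every source attribute parameter."""
    volumes = {}
    for row in rows:
        for attribute, value in row.items():
            volumes.setdefault(attribute, 0)
            if value is not None:
                volumes[attribute] += 1
    return volumes

volumes = first_data_volumes(TABLE_2)
print(volumes)
```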
SA340: Split the P source attribute parameters according to the first data volume corresponding to each source attribute parameter in the P source attribute parameters to obtain N source sub-tables and unit data mapped to each source sub-table, wherein the unit data of each source sub-table comprises at least one source attribute parameter and first data corresponding to the source attribute parameter, so that the data volume of each source sub-table is smaller than or equal to the target data volume.
For example, suppose the preceding steps have determined that the target number of source sub-tables is 3 and the target data volume is 200. The source server splits the P source attribute parameters according to the first data volume corresponding to each parameter. Suppose the first data volume corresponding to "ID" is 40, that corresponding to "region" is 150, that corresponding to "cost" is 80, and that corresponding to "revenue" is 80. Because the data volume of each split source sub-table cannot exceed 200, one splitting manner is to combine "ID" and "region" into a first source sub-table (40 + 150 = 190), "ID" and "cost" into a second source sub-table (40 + 80 = 120), and "ID" and "revenue" into a third source sub-table (40 + 80 = 120). Here "ID" is a source attribute parameter common to the three source sub-tables and can also be understood as their index.
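The splitting manner in this example, where each remaining attribute is paired with the shared "ID" column and checked against the target data volume, can be sketched as below. The function name `split_with_key` is hypothetical, and this is only one way to satisfy the limit, not necessarily the procedure used by the source server.

```python
def split_with_key(volumes, key, limit):
    """Pair every non-key attribute with the key column.

    Returns a list of (attributes, data_volume) tuples and raises if a
    sub-table would exceed the target data volume.
    """
    sub_tables = []
    for attribute, volume in volumes.items():
        if attribute == key:
            continue
        total = volumes[key] + volume
        if total > limit:
            raise ValueError(f"sub-table for {attribute!r} exceeds {limit}")
        sub_tables.append(([key, attribute], total))
    return sub_tables

# First data volumes from the example above.
example_volumes = {"ID": 40, "region": 150, "cost": 80, "revenue": 80}
result = split_with_key(example_volumes, key="ID", limit=200)
for attributes, total in result:
    print(attributes, total)
```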
In another implementation of the splitting step, the P source attribute parameters are split according to both the first data volume corresponding to each source attribute parameter and the association relationships among the P source attribute parameters, and source attribute parameters that are associated with each other are split into the same source sub-table, so as to obtain N source sub-tables and the unit data mapped to each source sub-table. For example, "cost" and "revenue" are associated through the relation "revenue - cost = profit", and the data volume of each is 80, so "cost" and "revenue" can be split into the same source sub-table.
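A sketch of the association-based variant follows, under several assumptions: the associations are supplied as explicit groups, the "ID" column is repeated in every sub-table, and attributes outside any group get a sub-table of their own. It checks only the per-table limit; whether the resulting number of sub-tables still meets the target number is left to the caller.

```python
def split_with_groups(volumes, key, groups, limit):
    """Keep associated attributes together in one source sub-table."""
    grouped = {attribute for group in groups for attribute in group}
    # Attributes that belong to no association group stand alone.
    singles = [[a] for a in volumes if a != key and a not in grouped]
    sub_tables = []
    for group in [list(g) for g in groups] + singles:
        total = volumes[key] + sum(volumes[a] for a in group)
        if total > limit:
            raise ValueError(f"sub-table {group} exceeds {limit}")
        sub_tables.append(([key] + group, total))
    return sub_tables

grouped_result = split_with_groups(
    {"ID": 40, "region": 150, "cost": 80, "revenue": 80},
    key="ID",
    groups=[["cost", "revenue"]],  # revenue - cost = profit
    limit=200,
)
for attributes, total in grouped_result:
    print(attributes, total)
```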
The source server can automatically split the P source attribute parameters according to the first data volume corresponding to each source attribute parameter in the P attributes to obtain N source sub-tables, and different source data can be dynamically split into different numbers of source sub-tables.
In this embodiment, the source server determines the target data amount of the time window according to the transmission rate, then determines the target number of the source sub-tables according to the total data amount and the target data amount, and then splits the source data according to the source attribute parameters, thereby further refining the split data amount of the data to be synchronized, shortening the time length for synchronizing the data between the source server and the target server, and obtaining higher data synchronization efficiency.
Further, after splitting to obtain the N source sub-tables and the unit data mapped to each source sub-table, the source server matches the unit data of each source sub-table against the source data to verify whether the split unit data matches the source data. If the unit data matches the source data, information indicating successful splitting is output; if the unit data does not match the source data, error information is output to prompt a worker to look for the cause.
The splitting-success information or the error information output by the source server can be received by the client. When the client receives the splitting-success information, the split unit data is valid; after receiving the error information, the client can issue an error lookup instruction so that the source server searches for the cause of the error.
In the embodiment, whether the splitting process is successful or not is verified by matching the unit data of the source sub-table with the source data, so that the accuracy and the validity of the data are ensured.
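One way to perform this matching is to rebuild the rows from the source sub-tables by their shared key and compare them with the source rows, as sketched below. The row format, the function name `verify_split`, and the use of "ID" as the join key are assumptions; the source does not describe how the match is computed.

```python
def verify_split(source_rows, sub_tables, key="ID"):
    """Rebuild rows from the sub-tables and compare them with the source."""
    rebuilt = {}
    for sub_table in sub_tables:
        for row in sub_table:
            rebuilt.setdefault(row[key], {}).update(row)
    expected = {row[key]: dict(row) for row in source_rows}
    return rebuilt == expected

source = [
    {"ID": 1, "region": "Shanghai", "cost": 60},
    {"ID": 2, "region": "Beijing", "cost": 140},
]
good = [
    [{"ID": 1, "region": "Shanghai"}, {"ID": 2, "region": "Beijing"}],
    [{"ID": 1, "cost": 60}, {"ID": 2, "cost": 140}],
]
bad = [good[0], [{"ID": 1, "cost": 60}]]  # the cost for ID 2 was lost

print("split ok" if verify_split(source, good) else "split mismatch")
print("split ok" if verify_split(source, bad) else "split mismatch")
```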
In an embodiment, before the source server sends the N unit data to the target server in parallel, step SA50 includes: creating N mutually independent data synchronization tasks for the N split unit data, where each data synchronization task corresponds to the unit data of one source sub-table.
Specifically, SQOOP fetch tasks are created: N independent data synchronization tasks are created for the split source sub-tables. Because the tasks are independent of one another, executing them at the same time reduces the performance impact they have on each other.
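The idea of one independent task per source sub-table can be illustrated with a thread pool. This is a hedged sketch only: `send_unit_data` is a hypothetical placeholder that stands in for the real transfer job (for example a SQOOP job), and it merely reports how many rows it would send.

```python
from concurrent.futures import ThreadPoolExecutor


def send_unit_data(sub_table_name: str, rows: list) -> tuple[str, int]:
    """Placeholder for one data synchronization task (hypothetical).
    A real task would start the transfer of this sub-table's unit data."""
    return sub_table_name, len(rows)


def run_sync_tasks(sub_tables: dict[str, list]) -> dict[str, int]:
    """Create one independent task per source sub-table and run them in parallel.
    Assumes at least one sub-table; the text requires N >= 2."""
    with ThreadPoolExecutor(max_workers=len(sub_tables)) as pool:
        futures = [pool.submit(send_unit_data, name, rows)
                   for name, rows in sub_tables.items()]
        # Each result is (sub_table_name, rows_sent).
        return dict(f.result() for f in futures)
```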
In an embodiment, the present invention further provides a data receiving method, as shown in fig. 4. The method is described with the target server as the execution subject, where the target server is the receiver of the data to be synchronized, and specifically includes the following steps:
SB10: Receive a data synchronization request sent by the source server, where the data synchronization request carries the number of the source sub-tables and the data volume of the unit data stored in the source sub-tables.
The source sub-tables are obtained by the source server by splitting the table structure of the source data according to the time window, the table structure of the source data, and the total data volume of the source data. For the time window, the table structure of the source data, and the total data volume of the source data, refer to the above embodiments; they are not described again here.
SB20: Establish a target table for storing the unit data according to the data volume of each unit data, the number of the source sub-tables, and the storage space condition of the target table, where the target table includes a plurality of partition tables.
The target server is described here taking HIVE as an example. In HIVE, each partition of the target table corresponds to a directory under the table, and all data of a partition is stored in its directory. A partition table can be understood as a set of folders created on the system, with the data of each class placed in a different folder. When the table is created, a keyword declares it to be a partition table. The table may be partitioned by a field such as type, so that all records with the same type value are stored in one partition; it may also be partitioned by multiple columns, that is, the data of a partition can be further partitioned by other columns.
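The folder analogy can be made concrete with a small sketch that groups records under Hive-style partition directories named `column=value`. The table directory `/warehouse/target`, the column names, and the function name are hypothetical; the code only models the directory layout and does not talk to HIVE.

```python
from collections import defaultdict


def partition_paths(table_dir: str, rows: list[dict], partition_cols: list[str]) -> dict[str, list[dict]]:
    """Group rows by their partition column values, keyed by a directory path
    of the form <table_dir>/<col>=<value>[/<col>=<value>...]."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        parts = "/".join(f"{col}={row[col]}" for col in partition_cols)
        grouped[f"{table_dir}/{parts}"].append(row)
    return dict(grouped)
```

Partitioning by one column yields one directory per distinct value; passing a second column nests a further directory level under each of them, which mirrors partitioning a partition again by another column.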
The target server determines a target storage area for storing the unit data according to its own storage space condition, the data volume of each unit data, and the number of the source sub-tables, and maps the unit data into a target table in the target storage area.
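A minimal sketch of this planning step is given below. The `SyncRequest` fields, the `free_bytes` argument, and the rule of reserving one partition per source sub-table are assumptions introduced for illustration; the text does not specify how the storage space condition is evaluated.

```python
from dataclasses import dataclass


@dataclass
class SyncRequest:
    sub_table_count: int         # number of source sub-tables (N)
    unit_data_bytes: list[int]   # data volume of the unit data in each sub-table


def plan_target_table(request: SyncRequest, free_bytes: int) -> list[dict]:
    """Reserve one partition per source sub-table if the free storage suffices."""
    if len(request.unit_data_bytes) != request.sub_table_count:
        raise ValueError("expected one data volume per source sub-table")
    needed = sum(request.unit_data_bytes)
    if needed > free_bytes:
        raise RuntimeError(f"need {needed} bytes but only {free_bytes} are free")
    return [{"partition": i, "reserved_bytes": size}
            for i, size in enumerate(request.unit_data_bytes)]
```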
SB30: Within the time window, receive N unit data and the source attribute parameter corresponding to each unit data in parallel, where N is a positive integer greater than or equal to 2.
The target server receives the N unit data in parallel. The N unit data are the data in the source sub-tables, and the received source attribute parameters are used as the attribute parameters of the partition tables. For example, the source attribute parameters of one of the partition tables include "cost".
SB40: Map each of the N unit data and its source attribute parameters into the partition table of the corresponding target table, where the structure of the partition table differs from that of the source partition table, the target table includes a partition table and an additional partition table, and the additional partition table corresponds to the partition table; perform data processing on the first data corresponding to the source attribute parameters to obtain processed second data, and map the second data to the additional partition table.
Specifically, HIVE stores its data in a distributed manner. The target table includes partition tables, which are mapped into the target storage area according to the data volume of each unit data and the source attribute parameters corresponding to the unit data. The target table is equivalent to a directory, and a partition table is equivalent to a subdirectory under the target table.
The unit data is stored in the corresponding partition table, where the partition table includes the source attribute parameters.
For example, one partition table established in the target server is shown in table 3 below:
ID    cost    revenue
1     60      100
2     140     200
3     150     300
Table 3
Table 3 is only an example given for convenience of description and does not limit the present application.
The table structure of the source data in the source server differs from that of the target table in the following two cases:
In the first case, partitioning continues in the target table, adding an additional partition table.
Specifically, first, the target server may receive a service processing instruction of the client.
Then, the target server performs data processing (decomposition, merging, or association) on the first data corresponding to the source attribute parameters according to the service processing instruction, obtaining processed second data. For example, in Table 3 the first data corresponding to the source attribute parameter "cost" is "60", "140", and "150". In one application scenario, the cost of all areas needs to be determined, so the first data of "cost" is merged to obtain the second data, for example "350". In another application scenario, the first data in the row of each ID (e.g., 60) is quarterly data, and the first data can be decomposed to obtain the average data for each month, for example 20. In yet another application scenario, if there are at least two source attribute parameters and two of them (e.g., a first source attribute parameter and a second source attribute parameter) have an association relationship, the two can be associated, where association processing includes, but is not limited to, difference, sum, and ratio processing. Taking Table 3 as an example, "cost" is the first source attribute parameter, "revenue" is the second source attribute parameter, and the association between them is: revenue - cost = profit, so the profit is 40, 60, and 150.
Finally, the target server establishes an additional partition table in the target table, whose attribute (the second attribute) corresponds to the second data. The additional partition table may be understood as an additional column or row added to the partition table, which changes the structure relative to the source partition table. For example, on the basis of Table 3, a column whose attribute parameter is "profit" or a column whose attribute parameter is "total cost" is added, and the second data is mapped to the additional partition table.
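The three kinds of processing in this first case can be reproduced on the values of Table 3. The sketch below is illustrative only; the function names and the dictionary row format are assumptions, and difference is used as the one example of association processing.

```python
# Rows of Table 3 as received in a partition table.
TABLE_3 = [
    {"ID": 1, "cost": 60, "revenue": 100},
    {"ID": 2, "cost": 140, "revenue": 200},
    {"ID": 3, "cost": 150, "revenue": 300},
]


def merge(rows: list[dict], attr: str) -> int:
    """Merging: combine the first data of one attribute, e.g. the total cost."""
    return sum(row[attr] for row in rows)


def decompose(quarter_value: float, months: int = 3) -> float:
    """Decomposition: average monthly value derived from a quarterly value."""
    return quarter_value / months


def associate(rows: list[dict], first: str, second: str) -> list[int]:
    """Association by difference: second attribute minus first attribute."""
    return [row[second] - row[first] for row in rows]


def with_profit_column(rows: list[dict]) -> list[dict]:
    """Additional column holding the second data ("profit") next to the source columns."""
    return [{**row, "profit": row["revenue"] - row["cost"]} for row in rows]
```

Here `merge(TABLE_3, "cost")` gives 350, `decompose(60)` gives 20, and `associate(TABLE_3, "cost", "revenue")` gives the profits 40, 60, and 150 stated in the text.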
In the second case, the target server determines the structure of the partition table based on the unit data it actually receives. For example, suppose the table structure of the source data has 6 rows, and after it is split, one of the resulting source sub-tables contains only 4 rows of data. After the target server receives the table data of that source sub-table, the partition table it determines from the table data actually received is as shown in Table 3 above, with only 4 rows, and the structure of the partition table is different from the structure of the source sub-table.
In this embodiment, the target server receives a data synchronization request sent by the source server, where the data synchronization request carries the number of source sub-tables and the data amount of the unit data stored in the source sub-tables. The target server establishes a target table for storing the unit data according to the data volume of each unit data, the number of the source sub-tables, and the storage space condition of the target server, and receives N unit data and the source attribute parameters corresponding to each unit data in parallel within the time window. The target server then maps each of the N unit data and its source attribute parameters into the partition table of the corresponding target table. Because the target server receives the N unit data in parallel in the same time period and stores each of them into a target table in the corresponding target storage area, the time needed to synchronize data between the source server and the target server is greatly shortened, and the data synchronization task can be completed within the specified time window. In addition, the target table established in the target storage area includes a plurality of partition tables, so the table data in the partition tables can be quickly gathered in one target table, and the source data can be used by subsequent tasks as soon as synchronization completes, without a separate merge step, which shortens the data processing time.
Further, the target server verifies the unit data of each source sub-table against the table data in the partition table; the verification includes data-volume verification and row- and column-level verification. This ensures that the synchronized data is complete and valid. Once the verification of a piece of synchronized data is completed, the partition data in the current target storage area takes effect immediately, without waiting for the table data of all source sub-tables to be fully synchronized.
In this embodiment, the table data in the current partition table takes effect immediately after it is synchronized, without waiting for the table data of all source sub-tables to be synchronized, so subsequent tasks can start earlier and the timeliness of subsequent data processing is improved.
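The two verification levels and the per-partition effect can be sketched as follows. The row format, the assumption that rows arrive in the same order on both sides, and the function names are hypothetical; extra columns in the partition table (such as an added "profit" column) are deliberately tolerated.

```python
def verify_partition(unit_rows: list[dict], partition_rows: list[dict]) -> bool:
    """Data-volume check first (row counts), then a row- and column-level comparison."""
    if len(unit_rows) != len(partition_rows):
        return False
    # Every value that was sent must be present, unchanged, in the stored row.
    return all(
        all(stored.get(col) == value for col, value in sent.items())
        for sent, stored in zip(unit_rows, partition_rows)
    )


def effective_partitions(units: dict[str, list[dict]],
                         partitions: dict[str, list[dict]]) -> list[str]:
    """Names of partitions that can take effect now, independent of the others."""
    return [name for name, rows in units.items()
            if name in partitions and verify_partition(rows, partitions[name])]
```

A partition that has already arrived and passed both checks is reported as effective even while other sub-tables are still in transit.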
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
In an embodiment, the present invention further provides a data sending apparatus, where the data sending apparatus corresponds to the data sending method in the foregoing embodiment one to one. As shown in fig. 5, the data sending apparatus includes a receiving module, a calculating module, a splitting module, a request sending module, and a data sending module. The functional modules are explained in detail as follows:
The receiving module is used for receiving a data synchronization instruction sent by the client and a time window sent by the target server, where the time window is the time for transmitting the source data to be synchronized.
The calculation module is used for calculating the table structure and the total data volume of the source data.
The splitting module is used for splitting the table structure and the source data according to the time window, the table structure of the source data, and the total data volume to obtain N source sub-tables and the unit data mapped to each source sub-table, where N is a positive integer greater than or equal to 2.
The request sending module is used for sending a data synchronization request to the target server, where the data synchronization request carries the number of the source sub-tables and the data volume of the unit data stored in the source sub-tables; the data synchronization request is used to instruct the target server to establish a target table for storing the unit data.
The data sending module is used for sending the unit data of the N source sub-tables to the target server in parallel within the time window, so that the target server receives the N unit data in parallel in the same time period and maps each of the N unit data to a corresponding partition table in the target table, where the structures of the source sub-tables and the partition tables are different.
For specific limitations of the data sending apparatus, reference may be made to the above limitations of the data sending method, which are not described herein again. The modules in the data sending apparatus may be implemented wholly or partially by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the server in hardware form, or may be stored in a memory in the server in software form, so that the processor can call and execute the operations corresponding to the modules.
In an embodiment, the present invention further provides a data receiving apparatus, where the data receiving apparatus corresponds to the data receiving method in the foregoing embodiment one to one. As shown in fig. 6, the data receiving apparatus includes a receiving module, a table building module, a data receiving module, and a processing module. The functional modules are explained in detail as follows:
The receiving module is used for receiving a data synchronization request and a time window request sent by the source server, where the data synchronization request carries the number of the source sub-tables and the data volume of the unit data stored in the source sub-tables, and the time window is the time for transmitting the source data to be synchronized.
The table building module is used for building a target table for storing the unit data according to the data volume of each unit data, the number of the source sub-tables, and the storage space condition of the target table, where the target table includes a plurality of sub-tables.
The data receiving module is used for receiving N unit data and the source attribute parameter corresponding to each unit data in parallel within the time window, where N is a positive integer greater than or equal to 2.
The processing module is used for mapping each of the N unit data and its source attribute parameters into the partition table of the corresponding target table, where the structure of the partition table differs from that of the source partition table, the target table includes a partition table and an additional partition table, and the additional partition table corresponds to the partition table; the processing module is also used for performing data processing on the first data corresponding to the source attribute parameters to obtain processed second data and mapping the second data to the additional partition table.
For specific limitations of the data receiving apparatus, reference may be made to the above limitations of the data receiving method, which are not described herein again. The modules in the data receiving apparatus may be implemented wholly or partially by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the server in hardware form, or may be stored in a memory in the server in software form, so that the processor can call and execute the operations corresponding to the modules.
In an embodiment, a computer device is provided, which includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor, and when the processor executes the computer program, the steps of the data sending method and the steps of the data receiving method in the foregoing embodiment are implemented, or when the processor executes the computer program, the functions of each module of the data sending apparatus and the functions of each module of the data receiving apparatus in the foregoing embodiment are implemented, and in order to avoid repetition, details are not repeated here.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored, and the computer program, when being executed by a processor, implements the steps of the data sending method and the steps of the data receiving method in the foregoing embodiment, or the computer program, when being executed by the processor, implements the functions of each module of the data sending apparatus and the functions of each module of the data receiving apparatus in the foregoing embodiment, and in order to avoid repetition, the details are not repeated here.
It will be understood by those of ordinary skill in the art that all or a portion of the processes of the methods of the embodiments described above may be implemented by a computer program that may be stored on a non-volatile computer-readable storage medium, which when executed, may include the processes of the embodiments of the methods described above, wherein any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (10)

1. A data transmission method, comprising:
determining a table structure and total data volume of source data to be synchronized;
determining a time window for transmitting the source data;
splitting the table structure and the source data according to the time window, the table structure of the source data and the total data volume to obtain N source sub-tables and unit data mapped to each source sub-table, wherein N is a positive integer greater than or equal to 2;
sending a data synchronization request to a target server, wherein the data synchronization request carries the number of the source sub-tables and the data volume of unit data stored in the source sub-tables; the data synchronization request is used for instructing the target server to establish a target table for storing the unit data;
and in the time window, the unit data of the N source sub-tables are sent to the target server in parallel, so that the target server receives the N unit data in parallel in the same time period, each unit data in the N unit data is mapped to a corresponding partition table in the target table, and the structures of the source sub-tables and the partition tables are different.
2. The data sending method according to claim 1, wherein the splitting the table structure and the source data according to the time window, the table structure of the source data, and the total data volume to obtain N source sub-tables and unit data mapped to each source sub-table includes:
determining the target data volume to be synchronously transmitted in the time window according to the current transmission rate;
determining the target number of the source sub-tables according to the total data volume and the target data volume;
determining a first data volume corresponding to each source attribute parameter in P source attribute parameters, wherein P is a positive integer greater than or equal to 2;
splitting the P source attribute parameters according to the first data volume corresponding to each of the P source attribute parameters to obtain the N source sub-tables and unit data mapped to each source sub-table, wherein the unit data of each source sub-table comprises at least one source attribute parameter and the first data corresponding to the source attribute parameter, so that the data volume of each source sub-table is smaller than or equal to the target data volume.
3. The data sending method according to claim 2, wherein the splitting the P source attribute parameters according to the first data amount corresponding to each of the P source attribute parameters includes:
splitting the P source attribute parameters according to the first data volume corresponding to each source attribute parameter in the P source attribute parameters and the incidence relation between the P source attribute parameters, splitting the source attribute parameters with the incidence relation into the same source sub-table, and obtaining the N source sub-tables and the unit data mapped to each source sub-table.
4. The data transmission method according to claim 1, wherein before the transmitting the N unit data in parallel to the target server, the method includes: creating N mutually independent data synchronization tasks for the N split unit data, wherein each data synchronization task corresponds to the unit data of one source sub-table.
5. The data transmission method according to any one of claims 1 to 4, characterized in that the method further comprises:
matching the unit data of each source sub-table with the source data, and verifying whether the split unit data matches the source data;
if the unit data is matched with the source data, outputting information for indicating that the splitting is successful; and if the unit data is not matched with the source data, outputting error reporting information for prompting a worker to search reasons.
6. A data receiving method, comprising:
receiving a data synchronization request sent by a source server, wherein the data synchronization request carries the number of source sub-tables and the data volume of unit data stored in the source sub-tables;
establishing a target table for storing the unit data according to the data volume of each unit data, the number of the source sub-tables and the storage space condition of the target table, wherein the target table comprises a plurality of partition tables;
in a time window, receiving N unit data and a source attribute parameter corresponding to each unit data in parallel, wherein N is a positive integer greater than or equal to 2;
mapping each unit data and the source attribute parameters in the N unit data into the corresponding partition table of the target table, wherein the partition table has a different structure from that of the source partition table, the target table comprises the partition table and an additional partition table, and the additional partition table corresponds to the partition table; and performing data processing on the first data corresponding to the source attribute parameters to obtain processed second data, and mapping the second data to the additional partition table.
7. A data transmission apparatus, comprising:
the receiving module is used for receiving a data synchronization instruction sent by a client and a time window sent by a target server, wherein the time window is the time for transmitting source data to be synchronized;
a calculation module for calculating a table structure and a total data amount of the source data;
the splitting module is used for splitting the table structure and the source data according to the time window, the table structure of the source data and the total data volume to obtain N source sub-tables and unit data mapped to each source sub-table, wherein N is a positive integer greater than or equal to 2;
the request sending module is used for sending a data synchronization request to a target server, wherein the data synchronization request carries the number of the source sub-tables and the data volume of the unit data stored in the source sub-tables; the data synchronization request is used for instructing the target server to establish a target table for storing the unit data;
and the data sending module is used for sending the unit data of the N source sub-tables to the target server in parallel in the time window so that the target server receives the N unit data in parallel in the same time period, and mapping each unit data in the N unit data to a corresponding partition table in the target table, wherein the source sub-tables and the partition tables have different structures.
8. A data receiving device, comprising:
a receiving module, configured to receive a data synchronization request and a time window request sent by a source server, where the data synchronization request carries the number of source sub-tables and the data volume of unit data stored in the source sub-tables, and the time window is the time for transmitting the source data to be synchronized;
the table building module is used for building a target table for storing the unit data according to the data volume of each unit data, the number of the source sub-tables and the storage space condition of the target table, and the target table comprises a plurality of sub-tables;
a data receiving module, configured to receive N unit data and a source attribute parameter corresponding to each unit data in parallel within the time window, where N is a positive integer greater than or equal to 2;
a processing module, configured to map each of the unit data and the source attribute parameters in the N unit data to the partition table of the corresponding target table, where a structure of the partition table is different from a structure of the source partition table, the target table includes the partition table and an additional partition table, and the additional partition table corresponds to the partition table; and further configured to perform data processing on the first data corresponding to the source attribute parameters to obtain processed second data, and to map the second data to the additional partition table.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the data transmission method according to any one of claims 1 to 4 or the steps of the data reception method according to any one of claims 6 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of a data transmission method according to one of claims 1 to 4 or a data reception method according to one of claims 6 to 7.
CN202010092301.7A 2020-02-14 2020-02-14 Data transmitting and receiving method, and corresponding device, equipment and storage medium Active CN111427950B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010092301.7A CN111427950B (en) 2020-02-14 2020-02-14 Data transmitting and receiving method, and corresponding device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111427950A true CN111427950A (en) 2020-07-17
CN111427950B CN111427950B (en) 2024-08-02

Family

ID=71547064

Country Status (1)

Country Link
CN (1) CN111427950B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant