CN111930821A - One-step data exchange method, device, equipment and storage medium

Info

Publication number: CN111930821A
Application number: CN202010935803.1A
Authority: CN
Inventors: 宋天喜; 郭钊铭; 丁忠伟; 牟小欢; 安文强; 王斌
Original assignee: Ping An International Smart City Technology Co Ltd
Current assignee: Ping An International Smart City Technology Co Ltd
Priority date: 2020-09-08
Filing date: 2020-09-08
Publication date: 2020-11-13

Abstract

The invention discloses a one-step data exchange method, device, equipment and storage medium, relating to the technical field of data processing. The method comprises the following steps: determining a data source table, a target data table and a data exchange starting time point; transferring stock data generated between the data exchange starting time point and a data cutting time point to the target data table; monitoring an archive log of the data source table, and transferring incremental data generated after the data cutting time point to the target data table; performing K-means clustering on a target data set in the target data table to obtain a corresponding data clustering result; and acquiring data demand information of each service server, and distributing each data clustering cluster in the data clustering result to the service servers. The invention can reduce the configuration complexity of data exchange jobs, improve the timeliness of data exchange, meet the requirements of both near real-time incremental data exchange and stock data exchange, and distribute data by category according to demand, thereby improving working efficiency and reducing cost.

Description

One-step data exchange method, device, equipment and storage medium

Technical Field

The present invention relates to the field of data processing technologies, and in particular, to a one-step data exchange method, apparatus, device, and storage medium.

Background

With the continuous development of the information age, information exchange between different departments and regions keeps increasing, and the development of computer network technology provides the means for transmitting that information. Data sharing and exchange enables data to be shared between different parties so that they can perform various operations, calculations, and analyses on it.

Data sharing and exchange allows existing data resources to be used more fully and reduces the repeated labor and cost of data collection and acquisition. It plays an important role in connecting the data islands within an organization and enabling the smooth circulation of the organization's data.

In the prior art, the data sharing and exchanging technology mainly adopts the following two modes:

1. Offline, scheduled ETL (Extract-Transform-Load) data synchronization based on data integration technology. The ETL mode is suitable for exchanging large data volumes, but because it is triggered by scheduling, its real-time performance is poor and it does not support near real-time incremental data exchange.

2. Real-time log-parsing CDC (Change-Data-Capture) data synchronization based on database logs. The CDC mode has good real-time performance, but it offers poor business control and does not support exchanging the stock data of source tables in a legacy source database.

From the above, each of the two existing data exchange technologies addresses only one side of the business requirements and pain points. They are not integrated, so they cannot simultaneously meet the requirements of near real-time incremental data exchange and stock data exchange. In addition, the existing data exchange technologies cannot distribute data by category according to demand.

Disclosure of Invention

The embodiment of the invention provides a one-step data exchange method, a one-step data exchange device, one-step data exchange equipment and a storage medium, and aims to solve the problems that the existing data exchange technology cannot simultaneously meet near real-time incremental data and stock data exchange and cannot perform classified distribution according to requirements.

In a first aspect, an embodiment of the present invention provides a one-step data exchange method, including:

determining a data source table, a target data table and a data exchange starting time point to be exchanged, and taking the current time as a data cutting time point;

transferring stock data generated between a data exchange starting time point and a data cutting time point in the data source table to the target data table through an ETL task execution module;

monitoring an archive log of a data source table through a CDC task execution module, and transferring incremental data generated after a data cutting time point in the data source table to the target data table according to the archive log;

performing K-means clustering on the target data set in the target data table to obtain a corresponding data clustering result; the data clustering result comprises a plurality of data clustering clusters, and each data clustering cluster corresponds to one clustered data label;

acquiring data demand information of each service server, and distributing each data clustering cluster in the data clustering result to the corresponding service server according to the data demand information; the data demand information of each service server comprises a plurality of clustering data labels.

In a second aspect, an embodiment of the present invention provides a one-step data exchange apparatus, including:

the configuration unit is used for determining a data source table, a target data table and a data exchange starting time point to be exchanged and taking the current time as a data cutting time point;

the ETL execution unit is used for transferring stock data generated between a data exchange starting time point and a data cutting time point in the data source table to the target data table through the ETL task execution module;

the CDC execution unit is used for monitoring an archive log of a data source table through a CDC task execution module and transferring incremental data generated after a data cutting time point in the data source table to the target data table according to the archive log;

the clustering unit is used for carrying out K-means clustering on the target data set in the target data table to obtain a corresponding data clustering result; the data clustering result comprises a plurality of data clustering clusters, and each data clustering cluster corresponds to one clustered data label;

the distribution unit is used for acquiring data demand information of each service server and distributing each data clustering cluster in the data clustering result to the corresponding service server according to the data demand information; the data demand information of each service server comprises a plurality of clustering data labels.

In a third aspect, an embodiment of the present invention provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the one-step data exchange method according to the first aspect when executing the computer program.

In a fourth aspect, the present invention provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program causes the processor to execute the one-step data exchange method according to the first aspect.

The embodiment of the invention provides a one-step data exchange method, a device, equipment and a storage medium, wherein a data source table, a target data table and a data exchange starting time point to be exchanged are determined, and the current time is taken as a data cutting time point; transferring stock data generated between a data exchange starting time point and a data cutting time point in the data source table to the target data table through an ETL task execution module; monitoring an archive log of a data source table through a CDC task execution module, and transferring incremental data generated after a data cutting time point in the data source table to the target data table; performing K-means clustering on the target data set in the target data table to obtain a corresponding data clustering result; and acquiring data demand information of each service server, and distributing each data clustering cluster in the data clustering result to the corresponding service server according to the data demand information. According to the embodiment, the configuration complexity of data exchange operation can be effectively reduced, the data exchange timeliness is improved, the requirements of near real-time incremental data and stock data exchange are met, classification and distribution can be carried out according to the requirements, the working efficiency is improved, and the cost is reduced.

Drawings

In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show some embodiments of the present invention, and those skilled in the art can obtain other drawings based on these drawings without creative effort.

Fig. 1 is a schematic flowchart of a one-step data exchange method according to an embodiment of the present invention;

Fig. 2 is a schematic sub-flowchart of a one-step data exchange method according to an embodiment of the present invention;

Fig. 3 is another sub-flowchart of a one-step data exchange method according to an embodiment of the present invention;

Fig. 4 is another sub-flowchart of a one-step data exchange method according to an embodiment of the present invention;

Fig. 5 is another sub-flowchart of a one-step data exchange method according to an embodiment of the present invention;

Fig. 6 is a schematic block diagram of a one-step data exchange apparatus according to an embodiment of the present invention;

Fig. 7 is a schematic block diagram of a subunit of a one-step data exchange apparatus according to an embodiment of the present invention;

Fig. 8 is a schematic block diagram of another subunit of a one-step data exchange apparatus according to an embodiment of the present invention;

Fig. 9 is a schematic block diagram of another subunit of a one-step data exchange apparatus according to an embodiment of the present invention;

Fig. 10 is a schematic block diagram of another subunit of a one-step data exchange apparatus according to an embodiment of the present invention;

Fig. 11 is a schematic block diagram of a computer device provided by an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.

Referring to Fig. 1, Fig. 1 is a schematic flowchart of a one-step data exchange method according to an embodiment of the present invention, which includes steps S101 to S105:

S101, determining a data source table, a target data table and a data exchange starting time point to be exchanged, and taking the current time as a data cutting time point;

This step first performs the job configuration: selecting a data source table, a target data table and a data exchange starting time point, and taking the current time as the data cutting time point.

The data source table is the source of the data that needs to be exchanged to another data table. The target data table is that other data table, that is, the destination of the data exchange. The data source table may be deployed on a first server (for example, a service server deployed at a subsidiary of a group), and the target data table is deployed on a data sharing exchange platform (which may be understood as the group's central service server). Data synchronized from the data source table can then be synchronized again, through the data sharing exchange platform, to the service servers deployed at other subsidiaries of the group.

The data exchange starting time point refers to a time point at which data exchange is started, and the time point may be an earlier time point before the current time, that is, a certain historical time.

The current time is used as a data cutting time point, that is, data generated before the data cutting time point is used as stock data, and data generated after the data cutting time point is used as incremental data.

In this embodiment, a primary key and control fields may also be preset so that only data meeting the requirements is exchanged. Other configuration attributes may also be set as required.

After the above contents are configured, a data acquisition and exchange task can be generated. The data acquisition and exchange task comprises a data source table, a target data table, a data exchange starting time point and a data cutting time point, and other configuration attributes can be included according to needs.
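
The configured task can be pictured as a small record that also decides which path handles a given row. The following is a minimal sketch, not the patented implementation: the class name `ExchangeTask`, its field names, the table names, and the rule that a row stamped exactly at the cutting time point counts as stock data are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ExchangeTask:
    """Hypothetical container for the configured task attributes."""
    source_table: str
    target_table: str
    start_time: datetime  # data exchange starting time point
    cut_time: datetime    # data cutting time point (time of configuration)

    def classify(self, created_at: datetime) -> str:
        """Return which path should handle a row with this timestamp."""
        if created_at < self.start_time:
            return "ignored"      # older than the exchange window
        if created_at <= self.cut_time:
            return "stock"        # handled by the ETL path
        return "incremental"      # handled by the CDC path


task = ExchangeTask(
    source_table="orders",            # illustrative table names
    target_table="orders_shared",
    start_time=datetime(2020, 1, 1),
    cut_time=datetime(2020, 9, 8, 12, 0, 0),
)
```

With this sketch, `task.classify(datetime(2020, 5, 1))` falls on the stock side and any later timestamp than `cut_time` falls on the incremental side.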

In this embodiment, the data source table and the target data table may belong to the same database or to different databases, that is, the present embodiment supports heterogeneous data acquisition and exchange. In addition, the data source table and the target data table may be stored in a database or may exist as files, such as txt, Excel, or XML files.

S102, transferring stock data generated between a data exchange starting time point and a data cutting time point in the data source table to the target data table through an ETL task execution module;

In this step, the ETL task execution module exchanges data from the data source table to the target data table. The exchanged data is the stock data, that is, the data generated from the data exchange starting time point to the data cutting time point.

In one embodiment, as shown in fig. 2, the step S102 includes:

S201, extracting stock data generated between a data exchange starting time point and a data cutting time point in a data source table into a data temporary table;

In this step, the stock data may be extracted at a preset extraction frequency. To improve extraction efficiency, the extraction time may be set to avoid peak periods of log pulling, so that the extraction process is not interrupted.

S202, cleaning and converting the extracted stock data;

Data cleansing processes data that is not needed or that does not comply with specifications. Performing data cleansing after data extraction avoids mistakenly discarding required data during the extraction stage, and also allows the extracted original data to be checked again.

The objects of data cleansing are mainly incomplete data (missing values), erroneous data (abnormal values), duplicate data, and data of different types that needs to be normalized.

Cleaning the extracted stock data includes one or more of: null-value processing, data correctness verification, data format standardization, data transcoding, and data standardization.

Null-value processing replaces null values with specific values or filters them out directly, according to business needs. It targets incomplete data, such as a missing supplier name, branch company name, or customer region, or a master table that cannot be matched with its detail table in the business system.

Data correctness verification handles data that does not meet requirements in a uniform way, for example by filtering out non-date strings in a date field or replacing a string in a quantity field with 0. It targets erroneous data, which arises when a business system is not robust enough and writes input directly to the back-end database without validating it, so the data must be corrected at this stage.

Data format standardization brings data formats into a standard form, for example formatting all dates and times as yyyy-MM-dd HH:mm:ss.

Data transcoding converts a field represented by a code into a value that expresses its actual meaning, by joining with a code table.

Data standardization unifies how data is represented. For example, if the data source table represents gender as "male" and "female", cleaning unifies these into specific values, such as "1" for male and "0" for female. Data standardization also includes normalization, which converts absolute numbers into relative numbers. Because absolute numbers of different dimensions are not comparable, they must be converted into relative numbers against a common standard. The main normalization methods include min-max normalization, mean-variance normalization, and nonlinear normalization.
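
The cleaning rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`region`, `created`, `quantity`, `gender`), the input date layout `%Y/%m/%d %H:%M`, and the default region value are hypothetical, and `min_max` shows just one of the normalization methods named in the text.

```python
from datetime import datetime

GENDER_CODES = {"male": "1", "female": "0"}  # encoding taken from the example in the text


def clean_row(row, default_region="UNKNOWN"):
    """Return a cleaned copy of row, or None if the row must be filtered out."""
    out = dict(row)
    # Null-value processing: replace a missing region with a specific value.
    if not out.get("region"):
        out["region"] = default_region
    # Correctness verification: filter out rows whose date field is not a date.
    try:
        parsed = datetime.strptime(out.get("created"), "%Y/%m/%d %H:%M")
    except (ValueError, TypeError):
        return None
    # Format standardization: yyyy-MM-dd HH:mm:ss.
    out["created"] = parsed.strftime("%Y-%m-%d %H:%M:%S")
    # Correctness verification: a non-numeric quantity is replaced with 0.
    try:
        out["quantity"] = int(out.get("quantity"))
    except (ValueError, TypeError):
        out["quantity"] = 0
    # Standardization: map the gender text to the agreed code.
    out["gender"] = GENDER_CODES.get(out.get("gender"), out.get("gender"))
    return out


def min_max(values):
    """Min-max normalization: rescale absolute numbers into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


cleaned = clean_row(
    {"region": None, "created": "2020/09/08 10:30", "quantity": "abc", "gender": "male"}
)
```

Running several such passes over the temporary table corresponds to the multiple rounds of cleaning mentioned in the description.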

In the actual process, multiple rounds of data cleaning may be required, i.e., cleaning is performed repeatedly to find some unsatisfactory data and process it.

Converting the extracted stock data includes one or both of data type conversion and data format conversion.

Because the requirements of the data source table and the target data table are different, the databases are different, and the final purposes are different, data type conversion and data format conversion are required.

Data type conversion refers to converting the type of the stock data. The same kind of data may be represented differently in different business systems. For example, the same supplier may have the code XX0001 in a settlement system and the code YY0001 in another system, so the codes need to be converted to unify them.

In addition, the data type conversion also includes conversion of data granularity, for example, if the data in the data source table belongs to detail data, and the target data table only needs relatively rough data, then the stock data can be aggregated according to the granularity of the target data table, so as to achieve the purpose of data reduction.
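
As an illustration of granularity conversion, the sketch below aggregates daily detail rows to monthly totals per supplier. The row layout (`supplier`, `date`, `amount`) and the choice of monthly granularity are assumptions made for the example, not details from the source.

```python
from collections import defaultdict


def aggregate_by_month(detail_rows):
    """Sum detail amounts per (supplier, month) to match a coarser target table."""
    totals = defaultdict(float)
    for row in detail_rows:
        month = row["date"][:7]  # 'yyyy-MM' prefix of a 'yyyy-MM-dd' date string
        totals[(row["supplier"], month)] += row["amount"]
    return dict(totals)


monthly = aggregate_by_month([
    {"supplier": "YY0001", "date": "2020-09-01", "amount": 10.0},
    {"supplier": "YY0001", "date": "2020-09-15", "amount": 5.5},
    {"supplier": "YY0001", "date": "2020-10-02", "amount": 1.0},
])
```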

The data format conversion refers to adjusting the data format to a target format.

In addition, in this step, in addition to the processing of data cleansing and data conversion, other data processing procedures, such as data encryption and decryption, field mapping, data calculation, data replacement, data filtering, data merging and data splitting, may be added as needed. The present embodiment can package these functions into components and set them as required, so that the system has corresponding functions, and data can be shared between the components through a data bus.

The present embodiment may perform data cleansing, data conversion, and similar operations through SQL statements. For example, a WHERE condition may be added to an SQL query statement for filtering, fields may be renamed (aliased) in the query to map them to the fields of the destination table, and the substr function, CASE conditional expressions, and the like may be used.
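
The SQL techniques listed above can be combined in a single query. The sketch below uses Python's built-in `sqlite3` module purely as a convenient stand-in database; the `staging` table, its columns, and the recoding rule from an XX prefix to a YY prefix are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging (supplier_code TEXT, sex TEXT, amount REAL);
    INSERT INTO staging VALUES ('XX0001', 'male', 12.5),
                               ('XX0002', 'female', -1.0),
                               ('XX0003', 'female', 30.0);
""")

rows = conn.execute("""
    SELECT 'YY' || substr(supplier_code, 3) AS vendor_code,     -- substr + alias for field mapping
           CASE sex WHEN 'male' THEN '1' ELSE '0' END AS gender, -- CASE for standardization
           amount
    FROM staging
    WHERE amount >= 0                                            -- WHERE for filtering
    ORDER BY supplier_code
""").fetchall()
```

Here the negative amount is filtered out, the supplier code is recoded, and the gender text is mapped to the codes used in the target table.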

S203, loading the converted stock data from the data temporary table to the target data table.

After data extraction, cleansing, and conversion, the inventory data needs to be loaded into the target data table for use.

Data loading is the last step of data transfer. The manner in which data is loaded depends on the type of operation being performed and how much data needs to be loaded. The embodiment provides two loading modes: the first is to operate directly through SQL statements; the second is to load data in batches using a loading tool. And loading the converted stock data from the data temporary table to the target data table through an SQL statement or by adopting a loading tool.

Loading through SQL statements is simple and easy to use, for example with insert, update, and delete statements, although some functions and purposes cannot be achieved through SQL statements alone. When data is loaded in this way, the operations are logged, so recovery is easy.

The second mode may load data using tools such as bcp and bulk, or using a batch loading tool or API of the relational database. This embodiment preferably uses the second mode because its loading efficiency is high. When loading in this mode, data can be loaded with multithreaded parallel processing, which improves the running efficiency of the program.
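
Batch loading can be approximated with a database driver's bulk-insert call. The following sketch uses `executemany` from Python's `sqlite3` module as a stand-in for a vendor bulk loader; the `target` table, its columns, and the batch size are assumptions, and the multithreaded variant mentioned above is deliberately left out to keep the example short.

```python
import sqlite3
from itertools import islice


def load_in_batches(conn, rows, batch_size=1000):
    """Insert rows into the hypothetical `target` table, one transaction per batch."""
    iterator = iter(rows)
    total = 0
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        with conn:  # commits on success, rolls back the batch on error
            conn.executemany("INSERT INTO target (id, name) VALUES (?, ?)", batch)
        total += len(batch)
    return total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT)")
loaded = load_in_batches(conn, ((i, f"row-{i}") for i in range(2500)), batch_size=1000)
```

Committing per batch keeps each transaction small, so a failure affects only the batch in flight rather than the whole load.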

S103, monitoring an archive log of a data source table through a CDC task execution module, and transferring incremental data generated after a data cutting time point in the data source table to the target data table according to the archive log;

In this embodiment, the CDC architecture is based on a publisher/subscriber model. The publisher captures the incremental data and provides it to the subscriber, and the subscriber uses the incremental data obtained from the publisher. The publisher first identifies the data source tables from which incremental data is to be captured. The captured incremental data is then saved in the created change table, which makes it easy for the subscriber to access. Because the incremental data may be of different types or characteristics and the subscriber may need only certain types or characteristics of data, the subscriber can extract just the desired target data from the incremental data.

In this step, incremental data is synchronized using CDC, which has little impact on the system: it does not put much pressure on the business system and does not affect existing services. Its monitoring range is also wide, which helps to track changes to the tables and to trace table operations back to their source. Operation is relatively simple, and the corresponding synchronization can be achieved without complex business processing logic.

Similar to the ETL task execution module described above, the first step of the CDC task execution module is to extract data, here the incremental data generated after the data cutting time point. That is, the present embodiment extracts data by timestamp.

Specifically, a timestamp field may be added to the data source table, and the timestamp is set whenever data in the data source table is updated. During data extraction, whether a piece of data should be extracted can be determined by comparing its timestamp with the data cutting time point.

When this approach is adopted, the archive log of the data source table can be monitored, and once the stock data in the data source table has been extracted, the incremental data must be synchronized promptly.

In particular, the incremental data capture function may be enabled for the database in which the data source table resides. Once this is done, the function can be enabled for any data source table in that database. The incremental data capture function can record insert, update, and delete operations on the data source table. In other words, the function is enabled at database scope, so that incremental data can be captured for any data source table in the database.

In one embodiment, as shown in fig. 3, the step S103 includes:

S301, acquiring a data source table to be transferred, creating a change table, setting a publisher and a subscriber, subscribing the data source table for the subscriber and activating a subscription process;

This step determines the data source table so that the incremental data in it can be transferred to the target data table. The incremental data may be defined to include: data inserted into the data source table after the data cutting time point; data updated in the data source table after the data cutting time point; and data deleted from the data source table after the data cutting time point.
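
The three kinds of incremental data can be replayed onto a target keyed by primary key. This is a minimal sketch under assumptions: the change record layout (`ts`, `op`, `key`, `row`) is hypothetical, timestamps are plain numbers for brevity, and the target is an in-memory dictionary rather than a real table.

```python
def apply_changes(target, changes, cut_time):
    """Replay change records newer than cut_time onto target (key -> row)."""
    for change in sorted(changes, key=lambda c: c["ts"]):
        if change["ts"] <= cut_time:
            continue  # already covered by the stock (ETL) transfer
        if change["op"] in ("insert", "update"):
            target[change["key"]] = change["row"]
        elif change["op"] == "delete":
            target.pop(change["key"], None)
    return target


target = {1: {"name": "a"}, 2: {"name": "b"}}
changes = [
    {"ts": 5, "op": "update", "key": 1, "row": {"name": "old"}},  # before the cut: skipped
    {"ts": 11, "op": "insert", "key": 3, "row": {"name": "c"}},
    {"ts": 12, "op": "update", "key": 1, "row": {"name": "a2"}},
    {"ts": 13, "op": "delete", "key": 2, "row": None},
]
apply_changes(target, changes, cut_time=10)
```

Sorting by timestamp before replaying matters when the same key is changed more than once, because the last change must win.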

Each data source table requires a change table. The name of the change set may be specifically passed as a variable. A change set is a collection of change tables.

The publisher is a database user who creates and maintains change tables. The role of the publisher is to identify and extract the incremental data and provide it to subscribers. In order to be able to subscribe to incremental data, the subscriber must have SELECT rights on the data source table and the change table.

There may be multiple subscribers, and each subscriber may be subscribed to the required data source table.

In addition, fields which can be subscribed in the subscribed data source table can be set, so that a subscriber can only see the fields specified by the subscribed data source table. To implement this function, the present embodiment can subscribe via a subscription handle. The subscription handle enables the subscriber to manage the required change tables and fields. The subscription handle returns the handle value in the form of a variable, for example, a variable is defined in the session to receive the handle value. After the subscription handle is established, the desired incremental data can be subscribed to. When subscribing to incremental data, data source tables and fields can be specified.

When the data source tables and fields are specified, the subscription process may be activated. Even if multiple tables are subscribed to, the subscription need only be activated once.

S302, setting a view display window, adding a subscriber view into the view display window, accessing and extracting incremental data in a change table according to the subscriber view, and then deleting the subscriber view and clearing the CDC window.

Data in the data source table changes continuously, for example, operations such as insertion, update, or deletion occur, and the present embodiment may view the data change in the data source table through the view display window.

After the view display window is set, a view is also provided to the subscriber so that the subscriber can view the incremental data. For example, one view may be set for each data source table to which a subscriber subscribes.

The subscriber view will display the incremental data needed. Of course, other additional information may also be included, such as the type of operation, the time of the operation, the operator, etc., depending on the setting. The subscriber view may then be accessed and the incremental data required extracted as needed for loading into the target data table.

After the incremental data is extracted and loaded, the subscriber view can be deleted and the view display window cleared, which can facilitate subsequent creation of a new view display window to view other incremental data.
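
The publisher, change table, subscription, subscriber view, and window clearing described in steps S301 and S302 can be mimicked with an in-memory toy model. The classes and method names below (`Publisher`, `Subscription`, `capture`, `view`, `purge`) are hypothetical and do not correspond to any specific database product's API; a real deployment would use the CDC facilities of the source database.

```python
class Publisher:
    """Owns one change table (a list of change rows) per data source table."""

    def __init__(self):
        self.change_tables = {}

    def create_change_table(self, source):
        self.change_tables[source] = []

    def capture(self, source, op, row):
        # Record an insert/update/delete together with the affected row.
        self.change_tables[source].append({"op": op, **row})


class Subscription:
    """Plays the role of the subscription handle held by a subscriber."""

    def __init__(self, publisher):
        self.publisher = publisher
        self.fields = {}    # source table -> subscribed field names
        self.offsets = {}   # source table -> number of change rows already consumed
        self.active = False

    def subscribe(self, source, fields):
        self.fields[source] = list(fields)
        self.offsets[source] = 0

    def activate(self):
        self.active = True  # activated once, however many tables are subscribed

    def view(self, source):
        """Subscriber view: pending changes, restricted to the subscribed fields."""
        if not self.active:
            raise RuntimeError("subscription has not been activated")
        cols = ["op"] + self.fields[source]
        pending = self.publisher.change_tables[source][self.offsets[source]:]
        return [{c: r[c] for c in cols} for r in pending]

    def purge(self, source):
        """Clear the window so that the next view shows only newer changes."""
        self.offsets[source] = len(self.publisher.change_tables[source])


pub = Publisher()
pub.create_change_table("orders")
sub = Subscription(pub)
sub.subscribe("orders", ["id", "status"])
sub.activate()
pub.capture("orders", "insert", {"id": 1, "status": "new", "note": "internal"})
seen = sub.view("orders")   # the unsubscribed "note" field is not visible
sub.purge("orders")
```

Tracking a per-subscription offset, rather than deleting rows from the change table, is a design choice of this sketch that lets several subscribers read the same change table independently.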

In this embodiment, the ETL and CDC data exchange modes are integrated into a single data acquisition and exchange mechanism: massive stock data is synchronized quickly and incremental data is synchronized in near real time, which meets the requirement of complete data synchronization from the data source table to the target data table, and the synchronization of stock data and incremental data and the handover between them need not be handled separately. The embodiment can effectively reduce the configuration complexity of data exchange jobs, improve the timeliness of data exchange, improve working efficiency and reduce cost.

S104, performing K-means clustering on the target data set in the target data table to obtain a corresponding data clustering result; the data clustering result comprises a plurality of data clustering clusters, and each data clustering cluster corresponds to one clustered data label;

After the ETL and CDC data exchange modes are integrated into a single data acquisition and exchange mechanism, the data in the data source table is synchronized to the target data table, that is, the data is stored in the data sharing exchange platform. To enable the data in the target data table to be distributed to the corresponding target servers, the data can then be clustered and classified within the data sharing exchange platform.

In one embodiment, as shown in fig. 4, the step S104 includes:

S401, selecting, from the target data set, as many pieces of target data as the preset number of clustering clusters, and taking the selected target data as the initial clustering centers of the clusters;

S402, dividing the target data set according to the dissimilarity value between each piece of data in the target data set and each initial clustering center to obtain an initial clustering result;

S403, acquiring the adjusted clustering center of each cluster according to the initial clustering result;

S404, re-dividing the target data set according to the dissimilarity value between each piece of data in the target data set and each adjusted clustering center, until the clustering result has remained unchanged for more than a preset number of times, to obtain a data clustering result corresponding to the preset number of clustering clusters.

In this embodiment, when the target data set is clustered, one field of the target data (for example, the economic strength of a user) is selected as the primary attribute, and the other fields serve as attribute data. Once clustering is complete, the massive target data set has been grouped quickly into a plurality of clustering clusters, which together form the data clustering result.

In the embodiment of the invention, the number k of clustering clusters is determined first. Then k pieces of target data are randomly selected from the target data set as the initial clustering centers of the clusters. Next, the dissimilarity value between each piece of data in the target data set and each initial clustering center is calculated, and the target data set is preliminarily divided according to the dissimilarity values to obtain an initial clustering result. The adjusted clustering center of each cluster is then obtained from the initial clustering result, and the data is re-divided in the same way until the clustering result has remained unchanged for more than the preset number of times, which yields the data clustering result corresponding to the preset number of clustering clusters.

The dissimilarity value may be a distance between data, and the smaller the dissimilarity value, the more similar the data, so the division of data is performed according to the dissimilarity value.
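The procedure of steps S401 to S404 can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the invention: the data is assumed to be one-dimensional and numeric, the dissimilarity value is assumed to be the absolute difference, the adjusted center is assumed to be the mean of the cluster members, and the names `kmeans_1d`, `stable_rounds`, `max_iter` and `seed` are hypothetical.

```python
import random


def kmeans_1d(data, k, stable_rounds=2, max_iter=100, seed=0):
    """Cluster 1-D numeric values into k clusters (illustrative sketch).

    Dissimilarity is the absolute difference between a value and a center.
    Iteration stops once the assignment has stayed the same for more than
    `stable_rounds` consecutive passes (the "preset number of times").
    """
    rng = random.Random(seed)
    # S401: pick k distinct data values as the initial cluster centers.
    centers = rng.sample(sorted(set(data)), k)
    previous, unchanged = None, 0
    for _ in range(max_iter):
        # S402 / S404: assign each value to the center with the smallest dissimilarity.
        labels = [min(range(k), key=lambda c: abs(x - centers[c])) for x in data]
        unchanged = unchanged + 1 if labels == previous else 0
        if unchanged > stable_rounds:
            break
        previous = labels
        # S403: move each center to the mean of its current members.
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:  # an empty cluster keeps its old center
                centers[c] = sum(members) / len(members)
    return labels, centers
```

For instance, `kmeans_1d([1, 2, 3, 100, 101, 102], k=2)` separates the three small values from the three large ones; a real data set would typically use several attribute fields and a vector distance instead of a single number.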

For example, after the target data set is divided with the label of the user's economic strength as the main attribute, each cluster in the data clustering result corresponds to one level of consumption strength. The data sharing and exchange platform then sends each data cluster in the data clustering result to the corresponding service server according to the customer-group requirements of the other service servers (which, like the first server, are service servers deployed in subsidiaries of the group).

S105, acquiring data demand information of each service server, and distributing each data clustering cluster in the data clustering result to the corresponding service server according to the data demand information; the data demand information of each service server comprises a plurality of clustering data labels.

The data required by each service server differs, so a service server must specify its data demand information and include it in its data request; after the request is received, the data demand information can be extracted from it and the data distributed in a targeted manner.

In one embodiment, as shown in fig. 5, the step S105 includes:

S501, acquiring data demand information of each service server;

S502, analyzing the data demand information, and extracting the clustering data labels in the data demand information;

S503, matching the corresponding data clusters according to the clustering data labels;

S504, distributing the data clusters to the corresponding service server.

In this embodiment, each service server has clustering data labels that it needs. The data sharing and exchange platform may therefore first analyze the data demand information of each service server, determine the data set formed by the data clusters that the service server needs, and then send that data set to the corresponding service server.

For example, if the data demand information of service server A specifies the million annual salary group and the fifty-thousand annual salary group, the clustering data labels that meet this condition are selected from the clustering data labels of the data clusters, the data set formed by the corresponding data clusters is obtained, and the data sharing and exchange platform sends that data set to service server A. Data is distributed and synchronized to the other service servers in the same way.
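The label matching of steps S501 to S504 can be illustrated with a short sketch. It assumes that the data demand information has already been parsed into a list of clustering data labels per service server; the function name `distribute_clusters`, the label strings and the user identifiers are hypothetical and serve only as an example.

```python
def distribute_clusters(clusters_by_label, demand_by_server):
    """Return, per server, the records of every cluster whose label it requested."""
    deliveries = {}
    for server, wanted_labels in demand_by_server.items():
        # S502: the demand information is assumed to be already parsed into labels.
        # S503: match each requested label to its cluster; unknown labels are skipped.
        matched = [
            clusters_by_label[label]
            for label in wanted_labels
            if label in clusters_by_label
        ]
        # S504: the merged data set is what would be sent to the server.
        deliveries[server] = [record for cluster in matched for record in cluster]
    return deliveries


clusters = {
    "annual_salary_1m": ["user_1", "user_2"],
    "annual_salary_50k": ["user_3"],
    "annual_salary_10k": ["user_4", "user_5"],
}
demand = {"server_a": ["annual_salary_1m", "annual_salary_50k"]}
result = distribute_clusters(clusters, demand)
```

Here `result["server_a"]` contains the records of the two requested clusters only; the third cluster is not sent because service server A did not request its label.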

Embodiments of the present invention further provide a one-step data exchange device, where the one-step data exchange device is configured to perform any one of the embodiments of the one-step data exchange method. Specifically, referring to fig. 6, fig. 6 is a schematic block diagram of a one-step data exchange device according to an embodiment of the present invention. The one-step data exchange device 600 may be configured in a server.

The one-step data exchange apparatus 600 may include:

a configuration unit 601, configured to determine a data source table, a target data table, and a data exchange start time point to be exchanged, and use a current time as a data cutting time point;

an ETL executing unit 602, configured to transfer, by an ETL task executing module, inventory data generated between a data exchange starting time point and a data cutting time point in the data source table to the target data table;

a CDC execution unit 603, configured to monitor an archive log of the data source table through a CDC task execution module, and transfer incremental data generated after the data cutting time point in the data source table to the target data table according to the archive log;

a clustering unit 604, configured to perform K-means clustering on the target data set in the target data table to obtain a corresponding data clustering result; the data clustering result comprises a plurality of data clustering clusters, and each data clustering cluster corresponds to one clustered data label;

a distributing unit 605, configured to obtain data demand information of each service server, and distribute each data cluster in the data clustering result to the corresponding service server according to the data demand information; the data demand information of each service server comprises a plurality of clustering data labels.

In one embodiment, as shown in fig. 7, the ETL execution unit 602 includes:

an extracting unit 701, configured to extract stock data generated between a data exchange start time point and a data cutting time point in the data source table into the data temporary table;

a cleaning and conversion unit 702, configured to clean and convert the extracted stock data;

a loading unit 703, configured to load the converted stock data from the temporary data table into the target data table.

In an embodiment, as shown in fig. 8, the CDC execution unit 603 includes:

a first setting unit 801, configured to acquire a data source table to be transferred, create a change table, set a publisher and a subscriber, subscribe the data source table for the subscriber, and activate a subscription process;

a second setting unit 802, configured to set a view display window, add a subscriber view in the view display window, access and extract incremental data in the change table according to the subscriber view, then delete the subscriber view, and clear the CDC window.
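The subscribe, extract and clear cycle handled by the two setting units can be illustrated with a small in-memory simulation. This is a sketch under stated assumptions and not the API of any database product: the change table is modeled as a Python list, the subscriber window as a pair of time marks, and the names `ChangeTable`, `Subscriber`, `extract_window` and `apply_changes` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeTable:
    """In-memory stand-in for a change table fed from the archive log."""
    rows: list = field(default_factory=list)  # (timestamp, operation, key, value)

    def record(self, ts, op, key, value=None):
        self.rows.append((ts, op, key, value))


@dataclass
class Subscriber:
    change_table: ChangeTable
    low_mark: float  # changes at or before this time are already consumed

    def extract_window(self, high_mark):
        """Return changes in (low_mark, high_mark], then clear the window."""
        window = [r for r in self.change_table.rows if self.low_mark < r[0] <= high_mark]
        self.low_mark = high_mark  # the same changes are never read twice
        return sorted(window, key=lambda r: r[0])


def apply_changes(target, changes):
    """Replay inserts, updates and deletes against the target table (a dict)."""
    for _, op, key, value in changes:
        if op == "delete":
            target.pop(key, None)
        else:  # "insert" or "update"
            target[key] = value
    return target
```

Setting `low_mark` to the data cutting time point means that only incremental data generated after the cut is replayed, while earlier changes are left to the ETL task.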

In an embodiment, as shown in fig. 9, the clustering unit 604 includes:

a selecting unit 901, configured to select, from the target data set, as many target data items as the preset number of clusters, and use each selected item as the initial clustering center of one cluster;

an initial dividing unit 902, configured to divide the target data set according to the dissimilarity value between each data item in the target data set and each initial clustering center, so as to obtain an initial clustering result;

a center obtaining unit 903, configured to obtain the adjusted clustering center of each cluster according to the initial clustering result;

a re-partitioning unit 904, configured to re-divide the target data set according to the dissimilarity value between each data item in the target data set and the adjusted clustering centers, until the clustering result has remained unchanged for more than a preset number of times, so as to obtain a data clustering result corresponding to the preset number of clusters.

In one embodiment, as shown in fig. 10, the distribution unit 605 includes:

an information obtaining unit 1001 configured to obtain data requirement information of each service server;

an extracting unit 1002, configured to analyze the data demand information and extract the clustering data labels therein;

a matching unit 1003, configured to match a corresponding data cluster according to the clustered data tag;

a cluster distributing unit 1004, configured to distribute the data cluster to a corresponding service server.

In one embodiment, cleaning the extracted stock data includes one or more of: null value processing, data correctness verification, data format normalization, data transcoding, and data standardization; converting the extracted stock data includes one or both of data type conversion and data format conversion.
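A few of the cleaning and conversion operations listed above can be sketched as follows. The sketch assumes that each extracted row is a dictionary of strings; the field names `id`, `name`, `income` and `created`, the DD/MM/YYYY input date format and the function name `clean_and_convert` are hypothetical choices made for illustration only.

```python
from datetime import datetime


def clean_and_convert(rows):
    """Clean and convert raw rows (dicts of strings) extracted from the source table."""
    cleaned = []
    for row in rows:
        # Null value processing: drop rows without a key, default a missing name.
        if not row.get("id"):
            continue
        name = (row.get("name") or "unknown").strip()
        # Correctness verification and data type conversion: income must be numeric.
        try:
            income = float(row.get("income", ""))
        except (TypeError, ValueError):
            continue
        # Data format conversion: normalize dates from DD/MM/YYYY to ISO 8601.
        created = datetime.strptime(row["created"], "%d/%m/%Y").date().isoformat()
        cleaned.append(
            {"id": int(row["id"]), "name": name, "income": income, "created": created}
        )
    return cleaned
```

Rows that fail verification are simply discarded in this sketch; an actual deployment might instead route them to an error table for review.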

In one embodiment, the loading unit 703 includes:

a loading subunit, configured to load the converted stock data from the data temporary table into the target data table through an SQL statement or by using a loading tool.
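Loading through an SQL statement can be illustrated with an `INSERT ... SELECT` statement. The sketch below uses an in-memory SQLite database purely as a stand-in for the actual database; the table names `staging_table` and `target_table`, their columns and the function name `load_from_staging` are assumptions for illustration.

```python
import sqlite3


def load_from_staging(conn):
    """Load converted rows from the temporary (staging) table into the target table."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT INTO target_table (id, name, income) "
            "SELECT id, name, income FROM staging_table"
        )


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE staging_table (id INTEGER PRIMARY KEY, name TEXT, income REAL);
    CREATE TABLE target_table (id INTEGER PRIMARY KEY, name TEXT, income REAL);
    INSERT INTO staging_table VALUES (1, 'Alice', 1000000.0), (2, 'Bob', 50000.0);
    """
)
load_from_staging(conn)
```

Running the load inside a single transaction means that the target data table receives either all of the converted stock data or none of it.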

The one-step data exchange device 600 provided by the embodiment of the invention can effectively reduce the configuration complexity of data exchange operation, improve the timeliness of data exchange, improve the working efficiency and reduce the cost.

The one-step data exchange apparatus 600 described above may be implemented in the form of a computer program that can be run on a computer device as shown in fig. 11.

Referring to fig. 11, fig. 11 is a schematic block diagram of a computer device according to an embodiment of the present invention. The computer device 1100 is a server, and the server may be an independent server or a server cluster composed of a plurality of servers.

Referring to fig. 11, the computer device 1100 includes a processor 1102, a memory, and a network interface 1105 connected by a system bus 1101, where the memory may include a non-volatile storage medium 1103 and an internal memory 1104.

The non-volatile storage medium 1103 may store an operating system 11031 and computer programs 11032. The computer programs 11032, when executed, may cause the processor 1102 to perform a one-step data exchange method.

The processor 1102 is configured to provide computing and control capabilities that support the operation of the overall computing device 1100.

The internal memory 1104 provides an environment for the execution of the computer program 11032 in the non-volatile storage medium 1103, and the computer program 11032, when executed by the processor 1102, may cause the processor 1102 to perform a one-step data exchange method.

The network interface 1105 is used for network communication, such as the transmission of data information. Those skilled in the art will appreciate that the configuration shown in fig. 11 is a block diagram of only the part of the configuration that is relevant to aspects of the present invention and does not limit the computer device 1100 to which aspects of the present invention may be applied; a particular computer device 1100 may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.

Wherein the processor 1102 is configured to run a computer program 11032 stored in the memory to implement the following functions:

In an embodiment, the processor 1102, when executing the step of transferring the stock data generated between the data exchange starting time point and the data cutting time point in the data source table to the target data table by the ETL task execution module, performs the following operations: extracting stock data generated between the data exchange starting time point and the data cutting time point in the data source table into a data temporary table; cleaning and converting the extracted stock data; and loading the converted stock data into the target data table from the data temporary table.

In an embodiment, when the step of monitoring the archive log of the data source table by the CDC task execution module and transferring the incremental data generated after the data cutting time point in the data source table to the target data table according to the archive log is executed, the processor 1102 performs the following operations: acquiring a data source table to be transferred, creating a change table, setting a publisher and a subscriber, subscribing the data source table for the subscriber and activating a subscription process; setting a view display window, adding a subscriber view into the view display window, accessing and extracting incremental data in a change table according to the subscriber view, and then deleting the subscriber view and clearing the CDC window.

In an embodiment, when the step of performing K-means clustering on the target data set in the target data table to obtain a corresponding data clustering result is performed, the processor 1102 performs the following operations: selecting, from the target data set, as many target data items as the preset number of clusters, and taking each selected item as the initial clustering center of one cluster; dividing the target data set according to the dissimilarity value between each data item in the target data set and each initial clustering center to obtain an initial clustering result; obtaining the adjusted clustering center of each cluster according to the initial clustering result; and re-dividing the target data set according to the dissimilarity value between each data item in the target data set and the adjusted clustering centers, until the clustering result has remained unchanged for more than a preset number of times, thereby obtaining a data clustering result corresponding to the preset number of clusters.

In an embodiment, when the processor 1102 performs the step of acquiring the data demand information of each service server and distributing each data cluster in the data clustering result to the corresponding service server according to the data demand information, the following operations are performed: acquiring data demand information of each service server; analyzing the data demand information, and extracting a clustering data label in the data demand information; matching the corresponding data clustering cluster according to the clustering data label; and distributing the data clustering cluster to a corresponding service server.

In one embodiment, the processor 1102, when executing the step of loading the converted stock data from the data temporary table into the target data table, performs the following operation: loading the converted stock data from the data temporary table into the target data table through an SQL statement or by using a loading tool.

Those skilled in the art will appreciate that the embodiment of a computer device illustrated in fig. 11 does not constitute a limitation on the specific construction of the computer device, and that in other embodiments a computer device may include more or fewer components than those illustrated, or some components may be combined, or a different arrangement of components. For example, in some embodiments, the computer device may only include a memory and a processor, and in such embodiments, the structures and functions of the memory and the processor are consistent with those of the embodiment shown in fig. 11, and are not described herein again.

It should be appreciated that in embodiments of the present invention, the Processor 1102 may be a Central Processing Unit (CPU), and the Processor 1102 may also be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, etc. Wherein a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

In another embodiment of the invention, a computer-readable storage medium is provided. The computer readable storage medium may be a non-volatile computer readable storage medium. The computer-readable storage medium stores a computer program, wherein the computer program when executed by a processor implements the steps of:

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses, devices and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both, and that, to illustrate clearly the interchangeability of hardware and software, the components and steps of the examples have been described above generally in terms of their functions. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

In the embodiments provided by the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. The above-described apparatus embodiments are merely illustrative. For example, the division of the units is only a logical division, and other divisions are possible in an actual implementation: units having the same function may be grouped into one unit, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may also be an electrical, mechanical or other form of connection.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a storage medium. Based on such understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, or an optical disk.

While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A method of one-step data exchange, comprising:

2. The one-step data exchange method according to claim 1, wherein the transferring, by the ETL task execution module, the stock data generated between the data exchange starting time point and the data cutting time point in the data source table to the target data table comprises:

extracting stock data generated between the data exchange starting time point and the data cutting time point in the data source table into a data temporary table;

cleaning and converting the extracted stock data;

and loading the converted stock data into the target data table from the data temporary table.

3. The method of claim 1, wherein the monitoring, by the CDC task execution module, an archive log of a data source table, and transferring incremental data generated after a data cutting time point in the data source table to the target data table according to the archive log comprises:

acquiring a data source table to be transferred, creating a change table, setting a publisher and a subscriber, subscribing the data source table for the subscriber and activating a subscription process;

setting a view display window, adding a subscriber view into the view display window, accessing and extracting incremental data in a change table according to the subscriber view, and then deleting the subscriber view and clearing the CDC window.

4. The one-step data exchange method according to claim 1, wherein the performing K-means clustering on the target data set in the target data table to obtain a corresponding data clustering result comprises:

selecting, from the target data set, as many target data items as the preset number of clusters, and taking each selected item as the initial clustering center of one cluster;

dividing the target data set according to the dissimilarity value between each data item in the target data set and each initial clustering center to obtain an initial clustering result;

obtaining the adjusted clustering center of each cluster according to the initial clustering result;

and re-dividing the target data set according to the dissimilarity value between each data item in the target data set and the adjusted clustering centers, until the clustering result has remained unchanged for more than a preset number of times, thereby obtaining a data clustering result corresponding to the preset number of clusters.

5. The one-step data exchange method according to claim 1, wherein the acquiring data demand information of each service server and distributing each data cluster in the data clustering result to a corresponding service server according to the data demand information comprises:

acquiring data demand information of each service server;

analyzing the data demand information, and extracting a clustering data label in the data demand information;

matching the corresponding data clustering cluster according to the clustering data label;

and distributing the data clustering cluster to a corresponding service server.

6. The one-step data exchange method according to claim 2, wherein cleaning the extracted stock data comprises one or more of: null value processing, data correctness verification, data format normalization, data transcoding, and data standardization; and converting the extracted stock data comprises one or both of data type conversion and data format conversion.

7. The one-step data exchange method according to claim 2, wherein the loading the converted stock data from the data temporary table into the target data table comprises:

loading the converted stock data from the data temporary table into the target data table through an SQL statement or by using a loading tool.

8. A one-step data exchange apparatus, comprising:

9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the one-step data exchange method according to any one of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to carry out the one-step data exchange method according to any one of claims 1 to 7.