CN118364028A - Data synchronization method and device - Google Patents

Info

Publication number
CN118364028A
Authority
CN
China
Prior art keywords
data
synchronized
value
view
index item
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410478482.5A
Other languages
Chinese (zh)
Inventor
周隽杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
CCB Finetech Co Ltd
Original Assignee
China Construction Bank Corp
CCB Finetech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp, CCB Finetech Co Ltd filed Critical China Construction Bank Corp
Priority to CN202410478482.5A priority Critical patent/CN118364028A/en
Publication of CN118364028A publication Critical patent/CN118364028A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data synchronization method and device, applied to the technical field of data synchronization. The method comprises: acquiring a data view to be synchronized; querying a data importance grading table to determine the importance level of the data to be synchronized; determining alarm parameters corresponding to the data to be synchronized; calculating the value of at least one index item according to the data to be synchronized; determining suspicious data in the data to be synchronized, and sending alarm information to an upstream system and a downstream system; generating multiple alternative views of the data view to be synchronized according to the field name of the data to be synchronized and the value of at least one index item; and determining an alternative view from the multiple alternative views, and correcting the data view to be synchronized according to the determined alternative view. The invention can reduce manual intervention, quickly locate data problems and raise alarms in advance, and manage different data in a coordinated way by providing views.

Description

Data synchronization method and device
Technical Field
The present invention relates to the field of data synchronization technologies, and in particular, to a data synchronization method and apparatus.
Background
This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
In Internet, financial and banking systems, batch synchronization of data from an upstream system to a downstream system is a very common scenario. By mode it is generally classified into incremental synchronization and full synchronization, and by frequency into daily, weekly, monthly and annual synchronization. Architecturally, either the upstream and downstream systems synchronize directly, or each upstream system distributes its data to a centralized or distributed data warehouse, which then synchronizes the data to each downstream system. For very large enterprises, most current architecture specifications adopt the latter form.
Taking a banking system as an example, except for a few closely coupled systems or special scenarios that use direct upstream-downstream synchronization, most data synchronization is relayed through a data warehouse. The upstream source system supplies data to the data warehouse, which is called a data exchange interface. The downstream system subscribes to data from the data warehouse, and the data warehouse can apply custom processing to the data or forward it directly according to the data consumption requirements of the downstream system, which is called a data integration requirement interface.
The disadvantages of direct upstream-downstream synchronization are:
1. It depends on the upstream system unloading and transmitting the data and on the downstream system loading and storing it, so manual intervention is needed when the program in any link fails.
2. The data cannot be planned and managed as a whole through a unified view: it is not possible to observe globally which data is synchronized from where to where, or to check whether the timeliness of the data meets the standard.
3. When several parties find a data quality problem and there are many upstream and downstream link nodes, it is not possible to quickly locate which node's data has the problem.
The disadvantages of relayed synchronization through a data warehouse are:
1. It depends on upstream unloading and transmission, data warehouse processing and relay, and downstream loading and storage, so manual intervention is needed when the program in any link fails.
2. Data problems can only be handled, and accountability pursued, after they occur; effective advance alarming and prevention cannot be achieved.
For some important data, external regulators inspect the data from time to time, and if a data problem is found and no reasonable explanation can be given, the organization is subject to public notification and fines. Examples include the customer number, certificate type, business license number, certificate start and expiration dates, settlement account number and settlement account name. Such data typically passes through many upstream and downstream systems, and once an intermediate system fails or is attacked so that data is tampered with or garbled, all downstream data synchronization is affected. If this data is synchronized directly between upstream and downstream systems, then once such a problem occurs it is very difficult to trace and roll back the data at each node. If relayed synchronization through a data warehouse is used, then even though the system impact of such a problem can be reduced from the perspective of architecture optimization, the problem can still only be handled as an emergency after the fact, because manual operation and maintenance cannot act in advance, and the occurrence of such events cannot be avoided or reduced at the root.
Disclosure of Invention
The embodiment of the invention provides a data synchronization method, which is used to reduce manual intervention, quickly locate data problems and raise alarms in advance, and manage different data in a coordinated way by providing views. The method comprises the following steps:
acquiring a data view to be synchronized; the data view to be synchronized comprises data to be synchronized;
querying a data importance grading table according to the field name of the data to be synchronized, and determining the importance level of the data to be synchronized; the data importance grading table comprises a correspondence between field names and importance levels;
determining alarm parameters corresponding to the data to be synchronized according to the field name of the data to be synchronized and the importance level of the data to be synchronized, wherein the alarm parameters comprise preset threshold ranges corresponding to different index items; the value of an index item reflects the amount of change in the data;
calculating the value of at least one index item according to the data to be synchronized;
determining suspicious data in the data to be synchronized according to the value of at least one index item and the corresponding preset threshold range, and sending alarm information to an upstream system and a downstream system; the alarm information comprises positioning information of the suspicious data;
when receiving an instruction for automatic correction determined by the upstream system and the downstream system according to the alarm information, generating multiple alternative views of the data view to be synchronized according to the field name of the data to be synchronized and the value of at least one index item;
determining an alternative view from the multiple alternative views, and correcting the data view to be synchronized according to the determined alternative view;
and synchronizing the corrected data view to be synchronized to the downstream system.
The embodiment of the invention also provides a data synchronization device, which is used to reduce manual intervention, quickly locate data problems and raise alarms in advance, and manage different data in a coordinated way by providing views. The device comprises:
the acquisition module, used for acquiring a data view to be synchronized; the data view to be synchronized comprises data to be synchronized;
the importance level determining module, used for querying the data importance grading table according to the field name of the data to be synchronized and determining the importance level of the data to be synchronized; the data importance grading table comprises a correspondence between field names and importance levels;
the alarm parameter determining module, used for determining alarm parameters corresponding to the data to be synchronized according to the field name of the data to be synchronized and the importance level of the data to be synchronized, wherein the alarm parameters comprise preset threshold ranges corresponding to different index items; the value of an index item reflects the amount of change in the data;
the index item value determining module, used for calculating the value of at least one index item according to the data to be synchronized;
the alarm information sending module, used for determining suspicious data in the data to be synchronized according to the value of at least one index item and the corresponding preset threshold range, and sending alarm information to an upstream system and a downstream system; the alarm information comprises positioning information of the suspicious data;
the alternative view generation module, used for generating multiple alternative views of the data view to be synchronized according to the field name of the data to be synchronized and the value of at least one index item when receiving an instruction for automatic correction determined by the upstream system and the downstream system according to the alarm information;
the correction module, used for determining an alternative view from the multiple alternative views and correcting the data view to be synchronized according to the determined alternative view;
and the synchronization module, used for synchronizing the corrected data view to be synchronized to the downstream system.
The embodiment of the invention also provides computer equipment, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the data synchronization method when executing the computer program.
The embodiment of the invention also provides a computer readable storage medium, wherein the computer readable storage medium stores a computer program, and the computer program realizes the data synchronization method when being executed by a processor.
Embodiments of the present invention also provide a computer program product comprising a computer program which, when executed by a processor, implements the above-described data synchronization method.
Compared with the prior-art technical schemes of direct upstream-downstream synchronization or relayed synchronization through a data warehouse, the embodiment of the invention acquires a data view to be synchronized, the data view to be synchronized comprising data to be synchronized; queries a data importance grading table according to the field name of the data to be synchronized and determines the importance level of the data to be synchronized, the data importance grading table comprising a correspondence between field names and importance levels; determines alarm parameters corresponding to the data to be synchronized according to the field name and the importance level of the data to be synchronized, the alarm parameters comprising preset threshold ranges corresponding to different index items, and the value of an index item reflecting the amount of change in the data; calculates the value of at least one index item according to the data to be synchronized; determines suspicious data in the data to be synchronized according to the value of at least one index item and the corresponding preset threshold range, and sends alarm information to an upstream system and a downstream system, the alarm information comprising positioning information of the suspicious data; when receiving an instruction for automatic correction determined by the upstream system and the downstream system according to the alarm information, generates multiple alternative views of the data view to be synchronized according to the field name of the data to be synchronized and the value of at least one index item; determines an alternative view from the multiple alternative views and corrects the data view to be synchronized according to the determined alternative view; and synchronizes the corrected data view to be synchronized to the downstream system. In this way, manual intervention can be reduced, data problems can be quickly located and alarmed in advance, and different data can be managed in a coordinated way by providing views.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. In the drawings:
FIG. 1 is a flow chart of a data synchronization method in an embodiment of the invention;
FIG. 2 is a flowchart of a specific example of a data synchronization method according to an embodiment of the present invention;
FIG. 3 is a block diagram illustrating a data synchronization apparatus according to an embodiment of the present invention;
FIG. 4 is a block diagram showing a specific example of a data synchronization apparatus according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a computer device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention will be described in further detail with reference to the accompanying drawings. The exemplary embodiments of the present invention and their descriptions herein are for the purpose of explaining the present invention, but are not to be construed as limiting the invention.
The technical scheme of the application obtains, stores, uses, processes and the like the data, which all meet the relevant regulations of national laws and regulations.
In order to solve the problems of the prior art, reduce manual intervention in the data synchronization process, quickly locate data problems and raise alarms in advance, and manage different data in a coordinated way by providing views, the embodiment of the invention provides a data synchronization method. Fig. 1 is a flowchart of a data synchronization method in an embodiment of the present invention. As shown in Fig. 1, the data synchronization method in an embodiment of the present invention may include:
Step 101, acquiring a data view to be synchronized; the data view to be synchronized comprises data to be synchronized;
Step 102, querying a data importance grading table according to the field name of the data to be synchronized, and determining the importance level of the data to be synchronized; the data importance grading table comprises a correspondence between field names and importance levels;
Step 103, determining alarm parameters corresponding to the data to be synchronized according to the field name of the data to be synchronized and the importance level of the data to be synchronized, wherein the alarm parameters comprise preset threshold ranges corresponding to different index items; the value of an index item reflects the amount of change in the data;
Step 104, calculating the value of at least one index item according to the data to be synchronized;
Step 105, determining suspicious data in the data to be synchronized according to the value of at least one index item and the corresponding preset threshold range, and sending alarm information to an upstream system and a downstream system; the alarm information comprises positioning information of the suspicious data;
Step 106, when receiving an instruction for automatic correction determined by the upstream system and the downstream system according to the alarm information, generating multiple alternative views of the data view to be synchronized according to the field name of the data to be synchronized and the value of at least one index item;
Step 107, determining an alternative view from the multiple alternative views, and correcting the data view to be synchronized according to the determined alternative view;
Step 108, synchronizing the corrected data view to be synchronized to the downstream system.
Fig. 2 is a flowchart of a specific example of a data synchronization method in an embodiment of the present invention. As shown in Fig. 2, in an embodiment, because the accuracy of a single piece of data cannot be guaranteed, the method intelligently generates multiple alternative views. For important data, various processing methods can be adopted, such as setting the value to 0, setting it to empty, taking yesterday's data, taking an average over a certain time range, or removing the whole record. Permutations and combinations of the differently processed data yield a very large number of alternative views. When suspicious data is detected and an alarm is raised, the upstream system can choose to ignore the alarm and keep the data as is, or choose one of the generated alternative views and trigger synchronization again. The downstream system receives the risk prompt for the suspicious data and can either confirm receipt of the modified data, or refuse to receive it and discard the current period's data so that its data stays as is. The overall scheme architecture of the embodiment of the present invention is shown in Table 1 below.
TABLE 1
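The processing options named in this embodiment for a suspicious value (set to 0, set to empty, take yesterday's data, take an average over a time range, remove the whole record) could be sketched as follows. This is only an illustrative sketch: the function name `candidate_values`, the strategy labels and the `DROP_RECORD` sentinel are assumptions and do not come from the described embodiment.

```python
from statistics import mean

# Hypothetical sentinel meaning "remove the whole record".
DROP_RECORD = object()

def candidate_values(history):
    """Return candidate replacements for one suspicious numeric value.

    `history` holds the earlier values of the field, oldest first, so
    history[-1] is yesterday's value.
    """
    return {
        "zero": 0,                    # set the value to 0
        "empty": None,                # leave the field empty
        "yesterday": history[-1],     # reuse yesterday's value
        "average": mean(history),     # mean over the chosen time range
        "drop_record": DROP_RECORD,   # remove the whole record
    }

options = candidate_values([100, 120, 110])
```

Combining such candidates across several suspicious fields is what makes the number of possible alternative views grow quickly.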
In one embodiment, in step 101, the data view to be synchronized that is acquired may be generated by the upstream system from the data to be synchronized, or may be generated by the system implementing the data synchronization method of the embodiment of the present invention after it receives the data to be synchronized sent by the upstream system.
For some important data, such as client numbers, certificate types, business license numbers, certificate start and expiration dates, settlement accounts, settlement account names, etc., the data usually passes through a plurality of systems on the upstream side and the downstream side, and once a certain system in the middle fails or is attacked, the data is tampered or scrambled, so that the synchronization of all downstream data can be affected. If these data are all synchronized directly upstream and downstream between systems, once such problems occur, it is very difficult for the data to trace back and roll back from each node. Therefore, the embodiment of the invention classifies the received data to be synchronized in the data view to be synchronized according to the importance of the data.
The data to be synchronized according to the embodiment of the invention can be divided into: very important, generally important, common, and slight.
Very important: data that must be reported to external regulators, including account amounts, transaction details, transaction orders, customer personal information, and the like. Such information generally must not contain errors. If suspicious data of this kind is detected from upstream, important alarm information needs to be generated to ask the upstream system for confirmation, and an important risk prompt is also given to the downstream system. If the error cannot be prevented in advance, it must be remedied afterwards, and the processing response must be timely.
Generally important: important business process records, business result credentials, and the like. Such information is relatively less important and can tolerate errors. If suspicious data of this kind is detected from upstream, generally important alarm information is generated, the upstream system is asked for confirmation and the downstream system is also given a risk prompt; if an error is confirmed, it can be handled afterwards.
Common: ordinary business operation records, text and pictures published by clients, and the like. If such information is erroneous, a general alarm is given, and whether processing is required is confirmed case by case.
Slight: various redundant backup copies, browsing history, customer comment data, and the like. If such information is erroneous, a slight alarm is given; it is essentially negligible and generally does not require processing.
In this embodiment, in step 102, the importance level of the data to be synchronized may be determined by querying the data importance grading table according to the field name of the data to be synchronized. The data importance grading table may include a correspondence between field names and importance levels. Table 2 is an example of the data importance grading table provided in this embodiment, for a client AUM (ordinary deposits plus various wealth management products) data table that a system synchronizes in batches to a data warehouse. As shown in Table 2 below, the embodiment of the invention divides the data to be synchronized into: 5-very important, 4-more important, 3-important, 2-general, 1-minor. Other numbers of importance levels may also be provided, and the invention is not limited in this regard.
TABLE 2
The importance level of the data to be synchronized is configured according to the importance of the data. For example, in Table 2 the field names "card number" and "current actual deposit" are configured as level 5, i.e. very important. Card number: the card number is a precondition for all checks, and all of them fail once it is wrong; as a primary key it generally cannot be modified. Current actual deposit: the current deposit is typically used for real-time funds transactions, and a value that is too high or too low has a significant impact on actual funds transactions and clearing results.
The customer name and the mobile phone number are positioned at 4-more important. Customer name: its importance is lower than that of the card number; it is usually used together with the card number to check whether the transaction is carried out by the current cardholder, and it changes only when the customer changes their name. Mobile phone number: it is used together with the card number and the customer name to check whether the transaction is carried out by the current cardholder, and it changes only when the customer replaces their mobile phone number.
The month daily-average AUM (Assets Under Management), the year daily-average AUM and the day-end AUM are positioned at 3-important. These statistics are usually used to generate data analysis reports, build customer portraits and determine customer grades, and their importance is relatively lower.
The day-end deposit amount, day-end financing 1, day-end financing 2 and day-end AUM are positioned at 2-general. Day-end statistics are not used for real-time funds transactions; they are usually basic items from which other fields are calculated, and their importance is relatively lower.
The descriptive remarks are positioned at 1-minor. Such text remark fields are usually reference notes entered manually by service personnel, have no practical business significance, and are of minor importance.
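The lookup of an importance level by field name in step 102 can be illustrated with a small sketch. The levels below follow the field descriptions given for Table 2 in the text; the English identifiers, the default level for unknown fields, and the omission of the day-end AUM field are assumptions made for illustration only.

```python
# Levels follow the field descriptions above (5 = very important,
# 1 = minor). Identifier spellings are illustrative assumptions.
IMPORTANCE_TABLE = {
    "card_number": 5,
    "current_actual_deposit": 5,
    "customer_name": 4,
    "mobile_phone_number": 4,
    "month_daily_average_aum": 3,
    "year_daily_average_aum": 3,
    "day_end_deposit_amount": 2,
    "day_end_financing_1": 2,
    "day_end_financing_2": 2,
    "descriptive_remarks": 1,
}

def importance_level(field_name, default=1):
    """Look up the importance level configured for a field name.

    Unknown fields fall back to `default` (an assumed policy).
    """
    return IMPORTANCE_TABLE.get(field_name, default)

level = importance_level("card_number")
```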
In one embodiment, in step 103, according to the field name of the data to be synchronized and the importance level of the data to be synchronized, an alarm parameter corresponding to the data to be synchronized is determined, where the alarm parameter may include a preset threshold range corresponding to different index items; the value of the index item reflects the amount of change in the data.
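One possible way to derive the alarm parameters of step 103 from a field name and an importance level is sketched below. The base ranges, the rule that divides the upper bound by the level, and the per-field override mechanism are all hypothetical; the sketch only follows the principle stated in this description that more important data should trigger an alarm more easily.

```python
# Hypothetical base ranges per index item: the allowed ratio of the
# current value to the index value (for example the longitudinal mean).
BASE_RANGES = {
    "longitudinal_mean": (0.0, 10.0),
    "transverse_mean": (0.0, 20.0),
}

def alarm_parameters(field_name, level, overrides=None):
    """Return {index_item: (low, high)} for one field.

    The upper bound shrinks as the importance level (1..5) grows, so
    more important data leaves its range sooner. Ranges configured for
    a specific field in `overrides` take precedence.
    """
    params = {
        item: (low, high / level)
        for item, (low, high) in BASE_RANGES.items()
    }
    params.update((overrides or {}).get(field_name, {}))
    return params

params = alarm_parameters("current_actual_deposit", level=5)
```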
Numerical data comparison is generally divided into longitudinal and transverse comparison: longitudinal comparison compares the same data at different moments, and transverse comparison compares different data at the same moment. Because there is a great deal of transverse data and comparing it directly is usually not meaningful, the transverse data is first classified by a clustering algorithm and then compared within the same class. The transverse index item of any piece of data in the embodiment of the invention therefore refers to the data of other clients, excluding the client to which the data belongs, under the same field name.
In this embodiment, the index items may include a longitudinal index item and a transverse index item. The value of the longitudinal index item reflects the amount of change of the data to be synchronized compared with historical data within a preset time range; the value of the transverse index item reflects the amount of change of the data to be synchronized compared with data having the same field name. The values of the longitudinal index items may include at least one of: a longitudinal mean value, a longitudinal absolute value mean value, a longitudinal difference value, a longitudinal absolute difference value, a longitudinal maximum value, and a longitudinal maximum absolute value. The values of the transverse index items may include at least one of: a transverse mean value, a transverse absolute value mean value, a transverse difference value, a transverse absolute difference value, a transverse maximum value, and a transverse maximum absolute value. The values of the longitudinal index items may also include a longitudinal contemporaneous approximation, and the values of the transverse index items may also include a transverse contemporaneous approximation. The longitudinal mean value, longitudinal absolute value mean value, longitudinal difference value, longitudinal absolute difference value, longitudinal maximum value and longitudinal maximum absolute value are respectively the mean value, absolute value mean value, difference value, absolute difference value, maximum value and maximum absolute value of the historical data of the data to be synchronized within a preset time range.
"Contemporaneous" in the longitudinal and transverse contemporaneous approximations means that a certain time interval is selected. The approximation can be the mean taken directly, the mean rounded to a certain digit, or the mean taken after outlier data is eliminated; depending on the actual business meaning of the field, the mode, the median and the like can also be used.
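The variants of the contemporaneous approximation listed above can be illustrated with Python's standard `statistics` module. This is a sketch under stated assumptions: the function name, the rounding precision `digits`, and the outlier rule (dropping values further than `z` population standard deviations from the mean) are illustrative choices, since the description does not fix them.

```python
from statistics import mean, median, mode, pstdev

def contemporaneous_approximations(values, digits=-2, z=1.5):
    """Several ways to approximate the values of one time interval."""
    mu, sigma = mean(values), pstdev(values)
    # Assumed outlier rule: keep values within z standard deviations.
    kept = [v for v in values if sigma == 0 or abs(v - mu) <= z * sigma]
    return {
        "mean": mu,                         # mean used directly
        "rounded_mean": round(mu, digits),  # mean rounded to a digit
        "trimmed_mean": mean(kept),         # mean after outlier removal
        "median": median(values),
        "mode": mode(values),
    }

approx = contemporaneous_approximations([100, 100, 110, 90, 1000])
```

In this sample the single extreme value pulls the plain mean far above the typical values, while the trimmed mean, median and mode stay close to them, which is why the choice of approximation depends on the business meaning of the field.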
In this embodiment, in step 104, the value of at least one index item may be calculated according to the data to be synchronized. Taking the day-end deposit amount as an example, for the data to be synchronized with the field name "day-end deposit amount" in table 1, the longitudinal mean may be the average daily deposit over the past month of the client corresponding to the data to be synchronized. If client A's average daily deposit over the past month is 10,000 yuan and the deposit on a given day becomes 500,000 yuan, suspicious data can be determined in the data to be synchronized according to the 10,000 yuan value and the corresponding preset threshold range. The transverse mean value, transverse absolute value mean value, transverse difference value, transverse absolute difference value, transverse maximum value and transverse maximum absolute value are respectively the mean value, absolute value mean value, difference value, absolute difference value, maximum value and maximum absolute value of the data of other clients, excluding the client corresponding to the data to be synchronized, whose field name is the same as that of the data to be synchronized.
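A minimal sketch of computing the longitudinal index values for one field of one client follows. The description does not spell out the reference point of the "difference" items, so the sketch assumes they compare the current value with the longitudinal mean; the function and key names are likewise illustrative.

```python
def longitudinal_index_values(current, history):
    """Compute longitudinal index values for one field of one client.

    Assumption: the difference items compare `current` with the mean
    of `history` (the values inside the preset time range).
    """
    n = len(history)
    mean_value = sum(history) / n
    return {
        "mean": mean_value,
        "abs_mean": sum(abs(v) for v in history) / n,
        "difference": current - mean_value,
        "abs_difference": abs(current - mean_value),
        "maximum": max(history),
        "max_abs": max(abs(v) for v in history),
    }

# Client A: 10,000 yuan per day for a month, then 500,000 yuan.
values = longitudinal_index_values(500_000, [10_000] * 30)
```

The same function can be reused for the transverse items by passing, as `history`, the values of the other clients in the same cluster under the same field name.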
In this embodiment, after the data is classified by the K-Means clustering algorithm, the K centroids of K-Means are the mean values of the classes, which gives the transverse mean among the values of the index items. For example, for the client AUM, subjective assumptions cannot be used to decide which tier a client's AUM has reached or which group it belongs to. K-Means is a common unsupervised clustering algorithm in which the number of classes K is specified in advance. For example, with K=4 and a total of 1000 pieces of transverse data, the data is divided into 4 classes.
The K-Means clustering method may be as follows:
1. Randomly select K points in the sample to serve as the initial center points of the categories.
2. Calculate the distances from every sample to the K center points, compare them, and assign each sample to the category of its nearest center point, so that the samples are divided into K categories in total.
3. Discard the previous center points and calculate a new center point for each of the K categories, so that the sum of the distances between the center point and all samples in its category is minimal.
4. Judge whether the newly obtained center points are the same as the old ones. If not, return to step 2, recalculate the distances between all samples and the K new center points, assign each sample to the category of its nearest new center point, and continue with the following steps. If they are the same, the algorithm has converged and stops.
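The four steps above can be sketched for scalar data such as client AUM values. This is a minimal pure-Python illustration, not the implementation of the embodiment: the function name, the fixed random seed, the iteration cap and the handling of empty classes are assumptions, and the new center in step 3 is taken to be the class mean, consistent with the statement that the K centroids are the mean values of the classes.

```python
import random

def k_means_1d(samples, k, seed=0, max_iter=100):
    """Cluster scalar samples into k classes; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(samples, k)   # step 1: random initial centers
    labels = []
    for _ in range(max_iter):
        # Step 2: assign every sample to its nearest center.
        labels = [min(range(k), key=lambda j: abs(x - centers[j]))
                  for x in samples]
        # Step 3: recompute each center as the mean of its class.
        new_centers = []
        for j in range(k):
            members = [x for x, lab in zip(samples, labels) if lab == j]
            # Keep the old center if a class ends up empty.
            new_centers.append(sum(members) / len(members) if members
                               else centers[j])
        # Step 4: stop once the centers no longer change (convergence).
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

centers, labels = k_means_1d([1.0, 2.0, 3.0, 100.0, 101.0, 102.0], k=2)
```

The returned centers can serve as the transverse mean of each class, and a client's value is then compared only with the center of the class it was assigned to.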
In this embodiment, the value of each index item has a corresponding calculation method. These methods are common algorithms in the prior art and are not described here.
In one embodiment, in step 105, suspicious data is determined in the data to be synchronized according to the value of at least one index item and a corresponding preset threshold range, and alarm information is sent to an upstream system and a downstream system; the alert information may include location information for the suspicious data.
In this embodiment, determining suspicious data in the data to be synchronized according to the value of at least one index item and the corresponding preset threshold range may include: determining that the data to be synchronized is suspicious data when the value of a single index item of the data to be synchronized exceeds the corresponding preset threshold range, or when, in any combination of several index items of the data to be synchronized, the values of all index items in that combination exceed their corresponding preset threshold ranges.
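The single-item and combination rules can be expressed with one small predicate. In this sketch a rule is a group of index item names: a group with one item is a single-item rule, and a group with several items fires only when every item in it is out of range. The function names, the data layout and the sample numbers are assumptions for illustration.

```python
def out_of_range(value, bounds):
    """True when value lies outside the inclusive range (low, high)."""
    low, high = bounds
    return not (low <= value <= high)

def is_suspicious(index_values, thresholds, rules):
    """Apply single-item and combination rules to one piece of data.

    `rules` is a list of groups of index item names. The data is
    suspicious if, for any group, all of its items are out of range.
    """
    return any(
        all(out_of_range(index_values[item], thresholds[item])
            for item in group)
        for group in rules
    )

thresholds = {
    "longitudinal_difference": (-50_000, 50_000),
    "transverse_difference": (-200_000, 200_000),
}
index_values = {
    "longitudinal_difference": 490_000,
    "transverse_difference": 100_000,
}
single = is_suspicious(index_values, thresholds,
                       [["longitudinal_difference"]])
combined = is_suspicious(index_values, thresholds,
                         [["longitudinal_difference", "transverse_difference"]])
```

With these sample numbers the single-item rule flags the data while the combination rule does not, because only one of the two items in the group is out of range.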
In this embodiment, the data synchronization method may further include: determining that data to be synchronized whose field name is card number, customer name, date, or mobile phone number, and whose value has changed, is suspicious data; and determining that data to be synchronized whose field name is remark and which includes irregular encoding is suspicious data.
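A rough sketch of these two field-name rules follows. The field identifiers are hypothetical, and the text does not define "irregular encoding"; the check below assumes it means that the remark contains the Unicode replacement character or a non-printable character.

```python
# Hypothetical identifiers for the fields named in the text.
SENSITIVE_FIELDS = {"card_number", "customer_name", "date", "mobile_number"}


def is_suspicious_field(field_name, old_value, new_value):
    """Flag changed sensitive fields and remarks with irregular characters."""
    if field_name in SENSITIVE_FIELDS:
        # Any change to one of these fields is treated as suspicious.
        return old_value != new_value
    if field_name == "remark":
        # Assumed meaning of "irregular encoding": U+FFFD or any
        # non-printable character appears in the remark text.
        return any(ch == "\ufffd" or not ch.isprintable() for ch in new_value)
    return False
```

A real deployment would substitute whatever definition of irregular encoding the upstream system actually uses.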
In this embodiment, generating multiple alternative views of the data view to be synchronized according to the field name of the data to be synchronized and the value of at least one index item may include: combining the values of one index item corresponding to different field names in the data to be synchronized to generate one alternative view, so that the values of a plurality of index items correspondingly generate a plurality of alternative views. Specific examples are shown in Table 3 below. The higher the importance level of the data, the lower the alarm threshold and the more easily an alarm is triggered; the lower the importance level of the data, the higher the alarm threshold and the less likely an alarm is triggered.
In this embodiment, whether to issue the alarm information may be determined according to Table 3 below. Taking the end-of-day deposit amount as an example: user A's average end-of-day deposit over the past month is 10,000 yuan, and the end-of-day deposit becomes 100,000 yuan; user B's average over the past month is 100,000 yuan, and the deposit becomes 1,000,000 yuan; user C's average over the past month is 1,000,000 yuan, and the deposit becomes 10,000,000 yuan. Numerically, all three deposits become 10 times the original, but user A's deposit increases by only 90,000 yuan, user B's by 900,000 yuan, and user C's by 9,000,000 yuan. In terms of real-world probability and plausibility, A > B > C: A is a plausible, high-probability case, whereas the probabilities of B and C are very low. One may guess that A received a year-end bonus, that B sold a car, and that C sold a house (or that there was an improper transaction or an upstream data error). Low-probability events deserve close attention, so an alarm parameter or a probability-threshold alarm function may be designed to trigger alarm information when an event below a given probability occurs. Different preset threshold ranges can be set for different fields, using different algorithms according to the characteristics of each field; conventional items include longitudinal mean comparison, longitudinal maximum difference comparison, transverse mean comparison, transverse maximum difference comparison, and the like.
For example, for deposits, longitudinal mean comparison can be used to design a piecewise function over the intervals 0 to X1, X1 to X2, ..., Xn-1 to Xn. For the interval in which the average daily amount is 0 to 100, a deposit becoming 1000 times the average daily amount or more is a low-probability event; for the interval 100 to 10,000, becoming 100 times or more is a low-probability event; for the interval 10,000 to 20,000, 90 times or more; for the interval 20,000 to 30,000, 80 times or more; and so on, with the coefficients fitted according to the actual application scenario and historical conditions. The larger the average daily amount, the lower the probability that the deposit multiplies; for example, at the 10,000,000 tier, becoming 1.5 times can already be judged a low-probability event.
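The piecewise rule can be sketched as a small lookup table. Only the bands actually named in the text are encoded; treating each band as half-open, treating the 10,000,000 tier as open-ended, and returning `None` for the bands the text leaves unspecified are assumptions made for this illustration, as are all names.

```python
import math

# (lower bound, upper bound, multiple) per band of average daily amount, in yuan.
# Bands between 30,000 and 10,000,000 are not given in the text and are omitted.
BANDS = [
    (0, 100, 1000),
    (100, 10_000, 100),
    (10_000, 20_000, 90),
    (20_000, 30_000, 80),
    (10_000_000, math.inf, 1.5),  # assumed to extend upward without limit
]


def low_probability_multiple(daily_average):
    """Return the multiple for the band containing daily_average, or None."""
    for low, high, multiple in BANDS:
        if low <= daily_average < high:
            return multiple
    return None  # band not configured


def is_low_probability_event(daily_average, current_deposit):
    """True when the deposit reaches the band's multiple of the daily average."""
    multiple = low_probability_multiple(daily_average)
    if multiple is None or daily_average <= 0:
        return False
    return current_deposit >= multiple * daily_average
```

Under this table, a deposit that grows from a 10,000 yuan average to 100,000 yuan is not flagged, which matches the treatment of user A as a plausible case.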
Assuming that the alert triggered for user C is only a longitudinal-mean alert, whether the final alert is triggered depends on the importance level and alarm parameters of the field name. For example, if the importance level of the end-of-day deposit is "2 - general", the alarm threshold is high, and the final alarm is triggered only when the alarms of a plurality of index items are met simultaneously.
Table 3
In one embodiment, multiple alternative views of the view of the data to be synchronized may be generated based on the field name of the data to be synchronized and the value of at least one index item.
For example, if the data view to be synchronized includes only the monthly daily-average AUM and the end-of-day deposit amount, the values of all index items of both fields may be calculated: for the monthly daily-average AUM, the longitudinal mean a1, the longitudinal absolute value mean a2, the longitudinal difference a3, the longitudinal absolute difference a4, ...; for the end-of-day deposit amount, the longitudinal mean b1, the longitudinal absolute value mean b2, the longitudinal difference b3, the longitudinal absolute difference b4, .... The values a1, a2, a3, a4, ... are then combined with b1, b2, b3, b4, ... respectively to generate multiple alternative views of the data view to be synchronized.
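The pairing of a1 with b1, a2 with b2, and so on can be sketched as a regrouping of per-field index values into one alternative view per index item. The field names, index-item names, and numbers below are hypothetical placeholders.

```python
def candidate_views_by_index_item(index_values):
    """Regroup {field_name: {index_item: value}} into {index_item: {field_name: value}}."""
    views = {}
    for field_name, items in index_values.items():
        for index_item, value in items.items():
            # Each index item collects its value from every field into one view.
            views.setdefault(index_item, {})[field_name] = value
    return views


# Hypothetical values standing in for a1, a3 and b1, b3.
index_values = {
    "monthly_daily_average_aum": {"longitudinal_mean": 2_000_000, "longitudinal_difference": 50_000},
    "end_of_day_deposit": {"longitudinal_mean": 1_000_000, "longitudinal_difference": 5_000},
}
views = candidate_views_by_index_item(index_values)
```

Each key of `views` then corresponds to one alternative view, so two index items yield two alternative views.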
In this embodiment, generating multiple candidate views of the view of the data to be synchronized according to the field name of the data to be synchronized and the value of at least one index item may further include: performing data fitting on the values of a plurality of index items corresponding to different field names in the data to be synchronized by using a preset fitting algorithm to obtain fitting values; and combining fitting values corresponding to different field names in the data to be synchronized to generate an alternative view, wherein a plurality of fitting values correspondingly generate a plurality of alternative views.
For example, a1, a2, a3, a4, ... from the previous example, which correspond to the monthly daily-average AUM in the data to be synchronized, are fitted using a preset fitting algorithm to obtain a fitting value a; b1, b2, b3, b4, ..., which correspond to the end-of-day deposit amount, are fitted using the preset fitting algorithm to obtain a fitting value b; and a and b are combined to generate an alternative view of the data view to be synchronized.
In one embodiment, the alternative views may also be generated with reference to the importance level of the data, as shown in Table 4 below, the higher the importance level of the data, the finer the algorithm for alternative view generation, and the fewer the number of alternative views. The lower the level of importance of the data, the more diverse the algorithms for alternative view generation, and the greater the number of selectable views.
Table 4
In one embodiment, the data synchronization method may further include: if the alternative view is generated by combining the values of one longitudinal index item corresponding to different field names in the data to be synchronized, fitting the real data of the suspicious data according to the historical data of the suspicious data within a preset time range, generating a fitting function of the data to be synchronized in the alternative view, and smoothing the fitting function using the value of the longitudinal index item; if the alternative view is generated by combining the values of one transverse index item corresponding to different field names in the data to be synchronized, fitting the real data of the suspicious data according to data that has the same field name as the suspicious data and belongs, within a preset time range, to customers other than the customer corresponding to the suspicious data, generating a fitting function of the data to be synchronized in the alternative view, and smoothing the fitting function using the value of the transverse index item.
For example, single index 1: the user's fitting function is obtained by fitting past end-of-day deposits with the longitudinal mean, and may be an approximately linearly increasing straight line y = ax + b; for instance, if yesterday's end-of-day deposit is 1,010,000 yuan and the line gives 1,015,000 yuan for today, view data of 1,015,000 yuan is generated for the user's end-of-day deposit today. Single index 2: after the AUM is classified by the K-Means clustering algorithm, the K centroids are the mean values of the respective categories, which yields the value of a transverse-mean index item. Here the fitting function is the process by which K-Means calculates the centroid of the corresponding category; for example, the monthly daily-average AUM of the category to which the user belongs is 2,000,000 yuan and the average deposit is 1,000,000 yuan. A fitting function may be expressed in two-dimensional coordinates, with time on the x-axis and the end-of-day deposit on the y-axis. Because the end-of-day deposit does not follow the rule exactly every day (it may be 1 on the first day, 2000 on the second, 10 on the third, and 50000 on the fourth), the data may be smoothed using the longitudinal mean of the most recent n days, merging n discrete points into one point.
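The straight-line fit and the n-day smoothing can be illustrated as follows. This is a sketch under assumptions: ordinary least squares stands in for the unspecified fitting method, days are numbered 0, 1, 2, ..., amounts are in units of 10,000 yuan, and the function names and sample history are hypothetical.

```python
def fit_line(ys):
    """Least-squares fit of y = a*x + b with x = 0, 1, ..., len(ys) - 1."""
    n = len(ys)  # needs at least two points
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def smooth(ys, n):
    """Merge each run of n consecutive points into one point (their mean)."""
    chunks = [ys[i:i + n] for i in range(0, len(ys), n)]
    return [sum(chunk) / len(chunk) for chunk in chunks]


# Hypothetical end-of-day deposits for three past days, in units of 10,000 yuan.
a, b = fit_line([100.0, 100.5, 101.0])
today = a * 3 + b                          # value on the fitted line for day 3
smoothed = smooth([1, 2000, 10, 50000], 2)  # the irregular series from the text
```

Here the fitted line rises by 0.5 per day, so the view value generated for the next day is 101.5, in line with the example in the text.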
For an alternative view generated from fitting values, for example a fitting value obtained by fitting the longitudinal mean and the transverse mean, the fitting value is a two-dimensional index: it reflects both a longitudinal comparison of the data against its own history and a transverse comparison against similar data. Given the longitudinal optimal value of 1,015,000 yuan and the transverse optimal value of 1,000,000 yuan for similar data, further data analysis and data modeling can be carried out to fit a comprehensive fitting function suited to both indexes. Because computing high-dimensional combined indexes has a relatively high time complexity, valuable single index items may be combined in pairs or in threes.
In one embodiment, determining an alternative view from the multiple alternative views, and correcting the data view to be synchronized according to the determined alternative view may include: respectively carrying out data matching analysis on historical data of suspicious data in a preset time range and fitting functions of data to be synchronized in a plurality of candidate views, and determining a plurality of matched data amounts; and correcting the view of the data to be synchronized by utilizing the alternative view corresponding to the fitting function with the largest matching data quantity.
For example, offline data matching analysis is performed between the historical data of the suspicious data (the more recent the data, the more valuable) and the fitting functions of the data to be synchronized in the alternative views. If 70% of the data patterns in the historical data conform to the fitting function of alternative view 1, 80% conform to that of alternative view 2, and 90% conform to that of alternative view 3, alternative view 3 can be used as the automatically corrected view.
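Selecting the alternative view with the largest matching data amount can be sketched as below. The text does not define when a historical point "conforms" to a fitting function; this illustration assumes a point matches when it lies within a fixed tolerance of the fitted value, and all names and numbers are hypothetical.

```python
def match_fraction(history, fit, tolerance):
    """Share of historical points (x, y) lying within tolerance of fit(x)."""
    matched = sum(1 for x, y in history if abs(fit(x) - y) <= tolerance)
    return matched / len(history)


def select_view(history, fitting_functions, tolerance):
    """Return the name of the view whose fitting function matches the most history."""
    return max(
        fitting_functions,
        key=lambda name: match_fraction(history, fitting_functions[name], tolerance),
    )


# Hypothetical history (day, deposit) and two candidate fitting functions.
history = [(0, 100.0), (1, 100.5), (2, 101.0), (3, 101.4)]
fitting_functions = {
    "view_1": lambda x: 100.0,            # flat line
    "view_2": lambda x: 100.0 + 0.5 * x,  # rising line
}
best = select_view(history, fitting_functions, tolerance=0.2)
```

A weighting that favors recent history, as the text suggests, could be added by replacing the plain count with a weighted sum.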
In one embodiment, the data synchronization method may further include: receiving correction data determined by an upstream system and a downstream system according to alarm information; and correcting the data view to be synchronized according to the correction data.
In this embodiment, the data correction may also take the importance level of the data into account. As shown in Table 5 below, the higher the importance level of the data, the stricter the conditions for automatic correction; the lower the importance level, the looser the conditions. If the upstream system and the downstream system are set to automatic correction, data matching analysis is performed between the historical data of the suspicious data within a preset time range and the fitting functions of the data to be synchronized in the multiple alternative views to determine multiple matching data amounts, and the data view to be synchronized is automatically corrected using the alternative view corresponding to the fitting function with the largest matching data amount. If the upstream system and the downstream system are not set to automatic correction, the data view to be synchronized is corrected after correction data is entered manually.
Table 5
Compared with prior-art schemes in which data is synchronized directly between upstream and downstream systems or relayed through a data warehouse, the embodiment of the invention acquires a data view to be synchronized, the data view to be synchronized comprising data to be synchronized; queries a data importance grading table according to the field name of the data to be synchronized and determines the importance level of the data to be synchronized, the data importance grading table comprising a correspondence between field names and importance levels; determines alarm parameters corresponding to the data to be synchronized according to the field name and the importance level of the data to be synchronized, the alarm parameters comprising preset threshold ranges corresponding to different index items, where the value of an index item reflects the variation of the data; calculates the value of at least one index item according to the data to be synchronized; determines suspicious data in the data to be synchronized according to the value of at least one index item and the corresponding preset threshold range, and sends alarm information to an upstream system and a downstream system, the alarm information comprising positioning information of the suspicious data; upon receiving an automatic-correction instruction determined by the upstream system and the downstream system according to the alarm information, generates multiple alternative views of the data view to be synchronized according to the field name of the data to be synchronized and the value of at least one index item; determines one alternative view from the multiple alternative views and corrects the data view to be synchronized according to it; and synchronizes the corrected data view to be synchronized to the downstream system. In this way, manual intervention can be reduced, data problems can be located quickly and alarmed in advance, and different data can be managed effectively and comprehensively by providing views.
The embodiment of the invention also provides a data synchronization device, which is described in the following embodiment. Because the principle of the device for solving the problem is similar to that of the data synchronization method, the implementation of the device can refer to the implementation of the data synchronization method, and the repetition is omitted.
Fig. 3 is a block diagram of a data synchronization device according to an embodiment of the present invention, and as shown in fig. 3, the data synchronization device may include:
An acquiring module 301, configured to acquire a data view to be synchronized; the data view to be synchronized comprises data to be synchronized;
The importance level determining module 302 is configured to query a data importance level table according to a field name of the data to be synchronized, and determine an importance level of the data to be synchronized; the data importance ranking table comprises a corresponding relation between field names and importance levels;
The alarm parameter determining module 303 is configured to determine an alarm parameter corresponding to the data to be synchronized according to a field name of the data to be synchronized and an importance level of the data to be synchronized, where the alarm parameter includes preset threshold ranges corresponding to different index items; the value of the index item reflects the variation of the data;
An index item value determining module 304, configured to calculate a value of at least one index item according to the data to be synchronized;
the alarm information sending module 305 is configured to determine suspicious data in the data to be synchronized according to the value of at least one index item and a corresponding preset threshold range, and send alarm information to an upstream system and a downstream system; the alarm information comprises positioning information of suspicious data;
The candidate view generating module 306 is configured to generate multiple candidate views of the data view to be synchronized according to the field name of the data to be synchronized and the value of at least one index item when receiving the instruction for automatic correction determined by the upstream system and the downstream system according to the alarm information;
A correction module 307, configured to determine an alternative view from the multiple alternative views, and correct the data views to be synchronized according to the determined alternative view;
the synchronization module 308 is configured to synchronize the corrected data view to be synchronized to a downstream system.
In one embodiment, the index items include a longitudinal index item and a transverse index item; the value of the longitudinal index item reflects the variation of the data to be synchronized compared with the historical data in a preset time range; the value of the transverse index item reflects the variation of the data to be synchronized compared with the data with the same field name;
The values of the longitudinal index items include: at least one of a longitudinal mean value, a longitudinal absolute value mean value, a longitudinal difference value, a longitudinal absolute difference value, a longitudinal maximum value, and a longitudinal maximum absolute value;
The values of the transverse index items include: at least one of a transverse mean value, a transverse absolute value mean value, a transverse difference value, a transverse absolute difference value, a transverse maximum value, and a transverse maximum absolute value.
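As a rough illustration of how such index item values might be computed, the sketch below uses the same helper for both directions: the reference series is the field's own history for longitudinal items and the same field of other customers for transverse items. The text does not give formulas here, so the definitions (for instance, taking the difference relative to the reference mean) and all names are assumptions.

```python
def index_items(current, reference, prefix):
    """Compute a few index item values of `current` against a reference series."""
    mean = sum(reference) / len(reference)
    return {
        f"{prefix}_mean": mean,
        f"{prefix}_absolute_value_mean": sum(abs(v) for v in reference) / len(reference),
        f"{prefix}_difference": current - mean,            # assumed: relative to the mean
        f"{prefix}_absolute_difference": abs(current - mean),
        f"{prefix}_maximum": max(reference),
        f"{prefix}_maximum_absolute_value": max(abs(v) for v in reference),
    }


# Hypothetical data: the field's own history, and the same field for other customers.
longitudinal = index_items(100.0, [10.0, 10.0, 10.0], "longitudinal")
transverse = index_items(100.0, [90.0, 110.0], "transverse")
```

In this example the value looks anomalous against its own history but ordinary against its peers, which is the kind of contrast the two directions are meant to expose.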
In this embodiment, the alternative view generation module 306 is specifically configured to:
Combining the values of one index item corresponding to different field names in the data to be synchronized to generate one alternative view, so that the values of a plurality of index items correspondingly generate a plurality of alternative views.
In this embodiment, the alternative view generation module 306 is specifically configured to:
performing data fitting on the values of a plurality of index items corresponding to different field names in the data to be synchronized by using a preset fitting algorithm to obtain fitting values;
Combining the fitting values corresponding to different field names in the data to be synchronized to generate one alternative view, so that a plurality of fitting values correspondingly generate a plurality of alternative views.
As shown in fig. 4, in one embodiment, the data synchronization device may further include: a fitting function generating module 401, configured to:
If the alternative view is generated by combining the values of one longitudinal index item corresponding to different field names in the data to be synchronized, fitting the real data of the suspicious data according to the history data of the suspicious data within a preset time range, and generating a fitting function of the data to be synchronized in the alternative view; smoothing the fitting function by using the value of the longitudinal index item;
If the alternative view is generated by combining the values of one transverse index item corresponding to different field names in the data to be synchronized, fitting the real data of the suspicious data according to data that has the same field name as the suspicious data and belongs, within a preset time range, to customers other than the customer corresponding to the suspicious data, and generating a fitting function of the data to be synchronized in the alternative view; and smoothing the fitting function by using the value of the transverse index item.
In one embodiment, the correction module 307 is specifically configured to:
Respectively carrying out data matching analysis on historical data of suspicious data in a preset time range and fitting functions of data to be synchronized in a plurality of candidate views, and determining a plurality of matched data amounts;
and correcting the view of the data to be synchronized by utilizing the alternative view corresponding to the fitting function with the largest matching data quantity.
In one embodiment, the alarm information issuing module 305 is specifically configured to:
Determining that the data to be synchronized is suspicious data when the value of a single index item of the data to be synchronized exceeds its corresponding preset threshold range, or when the values of all index items in any one combination of a plurality of index items of the data to be synchronized exceed their corresponding preset threshold ranges.
As shown in fig. 4, in one embodiment, the data synchronization device may further include: a suspicious data determination module 402 for:
Determining that data to be synchronized whose field name is card number, customer name, date, or mobile phone number, and whose value has changed, is suspicious data;
Determining that data to be synchronized whose field name is remark and which includes irregular encoding is suspicious data.
In one embodiment, the data synchronization device may further include: a manual correction module 403, configured to:
receiving correction data determined by an upstream system and a downstream system according to alarm information;
and correcting the data view to be synchronized according to the correction data.
Based on the foregoing inventive concept, as shown in fig. 5, the present invention further proposes a computer device 500, including a memory 510, a processor 520, and a computer program 530 stored on the memory 510 and executable on the processor 520, where the processor 520 implements the foregoing data synchronization method when executing the computer program 530.
Based on the foregoing inventive concept, the present invention proposes a computer-readable storage medium storing a computer program which, when executed by a processor, implements the aforementioned data synchronization method.
Based on the foregoing inventive concept, the present invention proposes a computer program product comprising a computer program which, when executed by a processor, implements the aforementioned data synchronization method.
The technical effects of the embodiment of the invention are as follows:
1. Data views are generated intelligently, with a remedy tailored to each kind of data.
2. An intelligent data monitoring mechanism provides risk warnings in advance.
3. Data can be manually reconfirmed, and a scheme can be selected from multiple views.
4. Data synchronization is triggered conveniently, reducing manual intervention.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing description of the embodiments illustrates the general principles of the invention and is not meant to limit the scope of the invention to the particular embodiments; any modifications, equivalents, improvements, etc. that fall within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (13)

1. A method of data synchronization, comprising:
acquiring a data view to be synchronized; the data view to be synchronized comprises data to be synchronized;
inquiring a data importance grading table according to the field name of the data to be synchronized, and determining the importance level of the data to be synchronized; the data importance ranking table comprises a corresponding relation between field names and importance levels;
determining alarm parameters corresponding to the data to be synchronized according to field names of the data to be synchronized and importance levels of the data to be synchronized, wherein the alarm parameters comprise preset threshold ranges corresponding to different index items; the value of the index item reflects the variation of the data;
calculating the value of at least one index item according to the data to be synchronized;
Determining suspicious data in the data to be synchronized according to the value of at least one index item and a corresponding preset threshold range, and sending alarm information to an upstream system and a downstream system; the alarm information comprises positioning information of suspicious data;
When receiving an instruction for automatic correction determined by an upstream system and a downstream system according to alarm information, generating multiple alternative views of a data view to be synchronized according to field names of the data to be synchronized and values of at least one index item;
Determining an alternative view from the multiple alternative views, and correcting the data views to be synchronized according to the determined alternative view;
and synchronizing the corrected data view to be synchronized to a downstream system.
2. The method of claim 1, wherein the index items comprise a longitudinal index item and a transverse index item; the value of the longitudinal index item reflects the variation of the data to be synchronized compared with the historical data in a preset time range; the value of the transverse index item reflects the variation of the data to be synchronized compared with the data with the same field name;
The values of the longitudinal index items include: at least one of a longitudinal mean value, a longitudinal absolute value mean value, a longitudinal difference value, a longitudinal absolute difference value, a longitudinal maximum value, and a longitudinal maximum absolute value;
The values of the transverse index items include: at least one of a transverse mean value, a transverse absolute value mean value, a transverse difference value, a transverse absolute difference value, a transverse maximum value, and a transverse maximum absolute value.
3. The method of claim 2, wherein generating multiple candidate views of the view of the data to be synchronized based on the field names of the data to be synchronized and the value of the at least one index item comprises:
combining the values of one index item corresponding to different field names in the data to be synchronized to generate one alternative view, so that the values of a plurality of index items correspondingly generate a plurality of alternative views.
4. The method of claim 3, wherein generating multiple alternative views of the data view to be synchronized based on the field names of the data to be synchronized and the value of the at least one index item further comprises:
performing data fitting on the values of a plurality of index items corresponding to different field names in the data to be synchronized by using a preset fitting algorithm to obtain fitting values;
combining the fitting values corresponding to different field names in the data to be synchronized to generate one alternative view, so that a plurality of fitting values correspondingly generate a plurality of alternative views.
5. A method as recited in claim 3, further comprising:
If the alternative view is generated by combining the values of one longitudinal index item corresponding to different field names in the data to be synchronized, fitting the real data of the suspicious data according to the history data of the suspicious data within a preset time range, and generating a fitting function of the data to be synchronized in the alternative view; smoothing the fitting function by using the value of the longitudinal index item;
If the alternative view is generated by combining the values of one transverse index item corresponding to different field names in the data to be synchronized, fitting the real data of the suspicious data according to data that has the same field name as the suspicious data and belongs, within a preset time range, to customers other than the customer corresponding to the suspicious data, and generating a fitting function of the data to be synchronized in the alternative view; and smoothing the fitting function by using the value of the transverse index item.
6. The method of claim 5, wherein determining an alternate view from the plurality of alternate views, and modifying the view of the data to be synchronized based on the determined alternate view, comprises:
Respectively carrying out data matching analysis on historical data of suspicious data in a preset time range and fitting functions of data to be synchronized in a plurality of candidate views, and determining a plurality of matched data amounts;
and correcting the view of the data to be synchronized by utilizing the alternative view corresponding to the fitting function with the largest matching data quantity.
7. The method of claim 1, wherein determining suspicious data among the data to be synchronized based on the value of the at least one index item and the corresponding preset threshold range comprises:
determining that the data to be synchronized is suspicious data when the value of a single index item of the data to be synchronized exceeds its corresponding preset threshold range, or when the values of all index items in any one combination of a plurality of index items of the data to be synchronized exceed their corresponding preset threshold ranges.
8. The method as recited in claim 1, further comprising:
Determining that data to be synchronized whose field name is card number, client name, date, or mobile phone number is suspicious data when that data has changed;
and determining that data to be synchronized whose field name is remark and which contains irregular encoding is suspicious data.
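These two field-name rules could be sketched as below. The claim does not say what counts as "irregular encoding", so this sketch assumes it means the Unicode replacement character or any non-printable character; the field identifiers in `KEY_FIELDS` are hypothetical spellings of the field names in the claim.

```python
KEY_FIELDS = {"card_number", "client_name", "date", "mobile_number"}

def has_irregular_encoding(text):
    """Assumed check: replacement character (U+FFFD) or any non-printable character."""
    return "\ufffd" in text or any(not ch.isprintable() for ch in text)

def rule_based_suspicious(field, old_value, new_value):
    """Apply the two field-name rules; other fields are left to the threshold checks."""
    if field in KEY_FIELDS:
        return old_value != new_value          # any change to a key field is suspicious
    if field == "remark":
        return has_irregular_encoding(new_value)
    return False
```

For example, `rule_based_suspicious("card_number", "6222", "6223")` flags the change, while a remark containing ordinary Chinese or English text is not flagged.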
9. The method as recited in claim 1, further comprising:
receiving correction data determined by an upstream system and a downstream system according to alarm information;
and correcting the data view to be synchronized according to the correction data.
10. A data synchronization device, comprising:
the acquisition module is used for acquiring a data view to be synchronized; the data view to be synchronized comprises data to be synchronized;
the importance level determining module is used for inquiring the data importance classification table according to the field name of the data to be synchronized and determining the importance level of the data to be synchronized; the data importance ranking table comprises a corresponding relation between field names and importance levels;
The alarm parameter determining module is used for determining alarm parameters corresponding to the data to be synchronized according to field names of the data to be synchronized and importance levels of the data to be synchronized, wherein the alarm parameters comprise preset threshold ranges corresponding to different index items; the value of the index item reflects the variation of the data;
the index item value determining module is used for calculating the value of at least one index item according to the data to be synchronized;
The alarm information sending module is used for determining suspicious data in the data to be synchronized according to the value of at least one index item and a corresponding preset threshold range, and sending alarm information to an upstream system and a downstream system; the alarm information comprises positioning information of suspicious data;
The alternative view generation module is used for generating multiple alternative views of the data view to be synchronized according to the field name of the data to be synchronized and the value of at least one index item when receiving an instruction for automatic correction determined by the upstream system and the downstream system according to the alarm information;
The correction module is used for determining an alternative view from the multiple alternative views and correcting the data views to be synchronized according to the determined alternative views;
and the synchronization module is used for synchronizing the corrected data view to be synchronized to a downstream system.
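To make the module chain in claim 10 more concrete, the sketch below wires together the importance lookup, the alarm parameter lookup, an index item calculation, and the suspicious-data check, returning positioning information for the alarm. The table contents, the `change_ratio` index item, and the dictionary layout of a view are assumptions for illustration; the patent does not prescribe them, and the alternative-view, correction, and synchronization modules are omitted.

```python
# Field name -> importance level (stand-in for the data importance grading table).
IMPORTANCE_TABLE = {"balance": "high"}

# (field name, importance level) -> index item name -> (low, high) threshold range.
ALARM_PARAMS = {("balance", "high"): {"change_ratio": (-0.1, 0.1)}}

def change_ratio(previous, current):
    """Example index item: relative change from the previous value."""
    return 0.0 if previous == 0 else (current - previous) / abs(previous)

def find_suspicious(view, previous_view):
    """Return positioning info (row key, field name) for each suspicious cell."""
    alarms = []
    for key, row in view.items():
        for field, value in row.items():
            level = IMPORTANCE_TABLE.get(field)
            params = ALARM_PARAMS.get((field, level))
            if params is None:
                continue  # no alarm parameters configured for this field
            low, high = params["change_ratio"]
            ratio = change_ratio(previous_view[key][field], value)
            if ratio < low or ratio > high:
                alarms.append((key, field))
    return alarms

previous_view = {"c1": {"balance": 1000.0}, "c2": {"balance": 500.0}}
view = {"c1": {"balance": 1020.0}, "c2": {"balance": 900.0}}
alarms = find_suspicious(view, previous_view)
```

Here client `c1` changes by 2% and stays within range, while `c2` changes by 80% and is reported, so the alarm would carry the position `("c2", "balance")` to the upstream and downstream systems.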
11. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any of claims 1 to 9 when executing the computer program.
12. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when executed by a processor, implements the method of any one of claims 1 to 9.
13. A computer program product, characterized in that the computer program product comprises a computer program which, when executed by a processor, implements the method of any one of claims 1 to 9.
CN202410478482.5A 2024-04-19 2024-04-19 Data synchronization method and device Pending CN118364028A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410478482.5A CN118364028A (en) 2024-04-19 2024-04-19 Data synchronization method and device

Publications (1)

Publication Number Publication Date
CN118364028A true CN118364028A (en) 2024-07-19

Family

ID=91887316

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410478482.5A Pending CN118364028A (en) 2024-04-19 2024-04-19 Data synchronization method and device

Country Status (1)

Country Link
CN (1) CN118364028A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination