CN117931953B

CN117931953B - Heterogeneous database data synchronization method and system

Info

Publication number: CN117931953B
Application number: CN202410331455.5A
Authority: CN
Inventors: 夏何均; 刘刚
Original assignee: Beijing Guqi Data Technology Co ltd
Current assignee: Beijing Guqi Data Technology Co ltd
Priority date: 2024-03-22
Filing date: 2024-03-22
Publication date: 2024-06-04
Anticipated expiration: 2044-03-22
Also published as: CN117931953A

Abstract

The invention discloses a method and system for synchronizing data between heterogeneous databases, relating to the technical field of databases. The method comprises: sending, through a synchronization control center, an instruction to an adapter unit of a target database to extract a data set to be synchronized; applying an advanced preprocessing strategy and a joint learning model to mine decision-guiding features of the data; setting synchronization marks on source-database data items and maintaining them with a synchronization marker ST; performing security detection and classification on the data items to be synchronized, and applying the security protection measures that match each item's sensitivity level; formulating and executing a data transmission policy based on the decision-guiding features, the data sensitivity, and the current network state; and verifying the integrity of the synchronization result and automatically updating the synchronization operation record according to the verification result. Through dynamic multidimensional judgment rules and the advanced preprocessing strategy, the invention greatly improves the speed of data extraction and synchronization.

Description

Heterogeneous database data synchronization method and system

Technical Field

The invention relates to the technical field of databases, in particular to a method and a system for synchronizing heterogeneous database data.

Background

In today's information age, data plays an increasingly important role across industries and fields. As technology advances, more and more organizations and businesses use a variety of database systems to meet diverse business needs, which creates heterogeneous database environments. Heterogeneous databases are database systems that run on different platforms within the same network environment and use different data models and storage structures. In such environments, data synchronization is challenging, particularly when data must be migrated, shared, or integrated between different database systems. Existing synchronization techniques often suffer from low efficiency, difficulty in guaranteeing data consistency, and security risks, especially when processing large-scale data and sensitive information.

In addition, with the spread of cloud computing and big data technology, data synchronization is no longer simple data replication: it also involves demanding requirements for real-time operation, accuracy, and efficient cross-region transmission. Data synchronization in heterogeneous database environments is urgently needed, yet existing methods often cannot cope with highly variable network environments, diverse data security requirements, and ever-growing data volumes.

Disclosure of Invention

The invention is proposed in view of the shortcomings of existing heterogeneous database data synchronization in data consistency, synchronization efficiency, security assurance, and adaptability to the network environment.

Accordingly, the problem the invention aims to solve is how to achieve efficient, secure, and reliable data synchronization between heterogeneous databases while ensuring data consistency and integrity and adapting to changes in the network environment.

In order to solve the technical problems, the invention provides the following technical scheme:

In a first aspect, an embodiment of the present invention provides a heterogeneous database data synchronization method, comprising: sending, by a synchronization control center SCC, an instruction to an adapter unit AU of a target database to extract a data set to be synchronized; optimizing the data set to be synchronized with an advanced preprocessing strategy, and mining decision-guiding features of the data set with a joint learning model; setting synchronization marks on data items of a source database and maintaining them with a synchronization marker ST so that they reflect the latest state of the data; performing security detection and classification on the data items to be synchronized, determining their data sensitivity levels, and applying the corresponding security protection measures; determining a data transmission policy according to the decision-guiding features, the sensitivity levels, and the current network state, and verifying the integrity of the synchronization result; and designing and implementing an automatic update mechanism for the synchronization operation records based on the output of the integrity verification.

As a preferred embodiment of the heterogeneous database data synchronization method according to the present invention, mining the decision-guiding features of the data set with the joint learning model comprises the following steps: standardizing and converting the format of the data set through a data adapter, and identifying and removing atypical samples through outlier detection; monitoring the data set in real time with the ARIMA algorithm and an abrupt-change scoring technique, and capturing and marking abnormal changes; performing an interactive feature selection process to dynamically identify and iteratively optimize a set of decision-guiding features; designing a conditional inference mechanism, based on a decision tree algorithm or on a neural network incorporating an attention mechanism, to guide feature selection and priority assignment; developing a feature extraction framework that adaptively calibrates itself to fluctuations in the data sources and changes in synchronization demand; and incorporating metadata into the feature extraction process to enhance the semantic expressiveness and contextual relevance of the features.

As a preferred embodiment of the heterogeneous database data synchronization method according to the present invention, the conditional inference mechanism operates as follows: if a feature becomes closely correlated with positive business outcomes in a new data stream, its weight in feature selection is increased and its synchronization priority is adjusted according to the change in correlation; when a feature in user behavior data becomes abnormal, the mechanism designates it as an anomaly-prediction indicator and analyzes its association with potential risks or opportunities; if a specific group shows a high prediction success rate on a feature, the mechanism increases that feature's weight in order to mine complex relationships and locate target customers; for seasonal fluctuations, the feature set is adjusted according to historical patterns and trends, and multivariate time series are used to analyze and predict market dynamics; for potentially high-risk features, a security mode is started that automatically adjusts the encryption and data-masking (desensitization) policies and flags other implicit features that may affect data compliance for review; while the mechanism runs, an evaluation and monitoring mechanism for model performance and feature-processing strategies is established; prediction capability and feature-processing effectiveness are evaluated periodically, and the problems found are used to optimize the model and features; according to the evaluation results, the decision tree parameters are tuned, the neural network's attention mechanism is updated, and feature selection and weights are optimized; and a continuous closed-loop iteration mechanism keeps optimizing the model and the feature-processing strategy.

As a preferred embodiment of the heterogeneous database data synchronization method according to the present invention, maintaining the synchronization marks with the synchronization marker ST so that they reflect the latest state of the data comprises the following steps: establishing an intelligent synchronization node for each data item in the source database; configuring a smart contract for each data item that automatically generates a new block when the synchronization condition is met; monitoring the synchronization efficiency and consistency of the data items with a machine learning model to ensure that synchronization performance meets the required standard; identifying synchronization requirements through behavior analysis and, when the model detects a demand signal, triggering the synchronization marker ST to generate a synchronization plan, coordinate the synchronization operation, apply the marks, monitor progress, and record logs; simulating the behavior of data elements in a simulation environment with a neural network to identify synchronization requirements in advance; having an early-warning system preset synchronization marks on the data items to be synchronized according to the neural network's predictions; integrating an intent-aware dynamic synchronization planner that selects the data items to synchronize based on business intent and contextual information; and establishing a self-learning feedback loop based on rewards and penalties to adjust and refine the synchronization marking logic.

As a preferred embodiment of the heterogeneous database data synchronization method according to the present invention, performing security detection and classification on the data items to be synchronized comprises the following steps: determining the semantic category of each data item using natural language processing and deep learning, and matching it against a sensitive-word lexicon to identify data items containing confidential or personally sensitive information; invoking a security rating module that determines each data item's sensitivity through association rule learning and attribute analysis and classifies it by security level, the security levels comprising a non-sensitive public class, a low-sensitivity internal-use class, a medium-sensitivity important-business class, and a high-sensitivity confidential or private-information class; automatically isolating data sets in the medium-sensitivity important-business class and the high-sensitivity confidential or private-information class into secure blocks by means of data isolation; establishing an access control policy for the data blocks of each security class and enforcing the principle of least privilege; and applying consistency monitoring and storage encryption to the data within the secure blocks, together with anomaly detection and unauthorized-access detection, to confirm security.

As a preferred embodiment of the heterogeneous database data synchronization method according to the present invention, classifying by security level comprises: if a data item contains no sensitive vocabulary or is marked as publicly shareable, it is classified as non-sensitive public, and an open access policy lets all user roles access it; if a data item whose content is non-sensitive is nevertheless marked as secret (for example, through mislabeling) or as internal use, it is classified as low-sensitivity internal use, only designated roles in core business departments are authorized to access it, non-core roles and external systems cannot access it, and if the proportion of sensitive data exceeds a preset threshold, a content review is automatically triggered to decide whether the item should be moved to a higher security class; if a data item is highly correlated with key business indicators or is marked as confidential, it is classified as medium-sensitivity important business, stored in an encrypted data block, and made accessible only to selected roles in project management and the core development team, and if it is tightly coupled with several core systems, an importance evaluation is automatically triggered to consider raising it to the high-sensitivity confidential class; if a data item contains many sensitive words or personal private information, or is marked as classified, it is classified as high-sensitivity confidential or private information, stored in a block of the highest security level, and protected by single-point access and a four-eyes (two-person) authorization mechanism, and if it is found to match a user credit record or a blocked-word lexicon, access is automatically locked pending manual review, an integrity check is started, and the source of the information leak is traced.

As a preferred embodiment of the heterogeneous database data synchronization method according to the present invention, determining the data transmission policy according to the decision-guiding features, the sensitivity level, and the current network state comprises the following steps: analyzing the decision-guiding features and sensitivity levels of the data and assigning transmission priorities; evaluating the current network state and synchronizing when network capacity permits; applying confidentiality and security measures according to the data sensitivity level to secure the transmission; adjusting the policy according to network conditions and the state of the synchronization task to improve synchronization quality; tracking the transmission process to confirm data integrity, and performing retransmission or error recovery when a transmission fails; and, after the transmission is complete, verifying the integrity of the synchronization result.

In a second aspect, an embodiment of the present invention provides a heterogeneous database data synchronization system, comprising: an instruction sending module, configured to send, through a synchronization control center SCC, an instruction to an adapter unit AU of a target database to extract a data set to be synchronized; a feature mining module, configured to optimize the data set to be synchronized with an advanced preprocessing strategy and to mine decision-guiding features of the data set with a joint learning model; a mark setting and maintenance module, configured to set synchronization marks on data items of the source database and maintain them with a synchronization marker ST so that they reflect the latest state of the data; a detection and classification module, configured to perform security detection and classification on the data items to be synchronized, determine their data sensitivity levels, and apply the corresponding security protection measures; a transmission selection module, configured to determine a data transmission policy according to the decision-guiding features, the sensitivity levels, and the current network state, and to verify the integrity of the synchronization result; and an automatic update module, configured to design and implement an automatic update mechanism for the synchronization operation records based on the output of the integrity verification.

In a third aspect, an embodiment of the present invention provides a computer device comprising a memory and a processor, the memory storing a computer program, wherein the computer program, when executed by the processor, performs the steps of the heterogeneous database data synchronization method according to the first aspect of the present invention.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, performs the steps of the heterogeneous database data synchronization method according to the first aspect of the present invention.

The beneficial effects of the invention are as follows: dynamic multidimensional judgment rules and an advanced preprocessing strategy greatly increase the speed of data extraction and synchronization; setting synchronization marks and verifying the integrity of the synchronization result ensure the consistency and integrity of the synchronized data; analyzing the sensitivity of data items with deep learning and natural language processing, and applying layered security protection according to the sensitivity level, provides strong protection for sensitive data; dynamically adjusting the transmission policy according to the current network state and the state of the synchronization task copes effectively with changing network conditions and keeps synchronization stable and efficient; and the joint learning model and conditional inference mechanism allow the invention not only to synchronize efficiently from the outset but also to keep learning and optimizing based on user feedback and system performance data.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the following description will briefly explain the drawings required to be used in the embodiments, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of the steps of a method for heterogeneous database data synchronization.

FIG. 2 is a conditional inference mechanism design diagram of a method of heterogeneous database data synchronization.

FIG. 3 is a data security detection and classification flow chart of a method of heterogeneous database data synchronization.

FIG. 4 is a computer device diagram of a method of heterogeneous database data synchronization.

Detailed Description

In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings.

In the following description, numerous specific details are set forth to provide a thorough understanding of the present invention; however, the invention may also be practiced in ways other than those described here, and those skilled in the art will appreciate that the invention is not limited to the specific embodiments disclosed below.

Further, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic can be included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.

Example 1

Referring to FIGS. 1-4, a first embodiment of the present invention provides a heterogeneous database data synchronization method, comprising the following steps:

S1: An instruction is sent to the adapter unit AU of the target database through the synchronization control center SCC to extract the data set to be synchronized.

Specifically, the synchronization control center SCC sends an instruction to the adapter unit AU of the target database; upon receiving the instruction, the AU starts a judgment module RM to extract the data set to be synchronized. The RM applies embedded multidimensional judgment rules covering a time dimension, a content dimension, a relation dimension, an importance dimension, an access-frequency dimension, and a data-sensitivity dimension: if the last update time of a data set is later than the last synchronization time, the data set is marked as to-be-synchronized; if content has been added to, deleted from, or changed in the data set, it is marked as to-be-synchronized; if the master data associated with the data set has changed, the data set is marked as to-be-synchronized; if the data set carries an importance mark or keyword, it is marked as to-be-synchronized; if the access or query frequency of the data set exceeds a preset threshold, it is marked as to-be-synchronized; and if the data set contains sensitive personal information or important business data, it is marked as to-be-synchronized. All data sets marked as to-be-synchronized, that is, those satisfying any one of the rules, are extracted and combined by an aggregator into the data set to be synchronized, which the AU returns to the SCC.
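The RM's multidimensional judgment rules can be sketched as follows. This is a minimal Python illustration, not the patented implementation: the `DataSetInfo` fields, the function names, and the default access threshold of 100 are hypothetical stand-ins for whatever metadata the AU actually collects. A data set is included as soon as any one rule fires, matching the "any rule" aggregation described above.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class DataSetInfo:
    # Hypothetical metadata the AU might collect for one data set.
    name: str
    last_updated: datetime
    content_changed: bool = False           # rows inserted, deleted or modified
    master_data_changed: bool = False       # associated master data changed
    tags: set = field(default_factory=set)  # importance marks / keywords
    access_count: int = 0                   # accesses or queries since last sync
    has_sensitive_data: bool = False        # personal or important business data


def to_be_synchronized(ds, last_sync, importance_keywords, access_threshold):
    """Return the dimensions whose rule fires; any hit marks the set for sync."""
    hits = []
    if ds.last_updated > last_sync:
        hits.append("time")
    if ds.content_changed:
        hits.append("content")
    if ds.master_data_changed:
        hits.append("relation")
    if ds.tags & importance_keywords:
        hits.append("importance")
    if ds.access_count > access_threshold:
        hits.append("access_frequency")
    if ds.has_sensitive_data:
        hits.append("sensitivity")
    return hits


def aggregate(datasets, last_sync, importance_keywords, access_threshold=100):
    # Aggregator: keep every data set that satisfies at least one rule.
    return [ds.name for ds in datasets
            if to_be_synchronized(ds, last_sync, importance_keywords, access_threshold)]


last_sync = datetime(2024, 3, 1)
sets = [DataSetInfo("orders", datetime(2024, 3, 5)),
        DataSetInfo("archive", datetime(2024, 1, 1)),
        DataSetInfo("audit", datetime(2024, 1, 1), tags={"critical"})]
print(aggregate(sets, last_sync, {"critical"}))  # ['orders', 'audit']
```

Returning the list of fired dimensions, rather than a bare boolean, keeps a record of why each data set was selected, which is useful for the logging the SCC performs later.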

S2: The data set to be synchronized is optimized with an advanced preprocessing strategy, and its decision-guiding features are mined with a joint learning model.

Specifically, the method comprises the following steps:

S2.1: The data set is standardized and converted into a common format by the data adapter, and atypical samples are identified and removed through outlier detection.
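A minimal sketch of S2.1, assuming a hypothetical target schema (lower-case column names, an ISO-8601 `updated_at` column, a numeric `amount` column) and using Tukey's interquartile-range fences as the outlier detector. The description does not name a specific detector, so the IQR rule is a swapped-in choice for illustration.

```python
import statistics
from datetime import datetime


def standardize(record):
    """Map one source row onto a hypothetical common schema."""
    out = {}
    for key, value in record.items():
        key = key.strip().lower()                  # unify column-name conventions
        if key == "updated_at" and isinstance(value, str):
            value = datetime.fromisoformat(value)  # unify timestamp representation
        elif key == "amount":
            value = float(value)                   # unify numeric types
        out[key] = value
    return out


def remove_outliers(records, column, k=1.5):
    # Tukey fences: keep values within [Q1 - k*IQR, Q3 + k*IQR].
    values = [r[column] for r in records]
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [r for r in records if low <= r[column] <= high]


rows = [standardize({" Amount": str(v)}) for v in (10, 11, 12, 13, 14, 1000)]
print([r["amount"] for r in remove_outliers(rows, "amount")])
# [10.0, 11.0, 12.0, 13.0, 14.0]
```

IQR fences are robust to the very outliers they are meant to catch, unlike a mean and standard-deviation rule, which a single extreme value can inflate.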

S2.2: The data set is monitored in real time with the ARIMA algorithm and an abrupt-change scoring technique, and abnormal changes are captured and marked.
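S2.2 can be illustrated with the simplest member of the ARIMA family, an ARIMA(1,0,0) (that is, AR(1)) model fitted by conditional least squares, where each new observation is scored by its standardized one-step-ahead residual. The model order, the reference-window length, and the threshold of 4 standard deviations are assumptions; a production monitor would more likely fit a full ARIMA model with a statistics library.

```python
import statistics


def fit_ar1(series):
    """Conditional least-squares fit of x[t] = c + phi * x[t-1] + e[t], i.e. ARIMA(1,0,0)."""
    prev, nxt = series[:-1], series[1:]
    mean_prev, mean_next = statistics.fmean(prev), statistics.fmean(nxt)
    cov = sum((a - mean_prev) * (b - mean_next) for a, b in zip(prev, nxt))
    var = sum((a - mean_prev) ** 2 for a in prev)
    phi = cov / var if var else 0.0
    return mean_next - phi * mean_prev, phi


def change_scores(series, train_len):
    """Fit on the first `train_len` points; score later points by |residual| / sigma."""
    c, phi = fit_ar1(series[:train_len])
    resid = [series[t] - (c + phi * series[t - 1]) for t in range(1, train_len)]
    sigma = statistics.pstdev(resid) or 1e-9   # guard against a perfectly fitted window
    return {t: abs(series[t] - (c + phi * series[t - 1])) / sigma
            for t in range(train_len, len(series))}


def flag_changes(series, train_len, threshold=4.0):
    # Mark every point whose one-step-ahead residual exceeds `threshold` sigmas.
    return [t for t, score in change_scores(series, train_len).items() if score > threshold]


daily_rows = [100, 102, 101, 103, 102, 104, 103, 105, 104, 106, 500, 107]
print(flag_changes(daily_rows, train_len=10))  # [10, 11]
```

Index 11 is flagged as well because its prediction is conditioned on the spike at index 10; a real monitor would typically replace a flagged value with its prediction before scoring the next point.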

S2.3: An interactive feature selection process is performed to dynamically identify and iteratively optimize a set of decision-guiding features.

Preferably, an integrity check and outlier handling are performed on the data set, and a joint learning environment is established; an interpretable (explanatory) model gives a preliminary estimate of each feature's contribution, and the feature set is refined in each iteration; the contribution estimates are combined with actual business requirements to determine a set of decision-guiding features; the optimized model and features are validated on a validation set and then tested in real scenarios; a monitoring mechanism continuously tracks the influence of the decision-guiding features so that they can be adjusted promptly as decision requirements change; and user feedback and system performance data are collected to audit and update the decision model periodically, achieving continuous model improvement and decision optimization.

S2.4: a conditional inference mechanism based on decision tree algorithms or neural networks incorporating attention mechanisms is designed to guide feature selection and priority assignment.

Specifically, if a feature becomes closely correlated with positive business outcomes in a new data stream, its weight in feature selection is increased and its synchronization priority is adjusted according to the change in correlation; when a feature in user behavior data becomes abnormal, the conditional inference mechanism designates it as an anomaly-prediction indicator and analyzes its association with potential risks or opportunities; if a specific group shows a high prediction success rate on a feature, the mechanism increases that feature's weight in order to mine complex relationships and locate target customers; for seasonal fluctuations, the feature set is adjusted according to historical patterns and trends, and multivariate time series are used to analyze and predict market dynamics; for potentially high-risk features, a security mode is started that automatically adjusts the encryption and data-masking (desensitization) policies and flags other implicit features that may affect data compliance for review; while the mechanism runs, an evaluation and monitoring mechanism for model performance and feature-processing strategies is established; prediction capability and feature-processing effectiveness are evaluated periodically, and the problems found are used to optimize the model and features; according to the evaluation results, the decision tree parameters are tuned, the neural network's attention mechanism is updated, and feature selection and weights are optimized; and a continuous closed-loop iteration mechanism keeps optimizing the model and the feature-processing strategy.
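Three of the conditional-inference rules above (correlation-driven reweighting, the anomaly-indicator designation, and the high-risk security mode) can be sketched as plain rules. This covers only a subset of the mechanism and uses hypothetical names (`FeatureState`, `apply_rules`); the step size of 0.1 and the risk threshold of 0.8 are assumptions. A decision tree or attention network would learn such adjustments rather than hard-code them.

```python
from dataclasses import dataclass


@dataclass
class FeatureState:
    # Hypothetical per-feature state tracked by the inference mechanism.
    weight: float = 1.0
    priority: int = 0
    anomaly_indicator: bool = False
    high_risk: bool = False


def apply_rules(state, corr_with_positive, prev_corr, is_anomalous, risk_score,
                step=0.1, risk_threshold=0.8):
    """Apply a subset of the rules; return True when security mode should start."""
    # Reweighting rule: follow the change in correlation with positive outcomes.
    if corr_with_positive > prev_corr:
        state.weight += step
        state.priority += 1
    elif corr_with_positive < prev_corr:
        state.weight = max(0.0, state.weight - step)
        state.priority -= 1
    # Anomaly rule: an abnormal feature becomes an anomaly-prediction indicator.
    if is_anomalous:
        state.anomaly_indicator = True
    # High-risk rule: signal security mode; the caller would then tighten
    # encryption and masking and review related implicit features.
    if risk_score >= risk_threshold:
        state.high_risk = True
        return True
    return False


s = FeatureState()
apply_rules(s, corr_with_positive=0.6, prev_corr=0.4, is_anomalous=False, risk_score=0.1)
print(s.priority)  # 1
```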

S2.5: a feature extraction framework is developed that adaptively calibrates according to data source fluctuations and synchronization demand changes.

S2.6: metadata information is incorporated into the feature extraction process to enhance semantic expressive power and contextual relevance of the features.

S3: A synchronization mark is set on each data item of the source database and maintained by the synchronization marker ST so that it reflects the latest state of the data.

Specifically, an intelligent synchronization node is established for each data item in the source database; each node contains a hash value (which ensures data integrity), a synchronization state (which represents the current version of the data and its synchronization requirements), and timestamp information (which records the last modification or check time of the data item). A smart contract is configured for each data item to automatically generate a new block when the synchronization condition is met. A machine learning model monitors the synchronization efficiency and consistency of the data items to ensure that synchronization performance meets the required standard. Synchronization requirements are identified through behavior analysis: when the model detects a demand signal (e.g., abnormal behavior or degraded data quality), it triggers the synchronization marker ST to generate a synchronization plan, coordinate the synchronization operation, apply the marks, monitor progress, and record logs.
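The node structure of S3 (hash value, synchronization state, timestamp) and the "new block when the condition is met" behavior can be approximated in-process with a hash chain. This is a sketch under the assumption that a change in the content hash is the synchronization condition; it is not a smart-contract platform, and `SyncNode`, `on_change`, and the block fields are hypothetical.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field


def digest(payload):
    # SHA-256 over a canonical JSON encoding (sorted keys).
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()


@dataclass
class SyncNode:
    item_id: str
    hash_value: str                             # integrity of the current content
    version: int = 0                            # synchronization state: version...
    needs_sync: bool = False                    # ...and pending-sync flag
    timestamp: float = field(default_factory=time.time)
    chain: list = field(default_factory=list)   # blocks appended by the "contract"

    def on_change(self, payload, now=None):
        """Contract-like rule: a content change bumps the version and appends a linked block."""
        new_hash = digest(payload)
        if new_hash == self.hash_value:
            return False                        # condition not met: nothing changed
        prev = self.chain[-1]["block_hash"] if self.chain else "0" * 64
        self.version += 1
        self.hash_value, self.needs_sync = new_hash, True
        self.timestamp = time.time() if now is None else now
        block = {"item": self.item_id, "version": self.version,
                 "data_hash": new_hash, "prev": prev, "ts": self.timestamp}
        block["block_hash"] = digest(block)     # hash computed before this key is added
        self.chain.append(block)
        return True


node = SyncNode("order:42", digest({"qty": 1}))
node.on_change({"qty": 2})  # True: version 1, block appended, needs_sync set
```

Because each block embeds the hash of its predecessor, altering an earlier block breaks every later `prev` link, which is the property that makes the chain useful as a tamper-evident synchronization log.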

A neural network simulates the behavior of data elements in a simulation environment to identify synchronization requirements in advance; according to the neural network's predictions, an early-warning system presets synchronization marks on the data items to be synchronized; an intent-aware dynamic synchronization planner is integrated to select the data items to synchronize based on business intent and contextual information; and a self-learning feedback loop based on rewards and penalties adjusts and refines the synchronization marking logic.
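In the simplest case, the reward-or-penalty feedback loop could tune the confidence threshold at which the early-warning system presets a synchronization mark. In this hypothetical sketch, a correct decision is the "reward" and leaves the threshold unchanged, while a wasted or missed synchronization is the "penalty" and nudges the threshold; the learning rate and bounds are assumptions.

```python
class MarkingPolicy:
    """Hypothetical self-tuning rule: preset a mark when predicted need >= threshold."""

    def __init__(self, threshold=0.5, lr=0.05, lo=0.05, hi=0.95):
        self.threshold, self.lr, self.lo, self.hi = threshold, lr, lo, hi

    def should_preset(self, predicted_need):
        return predicted_need >= self.threshold

    def feedback(self, preset, actually_needed):
        # Correct decisions are the reward: keep the threshold as it is.
        if preset and not actually_needed:      # penalty: wasted sync, be stricter
            self.threshold = min(self.hi, self.threshold + self.lr)
        elif not preset and actually_needed:    # penalty: missed sync, be more eager
            self.threshold = max(self.lo, self.threshold - self.lr)


policy = MarkingPolicy()
policy.feedback(preset=True, actually_needed=False)  # threshold rises to 0.55
print(policy.should_preset(0.52))  # False
```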

S4: The SCC performs security detection and classification on the data items to be synchronized, determines their data sensitivity levels, and applies the corresponding security protection measures.

Specifically, the semantic category of each data item is determined using natural language processing and deep learning and matched against a sensitive-word lexicon to identify data items containing confidential or personally sensitive information; a security rating module is invoked to determine each data item's sensitivity through association rule learning and attribute analysis and to classify it by security level; the security levels comprise a non-sensitive public class, a low-sensitivity internal-use class, a medium-sensitivity important-business class, and a high-sensitivity confidential or private-information class; data sets in the medium-sensitivity important-business class and the high-sensitivity confidential or private-information class are automatically isolated into secure blocks by means of data isolation; an access control policy is established for the data blocks of each security class, enforcing the principle of least privilege; and consistency monitoring and storage encryption are applied to the data within the secure blocks, while anomaly detection and unauthorized-access detection are performed to confirm security.

It should be noted that if a data item contains no sensitive vocabulary or is marked as publicly shareable, it is classified as non-sensitive public, and an open access policy lets all user roles access and synchronize it; if a data item whose content is non-sensitive is nevertheless marked as secret (for example, through mislabeling) or as internal use, it is classified as low-sensitivity internal use, only designated roles in core business departments are authorized to access it, non-core roles and external systems cannot access it, and if the proportion of sensitive data exceeds a preset threshold, a content review is automatically triggered to decide whether the item should be moved to a higher security class; if a data item is highly correlated with key business indicators or is marked as confidential, it is classified as medium-sensitivity important business, stored in an encrypted data block, and made accessible only to selected roles in project management and the core development team, and if it is tightly coupled with several core systems, an importance evaluation is automatically triggered to consider raising it to the high-sensitivity confidential class; if a data item contains many sensitive words or personal private information, or is marked as classified, it is classified as high-sensitivity confidential or private information, stored in a block of the highest security level, and protected by single-point access and a four-eyes (two-person) authorization mechanism, and if it is found to match a user credit record or a blocked-word lexicon, access is automatically locked pending manual review, an integrity check is started, and the source of the information leak is traced.
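The four-level classification rules of S4 can be sketched as an ordered rule list evaluated from the most sensitive level down, so an item that meets several rules receives the strictest class. Everything concrete here is an assumption: the whitespace tokenizer stands in for the NLP and deep-learning step, and the label strings, the three-hit cut-off, and the 0.1 sensitive-ratio, 0.8 KPI-correlation, and three-system coupling thresholds are hypothetical.

```python
from dataclasses import dataclass, field

# Level indices 0..3 follow the four classes named in S4.
LEVELS = ("non-sensitive public", "low-sensitivity internal use",
          "medium-sensitivity important business", "high-sensitivity confidential/private")


@dataclass
class Item:
    text: str
    labels: set = field(default_factory=set)  # hypothetical tags, e.g. {"internal"}
    kpi_correlation: float = 0.0              # correlation with key business indicators
    coupled_core_systems: int = 0             # number of core systems depending on it


def classify(item, lexicon, many_hits=3, ratio_threshold=0.1,
             kpi_threshold=0.8, coupling_threshold=3):
    """Return (level, follow-up action or None); rules run from most to least sensitive."""
    tokens = item.text.lower().split()        # stand-in for the NLP/deep-learning step
    hits = sum(1 for tok in tokens if tok in lexicon)
    if hits >= many_hits or "classified" in item.labels:
        return 3, "single-point access + four-eyes approval"
    if item.kpi_correlation >= kpi_threshold or "confidential" in item.labels:
        escalate = item.coupled_core_systems >= coupling_threshold
        return 2, ("importance review" if escalate else None)
    if "internal" not in item.labels and (hits == 0 or "public" in item.labels):
        return 0, None
    ratio = hits / max(1, len(tokens))        # share of sensitive tokens
    return 1, ("content review" if ratio > ratio_threshold else None)


lexicon = {"salary", "passport", "diagnosis"}
level, action = classify(Item("team salary review", labels={"internal"}), lexicon)
print(LEVELS[level], action)  # low-sensitivity internal use content review
```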

S5: The SCC determines a data transmission policy based on the decision-guiding features, sensitivity levels, and current network conditions, and verifies the integrity of the synchronization result.

Preferably, the method comprises the following steps:

S5.1: The decision-guiding features and sensitivity levels of the data are analyzed, and transmission priorities are assigned.

It should be noted that the decision-guiding features determine which data are more critical and need faster transmission, while the sensitivity level determines the security requirements during transmission.
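S5.1 can be read as a priority queue: the decision-guiding features yield a criticality score that orders transmission, while the sensitivity class selects a security profile that travels with each item. The score range, the class names, and the profile strings below are hypothetical shorthand for the measures described in S5.3.

```python
import heapq

# Hypothetical shorthand for the per-class measures listed in S5.3.
SECURITY_PROFILE = {
    "public": "tls",
    "internal": "tls+vpn",
    "important": "e2e+mfa",
    "confidential": "tls1.3+e2e+masking",
}


def build_queue(items):
    """items: iterable of (item_id, criticality in [0, 1], sensitivity class).

    Returns (item_id, security_profile) pairs in transmission order."""
    heap = []
    for order, (item_id, criticality, sensitivity) in enumerate(items):
        # heapq is a min-heap, so criticality is negated; `order` breaks ties first-in-first-out.
        heapq.heappush(heap, (-criticality, order, item_id, SECURITY_PROFILE[sensitivity]))
    return [heapq.heappop(heap)[2:] for _ in range(len(heap))]


queue = build_queue([("a", 0.2, "public"), ("b", 0.9, "confidential"), ("c", 0.9, "internal")])
print(queue)  # [('b', 'tls1.3+e2e+masking'), ('c', 'tls+vpn'), ('a', 'tls')]
```

Keeping urgency and security as separate dimensions means a highly sensitive item is never sent faster by weakening its protection; only its position in the queue changes.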

S5.2: the current network state is evaluated and synchronization is performed under conditions where network capacity allows.

It should be noted that if the network state is poor, the transmission policy is adjusted; for example, synchronization is postponed to a period of low network usage.

S5.3: privacy and security measures are taken to ensure transmission security based on the level of data sensitivity.

Specifically, if the data belongs to the non-sensitive public information category, a standard transport encryption protocol (such as SSL/TLS) is used to keep the data confidential in transit, and integrity verification is performed: a hash function generates a checksum that confirms the data was not tampered with during transmission. If the data belongs to the low-sensitivity internal-use category, then in addition to SSL/TLS transport encryption, a VPN or dedicated network is set up on the internal network so that the data travels only within a secure internal environment; the data is encrypted both in storage and in transit, and access control is configured so that only authorized personnel can access it. If the data belongs to the medium-sensitivity important business category, strict access control is enforced and strengthened with multi-factor authentication, the data is protected by end-to-end encryption against unauthorized access or interception on the network, and the network is appropriately segmented so that the transfer cannot be intercepted from other network segments. If the data belongs to the highly sensitive confidential or private information category, end-to-end encryption over the TLS 1.3 protocol is used to achieve the highest level of security in transit, and the data is desensitized before transmission (so that even if it is intercepted, the original sensitive information cannot be recovered); strict access control and identity verification are applied to the transmitted data, including security tokens and device authorization mechanisms, and the entire transmission is monitored and recorded in audit logs (so that data flows and access can be traced when necessary).
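The tiered scheme above can be sketched as a lookup from sensitivity level to required measures, plus the checksum used for integrity verification. This is a minimal illustration, not the patented implementation: the measure names are hypothetical labels, and the assumption that each level inherits every measure of the levels below it is ours (the text states it explicitly only for the low-sensitivity category).

```python
import hashlib
from enum import IntEnum


class Sensitivity(IntEnum):
    # The four categories of step S5.3, ordered from least to most sensitive.
    PUBLIC = 0        # no sensitive information disclosed
    INTERNAL = 1      # low-sensitivity internal use
    IMPORTANT = 2     # medium-sensitivity important business data
    CONFIDENTIAL = 3  # highly sensitive confidential or private information


# Hypothetical measure labels added at each level (assumed cumulative).
_EXTRA_MEASURES = {
    Sensitivity.PUBLIC: ["tls", "checksum"],
    Sensitivity.INTERNAL: ["vpn_or_private_network", "encrypted_storage", "access_control"],
    Sensitivity.IMPORTANT: ["mfa", "end_to_end_encryption", "network_segmentation"],
    Sensitivity.CONFIDENTIAL: ["tls1.3", "desensitize_before_send",
                               "security_token", "device_authorization", "audit_log"],
}


def required_measures(level: Sensitivity) -> list[str]:
    """Return the cumulative list of protection measures for a sensitivity level."""
    measures: list[str] = []
    for lvl in Sensitivity:  # iterates in definition order, lowest level first
        if lvl <= level:
            measures.extend(_EXTRA_MEASURES[lvl])
    return measures


def checksum(payload: bytes) -> str:
    """SHA-256 digest sent alongside the payload for integrity verification."""
    return hashlib.sha256(payload).hexdigest()


def verify(payload: bytes, expected: str) -> bool:
    """Recompute the digest on the receiving side and compare."""
    return checksum(payload) == expected
```

A plain hash only detects accidental corruption; if the checksum travels on the same channel as the data, detecting deliberate tampering would need a keyed digest such as `hmac.new(key, payload, hashlib.sha256)`.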

S5.4: the strategy is adjusted according to the network conditions and the synchronization task state to improve synchronization quality.

Preferably, the adjustment includes reallocating network bandwidth priority, selecting the best time window for data transmission (e.g., performing high-volume transfers during periods of low network usage), or reducing the transmission rate when the network is congested so as not to aggravate the congestion. At the same time, based on the priority and sensitivity of the data, it is decided whether the transmission can be delayed until network conditions improve, or whether an alternative transmission path should be used.
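The adjustment rules above can be illustrated with a small decision function. This is a sketch under stated assumptions: the 80% congestion threshold, the 500 MB "high-volume" cutoff, the integer priority scale (0 = most urgent) and the rule that sensitive data is never moved onto the alternative path are all hypothetical choices, not values from the specification.

```python
from dataclasses import dataclass


@dataclass
class NetworkState:
    utilization: float  # fraction of link bandwidth currently in use (0.0-1.0)
    off_peak: bool      # True inside the low-usage time window
    alt_path_up: bool   # an alternative transmission path is available


CONGESTION = 0.8  # assumed congestion threshold (utilization)
BULK_MB = 500     # assumed size above which a batch counts as high-volume


def plan_transfer(size_mb: float, priority: int, sensitive: bool,
                  net: NetworkState, rate_mbps: float) -> tuple[str, float]:
    """Return (action, rate_mbps) for one batch; priority 0 is the most urgent."""
    if net.utilization > CONGESTION:
        if priority > 0:
            # Non-urgent data waits for better network conditions.
            return "defer", 0.0
        if net.alt_path_up and not sensitive:
            # Urgent, non-sensitive data may leave the congested path.
            return "reroute", rate_mbps
        # Urgent or sensitive data stays on the trusted path at a reduced rate.
        return "send", rate_mbps / 2
    if size_mb > BULK_MB and priority > 0 and not net.off_peak:
        # High-volume, non-urgent transfers are moved into the off-peak window.
        return "defer", 0.0
    return "send", rate_mbps
```

Keeping sensitive data on the primary path reflects the idea that an alternative route may not carry the same security guarantees; a real deployment would derive these thresholds from measured traffic.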

S5.5: the data transmission process is tracked to confirm data integrity, and a retransmission mechanism or error recovery operation is performed if a transmission fails.

Preferably, as soon as an error or interruption in the data transmission is detected, a retransmission mechanism or error recovery operation is initiated; this may include retransmitting packets that were not delivered successfully, or using techniques such as forward error correction to correct errors as they occur, so as to minimize transmission delay.
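The retransmission branch can be sketched as follows; forward error correction is not shown. Each packet carries its own SHA-256 digest, and a packet that is lost or fails its checksum is resent up to a bounded number of times. The `send` callable is a hypothetical stand-in for the network link, used so the logic can be exercised in-process.

```python
import hashlib
from typing import Callable, Optional

Packet = tuple[int, bytes, str]  # (sequence number, payload, SHA-256 hex digest)


def make_packets(data: bytes, size: int) -> list[Packet]:
    """Split data into fixed-size packets, each carrying its own checksum."""
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    return [(seq, c, hashlib.sha256(c).hexdigest()) for seq, c in enumerate(chunks)]


def transmit(packets: list[Packet],
             send: Callable[[Packet], Optional[bytes]],
             max_retries: int = 3) -> dict[int, bytes]:
    """Send every packet, retransmitting any that is lost (None) or fails its checksum."""
    received: dict[int, bytes] = {}
    for seq, payload, digest in packets:
        for _ in range(1 + max_retries):
            got = send((seq, payload, digest))
            if got is not None and hashlib.sha256(got).hexdigest() == digest:
                received[seq] = got
                break
        else:
            # for/else: runs only when every attempt for this packet failed.
            raise RuntimeError(f"packet {seq} failed after {max_retries} retries")
    return received


def reassemble(received: dict[int, bytes]) -> bytes:
    """Join the delivered payloads in sequence order."""
    return b"".join(received[seq] for seq in sorted(received))
```

Raising after the retry budget is exhausted hands control to the error-recovery path (for example, alerting the SCC) instead of silently dropping data.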

S5.6: after the transmission is completed, the integrity verification of the synchronization result is performed.

Further, after the data transmission is completed, integrity verification of the synchronization result is performed to ensure that all data has been transmitted correctly and has reached the destination intact. This may include comparing the received data with the source data, or comparing hash values, to confirm that the data was not corrupted or altered in transit. If verification fails, troubleshooting and the necessary error correction are carried out to ensure the accuracy and consistency of the data.
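The hash-comparison variant of this check can be sketched by hashing each row in a canonical form on both sides and comparing by primary key. The key column name `id` is an assumption, and because heterogeneous databases may represent the same value differently (decimals, timestamps, character sets), a real implementation would normalize values before hashing; here non-JSON types are simply converted with `str`.

```python
import hashlib
import json
from typing import Any

Row = dict[str, Any]


def row_digest(row: Row) -> str:
    # Canonical JSON (sorted keys, fixed separators) so the same logical row
    # hashes identically regardless of column order in either database.
    canon = json.dumps(row, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canon.encode("utf-8")).hexdigest()


def verify_sync(source: list[Row], target: list[Row], key: str = "id") -> dict[str, list]:
    """Report rows missing from the target, unexpected in it, or changed in transit."""
    src = {r[key]: row_digest(r) for r in source}
    dst = {r[key]: row_digest(r) for r in target}
    return {
        "missing": sorted(k for k in src if k not in dst),
        "unexpected": sorted(k for k in dst if k not in src),
        "mismatched": sorted(k for k in src if k in dst and src[k] != dst[k]),
    }
```

An empty report on all three lists means the synchronization result passed verification; any entry identifies exactly which rows need error correction.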

S6: according to the output of the synchronization result integrity verification, the SCC designs and implements an automatic update mechanism for the synchronization operation records.

Specifically, the SCC uses a real-time data monitoring module to compare the result of each synchronization operation with the expected synchronization marks and data states, and after a successful synchronization automatically updates the synchronization metadata in both the source and target databases. A trigger or listener is configured so that, after each synchronization operation completes, the key information of the transaction (such as operation time, data items, synchronization state and operator) is written to a global transaction log. The global transaction log serves as the historical record and audit trail of data synchronization, providing detailed information for future data audits, system optimization and possible failure recovery. To maintain consistency across the heterogeneous systems, a transaction version control mechanism is introduced to resolve potential conflicts, and global or distributed locks are used to ensure data consistency during synchronization. Once the integrity and consistency of the synchronized data are confirmed, the database operation logs are compressed with a compression algorithm to reduce storage usage and improve audit efficiency. An early warning and anomaly detection mechanism is established that notifies an administrator to intervene in real time when a synchronization operation is abnormal or fails, and records the intervention in an anomaly log for analysis. Finally, the synchronization results are combined with user feedback to continuously optimize the synchronization strategy, improve the accuracy of data consistency checks, and promptly update the rules and trigger conditions of synchronization operations, thereby improving the performance and robustness of the whole synchronization flow.
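The transaction log, version control and log compression described above can be sketched in-process. The class and method names are hypothetical; a `threading.Lock` stands in for the global or distributed lock, and the version check is an optimistic-concurrency scheme chosen for illustration: a writer whose base version is stale is recorded as a conflict instead of overwriting newer data. `zlib` stands in for the unspecified compression algorithm.

```python
import json
import threading
import time
import zlib
from dataclasses import asdict, dataclass


@dataclass
class SyncRecord:
    item_id: str
    version: int
    status: str      # "success" or "conflict"
    operator: str
    timestamp: float


class GlobalTransactionLog:
    """Hypothetical in-process stand-in for the global transaction log."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # a distributed lock would be used across nodes
        self._versions: dict[str, int] = {}
        self._records: list[SyncRecord] = []

    def commit(self, item_id: str, base_version: int, operator: str) -> SyncRecord:
        """Record a sync if base_version is current; otherwise record a conflict."""
        with self._lock:
            current = self._versions.get(item_id, 0)
            if base_version != current:
                rec = SyncRecord(item_id, current, "conflict", operator, time.time())
            else:
                self._versions[item_id] = current + 1
                rec = SyncRecord(item_id, current + 1, "success", operator, time.time())
            self._records.append(rec)
            return rec

    def compressed(self) -> bytes:
        """Serialize the log to JSON and compress it to reduce storage."""
        with self._lock:
            raw = json.dumps([asdict(r) for r in self._records]).encode("utf-8")
        return zlib.compress(raw, level=9)

    @staticmethod
    def decompress(blob: bytes) -> list[dict]:
        """Restore the log entries for auditing."""
        return json.loads(zlib.decompress(blob))
```

Conflict records stay in the log, so the anomaly detection and audit steps can see every rejected write, not only the successful ones.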

Further, this embodiment also provides a system for heterogeneous database data synchronization, comprising: an instruction sending module, used for sending, through the synchronization control center SCC, an instruction to the adapter unit AU of the target database so as to extract the data set to be synchronized; a feature mining module, used for optimizing the data set to be synchronized by applying an advanced preprocessing strategy and mining the decision-guiding features of the data set with a joint learning model; a setting maintenance module, used for setting synchronization marks on data items of the source database and maintaining them with a synchronization marker ST to reflect the latest state of the data; a detection classification module, used for performing security detection and classification on the synchronized data items, determining their sensitivity level and applying the corresponding security protection measures; a transmission selection module, which determines the data transmission strategy according to the decision-guiding features, the sensitivity level and the current network state, and performs integrity verification of the synchronization result; and an automatic updating module, used for designing and implementing an automatic update mechanism for the synchronization operation records according to the output of the synchronization result integrity verification.
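The module arrangement above can be pictured as a linear pipeline over a shared context, with the automatic updating module last so it can read the verification outcome. This is an architectural sketch only: the stage functions are hypothetical placeholders that write fixed values, whereas real modules would talk to the databases and run the steps described earlier.

```python
from typing import Any, Callable

Context = dict[str, Any]
Module = Callable[[Context], Context]


def run_pipeline(modules: list[tuple[str, Module]], ctx: Context) -> Context:
    """Run the modules in order, recording which ones completed.

    Each module reads and extends the shared context; the automatic updating
    module runs last so that it can see the verification outcome.
    """
    ctx.setdefault("completed", [])
    for name, module in modules:
        ctx = module(ctx)
        ctx["completed"].append(name)
    return ctx


# Hypothetical placeholders for the six modules.
MODULES: list[tuple[str, Module]] = [
    ("instruction_sending", lambda c: {**c, "dataset": [{"id": 1}, {"id": 2}]}),
    ("feature_mining", lambda c: {**c, "features": ["id"]}),
    ("setting_maintenance",
     lambda c: {**c, "marks": {r["id"]: "pending" for r in c["dataset"]}}),
    ("detection_classification", lambda c: {**c, "level": "public"}),
    ("transmission_selection", lambda c: {**c, "verified": True}),
    ("automatic_updating",
     lambda c: {**c, "marks": {k: ("synced" if c["verified"] else "failed")
                               for k in c["marks"]}}),
]
```

Passing an explicit context keeps the modules decoupled: each one depends only on the keys written by the modules before it.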

This embodiment also provides a computer device applicable to the heterogeneous database data synchronization method, comprising a memory and a processor; the memory stores computer-executable instructions, and the processor executes them to implement the heterogeneous database data synchronization method proposed in the above embodiment.

The computer device may be a terminal comprising a processor, a memory, a communication interface, a display screen and an input device connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program, and the internal memory provides the environment in which the operating system and computer program stored on the non-volatile storage medium run. The communication interface of the computer device is used for wired or wireless communication with an external terminal; wireless communication can be implemented via Wi-Fi, a carrier network, NFC (near field communication) or other technologies. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device may be a touch layer covering the display screen, keys, a trackball or a touchpad on the device housing, or an external keyboard, touchpad, mouse or the like.

This embodiment also provides a storage medium storing a computer program which, when executed by a processor, implements the heterogeneous database data synchronization method proposed in the above embodiments.

In conclusion, the invention greatly improves the speed of data extraction and synchronization through dynamic multidimensional judgment rules and an advanced preprocessing strategy; setting synchronization marks and performing integrity verification of the synchronization result ensure the consistency and integrity of the synchronized data; sensitivity analysis of data items using deep learning and natural language processing, combined with tiered security protection according to the data sensitivity level, provides strong protection for sensitive data; dynamically adjusting the data transmission strategy according to the current network state and synchronization task state effectively copes with changing network conditions and keeps synchronization stable and efficient; and through the joint learning model and the condition inference mechanism, the invention not only achieves efficient synchronization from the outset but also continuously learns and optimizes based on user feedback and system performance data.

Example 2

Referring to Figs. 1 to 4, a second embodiment of the present invention provides a heterogeneous database data synchronization method; to verify the beneficial effects of the invention, it is demonstrated through economic benefit calculation and simulation experiments.

Specifically, the simulation experiment simulates the data synchronization process between two heterogeneous database systems, A and B, in order to test the performance of the technical solution under different network environments and security requirements. The experiment uses 1000 data records to be synchronized, covering several data types such as sales records, customer information and service records. Synchronization is performed once per day to simulate the synchronization requirements of a real business scenario. To evaluate the solution fully, two network environments are considered: a high-speed network (1 Gbps bandwidth, 10 ms latency) and a low-speed network (100 Mbps bandwidth, 100 ms latency). In addition, data items are classified by security level, from public data without sensitive information to data containing highly sensitive information, in order to test the effectiveness of the different security measures.

Further, during the experiment, the Synchronization Control Center (SCC) first sends an instruction to extract the data set to be synchronized to the Adapter Unit (AU) of database A. Advanced preprocessing strategies, such as format normalization, outlier detection and feature selection, are then applied to the data set to ensure data quality and consistency. In the synchronization mark setting stage, an intelligent synchronization node holding the hash value, synchronization state and timestamp is established for each data item in source database A; a new block is generated automatically when the synchronization condition is met, and the synchronization efficiency and consistency of each data item are monitored. The security detection stage determines the sensitivity level of the data and applies security protection, including measures such as data isolation, encryption and access control. Finally, the optimal data transmission strategy is determined according to the decision-guiding features, the sensitivity level and the current network state, the data synchronization is executed, and integrity verification of the synchronization result is performed on completion.
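The intelligent synchronization node can be sketched as a per-item hash chain, without consensus or a real smart-contract platform. The "contract" here is an assumed rule: a new block is appended only when the item's content hash changes, and marking an item as synced appends a block with the same content hash. Class names and state labels are hypothetical.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field
from typing import Optional


def _sha256(obj: object) -> str:
    # Sorted keys make the hash independent of column order.
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class Block:
    item_hash: str   # hash of the data item's content
    state: str       # synchronization state, e.g. "pending" or "synced"
    timestamp: float
    prev_hash: str   # hash of the previous block, chaining the history

    def block_hash(self) -> str:
        return _sha256([self.item_hash, self.state, self.timestamp, self.prev_hash])


@dataclass
class SyncNode:
    item_id: str
    chain: list[Block] = field(default_factory=list)

    def observe(self, item: dict, now: Optional[float] = None) -> bool:
        """Assumed contract rule: append a block only when the content hash changed."""
        h = _sha256(item)
        if self.chain and self.chain[-1].item_hash == h:
            return False
        prev = self.chain[-1].block_hash() if self.chain else "0" * 64
        ts = now if now is not None else time.time()
        self.chain.append(Block(h, "pending", ts, prev))
        return True

    def mark_synced(self, now: Optional[float] = None) -> None:
        """Record that the latest content version reached the target."""
        last = self.chain[-1]
        ts = now if now is not None else time.time()
        self.chain.append(Block(last.item_hash, "synced", ts, last.block_hash()))

    def chain_is_valid(self) -> bool:
        """Detect any edit to past blocks by re-checking every back-link."""
        return all(cur.prev_hash == prev.block_hash()
                   for prev, cur in zip(self.chain, self.chain[1:]))
```

Because each block commits to its predecessor's hash, the consistency monitor can detect after-the-fact edits to an item's synchronization history.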

Further, the simulation results show that the average completion time of a data synchronization operation is 2 minutes in the high-speed network environment and increases to 10 minutes in the low-speed network environment. No data items were corrupted or lost in transmission, and data consistency was maintained at 100%. The security assessment confirms that appropriate encryption and security measures were applied to all sensitive data during synchronization and that no security incidents were recorded. User feedback shows that business departments are satisfied with the synchronization efficiency and data accuracy, and particularly with the protection of sensitive data. The simulation experiment verifies that the technical solution can effectively meet the requirements of heterogeneous database data synchronization, ensures efficiency, security and data consistency during synchronization, and provides strong support for practical application under different network environments and security requirements.

The comparison between the present invention and the conventional method is shown in Table 1.

Table 1: Comparison of the present invention with the conventional synchronization method

Comparison dimension	Present invention	Conventional synchronization method
Synchronization efficiency	Average 2 min (high-speed network); average 10 min (low-speed network)	Average 5 min (high-speed network); average 15 min (low-speed network)
Data consistency	99%	95%
Security assurance	Advanced: data isolation, encryption, access control	Basic: data encryption, basic access control
Network environment adaptability	High	Medium
User satisfaction	High	Average

Table 1 shows the advantages of the present invention over the conventional method in several key dimensions, in particular higher synchronization efficiency, stronger assurance of data consistency and security, and the ability to maintain high performance in different network environments. With these improvements, the invention can meet growing data synchronization demands, especially where large volumes of sensitive data must be processed.

It should be noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that the technical solution of the present invention may be modified or substituted without departing from the spirit and scope of the technical solution of the present invention, which is intended to be covered in the scope of the claims of the present invention.

Claims

1. A method for heterogeneous database data synchronization, characterized by comprising the following steps:

Sending an instruction to an adapter unit AU of a target database through a synchronization control center SCC so as to extract a data set to be synchronized;

Optimizing the data set to be synchronized using an advanced preprocessing strategy, and mining decision-guiding features of the data set using a joint learning model;

Setting synchronization marks on data items of a source database, and maintaining them with a synchronization marker ST to reflect the latest state of the data;

Performing security detection and classification on the synchronized data items, determining the data sensitivity level, and applying the corresponding security protection measures;

Determining a data transmission strategy according to the decision-guiding features, the sensitivity level and the current network state, and performing integrity verification of the synchronization result;

Designing and implementing an automatic update mechanism for the synchronization operation records according to the output of the synchronization result integrity verification;

wherein mining the decision-guiding features of the data set using the joint learning model comprises the following steps:

Standardizing and converting the format of the data set through a data adapter, and identifying and removing atypical samples using outlier detection;

Monitoring the data set in real time using the ARIMA algorithm and a variation scoring technique, and capturing and marking abnormal changes;

Executing an interactive feature selection process to dynamically identify and iteratively optimize the set of decision-guiding features;

Designing a condition inference mechanism, based on a decision tree algorithm or on a neural network incorporating an attention mechanism, to guide feature selection and priority allocation;

Developing a feature extraction framework that calibrates itself adaptively according to fluctuations in the data sources and changes in synchronization requirements;

Incorporating metadata into the feature extraction process to enhance the semantic expressiveness and contextual relevance of the features;

wherein the condition inference mechanism comprises:

If a feature is closely correlated with positive business outcomes in the new data stream, increasing its weight in feature selection and adjusting its priority in data synchronization as the correlation changes;

When a feature in user behavior data is abnormal, designating the feature as an anomaly prediction indicator and analyzing its association with potential risks or opportunities;

If a specific group has a high prediction success rate on a feature, weighting that feature so as to mine complex relationships and locate target customers;

For seasonal fluctuations, adjusting the feature set according to historical data patterns and trends, and analyzing and predicting market dynamics using multivariate time series;

For potentially high-risk features, initiating a security mode that automatically adjusts the encryption and data desensitization policies, and flagging other implicit features that may affect data compliance for checking;

During execution of the condition inference mechanism, establishing a mechanism for evaluating and monitoring model performance and feature processing strategies;

Periodically evaluating the prediction capability and feature processing effect, and identifying problems for model and feature optimization;

According to the evaluation results, adjusting the parameters of the decision tree algorithm, updating the attention mechanism of the neural network, and optimizing feature selection and weights;

Establishing a continuous closed-loop iteration mechanism that continuously optimizes the model and the feature processing strategy;

wherein maintaining with the synchronization marker ST to reflect the latest state of the data comprises the following steps:

Establishing an intelligent synchronization node for each data item in the source database;

Configuring a smart contract for each data item, and automatically generating a new block when the synchronization condition is met;

Monitoring the synchronization efficiency and consistency of the data items using a machine learning model, to ensure that synchronization performance meets the standard;

Identifying data synchronization requirements through behavior analysis and, when the model detects a requirement signal, triggering the synchronization marker ST to generate a synchronization plan, coordinate the synchronization operation, execute marking, monitor progress and record logs;

Simulating data element behavior in a simulation environment using a neural network so as to identify synchronization requirements in advance;

Presetting, by an early warning system, synchronization marks on the data items to be synchronized according to the prediction output of the neural network;

Integrating an intent-aware dynamic synchronization planner that selects the data items to synchronize based on business intent and contextual information;

Establishing a self-learning feedback loop based on rewards or penalties to adjust and refine the synchronization marking logic;

wherein determining the data transmission strategy according to the decision-guiding features, the sensitivity level and the current network state comprises the following steps:

Analyzing the decision-guiding features and sensitivity level of the data, and assigning transmission priorities;

Evaluating the current network state, and synchronizing when network capacity permits;

Applying confidentiality and security measures according to the data sensitivity level to ensure transmission security;

Adjusting the strategy according to network conditions and the synchronization task state to improve synchronization quality;

Tracking the data transmission process to confirm data integrity, and performing a retransmission mechanism or error recovery operation when a transmission failure occurs;

After the transmission is completed, performing integrity verification of the synchronization result.
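Two of the feature-mining steps recited in claim 1, outlier removal and the correlation-driven weight update of the condition inference mechanism, can be sketched with the standard library. This is an illustrative substitute, not the claimed model: Tukey's IQR fences stand in for the unspecified outlier detector, Pearson correlation stands in for "closely related to the positive business result", and the learning rate `lr` is a hypothetical parameter. ARIMA monitoring and the decision-tree or attention-based inference are not shown.

```python
import statistics


def drop_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Remove atypical samples outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lo <= v <= hi]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def update_weights(weights: dict[str, float], columns: dict[str, list[float]],
                   outcome: list[float], lr: float = 0.5) -> dict[str, float]:
    """Raise weights of features correlated with the positive outcome, lower the rest."""
    return {name: max(0.0, w + lr * pearson(columns[name], outcome))
            for name, w in weights.items()}


def priority_order(weights: dict[str, float]) -> list[str]:
    """Features sorted by weight, highest first, for synchronization priority."""
    return sorted(weights, key=weights.__getitem__, reverse=True)
```

Re-running `update_weights` on each new batch gives a simple version of the closed-loop adjustment: as a feature's correlation with the outcome drifts, its weight and synchronization priority drift with it.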

2. The method for heterogeneous database data synchronization according to claim 1, wherein performing security detection and classification on the synchronized data items comprises the following steps:

Determining the semantic category of each data item using natural language processing and deep learning techniques, and matching it against a sensitive-word lexicon to identify data items containing confidential or personal sensitive information;

Invoking a security rating module that determines the sensitivity of each data item based on association rule learning and attribute analysis, and classifying the item by security level;

the security levels comprising a non-sensitive public information category, a low-sensitivity internal-use category, a medium-sensitivity important business category, and a highly sensitive confidential or private information category;

Automatically isolating data sets in the medium-sensitivity important business category and the highly sensitive confidential or private information category into security blocks using data isolation techniques;

Establishing access control policies for data blocks of the different security categories and enforcing the principle of least privilege;

Performing consistency monitoring, storage encryption, anomaly detection and unauthorized-access detection on the data within the security blocks to confirm their security.

3. The method for heterogeneous database data synchronization according to claim 2, wherein classifying by security level comprises:

If the data item contains no sensitive vocabulary or is marked as publicly shared, determining that it belongs to the non-sensitive public information category and applying an open access policy under which all user roles have access;

If a non-sensitive public data item is marked as confidential or mistakenly marked as internal use, determining that it belongs to the low-sensitivity internal-use category, whereby only designated roles in core business departments are authorized to access it and non-core roles and external systems cannot;

Calculating the proportion of sensitive data and, if it exceeds a preset threshold, automatically triggering a content review to determine whether the item should be raised to a higher security category;

If the data item is highly correlated with key business indicators or is marked as confidential, determining that it belongs to the medium-sensitivity important business category and storing it in an encrypted data block accessible only to certain roles in the project management layer and the core development team;

If the data item is found to be highly coupled with multiple core systems, automatically triggering an importance assessment procedure and considering an upgrade to the highly sensitive confidential category;

If the data item contains a large number of sensitive words or personal privacy information, or is marked as top secret, determining that it belongs to the highly sensitive confidential or private information category, storing it in a block of the highest security level, and enforcing single-point access and a four-eyes authentication mechanism;

If the data item is found to match the user's credit record or the blocked-word lexicon, automatically locking access pending manual review and handling, and starting an integrity check process to trace the source of the information leak.
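The classification rules of claim 3 can be sketched as an ordered rule check that tests the most restrictive category first. Everything concrete here is an assumption for illustration: the word lists, the 0.2 ratio threshold, the "three or more sensitive words" cutoff, the label strings and the action names. The credit-record match is omitted, and the NLP and association-rule scoring of claim 2 is replaced by simple token matching.

```python
from dataclasses import dataclass

SENSITIVE_WORDS = {"salary", "id_number", "diagnosis", "password"}  # hypothetical lexicon
BLOCKED_WORDS = {"blacklisted"}                                     # hypothetical lexicon


@dataclass
class Item:
    tokens: list[str]
    label: str = ""               # "public", "internal", "confidential", "top_secret"
    kpi_correlated: bool = False  # highly correlated with key business indicators
    coupled_systems: int = 0      # number of core systems the item is coupled with
    has_personal_info: bool = False


def classify(item: Item, ratio_threshold: float = 0.2,
             many_words: int = 3) -> tuple[str, list[str]]:
    """Return (security category, triggered actions), checking the strictest rule first."""
    hits = [t for t in item.tokens if t in SENSITIVE_WORDS]
    actions: list[str] = []
    if len(hits) >= many_words or item.has_personal_info or item.label == "top_secret":
        if any(t in BLOCKED_WORDS for t in item.tokens):
            actions += ["lock_access", "manual_review", "trace_leak"]
        return "high", actions + ["highest_security_block", "four_eyes_auth"]
    if item.kpi_correlated or item.label == "confidential":
        if item.coupled_systems > 1:
            actions.append("importance_assessment")  # candidate for upgrade to "high"
        return "medium", actions + ["encrypted_block"]
    if hits or item.label == "internal":
        ratio = len(hits) / len(item.tokens) if item.tokens else 0.0
        if ratio > ratio_threshold:
            actions.append("content_review")  # may raise the item to a higher category
        return "low", actions
    return "public", actions
```

Checking from the highest category down ensures that an item matching several rules lands in the strictest applicable category.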

4. A system for heterogeneous database data synchronization, based on the method for heterogeneous database data synchronization according to any one of claims 1 to 3, characterized by comprising:

An instruction sending module, used for sending an instruction to an adapter unit AU of the target database through the synchronization control center SCC so as to extract a data set to be synchronized;

A feature mining module, used for optimizing the data set to be synchronized by applying an advanced preprocessing strategy and mining decision-guiding features of the data set using a joint learning model;

A setting maintenance module, used for setting synchronization marks on data items of the source database and maintaining them with a synchronization marker ST to reflect the latest state of the data;

A detection classification module, used for performing security detection and classification on the synchronized data items, determining the data sensitivity level and applying the corresponding security protection measures;

A transmission selection module, which determines a data transmission strategy according to the decision-guiding features, the sensitivity level and the current network state, and performs integrity verification of the synchronization result;

And an automatic updating module, used for designing and implementing an automatic update mechanism for the synchronization operation records according to the output of the synchronization result integrity verification.