CN117648388A - Visual safe real-time data warehouse implementation method and system - Google Patents


Info

Publication number
CN117648388A
Authority
CN
China
Prior art keywords
data
metadata
target metadata
inspection
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410119047.3A
Other languages
Chinese (zh)
Other versions
CN117648388B (en
Inventor
张玲
李林
黎明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Qizhu Intelligent Technology Co ltd
Original Assignee
Chengdu Qizhu Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Qizhu Intelligent Technology Co ltd filed Critical Chengdu Qizhu Intelligent Technology Co ltd
Priority to CN202410119047.3A priority Critical patent/CN117648388B/en
Publication of CN117648388A publication Critical patent/CN117648388A/en
Application granted granted Critical
Publication of CN117648388B publication Critical patent/CN117648388B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and system for implementing a visual, secure, real-time data warehouse, relating to the field of computers. The method comprises: initializing a local database; collecting target metadata from a third-party database, parsing the target metadata, and localizing it to the local database; confirming the authority over the target metadata; evaluating and scoring the quality of the target metadata; scheduling the target metadata with a qualified score to create a data warehouse, and analyzing the lineage (blood relationship) of the target metadata; configuring a data synchronization strategy, and monitoring and synchronously maintaining the target metadata and other metadata that has a lineage relationship with it; and classifying the metadata in the data warehouse and creating corresponding data sets for use by third-party users. The invention can produce data in real time while ensuring the reliability, integrity, correctness, and usability of the data, and avoids the performance problems caused by the statistical efficiency of the database.

Description

Visual safe real-time data warehouse implementation method and system
Technical Field
The invention relates to the technical field of computers, in particular to a visual safe and real-time data warehouse implementation method and a visual safe and real-time data warehouse implementation system.
Background
In recent years, a growing number of small and medium-sized enterprises and university business departments have built management application systems and platforms (for personnel, finance, and so on) according to their actual business needs, and have accumulated a large amount of management data. However, the work of summarizing, organizing, and reusing this data at the logical level remains weak, so the leaders of enterprises and departments find it difficult to grasp the development situation clearly, which hinders refined, scientific decision-making about future development.
Mature data warehouse product systems on the market are highly complex, feature-heavy, and expensive, so small and medium-sized enterprises either find them difficult to use or rule them out on cost grounds. The traditional manual statistics they fall back on causes an endless stream of problems, which is detrimental to the long-term healthy development of the enterprise.
Mature data warehouse products usually require additionally developed system products to assist with data quality assurance, which raises the cost of using them. Small and medium-sized enterprises also need to staff several corresponding technical posts to complete data warehouse construction, which further increases enterprise costs. The above problems are technical problems to be solved by those skilled in the art.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a method and system for implementing a visual, secure, real-time data warehouse that addresses at least some of the above problems.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
a method for implementing a visual, secure, real-time data warehouse, comprising the following steps:
step 1, initializing a local database;
step 2, collecting target metadata from a third-party database, parsing the target metadata, and localizing it to the local database;
step 3, confirming the authority over the target metadata;
step 4, evaluating and scoring the quality of the target metadata;
step 5, scheduling the target metadata with a qualified score to create a data warehouse, and analyzing the lineage (blood relationship) of the target metadata;
step 6, configuring a data synchronization strategy, and monitoring and synchronously maintaining the target metadata and other metadata that has a lineage relationship with the target metadata;
step 7, classifying the metadata in the data warehouse and creating corresponding data sets for use by third-party users.
Further, step 2 comprises: acquiring the target metadata from the third-party database; parsing the data tables and fields of the target metadata with a metadata parser; and calling the metadata adapter of the local database to create, in the local database, a local data table corresponding to each data table of the parsed target metadata.
Further, step 4 comprises: step 41, creating a data governance scheme, which comprises a governed data table, a governance scope, a governance period, a governance target score, inspection rules, and a data governance end; step 42, creating data inspection rules according to the fields of the governed data table, and setting a total score for each inspection rule; step 43, the local database registers a scheduling task according to the governance period, schedules the data inspection task, and executes it at the next time point of the governance period; step 44, if the local database determines that the current governance scheme already has an inspection task in progress, it skips this inspection; otherwise it starts the inspection task and locks the state of the current data governance scheme as executing; step 45, completing the inspection task and generating a data quality inspection report for the data that did not pass inspection, the report comprising an inspection score and an isolated-data judgment result; step 46, if isolated data exists or the inspection score does not reach the governance target score, generating a governance task and distributing it to the data governance end: if the current governance scheme has an unfinished governance task, the details of the data that did not pass inspection are updated into that task; if no governance task exists, a new one is created.
Further, the inspection score in step 45 is generated as follows: step 451, obtaining the total number of inspected data records in the data table and the number of inspection rules in the data governance scheme; step 452, setting the score that each data record earns when it passes each inspection rule, and recording the sum of a record's scores under all inspection rules as the total score of that record; the total score a record earns when it passes every inspection rule is recorded as the sum of the total scores of all inspection rules of the data governance scheme; step 453, calculating the inspection score: inspection score = [sum of the total scores of all data records / (sum of the total scores of all inspection rules of the data governance scheme × total number of data records)] × 100.
Further, step 5 comprises: step 51, scheduling the metadata whose inspection score reaches the governance target score as source metadata to create the data warehouse; step 52, analyzing the lineage (blood relationship) of the source metadata with the local metadata engine and recording it.
Further, the process of real-time monitoring and synchronous maintenance using Flink in step 6 comprises: step a, creating, on the Flink server, CDC mapping tables for all source metadata in the data warehouse; step b, creating a data warehouse query statement that conforms to Flink syntax, acquiring the target metadata from the target metadata table, and inserting it into the CDC mapping table.
Further, the process of real-time monitoring and synchronous maintenance without Flink in step 6 comprises:
step A, monitoring the transaction log of the third-party database where the target metadata is located;
step B, when change information is acquired, packaging the target metadata ID and the changed row ID, and sending the packaged message to a data trigger engine;
step C, creating a data trigger rule and registering it with the data trigger engine;
step D, the data trigger engine, according to the lineage (blood relationship) of the target metadata, packages the IDs of the other metadata that can cause the target metadata to change into an ID list that serves as the trigger source;
step E, registering the ID list with the data trigger engine and monitoring data change messages in real time;
step F, when a change in the trigger source is detected, applying the corresponding data change to the target metadata, which completes the maintenance of the target metadata.
Further, step 7 comprises: step 71, creating a data set theme for the current user, creating data sets under the corresponding theme, setting the visible range of each data set, and opening that visible range to users; step 72, the user identifies the required data set within the visible range and applies for the data set interface; step 73, the data management end responsible for the data set opens the required data set to the user and desensitizes the opened data set.
Further, step 7 also comprises: if the user does not find the required data set within the visible range, the user initiates a data requirement application, and the data management end, based on the data requirement, either uses an existing data set or configures a corresponding data set for the user.
The invention also discloses a visual, secure, real-time data warehouse system, comprising:
the local database initializing module, used for initializing a local database;
the metadata acquisition module, used for collecting target metadata from a third-party database, parsing the target metadata, and localizing it to the local database;
the data right determining module, used for confirming the authority over the target metadata;
the data scoring module, used for evaluating and scoring the quality of the target metadata;
the data warehouse creating module, used for scheduling the target metadata with a qualified score to create a data warehouse and for analyzing the lineage (blood relationship) of the target metadata;
the data monitoring and synchronizing module, used for configuring a data synchronization strategy and for monitoring and synchronously maintaining the target metadata and other metadata that has a lineage relationship with the target metadata;
the data set module, used for classifying the metadata in the data warehouse and creating corresponding data sets for use by third-party users.
Compared with the prior art, the invention has the following beneficial effects:
the invention implements a visual, secure, real-time data warehouse system. When metadata is generated, a detailed set of lineage (blood) relationships is maintained automatically, which keeps the metadata maintained, abstracts the range of other metadata affected by the current metadata, and introduces a method for processing data in real time. The invention implements the data trigger engine in the business layer. When the engine starts, the lineage relationships are automatically abstracted into the trigger factors and data change rules of the data trigger engine. Business data changes act as the driver and generate event messages; the trigger engine processes each data change; and when the data of a target table changes, the next event message is generated. Event messages keep propagating in this way and trigger the data to be rolled up, which achieves real-time production of multi-dimensional data. This solves the problems of database triggers, namely high technical cost, difficult maintenance, disorganized data logic, complex arrangement, and weak reuse, supports users' requirements for real-time statistics over big data, and avoids the performance problems caused by the statistical efficiency of the database.
The invention can also score data quality, thereby improving it, and track data tasks in real time. Data at the isolation level is directly prohibited from participating in the generation of the data warehouse, which ensures the reliability, integrity, correctness, and usability of the data, removes the need for additionally developed system products to assist with data processing, and reduces the user's cost of use.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention.
Detailed Description
Term interpretation:
Oracle, MySQL, SQL Server, MongoDB, DB2, TiDB, PostgreSQL, and OceanBase are each a specific database type;
JSON is a lightweight data exchange format;
CSV is a comma-separated text file format;
Flink is a distributed stream processing system;
CDC is change data capture;
Kafka is an open-source stream processing platform.
The present invention will be described in further detail with reference to the accompanying drawings, in order to make the objects, technical solutions and advantages of the present invention more apparent. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in FIG. 1, a method for implementing a visual, secure, real-time data warehouse comprises the following steps:
step 1, initializing a local database;
step 2, collecting target metadata from a third-party database, parsing the target metadata, and localizing it to the local database;
step 3, confirming the authority over the target metadata;
step 4, evaluating and scoring the quality of the target metadata;
step 5, scheduling the target metadata with a qualified score to create a data warehouse, and analyzing the lineage (blood relationship) of the target metadata;
step 6, configuring a data synchronization strategy, and monitoring and synchronously maintaining the target metadata and other metadata that has a lineage relationship with the target metadata;
step 7, classifying the metadata in the data warehouse and creating corresponding data sets for use by third-party users.
In a specific embodiment, the invention takes a single local production library as the database structure standard and supports multiple types of data warehouse databases, creating data structures differently according to the database type. All data structures within the authorized range are scanned through a plurality of metadata parsing adapters and converted back into local database structures, which achieves metadata localization. The business department obtains the maximum metadata authority through authority confirmation and distributes that authority to the data management end. After the data quality has been evaluated, the data management end creates a suitable data warehouse according to business requirements; the system automatically analyzes the lineage (blood relationship) of the data and, through the synchronization strategy, monitors and synchronously maintains the target metadata and the other metadata that has a lineage relationship with it, completing real-time monitoring and acquisition of the metadata. Metadata in the data warehouse is classified by purpose of use, data sets are created under the corresponding themes, and the visible range of each data set is set. A user can view the data within the visible range; if the required data is not there, the user can submit a data requirement application. The data management end then, on the data requirement approval interface, either creates a data set according to the requirement description or directly uses an existing data set, generates a data interface for the user, and associates it with the data requirement. The user can view the approved data interface on the data requirement approval interface to preview the data or export the details of the data interface.
Preferably, step 2 comprises: acquiring the target metadata from the third-party database; parsing the data tables and fields of the target metadata with a metadata parser; and calling the metadata adapter of the local database to create, in the local database, a local data table corresponding to each data table of the parsed target metadata.
In a specific embodiment, the system uses the metadata parser to parse the data tables and fields of the target metadata from all database structures that the user is able to access, calls the metadata adapter of the local database to obtain and execute the table creation statements, and creates the corresponding local data tables for the parsed metadata in the local database, which achieves metadata localization. According to the configured synchronization strategy, the system also performs scheduled structure checks and automatic synchronization and localizes the synchronized metadata, so that the latest metadata is obtained in real time.
The metadata parser is implemented by defining a general database interface whose requirements comprise database grouping, applicable version, and metadata parsing; through this interface, the one-to-one corresponding metadata table structure is obtained from the target database. The invention implements parsers for multiple versions of Oracle, MySQL, SQL Server, MongoDB, DB2, TiDB, PostgreSQL, and OceanBase, as well as for files.
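The general interface described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the class names `MetadataParser`, `TableMeta`, `FieldMeta`, the stand-in `InMemoryParser`, and the helper `find_parser` are all hypothetical, and a real parser would query a live database rather than a dictionary.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class FieldMeta:
    name: str
    type: str
    length: int


@dataclass
class TableMeta:
    name: str
    fields: list[FieldMeta] = field(default_factory=list)


class MetadataParser(ABC):
    """Hypothetical general interface: database grouping, applicable version, parsing."""

    group: str = ""                 # database grouping, e.g. "mysql"
    versions: tuple[str, ...] = ()  # versions this parser applies to

    def supports(self, group: str, version: str) -> bool:
        return group == self.group and version in self.versions

    @abstractmethod
    def parse(self, source) -> list[TableMeta]:
        """Return one TableMeta per table found in the source."""


class InMemoryParser(MetadataParser):
    """Stand-in parser that reads a dict instead of a live database."""

    group = "demo"
    versions = ("1.0",)

    def parse(self, source: dict) -> list[TableMeta]:
        return [
            TableMeta(table, [FieldMeta(n, t, size) for n, t, size in cols])
            for table, cols in source.items()
        ]


def find_parser(parsers, group, version):
    """Pick the first registered parser that matches the grouping and version."""
    for parser in parsers:
        if parser.supports(group, version):
            return parser
    raise LookupError(f"no parser for {group} {version}")
```

Keeping the grouping and version on the parser class means a new database type only needs one additional subclass, which matches the reuse goal stated in the text.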
Different databases share the common notions of field, field length, and field type; this is especially true of relational data. A MongoDB database holds JSON-format data, and a CSV file holds data in its own corresponding format. All of it is parsed by a data parser, and the parsing process ignores where the data came from: a two-level normalized data structure is defined, consisting of a table name and 1 to N fields. Each target data source must convert its own metadata into this defined form and hand the resulting structure to the system for unified management, which achieves localization of the data.
For example, if a CSV file is parsed and its contents are as follows:
name, student number, age
Zhao Mou, 1, 16
The parsed metadata structure is:
table name: student table
fields:
- name: string, length 255
- student number: number, length 11
- age: number, length 3
As another example, MongoDB JSON data is parsed as follows:
{ name: "Zhao Mou", num: 1, age: 16}
If the collection name is student, the parsed metadata structure is:
table name: student table
fields:
- name: string, length 255
- num: number, length 11
- age: number, length 3
The invention can intervene in the parsing process depending on the data source; for example, the "table name" and "fields" above are the system's intervention in the metadata. Because the parsing process is repetitive, the invention defines a separate parser for each type of resource so that parsers can be reused, which reduces the technical cost.
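As a rough sketch of the two examples above, the following code parses a CSV text and a JSON document into the same two-level structure (table name plus fields). The default lengths (255 for strings, 11 for numbers), the type-inference rule, and the `length_overrides` parameter standing in for the system's intervention are assumptions made for illustration. The JSON sample uses quoted keys so that it is valid JSON.

```python
import csv
import io
import json

# Assumed default lengths per inferred type (not specified by the source text).
DEFAULT_LENGTH = {"string": 255, "number": 11}


def infer_type(value) -> str:
    """Classify a sample value as 'number' or 'string'."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return "number"
    try:
        float(str(value).strip())
        return "number"
    except ValueError:
        return "string"


def to_metadata(table_name, sample_row: dict, length_overrides=None):
    """Build the two-level structure: one table name and 1..N fields."""
    overrides = length_overrides or {}
    fields = []
    for name, value in sample_row.items():
        ftype = infer_type(value)
        length = overrides.get(name, DEFAULT_LENGTH[ftype])
        fields.append({"name": name, "type": ftype, "length": length})
    return {"table": table_name, "fields": fields}


def parse_csv(text, table_name, length_overrides=None):
    """Use the header row for field names and the first data row for types."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    first_row = next(reader)
    return to_metadata(table_name, first_row, length_overrides)


def parse_json_document(doc, collection, length_overrides=None):
    """Use the keys of one JSON document as fields of the collection."""
    return to_metadata(collection, json.loads(doc), length_overrides)


csv_meta = parse_csv("name, num, age\nZhao Mou, 1, 16\n", "student", {"age": 3})
json_meta = parse_json_document(
    '{"name": "Zhao Mou", "num": 1, "age": 16}', "student", {"age": 3}
)
```

Both calls produce the same structure, which illustrates the point that the parsing result does not depend on where the data came from.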
Preferably, step 4 comprises: step 41, creating a data governance scheme, which comprises a governed data table, a governance scope, a governance period, a governance target score, inspection rules, and a data governance end; step 42, creating data inspection rules according to the fields of the governed data table, and setting a total score for each inspection rule; step 43, the local database registers a scheduling task according to the governance period, schedules the data inspection task, and executes it at the next time point of the governance period; step 44, if the local database determines that the current governance scheme already has an inspection task in progress, it skips this inspection; otherwise it starts the inspection task and locks the state of the current data governance scheme as executing; step 45, completing the inspection task and generating a data quality inspection report for the data that did not pass inspection, the report comprising an inspection score and an isolated-data judgment result; step 46, if isolated data exists or the inspection score does not reach the governance target score, generating a governance task and distributing it to the data governance end: if the current governance scheme has an unfinished governance task, the details of the data that did not pass inspection are updated into that task; if no governance task exists, a new one is created.
In a specific embodiment, while the data governance scheme is enabled, the task scheduler of the local database starts a scheduling task according to the configured governance period. At the next execution time point of the governance period it runs the index set designated by the data governance scheme, performs the inspection with the configured inspection rules, and then generates a data quality inspection report containing the inspection score and the isolated-data judgment result. If the inspection score in the report does not reach the configured target score, or isolated data exists, a governance task is generated and distributed to the data governance end. After each period completes, the system checks whether an unfinished governance task exists in the current period. If one exists, the erroneous-data information of the governed data is updated into the current governance task, and the task is marked as overdue and unprocessed. If the data governance end processes the task in time, the next data quality inspection is executed immediately and the governance efficiency of this round is calculated: governance efficiency = (current score - last score) / (100 - first inspection score of the current governance period) × 100. If there is no unfinished governance task, a new governance task is created. A governance task stops when the inspection score of the data reaches the target score and no isolated data is produced. Data governance thus ensures the reliability, integrity, and correctness of the data, and thereby its availability.
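The inspection cycle described above can be sketched as a small state machine. The class `GovernanceScheme`, its status strings, and the choice to return 0.0 efficiency when the first score of the period is already 100 are assumptions for illustration; the efficiency formula itself is the one given in the text.

```python
def needs_governance(score: float, target: float, has_isolated: bool) -> bool:
    """A governance task is needed if isolated data exists or the score is below target."""
    return has_isolated or score < target


def governance_efficiency(current: float, last: float, first_of_period: float) -> float:
    """(current - last) / (100 - first score of the period) * 100, to two decimals."""
    headroom = 100 - first_of_period
    if headroom == 0:
        return 0.0  # assumption: nothing was left to improve in this period
    return round((current - last) / headroom * 100, 2)


class GovernanceScheme:
    def __init__(self, target_score: float):
        self.target_score = target_score
        self.executing = False   # lock: an inspection task is in progress
        self.open_task = None    # unfinished governance task, if any

    def run_inspection(self, inspect):
        """inspect() must return (score, failed_rows, has_isolated)."""
        if self.executing:
            return "skipped"     # an inspection is already running
        self.executing = True
        try:
            score, failed_rows, has_isolated = inspect()
        finally:
            self.executing = False
        if not needs_governance(score, self.target_score, has_isolated):
            self.open_task = None            # stop condition reached
            return "passed"
        if self.open_task is None:
            self.open_task = {"failed_rows": list(failed_rows)}
            return "task_created"
        # An unfinished task exists: refresh its failed-row details.
        self.open_task["failed_rows"] = list(failed_rows)
        return "task_updated"
```

For example, with a first inspection score of 80 in the period, a previous score of 90, and a current score of 95, the efficiency is (95 - 90) / (100 - 80) × 100 = 25.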
Preferably, the inspection score in step 45 is generated as follows: step 451, obtaining the total number of inspected data records in the data table and the number of inspection rules in the data governance scheme; step 452, setting the score that each data record earns when it passes each inspection rule, and recording the sum of a record's scores under all inspection rules as the total score of that record; the total score a record earns when it passes every inspection rule is recorded as the sum of the total scores of all inspection rules of the data governance scheme; step 453, calculating the inspection score: inspection score = [sum of the total scores of all data records / (sum of the total scores of all inspection rules of the data governance scheme × total number of data records)] × 100.
For example, consider existing data table 1: student
- field 1: name
- field 2: num
- field 3: sex
The data governance scheme includes the following inspection rules:
inspection rule 1: name must not be empty; the name field is checked to be non-empty, and the total score of this inspection rule is set to 30 points;
inspection rule 2: the length of the student number num is 8; the number length is checked to equal 8, and the total score of this inspection rule is set to 20 points;
inspection rule 3: the sex code sex must fall within the range 1 (male), 2 (female), 3 (unknown), and the total score of this inspection rule is set to 10 points; the sum of the total scores of the inspection rules of the data governance scheme is therefore 30 + 20 + 10 = 60 points;
there are ten records in the data table, student1, student2, student3, ..., student10. The name of the first record, student1, is empty, so it fails inspection rule 1 and passes the rest; its total score is 60 - 30 = 30;
the num length of the second record, student2, is 7, so it fails inspection rule 2 and passes the rest; its total score is 60 - 20 = 40;
the third record student3 through the tenth record student10 pass every inspection, so each has a total score of 60 points;
……
and so on, yielding the total score of each record;
then the inspection score = (30 + 40 + 60 × 8) / (60 × 10) × 100 = 91.67;
The score of each inspection rule = [sum of the scores of all records under the inspection rule / (total score of the inspection rule × total number of records)] × 100; specifically:
inspection rule 1 score = (30 × 9) / (30 × 10) × 100 = 90.00;
inspection rule 2 score = (20 × 9) / (20 × 10) × 100 = 90.00;
inspection rule 3 score = (10 × 10) / (10 × 10) × 100 = 100.00;
The percentage score of each record = (total score of the record / sum of the total scores of all inspection rules) × 100; specifically:
percentage score of the first record student1 = (30 / 60) × 100 = 50.00;
percentage score of the second record student2 = (40 / 60) × 100 = 66.67.
All scores are kept to two decimal places.
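The scoring formulas can be checked with a short sketch. It uses the three inspection rules with totals of 30, 20, and 10 points (so a record can earn at most 60 points) and ten records, of which the first has an empty name and the second has a 7-character num. The function name `inspect`, the rule keys, and the generated sample values are hypothetical.

```python
# Each rule maps to (total score of the rule, predicate that a passing record satisfies).
RULES = {
    "name_not_empty": (30, lambda r: bool(r["name"])),
    "num_length_8": (20, lambda r: len(r["num"]) == 8),
    "sex_in_range": (10, lambda r: r["sex"] in {1, 2, 3}),
}


def pct(numerator, denominator):
    """Percentage kept to two decimal places."""
    return round(100 * numerator / denominator, 2)


def inspect(records, rules=RULES):
    if not records:
        raise ValueError("at least one record is required")
    max_per_record = sum(points for points, _ in rules.values())
    record_totals = []
    rule_sums = {name: 0 for name in rules}
    for rec in records:
        total = 0
        for name, (points, check) in rules.items():
            if check(rec):
                total += points
                rule_sums[name] += points
        record_totals.append(total)
    n = len(records)
    return {
        # [sum of record totals / (sum of rule totals * record count)] * 100
        "check_score": pct(sum(record_totals), max_per_record * n),
        # per rule: [points earned under the rule / (rule total * record count)] * 100
        "rule_scores": {k: pct(rule_sums[k], rules[k][0] * n) for k in rules},
        # per record: (record total / sum of rule totals) * 100
        "record_percent": [pct(t, max_per_record) for t in record_totals],
    }


records = [
    {"name": f"student{i}", "num": str(20240000 + i), "sex": 1}
    for i in range(1, 11)
]
records[0]["name"] = ""        # student1 fails rule 1
records[1]["num"] = "2024002"  # student2 fails rule 2 (length 7)
report = inspect(records)
```

With these inputs the record totals are 30, 40, and eight times 60, so the overall inspection score is 550 / 600 × 100, rounded to two decimal places.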
After the calculation is completed, the total number of inspected records, the number of inspection rules, the inspection score, the total score of each record, the score of each inspection rule, the percentage score of each record, and the inspection details of all records that failed inspection are recorded in the data quality inspection report. The inspection details of each failed record include: the inspected row ID, the ID of the failed rule, and the isolated-data judgment result. The inspection rules are generated automatically according to the data standard applicable to the current data.
The data quality inspection report further includes an integrity score, a consistency score, a normalization score, a timeliness score, a uniqueness score, and an accuracy score.
Preferably, step 5 comprises: step 51, scheduling the metadata whose inspection score reaches the governance target score as source metadata to create the data warehouse; step 52, analyzing the lineage (blood relationship) of the source metadata with the local metadata engine and recording it.
In a specific implementation, the scheduled metadata is the metadata whose data inspection score is qualified. The system supports creating the data warehouse from metadata captured in real time and from metadata captured on a schedule. Data warehouses are created in groups according to data governance; the themes of the created data warehouses are generally managed by category according to department, subsystem, and business module; dimensions and measures are created according to user requirements; and the database definition statements for building the warehouse are generated and executed according to the database type of the integrated data center. The target metadata is analyzed by the local data engine to obtain its lineage.
Preferably, the process of real-time monitoring and synchronous maintenance using Flink in step 6 comprises: step a, creating, on the Flink server, CDC mapping tables for all source metadata in the data warehouse; step b, creating a data warehouse query statement that conforms to Flink syntax, acquiring the target metadata from the target metadata table, and inserting it into the CDC mapping table.
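One way to picture steps a and b is a helper that renders Flink SQL text from a table description. The sketch below is illustrative only: it assumes a MySQL source read through the Flink CDC `mysql-cdc` connector, whose option names should be checked against the connector version in use, and it assumes the mapping table acts as the change source for a hypothetical warehouse table `dw_student`. The function names and all connection values are placeholders.

```python
def render_cdc_mapping_table(table, columns, primary_key, source):
    """Render a CREATE TABLE statement for a CDC mapping table (assumed MySQL source)."""
    cols = ",\n".join(f"  `{name}` {sql_type}" for name, sql_type in columns)
    options = {
        "connector": "mysql-cdc",
        "hostname": source["hostname"],
        "port": str(source["port"]),
        "username": source["username"],
        "password": source["password"],
        "database-name": source["database"],
        "table-name": source["table"],
    }
    with_clause = ",\n".join(f"  '{k}' = '{v}'" for k, v in options.items())
    return (
        f"CREATE TABLE `{table}` (\n{cols},\n"
        f"  PRIMARY KEY (`{primary_key}`) NOT ENFORCED\n"
        f") WITH (\n{with_clause}\n)"
    )


def render_sync_statement(source_table, sink_table, column_names):
    """Render the query that copies rows from one mapping table into another."""
    cols = ", ".join(f"`{c}`" for c in column_names)
    return f"INSERT INTO `{sink_table}` SELECT {cols} FROM `{source_table}`"


ddl = render_cdc_mapping_table(
    "student_cdc",
    [("id", "INT"), ("name", "STRING")],
    "id",
    {
        "hostname": "db.example.internal",  # placeholder host
        "port": 3306,
        "username": "reader",               # placeholder credentials
        "password": "change-me",
        "database": "school",
        "table": "student",
    },
)
sync_sql = render_sync_statement("student_cdc", "dw_student", ["id", "name"])
```

Generating the statements from the table structure, rather than writing them by hand, mirrors the idea that the system derives the synchronization task automatically from the source type and schema.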
Preferably, the process of real-time monitoring and synchronous maintenance without Flink in step 6 comprises:
step A, monitoring the transaction log of the third-party database where the target metadata is located;
step B, when change information is acquired, packaging the target metadata ID and the changed row ID, and sending the packaged message to a data trigger engine;
step C, creating a data trigger rule and registering it with the data trigger engine;
step D, the data trigger engine, according to the lineage (blood relationship) of the target metadata, packages the IDs of the other metadata that can cause the target metadata to change into an ID list that serves as the trigger source;
step E, registering the ID list with the data trigger engine and monitoring data change messages in real time;
step F, when a change in the trigger source is detected, applying the corresponding data change to the target metadata, which completes the maintenance of the target metadata.
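Steps B to F can be sketched as an in-memory trigger engine that derives its trigger sources from the lineage and cascades change messages downstream. The class `DataTriggerEngine`, the shape of the lineage mapping, and the table names are hypothetical; a real deployment would receive messages from a transaction-log listener or a message queue rather than a direct method call.

```python
from collections import defaultdict, deque


class DataTriggerEngine:
    def __init__(self, lineage):
        # lineage: target metadata ID -> IDs of the metadata it is derived from.
        # Each list of source IDs is the trigger source for its target (step D).
        self.trigger_sources = {t: list(srcs) for t, srcs in lineage.items()}
        # Reverse index used for listening (step E): source ID -> dependent targets.
        self.listeners = defaultdict(list)
        for target, srcs in lineage.items():
            for src in srcs:
                self.listeners[src].append(target)
        self.rules = {}

    def register_rule(self, target, rule):
        """Register the data trigger rule that maintains one target (step C)."""
        self.rules[target] = rule

    def on_change(self, metadata_id, row_id):
        """Handle one change message (step B) and cascade it (step F).

        Returns the targets that were maintained, in processing order.
        """
        maintained = []
        seen = set()
        queue = deque([(metadata_id, row_id)])
        while queue:
            changed, row = queue.popleft()
            for target in self.listeners.get(changed, []):
                if (target, row) in seen:
                    continue  # guard against cycles in the lineage
                seen.add((target, row))
                rule = self.rules.get(target)
                if rule is not None:
                    rule(changed, row)
                maintained.append(target)
                # The target changed, so it emits the next event message.
                queue.append((target, row))
        return maintained
```

Because every maintained target re-enters the queue as a new change, a single source change rolls up through every level of derived data, which is the cascading behaviour the data trigger engine is meant to provide.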
In a specific embodiment, when Flink is used to monitor and synchronize metadata in real time, once a real-time synchronization strategy is submitted the system automatically generates a Flink CDC task according to the source database type and table structure, publishes it to the Flink task scheduling platform, and records the task and table ID in the synchronization monitoring pool. The system takes the database address, database name, table name, changed data ID, and changed content as parameters and captures the data information through the engine; after the data changes, it initiates a data change task, localizes the data, and changes the data of the target table. When monitoring of a table is stopped, the system obtains the synchronization task ID corresponding to the table, stops the task by calling the cancel interface of the Flink platform, and removes the table from the synchronization pool. When metadata is monitored and synchronously maintained in real time without Flink, other third-party plug-ins such as Spark and Debezium (debezium.io), or custom plug-ins such as a binlog plug-in for MySQL and a clusterig plug-in for MongoDB, can be used. Unlike Flink CDC, these approaches detect file changes by polling, extract the changed portion, and create a change message that is sent to Kafka; a message listener consumes the message and drives the corresponding change task to completion.
Preferably, step 7 includes: step 71, creating a data set theme for the current user, creating a data set under the corresponding theme, setting the visible range of the data set, and opening the visible range of the data set to the user; step 72, the user identifies the required data set within the visible range of the data set and applies for the data set interface; step 73, the data management end corresponding to the data set opens the required data set to the user and performs desensitization processing on the opened data set.
Preferably, step 7 further comprises: if the user does not find the required data set within the visible range of the data set, the user initiates a data demand application, and the data management end, based on the data demand, uses an existing data set or configures a corresponding data set for the user.
In a specific embodiment, data set themes are created according to the intended use of the data; themes are generally classified and managed by user group, such as students, teachers, and leaders, and data sets are created under the corresponding data set theme. A data set can be opened repeatedly for use by users; it mainly configures the source of the data and provides the data items. The user may set limiting conditions, and the system selects the visible range of the data set according to those conditions and opens the interface. When a user finds a usable data set within the visible range, the user can apply for an interface to use it and must select either the system call mode or the personal use mode. The personal use mode provides only data preview and download, while the system call mode allows the data set to be used through interface calls. After the application is made, the data management end of the data set desensitizes the output fields and then opens the interface to the third-party system.
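The desensitization of output fields and the two use modes can be sketched as follows. The masking rule (keep the first and last characters), the function names, and the choice to mask in both modes are assumptions made for illustration; the text does not specify a particular desensitization algorithm.

```python
from typing import Dict, Iterable, List, Set


def mask_value(value: str, keep_head: int = 1, keep_tail: int = 1) -> str:
    """Replace the middle of a string with '*', keeping a few edge characters."""
    if len(value) <= keep_head + keep_tail:
        return "*" * len(value)
    hidden = len(value) - keep_head - keep_tail
    return value[:keep_head] + "*" * hidden + value[len(value) - keep_tail:]


def desensitize(rows: Iterable[Dict[str, object]],
                sensitive_fields: Set[str]) -> List[Dict[str, object]]:
    """Mask the sensitive output fields of every row before it is exposed."""
    return [
        {k: mask_value(str(v)) if k in sensitive_fields else v
         for k, v in row.items()}
        for row in rows
    ]


def serve_dataset(rows: List[Dict[str, object]], sensitive_fields: Set[str],
                  mode: str, preview_limit: int = 10) -> List[Dict[str, object]]:
    """Return masked rows; 'personal' yields a preview, 'system' the full set."""
    masked = desensitize(rows, sensitive_fields)
    if mode == "personal":
        return masked[:preview_limit]
    if mode == "system":
        return masked
    raise ValueError(f"unknown use mode: {mode}")


sample_rows = [{"name": "Li", "phone": "13812345678"}]
masked_rows = serve_dataset(sample_rows, {"phone"}, mode="system")
```

Masking before the mode check ensures that no code path can return raw sensitive fields.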
If the required data is not found in the existing data sets, the user can submit a data requirement to apply for a data interface. The data technical department receives the request and, according to the requirement description, uses an existing data set or configures a new one. After approval by the data management end, a data interface is generated and the data set is provided to the user, who can preview the data or call up the details of the data interface on the data requirement page.
The data management end not only sets whether to open an interface and whether to apply desensitization, but can also configure a flow control strategy for the interface to prevent it from being maliciously accessed by unauthorized persons, thereby ensuring the security of the data.
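One possible flow control strategy is a sliding-window limit per caller, sketched below. The text does not say which strategy is used, so the algorithm, the class name `SlidingWindowLimiter`, and its parameters are assumptions; the clock is passed in explicitly so that the behaviour is deterministic.

```python
from collections import deque
from typing import Deque, Dict


class SlidingWindowLimiter:
    """Allow at most max_calls per caller within any window_seconds interval."""

    def __init__(self, max_calls: int, window_seconds: float) -> None:
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._calls: Dict[str, Deque[float]] = {}

    def allow(self, caller: str, now: float) -> bool:
        timestamps = self._calls.setdefault(caller, deque())
        # Discard calls that have fallen out of the current window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_calls:
            return False  # over the limit: reject this interface call
        timestamps.append(now)
        return True


limiter = SlidingWindowLimiter(max_calls=2, window_seconds=10.0)
decisions = [limiter.allow("system-a", t) for t in (0.0, 1.0, 2.0, 10.0)]
```

Rejected calls are not recorded, so a caller that keeps retrying does not extend its own lockout.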
The invention also provides a visual safe real-time data warehouse system corresponding to the visual safe real-time data warehouse implementation method, the system comprising:
the local database initializing module is used for initializing a local database;
the metadata acquisition module is used for acquiring target metadata in the third-party database, analyzing the target metadata and localizing the target metadata to the local database;
the data right determining module is used for acquiring the authority of the target metadata;
the data scoring module is used for evaluating the quality of the target metadata and scoring the target metadata;
the data warehouse creating module is used for scheduling the target metadata with qualified scores to create a data warehouse and analyzing the blood relationship (data lineage) of the target metadata;
the data monitoring and synchronizing module is used for configuring a data synchronization strategy, and monitoring and synchronously maintaining the target metadata and other metadata having a blood relationship with the target metadata;
and the data set module is used for classifying metadata in the data warehouse and creating corresponding data sets for use by third-party users.
Finally, it should be noted that the above embodiments are merely preferred embodiments of the present invention, intended to illustrate its technical solution rather than to limit its scope of protection. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments can still be modified, and some or all of their technical features can be replaced by equivalents, without such modifications and substitutions departing from the spirit of the corresponding technical solutions. That is, insubstantial changes or embellishments made on the basis of the main design concept and spirit of the present invention, where the technical problem solved remains the same as that of the present invention, fall within the scope of protection of the present invention; likewise, direct or indirect application of the technical solution of the invention to other related technical fields is included in the scope of protection of the invention.

Claims (10)

1. A method for implementing a visual secure real-time data warehouse, comprising the steps of:
step 1, initializing a local database;
step 2, collecting target metadata in a third-party database, analyzing the target metadata and localizing the target metadata to the local database;
step 3, acquiring the authority of the target metadata;
step 4, evaluating the quality of the target metadata and scoring it;
step 5, scheduling the target metadata with qualified scores to create a data warehouse, and analyzing the blood relationship (data lineage) of the target metadata;
step 6, configuring a data synchronization strategy, and monitoring and synchronously maintaining the target metadata and other metadata having a blood relationship with the target metadata;
step 7, classifying metadata in the data warehouse and creating corresponding data sets for use by third-party users.
2. A method for implementing a visual secure real-time data warehouse according to claim 1, wherein said step 2 comprises: acquiring target metadata in a third-party database; parsing the data tables and fields of the target metadata based on a metadata parser; and calling a metadata adapter of the local database, and creating, in the local database, local data tables corresponding to the data tables of the parsed target metadata.
3. A method for implementing a visual secure real-time data warehouse according to claim 1, wherein said step 4 comprises: step 41, creating a data governance scheme, wherein the data governance scheme comprises a governed data table, a governance range, a governance period, a governance target score, and a data governance end; step 42, creating data inspection rules according to the fields of the governed data table, and setting a total score for each inspection rule; step 43, the local database registers a scheduling task according to the governance period, schedules the data inspection task, and executes the data inspection task at the next time point of the governance period; step 44, if the local database determines that the current governance scheme already has an inspection task in progress, it skips this inspection task; if it determines that the current governance scheme has no inspection task in progress, it starts the inspection task and locks the state of the current data governance scheme as executing; step 45, completing the inspection task and generating a data quality inspection report for the data that did not pass inspection, wherein the data quality inspection report comprises an inspection score and an isolated data determination result; step 46, if isolated data exists or the inspection score does not reach the governance target score, generating a governance task and distributing it to the data governance end, wherein, if the current governance scheme has an uncompleted governance task, the details of the data that did not pass inspection are updated into that governance task, and if no such governance task exists, a new governance task is created.
4. A method of implementing a visual secure real-time data warehouse as defined in claim 3, wherein the method of generating the inspection score of step 45 comprises: step 451, obtaining the total number of pieces of inspection data in the data table and the number of inspection rules of the data governance scheme; step 452, awarding each piece of inspection data the score of each inspection rule that it passes, and recording the sum of the scores that a piece of inspection data obtains under all inspection rules as the total score of that piece of inspection data, wherein the total score of a piece of inspection data that passes every inspection rule equals the sum of the total scores of all inspection rules of the data governance scheme; step 453, calculating the inspection score: inspection score = [sum of the total scores of all pieces of inspection data / (sum of the total scores of all inspection rules of the data governance scheme × total number of pieces of inspection data)] × 100.
5. A method of implementing a visual secure real-time data warehouse as defined in claim 3, wherein said step 5 comprises: step 51, scheduling the metadata whose inspection score reaches the governance target score as source metadata to create the data warehouse; step 52, the local metadata engine analyzes the blood relationship (data lineage) of the source metadata and records the blood relationship of the source metadata.
6. The method for implementing a visual secure real-time data warehouse as claimed in claim 1, wherein, when Flink is used for real-time monitoring and synchronous maintenance, step 6 comprises: step a, creating, on the Flink server, CDC mapping tables for all source metadata in the data warehouse; step b, creating a data warehouse data query statement conforming to the Flink syntax, acquiring target metadata from a target metadata table and inserting the target metadata into a CDC mapping table.
7. The method for implementing a visual secure real-time data warehouse as claimed in claim 1, wherein, when a non-Flink approach is used for real-time monitoring and synchronous maintenance, step 6 comprises the following steps:
step A, monitoring the transaction log of the third-party database where the target metadata is located;
step B, when change information is acquired, packaging the target metadata ID and the changed row ID, and sending the packaged message to a data trigger engine;
step C, creating a data trigger rule and registering it to the data trigger engine;
step D, the data trigger engine, according to the blood relationship (data lineage) of the target metadata, packages the IDs of the other metadata whose changes can cause the target metadata to change into an ID list that serves as the trigger source;
step E, registering the ID list to the data trigger engine, and monitoring data change messages in real time;
step F, when a change in the trigger source is detected, applying the corresponding data change to the target metadata to complete maintenance of the target metadata.
8. A method of implementing a visual secure real-time data warehouse as defined in claim 1, wherein said step 7 comprises: step 71, creating a data set theme for the current user, creating a data set under the corresponding theme, setting the visible range of the data set, and opening the visible range of the data set to the user; step 72, the user identifies the required data set within the visible range of the data set and applies for the data set interface; step 73, the data management end corresponding to the data set opens the required data set to the user and performs desensitization processing on the opened data set.
9. The method for implementing a visual secure real-time data warehouse of claim 6, wherein said step 7 further comprises: if the user does not find the required data set within the visible range of the data set, the user initiates a data demand application, and the data management end, based on the data demand, uses an existing data set or configures a corresponding data set for the user.
10. A visual secure real-time data warehouse system, comprising:
the local database initializing module is used for initializing a local database;
the metadata acquisition module is used for acquiring target metadata in the third-party database, analyzing the target metadata and localizing the target metadata to the local database;
the data right determining module is used for acquiring the authority of the target metadata;
the data scoring module is used for evaluating the quality of the target metadata and scoring the target metadata;
the data warehouse creating module is used for scheduling the target metadata with qualified scores to create a data warehouse and analyzing the blood relationship (data lineage) of the target metadata;
the data monitoring and synchronizing module is used for configuring a data synchronization strategy, and monitoring and synchronously maintaining the target metadata and other metadata having a blood relationship with the target metadata;
and the data set module is used for classifying metadata in the data warehouse and creating corresponding data sets for use by third-party users.
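For illustration only, the inspection score recited in claim 4 (total score earned by all inspected records, divided by the product of the full marks of all rules and the number of records, times 100) can be worked through with a short sketch. The function name `inspection_score`, the rule names, and the sample scores are hypothetical.

```python
from typing import Dict, List


def inspection_score(rule_scores: Dict[str, float],
                     results: List[Dict[str, bool]]) -> float:
    """Compute the inspection score on a 0 to 100 scale.

    rule_scores: inspection rule name -> total score of that rule.
    results: one dict per inspected record, rule name -> whether it passed.
    """
    full_marks = sum(rule_scores.values())
    if not results or full_marks <= 0:
        raise ValueError("need at least one record and a positive rule total")
    # Each record earns the score of every rule it passes.
    earned = sum(rule_scores[rule]
                 for record in results
                 for rule, passed in record.items() if passed)
    return 100.0 * earned / (full_marks * len(results))


rules = {"not_null": 60, "format": 40}  # hypothetical rules and scores
records = [
    {"not_null": True, "format": True},    # earns 100
    {"not_null": True, "format": True},    # earns 100
    {"not_null": True, "format": False},   # earns 60
    {"not_null": False, "format": False},  # earns 0
]
score = inspection_score(rules, records)  # 100 * 260 / (100 * 4)
```

A score computed this way can then be compared with the governance target score to decide whether a governance task is needed.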
CN202410119047.3A 2024-01-29 2024-01-29 Visual safe real-time data warehouse implementation method and system Active CN117648388B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410119047.3A CN117648388B (en) 2024-01-29 2024-01-29 Visual safe real-time data warehouse implementation method and system

Publications (2)

Publication Number Publication Date
CN117648388A true CN117648388A (en) 2024-03-05
CN117648388B CN117648388B (en) 2024-04-12

Family

ID=90046326

Country Status (1)

Country Link
CN (1) CN117648388B (en)

Citations (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150193709A1 (en) * 2014-01-06 2015-07-09 Energica Advisory Services Pvt . Ltd. System and method for it sourcing management and governance covering multi geography, multi sourcing and multi vendor environments
CN109739893A (en) * 2018-12-28 2019-05-10 上海连尚网络科技有限公司 A kind of metadata management method, equipment and computer-readable medium
CN110232098A (en) * 2019-04-22 2019-09-13 汇通达网络股份有限公司 A kind of data warehouse administered based on data and genetic connection designs
CN111324602A (en) * 2020-02-21 2020-06-23 上海软中信息技术有限公司 Method for realizing financial big data oriented analysis visualization
CN112256782A (en) * 2020-10-30 2021-01-22 内蒙古电力(集团)有限责任公司乌海超高压供电局 Electric power big data processing system based on Hadoop
CN112422234A (en) * 2020-11-06 2021-02-26 应急管理部通信信息中心 Data management service method for self-adaptive deep learning based on time perception
CN112766676A (en) * 2021-01-08 2021-05-07 深圳市酷开网络科技股份有限公司 Closed-loop data quality control method and device, terminal equipment and storage medium
CN112765245A (en) * 2020-12-31 2021-05-07 广西中科曙光云计算有限公司 Electronic government affair big data processing platform
CN113934868A (en) * 2021-10-14 2022-01-14 山东亿云信息技术有限公司 Government affair big data management method and system
CN114780525A (en) * 2022-04-06 2022-07-22 厦门知彦信息科技有限公司 Data governance platform for full life cycle of education field
CN114925045A (en) * 2022-04-11 2022-08-19 杭州半云科技有限公司 PaaS platform for large data integration and management
US20220276920A1 (en) * 2021-03-01 2022-09-01 Ab Initio Technology Llc Generation and execution of processing workflows for correcting data quality issues in data sets
CN114996247A (en) * 2022-04-22 2022-09-02 华能澜沧江水电股份有限公司 Large-scale drainage basin hydropower enterprise data management method
CA3177585A1 (en) * 2021-04-16 2022-10-16 Strong Force Vcn Portfolio 2019, Llc Systems, methods, kits, and apparatuses for digital product network systems and biology-based value chain networks
CN115543973A (en) * 2022-09-19 2022-12-30 北京三维天地科技股份有限公司 Data quality rule recommendation method based on knowledge spectrogram and machine learning
CN115658785A (en) * 2022-10-10 2023-01-31 天元大数据信用管理有限公司 Financial subject bin construction method, device and medium for government affair data
CN115934855A (en) * 2022-11-29 2023-04-07 广发银行股份有限公司 Full-link field level blood margin analysis method, system, equipment and storage medium
US20230171274A1 (en) * 2021-11-30 2023-06-01 International Business Machines Corporation System and method to perform governance on suspicious activity detection pipeline in risk networks
CN116303822A (en) * 2023-03-03 2023-06-23 上海时代光华教育发展有限公司 Data warehouse management method, device, computer equipment and storage medium
CN116680354A (en) * 2023-06-07 2023-09-01 合肥国轩高科动力能源有限公司 Metadata management method and system for lithium battery production and manufacturing industry
CN116821098A (en) * 2023-07-05 2023-09-29 乾升利信息技术(上海)有限公司 Data warehouse management method, service system and storage medium
US20230325829A1 (en) * 2021-11-23 2023-10-12 Strong Force TX Portfolio 2018, LLC Enterprise data set exchanges
CN116991931A (en) * 2023-08-29 2023-11-03 中数通信息有限公司 Metadata management method and system
CN117056308A (en) * 2023-08-11 2023-11-14 苏银凯基消费金融有限公司 Method for generating financial big data blood-edge relation based on OpenLinear database
CN117271477A (en) * 2023-10-11 2023-12-22 蓝卓数字科技有限公司 Method for metadata acquisition and data blood-margin analysis in lake and warehouse integrated system
CN117349368A (en) * 2023-09-13 2024-01-05 中交武汉智行国际工程咨询有限公司 Cross-database data real-time synchronous task management system and method based on Flink

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
IKBAL TALEB: "Big data quality framework: a holistic approach to continuous quality management", 《JOURNAL OF BIG DATA》, vol. 8, 29 May 2021 (2021-05-29), pages 1 - 41 *
刘彦军: "大数据时代F铁路公司数据治理体系研究", 《全文硕士学位论文全文数据库 工程科技》, 15 July 2023 (2023-07-15), pages 033 - 26 *

Also Published As

Publication number Publication date
CN117648388B (en) 2024-04-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant