CN116860741A - Automatic data standard checking and synchronizing system and method based on message queue - Google Patents
Automatic data standard checking and synchronizing system and method based on message queue
- Publication number
- CN116860741A (application CN202311107792.8A)
- Authority
- CN
- China
- Prior art keywords
- data
- checking
- pushing
- verification
- standard
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/226—Validation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/546—Message passing systems or structures, e.g. queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/54—Indexing scheme relating to G06F9/54
- G06F2209/548—Queue
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Quality & Reliability (AREA)
- Computing Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention discloses an automatic data standard checking and synchronizing system and method based on a message queue. The system comprises eight functional modules: a data standard making module, a local checking program module, a pre-database module, a data check message queue module, an automatic checking program module, a web page tool module, a data synchronization message queue module and a data center library module. The automatic checking and synchronizing system and method automate data quality checking; the instant feedback of the automatic checks and the web page tool on the checking results improves the efficiency of data checking and problem-feedback communication, guides the data pushing party to guarantee data quality, greatly reduces the labor and time cost of the data center building party, improves the efficiency of data center construction and data standard pushing, improves data quality, and guarantees the effect of subsequent data processing and use.
Description
Technical Field
The invention relates to the technical field of data inspection, in particular to a data standard automatic inspection and synchronization system and method based on a message queue.
Background
The most important work of a data center building unit falls into four aspects: data standard formulation and release, data checking, data warehousing, and data processing. In the prior art, data checking is usually performed manually by big data processing staff and is the part that consumes the most manpower and time in the continuous operation of a data center: professional data staff must perform long-term quality checks on every piece of pushed data and communicate with the data pushing party to handle data that does not meet the standard. From standard formulation and release, through data joint debugging, to the first check and standard-compliant warehousing usually takes two to three months, and the more data fields have to be pushed, the longer it takes. Between the first successful push and the continuous subsequent pushes, human errors or other causes can degrade the quality of the warehoused data, which ultimately affects data pushing efficiency and the effect of data processing and use.
A data center building unit often has to interface with more than one pushing party; when traditional manual verification is used to ensure data quality, the manpower and time consumed multiply accordingly.
Disclosure of Invention
The invention provides an automatic data standard checking and synchronizing system and method based on a message queue, which solve the problem that ensuring data quality through traditional manual checking multiplies the manpower and time consumed.
In a first aspect, a message queue based data standard automatic verification and synchronization system includes the following functional modules:
the data standard making module: based on data acquisition and processing use requirements, formulating a basic data pushing standard;
local verification program module: provided to the pushing party so that, before formal pushing, it can locally check whether its own pushing logic and pushing results are correct;
a pre-database module: a warehouse in which data quality is checked automatically;
a data check message queue module: using a CDC database change data capture component to identify and capture changes made to the data in the pre-database data push table and issue the newly added and updated data to the message queue in real time;
automatic checking program module: checking the data push table based on the data standard requirement;
the web page tool module: assists the data pushing party in checking, handling and repairing substandard data;
a data synchronization message queue module: identifying and capturing data inserted in a data inspection result table of the front database by using a CDC database change data capturing component, and publishing the newly added data to a message queue in real time;
a data center library module: stores the data that has been pushed by all data pushing parties and has passed checking, for subsequent data processing and use.
Further, the basic data pushing standard of the data standard making module comprises: the data table structure, field content, field constraints, field format specifications, value ranges, dictionary value definitions for dictionary-type fields, table data association relations, and the pushing steps and mode; the formulated data standard informs the data pushing party of the detailed pushing standard and is also the basis of the checking logic of the automatic checking program.
Further, the local verification program module verifies compliance of the same batch of data, including: structure, content, field constraints, field format specifications, value range, dictionary values, master-slave table associations within the same batch.
Further, the pre-database module opens a pre-database for each data pushing party, where the pre-database comprises four kinds of data tables:
data push table: established based on the data standard requirements for the data pushing party to push data into; each main table comprises the data content together with a primary key id and a data source identification field;
data check result table: records the automatic check results; each data push table has a corresponding data check result table, whose structure comprises the fields: table name, primary key value, check time, whether the check passed, failure reason, whether valid, and whether pushed to the data center library;
check log record table: records the check logs, including message-queue automatic check logs, manually triggered check logs, and timed check logs; it records the time consumed by each check, whether execution succeeded, and any exception information, and is used to analyze the performance of the automatic verification program and to troubleshoot exceptions;
submit-check table: comprises the fields table name, primary key name and primary key value; the data pushing party triggers checking of specific data by the checking program by inserting records into the submit-check table.
Further, the automatic checking program module checks the data push tables; the check content includes: data push table structure, content, field constraints, field format specifications, value ranges, dictionary values, master-slave table association relations, and pre-association data; it also supports a plurality of check modes, including streaming real-time checks and batch multi-threaded checks, as follows:
real-time streaming check by consuming the data check message queue: the main check mode, in which data written into the pre-database by the pushing party is checked in real time through message-queue-based streaming processing;
batch check based on the submit-check table: in complex multi-table association scenarios, after all master-slave association tables have been pushed, the data pushing party triggers a batch check of the corresponding master-slave association data by inserting records into the submit-check table;
manually triggered check of a table or of data using the web tool: the check results of all data push tables can be queried, and a recheck can be initiated for a single piece of data or a whole-table batch recheck can be initiated for a table;
timed batch recheck: all unchecked and failed data in the pre-database is rechecked in batches on a schedule; the timed task runs periodically to cover missed checks, false checks, program or service exceptions, and association-table check failures caused by the pushing party's push order.
On the other hand, the data standard automatic checking and synchronizing method based on the message queue is realized by a data standard automatic checking and synchronizing system based on the message queue, and comprises the following steps:
step S1: determining the data range and the content of a data center, combing the data association relationship and the list of table fields, and making a data standard;
step S2: configuring a data verification program: an automatic data verification program is configured based on the formulated data standard, and the tables to be checked are configured in the program, including the data quality items to be checked, such as table structure, fields, value specifications, value ranges, field association relations and data service association relations;
step S3: data pushing, wherein the data pushing unit pushes data according to the issued data pushing standard;
step S4: the data pushing unit automatically checks;
step S5: automatically generating a data quality analysis report at regular intervals based on the verification result data in the verification result table;
step S6: and processing and using the data synchronized to the data center.
Further, the step S1 data standard includes: data table and field definition, field value format specification and dictionary value range constraint, necessary filling item and partial condition necessary filling item, association relation between table and field, business logic sequence between table and table, code table definition, related national standard file reference description and data writing logic description.
Further, the step S2 further includes the following substeps:
step S21: packaging the configured data automatic checking program into an offline local checking program for local self-checking of a data pushing party;
step S22: and deploying the configured data automatic verification program into an online automatic verification service, and providing online instant verification capability when the data is pushed.
Further, the step S3 further includes the following substeps:
step S31: using a local verification program to verify whether the push data preliminarily reach the standard;
step S32: and pushing the data which is up to the standard in the local verification to a data table corresponding to the front database.
Further, the step S4 further includes the following substeps:
step S41: when the data pushing unit pushes data to the front-end processor data table, the database change data capturing component recognizes and captures data inserted or updated in the front-end database data table and issues the data to the data verification message queue;
step S42: the online automatic verification program consumes the data verification message queue to obtain data to be verified, verifies according to the configured verification items, and outputs a verification result to a verification result table;
step S43: the data inserted in the test result table is identified and captured by the database change data capturing component and is released to the data synchronous message queue;
step S44: the data synchronization message queue is consumed to synchronize the data that has passed checking and reached the quality standard to the data center library, and the synchronization status of the corresponding data is written back to the front-end processor as successful;
step S45: the automatic checking program performs a timed automatic full-library recheck to compensate for association-relation check failures caused by the data pushing order; the recheck results are output to the check result table, and the flow continues with step S43;
step S46: the data pushing party uses the web page tool to query and handle abnormal data, checks the data that failed verification and repairs it; after repair, a recheck of a single piece of data or a batch recheck of the whole table can be triggered manually on the web page tool, the check result is output to the check result table, and the flow continues with step S43.
The invention has the following beneficial effects: the invention provides an automatic data standard checking and synchronizing system and method based on a message queue, the system comprising a data standard making module, a local checking program module, a pre-database module, a data check message queue module, an automatic checking program module, a web page tool module, a data synchronization message queue module and a data center library module; the automatic checking and synchronizing system and method automate data quality checking, and the instant feedback of the automatic checks and the web page tool on the checking results improves the efficiency of data checking and problem-feedback communication, guides the data pushing party to guarantee data quality, greatly reduces the labor and time cost of the data center building party, improves the efficiency of data center construction and data standard pushing, improves data quality, and guarantees the effect of subsequent data processing and use.
Drawings
FIG. 1 is a diagram of an automated inspection and synchronization system architecture for message queue based data standards in accordance with the present invention;
FIG. 2 is a flow chart of the message queue based data standard auto-verification and synchronization method of the present invention.
Detailed Description
For a clearer understanding of technical features, objects, and effects of the present invention, a specific embodiment of the present invention will be described with reference to the accompanying drawings.
The invention provides a data standard automatic checking and synchronizing system and method based on a message queue, wherein the data standard automatic checking and synchronizing system based on the message queue is shown in figure 1 and comprises the following functional modules:
the data standard making module: based on data acquisition and processing use requirements, formulating a basic data pushing standard;
local verification program module: provided to the pushing party so that, before formal pushing, it can locally check whether its own pushing logic and pushing results are correct;
a pre-database module: a warehouse in which data quality is checked automatically;
a data check message queue module: using a CDC database change data capture component to identify and capture changes made to the data in the pre-database data push table and issue the newly added and updated data to the message queue in real time;
automatic checking program module: checking the data push table based on the data standard requirement;
the web page tool module: assists the data pushing party in checking, handling and repairing substandard data;
a data synchronization message queue module: identifying and capturing data inserted in a data inspection result table of the front database by using a CDC database change data capturing component, and publishing the newly added data to a message queue in real time;
a data center library module: stores the data that has been pushed by all data pushing parties and has passed checking, for subsequent data processing and use.
The basic data pushing standard of the data standard making module comprises: the data table structure, field content, field constraints, field format specifications, value ranges, dictionary value definitions for dictionary-type fields, table data association relations, and the pushing steps and mode; the formulated data standard informs the data pushing party of the detailed pushing standard and is also the basis of the checking logic of the automatic checking program. The local verification program module verifies the compliance of the same batch of data, including: structure, content, field constraints, field format specifications, value ranges, dictionary values, and master-slave table association relations within the same batch.
The pre-database module opens a pre-database for each data pushing party, where the pre-database comprises four kinds of data tables:
data push table: established based on the data standard requirements for the data pushing party to push data into; each main table comprises the data content together with a primary key id and a data source identification field;
data check result table: records the automatic check results; each data push table has a corresponding data check result table, whose structure comprises the fields: table name, primary key value, check time, whether the check passed, failure reason, whether valid, and whether pushed to the data center library;
check log record table: records the check logs, including message-queue automatic check logs, manually triggered check logs, and timed check logs; it records the time consumed by each check, whether execution succeeded, and any exception information, and is used to analyze the performance of the automatic verification program and to troubleshoot exceptions;
submit-check table: comprises the fields table name, primary key name and primary key value; the data pushing party triggers checking of specific data by the checking program by inserting records into the submit-check table.
The automatic checking program module checks the data push tables; the check content includes: data push table structure, content, field constraints, field format specifications, value ranges, dictionary values, master-slave table association relations, and pre-association data; it also supports a plurality of check modes, including streaming real-time checks and batch multi-threaded checks, as follows:
real-time streaming check by consuming the data check message queue: the main check mode, in which data written into the pre-database by the pushing party is checked in real time through message-queue-based streaming processing;
batch check based on the submit-check table: in complex multi-table association scenarios, after all master-slave association tables have been pushed, the data pushing party triggers a batch check of the corresponding master-slave association data by inserting records into the submit-check table;
manually triggered check of a table or of data using the web tool: the check results of all data push tables can be queried, and a recheck can be initiated for a single piece of data or a whole-table batch recheck can be initiated for a table;
timed batch recheck: all unchecked and failed data in the pre-database is rechecked in batches on a schedule; the timed task runs periodically to cover missed checks, false checks, program or service exceptions, and association-table check failures caused by the pushing party's push order.
On the other hand, the data standard automatic checking and synchronizing method based on the message queue is realized by a data standard automatic checking and synchronizing system based on the message queue, as shown in fig. 2, and comprises the following steps:
step S1: determining the data range and the content of a data center, combing the data association relationship and the list of table fields, and making a data standard;
step S2: configuring a data verification program: an automatic data verification program is configured based on the formulated data standard, and the tables to be checked are configured in the program, including the data quality items to be checked, such as table structure, fields, value specifications, value ranges, field association relations and data service association relations;
step S3: data pushing, wherein the data pushing unit pushes data according to the issued data pushing standard;
step S4: the data pushing unit automatically checks;
step S5: automatically generating a data quality analysis report at regular intervals based on the verification result data in the verification result table;
step S6: and processing and using the data synchronized to the data center.
The step S1 data standard comprises the following steps: data table and field definition, field value format specification and dictionary value range constraint, necessary filling item and partial condition necessary filling item, association relation between table and field, business logic sequence between table and table, code table definition, related national standard file reference description and data writing logic description.
The step S2 further comprises the sub-steps of:
step S21: packaging the configured data automatic checking program into an offline local checking program for local self-checking of a data pushing party;
step S22: and deploying the configured data automatic verification program into an online automatic verification service, and providing online instant verification capability when the data is pushed.
The step S3 further comprises the sub-steps of:
step S31: using a local verification program to verify whether the push data preliminarily reach the standard;
step S32: and pushing the data which is up to the standard in the local verification to a data table corresponding to the front database.
The step S4 further comprises the sub-steps of:
step S41: when the data pushing unit pushes data to the front-end processor data table, the database change data capturing component recognizes and captures data inserted or updated in the front-end database data table and issues the data to the data verification message queue;
step S42: the online automatic verification program consumes the data verification message queue to obtain data to be verified, verifies according to the configured verification items, and outputs a verification result to a verification result table;
step S43: the data inserted in the test result table is identified and captured by the database change data capturing component and is released to the data synchronous message queue;
step S44: the data synchronization message queue is consumed to synchronize the data that has passed checking and reached the quality standard to the data center library, and the synchronization status of the corresponding data is written back to the front-end processor as successful;
step S45: the automatic checking program performs a timed automatic full-library recheck to compensate for association-relation check failures caused by the data pushing order; the recheck results are output to the check result table, and the flow continues with step S43;
step S46: the data pushing party uses the web page tool to query and handle abnormal data, checks the data that failed verification and repairs it; after repair, a recheck of a single piece of data or a batch recheck of the whole table can be triggered manually on the web page tool, the check result is output to the check result table, and the flow continues with step S43.
In this embodiment, the message-queue-based automatic data standard checking and synchronizing system comprises eight functional modules, namely data standard making, local checking program, pre-database, data check message queue, automatic checking program, web page tool, data synchronization message queue, and data center library, as shown in FIG. 1.
Data standard making: a basic data pushing standard is formulated based on data acquisition and processing requirements. The formulated data standard comprises the data table structure, field content, field constraints, field format specifications, value ranges, dictionary value definitions for dictionary-type fields, table data association relations, and the pushing steps and mode; it informs the data pushing party of the detailed pushing standard and is also the basis of the checking logic of the automatic checking program.
The local verification program is a simplified version of the automatic verification program. It is provided to the pushing party so that, before formal pushing, the pushing party can locally check whether its own pushing logic and pushing results are correct. The local verification program can only verify the compliance of data within the same batch, such as structure, content, field constraints, field format specifications, value ranges, dictionary values, and master-slave table association relations within the same batch. It cannot check data association relations across different batches, which is why the pushing mode in a typical data pushing standard requires strongly associated master-slave table data to be pushed within the same push transaction. Nor can it check associations to already-checked data: for example, when a guaranteed house is pushed, the housing guarantee project associated with the house must already have been checked and synchronized to the data center; such constraints on pre-association data cannot be verified locally because the local checking program cannot access the data center database.
The pre-database is a data quality automatic checking warehouse. Opening a pre-database for each data pushing party, wherein the data table comprises four parts:
and a data pushing table is established based on the data standard requirement, so that a data pushing party can push data, each main table also comprises identification fields such as a main key id, a data source and the like besides the data content, and the main key id rule comprises a data source adding self-increasing sequence, so that the uniqueness of the whole data center is ensured.
The data check result table records the automatic check results. Each data push table has a corresponding data check result table; for a data push table "A", the corresponding result table is named "A_RESULT". All result tables share the same structure and contain the following fields: table name, primary key name, primary key value, check time, whether the check passed, failure reason, whether valid, and whether pushed to the data center library (a field-level sketch of such a record is given after the table descriptions below). The primary key name and primary key value fields associate a result record with the corresponding data in the push table; the "whether the check passed" field identifies the check result; the "failure reason" field records the specific reason a check failed; the "whether valid" field invalidates the old result when the same piece of data is rechecked, so the latest result can be distinguished; and the "pushed to data center library" field is marked after the data passes checking and is successfully synchronized to the data center library.
The check log record table records check logs, including message-queue automatic check logs, manually triggered check logs, and timed check logs; it records the time consumed by each check, whether execution succeeded, and any exception information, and is used to analyze the performance of the automatic verification program and to troubleshoot exceptions.
The submit-check table contains fields such as the table name and primary key value. The data pushing party triggers a check of specific data by inserting records into this table, so that in complex multi-table association scenarios the check is triggered only after all master-slave association tables have been pushed, avoiding the check failures that occur when a check is triggered from the data check message queue while only part of the tables have been pushed.
The data pushing party has insert, delete, query and update authority on the data push tables, query-only authority on the data check result table and the check log record table, and insert-and-query authority on the submit-check table.
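The result-table record described above can be pictured as a plain Java entity like the minimal sketch below; the English field names are illustrative translations of the fields listed above and are assumptions, not the actual schema.

```java
import java.time.LocalDateTime;

// One record of a data check result table such as "A_RESULT" (illustrative field names).
public class CheckResultRecord {
    String tableName;        // which data push table the checked row belongs to
    String primaryKeyName;   // e.g. "id"
    String primaryKeyValue;  // links the result back to the checked row
    LocalDateTime checkTime;
    boolean passed;          // "whether the check passed"
    String failReason;       // empty when the check passed
    boolean valid;           // superseded results are marked invalid on recheck
    boolean pushedToCenter;  // set after successful synchronization to the data center library
}
```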
The data check message queue uses a database change data capture (Change Data Capture, hereinafter CDC) component that supports MySQL, such as Canal, to identify and capture changes (including inserts and updates) made to data in the pre-database data push tables and to publish the newly added and updated data to a message queue in real time; this queue is the data check message queue.
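As an illustration of this capture-and-publish step, the sketch below assumes Kafka as the message queue and one JSON string per changed row; the CDC side (for example Canal watching the MySQL binlog) is represented only by the caller that hands the row to publishChange(). The topic name, addresses, and class names are assumptions, not part of the method itself.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Publishes captured inserts/updates from the pre-database push tables to the data check queue.
public class DataCheckQueuePublisher {
    private final KafkaProducer<String, String> producer;
    private final String topic;

    public DataCheckQueuePublisher(String bootstrapServers, String topic) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        this.producer = new KafkaProducer<>(props);
        this.topic = topic;
    }

    // Key = table name + primary key, so changes to the same row stay ordered in one partition.
    public void publishChange(String tableName, String primaryKeyValue, String rowJson) {
        producer.send(new ProducerRecord<>(topic, tableName + ":" + primaryKeyValue, rowJson));
    }
}
```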
The automatic verification program checks, against the data standard requirements, the data push table structure, content, field constraints, field format specifications, value ranges, dictionary values, master-slave table association relations, pre-association data, and so on. A corresponding entity object is created for each push table structure of the front-end processor, and the general check content, such as field constraints, field format, whether to check, value range and dictionary values, is declared on the field attributes through annotations. The data to be checked is received into the corresponding entity object; reflection is used to obtain the attribute names, attribute values and the general check content annotated on every field attribute; the field attribute values are checked in turn according to the obtained check content identifiers; and after the general checks finish, the master-slave table association data and the pre-association data are checked. The check result of each field is recorded: the data and its slave-table data pass only when every field check passes, and fail if even one field fails. Finally, the verification result is written into the corresponding data check result table.
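The annotation-and-reflection mechanism just described can be sketched as follows. The annotation (@CheckRule), the entity (HousingProjectRow) and its rules are simplified illustrations and not the patent's actual identifiers; a real check would also cover value ranges, master-slave associations and pre-association data.

```java
import java.lang.annotation.*;
import java.lang.reflect.Field;
import java.util.*;
import java.util.regex.Pattern;

// Declares the general check content on an entity field attribute.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface CheckRule {
    boolean required() default false;   // field constraint: mandatory field
    String format() default "";         // regex for the field format specification
    String[] dict() default {};         // allowed dictionary values
}

// Hypothetical entity mapped to one data push table of the pre-database.
class HousingProjectRow {
    @CheckRule(required = true)
    String id;
    @CheckRule(required = true, format = "\\d{4}-\\d{2}-\\d{2}")
    String buildDate;
    @CheckRule(dict = {"01", "02", "03"})
    String projectType;
}

public class FieldChecker {
    /** Returns failure reasons; an empty list means the row passes the generic field checks. */
    public static List<String> check(Object row) throws IllegalAccessException {
        List<String> failures = new ArrayList<>();
        for (Field f : row.getClass().getDeclaredFields()) {
            CheckRule rule = f.getAnnotation(CheckRule.class);
            if (rule == null) continue;              // field carries no check content
            f.setAccessible(true);
            Object value = f.get(row);
            String text = value == null ? "" : value.toString();
            if (rule.required() && text.isEmpty())
                failures.add(f.getName() + ": required value is missing");
            if (!rule.format().isEmpty() && !text.isEmpty()
                    && !Pattern.matches(rule.format(), text))
                failures.add(f.getName() + ": does not match format " + rule.format());
            if (rule.dict().length > 0 && !text.isEmpty()
                    && !Arrays.asList(rule.dict()).contains(text))
                failures.add(f.getName() + ": value not in dictionary " + Arrays.toString(rule.dict()));
        }
        return failures;
    }

    public static void main(String[] args) throws Exception {
        HousingProjectRow row = new HousingProjectRow();
        row.id = "src01-1001";
        row.buildDate = "2023/08/31";   // wrong format, reported as a failure
        row.projectType = "09";         // not in the dictionary, reported as a failure
        System.out.println(check(row));
    }
}
```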
The automatic checking program has a plurality of checking modes, including streaming real-time checking and batch multi-thread checking, and is used for adapting to different conditions, and the checking modes are as follows:
the real-time streaming inspection of the message queue is verified by the consumption data, which is the main inspection mode, and the data written into the pre-database by the pushing party is inspected in real time by the streaming processing based on the message queue.
Batch check based on the submit-check table is used in complex multi-table association scenarios: after all master-slave association tables have been pushed, the data pushing party inserts records into the submit-check table to trigger a batch check of the corresponding master-slave association data. This complements the real-time streaming check driven by the data check message queue and avoids the check failures that occur when a check is triggered from the queue while only part of the tables have been pushed.
A check of a specific table or piece of data can be triggered manually using the web tool, which can query the check results of all data push tables, initiate a recheck of a specific single piece of data, or initiate a whole-table batch recheck of a given table. This assists the data pushing party in checking, handling and repairing substandard data.
Timed batch recheck: all unchecked and failed data in the pre-database is rechecked in batches on a schedule; the timed task runs periodically to cover missed checks, false checks, program or service exceptions, and association-table check failures caused by the pushing party's push order, ensuring the correctness of the overall checking program.
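A minimal sketch of the main streaming check mode is given below, assuming the data check queue is a Kafka topic carrying one JSON message per changed row; verifyAndRecord() stands in for the annotation-driven checks and the write to the corresponding result table, and the broker address, group id and topic name are assumptions.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Consumes the data check message queue and checks each pushed row in real time.
public class StreamingCheckWorker {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed address
        props.put("group.id", "auto-check-program");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("data-check-queue"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    verifyAndRecord(record.key(), record.value());
                }
            }
        }
    }

    private static void verifyAndRecord(String tableAndKey, String rowJson) {
        // 1. map rowJson to the entity configured for this push table
        // 2. run the generic field checks plus master-slave association and pre-association checks
        // 3. insert a row into the corresponding *_RESULT table
        System.out.println("checked " + tableAndKey);
    }
}
```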
In addition to checking, the automatic verification program also includes a timed retry-synchronization function: data that, according to the check result table, has passed checking but has not been successfully synchronized is republished to the data synchronization message queue by a timed task.
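Both timed tasks (the full-library batch recheck above and this retry synchronization) can be scheduled with a plain JDK scheduler, as in the sketch below; the 24-hour interval and the method names are assumptions, and the task body is abbreviated to comments.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Periodically rechecks unchecked and failed rows in the pre-database.
public class TimedRecheckJob {
    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Run once a day; the interval is an assumption, not fixed by the method.
        scheduler.scheduleAtFixedRate(TimedRecheckJob::recheckUncheckedAndFailedRows,
                0, 24, TimeUnit.HOURS);
    }

    private static void recheckUncheckedAndFailedRows() {
        // 1. select rows with no valid result or with a failed result from every push table
        // 2. re-run the checks in batches across a thread pool
        // 3. write fresh result rows and invalidate the superseded ones
        // (a similar timed task republishes passed-but-unsynchronized data to the sync queue)
    }
}
```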
The web page tool assists the data pushing party in checking, handling and repairing substandard data. It can query the check results of all data push tables, initiate a recheck for a specific single piece of data, or initiate a whole-table batch recheck for a given table. The automatic checks of the verification program and the web page tool feed the check results and the manual recheck capability back to the data pushing party, so that data quality is guaranteed by the pushing party and the workload and labor investment of the data center building party are reduced.
The data synchronization message queue uses CDC to identify and capture data inserted into the data check result tables of the pre-database and publishes the newly added records to a message queue in real time; this queue is the data synchronization message queue. By consuming this queue in real time, the data whose record in the data check result table shows a passed check is synchronized to the data center library in real time.
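The synchronization step driven by this queue can be sketched as the handler below, assuming one message per insert captured from a result table; the queue plumbing follows the same consumer pattern as the streaming check sketch, the class and method names are assumptions, and the database work is abbreviated to comments.

```java
// Handles one result-table insert event from the data synchronization message queue.
public class SyncToCenterHandler {
    public void onResultInserted(String tableName, String primaryKeyValue, boolean passed) {
        if (!passed) {
            return;  // only rows that passed the checks reach the data center library
        }
        // 1. read the source row from the pre-database push table by primaryKeyValue
        // 2. insert or update it in the corresponding data center table
        // 3. write back "pushed to data center library = true" on the result record
    }
}
```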
The data center library stores the data that has been pushed by all data pushing parties and has passed checking, and is used for subsequent data processing.
The main operation flow across the eight functional modules is as follows: a data standard is formulated based on data acquisition and processing requirements; a pre-database is established for each data pushing party based on the data standard; the data standard, the local checking program, and the front-end processor database address and account are issued to each data pushing party; the data pushing party pushes data into the data push tables of the pre-database; the pushed data is published in real time to the data check message queue through CDC on the data push tables; the automatic checking program consumes the data check message queue to check newly added and repaired pushed data in real time and writes the check results into the corresponding data check result tables; the check results are published in real time to the data synchronization message queue through CDC on the data check result tables; and the checked data is synchronized to the data center library by consuming that queue. The whole flow is shown in FIG. 2.
In this embodiment, the operation flow includes in detail:
and (one) establishing a data standard: determining the data range and the content of a data center, combing the data association relationship and the list of table fields, and making a data standard; data criteria include, but are not limited to:
1. data table and field definitions.
2. Field value format specification and dictionary value range constraints.
3. Mandatory fields and conditionally mandatory fields.
4. Association between table and field.
5. Business logic sequencing from table to table.
6. Code table definitions.
7. References to the relevant national standard documents.
8. Data writing logic.
(II) Configure the data verification program: the automatic data verification program is configured based on the established data standard. The tables to be checked are configured in the program, including the data quality items to be checked, such as table structure, fields, value specifications, value ranges, field association relations and data service association relations. The configured automatic verification program is then:
1. Packaged into an offline local verification program for local self-checking by the data pushing party;
2. Deployed as an online automatic verification service, providing online instant verification capability when data is pushed.
The local verification program cannot connect to the data center database and therefore cannot verify whether the pre-associated data is correct.
(III) Data pushing: the data pushing unit pushes data according to the issued data pushing standard.
1. Use the local verification program to verify whether the push data preliminarily meets the standard.
2. Push the data that meets the standard in the local verification to the corresponding data table of the pre-database.
(IV) automatic verification
1. When the data pushing unit pushes data to the front end processor data table, the CDC identifies and captures the data inserted or updated in the front end processor data table and issues the data to the data checking message queue.
2. The online automatic verification program consumes the data verification message queue, obtains the data to be verified, verifies according to the configured verification items, and outputs the verification result to the verification result table.
3. The data inserted in the test result table is identified and captured by the CDC and issued to the data synchronization message queue.
4. The data synchronization message queue is consumed to synchronize the data that has passed checking and reached the quality standard to the data center library, and the synchronization status of the corresponding data is written back to the front-end processor as successful.
5. The automatic checking program performs a timed automatic full-library recheck to compensate for association-relation check failures caused by the data pushing order. The recheck results are output to the check result table, and the flow continues with step 3.
6. The data pushing party uses the web page tool to query and handle abnormal data, checks the data that failed verification and repairs it; after repair, a recheck of a single piece of data or a batch recheck of the whole table can be triggered manually on the web page tool, the check result is output to the check result table, and the flow continues with step 3.
(V) Data quality report: a data quality analysis report is automatically generated periodically based on the check result data in the check result table.
(VI) Data processing and use: the data synchronized to the data center is processed and used.
The invention provides an automatic data standard checking and synchronizing system and method based on a message queue, the system comprising a data standard making module, a local checking program module, a pre-database module, a data check message queue module, an automatic checking program module, a web page tool module, a data synchronization message queue module and a data center library module; the automatic checking and synchronizing system and method automate data quality checking, and the instant feedback of the automatic checks and the web page tool on the checking results improves the efficiency of data checking and problem-feedback communication, guides the data pushing party to guarantee data quality, greatly reduces the labor and time cost of the data center building party, improves the efficiency of data center construction and data standard pushing, improves data quality, and guarantees the effect of subsequent data processing and use.
The foregoing has shown and described the basic principles and features of the invention and the advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and that the above embodiments and descriptions are merely illustrative of the principles of the present invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, which is defined in the appended claims. The scope of the invention is defined by the appended claims and equivalents thereof.
Claims (10)
1. The data standard automatic checking and synchronizing system based on the message queue is characterized by comprising the following functional modules:
the data standard making module: based on data acquisition and processing use requirements, formulating a basic data pushing standard;
local verification program module: provided to the pushing party so that, before formal pushing, it can locally check whether its own pushing logic and pushing results are correct;
a pre-database module: a warehouse in which data quality is checked automatically;
a data check message queue module: using a CDC database change data capture component to identify and capture changes made to the data in the pre-database data push table and issue the newly added and updated data to the message queue in real time;
automatic checking program module: checking the data push table based on the data standard requirement;
the web page tool module: assists the data pushing party in checking, handling and repairing substandard data;
a data synchronization message queue module: identifying and capturing data inserted in a data inspection result table of the front database by using a CDC database change data capturing component, and publishing the newly added data to a message queue in real time;
a data center library module: stores the data that has been pushed by all data pushing parties and has passed checking, for subsequent data processing and use.
2. The message queue based data standard auto-verification and synchronization system of claim 1, wherein the basic data pushing standard of the data standard making module comprises: the data table structure, field content, field constraints, field format specifications, value ranges, dictionary value definitions for dictionary-type fields, table data association relations, and the pushing steps and mode; the formulated data standard informs the data pushing party of the detailed pushing standard and is also the basis of the checking logic of the automatic checking program.
3. The message queue-based data standard auto-verification and synchronization system of claim 1, wherein the local verification program module verifies compliance of the same batch of data, comprising: structure, content, field constraints, field format specifications, value range, dictionary values, master-slave table associations within the same batch.
4. The message queue based data standard auto-verification and synchronization system of claim 2, wherein the pre-database module opens a pre-database for each data pushing party, the pre-database comprising four kinds of data tables:
data push table: established based on the data standard requirements for the data pushing party to push data into; each main table comprises the data content together with a primary key id and a data source identification field;
data check result table: records the automatic check results; each data push table has a corresponding data check result table, whose structure comprises the fields: table name, primary key value, check time, whether the check passed, failure reason, whether valid, and whether pushed to the data center library;
check log record table: records the check logs, including message-queue automatic check logs, manually triggered check logs, and timed check logs; it records the time consumed by each check, whether execution succeeded, and any exception information, and is used to analyze the performance of the automatic verification program and to troubleshoot exceptions;
submit-check table: comprises the fields table name, primary key name and primary key value; the data pushing party triggers checking of specific data by the checking program by inserting records into the submit-check table.
5. The message queue-based data standard auto-verification and synchronization system of claim 1, wherein the automatic checking program module checks the data push tables, the check content including: data push table structure, content, field constraints, field format specifications, value ranges, dictionary values, master-slave table association relations, and pre-association data; the module also supports a plurality of check modes, including streaming real-time checks and batch multi-threaded checks, as follows:
real-time streaming check by consuming the data check message queue: the main check mode, in which data written into the pre-database by the pushing party is checked in real time through message-queue-based streaming processing;
batch check based on the submit-check table: in complex multi-table association scenarios, after all master-slave association tables have been pushed, the data pushing party triggers a batch check of the corresponding master-slave association data by inserting records into the submit-check table;
manually triggered check of a table or of data using the web tool: the check results of all data push tables can be queried, and a recheck can be initiated for a single piece of data or a whole-table batch recheck can be initiated for a table;
timed batch recheck: all unchecked and failed data in the pre-database is rechecked in batches on a schedule; the timed task runs periodically to cover missed checks, false checks, program or service exceptions, and association-table check failures caused by the pushing party's push order.
6. The automatic checking and synchronizing method of data standards based on message queues, which is realized based on the automatic checking and synchronizing system of data standards based on message queues according to any one of claims 1 to 5, and is characterized by comprising the following steps:
step S1: determining the data range and the content of a data center, combing the data association relationship and the list of table fields, and making a data standard;
step S2: configuring a data verification program, configuring a data automatic verification program based on a formulated data standard, and configuring a table to be verified in the data automatic verification program, wherein the table comprises data quality items to be verified, such as a table structure, fields, value specifications, value fields, field association relations, data service association relations and the like;
step S3: data pushing, wherein the data pushing unit pushes data according to the issued data pushing standard;
step S4: the data pushing unit automatically checks;
step S5: automatically generating a data quality analysis report at regular intervals based on the verification result data in the verification result table;
step S6: and processing and using the data synchronized to the data center.
7. The message queue based data standard auto-verification and synchronization method of claim 6, wherein the step S1 data standard comprises: data table and field definition, field value format specification and dictionary value range constraint, necessary filling item and partial condition necessary filling item, association relation between table and field, business logic sequence between table and table, code table definition, related national standard file reference description and data writing logic description.
8. The message queue based data standard auto-verification and synchronization method of claim 6, wherein step S2 further comprises the sub-steps of:
step S21: packaging the configured data automatic checking program into an offline local checking program for local self-checking of a data pushing party;
step S22: and deploying the configured data automatic verification program into an online automatic verification service, and providing online instant verification capability when the data is pushed.
9. The message queue based data standard auto-verification and synchronization method according to claim 6, wherein said step S3 further comprises the sub-steps of:
step S31: using a local verification program to verify whether the push data preliminarily reach the standard;
step S32: and pushing the data which is up to the standard in the local verification to a data table corresponding to the front database.
10. The message queue based data standard auto-verification and synchronization method of claim 6, wherein step S4 further comprises the sub-steps of:
step S41: when the data pushing unit pushes data to the front-end processor data table, the database change data capturing component recognizes and captures data inserted or updated in the front-end database data table and issues the data to the data verification message queue;
step S42: the online automatic verification program consumes the data verification message queue to obtain data to be verified, verifies according to the configured verification items, and outputs a verification result to a verification result table;
step S43: the data inserted in the test result table is identified and captured by the database change data capturing component and is released to the data synchronous message queue;
step S44: the data synchronization message queue is consumed to synchronize the data that has passed checking and reached the quality standard to the data center library, and the synchronization status of the corresponding data is written back to the front-end processor as successful;
step S45: the automatic checking program performs a timed automatic full-library recheck to compensate for association-relation check failures caused by the data pushing order;
the recheck results of the automatic full-library recheck are output to the check result table, and the flow continues with step S43;
step S46: the data pushing party uses the web page tool to query and handle abnormal data, checks the data that failed verification and repairs it; after repair, a recheck of a single piece of data or a batch recheck of the whole table can be triggered manually on the web page tool, the check result is output to the check result table, and the flow continues with step S43.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311107792.8A CN116860741B (en) | 2023-08-31 | 2023-08-31 | Automatic data standard checking and synchronizing system and method based on message queue |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311107792.8A CN116860741B (en) | 2023-08-31 | 2023-08-31 | Automatic data standard checking and synchronizing system and method based on message queue |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116860741A true CN116860741A (en) | 2023-10-10 |
CN116860741B CN116860741B (en) | 2023-11-10 |
Family
ID=88225290
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311107792.8A Active CN116860741B (en) | 2023-08-31 | 2023-08-31 | Automatic data standard checking and synchronizing system and method based on message queue |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116860741B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104954469A (en) * | 2015-06-19 | 2015-09-30 | 长沙廖氏软件科技有限公司 | Information exchange method for heterogeneous system |
CN105005683A (en) * | 2015-06-17 | 2015-10-28 | 北京锐易特软件技术有限公司 | Caching system and method for solving data normalization problem of regional medical system |
CN109344148A (en) * | 2018-10-16 | 2019-02-15 | 万达信息股份有限公司 | A kind of data acquisition management system and method |
CN114329190A (en) * | 2021-12-13 | 2022-04-12 | 南京莱斯信息技术股份有限公司 | Data standard processing system |
CN115481116A (en) * | 2022-09-22 | 2022-12-16 | 中国银行股份有限公司 | Data quality inspection method and device |
US20230237043A1 (en) * | 2022-01-21 | 2023-07-27 | Snowflake Inc. | Accelerating change data capture determination using row bitsets |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105005683A (en) * | 2015-06-17 | 2015-10-28 | 北京锐易特软件技术有限公司 | Caching system and method for solving data normalization problem of regional medical system |
CN104954469A (en) * | 2015-06-19 | 2015-09-30 | 长沙廖氏软件科技有限公司 | Information exchange method for heterogeneous system |
CN109344148A (en) * | 2018-10-16 | 2019-02-15 | 万达信息股份有限公司 | A kind of data acquisition management system and method |
CN114329190A (en) * | 2021-12-13 | 2022-04-12 | 南京莱斯信息技术股份有限公司 | Data standard processing system |
US20230237043A1 (en) * | 2022-01-21 | 2023-07-27 | Snowflake Inc. | Accelerating change data capture determination using row bitsets |
CN115481116A (en) * | 2022-09-22 | 2022-12-16 | 中国银行股份有限公司 | Data quality inspection method and device |
Non-Patent Citations (3)
Title |
---|
DEVINDER KUMAR ET AL.: "Applications of the internet of things for optimizing warehousing and logistics operations: A systematic literature review and future research directions", Computers & Industrial Engineering, pages 1-17 *
ZHANG Di: "Research on the information bearing model of IoT agents in the power Internet of Things", China Master's Theses Full-text Database, Engineering Science and Technology II, pages 042-2638 *
HUANG Chenhui et al.: "Data synchronization system based on distributed SQL and stream replication", Microcomputer Information, pages 266-268 *
Also Published As
Publication number | Publication date |
---|---|
CN116860741B (en) | 2023-11-10 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |