CN114185948A - Data quality monitoring method and system based on data center - Google Patents

Info

Publication number
CN114185948A
Authority
CN
China
Prior art keywords
data
alarm
module
rule
checking
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111545084.3A
Other languages
Chinese (zh)
Inventor
王显辉
黄波
李瑜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Hongtian Information Technology Co ltd
Original Assignee
Beijing Hongtian Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Hongtian Information Technology Co ltd filed Critical Beijing Hongtian Information Technology Co ltd
Priority to CN202111545084.3A priority Critical patent/CN114185948A/en
Publication of CN114185948A publication Critical patent/CN114185948A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning


Abstract

The invention discloses a data quality monitoring method and system based on a data center, comprising a rule center, a checking engine, a disposal center, and a knowledge base and analysis module. The rule center is divided into basic rules and model rules; the checking engine comprises a data acquisition part and a data checking part; the disposal center is divided into an alarm module and a processing module; the knowledge base and analysis module mainly counts and analyzes the abnormal conditions of each service, outputting for each service every day the proportion of reconciliation anomalies, the cause analysis, and the rise or fall of that proportion. Through the disposal center, which supports several alarm modes, staff can find and handle existing problems in time; through the rule center, which performs automatic comparison, the accuracy of the data is further improved, the system is easier to use, and staff working efficiency is further improved.

Description

Data quality monitoring method and system based on data center
Technical Field
The invention relates to the technical field of data quality monitoring, and in particular to a data quality monitoring method and system based on a data center.
Background
Data quality monitoring means monitoring the integrity, accuracy, consistency and timeliness of data. For a data department, data quality is its lifeline: data latency may be acceptable, but data loss or data inaccuracy is fatal. Problems should be found early and countermeasures taken early to reduce losses, so data quality verification is indispensable.
Prior-art data quality monitoring methods and systems based on a data center have the following defects:
1. Reference document CN112948845A discloses a data processing method and system based on an Internet of Things data center. It designs a three-level data processing structure: preliminary setting and screening are performed at the acquisition end, and two levels of data routing are also provided, so that the acquisition end can select a data processing platform or a corresponding dedicated data processing point according to the confidentiality and processing urgency of the acquired data. In addition, the generator, user and processor of the data are recorded and each granted different permissions; the permissions required during data processing are monitored in real time, and a mismatch in permissions clearly indicates a data security problem. According to that document, although a multi-stage data processing end at the acquisition end affects the timeliness of data transmission, the method addresses this by quickly classifying the preprocessed data in a first step through quick-matching and marking rules. However, the system lacks a good alarm function for data anomalies that occur during use, so staff cannot find problems in time and overall working efficiency suffers;
2. Reference document CN111241086A discloses a data quality improvement method and system based on medical big data. It calculates on the basis of HIS atomic index values and performs quality management through field-level checks on normative detail data, non-normative detail data, state data, atomic index summaries and the like; calculates on the basis of platform atomic values, collecting data through a public service platform to check residents' personal information and service treatment records in fine detail; calculates on the basis of BI atomic index values, performing directed rule verification on the related base tables with the atomic index as a guide; and writes dynamic SQL execution statements, performing data quality control and statistics on Hadoop and hash calculation engines. By verifying data several times with this three-way integration, the method achieves data quality control, but the system lacks a good reconciliation function, so staff find it difficult to quickly learn the anomaly proportion, the cause analysis, and the rise or fall of the anomaly proportion, and its functionality is poor;
3. Reference document CN105550511A discloses a data quality evaluation system and method based on data verification technology. The system comprises a data acquisition unit, a verification unit, a quality evaluation unit, a report feedback unit and a statistical analysis unit. The method comprises: acquiring service data from the medical data sources of heterogeneous medical data systems; performing compliance verification; evaluating the acquired service data against different quality evaluation indexes and scoring those indexes comprehensively; feeding the quality evaluation result back to the data maintainers as a quality scoring report and a data verification report; and statistically analyzing the actual quality evaluation results. That invention addresses data quality evaluation during data verification in regional medical systems, so that verification not only finds data problems but also evaluates them and the data quality comprehensively, thereby improving data quality.
Disclosure of Invention
The present invention provides a data quality monitoring method and system based on a data center, so as to solve the problems set out in the Background.
To achieve this purpose, the invention provides the following technical solution, comprising a rule center, a checking engine, a disposal center, and a knowledge base and analysis module;
the rule center is divided into basic rules and model rules;
the checking engine comprises a data acquisition part and a data checking part;
the disposal center is divided into an alarm module and a processing module;
the knowledge base and analysis module mainly counts and analyzes the abnormal conditions of each service, outputting for each service every day the proportion of reconciliation anomalies, the cause analysis, and the rise or fall of that proportion.
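By way of illustration only, the daily per-service anomaly proportion and its rise or fall could be computed along the following lines. This is a minimal sketch, not the applicant's implementation: the function names `daily_abnormal_ratio` and `ratio_change`, and the representation of a check result as a `(service, passed)` pair, are assumptions introduced here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def daily_abnormal_ratio(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Given one day's (service, passed) check results, return the abnormal ratio per service."""
    total: Dict[str, int] = defaultdict(int)
    failed: Dict[str, int] = defaultdict(int)
    for service, passed in results:
        total[service] += 1
        if not passed:
            failed[service] += 1
    # Every service in `total` has at least one result, so the division is safe.
    return {service: failed[service] / total[service] for service in total}


def ratio_change(today: float, yesterday: float) -> float:
    """Positive when the abnormal ratio rose, negative when it fell (hypothetical helper)."""
    return today - yesterday
```

For example, two checks for a hypothetical "billing" service with one failure give a ratio of 0.5 for that day; comparing it with the previous day's ratio gives the increase or decrease.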
Preferably, the core of the basic rules is range matching, of which relationship matching is a special case. Range matching determines by calculation whether a comparison between several comparison objects and a given interval holds; the range-matching formula with two comparison objects is: comparison-object calculation logic + operator + comparison-object calculation logic + comparator + range symbol.
Preferably, the model rules are of two types: strategy rules and machine-learning rules.
Preferably, the data acquisition module pulls data from a data source; the data source may be MySQL, Hive, ES or the like, or an HDFS path containing schema information.
Preferably, the checking engine is divided by checking time into a quasi-real-time checking module and an offline checking module;
the quasi-real-time module uses the Spark or Flink stream-processing platform to consume data from the MQ and then compares and verifies it;
the offline module is an executable script or jar package.
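The consume-then-verify loop of the quasi-real-time module can be illustrated with a small stand-in. The sketch below deliberately does not use Spark or Flink: an in-process `queue.Queue` plays the role of the MQ, and the names `consume_and_check`, `Record` and `batch_size` are assumptions made only for this example.

```python
import queue
from typing import Callable, Dict, List

Record = Dict[str, float]


def consume_and_check(
    mq: "queue.Queue[Record]",
    rule: Callable[[Record], bool],
    batch_size: int = 100,
) -> List[Record]:
    """Drain up to `batch_size` records from `mq` and return those that fail `rule`."""
    failures: List[Record] = []
    for _ in range(batch_size):
        try:
            record = mq.get_nowait()
        except queue.Empty:
            break  # nothing left in this micro-batch
        if not rule(record):
            failures.append(record)
    return failures
```

A caller would put records on the queue and pass a rule such as `lambda r: r["paid"] == r["billed"]`; the returned failures are what a disposal step would then act on.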
Preferably, the alarm module, which is provided by the platform, includes common alarm modes such as short-message alarms and telephone alarms. Alarm history can also be displayed as a bar chart in which the height of each bar represents the total number of notifications; clicking a bar shows the subscribed alarm history. The alarm history is summarized and analyzed, and learning is performed through AI (machine learning). The main information includes:
(1) alarm time;
(2) alarm type;
(3) data source;
(4) alarm details;
a threshold range can be set for the default alarm time, the minimum range being the most recent hour.
Preferably, the processing module can select code to handle an alarm automatically according to the alarm type. The processing logic belongs to the user and is independent of the platform: users configure their own processing logic on a web page, and when a quality detection task is configured, several processing logics can be selected for a single rule.
Preferably, regarding the knowledge base:
when a quality comparison exception occurs, the system automatically generates an exception post in the knowledge base, records the exception information (including the data source, the rule, the time the exception occurred, and so on), and notifies the relevant users; under the post, users can reply with the exception handling process, summarize the cause of the problem, give short-term and long-term solutions, and so on;
after the exception has been handled, the user can close the post to mark the handling as finished, and other users can search for a given post by keyword to look up the handling progress and solution for a particular exception.
Compared with the prior art, the invention has the following beneficial effects:
1. Through the disposal center provided in the system, which supports several alarm modes, staff can find and handle existing problems in time; through the rule center provided in the system, which performs automatic comparison, the accuracy of the data is further improved and staff working efficiency is further improved.
2. Through the checking engine provided in the system, the system has a good reconciliation function, so that staff can quickly learn the anomaly proportion, the cause analysis, and the rise or fall of the anomaly proportion, which improves their working efficiency.
3. Through the knowledge base, the system can notify several users of an exception post at the same time, and once a user has replied with the handling process under the post, other users can conveniently search for or view it, which improves the overall connectivity of the system and makes it easier to use.
Drawings
FIG. 1 is a schematic flow chart of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it should be noted that the terms "upper", "lower", "inner", "outer", "front", "rear", "both ends", "one end", "the other end", and the like indicate orientations or positional relationships based on those shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it is to be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "disposed," "connected," and the like are to be construed broadly, such as "connected," which may be fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
The first embodiment is as follows:
Referring to FIG. 1, an embodiment of the present invention provides a data quality monitoring method and system based on a data center, comprising a rule center, a checking engine, a disposal center, and a knowledge base and analysis module;
the rule center is divided into two types, basic rules and model rules:
the core of the basic rules is range matching, of which relationship matching is a special case. Range matching determines by calculation whether a comparison between several comparison objects and a given interval holds; the range-matching formula with two comparison objects is: comparison-object calculation logic + operator + comparison-object calculation logic + comparator + range symbol, for example A + B/C ∈ [1, 1.01). Relationship matching determines whether a relation between comparison objects holds, for example A > B. As the example shows, relationship matching can be converted into range matching by a transformation: the relationship match A > B becomes the range match A - B > 0. The two types exist mainly for the convenience of user configuration; the underlying execution always uses the range-matching logic;
model rules are of two types, strategies and machine learning. A strategy is a rule that returns a Boolean result; for example, a data-fluctuation strategy treats the data as abnormal when it falls by 50 percent year-over-year or period-over-period. The machine-learning type is similar to a strategy and also returns a Boolean result, except that the rule is replaced by a machine-learning model;
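The data-fluctuation strategy mentioned above can be illustrated with a short sketch. The function name `fluctuation_is_abnormal`, the parameter `max_drop`, the choice to treat a drop of exactly 50 percent as abnormal, and the handling of a non-positive baseline are all assumptions made for this example; the description itself only states the 50 percent figure.

```python
def fluctuation_is_abnormal(current: float, baseline: float, max_drop: float = 0.5) -> bool:
    """Return True when `current` has fallen by at least `max_drop` relative to `baseline`.

    `baseline` is the value for the comparison period: the same period last year
    (year-over-year) or the immediately preceding period (period-over-period).
    """
    if baseline <= 0:
        # Assumption: without a positive baseline the relative drop is undefined,
        # so this sketch does not flag the data.
        return False
    drop = (baseline - current) / baseline
    return drop >= max_drop
```

Like any strategy rule, the function returns a Boolean; a machine-learning rule would expose the same Boolean result but obtain it from a trained model instead of a fixed threshold.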
the checking engine comprises two parts, data acquisition and data checking:
the data acquisition module pulls data from a data source, which may be the commonly used MySQL, Hive, ES and the like, or an HDFS path containing schema information; the platform pulls data from each data engine according to the user's configuration (which may be SQL or a custom configuration), and the data checking module checks whether the configured rules are satisfied;
by checking time, the checking engine is divided into a quasi-real-time checking module and an offline checking module. The quasi-real-time module generally uses the Spark or Flink stream-processing platform to consume data from the MQ and then compares and verifies it; the offline module is generally an executable script or jar package that encapsulates the logic for pulling and comparing the data.
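The split between data acquisition (pulling values with user-configured SQL) and data checking (testing a rule on those values) might look as follows in an offline script. This is a sketch under stated assumptions: an in-memory SQLite database stands in for MySQL or Hive, and the table names and the helpers `pull_value`, `check_equal` and `demo` are hypothetical.

```python
import sqlite3
from typing import Tuple


def pull_value(conn: sqlite3.Connection, sql: str) -> float:
    """Data acquisition: run a user-configured query and return its single numeric result."""
    row = conn.execute(sql).fetchone()
    return float(row[0])


def check_equal(conn: sqlite3.Connection, source_sql: str, target_sql: str) -> bool:
    """Data checking: the rule in this sketch is that both queries return the same value."""
    return pull_value(conn, source_sql) == pull_value(conn, target_sql)


def demo() -> Tuple[bool, bool]:
    """Check the rule before and after simulated data loss in the target table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders_src (id INTEGER);
        CREATE TABLE orders_dst (id INTEGER);
        INSERT INTO orders_src VALUES (1), (2), (3);
        INSERT INTO orders_dst VALUES (1), (2), (3);
        """
    )
    src = "SELECT COUNT(*) FROM orders_src"
    dst = "SELECT COUNT(*) FROM orders_dst"
    before = check_equal(conn, src, dst)
    conn.execute("DELETE FROM orders_dst WHERE id = 3")  # simulate a lost row
    after = check_equal(conn, src, dst)
    conn.close()
    return before, after
```

Packaging such a script with its connection settings and rule configuration corresponds to the "executable script" form of the offline module.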
The disposal center comprises two modules, alarm and processing:
the alarm module, which is provided by the platform, includes common alarm modes such as short-message alarms. Alarm history can also be displayed as a bar chart in which the height of each bar represents the total number of notifications; clicking a bar shows the subscribed alarm history. The alarm history is summarized and analyzed, and learning is performed through AI (machine learning). The main information includes:
(1) alarm time;
(2) alarm type;
(3) data source;
(4) alarm details;
a threshold range can be set for the default alarm time, the minimum range being the most recent hour;
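The bar-chart view of alarm history can be sketched as a count of alarms per time bucket within a window of at least one hour. The `Alarm` record mirrors the four items of main information listed above; the function name `alarm_counts_per_bucket`, the ten-minute bucket width and the half-open window are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass(frozen=True)
class Alarm:
    time: datetime      # (1) alarm time
    type: str           # (2) alarm type
    data_source: str    # (3) data source
    details: str        # (4) alarm details


def alarm_counts_per_bucket(
    alarms: List[Alarm],
    now: datetime,
    window: timedelta = timedelta(hours=1),
    bucket: timedelta = timedelta(minutes=10),
) -> Dict[datetime, int]:
    """Count alarms per bucket in [now - window, now); each count is one bar height."""
    if window < timedelta(hours=1):
        raise ValueError("the minimum window is the most recent hour")
    start = now - window
    counts: Counter = Counter()
    for alarm in alarms:
        if start <= alarm.time < now:
            index = (alarm.time - start) // bucket  # timedelta // timedelta is an int
            counts[start + index * bucket] += 1
    return dict(sorted(counts.items()))
```

Clicking a bar would then correspond to listing the `Alarm` records whose time falls inside that bucket.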
the processing module can select code to handle an alarm automatically according to the alarm type. The processing logic belongs to the user and is independent of the platform: users configure their own processing logic on a web page, and when a quality detection task is configured, several processing logics can be selected for a single rule;
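One way to picture user-owned processing logic selected by alarm type is a registry of handlers. The class `DisposalRegistry`, its methods and the dictionary shape of an alarm are hypothetical and serve only to illustrate that several processing logics can be attached to one alarm type.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[Dict[str, str]], str]


class DisposalRegistry:
    """Maps an alarm type to the user-configured handlers that should run for it."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def register(self, alarm_type: str, handler: Handler) -> None:
        """Attach one more piece of user processing logic to `alarm_type`."""
        self._handlers[alarm_type].append(handler)

    def dispatch(self, alarm: Dict[str, str]) -> List[str]:
        """Run every handler registered for the alarm's type, in registration order."""
        return [handler(alarm) for handler in self._handlers.get(alarm["type"], [])]
```

The platform only stores and invokes the handlers; what each handler does (re-running a job, rolling back a partition, opening a ticket) stays entirely with the user.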
the knowledge base and analysis module mainly counts and analyzes the abnormal conditions of each service, outputting for each service every day the proportion of reconciliation anomalies, the cause analysis, and the rise or fall of that proportion;
when a quality comparison exception occurs, the system automatically generates an exception post in the knowledge base, records the exception information (including the data source, the rule, the time the exception occurred, and so on), and notifies the relevant users; under the post, users can reply with the exception handling process, summarize the cause of the problem, give short-term and long-term solutions, and so on;
after the exception has been handled, the user can close the post to mark the handling as finished, and other users can search for a given post by keyword to look up the handling progress and solution for a particular exception.
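The life cycle of an exception post (automatic creation, replies, closing, keyword search) can be sketched as below. The names `ExceptionPost` and `KnowledgeBase`, the in-memory list used as storage, and the substring-based search are assumptions for illustration; user notification is omitted.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class ExceptionPost:
    data_source: str
    rule: str
    created_at: datetime
    replies: List[str] = field(default_factory=list)
    closed: bool = False


class KnowledgeBase:
    def __init__(self) -> None:
        self.posts: List[ExceptionPost] = []

    def open_post(self, data_source: str, rule: str, created_at: datetime) -> ExceptionPost:
        """Called when a quality comparison exception is generated."""
        post = ExceptionPost(data_source, rule, created_at)
        self.posts.append(post)
        return post

    def search(self, keyword: str) -> List[ExceptionPost]:
        """Case-insensitive keyword search over data source, rule and replies."""
        needle = keyword.lower()
        return [
            p for p in self.posts
            if needle in " ".join([p.data_source, p.rule, *p.replies]).lower()
        ]
```

A user would append the handling process to `replies` and set `closed = True` once the exception is resolved; other users can then find the post with `search`.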
In the system:
comparison object: put simply, a value, which may be a constant, the execution result of an SQL statement, or a result calculated by an algorithm model;
operator: the commonly used calculation symbols are the five symbols +, -, *, / and %;
comparator: the commonly used comparison symbols include > and <, five symbols being defined in total;
range symbol: the commonly used range descriptors follow the mathematical definitions, namely the left-open "(", the right-open ")", the left-closed "[" and the right-closed "]".
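Putting the elements listed above together, range matching and the conversion of a relationship match into a range match can be sketched as follows. The names `Interval`, `range_match` and `relation_as_range` are hypothetical, the operator table is an assumed set of five arithmetic operators, and the comparison-object values are assumed to have been computed already (from a constant, an SQL result or a model).

```python
import math
import operator
from dataclasses import dataclass

# Assumed operator set used to combine two comparison-object values.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "%": operator.mod,
}


@dataclass(frozen=True)
class Interval:
    """A numeric range whose ends can be open or closed independently."""
    low: float
    high: float
    low_closed: bool = True
    high_closed: bool = False

    def contains(self, value: float) -> bool:
        above = value >= self.low if self.low_closed else value > self.low
        below = value <= self.high if self.high_closed else value < self.high
        return above and below


def range_match(left: float, op: str, right: float, interval: Interval) -> bool:
    """Evaluate `left <op> right` and test whether the result lies in `interval`."""
    return interval.contains(OPERATORS[op](left, right))


def relation_as_range(a: float, b: float) -> bool:
    """Relationship match a > b, rewritten as the range match a - b in (0, +inf)."""
    positive = Interval(0.0, math.inf, low_closed=False, high_closed=False)
    return range_match(a, "-", b, positive)
```

Because every relationship match reduces to a range match in this way, a single execution path for range matching is sufficient underneath, while users may still configure whichever form is more convenient.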
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Claims (8)

1. A data quality monitoring method and system based on a data center, comprising a rule center, a checking engine, a disposal center, and a knowledge base and analysis module, characterized in that:
the rule center is divided into basic rules and model rules;
the checking engine comprises a data acquisition part and a data checking part;
the disposal center is divided into an alarm module and a processing module;
the knowledge base and analysis module mainly counts and analyzes the abnormal conditions of each service, outputting for each service every day the proportion of reconciliation anomalies, the cause analysis, and the rise or fall of that proportion.
2. The data quality monitoring method and system based on a data center as claimed in claim 1, wherein: the core of the basic rules is range matching, of which relationship matching is a special case; range matching determines by calculation whether a comparison between several comparison objects and a given interval holds, and the range-matching formula with two comparison objects is: comparison-object calculation logic + operator + comparison-object calculation logic + comparator + range symbol.
3. The data quality monitoring method and system based on a data center as claimed in claim 1, wherein: the model rules are of two types, strategy rules and machine-learning rules.
4. The data quality monitoring method and system based on a data center as claimed in claim 1, wherein: the data acquisition module pulls data from a data source, wherein the data source may be MySQL, Hive, ES or the like, or an HDFS path containing schema information.
5. The data quality monitoring method and system based on a data center as claimed in claim 1, wherein: the checking engine is divided by checking time into a quasi-real-time checking module and an offline checking module;
the quasi-real-time module uses the Spark or Flink stream-processing platform to consume data from the MQ and then compares and verifies it;
the offline module is an executable script or jar package.
6. The data quality monitoring method and system based on a data center as claimed in claim 1, wherein: the alarm module, which is provided by the platform, includes common alarm modes such as short-message alarms and telephone alarms; alarm history can also be displayed as a bar chart in which the height of each bar represents the total number of notifications, and clicking a bar shows the subscribed alarm history; the alarm history is summarized and analyzed, and learning is performed through AI (machine learning); the main information comprises:
(1) alarm time;
(2) alarm type;
(3) data source;
(4) alarm details;
a threshold range can be set for the default alarm time, the minimum range being the most recent hour.
7. The data quality monitoring method and system based on a data center as claimed in claim 1, wherein: the processing module can select code to handle an alarm automatically according to the alarm type; the processing logic belongs to the user and is independent of the platform; users configure their own processing logic on a web page, and when a quality detection task is configured, several processing logics can be selected for a single rule.
8. The data quality monitoring method and system based on a data center as claimed in claim 1, wherein, regarding the knowledge base:
when a quality comparison exception occurs, the system automatically generates an exception post in the knowledge base, records the exception information (including the data source, the rule, the time the exception occurred, and so on), and notifies the relevant users; under the post, users can reply with the exception handling process, summarize the cause of the problem, give short-term and long-term solutions, and so on;
after the exception has been handled, the user can close the post to mark the handling as finished, and other users can search for a given post by keyword to look up the handling progress and solution for a particular exception.
CN202111545084.3A 2021-12-16 2021-12-16 Data quality monitoring method and system based on data center Pending CN114185948A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111545084.3A CN114185948A (en) 2021-12-16 2021-12-16 Data quality monitoring method and system based on data center

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111545084.3A CN114185948A (en) 2021-12-16 2021-12-16 Data quality monitoring method and system based on data center

Publications (1)

Publication Number Publication Date
CN114185948A true CN114185948A (en) 2022-03-15

Family

ID=80605411

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111545084.3A Pending CN114185948A (en) 2021-12-16 2021-12-16 Data quality monitoring method and system based on data center

Country Status (1)

Country Link
CN (1) CN114185948A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116579866A (en) * 2023-05-16 2023-08-11 佛山众陶联供应链服务有限公司 Data checking method and system based on Spark and Hadoop

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116579866A (en) * 2023-05-16 2023-08-11 佛山众陶联供应链服务有限公司 Data checking method and system based on Spark and Hadoop
CN116579866B (en) * 2023-05-16 2023-11-03 佛山众陶联供应链服务有限公司 Data checking method and system based on Spark and Hadoop

Similar Documents

Publication Publication Date Title
CN105867351B (en) Vehicle trouble code acquires the method and device with historical data analysis diagnosis in real time
CA2755216C (en) Identify code hierarchy bias in medical priority dispatch systems
CN109684160A (en) Database method for inspecting, device, equipment and computer readable storage medium
CN105868373B (en) Method and device for processing key data of power business information system
CN105956151B (en) Aid decision-making method, Tailings Dam monitoring method and system based on prediction scheme
CN110830438A (en) Abnormal log warning method and device and electronic equipment
CN109347665A (en) A kind of Website Usability alarm method and its system based on web log
CN108880845A (en) A kind of method and relevant apparatus of information alert
CN111865407A (en) Intelligent early warning method, device, equipment and storage medium for optical channel performance degradation
CN108734201B (en) Classification method and system for experience feedback events of nuclear power plant based on hierarchical reason analysis method
CN114185948A (en) Data quality monitoring method and system based on data center
CN110991668A (en) Electric vehicle power battery monitoring data analysis method based on association rule
CN111126751A (en) Intelligent inspection and safety monitoring early warning system and method based on mobile interconnection
CN114680889A (en) Method and device for identifying unsafe behaviors of drilling operation personnel
CN109639456A (en) A kind of automation processing platform for the improved method and alarm data that automation alerts
CN111563111A (en) Alarm method, alarm device, electronic equipment and storage medium
CN111475495A (en) Mass analysis method, system and storage medium based on big data
CN109947615A (en) The monitoring method and device of distributed system
CN115526527A (en) Risk control method and device based on medical equipment operation and maintenance data
CN116360685A (en) Multi-source clinical medical data management method and platform based on blockchain
CN114513334B (en) Risk management method and risk management device
CN115766793A (en) Based on data center computer lab basis environmental monitoring alarm device
CN114090385A (en) Monitoring and early warning method, device and equipment for service running state
CN113837408A (en) Traffic facility operation and maintenance management system based on equipment full-life-cycle supervision
CN110677271B (en) Big data alarm method, device, equipment and storage medium based on ELK

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination