CN117371856A - Data quality monitoring method and device, storage medium and computer equipment - Google Patents

Info

Publication number
CN117371856A
Authority
CN
China
Prior art keywords
data
data quality
rule
checking
quality
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311349035.1A
Other languages
Chinese (zh)
Inventor
韩永生
李春廷
赵忠洋
夏竹侠
王辉
司彩霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yuanguang Software Co Ltd
Original Assignee
Yuanguang Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yuanguang Software Co Ltd filed Critical Yuanguang Software Co Ltd
Priority to CN202311349035.1A priority Critical patent/CN117371856A/en
Publication of CN117371856A publication Critical patent/CN117371856A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395Quality analysis or management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/215Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06Electricity, gas or water supply

Abstract

The embodiments of the present application disclose a data quality monitoring method and device, a storage medium, and computer equipment, and relate to the field of data monitoring. The method comprises: collecting service data, checking the service data with each checking rule in a checking rule base, and outputting an abnormal data set corresponding to each checking rule; identifying, according to a mean shift algorithm, the data in each abnormal data set that does not meet the data quality criterion, and generating a data quality list according to the identification result; and calculating an evaluation score for each data quality set according to its data quality index, and issuing an alarm prompt if the evaluation score is greater than a score threshold. In this way, data quality problems across the full life cycle of equipment are checked automatically, manual checking is replaced, and the efficiency of data quality checking is improved.

Description

Data quality monitoring method and device, storage medium and computer equipment
Technical Field
The present invention relates to the field of data monitoring, and in particular, to a method and apparatus for monitoring data quality, a storage medium, and a server.
Background
The profit model of power grid enterprises is changing from earning the purchase-sale price spread to a total permitted revenue for transmission and distribution, and operating performance is changing from relying mainly on growth in electricity volume to relying on growth in effective assets and improved operating efficiency. To adapt to regulatory requirements, the relationship among assets, business, and cost needs to be strengthened, and reasonable transmission and distribution price parameters and approved price levels need to be secured, which requires power grid enterprises to improve their asset management and consolidate their effective asset base. During operation, a power grid enterprise generates a large amount of service data, and how to monitor the quality of this service data is a current research focus.
Disclosure of Invention
The embodiment of the application provides a data quality monitoring method, a device, a storage medium and computer equipment, which can automatically check the data quality problem of the whole life cycle of the equipment, replace manual checking work and improve the data quality checking work efficiency. The technical scheme is as follows:
in a first aspect, an embodiment of the present application provides a method for monitoring data quality, where the method includes:
collecting service data, checking the service data by utilizing each checking rule in a checking rule base set, and outputting an abnormal data set corresponding to each checking rule;
identifying data which do not meet the data quality criterion in each abnormal data set according to a mean shift algorithm, and generating a data quality list according to the identification result; wherein the data quality list comprises a plurality of data quality sets, and each data quality set corresponds to one data quality index;
and calculating the evaluation score of each data quality set according to the data quality index, and if the evaluation score is greater than the score threshold value, carrying out alarm prompt.
In a second aspect, an embodiment of the present application provides a device for monitoring data quality, where the device includes:
the acquisition unit is used for collecting service data, checking the service data by utilizing each checking rule in a checking rule base, and outputting an abnormal data set corresponding to each checking rule;
the generating unit is used for identifying the data which do not meet the data quality criterion in each abnormal data set according to the mean shift algorithm and generating a data quality list according to the identification result; wherein the data quality list comprises a plurality of data quality sets, and each data quality set corresponds to one data quality index;
and the prompting unit is used for calculating the evaluation score of each data quality set according to the data quality index, and for issuing an alarm prompt if the evaluation score is greater than the score threshold.
In a third aspect, embodiments of the present application provide a computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the above-described method steps.
In a fourth aspect, embodiments of the present application provide a computer device, which may include: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the above-mentioned method steps.
The technical scheme provided by some embodiments of the present application has the beneficial effects that at least includes:
Data checking rules are set according to business rules, business index standards, and data element standards; a data checking model is constructed; and sample data for the data checking rules is set up in terms of ledger-card object consistency, business data integrity, and cost data accuracy. Based on the work order, multidimensional, inspection, and switching cost data generated over the equipment's asset value and full life cycle, a service component for autonomous identification of data quality anomalies is constructed, which automatically checks data quality problems across the full life cycle of equipment, replaces manual checking, and improves the efficiency of data quality checking. The data quality monitoring and early warning device integrates data quality analysis, monitoring, and early warning; it monitors, warns about, and analyzes data quality problems across the whole process of the production cost quantification data link and from multiple dimensions, providing effective support for lean management of the data quality of enterprise equipment production costs.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of a network architecture provided in an embodiment of the present application;
Fig. 2 is a flow chart of a method for monitoring data quality according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a data quality monitoring device provided in the present application;
Fig. 4 is a schematic structural diagram of a computer device provided in the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the following detailed description of the embodiments of the present application will be given with reference to the accompanying drawings.
It should be noted that, the method for monitoring data quality provided in the present application is generally executed by a computer device, and accordingly, the device for monitoring data quality is generally disposed in the computer device.
Fig. 1 shows an exemplary system architecture of a data quality monitoring method or a data quality monitoring apparatus that can be applied to the present application.
As shown in Fig. 1, the system architecture may include a computer device 101 and a server 102. The computer device 101 and the server 102 may communicate through a network, which serves as the medium providing communication links between these elements. The network may include various types of wired or wireless communication links: for example, a wired communication link may be an optical fiber, a twisted pair, or a coaxial cable, and a wireless communication link may be a Bluetooth link, a Wireless Fidelity (Wi-Fi) link, or a microwave link.
The server 102 hosts a data center in which the business data of each business system is stored; the computer device 101 acquires the business data from the data center and performs data quality analysis and early warning on it.
In the present application, the quantified data link of equipment production cost is the object of data quality monitoring and early warning. Based on the data center, the original data for quantifying equipment production cost is obtained through a data access service; a data verification rule model is constructed to generate abnormal sample data; the data quality is labeled and stored in structured form through a data quality identification model; data quality correlation analysis is then performed based on a big data analysis model; and the abnormal data is visually monitored and warned about through a terminal display device.
The server is provided with the following components:
1. Data quality checking component:
(1) Collecting and accessing service data from the source systems. Through the data center, the master data of business systems such as power grid production ERP, PMS, service supply, marketing, and finance is accessed, together with work order, inspection, switching, and multidimensional lean cost data. Relation mapping rules are set for each business system's organization, equipment type, voltage level, and so on, and the master data and service data of each business system are converted according to these mappings to form unified standard data elements.
(2) Setting source-side data verification rules. Business requirements are quantified according to the production cost of power grid equipment, and a business verification rule model for the source systems is constructed. First, ledger-card object consistency rules are set, covering the corresponding attributes of the PMS equipment ledger (account) and the ERP asset card (voltage level, line or station to which the equipment belongs, fixed asset subclass, asset storage location, asset addition method, unit of measurement, quantity, and manufacturer), as well as verification rules for equipment ledgers without an asset card and asset cards without an equipment ledger. Second, business integrity rules are set: the business integrity rules of the PMS, service supply, marketing, and finance systems cover related business attributes such as voltage class, equipment type, asset type, operation and maintenance team, and cost center. Third, business data accuracy rules are set: accuracy checking rules for work order, inspection, switching, and multidimensional lean cost data ensure that the operation and cost data are consistent with the standard data elements. Fourth, rules for the basis of production cost allocation are set according to attributes such as operation and maintenance team, voltage class, equipment type, business activity, and cost center.
2. Data quality identification and analysis service component:
(1) Data quality identification and analysis training model. The model quantifies the degree of correlation between data from the correlation coefficients among data characteristics, business logic, and important data, and identifies strongly correlated data by ranking the correlation coefficients. Abnormal result data samples are obtained; the influence degree rate of the abnormal samples is quantified, and the correlation between each factor and the influence degree rate is calculated to identify abnormal factors; the data quality abnormal factors are then analyzed with a clustering algorithm to determine the influence degree criterion.
(2) Storing data with abnormal data quality. The identified list of equipment production cost data with abnormal data quality is stored in structured form according to classification labels, and an index of the abnormal data is established according to physical logic.
3. Data quality monitoring and early warning component:
(1) Constructing a data quality evaluation model. Data quality weights are set according to evaluation indexes such as correctness, integrity, consistency, and validity, and the final quantified result of data quality monitoring and early warning is output based on the evaluation rules.
(2) Data quality monitoring and visualization. Data quality analysis, monitoring, and early warning are integrated: the whole process of the production cost quantification data link is displayed graphically on the computer device, and data quality problems are monitored, warned about, and analyzed from multiple dimensions.
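The ledger-card consistency rule described in the data quality checking component can be illustrated with a minimal Python sketch. It assumes that ledgers and asset cards are keyed by a shared asset code and held as plain dictionaries; the function name, the attribute names in `COMPARED_ATTRS`, and the result labels are hypothetical and not taken from the application.

```python
from typing import Dict, List

# Hypothetical attribute names; the description lists voltage level,
# line/station, manufacturer and others as corresponding attributes.
COMPARED_ATTRS = ("voltage_level", "station", "manufacturer")


def check_ledger_card_consistency(
    ledgers: Dict[str, dict], cards: Dict[str, dict]
) -> Dict[str, List]:
    """Compare equipment ledgers with asset cards that share an asset code."""
    result: Dict[str, List] = {
        "ledger_without_card": [],
        "card_without_ledger": [],
        "attribute_mismatch": [],
    }
    for code, ledger in ledgers.items():
        card = cards.get(code)
        if card is None:
            # Equipment ledger that has no matching asset card.
            result["ledger_without_card"].append(code)
            continue
        for attr in COMPARED_ATTRS:
            if ledger.get(attr) != card.get(attr):
                result["attribute_mismatch"].append((code, attr))
    for code in cards:
        if code not in ledgers:
            # Asset card that has no matching equipment ledger.
            result["card_without_ledger"].append(code)
    return result
```

Each of the three result lists could be treated as a separate abnormal data set with its own classification label.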
The computer device 101 and the server 102 may be hardware or software. When the computer device 101 and the server 102 are hardware, they may be implemented as a distributed server cluster formed by a plurality of servers, or as a single server. When the computer device 101 and the server 102 are software, they may be implemented as a plurality of software or software modules (for example, to provide distributed services), or may be implemented as a single software or software module, which is not specifically limited herein.
Various communication client applications may be installed on the computer device of the present application, such as: video recording applications, video playing applications, voice interaction applications, search class applications, instant messaging tools, mailbox clients, social platform software, and the like.
The computer device may be hardware or software. When the computer device is hardware, it may be a variety of computer devices with a display screen including, but not limited to, smartphones, tablet computers, laptop and desktop computers, and the like. When the computer device is software, the computer device may be installed in the above-listed computer device. Which may be implemented as multiple software or software modules (e.g., to provide distributed services), or as a single software or software module, without limitation.
When the computer device is hardware, it may be provided with a display device and a camera. The display device may be any device capable of implementing a display function, and the camera is used to capture video streams. For example, the display device may be a cathode ray tube (CRT) display, a light-emitting diode (LED) display, an electronic ink screen, a liquid crystal display (LCD), a plasma display panel (PDP), or the like. A user may use the display device on the computer device to view displayed text, pictures, video, and so on.
It should be understood that the number of computer devices, networks, and servers in fig. 1 are illustrative only. Any number of computer devices, networks, and servers may be used as desired for implementation.
The method for monitoring the data quality according to the embodiment of the present application will be described in detail with reference to fig. 2. The device for monitoring data quality in the embodiment of the present application may be a computer device shown in fig. 1.
Referring to fig. 2, a flow chart of a method for monitoring data quality is provided in an embodiment of the present application. As shown in fig. 2, the method according to the embodiment of the present application may include the following steps:
s201, collecting service data, checking the service data by utilizing each checking rule in the checking rule base set, and outputting an abnormal data set corresponding to each checking rule.
The computer device may periodically collect service data through the data center. The service data consists of a plurality of entity objects, where an entity object is a data set to be checked, and there may be one or more such data sets. For example, the service data includes, but is not limited to, PMS (power production management system) equipment ledgers, ERP (enterprise resource planning) asset cards, and operation and inspection data from business systems. The inspection rule base consists of a plurality of inspection rules used to check the attribute values of the attributes in the entity objects, for example integrity checks, consistency checks, and accuracy checks; that is, each inspection rule corresponds to one data quality index.
The computer device can check the service data with a Map-Reduce algorithm. The specific process is as follows. When each Mapper starts, it calls the setup function to load the inspection rule base. For each inspection rule, the setup function determines the entity object to be inspected and the attributes to be inspected in that entity object, so as to generate the data to be inspected; the attributes of the data to be inspected are determined by the inspection rule. The determined data to be inspected is kept in memory for use by the subsequent map function. The nodes running the map function read the data to be inspected from HDFS (Hadoop Distributed File System) in a streaming manner and check it in batches against the rule expressions; because the data is never held in memory in its entirety during this process, memory is saved. When the Mapper finishes, it calls the cleanup function and finally outputs the abnormal data set of each inspection rule. Each abnormal data set contains a number of abnormal data records, that is, data to be inspected that does not satisfy the inspection rule. The abnormal data sets are stored by classification label, and different inspection rules have different classification labels.
The computer device extracts the corresponding attribute values from the corresponding entity objects according to the attributes named in the inspection rules, and generates the data to be inspected from the extracted attribute values. For example, referring to the inspection rule base shown in Table 1, a plurality of inspection rules are set in the inspection rule base, and each inspection rule includes a rule code, a rule name, an entity object, the attributes of the data to be inspected, and a rule expression. The attribute values in the data to be inspected for rule 1 are voltage class, equipment type, manufacturer, capacity, commissioning date, asset code, and so on; those for rule 2 are voltage class, asset code, asset type, and manufacturer; and those for rule 3 are rated power, delivery date, and load. Each piece of data to be inspected is checked against the rule expression: if it does not satisfy the rule expression it is identified as abnormal data, otherwise it is identified as normal data, and a corresponding abnormal data set is generated from the inspection results of each inspection rule.
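The setup / map / cleanup life cycle described above can be illustrated with a plain-Python stand-in. This is only a sketch under stated assumptions: it does not use Hadoop, the class `CheckMapper`, the helper `run_mapper`, and the rule loader are hypothetical names, and rules are modelled as simple predicates keyed by classification label.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

Rule = Callable[[dict], bool]  # returns True when a record satisfies the rule


class CheckMapper:
    """Plain-Python stand-in for a Mapper with setup, map and cleanup phases."""

    def __init__(self, rule_loader: Callable[[], Dict[str, Rule]]):
        self._rule_loader = rule_loader
        self.rules: Dict[str, Rule] = {}
        self.abnormal: Dict[str, List[dict]] = defaultdict(list)

    def setup(self) -> None:
        # Load the rule base once and keep it in memory for later map calls.
        self.rules = self._rule_loader()

    def map(self, record: dict) -> None:
        # Check one streamed record against every rule expression.
        for label, is_valid in self.rules.items():
            if not is_valid(record):
                self.abnormal[label].append(record)

    def cleanup(self) -> Dict[str, List[dict]]:
        # Emit one abnormal data set per classification label.
        return dict(self.abnormal)


def run_mapper(mapper: CheckMapper, records: Iterable[dict]) -> Dict[str, List[dict]]:
    mapper.setup()
    for record in records:  # records are consumed lazily, one at a time
        mapper.map(record)
    return mapper.cleanup()
```

Because `records` can be any iterable, including a generator that reads a file line by line, only the records that fail a rule are retained, which mirrors the memory-saving streaming behaviour the description attributes to the map phase.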
TABLE 1
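As a rough illustration of the rule structure described above (rule code, rule name, entity object, attributes, rule expression), the following Python sketch models one rule as a small record and applies a rule base to entity objects. The class and function names and the sample entity name are hypothetical; the actual contents of Table 1 are not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class CheckRule:
    code: str                           # rule code
    name: str                           # rule name
    entity: str                         # entity object the rule applies to
    attributes: Tuple[str, ...]         # attributes of the data to be checked
    expression: Callable[[dict], bool]  # rule expression; True means "passes"


def build_data_to_check(rule: CheckRule, entities: Dict[str, List[dict]]) -> List[dict]:
    """Extract only the attributes the rule needs from its entity object."""
    rows = entities.get(rule.entity, [])
    return [{attr: row.get(attr) for attr in rule.attributes} for row in rows]


def apply_rules(
    rules: List[CheckRule], entities: Dict[str, List[dict]]
) -> Dict[str, List[dict]]:
    """Return one abnormal data set per rule code."""
    abnormal: Dict[str, List[dict]] = {}
    for rule in rules:
        rows = build_data_to_check(rule, entities)
        abnormal[rule.code] = [row for row in rows if not rule.expression(row)]
    return abnormal
```

Keeping the expression as a callable is a design choice of this sketch; a real rule base might store expressions as text and evaluate them with a rule engine.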
S202, identifying abnormal data which do not meet the data quality criterion in each abnormal data set according to a mean shift algorithm, and generating a data quality list according to the identification result.
Each piece of data in an abnormal data set comprises a plurality of attribute values, and each attribute value represents a factor. Data quality analysis is carried out on each piece of data in the abnormal data set to identify data with abnormal data quality. The data quality identification and analysis training model quantifies the degree of correlation between data from the correlation coefficients among data characteristics, business logic, and important data, and identifies strongly correlated data by ranking the correlation coefficients. Each abnormal data set is obtained; the influence degree rate of the abnormal data set is quantified; the correlation between each factor and the influence degree rate is calculated to identify the data quality abnormal factors; and the data quality abnormal factors are analyzed with a clustering algorithm to determine the data quality criterion. The current data is then compared with the data quality criterion to judge whether it has an abnormal data quality problem.
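The correlation ranking step can be sketched with distance correlation, the measure the description uses for this step. The following pure-Python sketch assumes each factor and the influence degree rate are given as equal-length numeric sequences; the function names are hypothetical, and the quadratic-time implementation is meant only for small samples.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _centered_distances(values: Sequence[float]) -> List[List[float]]:
    """Pairwise |v_i - v_j| matrix with row, column and grand means removed."""
    n = len(values)
    dist = [[abs(values[i] - values[j]) for j in range(n)] for i in range(n)]
    row_means = [sum(row) / n for row in dist]  # equal to column means (symmetric)
    grand_mean = sum(row_means) / n
    return [
        [dist[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample distance correlation in [0, 1]; 0.0 if either input is constant."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    a, b = _centered_distances(x), _centered_distances(y)
    n = len(x)
    pairs = [(a[i][j], b[i][j]) for i in range(n) for j in range(n)]
    dcov2 = sum(p * q for p, q in pairs) / n ** 2
    dvar_x = sum(p * p for p, _ in pairs) / n ** 2
    dvar_y = sum(q * q for _, q in pairs) / n ** 2
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / denom)


def rank_factors(
    factors: Dict[str, Sequence[float]], target: Sequence[float]
) -> List[Tuple[str, float]]:
    """Rank factors by their distance correlation with the target, strongest first."""
    scores = {name: distance_correlation(vals, target) for name, vals in factors.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Factors at the top of the ranking would be the candidates for the data quality factors that are passed on to the clustering step.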
Further, the influence degree rate is quantified, the correlation coefficient between each abnormal factor and the data quality rate is calculated with a distance correlation algorithm, the correlation coefficients between the factors and the data quality are ranked with a correlation ranking algorithm, and the data quality factors are finally determined. The determined data quality factors are then analyzed with a clustering algorithm to calculate the data quality criterion. The clustering algorithm may use the Mean Shift algorithm to find the location where the data in the data quality factor is densest. The method randomly selects one data point $x$ in the abnormal data set as the current reference point and calculates the offset mean of the current reference point $x$ according to the formula

$$m(x) = \frac{\sum_{i} x_i \, g\left(\lVert (x - x_i)/h \rVert^{2}\right)}{\sum_{i} g\left(\lVert (x - x_i)/h \rVert^{2}\right)} - x$$

where $g\left(\lVert (x - x_i)/h \rVert^{2}\right)$ is the kernel function, whose value represents the weight of each sample point, $h$ is the size of the kernel, $x$ is the reference point, and $x_i$ denotes the points other than the reference point. If $|m(x)| > \varepsilon$, where $\varepsilon$ is a set threshold, the point is moved to $x + m(x)$, and the offset mean is recalculated with this point as the new starting point. After $n$ moves, when $|m(x_n)| < \varepsilon$, the point is considered to have reached the place where the data is densest; its coordinate $x_n$ is the cluster center. The data quality criterion is $|x - x_n| < k \times e$, where $e$ is the standard deviation of the abnormal data set, $x_n$ is the cluster center used as the data quality criterion, $x$ is any data point in the abnormal data set, and $k$ is a positive real number. The formula $|x - x_n| < k \times e$ is used to judge whether each data point in the abnormal data set meets the data quality criterion; if the criterion is met, the data is judged to have an abnormal data quality problem.
A data quality list is generated according to the identification result and stored by classification label, where each classification label corresponds to one data quality index; the data quality indexes include correctness, integrity, consistency, and validity. The data quality list includes a plurality of data quality sets, each corresponding to one data quality index.
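The mean shift iteration and the criterion |x - x_n| < k×e can be sketched for one-dimensional data as follows. This is a minimal sketch under stated assumptions: a Gaussian kernel is assumed (the description does not name the kernel), the starting point is passed in explicitly instead of being chosen at random so that the result is reproducible, the population standard deviation is used for e, and all function names are hypothetical.

```python
import math
import statistics
from typing import List, Sequence


def mean_shift_center(
    data: Sequence[float],
    start: float,
    bandwidth: float,
    eps: float = 1e-6,
    max_iter: int = 1000,
) -> float:
    """Shift `start` toward the densest region of 1-D `data` (Gaussian kernel assumed)."""
    x = start
    for _ in range(max_iter):
        # Kernel weight of every sample point relative to the current point.
        weights = [math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data]
        total = sum(weights)
        if total == 0.0:
            break  # no sample has measurable weight; stop where we are
        shift = sum(w * xi for w, xi in zip(weights, data)) / total - x  # m(x)
        if abs(shift) <= eps:
            break  # converged: x is taken as the cluster center x_n
        x += shift  # move the point to x + m(x)
    return x


def meets_criterion(x: float, center: float, std: float, k: float) -> bool:
    """Evaluate |x - x_n| < k * e."""
    return abs(x - center) < k * std


def flag_data(data: Sequence[float], bandwidth: float, k: float) -> List[float]:
    """Return the points that meet the criterion around the mean shift center."""
    center = mean_shift_center(data, start=data[0], bandwidth=bandwidth)
    std = statistics.pstdev(data)
    # Per the description, data meeting the criterion is judged to have
    # an abnormal data quality problem.
    return [x for x in data if meets_criterion(x, center, std, k)]
```

The bandwidth and k control how tightly the criterion hugs the densest cluster; both would need tuning on real data.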
In some embodiments of the present application, indexes may be respectively established for each data quality set in the data quality list according to physical logic, so as to facilitate the user to perform the search query.
And S203, calculating the evaluation score of each data quality set according to the data quality index, and if the evaluation score is greater than a score threshold value, carrying out alarm prompt.
The present application is configured with a data quality index set, which comprises a plurality of data quality indexes, for example correctness, integrity, consistency, and validity. Each data quality index has a plurality of corresponding data quality index rules. Each data quality index rule is used to check every piece of data in the data quality set and to determine the number of data records that do not satisfy that rule. The number of records is then normalized, that is, limited to a fixed range, for example [0, 100]. Each data quality index rule is configured with a weight in the range [0, 1], and the weights of the data quality index rules sum to 1. The evaluation score of the data quality index is calculated from the weights and the normalized record counts; if the evaluation score is greater than a score threshold, an alarm prompt is issued to indicate that the data quality index is abnormal. The alarm prompt may be given on the user's computer device in the form of a sound, a message, or a chart. In the embodiments of the present application, the score thresholds of the data quality indexes may be the same, or different score thresholds may be set for different data quality indexes, which is not limited in the present application.
For example, the data quality index set is expressed as $I = \{I_1, I_2, \ldots, I_n\}$, where n is an integer greater than 1; that is, the set includes n data quality indexes, and the data quality list correspondingly includes n data quality sets, namely data quality sets 1 to n, each corresponding to one data quality index. Suppose that the 2nd data quality index $I_2$ in the data quality index set is configured with 4 data quality index rules, namely rule 1, rule 2, rule 3, and rule 4, and that data quality set 2 is checked with these 4 rules. The result is that the normalized number of data records that do not satisfy rule 1 is 100, the normalized number that do not satisfy rule 2 is 80, the normalized number that do not satisfy rule 3 is 60, and the normalized number that do not satisfy rule 4 is 40. Assuming that the weights of rule 1 to rule 4 are each 0.25, the evaluation score of data quality index $I_2$ is 100×0.25+80×0.25+60×0.25+40×0.25=70. If the set score threshold is 50, the calculated evaluation score is greater than the score threshold, and an alarm prompt is issued.
When monitoring the business data of production equipment, the embodiments of the present application first set data checking rules according to business rules, business index standards, and data element standards, construct a data checking model, and set up sample data for the data checking rules in terms of ledger-card object consistency, business data integrity, and cost data accuracy. Second, based on the work order, multidimensional, inspection, and switching cost data generated over the equipment's asset value and full life cycle, a service component for autonomous identification of data quality anomalies is constructed, which automatically checks data quality problems across the full life cycle of equipment, replaces manual checking, and improves the efficiency of data quality checking. Third, the data quality monitoring and early warning device integrates data quality analysis, monitoring, and early warning; it monitors, warns about, and analyzes data quality problems across the whole process of the production cost quantification data link and from multiple dimensions, providing effective support for lean management of the data quality of enterprise equipment production costs.
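The weighted scoring in this example can be reproduced with a short Python sketch. The function names are hypothetical, and `normalize_count` assumes simple proportional scaling into [0, 100], since the description does not specify the normalization method.

```python
from typing import Sequence


def normalize_count(count: int, total: int, upper: float = 100.0) -> float:
    """Scale a raw violation count into [0, upper] (proportional scaling assumed)."""
    if total <= 0:
        return 0.0
    ratio = min(max(count / total, 0.0), 1.0)
    return upper * ratio


def evaluation_score(normalized_counts: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of normalized violation counts; weights must sum to 1."""
    if len(normalized_counts) != len(weights):
        raise ValueError("one weight is required per data quality index rule")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(c * w for c, w in zip(normalized_counts, weights))


def should_alert(score: float, threshold: float) -> bool:
    """Alert only when the score is strictly greater than the threshold."""
    return score > threshold


# The worked example from the description: four rules, equal weights of 0.25.
score = evaluation_score([100, 80, 60, 40], [0.25, 0.25, 0.25, 0.25])
print(score, should_alert(score, threshold=50))  # 70.0 True
```

A separate threshold per data quality index can be supported by passing a different `threshold` value for each index.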
The following are device embodiments of the present application, which may be used to perform method embodiments of the present application. For details not disclosed in the device embodiments of the present application, please refer to the method embodiments of the present application.
Referring to Fig. 3, a schematic structural diagram of a data quality monitoring device according to an exemplary embodiment of the present application is shown; the device is hereinafter referred to as "device 3". The device 3 may be implemented as all or part of a computer device by software, hardware, or a combination of both. The device 3 comprises: an acquisition unit 301, a generating unit 302, and a prompting unit 303.
The collecting unit 301 is configured to collect service data, and utilize each inspection rule in the inspection rule base set to inspect the service data, and output an abnormal data set corresponding to each inspection rule;
a generating unit 302, configured to identify data that does not meet a data quality criterion in each abnormal data set according to a mean shift algorithm, and generate a data quality list according to the identification result; wherein the data quality list comprises a plurality of data quality sets, and each data quality set corresponds to one data quality index;
and the prompting unit 303 is configured to calculate an evaluation score of each data quality set according to the data quality index, and if the evaluation score is greater than the score threshold, perform alarm prompting.
In one or more possible embodiments, the business data is collected in a data middle platform and checked using a Map-Reduce-based detection algorithm.
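The Map-Reduce style check mentioned above can be illustrated with a minimal single-process sketch. It is not the implementation of the application: the rule codes, the record fields (`asset_card_id`, `cost`) and the function names are hypothetical, and a real deployment would run the map and reduce phases on a distributed framework.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical checking rules: rule code -> predicate that is True when a record passes.
RULES = {
    "R001": lambda rec: rec.get("asset_card_id") is not None,
    "R002": lambda rec: rec.get("cost", 0) >= 0,
}

def map_phase(record):
    """Emit one (rule_code, record) pair for every rule the record violates."""
    return [(code, record) for code, passes in RULES.items() if not passes(record)]

def reduce_phase(acc, pair):
    """Group violating records by rule code."""
    code, record = pair
    acc[code].append(record)
    return acc

def check(records):
    """Return one abnormal data set per checking rule that was violated."""
    pairs = [pair for rec in records for pair in map_phase(rec)]
    return dict(reduce(reduce_phase, pairs, defaultdict(list)))

# Illustrative records only, not data from the application.
records = [
    {"id": 1, "asset_card_id": "A-1", "cost": 10.0},
    {"id": 2, "asset_card_id": None, "cost": 5.0},
    {"id": 3, "asset_card_id": "A-3", "cost": -2.0},
]
abnormal_sets = check(records)
```

Keying the intermediate pairs by rule code is what yields one abnormal data set per checking rule, which matches the output described for the acquisition unit.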
In one or more possible embodiments, each data quality indicator is configured with a plurality of data quality indicator rules, each data quality indicator rule configured with a weight;
wherein the calculating the evaluation score of each data quality set according to the data quality index comprises:
counting the number of data items in the data quality set that do not satisfy each data quality indicator rule, and normalizing the counts;
and calculating the evaluation score of the data quality indicator corresponding to the data quality set according to the pre-configured weights and the normalized counts.
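The scoring step above can be sketched as a weighted sum. The text does not specify the normalization, so this sketch assumes each violation count is divided by the total number of records in the data quality set; the rule names, weights and threshold are hypothetical.

```python
def evaluation_score(violation_counts, weights, total_records):
    """Weighted sum of normalized violation counts for one data quality indicator.

    Assumption: normalization divides each count by the size of the data quality set.
    """
    if total_records <= 0:
        raise ValueError("total_records must be positive")
    return sum(
        weights[rule] * (count / total_records)
        for rule, count in violation_counts.items()
    )

def should_alarm(score, threshold):
    """Alarm when the evaluation score is greater than the score threshold."""
    return score > threshold

# Hypothetical indicator with two rules and pre-configured weights.
counts = {"not_null": 5, "has_asset_card": 10}
weights = {"not_null": 0.6, "has_asset_card": 0.4}
score = evaluation_score(counts, weights, total_records=50)
alarm = should_alarm(score, threshold=0.1)
```

With this normalization a higher score means more rule violations, which is consistent with alarming when the score exceeds the threshold.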
In one or more possible embodiments, if the data x satisfies the formula |x - x_n| < k × e, the data x is judged to be abnormal data; wherein e is the standard deviation of the abnormal data set, x_n is the data quality criterion, x is any one data item in the abnormal data set, and k is a positive real number.
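The criterion above can be written as a short function. This is a sketch under stated assumptions: the comparison direction follows the text as written, the population standard deviation is used because the text does not say which form of standard deviation is meant, and the function name and sample values are hypothetical.

```python
import statistics

def flag_by_criterion(values, x_n, k):
    """Return the values x that satisfy |x - x_n| < k * e.

    e is the standard deviation of the whole abnormal data set
    (assumption: population standard deviation).
    """
    if k <= 0:
        raise ValueError("k must be a positive real number")
    e = statistics.pstdev(values)
    return [x for x in values if abs(x - x_n) < k * e]

# Illustrative values only.
abnormal_set = [10, 12, 11, 30]
flagged = flag_by_criterion(abnormal_set, x_n=11, k=1.0)
```

The parameter k scales the tolerance band around the criterion x_n, so it has to be tuned for each data quality indicator.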
In one or more possible embodiments, classification labels are respectively set to each data quality set in the data quality list, and each classification label corresponds to one data quality index.
In one or more possible embodiments, the checking rules include: rule codes, rule names, entity objects, attributes, and rule expressions.
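One way to hold the five rule fields listed above is a small record type. The field types are assumptions, in particular modelling the rule expression as a Python predicate over the attribute value; the text does not specify how rule expressions are encoded.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping

@dataclass(frozen=True)
class CheckingRule:
    rule_code: str
    rule_name: str
    entity_object: str   # e.g. the table or business object the rule applies to
    attribute: str       # the field of the entity object that is checked
    rule_expression: Callable[[Any], bool]  # assumption: True when the value is valid

    def violated_by(self, record: Mapping[str, Any]) -> bool:
        """Return True when the record's attribute fails the rule expression."""
        return not self.rule_expression(record.get(self.attribute))

# Hypothetical rule: every equipment ledger entry must reference an asset card.
rule = CheckingRule(
    rule_code="R001",
    rule_name="Asset card number present",
    entity_object="equipment_ledger",
    attribute="asset_card_id",
    rule_expression=lambda value: value is not None and value != "",
)
```

A rule base is then simply a collection of such objects, and each rule produces its own abnormal data set from the records it flags.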
In one or more possible embodiments, the business data includes: equipment ledgers and asset cards.
It should be noted that, when the device 3 provided in the foregoing embodiment performs the data quality monitoring method, the division into the foregoing functional modules is only an example; in practical applications, the foregoing functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the data quality monitoring device and the data quality monitoring method provided in the foregoing embodiments belong to the same concept; the detailed implementation process is described in the method embodiments and is not repeated here.
The foregoing embodiment numbers of the present application are merely for description and do not represent the advantages or disadvantages of the embodiments.
The embodiment of the present application further provides a computer storage medium, where the computer storage medium may store a plurality of instructions, where the instructions are adapted to be loaded by a processor and execute the method steps of the embodiment shown in fig. 2, and the specific execution process may refer to the specific description of the embodiment shown in fig. 2, which is not repeated herein.
The present application also provides a computer program product storing at least one instruction that is loaded and executed by the processor to implement the method of monitoring data quality as described in the various embodiments above.
Referring to fig. 4, a schematic structural diagram of a computer device is provided in an embodiment of the present application. As shown in fig. 4, the computer device 400 may include: at least one processor 401, at least one network interface 404, a user interface 403, a memory 405, and at least one communication bus 402.
Wherein communication bus 402 is used to enable connected communications between these components.
The user interface 403 may include a display screen (Display) and a camera (Camera); optionally, the user interface 403 may further include a standard wired interface and a standard wireless interface.
The network interface 404 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface).
The processor 401 may include one or more processing cores. The processor 401 connects the various parts of the computer device 400 using various interfaces and lines, and performs the functions of the computer device 400 and processes data by running or executing instructions, programs, code sets or instruction sets stored in the memory 405 and by invoking data stored in the memory 405. Optionally, the processor 401 may be implemented in at least one hardware form of digital signal processing (Digital Signal Processing, DSP), field-programmable gate array (Field-Programmable Gate Array, FPGA) and programmable logic array (Programmable Logic Array, PLA). The processor 401 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a modem and the like. The CPU mainly handles the operating system, the user interface, application programs and the like; the GPU is responsible for rendering and drawing the content to be displayed on the display screen; the modem handles wireless communication. It will be appreciated that the modem may also not be integrated into the processor 401 and may instead be implemented by a separate chip.
The memory 405 may include a random access memory (Random Access Memory, RAM) or a read-only memory (Read-Only Memory, ROM). Optionally, the memory 405 includes a non-transitory computer-readable storage medium. The memory 405 may be used to store instructions, programs, code sets or instruction sets. The memory 405 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the above-described method embodiments, and the like; the data storage area may store the data referred to in the above method embodiments. Optionally, the memory 405 may also be at least one storage device located remotely from the aforementioned processor 401. As shown in fig. 4, the memory 405, which is a type of computer storage medium, may include an operating system, a network communication module, a user interface module and application programs.
In the computer device 400 shown in fig. 4, the user interface 403 mainly serves as an interface for user input and obtains the data input by the user; the processor 401 may be configured to invoke an application program stored in the memory 405 and execute the method shown in fig. 2. For the specific process, refer to the description of fig. 2, which is not repeated here.
Those skilled in the art will appreciate that implementing all or part of the above-described methods in accordance with the embodiments may be accomplished by way of a computer program stored on a computer readable storage medium, which when executed may comprise the steps of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory, a random access memory, or the like.
The foregoing disclosure is only illustrative of the preferred embodiments of the present application and is not intended to limit the scope of the claims; equivalent changes made in accordance with the claims of the present application still fall within the scope of the present application.

Claims (10)

1. A method for monitoring data quality, comprising:
collecting service data, checking the service data by utilizing each checking rule in a checking rule base set, and outputting an abnormal data set corresponding to each checking rule;
identifying data which do not meet the data quality criterion in each abnormal data set according to a mean shift algorithm, and generating a data quality list according to the identification result; wherein the data quality list comprises a plurality of data quality sets, and each data quality set corresponds to one data quality index;
and calculating the evaluation score of each data quality set according to the data quality index, and if the evaluation score is greater than the score threshold value, carrying out alarm prompt.
2. The method of claim 1, wherein the service data is collected in a data middle platform and checked using a Map-Reduce-based detection algorithm.
3. A method according to claim 1 or 2, wherein each data quality indicator is configured with a plurality of data quality indicator rules, each data quality indicator rule being configured with a weight;
wherein the calculating the evaluation score of each data quality set according to the data quality index comprises:
counting the number of data items in the data quality set that do not satisfy each data quality indicator rule, and normalizing the counts;
and calculating the evaluation score of the data quality indicator corresponding to the data quality set according to the pre-configured weights and the normalized counts.
4. A method according to claim 3, wherein if the data x satisfies the formula |x - x_n| < k × e, the data x is judged to be abnormal data; wherein e is the standard deviation of the abnormal data set, x_n is the data quality criterion, x is any one data item in the abnormal data set, and k is a positive real number.
5. The method according to claim 1, 2 or 4, wherein classification labels are respectively set for the respective data quality sets in the data quality list, each classification label corresponding to a data quality indicator.
6. The method of claim 5, wherein the checking rule comprises: rule codes, rule names, entity objects, attributes, and rule expressions.
7. The method of claim 1 or 2 or 4 or 6, wherein the service data comprises: equipment ledgers and asset cards.
8. A data quality monitoring device, comprising:
the acquisition unit is used for collecting service data, checking the service data by utilizing each checking rule in a checking rule base set, and outputting an abnormal data set corresponding to each checking rule;
the generating unit is used for identifying the data which do not meet the data quality criterion in each abnormal data set according to the mean shift algorithm and generating a data quality list according to the identification result; wherein the data quality list comprises a plurality of data quality sets, and each data quality set corresponds to one data quality index;
and the prompting unit is used for calculating the evaluation score of each data quality set according to the data quality index, and if the evaluation score is greater than the score threshold value, alarming and prompting are carried out.
9. A computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the method steps of any one of claims 1 to 7.
10. A computer device, comprising: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method steps of any of claims 1-7.
CN202311349035.1A 2023-10-18 2023-10-18 Data quality monitoring method and device, storage medium and computer equipment Pending CN117371856A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311349035.1A CN117371856A (en) 2023-10-18 2023-10-18 Data quality monitoring method and device, storage medium and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311349035.1A CN117371856A (en) 2023-10-18 2023-10-18 Data quality monitoring method and device, storage medium and computer equipment

Publications (1)

Publication Number Publication Date
CN117371856A true CN117371856A (en) 2024-01-09

Family

ID=89401831

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311349035.1A Pending CN117371856A (en) 2023-10-18 2023-10-18 Data quality monitoring method and device, storage medium and computer equipment

Country Status (1)

Country Link
CN (1) CN117371856A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination