CN112052138A - Service data quality detection method and device, computer equipment and storage medium - Google Patents

Service data quality detection method and device, computer equipment and storage medium

Info

Publication number
CN112052138A
Authority
CN
China
Prior art keywords
detection
data
service data
data table
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010899921.1A
Other languages
Chinese (zh)
Inventor
胡立波
张茜
侯宗元
郑玉桂
张敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202010899921.1A
Publication of CN112052138A
Priority to PCT/CN2020/135593 (WO2021147559A1)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3003Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3051Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3065Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/1734Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • G06F16/2358Change logging, detection, and notification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application belongs to the technical field of big data and relates to a service data quality detection method, which comprises: receiving a service data detection task and obtaining detection parameters including a data table name, a library name, a detection queue and a detection type; accessing a database to determine a data table based on the data table name and the library name, and acquiring service data to be detected according to the data table; performing metadata analysis and metadata identification on the service data to be detected to obtain analysis identification data; and determining at least one detection element according to the detection type, determining the allocated resources according to the detection queue, having the detection element detect the analysis identification data using the allocated resources, and outputting a detection result. The application also provides a service data quality detection device, a computer device and a storage medium. In addition, the application relates to blockchain technology: the service data to be detected, acquired according to the data table, can be stored in a blockchain. The application automatically performs data detection along different dimensions, making detection more comprehensive, more intelligent and more efficient.

Description

Service data quality detection method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of big data technologies, and in particular, to a service data quality detection method and apparatus, a computer device, and a storage medium.
Background
A monitoring system uses computer, control and related technologies to store, collect and monitor data about an environment. Common monitoring systems such as Zabbix, Nagios and Cacti are operation and maintenance monitoring systems. They support index monitoring of hardware information, CPU, memory, network, disk space performance, data volume, data increment and the like, but they cannot monitor data quality that involves business logic. Data quality monitoring includes detecting whether the data volume, data values and the like are abnormal. At present this detection work is done entirely by hand, which is time-consuming and labor-intensive and cannot find data problems comprehensively.
Disclosure of Invention
An embodiment of the present application provides a service data quality detection method and apparatus, a computer device, and a storage medium, so as to solve the problems of low detection efficiency and incomplete detection that arise in the prior art because service data quality is detected manually.
In order to solve the above technical problem, an embodiment of the present application provides a service data quality detection method, which adopts the following technical solutions:
a service data quality detection method comprises the following steps:
receiving a service data detection task, and acquiring corresponding detection parameters according to the service data detection task, wherein the detection parameters at least comprise a data table name, a library name, a detection queue and a detection type;
accessing a database and determining a data table based on the data table name and the library name, acquiring service data to be detected stored in the database according to the data table, and performing metadata analysis and metadata identification on the service data to be detected to obtain analysis identification data;
and determining at least one detection element according to the detection type, determining allocated resources according to the detection queue, having the detection element detect the analysis identification data using the allocated resources, and outputting a detection result.
In order to solve the above technical problem, an embodiment of the present application further provides a service data quality detection apparatus, which adopts the following technical solutions:
a service data quality detection apparatus, comprising:
a parameter acquisition module, configured to receive a service data detection task and acquire corresponding detection parameters according to the service data detection task, wherein the detection parameters at least comprise a data table name, a library name, a detection queue and a detection type;
a data acquisition module, configured to access a database and determine a data table based on the data table name and the library name, acquire the service data to be detected stored in the database according to the data table, and perform metadata analysis and metadata identification on the service data to be detected to obtain analysis identification data;
and a detection module, configured to determine at least one detection element according to the detection type, determine allocated resources according to the detection queue, have the detection element detect the analysis identification data using the allocated resources, and output a detection result.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which adopts the following technical solutions:
a computer device comprising a memory and a processor, the memory having computer readable instructions stored therein, wherein the processor implements the steps of the service data quality detection method described above when executing the computer readable instructions.
In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical solutions:
a computer readable storage medium having computer readable instructions stored thereon, which when executed by a processor, implement the steps of the service data quality detection method as described above.
Compared with the prior art, the service data quality detection method, the service data quality detection device, the computer equipment and the storage medium provided by the embodiment of the application have the following main beneficial effects:
After receiving a service data detection task, the application automatically acquires, analyzes and identifies the service data to be detected and performs modular detection through detection elements, so that data detection along different dimensions is carried out automatically. Detection is more comprehensive, more intelligent and more efficient, and manual effort is reduced. In particular, for service data running online, abnormal changes in indexes can be monitored in real time, which helps discover data anomalies early and improves the usability, stability and accuracy of the service data.
Drawings
In order to more clearly illustrate the solution of the present application, the drawings needed for the description of the embodiments of the present application will be briefly described below, and the drawings in the following description correspond to some embodiments of the present application, and it will be obvious to those skilled in the art that other drawings can be obtained from the drawings without inventive effort.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a service data quality detection method according to the present application;
FIG. 3 is a schematic structural diagram of an embodiment of a service data quality detection apparatus according to the present application;
FIG. 4 is a schematic block diagram of one embodiment of a computer device according to the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and in the claims of the present application or in the drawings described above, are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that, the service data quality detection method provided in the embodiment of the present application is generally executed by a server, and accordingly, the service data quality detection apparatus is generally disposed in the server.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continuing reference to FIG. 2, a flow diagram of one embodiment of a service data quality detection method in accordance with the present application is shown. The service data quality detection method comprises the following steps:
S201, receiving a service data detection task, and acquiring corresponding detection parameters according to the service data detection task, wherein the detection parameters at least comprise a data table name, a library name, a detection queue and a detection type;
S202, accessing a database and determining a data table based on the data table name and the library name, acquiring the service data to be detected stored in the database according to the data table, and performing metadata analysis and metadata identification on the service data to be detected to obtain analysis identification data;
S203, determining at least one detection element according to the detection type, determining allocated resources according to the detection queue, having the detection element detect the analysis identification data using the allocated resources, and outputting a detection result.
The above steps are explained in the following.
In step S201, before the newly developed data table is online, or after the data table is updated, the service data is detected, so that the abnormal data in the data table can be detected in time, and the newly developed data table can reach the online standard of the service, or the updated data table can continuously meet the online standard of the service.
In this embodiment of the application, the service data detection task may be submitted by a task submitting end, for example by a user through a Web page or a terminal. When a plurality of service data detection tasks are submitted at the same time, the task information may be written into a task table of the relational database PG, and the data detection end accesses the task table periodically to determine whether a detection task needs to be executed. Access to the task table may be implemented by a daemon process. A plurality of daemon processes may be set up for the data quality detection of a service; each daemon process runs on a different node of a cluster under the identity of a different virtual user (with different data permissions) and initiates an access request at intervals to determine whether a task needs to be executed. When the task table contains information for a plurality of tasks, the detection operations are executed on the service data detection tasks in the task table in sequence.
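The task table and one polling pass can be sketched as follows. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the relational database PG, and the table name, column names and status values are hypothetical rather than taken from the application.

```python
import sqlite3


def create_task_table(conn):
    # Hypothetical schema: one row per submitted detection task.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS detection_task ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "library_name TEXT, table_name TEXT, "
        "detection_queue TEXT, detection_type TEXT, "
        "status TEXT DEFAULT 'pending')"
    )


def submit_task(conn, library_name, table_name, detection_queue, detection_type):
    conn.execute(
        "INSERT INTO detection_task "
        "(library_name, table_name, detection_queue, detection_type) "
        "VALUES (?, ?, ?, ?)",
        (library_name, table_name, detection_queue, detection_type),
    )


def poll_once(conn, handler):
    """One polling pass: run every pending task in submission order."""
    rows = conn.execute(
        "SELECT id, library_name, table_name, detection_queue, detection_type "
        "FROM detection_task WHERE status = 'pending' ORDER BY id"
    ).fetchall()
    for task_id, library_name, table_name, queue, detection_type in rows:
        handler(library_name, table_name, queue, detection_type)
        conn.execute(
            "UPDATE detection_task SET status = 'done' WHERE id = ?", (task_id,)
        )
    return len(rows)
```

A daemon process would call `poll_once` in a loop and sleep for a configured interval between passes.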
In this embodiment, the service data detection task relates to information such as a data table and a detection type that need to be detected, and specifically corresponds to detection parameters including a data table name, a library name, a detection queue, a detection type, and the like, and the resource used for each detection and the content to be detected can be determined by the service data detection task.
Specifically, the data table name is used for determining the data table to be detected.
The library name is used for determining a database for storing the data table to be detected.
The detection queue is used for determining at least one processing queue, from a plurality of existing processing queues, in which to perform the detection operation. Each processing queue is allocated independent detection resources. A detection queue with suitable detection resources can be selected according to the amount of data contained in the data table to be detected, or, when there are a plurality of detection tasks, the tasks can be allocated to different detection queues and detected at the same time. Specifically, at least one default processing queue can be selected automatically according to the library name and preset configuration information.
The detection type is used to determine what kind of detection is performed on the data table to be detected, for example, detecting whether the overall data volume, or the value (or value range) of one or more data fields, is abnormal.
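The four detection parameters and the default-queue selection can be sketched as a small data structure. The queue names, the library-to-queue mapping and the default detection type "0" are assumptions made for illustration; the application only states that a default queue can be chosen from the library name and preset configuration.

```python
from dataclasses import dataclass

# Hypothetical preset configuration: library name -> default processing queue.
DEFAULT_QUEUE_BY_LIBRARY = {"sales_db": "queue_large", "hr_db": "queue_small"}
FALLBACK_QUEUE = "queue_default"


@dataclass(frozen=True)
class DetectionParams:
    table_name: str
    library_name: str
    detection_queue: str
    detection_type: str


def build_params(task):
    """Build detection parameters from a task dict, filling in a default queue."""
    queue = task.get("detection_queue") or DEFAULT_QUEUE_BY_LIBRARY.get(
        task["library_name"], FALLBACK_QUEUE
    )
    return DetectionParams(
        table_name=task["table_name"],
        library_name=task["library_name"],
        detection_queue=queue,
        # Assumption: an unspecified detection type means "run everything".
        detection_type=task.get("detection_type", "0"),
    )
```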
In this embodiment, the detection type at least includes a statistical type and a prediction type. When the detection type is the statistical type, a statistical operation is performed on one or more statistical items of the data table, and the statistical result may be used for data monitoring during service operation; when the detection type is the prediction type, anomaly detection is performed on the data table based on an anomaly detection model.
In further embodiments, the statistical type includes descriptive statistics, trend statistics, comparison statistics, and the like.
Descriptive statistics and trend statistics automatically calculate a plurality of preset indexes. For numeric fields, these can include the record count, maximum value, minimum value, mean value, quantiles, saturation and the like; for non-numeric fields, the record count, saturation and the like; for enumeration-type fields, the distribution of each enumeration value and the like. When the data table is detected, a subset of the indexes can be selected. On the one hand, the indexes summarize and describe each dimension of the data table; on the other hand, whether the data is abnormal is detected by checking whether the index values are reasonable. The detection results of some indexes can therefore be used for subsequent data monitoring, and the results of others can be used to enrich the descriptive information of the data table.
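As a rough illustration of these indexes, the sketch below computes them for an in-memory column. It assumes that saturation means the share of non-missing values, which the application does not define explicitly, and it uses the median as a stand-in for the quantile indexes.

```python
import statistics
from collections import Counter


def describe_numeric(values):
    """Summarize a numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    summary = {
        "record_count": len(values),
        # Assumption: saturation = share of non-missing values.
        "saturation": len(present) / len(values) if values else 0.0,
    }
    if present:
        summary["max"] = max(present)
        summary["min"] = min(present)
        summary["mean"] = statistics.fmean(present)
        summary["median"] = statistics.median(present)  # the 0.5 quantile
    return summary


def enum_distribution(values):
    """Share of each enumeration value among the non-missing entries."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    return {value: n / len(present) for value, n in counts.items()}
```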
Comparison statistics are performed according to preset field check rules, such as rules on enumeration value ranges, data type value ranges and field encodings. For example, a formatted mobile phone number field should consist of exactly 11 digits, and the value of a gender field must be one of male, female and unknown.
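The two example rules can be expressed as simple predicates. The field names and the rule registry below are hypothetical; only the 11-digit and male/female/unknown constraints come from the text.

```python
import re

PHONE_RE = re.compile(r"[0-9]{11}")
GENDER_VALUES = {"male", "female", "unknown"}


def check_phone(value):
    """A formatted mobile phone number must be exactly 11 digits."""
    return bool(PHONE_RE.fullmatch(value))


def check_gender(value):
    """Gender must be one of the allowed enumeration values."""
    return value in GENDER_VALUES


# Hypothetical registry: field name -> check rule.
RULES = {"phone": check_phone, "gender": check_gender}


def find_violations(rows, rules):
    """Return (row_index, field) for every value that fails its rule."""
    return [
        (index, field)
        for index, row in enumerate(rows)
        for field, rule in rules.items()
        if not rule(row[field])
    ]
```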
In a further embodiment, the anomaly detection model may be an isolation forest anomaly detection model. In the isolation forest algorithm, an "anomaly" is defined as an "outlier that is easy to isolate": a point that is sparsely distributed and far away from the high-density population. In the present application, features and split values are randomly selected on the data set formed by the service data to be detected, and a plurality of random trees are constructed. Because anomalies are rare and sparsely distributed, they are separated easily and end up close to the root node, so abnormal data can be detected. Compared with manual evaluation, model prediction makes the judgment standard easy to define and allows the characteristics of different data to be fully considered. When the anomaly detection model is used, date factors such as month, working day and holiday are taken into account, so that data that is actually normal is not reported as abnormal.
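The isolation idea can be illustrated with a deliberately simplified sketch. It is not the model used in the application: it omits subsampling, score normalization and the date factors, and instead of building whole trees it follows the path of a single point through freshly drawn random splits. All function names and defaults are hypothetical.

```python
import random


def isolation_depth(point, data, rng, depth=0, max_depth=12):
    """Depth at which `point` becomes isolated along one random split path."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    feature = rng.randrange(len(point))  # random feature
    low = min(row[feature] for row in data)
    high = max(row[feature] for row in data)
    if low == high:
        return depth
    split = rng.uniform(low, high)  # random split value
    # Keep only the rows that fall on the same side of the split as `point`.
    if point[feature] < split:
        same_side = [row for row in data if row[feature] < split]
    else:
        same_side = [row for row in data if row[feature] >= split]
    return isolation_depth(point, same_side, rng, depth + 1, max_depth)


def mean_isolation_depth(point, data, n_trees=100, seed=0):
    """Average isolation depth over many random paths; small means anomalous."""
    rng = random.Random(seed)
    total = sum(isolation_depth(point, data, rng) for _ in range(n_trees))
    return total / n_trees
```

A point far from the dense cluster is usually cut off by the first or second split, so its mean depth is much smaller than that of ordinary points.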
For step S202, one or more databases may be accessed; accordingly, the data table and the service data to be detected may be stored in one or more databases.
In this embodiment, metadata analysis refers to acquiring the table creation statement of a data table in the system by executing a Hive DDL command, and acquiring from that statement the column information (which columns exist, their types, and column remark information), the table information (table creation time, data compression format and the like) and the table data information (partitions, number of files, file size and the like) of the service data to be detected. This information is then parsed by a Python program and stored as structured data for subsequent use.
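A minimal sketch of the parsing step is given below. It assumes the table creation statement looks like typical Hive `SHOW CREATE TABLE` output with backtick-quoted column names and lowercase simple types; complex types such as arrays or structs are not handled, and the function names are hypothetical.

```python
import re

# Matches: `name` type [COMMENT 'text'], for simple types such as
# bigint, string or decimal(10,2).
COLUMN_RE = re.compile(
    r"`(\w+)`\s+([a-z]+(?:\([\d,]+\))?)(?:\s+COMMENT\s+'([^']*)')?"
)


def _columns(text):
    return [
        {"name": m.group(1), "type": m.group(2), "comment": m.group(3)}
        for m in COLUMN_RE.finditer(text)
    ]


def parse_ddl(ddl):
    """Split a table creation statement into regular and partition columns."""
    body, _, partition_part = ddl.partition("PARTITIONED BY")
    return {"columns": _columns(body), "partitions": _columns(partition_part)}
```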
Further, metadata identification automatically identifies data types. When service data is imported into Hive from a relational database, values of every type (numerical values, dates and the like) are stored in string format, that is, as text fields. In the embodiment of the present application, metadata identification takes the data produced by metadata analysis, determines the real type of each text field using regular expressions, and thereby recovers the original type of the service data. Data types such as integer, floating point and date can be identified. For example, a string that begins with + or - and is followed only by the digits 0-9 is considered numeric; a string of the form xxxx-yy-zz, where xxxx, yy and zz are positive integers within a reasonable range, is considered a date.
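The regular-expression approach can be sketched as follows. The exact patterns are assumptions (for instance, the sign is treated as optional here), and the "reasonable range" check for dates is delegated to `datetime.date`.

```python
import re
from datetime import date

INT_RE = re.compile(r"[+-]?[0-9]+")
FLOAT_RE = re.compile(r"[+-]?[0-9]+\.[0-9]+")
DATE_RE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def infer_value_type(text):
    """Guess the original type of one value stored as a string."""
    if INT_RE.fullmatch(text):
        return "integer"
    if FLOAT_RE.fullmatch(text):
        return "float"
    match = DATE_RE.fullmatch(text)
    if match:
        try:
            # date() rejects out-of-range months and days.
            date(int(match[1]), int(match[2]), int(match[3]))
            return "date"
        except ValueError:
            pass
    return "string"


def infer_column_type(values):
    """A column gets a specific type only if its values are compatible."""
    types = {infer_value_type(v) for v in values}
    if len(types) == 1:
        return types.pop()
    if types == {"integer", "float"}:
        return "float"
    return "string"
```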
And analyzing the service data to be detected by the metadata analysis and the metadata identification to obtain analysis identification data, wherein the analysis identification data is structured data with a determined data type.
In some embodiments, the obtaining, according to the data table, the service data to be detected stored in the database includes: determining the data volume contained in the data table, and judging whether the data volume of the data table is larger than a preset threshold value or not; and when the data volume of the data table is not greater than a preset threshold value, directly acquiring the service data to be detected according to the data table, otherwise, randomly extracting a preset number of data from the data table to generate a temporary data table, and acquiring the service data to be detected according to the temporary data table.
Specifically, the preset number does not exceed the preset threshold, which may be, for example, 300,000. When a temporary data table is generated, metadata analysis and metadata identification are subsequently performed on the temporary data table. In actual processing, metadata identification involves compatibility among data types: for example, if a column consists mostly of numbers but contains a small number of character strings, the whole column can only be classified as a character string; in principle, the judgment is made on a random sample of the data. When service data is detected on the temporary data table, the accuracy of the detection result is lower than with the full data, but for an oversized data table the time consumed by the detection elements is effectively reduced and their stability is improved. This is particularly useful for detections that are time-consuming on the full data and whose results do not need to be especially precise (for example quantiles: an exact quantile requires sorting the full data, and an exact value generally cannot be calculated). In addition, in other embodiments, when the data table is very large and the detection process is executed through Spark, detection efficiency and stability can also be improved by optimizing Spark's resource parameter configuration and by using Spark DataFrame instead of Spark SQL in part of the calculation.
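The threshold decision can be sketched on an in-memory list of rows. This only illustrates the branching logic; the function name and defaults are hypothetical, and a real implementation would sample inside the database or in Spark rather than in Python.

```python
import random


def select_rows(rows, threshold=300_000, sample_size=300_000, seed=None):
    """Return (rows_to_detect, was_sampled) following the threshold rule."""
    if sample_size > threshold:
        raise ValueError("sample_size must not exceed threshold")
    if len(rows) <= threshold:
        # Small table: detect on the full data.
        return rows, False
    # Large table: draw a random sample as the temporary data table.
    rng = random.Random(seed)
    return rng.sample(rows, sample_size), True
```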
In some embodiments, after determining the amount of data contained in the data table, the method further comprises: determining the minimum amount of data each detection element requires in order to perform detection; judging, for each detection element, whether that minimum amount is larger than the amount of data contained in the data table; and removing any detection element whose minimum amount is larger. In actual detection, some detection elements place requirements on the amount of data to be detected. For example, when model prediction is used to detect data anomalies in a data table, too little data makes the model training process abnormal and the detection inaccurate; in that case model evaluation is skipped, and only a corresponding record is written to the output detection result table.
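This filtering step amounts to a comparison per detection element. In the sketch below, the element names and minimum row counts are made up for illustration.

```python
def split_by_min_rows(min_rows_by_element, table_rows):
    """Separate detection elements into those that can run and those skipped."""
    kept, skipped = [], []
    for element, min_rows in min_rows_by_element.items():
        if min_rows > table_rows:
            skipped.append(element)  # recorded in the result table, not run
        else:
            kept.append(element)
    return kept, skipped
```

For example, with hypothetical minimums of 1 row for descriptive statistics and 1,000 rows for model prediction, a 200-row table keeps the former and skips the latter.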
In some embodiments, before determining at least one detection element according to the detection type, the method further comprises: acquiring preset special character recognition configuration information, and performing metadata identification according to the special character recognition configuration information. This step improves the accuracy of metadata identification. For example, when a "NULL" character string is present in a date-type field, the field may be misidentified and not recognized as a date type; with preset special character configuration information, it can be identified accurately. Similar special characters include redundant spaces at the beginning and end of a character string.
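One way such configuration could be applied is to normalize values before type identification. The configuration format below is hypothetical, since the application does not specify one.

```python
import re

# Hypothetical configuration; the source does not give its format.
SPECIAL_CHAR_CONFIG = {"null_tokens": {"NULL", ""}, "strip_whitespace": True}
DATE_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def clean(value, config=SPECIAL_CHAR_CONFIG):
    """Strip padding and map configured null tokens to None."""
    if config["strip_whitespace"]:
        value = value.strip()
    return None if value in config["null_tokens"] else value


def looks_like_date_column(values, config=SPECIAL_CHAR_CONFIG):
    """True if every non-null value matches the date pattern after cleaning."""
    cleaned = (clean(raw, config) for raw in values)
    present = [v for v in cleaned if v is not None]
    return bool(present) and all(DATE_RE.fullmatch(v) for v in present)
```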
For step S203, the resources allocated to the detection queue may include a processor, a storage space, and the like.
The detection result data output by the detection element after detecting the analysis identification data can be written into a PG database, and the result data can be viewed or downloaded as reports. Because service data usually interfaces with a plurality of institutions or departments, different reports or result data can be generated for different institutions, which protects the security and privacy of the data. In addition, the detection element updates the log information in real time while it performs detection.
As mentioned above, the daemon process determines whether there is a task to be executed; when it determines that there is one, the daemon process controls the detection elements to execute detection in sequence.
As mentioned in the above embodiments, the detection types include the statistical type and the prediction type. Each detection type corresponds to detection elements, which may be a descriptive statistics element, a trend statistics element, a comparison statistics element, a model prediction element and the like. In the embodiment of the present application, these elements exist as encapsulated modules; specifically, each element is encapsulated SQL code that is automatically generated based on the type of detection to be performed, and the detection is executed by Spark.
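Automatic SQL generation for a descriptive statistics element might look like the sketch below. The function name, output column aliases and the saturation formula are assumptions; the generated statement uses only standard aggregate functions and would be submitted to Spark by the caller.

```python
def descriptive_sql(library, table, numeric_columns):
    """Generate one aggregate query covering the given numeric columns."""
    for name in (library, table, *numeric_columns):
        # Reject anything that is not a plain identifier to avoid injection.
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    parts = ["COUNT(*) AS record_count"]
    for col in numeric_columns:
        parts.append(f"MAX(`{col}`) AS {col}_max")
        parts.append(f"MIN(`{col}`) AS {col}_min")
        parts.append(f"AVG(`{col}`) AS {col}_mean")
        # COUNT(col) ignores NULLs, so the ratio is the non-null share.
        parts.append(f"COUNT(`{col}`) / COUNT(*) AS {col}_saturation")
    return f"SELECT {', '.join(parts)} FROM `{library}`.`{table}`"
```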
In this embodiment, the detection elements to be executed by the daemon process are determined by the detection type parameter. For example, if the value of the detection type parameter is "all", all detection elements are executed in sequence; if the value is "descriptive statistics", only the descriptive statistics element is executed. When the detection type parameter is assigned, the type of detection element may be identified by a number, for example "0" for all elements, "1" for the descriptive statistics element, "2" for the trend statistics element, "3" for the model prediction element, and so on.
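The numeric codes can be resolved with a lookup table. Only the codes "0" to "3" are given in the text, so the sketch stops there; the element identifiers are hypothetical names.

```python
ALL_ELEMENTS = [
    "descriptive_statistics",
    "trend_statistics",
    "comparison_statistics",
    "model_prediction",
]

# Codes as listed in the text; further codes are left undefined here.
ELEMENTS_BY_CODE = {
    "0": ALL_ELEMENTS,
    "1": ["descriptive_statistics"],
    "2": ["trend_statistics"],
    "3": ["model_prediction"],
}


def resolve_elements(code):
    """Return the detection elements to execute, in order, for a type code."""
    if code not in ELEMENTS_BY_CODE:
        raise ValueError(f"unknown detection type: {code!r}")
    return list(ELEMENTS_BY_CODE[code])
```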
In the embodiment of the present application, the executing component for metadata parsing and metadata identification may also be embodied in the form of a package component, and respectively corresponds to the metadata parsing component and the metadata identification component.
Correspondingly, when a data abnormality is detected, the abnormal data problems can be summarized and output by configuring a problem finding element. Specifically, the problem finding element can automatically collect and summarize the possible problems in the data table according to the detection data obtained by descriptive statistics, trend statistics, comparison statistics and model prediction, and grade the severity of each data problem so that the problems can be displayed in layers. For example, an empty table or a primary key conflict is a serious problem, while a field saturation below 30% is a moderately serious problem. When the data is displayed, a filter is provided, or the display is split into different charts according to problem severity, and reminders or warnings can be set for the more serious data problems, for example by connecting to instant messaging software, a mailbox and the like to send warning information. In the embodiment of the present application, the problem finding element is not mandatory. For example, for data exploration and data profiling requirements in some scenarios, only descriptive statistical information needs to be output, such as the saturation of each field and the maximum, minimum and mean values of numerical fields, and no abnormality information is involved. Therefore, the embodiment of the present application further includes: after determining at least one detection element according to the detection type, determining whether any of the detection elements involves data abnormality detection, and if so, loading the problem finding element; otherwise, not loading the problem finding element.
According to the embodiment of the application, all parts involved in executing data detection are modularized, so that they can be conveniently called and the flexibility of detection is improved.
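The severity grading described above can be sketched as follows. This is only an illustration under stated assumptions: the `Problem` class, the function `find_problems`, the severity labels and the input shapes are hypothetical, while the three rules (empty table, primary key conflict, saturation below 30%) are the examples given in the text.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    severity: str  # "serious" or "moderate" (labels are assumptions)
    message: str


def find_problems(row_count: int, duplicate_key_count: int,
                  saturation: dict[str, float]) -> list[Problem]:
    """Summarize data problems from already computed detection results."""
    problems = []
    if row_count == 0:
        problems.append(Problem("serious", "table is empty"))
    if duplicate_key_count > 0:
        problems.append(
            Problem("serious", f"{duplicate_key_count} primary key conflicts"))
    for field, ratio in saturation.items():
        # Saturation is the share of non-empty values in a field (0.0 to 1.0).
        if ratio < 0.30:
            problems.append(
                Problem("moderate", f"field {field} saturation {ratio:.0%}"))
    return problems
```

A report layer could then filter the returned list by `severity` or split it into separate charts.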
In some embodiments, before causing the detection element to detect the parsed identification data based on the allocated resources, the method further comprises: when at least two detection elements are determined according to the detection type, determining whether a dependency relationship exists between the detection elements, and if so, determining the execution order of the detection elements according to the dependency relationship. Specifically, when there are multiple detection elements, some of them may need to run in a particular order. For example, if the model prediction element depends on the trend statistics element, a dependency relationship exists between the two, and the trend statistics element is executed before the model prediction element; if the problem finding element depends on all the preceding detection elements, the problem finding element is executed last.
In some embodiments, before causing the detection element to detect the parsed identification data based on the allocated resources, the method further comprises: acquiring and parsing auxiliary parameters, determining whether the value assigned to each parameter item in the auxiliary parameters is null, screening the service data to be detected that is to undergo metadata parsing and metadata identification according to the parameter items whose values are non-null, and screening the detection elements determined according to the detection type according to the parameter items whose values are non-null. Specifically, the detection parameters may further include auxiliary parameters, which may be user-defined. The auxiliary parameters may include parameter items such as a test field, a numerical field, a character field, an enumeration field, a service date field, a condition, a primary key and a virtual user. In actual detection, the service data to be detected and the detection elements determined according to the detection type can both be screened according to the values assigned to these parameter items. These parameter items are explained below.
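One way to derive an execution order from such dependency relationships is a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later); the element names and the shape of the `depends_on` mapping are assumptions made for illustration and are not specified in the text.

```python
from graphlib import TopologicalSorter


def execution_order(selected: list[str],
                    depends_on: dict[str, list[str]]) -> list[str]:
    """Order the selected elements so that each runs after its dependencies."""
    sorter = TopologicalSorter()
    for element in selected:
        # Ignore dependencies on elements that were not selected for this task.
        deps = [d for d in depends_on.get(element, []) if d in selected]
        sorter.add(element, *deps)
    # static_order() raises graphlib.CycleError if the dependencies are circular.
    return list(sorter.static_order())
```

With `model_prediction` depending on `trend_statistics`, and `problem_finding` depending on every other element, the returned order places `trend_statistics` before `model_prediction` and `problem_finding` last.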
In this embodiment, the test field is used to specify the field to be tested, if the parameter is null, all fields are evaluated by default, and if the parameter is not null, the subsequent metadata parsing and metadata identification are only for the field specified to be tested.
The numeric field is used for specifying which fields are numeric, the character field is used for specifying which fields are character, the enumeration field is used for specifying which fields are enumeration, if three parameters of the numeric field, the character field and the enumeration field are null, the result of metadata identification is used, and if the three parameters are not null, the metadata identification is only for the fields of which the types are not specified.
The service date field parameter is used for specifying which field holds the service date; if the parameter is null, trend statistics and model prediction are not executed even if the detection type parameter includes trend statistics and model prediction.
The condition parameter is used for specifying whether to filter the data in the data table, for example by applying a WHERE condition to the table under test; if the parameter is null, no condition filtering is performed.
The primary key parameter is used for specifying whether to run a uniqueness test on the data table; if the parameter is provided, the uniqueness test of the primary key or composite primary key is executed, and if the parameter is null, the uniqueness test is not executed.
The virtual user parameter is used for designating the virtual user that executes the script during data detection; if the parameter is null, the default virtual user is used.
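The screening rules for the auxiliary parameters explained above can be sketched as a single planning function. This is a minimal sketch, not the patented implementation: the dictionary keys, the element names, the default user name and the function `plan_detection` are all hypothetical.

```python
def plan_detection(all_fields: list[str], elements: list[str],
                   aux: dict) -> dict:
    """Apply the auxiliary parameters to the fields and detection elements."""
    # Test field: null means every field is evaluated.
    wanted = aux.get("test_fields") or all_fields
    fields = [f for f in all_fields if f in wanted]

    # Service date field: null disables trend statistics and model prediction.
    kept = list(elements)
    if not aux.get("service_date_field"):
        kept = [e for e in kept
                if e not in ("trend_statistics", "model_prediction")]

    return {
        "fields": fields,
        "elements": kept,
        # Condition: null means no WHERE filtering is applied.
        "where": aux.get("condition") or None,
        # Primary key: the uniqueness test runs only when a key is given.
        "check_uniqueness": bool(aux.get("primary_key")),
        # Virtual user: fall back to an assumed default name when empty.
        "virtual_user": aux.get("virtual_user") or "default_user",
    }
```

For example, passing only a test field and a primary key yields a plan that evaluates that one field, skips the time-based elements and enables the uniqueness test.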
In some embodiments, determining at least one detection element according to the detection type comprises: determining whether a detection element corresponding to the detection type currently exists, and if so, directly acquiring the corresponding detection element; otherwise, generating a new detection element corresponding to the detection type. For example, the descriptive statistics element is generated based on the descriptive statistics detection type: the corresponding descriptive statistics configuration information is obtained, which may include the categories, data range, time range, statistical rules and so on to be covered by the descriptive statistics, and the descriptive statistics element is generated based on this configuration information. Each generated detection element is packaged as an independent Python function and can run on its own; when detection is carried out, the detection elements to be executed are chained together by a shell main program to form a complete data quality detection function. In this way, new detection elements can be generated according to detection requirements, and detection elements can be deleted when the corresponding detection requirement no longer exists, which provides high flexibility.
The service data quality detection method provided by the embodiment of the application can, after receiving a service data detection task, automatically acquire, parse and identify the service data to be detected, perform modular detection through detection elements, and automatically carry out data detection in different dimensions. The detection is more comprehensive and intelligent, efficiency is higher, and manual effort is reduced. In particular, for service data running online, abnormal changes in indicators can be monitored in real time, data abnormalities such as abnormal updates, incomplete updates and missed updates can be found earlier and in a more timely manner, and problems such as abnormal calculation logic and abnormal changes in data indicators can be found in time, which improves the usability, stability and accuracy of the service data. In addition, the embodiment of the application can automatically generate the basic information, descriptive information and the like of the data table, which helps improve the efficiency of work such as data exploration and data combing. The system is convenient to use: tasks can be submitted through a Web page, a report is automatically generated from the detection result, and the data can be viewed and used in a visual manner.
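The "reuse if present, otherwise generate" logic described above can be sketched with a simple registry of Python functions. The registry, the `build_element` factory and the shape of the configuration dictionary are hypothetical; the text only states that each element is an independent Python function generated from configuration information.

```python
from typing import Callable, Dict

Element = Callable[[dict], dict]

# Registry of detection elements that already exist, keyed by detection type.
_registry: Dict[str, Element] = {}


def build_element(detection_type: str, config: dict) -> Element:
    """Generate a new, independent detection element from its configuration."""
    def element(data: dict) -> dict:
        # Placeholder body: a real element would run generated SQL on Spark.
        return {
            "type": detection_type,
            "rule": config.get("rule"),
            "rows": len(data.get("rows", [])),
        }
    return element


def get_element(detection_type: str, config: dict) -> Element:
    """Return the existing element for this type, generating it if missing."""
    if detection_type not in _registry:
        _registry[detection_type] = build_element(detection_type, config)
    return _registry[detection_type]
```

Deleting an element that is no longer needed would then amount to removing its key from the registry.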
It should be emphasized that, in order to further ensure the privacy and security of the information, the service data to be detected, which is obtained according to the data table, may also be stored in a node of a block chain.
The block chain referred by the application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by computer readable instructions instructing the relevant hardware; the instructions can be stored in a computer readable storage medium and, when executed, can include the processes of the method embodiments described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk or a Read-Only Memory (ROM), or may be a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed one after another but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of a service data quality detection apparatus, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be specifically applied to various electronic devices.
As shown in fig. 3, the apparatus for detecting quality of service data according to this embodiment includes: a parameter acquisition module 301, a data acquisition module 302, and a detection module 303.
The parameter obtaining module 301 is configured to receive a service data detection task, and obtain corresponding detection parameters according to the service data detection task, where the detection parameters at least include a data table name, a library name, a detection queue, and a detection type; the data acquisition module 302 is configured to access a database and determine a data table based on the data table name and the library name, acquire to-be-detected service data stored in the database according to the data table, and perform metadata analysis and metadata identification on the to-be-detected service data to obtain analysis identification data; the detection module 303 is configured to determine at least one detection element according to the detection type, determine allocated resources according to the detection queue, enable the detection element to detect the parsing identification data based on the allocated resources, and output a detection result.
Specifically, before the newly developed data table is online, or after the data table is updated, the service data can be detected by the service data quality detection device, so that abnormal data in the data table can be detected in time, the newly developed data table can reach the online standard of the service, or the updated data table can continuously meet the online standard of the service.
In this embodiment of the application, the service data detection task may be submitted by a task submitting end, for example, a user submits through a Web page or a terminal, when a plurality of service data detection tasks are submitted at the same time, task information may be written into a task table of the relational database PG, the service data quality detection apparatus accesses the task table regularly to determine whether the detection task needs to be executed, the task table may be accessed by setting a daemon process, and the daemon process may refer to relevant contents of the above method embodiment specifically, and is not expanded here. When a plurality of task information exists in the task table, the service data quality detection device sequentially executes detection operations on the service data detection tasks in the task table.
In this embodiment, the service data detection task relates to information such as a data table and a detection type that need to be detected, and specifically corresponds to detection parameters including a data table name, a library name, a detection queue, a detection type, and the like, and the parameter obtaining module 301 may determine a resource used for each detection and a content to be detected through the service data detection task. Reference may be made to the above method embodiments for the relevant contents of the data table name, the library name, the detection queue, the detection type, etc., and no expansion is made here.
In this embodiment, there may be only one database accessed by the data obtaining module 302, or there may be multiple databases, that is, the data table and the service data to be detected may be stored in one or more databases.
In this embodiment, the data obtaining module 302 performs metadata parsing as follows: it executes a Hive DDL command to obtain the table creation statement of the data table in the system; obtains from that statement the column information (which columns exist, the column types, and the column remarks), the table information (table creation time, data compression format, etc.) and the table data information (whether partitions exist, the number of files, the file size, etc.) of the service data to be detected; and then parses this information with a Python program and stores it as structured data for subsequent use.
Further, the data obtaining module 302 may automatically identify data types through metadata identification. When service data is imported into Hive from a relational database, values of every type (numerical values, dates, etc.) are stored in string format, that is, as text fields. The metadata identification in the embodiment of the present application uses the parameter data obtained by metadata parsing and determines the actual type of each text field through regular expressions, thereby identifying the original type of the service data. Specifically, data types such as integer, floating point and date may be identified. For example, a string that begins with "+" or "-" and is followed only by the digits 0-9 is considered numeric; a pattern such as xxxx-yy-zz, where x, y and z are digits and the values are within a reasonable range, is considered a date.
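The regular-expression-based type identification described above might look like the following sketch. The exact patterns used by the application are not given in the text, so the three patterns and the function names are assumptions; the date "reasonable range" check is done here with the standard `datetime.date` constructor.

```python
import re
from datetime import date

INTEGER = re.compile(r"[+-]?\d+")
FLOAT = re.compile(r"[+-]?\d+\.\d+")
DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def infer_value_type(text: str) -> str:
    """Guess the original type of one value that was stored as a string."""
    if INTEGER.fullmatch(text):
        return "integer"
    if FLOAT.fullmatch(text):
        return "float"
    match = DATE.fullmatch(text)
    if match:
        try:
            # date() rejects out-of-range months and days such as 2020-13-40.
            date(int(match[1]), int(match[2]), int(match[3]))
            return "date"
        except ValueError:
            return "string"
    return "string"


def infer_column_type(values: list[str]) -> str:
    """Guess the type of a text field from all of its values."""
    types = {infer_value_type(v) for v in values}
    if not types:
        return "string"
    if types == {"integer"}:
        return "integer"
    if types <= {"integer", "float"}:
        return "float"
    if types == {"date"}:
        return "date"
    return "string"
```

In this sketch a column is only typed as date or numeric when every value matches, so a single stray token makes the whole column fall back to string.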
The data obtaining module 302 performs the above-mentioned metadata parsing and metadata identification on the service data to be detected, so as to obtain parsed identification data, where the parsed identification data is structured data with a determined data type.
In some embodiments, when the data obtaining module 302 obtains the to-be-detected service data stored in the database according to the data table, the data obtaining module is specifically configured to determine a data amount included in the data table, and determine whether the data amount of the data table is greater than a preset threshold; and when the data volume of the data table is not greater than a preset threshold value, directly acquiring the service data to be detected according to the data table, otherwise, randomly extracting a preset number of data from the data table to generate a temporary data table, and acquiring the service data to be detected according to the temporary data table. For the related contents of the temporary data table, reference may be made to the above method embodiments, which are not expanded herein.
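The threshold check and random extraction described above can be illustrated on an in-memory list of rows. This is a simplified sketch: in the application the data lives in a data table and the sample is written to a temporary data table, whereas here the function `rows_to_detect` and its parameters are hypothetical stand-ins.

```python
import random
from typing import Optional


def rows_to_detect(rows: list, threshold: int, sample_size: int,
                   seed: Optional[int] = None) -> list:
    """Return all rows of a small table, or a random sample of a large one."""
    if len(rows) <= threshold:
        # Data volume does not exceed the preset threshold: use the table as is.
        return rows
    rng = random.Random(seed)
    # Sampling without replacement plays the role of the temporary data table.
    return rng.sample(rows, min(sample_size, len(rows)))
```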
In some embodiments, after the data obtaining module 302 determines the data amount included in the data table, the detecting module 303 is further configured to determine a minimum data amount required by each of the detecting elements to perform detection, determine whether the minimum data amount required by each of the detecting elements is greater than the data amount included in the data table, and eliminate the detecting element corresponding to the minimum data amount greater than the data amount included in the data table. During actual detection, some detection elements have requirements on the data volume of data to be detected, for example, when a data table is subjected to model prediction to detect data abnormality, the data volume is too small, which may cause abnormality in a model training process, and further cause inaccurate data detection, at this time, the detection module 303 skips model evaluation, and only makes corresponding records in an output detection result table.
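The elimination of detection elements whose minimum data volume is not met can be sketched as below. The per-element minimums, the element names and the function `eliminate_by_volume` are illustrative assumptions; the text does not state concrete values.

```python
def eliminate_by_volume(elements: list[str], min_rows: dict[str, int],
                        table_rows: int) -> tuple[list[str], list[str]]:
    """Split elements into those that can run and those that are skipped."""
    kept, skipped = [], []
    for element in elements:
        # An element with no configured minimum can always run.
        if min_rows.get(element, 0) > table_rows:
            skipped.append(element)
        else:
            kept.append(element)
    return kept, skipped
```

The `skipped` list can be used to write the corresponding record into the detection result table, as the paragraph above describes for model prediction.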
In some embodiments, the detection module 303 is further configured to, before determining at least one detection element according to the detection type, acquire preset special character recognition configuration information and perform the metadata identification according to the special character recognition configuration information. This step improves the accuracy of metadata identification. For example, when a "NULL" character string exists in a date-type field, the field may be misidentified and not recognized as a date type; accurate identification can be achieved through the preset special character configuration information. Similar special cases include redundant spaces at the beginning and end of a character string, and the like.
In this embodiment, when determining the allocated resources according to the detection queue, the detection module 303 is specifically configured to determine the resources to be called, such as processor and storage space.
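A minimal sketch of how special character configuration could be applied before type identification is shown below. The configuration keys, the set of null tokens and the function `clean_values` are hypothetical; the text only names the "NULL" string and leading or trailing spaces as examples.

```python
# Assumed shape of the special character recognition configuration.
SPECIAL_CONFIG = {
    "null_tokens": {"NULL", "null", ""},
    "strip_whitespace": True,
}


def clean_values(values: list[str], config: dict = SPECIAL_CONFIG) -> list[str]:
    """Drop placeholder tokens and trim spaces before type identification."""
    cleaned = []
    for value in values:
        if config["strip_whitespace"]:
            value = value.strip()
        if value in config["null_tokens"]:
            # Placeholder values carry no type information, so skip them.
            continue
        cleaned.append(value)
    return cleaned
```

After cleaning, a date field that contains a few "NULL" strings is left with date-shaped values only, so a pattern-based type check can still recognize it as a date.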
The detection result data output by the detection module 303 through detecting the analysis identification data by the detection element can be written into the PG database, and the result data can be checked or downloaded through a report. In addition, the detection element will update the log information in real time as it performs the detection.
It is mentioned above that whether there is a task to be executed is confirmed by the daemon process, and when it is confirmed that there is a task, the detection module 303 correspondingly controls the detection elements to execute detection in sequence through the daemon process.
The aforementioned detection types include a statistical type and a predictive type, and each type corresponds to a detection element; the detection elements called by the detection module 303 may include descriptive statistics elements, trend statistics elements, comparison statistics elements, model prediction elements, and the like. In the embodiment of the present application, these elements exist as encapsulated modules: specifically, each element is encapsulated SQL code that is automatically generated based on the type of detection to be performed, and the detection is executed by Spark.
In this embodiment, the detecting module 303 determines the detecting element to be executed by daemon according to the parameter of the detection type, which may specifically refer to the above method embodiment, and is not expanded herein.
In the embodiment of the present application, the executing component for metadata parsing and metadata identification may also be embodied in the form of a package component, and respectively corresponds to the metadata parsing component and the metadata identification component. Correspondingly, when data abnormality is detected, summarizing and outputting of data abnormality problems can be realized by configuring a problem finding element, and reference can be specifically made to the above method embodiment, which is not expanded herein.
In some embodiments, before the detection element detects the parsed identification data based on the allocated resources, the detection module 303 is further configured to determine whether a dependency relationship exists between the detection elements when at least two detection elements are determined according to the detection type, and if so, determine the execution order of the detection elements according to the dependency relationship. Specifically, when there are multiple detection elements, some of them may need to run in a particular order. For example, if the model prediction element depends on the trend statistics element, a dependency relationship exists between the two, and the trend statistics element is executed before the model prediction element; if the problem finding element depends on all the preceding detection elements, the problem finding element is executed last.
In some embodiments, before the detection module 303 detects the analysis identification data based on the allocated resource, the parameter obtaining module 301 is further configured to obtain and analyze an auxiliary parameter, the data obtaining module 302 is further configured to determine whether an assignment of each parameter item in the auxiliary parameter is null, and screen the service data to be detected, which is to be subjected to metadata analysis and metadata identification, according to the parameter item whose assignment is non-null, and the detection module 303 is further configured to screen the detection element determined according to the detection type according to the parameter item whose assignment is non-null. The auxiliary parameter may be a user-defined parameter, specifically, the auxiliary parameter may include a test field, a numerical field, a character field, an enumeration field, a service date field, a condition, a primary key, a virtual user, and other parameter items, and the related contents of these parameter items may refer to the above method embodiments, which are not expanded herein. During actual detection, the service data to be detected, which is to be subjected to metadata analysis and metadata identification, can be screened according to the assignment conditions of the parameter items, and the detection elements determined according to the detection types can be screened.
In some embodiments, when determining at least one detection element according to the detection type, the detection module 303 is specifically configured to determine whether a detection element corresponding to the detection type exists currently, and if so, directly acquire the corresponding detection element, otherwise, generate a new detection element corresponding to the detection type based on the detection type. And for the generated detection elements, each detection element is packaged into a Python function which is independent of each other, each detection element can independently operate, and when detection is carried out, each detection element to be executed is connected in series by a shell main program to form an integral data quality detection function. Therefore, new detection elements can be generated according to detection requirements or corresponding detection elements can be deleted when certain detection requirements do not exist, and the flexibility is high.
The service data quality detection apparatus provided by the application can, after receiving a service data detection task, automatically acquire, parse and identify the service data to be detected, perform modular detection through detection elements, and automatically carry out data detection in different dimensions. The detection is more comprehensive and intelligent, efficiency is higher, and manual effort is reduced. In particular, for service data running online, abnormal changes in indicators are monitored in real time, which helps find data abnormalities such as abnormal updates, incomplete updates and missed updates earlier and in a more timely manner, and helps find problems such as abnormal calculation logic and abnormal changes in data indicators in time, thereby improving the usability, stability and accuracy of the service data. In addition, the embodiment of the application can automatically generate the basic information, descriptive information and the like of the data table, which helps improve the efficiency of work such as data exploration and data combing. The system is convenient to use: tasks can be submitted through a Web page, a report is automatically generated from the detection result, and the data can be viewed and used in a visual manner.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 4, fig. 4 is a block diagram of a basic structure of a computer device according to the present embodiment. The computer device 4 includes a memory 41, a processor 42, and a network interface 43, which are communicatively connected to each other through a system bus, where the memory 41 stores computer readable instructions, and the processor 42 implements the steps of the service data quality detection method in the foregoing method embodiment when executing the computer readable instructions, and has beneficial effects corresponding to the service data quality detection method, which are not expanded herein.
It is noted that only a computer device 4 having the memory 41, the processor 42 and the network interface 43 is shown, but it should be understood that not all of the illustrated components are required, and that more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
In the present embodiment, the memory 41 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or a memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the computer device 4. Of course, the memory 41 may also include both internal and external storage devices of the computer device 4. In this embodiment, the memory 41 is generally used for storing an operating system and various types of application software installed in the computer device 4, such as computer readable instructions corresponding to the service data quality detection method described above. Further, the memory 41 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 42 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute computer readable instructions stored in the memory 41 or process data, for example, execute computer readable instructions corresponding to the service data quality detection method.
The network interface 43 may comprise a wireless network interface or a wired network interface, and the network interface 43 is generally used for establishing communication connection between the computer device 4 and other electronic devices.
The present application further provides another embodiment, which is to provide a computer-readable storage medium, wherein the computer-readable storage medium stores computer-readable instructions, which are executable by at least one processor, so as to cause the at least one processor to perform the steps of the service data quality detection method described above, and have the corresponding beneficial effects with respect to the service data quality detection method, which are not expanded herein.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
It is to be understood that the above-described embodiments are merely some, and not all, of the embodiments of the present application, and that the appended drawings illustrate preferred embodiments of the application without limiting its scope. This application can be embodied in many different forms; these embodiments are provided so that the disclosure of the application is understood thoroughly. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or substitute equivalents for some of their technical features. All equivalent structures made by using the contents of the specification and the drawings of the present application, whether applied directly or indirectly in other related technical fields, fall within the protection scope of the present application.

Claims (10)

1. A service data quality detection method is characterized by comprising the following steps:
receiving a service data detection task, and acquiring corresponding detection parameters according to the service data detection task, wherein the detection parameters at least comprise a data table name, a library name, a detection queue and a detection type;
accessing a database and determining a data table based on the data table name and the library name, acquiring service data to be detected stored in the database according to the data table, and performing metadata analysis and metadata identification on the service data to be detected to obtain analysis identification data;
and determining at least one detection element according to the detection type, determining allocated resources according to the detection queue, causing the detection element to detect the analysis identification data based on the allocated resources, and outputting a detection result.
2. The method for detecting the quality of service data according to claim 1, wherein the obtaining the service data to be detected stored in the database according to the data table comprises:
determining the data volume contained in the data table, and judging whether the data volume of the data table is larger than a preset threshold value or not; and when the data volume of the data table is not greater than a preset threshold value, directly acquiring the service data to be detected according to the data table, otherwise, randomly extracting a preset number of data from the data table to generate a temporary data table, and acquiring the service data to be detected according to the temporary data table.
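The threshold check in claim 2 can be sketched as follows. This is a minimal illustration rather than the applicant's implementation: the function name `select_rows`, its parameters, and the use of an in-memory list in place of a database table are all assumptions made for the example.

```python
import random


def select_rows(rows, threshold, sample_size, seed=None):
    """Return the rows to be detected.

    rows        -- in-memory stand-in for the data table (hypothetical)
    threshold   -- the preset threshold on the data volume
    sample_size -- the preset number of rows to extract at random
    """
    rows = list(rows)
    # Small table: use the data table directly.
    if len(rows) <= threshold:
        return rows
    # Large table: a random sample plays the role of the temporary data table.
    rng = random.Random(seed)
    return rng.sample(rows, min(sample_size, len(rows)))
```

Sampling bounds the cost of detection on very large tables, at the price of a result that describes the sample rather than the full table.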
3. The method of claim 2, wherein after the determining the amount of data contained in the data table, the method further comprises:
determining the minimum data quantity required by each detection element to perform detection;
and judging whether the minimum data quantity required by each detection element is larger than the data quantity contained in the data table, and rejecting any detection element whose required minimum data quantity is larger than the data quantity contained in the data table.
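The rejection step in claim 3 amounts to a simple filter. The sketch below assumes, purely for illustration, that each detection element is identified by a name and that its minimum data quantity is available in a dictionary; neither the names nor the dictionary layout come from the application.

```python
def filter_elements(min_rows_by_element, table_row_count):
    """Keep only the detection elements that the table can satisfy.

    min_rows_by_element -- hypothetical mapping: element name -> minimum row count
    table_row_count     -- data quantity contained in the data table
    """
    return [
        name
        for name, min_rows in min_rows_by_element.items()
        # Reject an element when it needs more rows than the table holds.
        if min_rows <= table_row_count
    ]
```

For example, with a 500-row table, an element that needs 1000 rows is dropped while one that needs a single row is kept.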
4. The method of any of claims 1 to 3, wherein before said determining at least one detection element according to said detection type, said method further comprises: acquiring preset special character recognition configuration information, and performing metadata recognition according to the special character recognition configuration information.
5. The method of any of claims 1 to 3, wherein before said causing said detecting element to detect said parsed identification data based on said allocated resources, said method further comprises:
and when at least two detection elements are determined according to the detection type, judging whether a dependency relationship exists between the detection elements, and if so, determining the execution sequence of the detection elements according to the dependency relationship.
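One common way to derive an execution sequence from dependency relationships is a topological sort. The application does not name a specific algorithm, so the sketch below is an assumption: it uses Python's standard `graphlib.TopologicalSorter`, and the element names and the `depends_on` mapping are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_order(elements, depends_on):
    """Order detection elements so that each runs after its dependencies.

    elements   -- names of the detection elements selected for this task
    depends_on -- hypothetical mapping: element name -> names it depends on
    """
    selected = set(elements)
    # Design choice of this sketch: dependencies on unselected elements are ignored.
    graph = {
        name: selected & set(depends_on.get(name, ()))
        for name in elements
    }
    # static_order() yields every node after all of its predecessors and
    # raises graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(graph).static_order())
```

Elements with no dependency between them may appear in any relative order, which leaves room to run them in parallel on the allocated resources.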
6. The method of any of claims 1 to 3, wherein before said causing said detecting element to detect said parsed identification data based on said allocated resources, said method further comprises:
acquiring and parsing auxiliary parameters, judging whether the value assigned to each parameter item in the auxiliary parameters is null, screening, according to the parameter items whose values are non-null, the service data to be detected that is to undergo metadata analysis and metadata identification, and screening, according to the same non-null parameter items, the detection elements determined according to the detection type.
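The screening in claim 6 can be illustrated as below. The application does not list the auxiliary parameter items, so the keys `"columns"` and `"elements"` are hypothetical, as is the representation of rows as dictionaries; the sketch only shows the pattern of ignoring null items and filtering by the rest.

```python
def apply_auxiliary_params(aux_params, rows, elements):
    """Screen rows and detection elements using non-null auxiliary parameters.

    aux_params -- hypothetical dict, e.g. {"columns": [...], "elements": [...]}
    rows       -- service data to be detected, as a list of dicts
    elements   -- detection element names chosen from the detection type
    """
    # Only parameter items with a non-null value take part in screening.
    active = {key: value for key, value in aux_params.items() if value is not None}
    if "columns" in active:
        # Keep only the requested columns of each row.
        rows = [
            {col: row[col] for col in active["columns"] if col in row}
            for row in rows
        ]
    if "elements" in active:
        # Keep only the requested detection elements.
        elements = [e for e in elements if e in active["elements"]]
    return rows, elements
```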
7. The method of any of claims 1 to 3, wherein the determining at least one detection element according to the detection type comprises:
and judging whether a detection element corresponding to the detection type exists at present, if so, directly acquiring the corresponding detection element, otherwise, generating a new detection element corresponding to the detection type based on the detection type.
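Claim 7 describes a get-or-create lookup. A minimal sketch, assuming a registry keyed by detection type and a caller-supplied factory (both hypothetical, not taken from the application), might look like this:

```python
class DetectionElementRegistry:
    """Return an existing detection element or generate a new one on demand."""

    def __init__(self, factory):
        # factory: hypothetical callable that builds an element for a detection type
        self._factory = factory
        self._elements = {}

    def get(self, detection_type):
        # Generate a new element only when none exists for this detection type.
        if detection_type not in self._elements:
            self._elements[detection_type] = self._factory(detection_type)
        return self._elements[detection_type]
```

Repeated requests for the same detection type then reuse one element, so the generation step runs at most once per type.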
8. A service data quality detection apparatus, comprising:
the parameter acquisition module is used for receiving a service data detection task and acquiring corresponding detection parameters according to the service data detection task, wherein the detection parameters at least comprise a data table name, a library name, a detection queue and a detection type;
the data acquisition module is used for accessing a database and determining a data table based on the data table name and the library name, acquiring to-be-detected service data stored in the database according to the data table, and performing metadata analysis and metadata identification on the to-be-detected service data to obtain analysis identification data;
and the detection module is used for determining at least one detection element according to the detection type, determining allocated resources according to the detection queue, causing the detection element to detect the analysis identification data based on the allocated resources, and outputting a detection result.
9. A computer device comprising a memory and a processor, the memory having computer readable instructions stored therein, wherein the processor, when executing the computer readable instructions, implements the steps of the service data quality detection method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon computer-readable instructions, which, when executed by a processor, implement the steps of the service data quality detection method according to any one of claims 1 to 7.
CN202010899921.1A 2020-08-31 2020-08-31 Service data quality detection method and device, computer equipment and storage medium Pending CN112052138A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010899921.1A CN112052138A (en) 2020-08-31 2020-08-31 Service data quality detection method and device, computer equipment and storage medium
PCT/CN2020/135593 WO2021147559A1 (en) 2020-08-31 2020-12-11 Service data quality measurement method, apparatus, computer device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010899921.1A CN112052138A (en) 2020-08-31 2020-08-31 Service data quality detection method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112052138A true CN112052138A (en) 2020-12-08

Family

ID=73606615

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010899921.1A Pending CN112052138A (en) 2020-08-31 2020-08-31 Service data quality detection method and device, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN112052138A (en)
WO (1) WO2021147559A1 (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112597142A (en) * 2020-12-26 2021-04-02 中国农业银行股份有限公司 Data quality detection method and data quality detection engine
CN112613892A (en) * 2020-12-25 2021-04-06 北京知因智慧科技有限公司 Data processing method and device based on business system and electronic equipment
CN112632048A (en) * 2020-12-18 2021-04-09 恩亿科(北京)数据科技有限公司 Data quality detection method, system, electronic equipment and storage medium
CN113049935A (en) * 2021-03-04 2021-06-29 长鑫存储技术有限公司 Semiconductor intelligent detection system, intelligent detection method and storage medium
WO2021147559A1 (en) * 2020-08-31 2021-07-29 平安科技(深圳)有限公司 Service data quality measurement method, apparatus, computer device, and storage medium
CN113591485A (en) * 2021-06-17 2021-11-02 国网浙江省电力有限公司 Intelligent data quality auditing system and method based on data science
CN114186244A (en) * 2022-01-26 2022-03-15 中国电子信息产业集团有限公司 Data element operation framework and system
CN116701383A (en) * 2023-08-03 2023-09-05 中航信移动科技有限公司 Data real-time quality monitoring method, electronic equipment and storage medium
WO2023245893A1 (en) * 2022-06-24 2023-12-28 深圳前海微众银行股份有限公司 Monitoring method and device, and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10248674B2 (en) * 2015-12-04 2019-04-02 Jiangxi Electric Power Corporation Information And Communications Branch Of State Grid Method and apparatus for data quality management and control
CN109656812A (en) * 2018-11-19 2019-04-19 平安科技(深圳)有限公司 Data quality checking method, apparatus and storage medium
CN111177134A (en) * 2019-12-26 2020-05-19 上海科技发展有限公司 Data quality analysis method, device, terminal and medium suitable for mass data
CN111400365A (en) * 2020-02-26 2020-07-10 杭州美创科技有限公司 Business system data quality detection method based on standard SQL
CN111488363A (en) * 2020-06-28 2020-08-04 平安国际智慧城市科技股份有限公司 Data processing method, device, electronic equipment and medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2002951910A0 (en) * 2002-10-04 2002-10-24 Tenix Industries Pty Limited Data quality and integrity engine
CN106708909B (en) * 2015-11-18 2020-12-08 阿里巴巴集团控股有限公司 Data quality detection method and device
CN110704186B (en) * 2019-09-25 2022-05-24 国家计算机网络与信息安全管理中心 Computing resource allocation method and device based on hybrid distribution architecture and storage medium
CN111427928A (en) * 2020-03-26 2020-07-17 京东数字科技控股有限公司 Data quality detection method and device
CN112052138A (en) * 2020-08-31 2020-12-08 平安科技(深圳)有限公司 Service data quality detection method and device, computer equipment and storage medium

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021147559A1 (en) * 2020-08-31 2021-07-29 平安科技(深圳)有限公司 Service data quality measurement method, apparatus, computer device, and storage medium
CN112632048A (en) * 2020-12-18 2021-04-09 恩亿科(北京)数据科技有限公司 Data quality detection method, system, electronic equipment and storage medium
CN112613892B (en) * 2020-12-25 2024-03-15 北京知因智慧科技有限公司 Data processing method and device based on service system and electronic equipment
CN112613892A (en) * 2020-12-25 2021-04-06 北京知因智慧科技有限公司 Data processing method and device based on business system and electronic equipment
CN112597142A (en) * 2020-12-26 2021-04-02 中国农业银行股份有限公司 Data quality detection method and data quality detection engine
CN113049935A (en) * 2021-03-04 2021-06-29 长鑫存储技术有限公司 Semiconductor intelligent detection system, intelligent detection method and storage medium
CN113591485A (en) * 2021-06-17 2021-11-02 国网浙江省电力有限公司 Intelligent data quality auditing system and method based on data science
CN113591485B (en) * 2021-06-17 2024-07-12 国网浙江省电力有限公司 Intelligent data quality auditing system and method based on data science
CN114186244A (en) * 2022-01-26 2022-03-15 中国电子信息产业集团有限公司 Data element operation framework and system
CN114186244B (en) * 2022-01-26 2022-09-16 中国电子信息产业集团有限公司 Data element operation framework and system
WO2023245893A1 (en) * 2022-06-24 2023-12-28 深圳前海微众银行股份有限公司 Monitoring method and device, and storage medium
CN116701383B (en) * 2023-08-03 2023-10-27 中航信移动科技有限公司 Data real-time quality monitoring method, electronic equipment and storage medium
CN116701383A (en) * 2023-08-03 2023-09-05 中航信移动科技有限公司 Data real-time quality monitoring method, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2021147559A1 (en) 2021-07-29

Similar Documents

Publication Publication Date Title
CN112052138A (en) Service data quality detection method and device, computer equipment and storage medium
EP4099170B1 (en) Method and apparatus of auditing log, electronic device, and medium
US11823072B2 (en) Customer behavior predictive modeling
US8533235B2 (en) Infrastructure and architecture for development and execution of predictive models
CN111813845B (en) Incremental data extraction method, device, equipment and medium based on ETL task
CN113010542A (en) Service data processing method and device, computer equipment and storage medium
CN114461644A (en) Data acquisition method and device, electronic equipment and storage medium
CN112363814A (en) Task scheduling method and device, computer equipment and storage medium
CN114741392A (en) Data query method and device, electronic equipment and storage medium
CN111625567A (en) Data model matching method, device, computer system and readable storage medium
CN110874366A (en) Data processing and query method and device
CN113836157A (en) Method and device for acquiring incremental data of database
CN116860311A (en) Script analysis method, script analysis device, computer equipment and storage medium
CN116955856A (en) Information display method, device, electronic equipment and storage medium
CN111752958A (en) Intelligent associated label method, device, computer equipment and storage medium
CN116362212A (en) Report generation method, device, equipment and storage medium
CN115859273A (en) Method, device and equipment for detecting abnormal access of database and storage medium
CN115545753A (en) Partner prediction method based on Bayesian algorithm and related equipment
US9317125B2 (en) Searching of line pattern representations using gestures
CN113868138A (en) Method, system, equipment and storage medium for acquiring test data
CN117171758A (en) Security detection method, security detection device, computer device and storage medium
CN115328920A (en) Batch data exception handling method and device, computer equipment and storage medium
CN117390023A (en) Data aggregation method, data aggregation device, apparatus, and storage medium
CN115185666A (en) Task scheduling method and device, computer equipment and storage medium
CN115099710A (en) Method and system for rapidly evaluating business association influence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201208