WO2021147559A1 - Business data quality detection method, apparatus, computer device and storage medium - Google Patents

Business data quality detection method, apparatus, computer device and storage medium

Info

Publication number
WO2021147559A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
detection
data table
computer
detection element
Prior art date
Application number
PCT/CN2020/135593
Other languages
English (en)
French (fr)
Inventor
胡立波
张茜
侯宗元
郑玉桂
张敏
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2021147559A1 publication Critical patent/WO2021147559A1/zh

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3003Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3051Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3065Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/1734Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • G06F16/2358Change logging, detection, and notification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database

Definitions

  • This application relates to the field of big data technology, and in particular to a method, device, computer equipment and storage medium for detecting business data quality.
  • A monitoring system is a system that uses computer, control, and other technologies to realize data storage, collection, and monitoring of an environment.
  • Common monitoring systems such as Zabbix, Nagios, and Cacti belong to the category of operation and maintenance monitoring systems, and can support the monitoring of indicators such as hardware information, CPU, memory, network, disk space performance, data volume, and data increment.
  • However, the inventor found that these monitoring systems cannot support the monitoring of data quality with business logic.
  • The monitoring of data quality includes detecting whether the data volume, data values, and the like are abnormal; the current detection work is processed entirely manually, which is time-consuming and laborious, and data problems cannot be fully discovered.
  • The purpose of the embodiments of this application is to propose a service data quality detection method, device, computer equipment, and storage medium, so as to solve the problems of low detection efficiency and incomplete detection that exist in the prior art when service data quality detection is performed manually.
  • an embodiment of the present application provides a service data quality detection method, which adopts the following technical solutions:
  • A method for detecting business data quality, including the following steps:
  • receiving a business data detection task, and acquiring corresponding detection parameters according to the business data detection task, the detection parameters including at least a data table name, a library name, a detection queue, and a detection type;
  • accessing a database and determining a data table based on the data table name and the library name, obtaining the service data to be detected stored in the database according to the data table, and performing metadata analysis and metadata identification on the service data to be detected to obtain analytical identification data;
  • determining at least one detection element according to the detection type, and determining the allocated resources according to the detection queue, so that the detection element detects the analytical identification data based on the allocated resources, and outputs a detection result.
  • an embodiment of the present application also provides a service data quality detection device, which adopts the following technical solutions:
  • a service data quality detection device including:
  • the parameter acquisition module is configured to receive a business data detection task, and acquire corresponding detection parameters according to the business data detection task, where the detection parameters include at least a data table name, a library name, a detection queue, and a detection type;
  • the data acquisition module is used to access the database and determine the data table based on the data table name and the library name, obtain the service data to be detected stored in the database according to the data table, and perform metadata analysis and metadata identification on the service data to be detected to obtain analytical identification data;
  • the detection module is configured to determine at least one detection element according to the detection type, and determine the allocated resource according to the detection queue, so that the detection element detects the analytical identification data based on the allocated resource, and outputs Test results.
  • the embodiments of the present application also provide a computer device, which adopts the following technical solutions:
  • a computer device includes a memory and a processor.
  • the memory stores computer readable instructions.
  • when the processor executes the computer readable instructions, the following steps are implemented:
  • receiving a business data detection task and acquiring corresponding detection parameters, the detection parameters including at least a data table name, a library name, a detection queue, and a detection type;
  • accessing the database and determining the data table based on the data table name and the library name, obtaining the service data to be detected, and performing metadata analysis and metadata identification on it to obtain analytical identification data;
  • determining at least one detection element according to the detection type, and determining the allocated resources according to the detection queue, so that the detection element detects the analytical identification data based on the allocated resources, and outputs a detection result.
  • the embodiments of the present application also provide a computer-readable storage medium, which adopts the following technical solutions:
  • a computer-readable storage medium having computer-readable instructions stored thereon, where the computer-readable instructions, when executed by a processor, cause the processor to perform the following steps:
  • receiving a business data detection task and acquiring corresponding detection parameters, the detection parameters including at least a data table name, a library name, a detection queue, and a detection type;
  • accessing the database and determining the data table based on the data table name and the library name, obtaining the service data to be detected, and performing metadata analysis and metadata identification on it to obtain analytical identification data;
  • determining at least one detection element according to the detection type, and determining the allocated resources according to the detection queue, so that the detection element detects the analytical identification data based on the allocated resources, and outputs a detection result.
  • the service data quality detection method, device, computer equipment, and storage medium provided by the embodiments of the present application mainly have the following beneficial effects:
  • After receiving the service data detection task, this application can automatically realize the acquisition, analysis, and identification of the service data to be detected, realize modular detection through detection elements, and automatically realize data detection in different dimensions.
  • the detection will be more comprehensive, intelligent, and more efficient.
  • it can reduce the manpower input.
  • For business data running online, it can monitor abnormal changes of the indicators in real time, which helps to find data abnormalities earlier and in time, and improves the availability, stability, and accuracy of business data.
  • Figure 1 is an exemplary system architecture diagram to which the present application can be applied;
  • Fig. 2 is a flowchart of an embodiment of a service data quality detection method according to the present application
  • Fig. 3 is a schematic structural diagram of an embodiment of a service data quality detection device according to the present application.
  • Fig. 4 is a schematic structural diagram of an embodiment of a computer device according to the present application.
  • the embodiment of the present application provides an embodiment of a service data quality detection method.
  • the service data quality detection method includes the following steps:
  • S201 Receive a business data detection task, and obtain corresponding detection parameters according to the business data detection task, where the detection parameters include at least a data table name, a library name, a detection queue, and a detection type;
  • S202 Access a database based on the data table name and the library name and determine the data table, obtain the service data to be detected stored in the database according to the data table, and perform metadata analysis and metadata identification on the service data to be detected to obtain analytical identification data;
  • S203 Determine at least one detection element according to the detection type, and determine the allocated resource according to the detection queue, so that the detection element detects the analytical identification data based on the allocated resource, and outputs a detection result.
  • the service data quality detection device includes: a parameter acquisition module 301, a data acquisition module 302, and a detection module 303.
  • the parameter acquisition module 301 is configured to receive a business data detection task, and obtain corresponding detection parameters according to the business data detection task.
  • the detection parameters include at least a data table name, a library name, a detection queue, and a detection type;
  • the data acquisition module 302 is configured to access the database and determine the data table based on the data table name and the library name, obtain the service data to be detected stored in the database according to the data table, and perform metadata analysis and metadata identification on the service data to be detected to obtain analytical identification data;
  • the detection module 303 is configured to determine at least one detection element according to the detection type, and determine the allocated resources according to the detection queue, so that the detection element is based on the The allocated resources detect the analytical identification data and output the detection result.
  • FIG. 4 shows the basic structure block diagram of the computer equipment.
  • the computer device 4 includes a memory 41, a processor 42, and a network interface 43 that communicate with each other through a system bus.
  • the memory 41 stores computer readable instructions, and the processor 42 implements the following steps when executing the computer readable instructions:
  • receiving a business data detection task and acquiring corresponding detection parameters, the detection parameters including at least a data table name, a library name, a detection queue, and a detection type;
  • accessing the database and determining the data table based on the data table name and the library name, obtaining the service data to be detected, and performing metadata analysis and metadata identification on it to obtain analytical identification data;
  • determining at least one detection element according to the detection type, and determining the allocated resources according to the detection queue, so that the detection element detects the analytical identification data based on the allocated resources, and outputs a detection result.
  • the embodiment of the present application also provides an embodiment of a computer-readable storage medium, the computer-readable storage medium stores computer-readable instructions, and when the computer-readable instructions are executed by a processor, the processor executes the following step:
  • receiving a business data detection task and acquiring corresponding detection parameters, the detection parameters including at least a data table name, a library name, a detection queue, and a detection type;
  • accessing the database and determining the data table based on the data table name and the library name, obtaining the service data to be detected, and performing metadata analysis and metadata identification on it to obtain analytical identification data;
  • determining at least one detection element according to the detection type, and determining the allocated resources according to the detection queue, so that the detection element detects the analytical identification data based on the allocated resources, and outputs a detection result.
  • the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105.
  • the network 104 is used to provide a medium for communication links between the terminal devices 101, 102, 103 and the server 105.
  • the network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, and so on.
  • the user can use the terminal devices 101, 102, and 103 to interact with the server 105 through the network 104 to receive or send messages and so on.
  • Various communication client applications such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, and social platform software, can be installed on the terminal devices 101, 102, and 103.
  • the terminal devices 101, 102, 103 can be various electronic devices with a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop portable computers, desktop computers, and the like.
  • the server 105 may be a server that provides various services, for example, a background server that provides support for pages displayed on the terminal devices 101, 102, and 103.
  • the service data quality detection method provided by the embodiment of the present application is generally executed by a server, and accordingly, the service data quality detection device is generally set in the server.
  • terminal devices, networks, and servers in FIG. 1 are merely illustrative. There can be any number of terminal devices, networks, and servers according to implementation needs.
  • FIG. 2 shows a flowchart of an embodiment of the service data quality detection method according to the present application.
  • the described service data quality detection method includes the following steps:
  • S201 Receive a business data detection task, and obtain corresponding detection parameters according to the business data detection task, where the detection parameters include at least a data table name, a library name, a detection queue, and a detection type;
  • S202 Access a database based on the data table name and the library name and determine the data table, obtain the service data to be detected stored in the database according to the data table, and perform metadata analysis and metadata identification on the service data to be detected to obtain analytical identification data;
  • S203 Determine at least one detection element according to the detection type, and determine the allocated resource according to the detection queue, so that the detection element detects the analytical identification data based on the allocated resource, and outputs a detection result.
  • Regarding step S201, before a newly developed data table goes online, or after an online data table is updated, the business data is detected to ensure that abnormal data in the data table can be discovered in time, so that the newly developed data table meets the online standard of the business, or the updated data table continues to meet the online standard of the business.
  • the business data detection task can be submitted by the task submission terminal, for example, the user can submit it through a Web page or terminal.
  • the task information can be written into a task table of a relational database.
  • the data detection terminal regularly accesses the task table to confirm whether there is a detection task to be performed.
  • the access to the task table can be achieved by setting up daemon processes. Multiple daemon processes can be set up for business data quality detection; they run on different nodes of the cluster as different virtual users (with different data permissions), and each daemon process initiates an access request at regular intervals to confirm whether there are tasks to be executed.
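  • As a non-limiting illustration, the polling behaviour described above can be sketched in Python as follows; the task table name detection_task, its columns, and the sqlite3 driver are assumptions used only for the example, not the actual schema of the application.

```python
# Minimal sketch of a polling daemon, assuming a hypothetical "detection_task"
# table in a relational database; table and column names are illustrative only.
import time
import sqlite3  # stand-in for the actual relational database driver

POLL_INTERVAL_SECONDS = 60

def fetch_pending_tasks(conn):
    """Return detection tasks that have not been picked up yet."""
    cur = conn.execute(
        "SELECT task_id, table_name, db_name, queue, detect_type "
        "FROM detection_task WHERE status = 'pending' ORDER BY task_id"
    )
    return cur.fetchall()

def run_daemon(db_path, virtual_user):
    conn = sqlite3.connect(db_path)
    while True:
        for task in fetch_pending_tasks(conn):
            task_id = task[0]
            # Mark the task as taken by this daemon (virtual user) before detecting.
            conn.execute(
                "UPDATE detection_task SET status = 'running', owner = ? "
                "WHERE task_id = ?",
                (virtual_user, task_id),
            )
            conn.commit()
            # ... hand the task over to the detection elements here ...
        time.sleep(POLL_INTERVAL_SECONDS)
```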
  • the detection operations are sequentially performed on the business data detection tasks in the task table.
  • the business data detection task involves information such as the data table that needs to be detected and the detection type, which specifically corresponds to the detection parameters, including the data table name, library name, detection queue, and detection type; the resources used for each detection and the content to be detected can therefore be determined from the business data detection task.
  • the data table name is used to determine the data table to be tested.
  • the database name is used to determine the database storing the data table to be tested.
  • the detection queue is used to determine at least one processing queue from a number of existing processing queues to perform detection operations.
  • Each processing queue is assigned independent detection resources; a queue can be selected according to the amount of data contained in the data table to be detected, and different detection tasks can be assigned to different detection queues so that they are detected at the same time.
  • at least one default processing queue can be automatically selected according to the library name and preset configuration information.
  • the detection type is used to determine what kind of detection is performed for the data table to be detected, such as detecting whether the overall data volume, the value (or the value range) of one or some data fields is abnormal.
  • the detection type includes at least a statistical type and a predictive type.
  • When the detection type is a statistical type, it specifically refers to performing statistical operations on one or several statistical items of the data table, and the statistical results can be used for data monitoring during business operation; when the detection type is a predictive type, it specifically refers to performing anomaly detection on the data table based on an anomaly detection model.
  • the statistical type includes descriptive statistics, trend statistics, comparison statistics, and the like.
  • Descriptive statistics and trend statistics automatically calculate multiple preset indicators: for numerical fields these can include the record count, maximum, minimum, mean, quantiles, saturation, and other indicators; for non-numerical fields they can include indicators such as the record count and saturation; and for enumerated-value fields they can include indicators such as the distribution of each enumerated value. When testing, some of the indicators can be selected: on the one hand, the indicators are used to summarize and describe the data; on the other hand, each dimension of the table is checked for abnormal data by judging whether the value of the indicator is reasonable or changes abnormally. The detection results of some indicators can therefore be used for subsequent data monitoring, and the detection results of other indicators can be used to improve the descriptive information of the data table.
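  • As an illustration, the indicators above for a single numeric field could be computed with PySpark roughly as sketched below; the database, table, and column names (demo_db.demo_table, amount) are assumptions made for the example.

```python
# Illustrative sketch of the descriptive-statistics indicators (record count,
# max, min, mean, quantiles, saturation) for one numeric field using PySpark.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("descriptive_stats").getOrCreate()
df = spark.table("demo_db.demo_table")  # hypothetical library/table name

total = df.count()
stats = df.agg(
    F.count("amount").alias("non_null_records"),  # counts non-null values only
    F.max("amount").alias("max"),
    F.min("amount").alias("min"),
    F.avg("amount").alias("mean"),
).collect()[0]

# approxQuantile returns the requested quantiles (here the quartiles and median).
q25, q50, q75 = df.approxQuantile("amount", [0.25, 0.5, 0.75], 0.01)

# Saturation: the share of rows in which the field is populated.
saturation = stats["non_null_records"] / total if total else 0.0
```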
  • Comparison statistics are carried out through preset field inspection rules, such as rules on preset enumeration value ranges, data type value ranges, field encodings, and so on. For example, a formatted mobile phone number field should be an 11-digit pure number, and the value range of a gender field cannot exceed male, female, and unknown.
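  • A minimal Python sketch of such preset field inspection rules is given below; the field names and the rule set are illustrative assumptions rather than the configuration actually used by the application.

```python
# Minimal sketch of preset field-inspection rules such as those described above.
import re

FIELD_RULES = {
    # A formatted mobile phone number field should be an 11-digit pure number.
    "mobile_phone": lambda v: bool(re.fullmatch(r"\d{11}", v or "")),
    # The value range of a gender field cannot exceed male, female, unknown.
    "gender": lambda v: v in {"male", "female", "unknown"},
}

def check_row(row: dict) -> dict:
    """Return a mapping of field name -> True/False for each configured rule."""
    return {field: rule(row.get(field)) for field, rule in FIELD_RULES.items()}

# Example usage
print(check_row({"mobile_phone": "13800000000", "gender": "other"}))
# -> {'mobile_phone': True, 'gender': False}
```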
  • The anomaly detection model may adopt an isolation forest anomaly detection model, in which "outliers" are defined as points that are easily isolated: points that are sparsely distributed and far away from high-density groups.
  • The embodiment of this application randomly selects features and split values on the data set formed by the service data to be detected and constructs multiple random trees. Since the "abnormal points" are more sparsely distributed, they are easier to distinguish and their distance from the root node is shorter, so that abnormal data can be detected. Compared with manual evaluation, the judgment standard is easier to define when using model prediction, and the characteristics of different data can be fully considered.
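  • For illustration only, the sketch below performs isolation-forest anomaly detection using scikit-learn's IsolationForest as a stand-in for the model described here; the feature matrix is made-up example data, not business data from the application.

```python
# Sketch of anomaly detection with an isolation forest.
import numpy as np
from sklearn.ensemble import IsolationForest

# Rows = records of the data table to be detected, columns = numeric indicators.
X = np.array([
    [1000, 0.98],
    [1010, 0.97],
    [ 995, 0.99],
    [  12, 0.10],   # sparse, far from the dense group -> likely an outlier
])

model = IsolationForest(n_estimators=100, contamination="auto", random_state=0)
labels = model.fit_predict(X)   # -1 marks an anomaly, 1 marks a normal record
print(labels)                   # e.g. [ 1  1  1 -1]
```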
  • In addition, the embodiment of the present application considers date factors, such as the month, whether the day is a working day, and whether it is a holiday, so as to avoid reporting normal data fluctuations as abnormal.
  • step S202 there may be only one or more databases to be accessed.
  • the data table and the service data to be detected may be stored in one or more databases.
  • The metadata analysis refers to obtaining the table-building statement of the data table in the system by executing a Hive DDL command, and obtaining, according to the table-building statement, the column information of the business data to be detected (which columns exist, the column types, and the column remarks), the table information (table creation time, data compression format, etc.), the table data information (whether there are partitions, the number of files, the file sizes, etc.), and so on, which are then parsed by a Python program and stored as structured data for subsequent use.
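  • A simplified Python sketch of this step is shown below, assuming the Hive CLI is available and using the Hive DDL command SHOW CREATE TABLE to retrieve the table-building statement; the parsing regular expression is only a rough illustration and will not cover every Hive DDL variant.

```python
# Simplified sketch of the metadata-analysis step: fetch the table-building
# statement via Hive DDL and parse column names, types, and comments into
# structured data.
import re
import subprocess

def fetch_create_statement(db_name: str, table_name: str) -> str:
    # Assumes the hive CLI is on the PATH; a HiveServer2 client works too.
    result = subprocess.run(
        ["hive", "-e", f"SHOW CREATE TABLE {db_name}.{table_name}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def parse_columns(create_stmt: str):
    """Extract (column, type, comment) triples from a CREATE TABLE statement."""
    pattern = re.compile(
        r"`(\w+)`\s+(\w+(?:\(\d+(?:,\d+)?\))?)(?:\s+COMMENT\s+'([^']*)')?"
    )
    return pattern.findall(create_stmt)
```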
  • Metadata recognition can automatically identify data types.
  • When business data is imported into Hive from a relational database, values of all types (numerical values, dates, etc.) are stored in string format, that is, stored as text fields.
  • The metadata identification of this application embodiment uses the data parsed out by metadata analysis and judges the true type of a text field through regular expressions, thereby recognizing the original type of the business data; specifically, it can recognize data types such as integers, floating-point numbers, and dates. For example, a string starting with + or - and followed only by the digits 0-9 is considered a numeric type; a string of the form xxxx-yy-zz, where x, y, and z are all positive integers with values in a reasonable range, is considered a date type.
  • In this way, the analytical identification data is obtained; the analytical identification data is structured data with a definite data type.
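  • The regular-expression type judgement can be sketched as follows; the exact patterns and range checks are assumptions chosen to match the example in the text, not the patterns used by the application.

```python
# Hedged sketch of regex-based type inference for text fields:
# +/- followed only by digits -> numeric; xxxx-yy-zz in range -> date.
import re

NUMERIC_RE = re.compile(r"^[+-]?\d+(\.\d+)?$")
DATE_RE = re.compile(r"^(\d{4})-(\d{1,2})-(\d{1,2})$")

def infer_type(value: str) -> str:
    if NUMERIC_RE.match(value):
        return "numeric"
    m = DATE_RE.match(value)
    if m:
        year, month, day = (int(g) for g in m.groups())
        if 1 <= month <= 12 and 1 <= day <= 31:
            return "date"
    return "string"

print(infer_type("+12345"))      # numeric
print(infer_type("2020-08-31"))  # date
print(infer_type("NULL"))        # string
```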
  • the obtaining of the service data to be detected stored in the database according to the data table includes: determining the amount of data contained in the data table, and judging whether the data amount of the data table is greater than a preset threshold; when the amount of data in the data table is not greater than the preset threshold, the service data to be detected is obtained directly according to the data table; otherwise, a preset number of records is randomly extracted from the data table to generate a temporary data table, and the service data to be detected is obtained according to the temporary data table.
  • The preset number does not exceed the preset threshold, for example 300,000.
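  • The size check and sampling step could look roughly like the PySpark sketch below; the table name and the reuse of 300,000 for both the threshold and the sample size follow the example above and remain configurable assumptions.

```python
# Sketch of the size check and sampling step: if the table holds more rows
# than a preset threshold, randomly draw a preset number of rows into a
# temporary table and detect against that instead.
from pyspark.sql import SparkSession

PRESET_THRESHOLD = 300_000
PRESET_SAMPLE_SIZE = 300_000

spark = SparkSession.builder.appName("sampling").getOrCreate()
df = spark.table("demo_db.demo_table")  # hypothetical table to be detected

row_count = df.count()
if row_count > PRESET_THRESHOLD:
    fraction = min(1.0, PRESET_SAMPLE_SIZE / row_count)
    sampled = (
        df.sample(withReplacement=False, fraction=fraction, seed=42)
          .limit(PRESET_SAMPLE_SIZE)
    )
    sampled.createOrReplaceTempView("tmp_table_to_detect")  # temporary data table
    data_to_detect = spark.table("tmp_table_to_detect")
else:
    data_to_detect = df
```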
  • For detection work such as metadata identification, which involves compatibility between data types (for example, a column in which most values are numbers but a few are strings can only be classified as a string as a whole), it is in principle sufficient to randomly sample part of the data to make the judgment.
  • Although the accuracy of the detection results is lower than with full data, sampling can effectively reduce the detection time of the detection element and improve the detection stability of the detection element when faced with a large data table; in that case the use of temporary data tables is especially effective.
  • In addition, the detection efficiency and stability can also be improved by optimizing the Spark resource parameter configuration and by using Spark DataFrame instead of SparkSQL in part of the calculation process.
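  • To illustrate the SparkSQL versus DataFrame point, the sketch below computes the same saturation indicator both ways; the table and column names are assumptions, and the DataFrame form avoids building and parsing a SQL string for each generated check.

```python
# The same saturation indicator expressed once as SparkSQL and once through
# the DataFrame API.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("df_vs_sql").getOrCreate()
df = spark.table("demo_db.demo_table")   # hypothetical table
df.createOrReplaceTempView("t")

# SparkSQL version
sql_result = spark.sql(
    "SELECT COUNT(amount) / COUNT(*) AS saturation FROM t"
).collect()[0]["saturation"]

# DataFrame version of the same indicator
df_result = df.agg(
    (F.count("amount") / F.count(F.lit(1))).alias("saturation")
).collect()[0]["saturation"]
```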
  • In some embodiments, the method further includes: determining the minimum amount of data required by each of the detection elements to perform detection; judging whether the minimum amount of data required by each of the detection elements is greater than the amount of data contained in the data table; and eliminating the detection elements whose required minimum amount of data is greater than the amount of data contained in the data table.
  • Specifically, some detection elements have requirements for the amount of data to be detected. For example, when the data table is modeled to detect abnormal data, too small an amount of data will make the model training process abnormal and lead to inaccurate data detection; at this time, the model evaluation will be skipped, and only corresponding records will be made in the output detection result table.
  • In some embodiments, before the determining of at least one detection element according to the detection type, the method further includes: obtaining preset special character recognition configuration information, and performing metadata identification according to the special character recognition configuration information.
  • This step can be used to improve the accuracy of metadata identification. For example, when a "NULL" character string appears in a date-type field, the field may be mistakenly recognized as a character string rather than as a date type.
  • The preset special character configuration information can be used to achieve accurate recognition of similar special characters, extra spaces at the beginning and end of strings, and the like.
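  • A minimal sketch of how such configuration might be applied before the type of a field is judged is shown below; the set of null tokens is an assumption for the example, not the application's actual configuration.

```python
# Values listed in the special-character configuration (e.g. the literal
# string "NULL") and surrounding whitespace are normalised away before the
# type of a field is judged.
SPECIAL_NULL_TOKENS = {"NULL", "null", "N/A", ""}

def normalise(value):
    """Strip leading/trailing spaces and map configured null tokens to None."""
    if value is None:
        return None
    cleaned = value.strip()
    return None if cleaned in SPECIAL_NULL_TOKENS else cleaned

print(normalise("  2020-08-31 "))  # '2020-08-31'
print(normalise("NULL"))           # None
```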
  • the allocated resources of the detection queue may include a processor, a storage space, and the like.
  • The detection result data output by the detection element after detecting the analytical identification data can be written into the PG database, and the result data can be viewed or downloaded through reports. Since business data is often connected to multiple institutions or departments, different reports or result data can be generated for different institutions to ensure data security and privacy. In addition, the log information is updated in real time while the detection element performs detection.
  • As mentioned above, daemon processes are used to confirm whether there is a task to be executed; correspondingly, in this step the detection elements are controlled through the daemon process to perform the detection sequentially.
  • the detection types include statistical and predictive types.
  • Each detection type corresponds to a detection element.
  • the corresponding detection elements may include descriptive statistical elements, trend statistical elements, comparison statistical elements, model prediction elements, and so on.
  • these components exist in the form of encapsulated modules. Specifically, each component is an encapsulated SQL code automatically generated based on the type of detection to be performed, and the detection is executed through Spark.
  • The detection type parameter is used to determine the detection elements to be executed by the daemon process. For example, if the detection type parameter value is "all", all detection elements will be executed in sequence; if the detection type parameter value is "descriptive statistics", only the descriptive statistics element will be executed. In the embodiment of the application, when assigning values to the detection type, the detection element type can be identified by numbers, such as "0" for "all", "1" for "descriptive statistics element", "2" for "trend statistics element", "3" for "model prediction element", and so on. Of course, other identification methods can also be used in other embodiments, which are not limited here.
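  • The numeric codes above map naturally onto a small lookup, sketched below; the element names are placeholders standing in for the encapsulated components.

```python
# Sketch of mapping the numeric detection-type codes to detection elements.
DETECTION_ELEMENTS = {
    "1": ["descriptive_statistics"],
    "2": ["trend_statistics"],
    "3": ["model_prediction"],
}
DETECTION_ELEMENTS["0"] = sorted(          # "0" means "all"
    {e for elems in DETECTION_ELEMENTS.values() for e in elems}
)

def resolve_elements(detect_type_code: str):
    return DETECTION_ELEMENTS.get(detect_type_code, [])

print(resolve_elements("0"))  # all elements
print(resolve_elements("1"))  # ['descriptive_statistics']
```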
  • execution components for metadata analysis and metadata recognition may also be embodied in the form of encapsulated components, corresponding to metadata analysis components and metadata identification components, respectively.
  • the problem-discovery components can also be configured to summarize and output data abnormalities.
  • The problem discovery component can automatically collect and summarize possible problems in the data table based on the detection data obtained by the descriptive statistics, trend statistics, comparison statistics, and model prediction components, and at the same time classify the "severity" of each data problem so that a hierarchical display can be made; for example, empty tables and primary key conflicts are serious problems, and a field saturation below 30% is a problem of general severity.
  • When displaying data according to the severity of the problem, a filter can be provided or the data can be split into different charts for display. For more serious data problems, reminders or alarms can be set, for example by connecting to instant messaging software, mailboxes, and the like.
  • The problem discovery component is not always necessary. For example, in some scenarios, for the needs of exploring data and understanding a data overview, only statistical information such as the saturation of each field and the maximum, minimum, and average values of numeric fields needs to be output, and no abnormality information is involved. Therefore, after determining at least one detection element according to the detection type, the embodiment of this application also includes determining whether each detection element is involved in data abnormality detection; if it is involved, the problem discovery element is loaded, and otherwise it is not loaded. In the embodiments of the present application, by modularizing each part involved in performing data detection, calling is convenient and the flexibility of detection is improved.
  • In some embodiments, before the detection element is caused to detect the analytical identification data based on the allocated resources, the method further includes: when it is determined that there are at least two detection elements according to the detection type, determining whether there is a dependency relationship between the detection elements, and if there is a dependency relationship, determining the execution order of the detection elements according to the dependency relationship. Specifically, when there are multiple detection elements, the detection of some detection elements may need to follow a sequence. For example, the model prediction module depends on the trend statistics module, and the dependency relationship between the two is stored; in this case the trend statistics module is executed before the model prediction module. The aforementioned problem discovery module relies on all previous detection elements, so the problem discovery module is executed last.
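  • One way to derive such an execution order from stored dependency relations is a topological sort, sketched below with Python's graphlib; the dependency map is an assumption that mirrors the example in the text.

```python
# Ordering detection elements by their stored dependency relations
# (model prediction after trend statistics, problem discovery last).
from graphlib import TopologicalSorter  # Python 3.9+

DEPENDS_ON = {
    "descriptive_statistics": set(),
    "trend_statistics": set(),
    "comparison_statistics": set(),
    "model_prediction": {"trend_statistics"},
    "problem_discovery": {
        "descriptive_statistics", "trend_statistics",
        "comparison_statistics", "model_prediction",
    },
}

execution_order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(execution_order)
# e.g. ['descriptive_statistics', 'trend_statistics', 'comparison_statistics',
#       'model_prediction', 'problem_discovery']
```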
  • In some embodiments, before the detection element is caused to detect the analytical identification data based on the allocated resources, the method further includes: acquiring and analyzing auxiliary parameters, judging whether the assignment of each parameter item in the auxiliary parameters is empty, screening the service data to be detected on which metadata analysis and metadata identification are performed according to the parameter items whose assignment is not empty, and at the same time screening the detection elements determined according to the detection type according to the parameter items whose assignment is not empty.
  • the detection parameters may also include auxiliary parameters, which may be user-defined parameters.
  • Specifically, the auxiliary parameters may include parameter items such as the test field, numeric fields, character fields, enumerated fields, the business date field, conditions, the primary key, and the virtual user.
  • The business data to be detected on which metadata analysis and metadata identification are performed can be screened according to the assignment of these parameter items, and the detection elements determined according to the detection type can also be screened. The following is an expanded description of these parameter items.
  • The test field is used to specify the fields to be detected. If the parameter is empty, all fields are evaluated by default; if the parameter is not empty, the subsequent metadata analysis and metadata identification are performed only for the specified fields to be detected.
  • Numeric fields are used to specify which fields are numeric, character fields are used to specify which fields are character types, and enumerated fields are used to specify which fields are enumerated. If these three parameters are empty, the result of metadata identification is used; if they are not empty, metadata identification is performed only for the fields whose type is not specified.
  • The business date field is used to specify which field is treated as the business date. If the data to be detected contains this field, trend statistics and model prediction can be performed; if this parameter is empty, trend statistics and model prediction are not performed even if the detection type parameter contains them.
  • condition parameter is used to identify whether to filter the data in the data table, such as where condition filtering is performed on the test table, if the parameter is empty, no condition filtering is performed;
  • the primary key is used to identify whether to perform a uniqueness test on the data table. If this parameter is entered, the uniqueness test of the primary key or the combined primary key will be performed; if the parameter is empty, the primary key uniqueness test will not be performed;
  • the virtual user is used to specify the virtual user to execute the script during data detection. If it is empty, the default virtual user is selected.
  • In some embodiments, the determining of at least one detection element according to the detection type includes: determining whether a detection element corresponding to the detection type currently exists; if it exists, directly acquiring the corresponding detection element; otherwise, generating a new detection element corresponding to the detection type based on the detection type.
  • Such configuration information may include the categories that need to be counted for descriptive statistics, the data ranges, the time ranges, the statistical rules, and so on.
  • each detection element to be executed is connected in series by a shell main program to form the overall function of data quality detection. Since the detection elements are independent of each other, modules can be added or deleted easily, or a part of modules can be selectively executed. Therefore, a new detection element can be generated according to the detection requirement or the corresponding detection element can be deleted when a certain detection requirement does not exist, with high flexibility.
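  • The sketch below approximates, in Python, how mutually independent detection elements might be registered and invoked in sequence; in the text this chaining is done by a shell main program, so the loop is only an analogy, and the element functions are placeholders rather than the application's components.

```python
# Analogy for stringing together mutually independent detection elements.
def descriptive_statistics(data):   # placeholder element
    return {"element": "descriptive_statistics", "ok": True}

def trend_statistics(data):         # placeholder element
    return {"element": "trend_statistics", "ok": True}

ELEMENT_REGISTRY = {
    "descriptive_statistics": descriptive_statistics,
    "trend_statistics": trend_statistics,
}

def run_detection(element_names, data):
    results = []
    for name in element_names:
        element = ELEMENT_REGISTRY.get(name)
        if element is None:          # elements can be added or removed freely
            continue
        results.append(element(data))
    return results

print(run_detection(["descriptive_statistics", "trend_statistics"], data=None))
```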
  • the service data quality detection method provided by the embodiment of the application can automatically realize the acquisition, analysis and identification of the service data to be detected after receiving the service data detection task, and realize modular detection through detection elements, and automatically realize data detection in different dimensions.
  • the detection will be more comprehensive, intelligent, and more efficient.
  • it can reduce manpower input.
  • For business data running online, it can monitor abnormal changes in indicators in real time, which helps to detect data abnormalities such as abnormal updates, incomplete updates, and missing updates earlier and in time, and to detect calculation logic abnormalities, abnormal changes in data indicators, and the like in time, thereby improving the availability, stability, and accuracy of business data.
  • In addition, the embodiments of the present application can automatically generate basic information, descriptive information, and the like of the data table, which helps to improve the efficiency of data exploration and data sorting. The solution is easy to use: tasks can be submitted through the Web page, reports are automatically generated from the detection results, and the data can be viewed and used in a visual way.
  • the service data to be detected obtained according to the data table can also be stored in a node of a blockchain.
  • the blockchain referred to in this application is a new application mode of computer technology such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • Blockchain is essentially a decentralized database: a chain of data blocks associated with one another using cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
  • This application can be used in many general-purpose or special-purpose computer system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multi-processor systems, microprocessor-based systems, set-top boxes, programmable consumer electronic devices, network PCs, minicomputers, mainframe computers, and distributed computing environments including any of the above systems or devices.
  • This application may be described in the general context of computer-executable instructions executed by a computer, such as a program module.
  • program modules include routines, programs, objects, components, data structures, etc. that perform specific tasks or implement specific abstract data types.
  • This application can also be practiced in distributed computing environments. In these distributed computing environments, tasks are performed by remote processing devices connected through a communication network.
  • program modules can be located in local and remote computer storage media including storage devices.
  • The aforementioned computer-readable storage medium may be a non-volatile storage medium or a volatile storage medium, such as a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM) or other non-volatile storage medium, or a random access memory (Random Access Memory, RAM), etc.
  • this application provides an embodiment of a service data quality detection device.
  • The device embodiment corresponds to the method embodiment shown in FIG. 2, and the device can specifically be applied to various electronic devices.
  • the service data quality detection device described in this embodiment includes: a parameter acquisition module 301, a data acquisition module 302, and a detection module 303.
  • the parameter acquisition module 301 is configured to receive a business data detection task, and acquire corresponding detection parameters according to the business data detection task.
  • the detection parameters include at least a data table name, a library name, a detection queue, and a detection type;
  • the data acquisition module 302 is configured to access the database and determine the data table based on the data table name and the library name, obtain the service data to be detected stored in the database according to the data table, and perform metadata analysis and metadata identification on the service data to be detected to obtain analytical identification data;
  • the detection module 303 is configured to determine at least one detection element according to the detection type, and determine the allocated resources according to the detection queue, so that the detection element is based on the The allocated resources detect the analytical identification data and output the detection result.
  • Before a newly developed data table goes online, or after an online data table is updated, the business data can be detected by the business data quality detection device to ensure that abnormal data in the data table can be discovered in time, so that the newly developed data table meets the online standards of the business, or the updated data table continues to meet the online standards of the business.
  • the business data detection task can be submitted by the task submission terminal, for example, the user can submit it through a Web page or terminal.
  • the task information can be written into a task table of a relational database.
  • the service data quality detection device regularly accesses the task table to confirm whether there is a detection task to be executed.
  • the access to the task table can be achieved by setting a daemon process. For the daemon process, please refer to the relevant content of the above method embodiment. Do not expand here.
  • the service data quality detection device sequentially performs detection operations on the service data detection tasks in the task table.
  • the business data detection task involves information such as the data table that needs to be detected and the detection type, which specifically corresponds to the detection parameters, including the data table name, library name, detection queue, and detection type.
  • The parameter acquisition module 301 can determine, through the business data detection task, the resources used for each detection and the content to be detected. For related content such as the data table name, library name, detection queue, and detection type, reference may be made to the foregoing method embodiment, which is not expanded here.
  • The metadata analysis performed by the data acquisition module 302 refers to executing a Hive DDL command to acquire the table-building statement of the data table in the system, and obtaining, according to the table-building statement, the column information of the business data to be detected (which columns exist, the column types, and the column remarks), the table information (table creation time, data compression format, etc.), the table data information (whether there are partitions, the number of files, the file sizes, etc.), and so on, which are then parsed and stored as structured data through a Python program for subsequent use.
  • the data acquisition module 302 can automatically identify data types through metadata recognition.
  • When business data is imported into Hive from a relational database, values of all types (numerical values, dates, etc.) are stored in string format, that is, stored as text fields.
  • The metadata identification of the embodiment of the application uses the data parsed out by metadata analysis, judges the true type of a text field through regular expressions, and identifies the original type of the business data.
  • Specifically, it can recognize data types such as integers, floating-point numbers, and dates. For example, a string starting with + or - and followed only by the digits 0-9 is considered a numeric type; a string of the form xxxx-yy-zz, where x, y, and z are all positive integers with values in a reasonable range, is considered a date type.
  • the analytical identification data is obtained, and the analytical identification data is structured data with a certain data type.
  • the data acquisition module 302 when the data acquisition module 302 acquires the service data to be detected stored in the database according to the data table, it is specifically used to determine the amount of data contained in the data table, and to determine the size of the data table. Whether the amount of data is greater than the preset threshold; when the amount of data in the data table is not greater than the preset threshold, the service data to be detected is directly obtained according to the data table, otherwise a preset amount of data is randomly selected from the data table A temporary data table is generated from the data, and the service data to be detected is obtained according to the temporary data table. For the relevant content of the temporary data table, reference may be made to the foregoing method embodiment, which will not be expanded here.
  • In some embodiments, the detection module 303 is further configured to determine the minimum amount of data required for each detection element to perform detection, judge whether the minimum amount of data required by each detection element is greater than the amount of data contained in the data table, and eliminate the detection elements whose required minimum amount of data is greater than the amount of data contained in the data table.
  • Specifically, some detection elements have requirements for the amount of data to be detected. For example, when the data table is modeled to detect abnormal data, too small an amount of data will make the model training process abnormal and lead to inaccurate data detection; at this time, the detection module 303 will skip the model evaluation and only make corresponding records in the output detection result table.
  • In some embodiments, the detection module 303 is further configured to obtain preset special character recognition configuration information before determining at least one detection element according to the detection type, and to perform metadata identification according to the special character recognition configuration information. This can improve the accuracy of metadata identification; for example, when a "NULL" character string appears in a date-type field, the field may be mistakenly recognized as a character string rather than as a date type.
  • The preset special character configuration information can be used to achieve accurate recognition of similar special characters, extra spaces at the beginning and end of strings, and the like.
  • the detection module 303 determines the resources allocated to the detection queue, it is specifically used to determine resources such as processors and storage space to be called when performing detection.
  • the detection module 303 detects and outputs the analytical identification data through detection elements.
  • Specifically, the detection result data can be written into the PG database, and the result data can be viewed or downloaded through reports. Since business data is often connected to multiple institutions or departments, different reports or result data can be generated for different institutions to ensure data security and privacy. In addition, the log information is updated in real time while the detection element performs detection.
  • the daemon process is used to confirm whether there is a task to be executed.
  • the detection module 303 correspondingly controls the detection elements through the daemon process to perform the detection sequentially.
  • the detection types include statistical and predictive types. Each type corresponds to a detection element.
  • Correspondingly, the detection elements called by the detection module 303 can include descriptive statistics elements, trend statistics elements, comparison statistics elements, model prediction elements, and so on. In the embodiments of the present application, these elements exist in the form of encapsulated modules: each element is a piece of encapsulated SQL code automatically generated based on the type of detection to be performed, and the detection is executed through Spark.
  • the detection module 303 uses the detection type parameter to determine the detection elements to be executed by the daemon process. For details, please refer to the above method embodiment, which will not be expanded here.
  • the above-mentioned execution components for metadata analysis and metadata recognition may also be embodied in the form of encapsulated components, corresponding to metadata analysis components and metadata identification components, respectively.
  • the data abnormality problem can also be summarized and output by configuring the problem discovery component. For details, please refer to the above method embodiment, which will not be expanded here.
  • In some embodiments, the detection module 303 is further configured to determine, before making the detection element detect the analytical identification data based on the allocated resources and when at least two detection elements are determined according to the detection type, whether there is a dependency relationship between the detection elements.
  • If there is a dependency relationship, the execution order of the detection elements is determined according to the dependency relationship.
  • For example, the model prediction module depends on the trend statistics module, and the dependency relationship between the two is stored; in this case the trend statistics module is executed before the model prediction module. The aforementioned problem discovery module relies on all previous detection elements, so the problem discovery module is executed last.
  • the parameter acquisition module 301 is further configured to acquire and analyze auxiliary parameters.
  • the data acquisition module 302 is also used to determine whether the assignment of each parameter item in the auxiliary parameter is empty, and filter the service data to be detected for metadata analysis and metadata identification according to the parameter items whose assignment is not empty,
  • the detection module 303 is further configured to screen the detection elements determined according to the detection type according to the parameter items whose values are not empty.
  • the auxiliary parameters may be user-defined parameters. Specifically, auxiliary parameters may include test fields, numeric fields, character fields, enumerated fields, business date fields, conditions, primary keys, virtual users and other parameter items.
  • the relevant content of the parameter item can refer to the above method embodiment, which will not be expanded here.
  • the service data to be detected to be subjected to metadata analysis and metadata identification can be screened according to the assignment of these parameter items, and detection elements determined according to the detection type can be screened.
  • When the detection module 303 determines at least one detection element according to the detection type, it is specifically configured to determine whether there currently exists a detection element corresponding to the detection type; if it exists, the corresponding detection element is directly obtained, and otherwise a new detection element corresponding to the detection type is generated based on the detection type.
  • each detection element is encapsulated into a mutually independent Python function, and each detection element can run independently.
  • each detection element to be executed is connected in series by a shell main program to form the overall function of data quality detection. Since the detection elements are independent of each other, modules can be added or deleted easily, or a part of modules can be selectively executed. Therefore, a new detection element can be generated according to the detection requirement or the corresponding detection element can be deleted when a certain detection requirement does not exist, with high flexibility.
  • the service data quality detection device provided in this application can automatically realize the acquisition, analysis and identification of the service data to be detected after receiving the service data detection task, and realize modular detection through detection elements, and automatically realize data detection in different dimensions. It is more comprehensive, smart, and more efficient, while reducing manpower input.
  • For business data running online, abnormal changes in indicators can be monitored in real time, which helps to detect data abnormalities such as abnormal updates, incomplete updates, and missing updates earlier and in time, and to detect calculation logic abnormalities, abnormal changes in data indicators, and the like in time, thereby improving the availability, stability, and accuracy of business data.
  • In addition, the embodiments of the present application can automatically generate basic information, descriptive information, and the like of the data table, which helps to improve the efficiency of data exploration and data sorting. The solution is easy to use: tasks can be submitted through the Web page, reports are automatically generated from the detection results, and the data can be viewed and used in a visual way.
  • FIG. 4 is a block diagram of the basic structure of the computer device in this embodiment.
  • the computer device 4 includes a memory 41, a processor 42, and a network interface 43 that communicate with each other via a system bus.
  • the memory 41 stores computer readable instructions.
  • The processor 42 implements the steps of the service data quality detection method described in the above method embodiment when executing the computer readable instructions, with beneficial effects corresponding to the foregoing service data quality detection method, which will not be expanded here.
  • It should be noted that the figure only shows the computer device 4 with the memory 41, the processor 42, and the network interface 43; however, it should be understood that not all of the illustrated components are required to be implemented, and more or fewer components may be implemented instead. Those skilled in the art can understand that the computer device here is a device that can automatically perform numerical calculation and/or information processing in accordance with preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), embedded devices, and so on.
  • the computer device may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • the computer device can interact with the user through a keyboard, a mouse, a remote control, a touch panel, or a voice control device.
  • the memory 41 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), random access Memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, etc.
  • the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or memory of the computer device 4.
  • In other embodiments, the memory 41 may also be an external storage device of the computer device 4, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, or the like equipped on the computer device 4.
  • the memory 41 may also include both the internal storage unit of the computer device 4 and its external storage device.
  • the memory 41 is generally used to store an operating system and various application software installed in the computer device 4, such as computer-readable instructions corresponding to the above-mentioned service data quality detection method.
  • the memory 41 can also be used to temporarily store various types of data that have been output or will be output.
  • The processor 42 may, in some embodiments, be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip.
  • the processor 42 is generally used to control the overall operation of the computer device 4.
  • the processor 42 is configured to run computer-readable instructions or process data stored in the memory 41, for example, run computer-readable instructions corresponding to the service data quality detection method.
  • the network interface 43 may include a wireless network interface or a wired network interface, and the network interface 43 is generally used to establish a communication connection between the computer device 4 and other electronic devices.
  • the present application also provides another implementation manner, that is, a computer-readable storage medium storing computer-readable instructions, where the computer-readable instructions can be executed by at least one processor to cause the at least one processor to execute the steps of the above-mentioned service data quality detection method, with the beneficial effects corresponding to the above-mentioned service data quality detection method, which are not expanded here.
  • the technical solution of this application, in essence or in the part that contributes to the existing technology, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several computer-readable instructions to enable a terminal device (which can be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the method described in each embodiment of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A service data quality detection method, detection apparatus, computer device, and storage medium. The method includes: receiving a service data detection task, and obtaining detection parameters including a data table name, a database name, a detection queue, and a detection type; accessing a database based on the data table name and the database name to determine a data table, and obtaining service data to be detected according to the data table; performing metadata parsing and metadata identification on the service data to be detected to obtain parsed and identified data; determining at least one detection element according to the detection type, determining allocated resources according to the detection queue, causing the detection element to detect the parsed and identified data based on the allocated resources, and outputting a detection result. The method can automatically implement data detection in different dimensions, and the detection is more comprehensive, intelligent, and efficient.

Description

Service data quality detection method and apparatus, computer device, and storage medium
This application claims priority to the Chinese patent application with application number 202010899921.1, filed with the Chinese Patent Office on August 31, 2020 and entitled "Service data quality detection method and apparatus, computer device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of big data technologies, and in particular to a service data quality detection method and apparatus, a computer device, and a storage medium.
Background
A monitoring system uses computer and control technologies to collect, store, and monitor data about an environment. Common monitoring systems such as Zabbix, Nagios, and Cacti belong to the category of operation and maintenance monitoring systems and can monitor indicators such as hardware information, CPU, memory, network, disk space performance, data volume, and data increments. The inventors found that these monitoring systems cannot support monitoring the quality of data that carries business logic. Data quality monitoring includes detecting whether the data volume, data values, and the like are abnormal. At present this detection work is handled entirely manually, which is time-consuming and labor-intensive and cannot comprehensively discover data problems.
Technical Problem
The purpose of the embodiments of this application is to propose a service data quality detection method and apparatus, a computer device, and a storage medium, so as to solve the problems of low detection efficiency and incomplete detection that exist when service data quality detection is performed manually in the prior art.
Technical Solution
To solve the above technical problem, an embodiment of this application provides a service data quality detection method, which adopts the following technical solution:
A service data quality detection method includes the following steps:
receiving a service data detection task, and obtaining corresponding detection parameters according to the service data detection task, where the detection parameters include at least a data table name, a database name, a detection queue, and a detection type;
accessing a database based on the data table name and the database name and determining a data table, obtaining service data to be detected that is stored in the database according to the data table, and performing metadata parsing and metadata identification on the service data to be detected to obtain parsed and identified data;
determining at least one detection element according to the detection type, determining allocated resources according to the detection queue, causing the detection element to detect the parsed and identified data based on the allocated resources, and outputting a detection result.
To solve the above technical problem, an embodiment of this application further provides a service data quality detection apparatus, which adopts the following technical solution:
A service data quality detection apparatus includes:
a parameter obtaining module, configured to receive a service data detection task and obtain corresponding detection parameters according to the service data detection task, where the detection parameters include at least a data table name, a database name, a detection queue, and a detection type;
a data obtaining module, configured to access a database based on the data table name and the database name and determine a data table, obtain service data to be detected that is stored in the database according to the data table, and perform metadata parsing and metadata identification on the service data to be detected to obtain parsed and identified data;
a detection module, configured to determine at least one detection element according to the detection type, determine allocated resources according to the detection queue, cause the detection element to detect the parsed and identified data based on the allocated resources, and output a detection result.
To solve the above technical problem, an embodiment of this application further provides a computer device, which adopts the following technical solution:
A computer device includes a memory and a processor, where the memory stores computer-readable instructions, and the processor implements the following steps when executing the computer-readable instructions:
receiving a service data detection task, and obtaining corresponding detection parameters according to the service data detection task, where the detection parameters include at least a data table name, a database name, a detection queue, and a detection type;
accessing a database based on the data table name and the database name and determining a data table, obtaining service data to be detected that is stored in the database according to the data table, and performing metadata parsing and metadata identification on the service data to be detected to obtain parsed and identified data;
determining at least one detection element according to the detection type, determining allocated resources according to the detection queue, causing the detection element to detect the parsed and identified data based on the allocated resources, and outputting a detection result.
To solve the above technical problem, an embodiment of this application further provides a computer-readable storage medium, which adopts the following technical solution:
A computer-readable storage medium stores computer-readable instructions, and when the computer-readable instructions are executed by a processor, the processor is caused to execute the following steps:
receiving a service data detection task, and obtaining corresponding detection parameters according to the service data detection task, where the detection parameters include at least a data table name, a database name, a detection queue, and a detection type;
accessing a database based on the data table name and the database name and determining a data table, obtaining service data to be detected that is stored in the database according to the data table, and performing metadata parsing and metadata identification on the service data to be detected to obtain parsed and identified data;
determining at least one detection element according to the detection type, determining allocated resources according to the detection queue, causing the detection element to detect the parsed and identified data based on the allocated resources, and outputting a detection result.
Beneficial Effects
Compared with the prior art, the service data quality detection method and apparatus, computer device, and storage medium provided by the embodiments of this application mainly have the following beneficial effects:
After receiving a service data detection task, this application can automatically obtain, parse, and identify the service data to be detected, and perform modular detection through detection elements, automatically implementing data detection in different dimensions. Detection becomes more comprehensive, intelligent, and efficient while reducing manual effort. In particular, for service data running online, abnormal changes in indicators can be monitored in real time, which helps discover data anomalies earlier and more promptly and improves the availability, stability, and accuracy of the service data.
Brief Description of the Drawings
To describe the solutions in this application more clearly, the following briefly introduces the drawings needed in the description of the embodiments. The drawings described below correspond to some embodiments of this application, and those of ordinary skill in the art may derive other drawings from these drawings without creative effort.
FIG. 1 is a diagram of an exemplary system architecture to which this application can be applied;
FIG. 2 is a flowchart of an embodiment of the service data quality detection method according to this application;
FIG. 3 is a schematic structural diagram of an embodiment of the service data quality detection apparatus according to this application;
FIG. 4 is a schematic structural diagram of an embodiment of the computer device according to this application.
Best Mode for Carrying Out the Invention
An embodiment of this application provides an embodiment of a service data quality detection method. Referring to FIG. 2, the service data quality detection method includes the following steps:
S201: receive a service data detection task, and obtain corresponding detection parameters according to the service data detection task, where the detection parameters include at least a data table name, a database name, a detection queue, and a detection type;
S202: access a database based on the data table name and the database name and determine a data table, obtain service data to be detected that is stored in the database according to the data table, and perform metadata parsing and metadata identification on the service data to be detected to obtain parsed and identified data;
S203: determine at least one detection element according to the detection type, determine allocated resources according to the detection queue, cause the detection element to detect the parsed and identified data based on the allocated resources, and output a detection result.
This embodiment also provides an embodiment of a service data quality detection apparatus. As shown in FIG. 3, the service data quality detection apparatus includes a parameter obtaining module 301, a data obtaining module 302, and a detection module 303.
The parameter obtaining module 301 is configured to receive a service data detection task and obtain corresponding detection parameters according to the service data detection task, where the detection parameters include at least a data table name, a database name, a detection queue, and a detection type; the data obtaining module 302 is configured to access a database based on the data table name and the database name and determine a data table, obtain service data to be detected that is stored in the database according to the data table, and perform metadata parsing and metadata identification on the service data to be detected to obtain parsed and identified data; and the detection module 303 is configured to determine at least one detection element according to the detection type, determine allocated resources according to the detection queue, cause the detection element to detect the parsed and identified data based on the allocated resources, and output a detection result.
An embodiment of this application further provides an embodiment of a computer device. FIG. 4 is a block diagram of the basic structure of the computer device. The computer device 4 includes a memory 41, a processor 42, and a network interface 43 that are communicatively connected to each other via a system bus. The memory 41 stores computer-readable instructions, and the processor 42 executes the following steps:
receiving a service data detection task, and obtaining corresponding detection parameters according to the service data detection task, where the detection parameters include at least a data table name, a database name, a detection queue, and a detection type;
accessing a database based on the data table name and the database name and determining a data table, obtaining service data to be detected that is stored in the database according to the data table, and performing metadata parsing and metadata identification on the service data to be detected to obtain parsed and identified data;
determining at least one detection element according to the detection type, determining allocated resources according to the detection queue, causing the detection element to detect the parsed and identified data based on the allocated resources, and outputting a detection result.
An embodiment of this application further provides an embodiment of a computer-readable storage medium. The computer-readable storage medium stores computer-readable instructions, and when the computer-readable instructions are executed by a processor, the processor is caused to execute the following steps:
receiving a service data detection task, and obtaining corresponding detection parameters according to the service data detection task, where the detection parameters include at least a data table name, a database name, a detection queue, and a detection type;
accessing a database based on the data table name and the database name and determining a data table, obtaining service data to be detected that is stored in the database according to the data table, and performing metadata parsing and metadata identification on the service data to be detected to obtain parsed and identified data;
determining at least one detection element according to the detection type, determining allocated resources according to the detection queue, causing the detection element to detect the parsed and identified data based on the allocated resources, and outputting a detection result.
Embodiments of the Invention
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field to which this application belongs. The terms used in the specification are only for the purpose of describing specific embodiments and are not intended to limit this application. The terms "include" and "have" and any variations thereof in the specification, claims, and the above description of the drawings are intended to cover non-exclusive inclusion. The terms "first", "second", and the like in the specification, claims, or the above drawings are used to distinguish different objects rather than to describe a specific order.
Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of this application. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor does it refer to an independent or alternative embodiment that is mutually exclusive with other embodiments. Those skilled in the art understand, both explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
To enable those skilled in the art to better understand the solutions of this application, the technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings.
As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 is a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links or fiber-optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages. Various communication client applications, such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, and social platform software, may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, for example, a background server that provides support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that the service data quality detection method provided in the embodiments of this application is generally executed by a server, and accordingly, the service data quality detection apparatus is generally disposed in the server.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers according to implementation needs.
With continued reference to FIG. 2, a flowchart of an embodiment of the service data quality detection method according to this application is shown. The service data quality detection method includes the following steps:
S201: receive a service data detection task, and obtain corresponding detection parameters according to the service data detection task, where the detection parameters include at least a data table name, a database name, a detection queue, and a detection type;
S202: access a database based on the data table name and the database name and determine a data table, obtain service data to be detected that is stored in the database according to the data table, and perform metadata parsing and metadata identification on the service data to be detected to obtain parsed and identified data;
S203: determine at least one detection element according to the detection type, determine allocated resources according to the detection queue, cause the detection element to detect the parsed and identified data based on the allocated resources, and output a detection result.
The above steps are described in detail below.
For step S201, business data detection is performed before a newly developed data table goes online, or after a data table that is already online is updated, so that abnormal data in the data table can be detected in time, enabling the newly developed data table to reach the online standard of the business, or enabling the updated data table to continue to meet the online standard of the business.
In this embodiment of the application, the service data detection task may be submitted by a task submission end, for example, by a user through a Web page or a terminal. When multiple service data detection tasks are submitted at the same time, the task information may be written into a task table of the relational database PG, and the data detection end accesses the task table periodically to determine whether there is a detection task to be executed. Access to the task table can be implemented by setting up daemon processes. Multiple daemon processes can be set up for business data quality detection; the daemon processes run on different nodes of the cluster as different virtual users (with different data permissions), and each daemon process initiates an access request at regular intervals to determine whether there is a task to be executed. When multiple pieces of task information exist in the task table, detection operations are performed on the service data detection tasks in the task table in sequence.
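As a rough, non-authoritative illustration of this polling pattern (not the actual implementation of the application), the Python sketch below assumes that PG refers to PostgreSQL, uses the psycopg2 driver, and invents a detection_task table and its columns; a real deployment would run one such daemon per virtual user on different cluster nodes.

    import time
    import psycopg2  # assumed PostgreSQL driver; any PG client would do

    POLL_INTERVAL_S = 60  # each daemon wakes up periodically

    def fetch_pending_tasks(conn):
        """Read unprocessed detection tasks from a hypothetical task table."""
        with conn.cursor() as cur:
            cur.execute(
                "SELECT task_id, table_name, db_name, queue_name, check_type "
                "FROM detection_task WHERE status = 'pending' ORDER BY task_id"
            )
            return cur.fetchall()

    def run_daemon(dsn, run_task):
        """Poll the task table and execute pending tasks one by one (sketch only)."""
        conn = psycopg2.connect(dsn)
        while True:
            for task in fetch_pending_tasks(conn):
                run_task(task)          # hand the task off to the detection pipeline
            time.sleep(POLL_INTERVAL_S)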
In this embodiment, a service data detection task involves information such as the data table to be detected and the detection type, which corresponds to the detection parameters, including the data table name, database name, detection queue, and detection type. The service data detection task determines the resources used for each detection and the content to be detected.
Specifically, the data table name is used to determine the data table to be detected.
The database name is used to determine the database in which the data table to be detected is stored.
The detection queue is used to determine at least one processing queue from several existing processing queues to perform the detection computation. Each processing queue is allocated independent detection resources. According to the amount of data contained in the data table to be detected, a detection queue with different detection resources can be selected; or, when there are multiple detection tasks, these tasks are allocated to different detection queues for parallel detection. Specifically, at least one default processing queue can be selected automatically according to the database name and preset configuration information.
The detection type is used to determine what kind of detection is performed on the data table to be detected, for example, detecting whether the overall data volume or the values (or value ranges) of one or more data fields are abnormal.
In this embodiment, the detection types include at least a statistical type and a predictive type. When the detection type is statistical, one or several statistical items are computed on the data table, and the statistical results can be used for data monitoring during business operation; when the detection type is predictive, anomaly detection is performed on the data table based on an anomaly detection model.
In a further embodiment, the statistical type includes descriptive statistics, trend statistics, comparison statistics, and the like.
Descriptive statistics and trend statistics automatically compute multiple preset indicators. For example, for numeric fields the indicators may include the record count, maximum, minimum, mean, quantiles, and saturation; for non-numeric fields, the record count and saturation; for enumeration-type fields, the distribution of each enumeration value. During detection, a subset of indicators can be selected: on the one hand, the indicators summarize and describe each dimension of the data table; on the other hand, whether the indicator values are reasonable or change abnormally is used to detect whether the data is abnormal. Therefore, the detection results of some indicators can be used for subsequent data monitoring, and the detection results of other indicators can be used to improve the descriptive information of the data table.
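Purely as an illustrative sketch of how such indicators might be computed with Spark (the application describes auto-generated, packaged SQL rather than this exact code), the PySpark snippet below profiles one hypothetical numeric column of a hypothetical table; the table name, column name, and the reading of saturation as the non-null ratio are assumptions.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("profile").getOrCreate()
    df = spark.table("demo_db.demo_table")        # hypothetical table
    col = "amount"                                # hypothetical numeric field

    total = df.count()
    stats = df.agg(
        F.count(col).alias("non_null"),           # F.count ignores nulls
        F.min(col).alias("min"),
        F.max(col).alias("max"),
        F.avg(col).alias("mean"),
    ).first()
    quantiles = df.approxQuantile(col, [0.25, 0.5, 0.75], 0.01)
    saturation = stats["non_null"] / total if total else 0.0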
Comparison statistics are performed through preset field check rules, such as enumeration value ranges, numeric value ranges, and field encodings. For example, a formatted mobile phone number field should consist of 11 pure digits, and the value range of a gender field should not exceed the three values male, female, and unknown.
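A minimal sketch of such field check rules in Python is given below; the 11-digit phone pattern and the three-value gender enumeration come from the example in the text, while the function names are invented for illustration.

    import re

    PHONE_RE = re.compile(r"^\d{11}$")                  # formatted mobile number: 11 digits
    GENDER_VALUES = {"male", "female", "unknown"}       # allowed enumeration values

    def check_phone(value: str) -> bool:
        """Return True if the value looks like a valid formatted phone number."""
        return bool(PHONE_RE.match(value))

    def check_gender(value: str) -> bool:
        """Return True if the value is within the allowed gender enumeration."""
        return value in GENDER_VALUES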
In a further embodiment, the anomaly detection model may be an isolation forest anomaly detection model. In the isolation forest algorithm used by this model, an "anomaly" is defined as an "outlier that is easy to isolate", that is, a point that is sparsely distributed and far from dense groups. In this embodiment of the application, features and split values are randomly selected on the data set formed by the service data to be detected to construct multiple random trees. Because anomalies are rarer and more sparsely distributed, they are easier to separate and end up closer to the root node, so abnormal data can be detected. Compared with manual evaluation, the judgment criteria are easier to define when model prediction is used, and the characteristics of different data can be fully considered. When the anomaly detection model is used, this embodiment of the application takes date factors into account, such as the month, whether the day is a working day, and whether it is a holiday, so as to avoid reporting normal data fluctuations as anomalies.
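The application names the isolation forest algorithm but not a specific library; as a hedged sketch only, the snippet below uses scikit-learn's IsolationForest on an invented daily row-count series and adds calendar features (month, weekend flag) so that ordinary weekday/weekend fluctuation is less likely to be flagged.

    import numpy as np
    import pandas as pd
    from sklearn.ensemble import IsolationForest

    # Hypothetical daily metric series with calendar features.
    df = pd.DataFrame({
        "biz_date": pd.date_range("2020-01-01", periods=120, freq="D"),
        "row_count": np.random.default_rng(0).normal(10000, 300, 120).round(),
    })
    df["month"] = df["biz_date"].dt.month
    df["is_weekend"] = (df["biz_date"].dt.dayofweek >= 5).astype(int)

    model = IsolationForest(n_estimators=100, contamination=0.05, random_state=0)
    df["anomaly"] = model.fit_predict(df[["row_count", "month", "is_weekend"]])
    suspect_days = df[df["anomaly"] == -1]["biz_date"]   # dates flagged as anomalous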
For step S202, there may be only one database accessed, or there may be multiple databases. Correspondingly, the data table and the service data to be detected may be stored in one or more databases.
In this embodiment, metadata parsing means executing hive ddl commands to obtain the table creation statements of the data tables in the system, and obtaining from those statements the column information (which columns, column types, column comments), table information (table time, data compression format, etc.), and table data information (whether there are partitions, number of files, file sizes, etc.) of the service data to be detected. The results are then parsed by a python program and stored as structured data for subsequent use.
Further, metadata identification can automatically identify data types. When business data is imported from a relational database into Hive, values of all types (numbers, dates, etc.) are stored in string format, that is, as text fields. In this embodiment of the application, metadata identification uses the output of metadata parsing and judges the real type of a text field through regular expressions, identifying the original type of the business data; specifically, data types such as integer, floating point, and date can be identified. For example, a string starting with + or - and followed entirely by digits 0-9 is considered numeric; a string of the form xxxx-yy-zz, where x, y, and z are positive integers within a reasonable range, is considered a date.
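A minimal Python sketch of this regex-based type identification is shown below; the exact patterns are assumptions that loosely follow the examples in the text (an optional sign followed by digits for numbers, xxxx-yy-zz for dates), and a production version would sample many values per column rather than inspect a single one.

    import re

    NUMERIC_RE = re.compile(r"^[+-]?\d+(\.\d+)?$")
    DATE_RE = re.compile(r"^\d{4}-\d{1,2}-\d{1,2}$")

    def infer_type(value: str) -> str:
        """Guess the original type of a text field loaded into Hive as a string."""
        if NUMERIC_RE.match(value):
            return "float" if "." in value else "int"
        if DATE_RE.match(value):
            y, m, d = (int(p) for p in value.split("-"))
            if 1 <= m <= 12 and 1 <= d <= 31:   # crude plausibility check
                return "date"
        return "string"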
After the above metadata parsing and metadata identification are performed on the service data to be detected, the parsed and identified data is obtained; it is structured data with determined data types.
In some embodiments, obtaining the service data to be detected that is stored in the database according to the data table includes: determining the amount of data contained in the data table, and judging whether the data volume of the data table is greater than a preset threshold; when the data volume of the data table is not greater than the preset threshold, obtaining the service data to be detected directly according to the data table; otherwise, randomly extracting a preset quantity of data from the data table to generate a temporary data table, and obtaining the service data to be detected according to the temporary data table.
Specifically, the preset quantity does not exceed the preset threshold, for example, 300,000 records. When a temporary data table is generated, metadata parsing and metadata identification are subsequently performed on the temporary data table. In actual processing, metadata identification involves compatibility between data types: for example, if a column is mostly numbers with a small number of strings, the whole column can only be classified as string, so in principle it is sufficient to randomly sample part of the data for the judgment. When business data detection is performed on the temporary data table, although the accuracy of the detection result is lower than with the full data, the detection time of the detection elements can be effectively reduced and their stability improved for very large data tables. The temporary data table is especially effective for detection processes that are time-consuming on full data and whose results do not need to be particularly precise (such as quantiles: exact quantiles involve sorting the full data, and exact values are generally not computed). In addition, in other embodiments, when the data volume of the data table is too large and the detection process is executed through Spark, the detection efficiency and stability can also be improved by optimizing the resource parameter configuration of Spark and by using Spark DataFrame operations instead of SparkSQL in some computation steps.
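As a rough sketch of the sampling step under the assumption that detection runs on Spark (the 300,000-row threshold comes from the example in the text, everything else is invented), the snippet below caps an oversized table into a temporary view:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sample_check").getOrCreate()
    THRESHOLD = 300_000                                    # example threshold from the text

    df = spark.table("demo_db.big_table")                  # hypothetical table
    row_count = df.count()
    if row_count > THRESHOLD:
        fraction = min(1.0, THRESHOLD / row_count * 1.2)   # oversample slightly, then cap
        df = df.sample(withReplacement=False, fraction=fraction, seed=42).limit(THRESHOLD)
    df.createOrReplaceTempView("tmp_table_for_check")      # acts as the temporary data table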
In some embodiments, after determining the amount of data contained in the data table, the method further includes: determining the minimum data volume required by each detection element to perform detection; judging whether the minimum data volume required by each detection element is greater than the data volume contained in the data table, and removing the detection elements whose required minimum data volume is greater than the data volume contained in the data table. In actual detection, some detection elements have requirements on the volume of the data to be detected. For example, when model prediction is performed on a data table to detect data anomalies, a data volume that is too small will make the model training process abnormal and the detection inaccurate; in this case the model evaluation is skipped and only a corresponding record is made in the output detection result table.
In some embodiments, before determining at least one detection element according to the detection type, the method further includes: obtaining preset special-character recognition configuration information, and performing metadata identification according to the special-character recognition configuration information. This step can improve the accuracy of metadata identification. For example, when the string "NULL" appears in a date-type field, the field may be misidentified as a string rather than a date type; accurate identification can be achieved through the preset special-character configuration information. Similar special characters include redundant spaces at the beginning and end of strings.
For step S203, the resources allocated to the detection queue may include processors, storage space, and the like.
The detection result data output by the detection elements after detecting the parsed and identified data can be written into the PG database, and the result data can be viewed or downloaded through reports. Since business data often interfaces with multiple institutions or departments, different reports or result data can be generated for different institutions to ensure data security and privacy. In addition, when the detection elements perform detection, the log information is updated in real time.
In this step, as mentioned above, daemon processes are used to determine whether there is a task to be executed. When it is determined that there is a task, this step correspondingly controls the detection elements through the daemon process so that they execute detection in sequence.
The above embodiments mention that the detection types include the statistical type and the predictive type, and each detection type corresponds to a detection element; correspondingly, the detection elements may include a descriptive statistics element, a trend statistics element, a comparison statistics element, a model prediction element, and the like. In this embodiment of the application, these elements exist in the form of packaged modules. Specifically, each element is packaged SQL code automatically generated based on the detection type to be performed, and the detection is executed through Spark.
In this embodiment, the detection type parameter is used to determine the detection elements to be executed by the daemon process. For example, if the parameter value of the detection type is "all", all detection elements are executed in sequence; if the parameter value is "descriptive statistics", only the descriptive statistics element is executed. When assigning values to the detection type parameter, this embodiment of the application can use numbers to identify the types of detection elements, for example, "0" for "all", "1" for "descriptive statistics element", "2" for "trend statistics element", and "3" for "model prediction element". Of course, other identification methods can also be used in other embodiments, which is not limited here.
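A minimal dispatch sketch for these numeric type codes is given below; the element functions are placeholders and the code-to-element mapping simply mirrors the example values in the text.

    def descriptive_stats(data):
        """Placeholder for the descriptive statistics element."""

    def trend_stats(data):
        """Placeholder for the trend statistics element."""

    def model_predict(data):
        """Placeholder for the model prediction element."""

    # Numeric codes for the detection type, as described in the text ("0" = all).
    ELEMENTS = {
        "1": [descriptive_stats],
        "2": [trend_stats],
        "3": [model_predict],
    }
    ELEMENTS["0"] = [descriptive_stats, trend_stats, model_predict]

    def select_elements(check_type: str):
        """Return the detection elements to run for a given detection-type code."""
        return ELEMENTS.get(check_type, [])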
In this embodiment of the application, the execution components for the above metadata parsing and metadata identification can also be embodied in the form of packaged elements, corresponding to a metadata parsing element and a metadata identification element, respectively.
Correspondingly, when data anomalies are detected, a problem discovery element can also be configured to summarize and output the data anomaly problems. Specifically, the problem discovery element can automatically collect and summarize possible problems in the data table based on the detection data obtained by the descriptive statistics, trend statistics, comparison statistics, and model prediction elements, and classify the "severity" of the data problems to facilitate layered display, for example: an empty table or a primary key conflict is a severe problem; a field whose saturation is below 30% is a moderately severe problem. When the data is displayed, filters are provided or the data is split into different charts according to the severity of the problems. For more serious data problems, reminders or alerts can be set, for example by integrating instant messaging software or email to send alert information. In this embodiment of the application, the problem discovery element is not mandatory: in some scenarios, for the purpose of exploring data and understanding its overview, only descriptive statistical information such as the saturation of each field and the maximum, minimum, and mean of numeric fields needs to be output, and no anomaly information is involved. Therefore, after determining at least one detection element according to the detection type, this embodiment of the application also determines whether each detection element involves data anomaly detection; if so, the problem discovery element is loaded, otherwise it is not. By modularizing each part involved in data detection, this embodiment of the application makes the modules easy to invoke and improves the flexibility of detection.
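Purely for illustration, the sketch below shows one possible way to grade detected problems by severity; the thresholds and field names mirror the examples in the text (empty table, primary key conflict, saturation below 30%), and everything else is assumed.

    SEVERE, GENERAL = "severe", "general"

    def classify_issues(profile: dict) -> list:
        """Map detection results to severity levels; thresholds are illustrative."""
        issues = []
        if profile.get("row_count", 0) == 0:
            issues.append(("empty table", SEVERE))
        if profile.get("pk_duplicates", 0) > 0:
            issues.append(("primary key conflict", SEVERE))
        for field, saturation in profile.get("saturation", {}).items():
            if saturation < 0.30:
                issues.append((f"saturation of {field} below 30%", GENERAL))
        return issues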
In some embodiments, before causing the detection element to detect the parsed and identified data based on the allocated resources, the method further includes: when at least two detection elements are determined according to the detection type, judging whether a dependency relationship exists between the detection elements, and if so, determining the execution order of the detection elements according to the dependency relationship. Specifically, when there are multiple detection elements, some of them may have to run in a certain order. For example, if the model prediction module depends on the trend statistics module, a dependency relationship exists between the two, and the trend statistics module is executed before the model prediction module; the aforementioned problem discovery module depends on all preceding detection elements, so it is executed last.
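A small sketch of resolving such an execution order with a topological sort is shown below; the dependency map is hypothetical and only mirrors the example in this paragraph.

    from graphlib import TopologicalSorter  # Python 3.9+

    # Hypothetical dependency map: model prediction needs trend statistics,
    # and the problem-discovery element runs after every other element.
    deps = {
        "model_predict": {"trend_stats"},
        "problem_discovery": {"descriptive_stats", "trend_stats",
                              "comparison_stats", "model_predict"},
    }
    order = list(TopologicalSorter(deps).static_order())
    # e.g. ['trend_stats', 'descriptive_stats', 'comparison_stats',
    #       'model_predict', 'problem_discovery']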
In some embodiments, before causing the detection element to detect the parsed and identified data based on the allocated resources, the method further includes: obtaining and parsing auxiliary parameters, judging whether the value of each parameter item in the auxiliary parameters is empty, filtering, according to the parameter items with non-empty values, the service data to be detected on which metadata parsing and metadata identification are to be performed, and at the same time filtering, according to the parameter items with non-empty values, the detection elements determined according to the detection type. Specifically, the detection parameters may further include auxiliary parameters, which may be user-defined and may include parameter items such as test fields, numeric fields, character fields, enumeration fields, business date field, condition, primary key, and virtual user. In actual detection, the service data to be detected and the detection elements determined according to the detection type can be filtered according to the values of these parameter items. These parameter items are described below; a minimal filtering sketch follows the list.
In this embodiment, the test fields parameter is used to specify the fields to be detected. If this parameter is empty, all fields are evaluated by default; if it is not empty, the subsequent metadata parsing and metadata identification are performed only on the specified fields.
The numeric fields parameter specifies which fields are numeric, the character fields parameter specifies which fields are character-type, and the enumeration fields parameter specifies which fields are enumeration-type. If these three parameters are empty, the results of metadata identification are used; if they are not empty, metadata identification is performed only on the fields whose types are not specified.
The business date field parameter specifies the business date field, which is then used as the business date. If the data to be detected contains this field, trend statistics and model prediction can be executed; if this parameter is empty, trend statistics and model prediction are not executed even if the detection type parameter includes them.
The condition parameter indicates whether the data in the data table is filtered, for example by applying a where-condition filter to the test table. If this parameter is empty, no condition filtering is performed.
The primary key parameter indicates whether a uniqueness test is performed on the data table. If this parameter is provided, a uniqueness test of the primary key or composite primary key is executed; if it is empty, the primary key uniqueness test is not executed.
The virtual user parameter specifies the virtual user for the script executed during data detection. If it is empty, the default virtual user is selected.
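The sketch below illustrates, under invented parameter names, how non-empty auxiliary parameters might filter fields and detection elements; it is not the application's implementation.

    def apply_aux_params(fields, elements, aux):
        """Filter fields and detection elements using non-empty auxiliary parameters."""
        if aux.get("test_fields"):          # only the listed fields are evaluated
            fields = [f for f in fields if f in aux["test_fields"]]
        if not aux.get("biz_date_field"):   # no business date: skip trend and model checks
            elements = [e for e in elements
                        if e not in ("trend_stats", "model_predict")]
        if not aux.get("primary_key"):      # no primary key given: skip the uniqueness test
            elements = [e for e in elements if e != "pk_uniqueness"]
        return fields, elements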
In some embodiments, determining at least one detection element according to the detection type includes: judging whether a detection element corresponding to the detection type currently exists; if so, directly obtaining the corresponding detection element; otherwise, generating a new detection element corresponding to the detection type based on the detection type. For example, a descriptive statistics element is generated based on the descriptive statistics detection type, specifically by obtaining the corresponding descriptive statistics configuration information, which may include the categories to be counted, the data range, the time range, the statistical rules, and so on, and generating the descriptive statistics element based on this configuration information. Each generated detection element is packaged as an independent Python function that can run on its own. During detection, the detection elements to be executed are chained together by one shell main program to form the overall data quality detection function. Because the detection elements are independent of each other, modules can be easily added or removed, or only part of the modules can be executed selectively. Therefore, new detection elements can be generated according to detection requirements, or the corresponding detection element can be deleted when a detection requirement no longer exists, providing high flexibility.
After receiving a service data detection task, the service data quality detection method provided in this embodiment of the application can automatically obtain, parse, and identify the service data to be detected, and perform modular detection through detection elements, automatically implementing data detection in different dimensions. Detection becomes more comprehensive, intelligent, and efficient while reducing manual effort. In particular, for service data running online, abnormal changes in indicators can be monitored in real time, which helps discover data anomalies such as abnormal updates, incomplete updates, and missed updates earlier and more promptly, as well as problems such as abnormal computation logic and abnormal changes in data indicators, improving the availability, stability, and accuracy of the service data. In addition, this embodiment of the application can automatically generate the basic information, descriptive information, and the like of the data table, which helps improve the efficiency of data exploration and data sorting. It is easy to use: tasks can be submitted through a Web page, reports are automatically generated from the detection results, and the data can be viewed and used in a visual way.
It should be emphasized that, to further ensure the privacy and security of the information, the service data to be detected obtained from the data table can also be stored in a node of a blockchain.
The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database, a chain of data blocks generated in association with one another using cryptographic methods, where each data block contains information about a batch of network transactions, used to verify the validity (anti-counterfeiting) of the information and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
This application can be used in numerous general-purpose or special-purpose computer system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronic devices, network PCs, minicomputers, mainframe computers, and distributed computing environments including any of the above systems or devices. This application can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform specific tasks or implement specific abstract data types. This application can also be practiced in distributed computing environments in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in local and remote computer storage media including storage devices.
Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by computer-readable instructions instructing the relevant hardware. The computer-readable instructions can be stored in a computer-readable storage medium, and when executed, the program may include the processes of the embodiments of the above methods. The aforementioned computer-readable storage medium may be a non-volatile storage medium, such as a magnetic disk, an optical disc, or a read-only memory (ROM), or a volatile storage medium, such as a random access memory (RAM).
It should be understood that although the steps in the flowchart of the drawings are displayed in sequence as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they may be executed in other orders. Moreover, at least some of the steps in the flowchart may include multiple sub-steps or stages, which are not necessarily completed at the same time but may be executed at different times, and their execution order is not necessarily sequential; they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
With further reference to FIG. 3, as an implementation of the method shown in FIG. 2, this application provides an embodiment of a service data quality detection apparatus. This apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus can be applied to various electronic devices.
As shown in FIG. 3, the service data quality detection apparatus of this embodiment includes a parameter obtaining module 301, a data obtaining module 302, and a detection module 303.
The parameter obtaining module 301 is configured to receive a service data detection task and obtain corresponding detection parameters according to the service data detection task, where the detection parameters include at least a data table name, a database name, a detection queue, and a detection type; the data obtaining module 302 is configured to access a database based on the data table name and the database name and determine a data table, obtain service data to be detected that is stored in the database according to the data table, and perform metadata parsing and metadata identification on the service data to be detected to obtain parsed and identified data; and the detection module 303 is configured to determine at least one detection element according to the detection type, determine allocated resources according to the detection queue, cause the detection element to detect the parsed and identified data based on the allocated resources, and output a detection result.
Specifically, before a newly developed data table goes online, or after a data table that is already online is updated, business data detection can be performed through the service data quality detection apparatus to ensure that abnormal data in the data table is detected in time, enabling the newly developed data table to reach the online standard of the business, or enabling the updated data table to continue to meet the online standard of the business.
In this embodiment of the application, the service data detection task may be submitted by a task submission end, for example, by a user through a Web page or a terminal. When multiple service data detection tasks are submitted at the same time, the task information may be written into a task table of the relational database PG, and the service data quality detection apparatus accesses the task table periodically to determine whether there is a detection task to be executed. Access to the task table can be implemented by setting up daemon processes; for details about the daemon processes, refer to the relevant content of the above method embodiment, which is not expanded here. When multiple pieces of task information exist in the task table, the service data quality detection apparatus performs detection operations on the service data detection tasks in the task table in sequence.
In this embodiment, a service data detection task involves information such as the data table to be detected and the detection type, which corresponds to the detection parameters, including the data table name, database name, detection queue, and detection type. Through the service data detection task, the parameter obtaining module 301 can determine the resources used for each detection and the content to be detected. For the related content of the data table name, database name, detection queue, and detection type, refer to the above method embodiment, which is not expanded here.
In this embodiment, there may be only one database accessed by the data obtaining module 302, or there may be multiple databases; that is, the data table and the service data to be detected may be stored in one or more databases.
In this embodiment, the metadata parsing performed by the data obtaining module 302 means executing hive ddl commands to obtain the table creation statements of the data tables in the system, and obtaining from those statements the column information (which columns, column types, column comments), table information (table time, data compression format, etc.), and table data information (whether there are partitions, number of files, file sizes, etc.) of the service data to be detected; the results are then parsed by a python program and stored as structured data for subsequent use.
Further, the data obtaining module 302 can automatically identify data types through metadata identification. When business data is imported from a relational database into Hive, values of all types (numbers, dates, etc.) are stored in string format, that is, as text fields. In this embodiment of the application, metadata identification uses the output of metadata parsing and judges the real type of a text field through regular expressions, identifying the original type of the business data; specifically, data types such as integer, floating point, and date can be identified. For example, a string starting with + or - and followed entirely by digits 0-9 is considered numeric; a string of the form xxxx-yy-zz, where x, y, and z are positive integers within a reasonable range, is considered a date.
After the data obtaining module 302 performs the above metadata parsing and metadata identification on the service data to be detected, the parsed and identified data is obtained; it is structured data with determined data types.
In some embodiments, when obtaining the service data to be detected that is stored in the database according to the data table, the data obtaining module 302 is specifically configured to determine the amount of data contained in the data table and judge whether the data volume of the data table is greater than a preset threshold; when the data volume of the data table is not greater than the preset threshold, the service data to be detected is obtained directly according to the data table; otherwise, a preset quantity of data is randomly extracted from the data table to generate a temporary data table, and the service data to be detected is obtained according to the temporary data table. For details about the temporary data table, refer to the above method embodiment, which is not expanded here.
In some embodiments, after the data obtaining module 302 determines the amount of data contained in the data table, the detection module 303 is further configured to determine the minimum data volume required by each detection element to perform detection, judge whether the minimum data volume required by each detection element is greater than the data volume contained in the data table, and remove the detection elements whose required minimum data volume is greater than the data volume contained in the data table. In actual detection, some detection elements have requirements on the volume of the data to be detected. For example, when model prediction is performed on a data table to detect data anomalies, a data volume that is too small will make the model training process abnormal and the detection inaccurate; in this case, the detection module 303 skips the model evaluation and only makes a corresponding record in the output detection result table.
In some embodiments, before determining at least one detection element according to the detection type, the detection module 303 is further configured to obtain preset special-character recognition configuration information and perform metadata identification according to the special-character recognition configuration information. This step can improve the accuracy of metadata identification. For example, when the string "NULL" appears in a date-type field, the field may be misidentified as a string rather than a date type; accurate identification can be achieved through the preset special-character configuration information. Similar special characters include redundant spaces at the beginning and end of strings.
In this embodiment, when determining the resources allocated to the detection queue, the detection module 303 is specifically configured to determine the resources, such as processors and storage space, invoked when the detection is executed.
The detection result data output after the detection module 303 detects the parsed and identified data through the detection elements can be written into the PG database, and the result data can be viewed or downloaded through reports. Since business data often interfaces with multiple institutions or departments, different reports or result data can be generated for different institutions to ensure data security and privacy. In addition, when the detection elements perform detection, the log information is updated in real time.
As mentioned above, daemon processes are used to determine whether there is a task to be executed. When it is determined that there is a task, the detection module 303 correspondingly controls the detection elements through the daemon process so that they execute detection in sequence.
As mentioned above, the detection types include the statistical type and the predictive type, and each type corresponds to a detection element; correspondingly, the detection elements invoked by the detection module 303 may include a descriptive statistics element, a trend statistics element, a comparison statistics element, a model prediction element, and the like. In this embodiment of the application, these elements exist in the form of packaged modules. Specifically, each element is packaged SQL code automatically generated based on the detection type to be performed, and the detection is executed through Spark.
In this embodiment, the detection module 303 uses the detection type parameter to determine the detection elements to be executed by the daemon process; for details, refer to the above method embodiment, which is not expanded here.
In this embodiment of the application, the execution components for the above metadata parsing and metadata identification can also be embodied in the form of packaged elements, corresponding to a metadata parsing element and a metadata identification element, respectively. Correspondingly, when data anomalies are detected, a problem discovery element can also be configured to summarize and output the data anomaly problems; for details, refer to the above method embodiment, which is not expanded here.
In some embodiments, before causing the detection element to detect the parsed and identified data based on the allocated resources, the detection module 303 is further configured to: when at least two detection elements are determined according to the detection type, judge whether a dependency relationship exists between the detection elements, and if so, determine the execution order of the detection elements according to the dependency relationship. Specifically, when there are multiple detection elements, some of them may have to run in a certain order. For example, if the model prediction module depends on the trend statistics module, a dependency relationship exists between the two, and the trend statistics module is executed before the model prediction module; the aforementioned problem discovery module depends on all preceding detection elements, so it is executed last.
In some embodiments, before the detection module 303 causes the detection element to detect the parsed and identified data based on the allocated resources, the parameter obtaining module 301 is further configured to obtain and parse auxiliary parameters; the data obtaining module 302 is further configured to judge whether the value of each parameter item in the auxiliary parameters is empty and to filter, according to the parameter items with non-empty values, the service data to be detected on which metadata parsing and metadata identification are to be performed; and the detection module 303 is further configured to filter, according to the parameter items with non-empty values, the detection elements determined according to the detection type. The auxiliary parameters may be user-defined parameters and may include parameter items such as test fields, numeric fields, character fields, enumeration fields, business date field, condition, primary key, and virtual user; for the related content of these parameter items, refer to the above method embodiment, which is not expanded here. In actual detection, the service data to be detected and the detection elements determined according to the detection type can be filtered according to the values of these parameter items.
In some embodiments, when determining at least one detection element according to the detection type, the detection module 303 is specifically configured to judge whether a detection element corresponding to the detection type currently exists; if so, the corresponding detection element is obtained directly; otherwise, a new detection element corresponding to the detection type is generated based on the detection type. Each generated detection element is packaged as an independent Python function that can run on its own. During detection, the detection elements to be executed are chained together by one shell main program to form the overall data quality detection function. Because the detection elements are independent of each other, modules can be easily added or removed, or only part of the modules can be executed selectively. Therefore, new detection elements can be generated according to detection requirements, or the corresponding detection element can be deleted when a detection requirement no longer exists, providing high flexibility.
After receiving a service data detection task, the service data quality detection apparatus provided in this application can automatically obtain, parse, and identify the service data to be detected, and perform modular detection through detection elements, automatically implementing data detection in different dimensions. Detection becomes more comprehensive, intelligent, and efficient while reducing manual effort. In particular, for service data running online, abnormal changes in indicators can be monitored in real time, which helps discover data anomalies such as abnormal updates, incomplete updates, and missed updates earlier and more promptly, as well as problems such as abnormal computation logic and abnormal changes in data indicators, improving the availability, stability, and accuracy of the service data. In addition, this embodiment of the application can automatically generate the basic information, descriptive information, and the like of the data table, which helps improve the efficiency of data exploration and data sorting. It is easy to use: tasks can be submitted through a Web page, reports are automatically generated from the detection results, and the data can be viewed and used in a visual way.
To solve the above technical problem, an embodiment of this application further provides a computer device. Referring to FIG. 4, which is a block diagram of the basic structure of the computer device of this embodiment, the computer device 4 includes a memory 41, a processor 42, and a network interface 43 that are communicatively connected to each other via a system bus. The memory 41 stores computer-readable instructions, and the processor 42, when executing the computer-readable instructions, implements the steps of the service data quality detection method described in the above method embodiments, with the beneficial effects corresponding to the above service data quality detection method, which are not expanded here.
It should be noted that the figure only shows the computer device 4 with the memory 41, the processor 42, and the network interface 43, but it should be understood that it is not required to implement all of the illustrated components; more or fewer components may be implemented instead. Those skilled in the art can understand that the computer device here is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), embedded devices, and the like.
The computer device may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. The computer device can interact with the user through a keyboard, a mouse, a remote control, a touch panel, a voice control device, or the like.
In this embodiment, the memory 41 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical discs, and the like. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the computer device 4. Of course, the memory 41 may also include both the internal storage unit of the computer device 4 and its external storage device. In this embodiment, the memory 41 is generally used to store the operating system and various application software installed in the computer device 4, such as the computer-readable instructions corresponding to the above service data quality detection method. In addition, the memory 41 may also be used to temporarily store various types of data that have been output or will be output.
In some embodiments, the processor 42 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 42 is generally used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to run the computer-readable instructions stored in the memory 41 or to process data, for example, to run the computer-readable instructions corresponding to the service data quality detection method.
The network interface 43 may include a wireless network interface or a wired network interface, and is generally used to establish a communication connection between the computer device 4 and other electronic devices.
This application also provides another implementation manner, that is, a computer-readable storage medium storing computer-readable instructions, where the computer-readable instructions can be executed by at least one processor to cause the at least one processor to execute the steps of the above service data quality detection method, with the beneficial effects corresponding to the above service data quality detection method, which are not expanded here.
Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part contributing to the existing technology, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several computer-readable instructions to enable a terminal device (which can be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the methods described in the embodiments of this application.
Obviously, the embodiments described above are only some rather than all of the embodiments of this application. The drawings show preferred embodiments of this application but do not limit its patent scope. This application may be implemented in many different forms; rather, these embodiments are provided so that the disclosure of this application will be understood more thoroughly and completely. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing specific embodiments or make equivalent replacements for some of the technical features. Any equivalent structure made using the contents of the specification and drawings of this application, applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of this application.

Claims (20)

  1. A service data quality detection method, comprising the following steps:
    receiving a service data detection task, and obtaining corresponding detection parameters according to the service data detection task, wherein the detection parameters comprise at least a data table name, a database name, a detection queue, and a detection type;
    accessing a database based on the data table name and the database name and determining a data table, obtaining service data to be detected that is stored in the database according to the data table, and performing metadata parsing and metadata identification on the service data to be detected to obtain parsed and identified data;
    determining at least one detection element according to the detection type, determining allocated resources according to the detection queue, causing the detection element to detect the parsed and identified data based on the allocated resources, and outputting a detection result.
  2. The service data quality detection method according to claim 1, wherein the obtaining service data to be detected that is stored in the database according to the data table comprises:
    determining the amount of data contained in the data table, and judging whether the data volume of the data table is greater than a preset threshold; when the data volume of the data table is not greater than the preset threshold, obtaining the service data to be detected directly according to the data table; otherwise, randomly extracting a preset quantity of data from the data table to generate a temporary data table, and obtaining the service data to be detected according to the temporary data table.
  3. The service data quality detection method according to claim 2, wherein after the determining the amount of data contained in the data table, the method further comprises:
    determining the minimum data volume required by each of the detection elements to perform detection;
    judging whether the minimum data volume required by each of the detection elements is greater than the data volume contained in the data table, and removing the detection elements whose required minimum data volume is greater than the data volume contained in the data table.
  4. The service data quality detection method according to any one of claims 1 to 3, wherein before the determining at least one detection element according to the detection type, the method further comprises: obtaining preset special-character recognition configuration information, and performing metadata identification according to the special-character recognition configuration information.
  5. The service data quality detection method according to any one of claims 1 to 3, wherein before the causing the detection element to detect the parsed and identified data based on the allocated resources, the method further comprises:
    when at least two detection elements are determined according to the detection type, judging whether a dependency relationship exists between the detection elements, and if so, determining the execution order of the detection elements according to the dependency relationship.
  6. The service data quality detection method according to any one of claims 1 to 3, wherein before the causing the detection element to detect the parsed and identified data based on the allocated resources, the method further comprises:
    obtaining and parsing auxiliary parameters, judging whether the value of each parameter item in the auxiliary parameters is empty, filtering, according to the parameter items with non-empty values, the service data to be detected on which metadata parsing and metadata identification are to be performed, and at the same time filtering, according to the parameter items with non-empty values, the detection elements determined according to the detection type.
  7. The service data quality detection method according to any one of claims 1 to 3, wherein the determining at least one detection element according to the detection type comprises:
    judging whether a detection element corresponding to the detection type currently exists; if so, directly obtaining the corresponding detection element; otherwise, generating a new detection element corresponding to the detection type based on the detection type.
  8. A service data quality detection apparatus, comprising:
    a parameter obtaining module, configured to receive a service data detection task and obtain corresponding detection parameters according to the service data detection task, wherein the detection parameters comprise at least a data table name, a database name, a detection queue, and a detection type;
    a data obtaining module, configured to access a database based on the data table name and the database name and determine a data table, obtain service data to be detected that is stored in the database according to the data table, and perform metadata parsing and metadata identification on the service data to be detected to obtain parsed and identified data;
    a detection module, configured to determine at least one detection element according to the detection type, determine allocated resources according to the detection queue, cause the detection element to detect the parsed and identified data based on the allocated resources, and output a detection result.
  9. A computer device, comprising a memory and a processor, wherein the memory stores computer-readable instructions, and the processor implements the following steps when executing the computer-readable instructions:
    receiving a service data detection task, and obtaining corresponding detection parameters according to the service data detection task, wherein the detection parameters comprise at least a data table name, a database name, a detection queue, and a detection type;
    accessing a database based on the data table name and the database name and determining a data table, obtaining service data to be detected that is stored in the database according to the data table, and performing metadata parsing and metadata identification on the service data to be detected to obtain parsed and identified data;
    determining at least one detection element according to the detection type, determining allocated resources according to the detection queue, causing the detection element to detect the parsed and identified data based on the allocated resources, and outputting a detection result.
  10. The computer device according to claim 9, wherein when the processor executes the computer-readable instructions to implement the step of obtaining service data to be detected that is stored in the database according to the data table, the following steps are specifically implemented:
    determining the amount of data contained in the data table, and judging whether the data volume of the data table is greater than a preset threshold; when the data volume of the data table is not greater than the preset threshold, obtaining the service data to be detected directly according to the data table; otherwise, randomly extracting a preset quantity of data from the data table to generate a temporary data table, and obtaining the service data to be detected according to the temporary data table.
  11. The computer device according to claim 10, wherein after the processor executes the computer-readable instructions to implement the step of determining the amount of data contained in the data table, the processor further implements the following steps when executing the computer-readable instructions:
    determining the minimum data volume required by each of the detection elements to perform detection;
    judging whether the minimum data volume required by each of the detection elements is greater than the data volume contained in the data table, and removing the detection elements whose required minimum data volume is greater than the data volume contained in the data table.
  12. The computer device according to any one of claims 9 to 11, wherein before the processor executes the computer-readable instructions to implement the step of determining at least one detection element according to the detection type, the processor further implements the following step when executing the computer-readable instructions:
    obtaining preset special-character recognition configuration information, and performing metadata identification according to the special-character recognition configuration information.
  13. The computer device according to any one of claims 9 to 11, wherein before the processor executes the computer-readable instructions to implement the step of causing the detection element to detect the parsed and identified data based on the allocated resources, the processor further implements the following step when executing the computer-readable instructions:
    when at least two detection elements are determined according to the detection type, judging whether a dependency relationship exists between the detection elements, and if so, determining the execution order of the detection elements according to the dependency relationship.
  14. The computer device according to any one of claims 9 to 11, wherein before the processor executes the computer-readable instructions to implement the step of causing the detection element to detect the parsed and identified data based on the allocated resources, the processor further implements the following step when executing the computer-readable instructions:
    obtaining and parsing auxiliary parameters, judging whether the value of each parameter item in the auxiliary parameters is empty, filtering, according to the parameter items with non-empty values, the service data to be detected on which metadata parsing and metadata identification are to be performed, and at the same time filtering, according to the parameter items with non-empty values, the detection elements determined according to the detection type.
  15. A computer-readable storage medium, wherein computer-readable instructions are stored on the computer-readable storage medium, and when the computer-readable instructions are executed by a processor, the processor is caused to execute the following steps:
    receiving a service data detection task, and obtaining corresponding detection parameters according to the service data detection task, wherein the detection parameters comprise at least a data table name, a database name, a detection queue, and a detection type;
    accessing a database based on the data table name and the database name and determining a data table, obtaining service data to be detected that is stored in the database according to the data table, and performing metadata parsing and metadata identification on the service data to be detected to obtain parsed and identified data;
    determining at least one detection element according to the detection type, determining allocated resources according to the detection queue, causing the detection element to detect the parsed and identified data based on the allocated resources, and outputting a detection result.
  16. The computer-readable storage medium according to claim 15, wherein the computer-readable instructions are executed by the processor so that, when executing the step of obtaining service data to be detected that is stored in the database according to the data table, the processor specifically executes the following step:
    determining the amount of data contained in the data table, and judging whether the data volume of the data table is greater than a preset threshold; when the data volume of the data table is not greater than the preset threshold, obtaining the service data to be detected directly according to the data table; otherwise, randomly extracting a preset quantity of data from the data table to generate a temporary data table, and obtaining the service data to be detected according to the temporary data table.
  17. The computer-readable storage medium according to claim 16, wherein the computer-readable instructions are executed by the processor so that, after executing the step of determining the amount of data contained in the data table, the processor further executes the following steps:
    determining the minimum data volume required by each of the detection elements to perform detection;
    judging whether the minimum data volume required by each of the detection elements is greater than the data volume contained in the data table, and removing the detection elements whose required minimum data volume is greater than the data volume contained in the data table.
  18. The computer-readable storage medium according to any one of claims 15 to 17, wherein the computer-readable instructions are executed by the processor so that, before executing the step of determining at least one detection element according to the detection type, the processor further executes the following step:
    obtaining preset special-character recognition configuration information, and performing metadata identification according to the special-character recognition configuration information.
  19. The computer-readable storage medium according to any one of claims 15 to 17, wherein the computer-readable instructions are executed by the processor so that, before executing the step of causing the detection element to detect the parsed and identified data based on the allocated resources, the processor further executes the following step:
    when at least two detection elements are determined according to the detection type, judging whether a dependency relationship exists between the detection elements, and if so, determining the execution order of the detection elements according to the dependency relationship.
  20. The computer-readable storage medium according to any one of claims 15 to 17, wherein the computer-readable instructions are executed by the processor so that, before executing the step of causing the detection element to detect the parsed and identified data based on the allocated resources, the processor further executes the following step:
    obtaining and parsing auxiliary parameters, judging whether the value of each parameter item in the auxiliary parameters is empty, filtering, according to the parameter items with non-empty values, the service data to be detected on which metadata parsing and metadata identification are to be performed, and at the same time filtering, according to the parameter items with non-empty values, the detection elements determined according to the detection type.
PCT/CN2020/135593 2020-08-31 2020-12-11 Service data quality detection method and apparatus, computer device, and storage medium WO2021147559A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010899921.1 2020-08-31
CN202010899921.1A CN112052138A (zh) 2020-08-31 2020-08-31 业务数据质量检测方法、装置、计算机设备及存储介质

Publications (1)

Publication Number Publication Date
WO2021147559A1 true WO2021147559A1 (zh) 2021-07-29

Family

ID=73606615

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/135593 WO2021147559A1 (zh) 2020-08-31 2020-12-11 业务数据质量检测方法、装置、计算机设备及存储介质

Country Status (2)

Country Link
CN (1) CN112052138A (zh)
WO (1) WO2021147559A1 (zh)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112052138A (zh) * 2020-08-31 2020-12-08 平安科技(深圳)有限公司 业务数据质量检测方法、装置、计算机设备及存储介质
CN112632048A (zh) * 2020-12-18 2021-04-09 恩亿科(北京)数据科技有限公司 一种数据质量检测方法、系统、电子设备及存储介质
CN112613892B (zh) * 2020-12-25 2024-03-15 北京知因智慧科技有限公司 基于业务系统的数据处理方法、装置以及电子设备
CN112597142A (zh) * 2020-12-26 2021-04-02 中国农业银行股份有限公司 一种数据质量检测方法和数据质量检测引擎
CN113049935A (zh) * 2021-03-04 2021-06-29 长鑫存储技术有限公司 半导体智能检测系统、智能检测方法及存储介质
CN113591485B (zh) * 2021-06-17 2024-07-12 国网浙江省电力有限公司 一种基于数据科学的智能化数据质量稽核系统及方法
CN114186244B (zh) * 2022-01-26 2022-09-16 中国电子信息产业集团有限公司 一种数据要素操作框架及系统
CN115129498A (zh) * 2022-06-24 2022-09-30 深圳前海微众银行股份有限公司 一种监控方法、设备以及存储介质
CN116701383B (zh) * 2023-08-03 2023-10-27 中航信移动科技有限公司 一种数据实时质量监测方法、电子设备及存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050066240A1 (en) * 2002-10-04 2005-03-24 Tenix Investments Pty Ltd Data quality & integrity engine
CN106708909A (zh) * 2015-11-18 2017-05-24 阿里巴巴集团控股有限公司 数据质量的检测方法和装置
CN109656812A (zh) * 2018-11-19 2019-04-19 平安科技(深圳)有限公司 数据质量检测方法、装置及存储介质
CN110704186A (zh) * 2019-09-25 2020-01-17 国家计算机网络与信息安全管理中心 基于混合分布架构的计算资源分配方法、装置和存储介质
CN111427928A (zh) * 2020-03-26 2020-07-17 京东数字科技控股有限公司 一种数据质量检测方法及装置
CN112052138A (zh) * 2020-08-31 2020-12-08 平安科技(深圳)有限公司 业务数据质量检测方法、装置、计算机设备及存储介质

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105512283B (zh) * 2015-12-04 2019-05-03 国网江西省电力公司信息通信分公司 数据质量管理控制方法及装置
CN111177134B (zh) * 2019-12-26 2021-04-02 上海科技发展有限公司 适用于海量数据的数据质量分析方法、装置、终端及介质
CN111400365B (zh) * 2020-02-26 2023-09-19 杭州美创科技股份有限公司 基于标准sql下的业务系统数据质量检测方法
CN111488363B (zh) * 2020-06-28 2020-10-02 平安国际智慧城市科技股份有限公司 数据处理方法、装置、电子设备及介质

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050066240A1 (en) * 2002-10-04 2005-03-24 Tenix Investments Pty Ltd Data quality & integrity engine
CN106708909A (zh) * 2015-11-18 2017-05-24 阿里巴巴集团控股有限公司 数据质量的检测方法和装置
CN109656812A (zh) * 2018-11-19 2019-04-19 平安科技(深圳)有限公司 数据质量检测方法、装置及存储介质
CN110704186A (zh) * 2019-09-25 2020-01-17 国家计算机网络与信息安全管理中心 基于混合分布架构的计算资源分配方法、装置和存储介质
CN111427928A (zh) * 2020-03-26 2020-07-17 京东数字科技控股有限公司 一种数据质量检测方法及装置
CN112052138A (zh) * 2020-08-31 2020-12-08 平安科技(深圳)有限公司 业务数据质量检测方法、装置、计算机设备及存储介质

Also Published As

Publication number Publication date
CN112052138A (zh) 2020-12-08

Similar Documents

Publication Publication Date Title
WO2021147559A1 (zh) 业务数据质量检测方法、装置、计算机设备及存储介质
US11670021B1 (en) Enhanced graphical user interface for representing events
EP4099170B1 (en) Method and apparatus of auditing log, electronic device, and medium
US8229973B2 (en) Infrastructure and architecture for development and execution of predictive models
CN110880136A (zh) 配套产品的推荐方法、系统、设备和存储介质
CN113836131A (zh) 一种大数据清洗方法、装置、计算机设备及存储介质
CN114461644A (zh) 一种数据采集方法、装置、电子设备及存储介质
CN112085087A (zh) 业务规则生成的方法、装置、计算机设备及存储介质
CN112363814A (zh) 任务调度方法、装置、计算机设备及存储介质
CN114741392A (zh) 数据查询方法、装置、电子设备及存储介质
CN113010542B (zh) 业务数据处理方法、装置、计算机设备及存储介质
CN112487021A (zh) 业务数据的关联分析方法、装置及设备
CN116955856A (zh) 信息展示方法、装置、电子设备以及存储介质
CN112100177A (zh) 数据存储方法、装置、计算机设备及存储介质
CN116860311A (zh) 脚本分析方法、装置、计算机设备及存储介质
CN111950623A (zh) 数据稳定性监控方法、装置、计算机设备及介质
CN116450723A (zh) 数据提取方法、装置、计算机设备及存储介质
CN114240663A (zh) 数据对账方法、装置、终端及存储介质
CN114443663A (zh) 数据表处理方法、装置、设备及介质
CN114818635A (zh) 数据报表生成方法、装置、电子设备及存储介质
CN112487262A (zh) 一种数据处理的方法和装置
US20140325457A1 (en) Searching of line pattern representations using gestures
US20230086429A1 (en) Method of recognizing address, electronic device and storage medium
CN115941712B (zh) 报送数据的处理方法、装置、计算机设备及存储介质
CN116542779A (zh) 基于人工智能的产品推荐方法、装置、设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20916089

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20916089

Country of ref document: EP

Kind code of ref document: A1