CN117131027A - Data quality detection method, device, terminal equipment and storage medium

Info

Publication number
CN117131027A
Authority
CN
China
Prior art keywords
quality detection
task
data
metadata
detection task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311063670.3A
Other languages
Chinese (zh)
Inventor
周璇
陈礼和
彭章华
柯鹏
陈作特
张文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Merchants Bank Co Ltd
Original Assignee
China Merchants Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Merchants Bank Co Ltd
Priority to CN202311063670.3A
Publication of CN117131027A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data quality detection method, a device, terminal equipment and a storage medium. The method comprises the following steps: generating a quality detection task based on metadata acquired in advance; executing the quality detection task and obtaining a task execution result; and, if the task execution result does not meet a preset constraint condition, sending out an alarm notification. The invention addresses the problems in data quality detection that multiple data sources are not adapted, the detection dimension is single, manual coding is required and alarms cannot be raised in time, and thereby improves the efficiency of data quality detection.

Description

Data quality detection method, device, terminal equipment and storage medium
Technical Field
The present invention relates to the field of data management technologies, and in particular, to a data quality detection method, a data quality detection device, a terminal device, and a storage medium.
Background
Traditional data quality detection has to go through a fairly complicated process, including determining detection indicators, developing code for data quality detection tasks, reviewing data and analyzing results. It is time-consuming and labor-intensive, and its real-time performance is poor.
Existing automatic quality detection schemes are suitable only for relatively simple data quality detection scenarios in relational databases, and lack universality, portability and extensibility.
The foregoing is provided merely for the purpose of facilitating understanding of the technical solutions of the present invention and is not intended to represent an admission that the foregoing is prior art.
Disclosure of Invention
The main purpose of the invention is to provide a data quality detection method, a device, terminal equipment and a storage medium, aiming to solve the technical problems in data quality detection that multiple data sources are not adapted, the detection dimension is single, manual coding is required, and alarms cannot be raised in time.
In order to achieve the above object, the present invention provides a data quality detection method, including:
generating a quality detection task based on the metadata acquired in advance;
executing the quality detection task and obtaining a task execution result;
and if the task execution result does not meet the preset constraint condition, sending out an alarm notification.
Optionally, the step of generating the quality detection task based on the pre-acquired metadata includes:
extracting metadata acquired in advance to acquire structural information of the metadata;
converting and adapting through a preset query statement generator according to the structure information to obtain a meta model;
and generating a quality detection task according to the meta-model.
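As an illustration only, the generation steps above (structural information of the metadata, adaptation by a query statement generator, quality detection task) can be sketched in Python. The `ColumnMeta` record, the `QUOTES` table of identifier-quoting styles, the task dictionary layout and the not-null rule are all hypothetical and do not come from the original disclosure; the sketch only shows how one piece of structural information might be adapted to different data sources and turned into a detection query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMeta:
    """Hypothetical structural information extracted from metadata."""
    table: str
    column: str
    nullable: bool


# Hypothetical identifier-quoting styles, one per kind of data source.
QUOTES = {"mysql": "`", "hive": "`", "postgres": '"'}


def quote(name: str, dialect: str) -> str:
    # Naive quoting for illustration; it does not escape quote
    # characters that appear inside the identifier itself.
    q = QUOTES[dialect]
    return f"{q}{name}{q}"


def null_count_query(col: ColumnMeta, dialect: str) -> str:
    """Generate a query that counts NULL values in one column."""
    table = quote(col.table, dialect)
    column = quote(col.column, dialect)
    return f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"


def generate_tasks(columns, dialect: str):
    """Build one not-null detection task per non-nullable column."""
    return [
        {
            "name": f"{c.table}.{c.column}.not_null",
            "sql": null_count_query(c, dialect),
            "max_violations": 0,
        }
        for c in columns
        if not c.nullable
    ]
```

Under these assumptions, supporting another data source only requires a new entry in `QUOTES`, and no detection query has to be written by hand.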
Optionally, the step of executing the quality detection task and obtaining a task execution result includes:
acquiring detection task configuration information and a detection task query statement according to the quality detection task;
and performing quality detection on the detection task configuration information and the detection task query statement to acquire a task execution result.
Optionally, the step of sending an alarm notification if the task execution result does not meet a preset constraint condition includes:
reading the task execution result through an abnormality alarming module to obtain a reading result;
judging according to the reading result and a preset constraint condition;
and if the constraint condition is not satisfied, sending out an alarm notification.
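The judging step above can be sketched as follows. This is a minimal sketch, assuming that a task execution result is a single number and that a preset constraint condition is a comparison operator plus a threshold; the `Constraint` class, the `check_and_alert` function and the message format are hypothetical, and `notify` stands in for whatever notification channel is actually used.

```python
import operator
from dataclasses import dataclass

# Comparison operators a constraint condition may use (illustrative set).
OPS = {
    "<=": operator.le,
    "<": operator.lt,
    ">=": operator.ge,
    ">": operator.gt,
    "==": operator.eq,
}


@dataclass(frozen=True)
class Constraint:
    """Hypothetical preset constraint: the result must satisfy `op threshold`."""
    op: str
    threshold: float

    def is_satisfied(self, value: float) -> bool:
        return OPS[self.op](value, self.threshold)


def check_and_alert(task_name: str, value: float, constraint: Constraint, notify) -> bool:
    """Return True and call `notify` if the result violates the constraint."""
    if constraint.is_satisfied(value):
        return False
    notify(
        f"[data-quality] {task_name}: result {value} "
        f"violates '{constraint.op} {constraint.threshold}'"
    )
    return True
```

For example, `check_and_alert("orders.id.not_null", 3, Constraint("<=", 0), print)` prints one alarm message and returns `True`, whereas a result of `0` sends nothing.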
Optionally, the step of generating a quality detection task according to the meta-model includes:
generating a metadata service interface according to the meta model;
and converting the metadata into input configuration of a quality task according to the metadata service interface, and generating a quality detection task.
Optionally, the step of obtaining detection task configuration information and detection task query statement according to the quality detection task includes:
reading the quality detection task through the metadata service interface to obtain detection task configuration information;
and converting the quality detection task through the query statement generator to obtain a detection task query statement.
Optionally, the step of performing quality detection on the detection task configuration information and the detection task query statement, and obtaining a task execution result includes:
sending the detection task configuration information and the detection task query statement to a data cluster;
and calculating according to the data cluster through a preset calculation program to obtain a task execution result.
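One way to picture the execution step is a small dispatcher that hands the configuration information and the query statement to an engine-specific executor. The sketch below is an assumption-laden illustration: an in-memory SQLite database stands in for the data cluster, and the `engine`, `database` and `setup` configuration keys, the `EXECUTORS` registry and the function names are all hypothetical.

```python
import sqlite3


def run_on_sqlite(config: dict, sql: str):
    """Stand-in executor: run the detection query on SQLite, return one value."""
    conn = sqlite3.connect(config["database"])
    try:
        # "setup" exists only so this demo can create sample data.
        for statement in config.get("setup", []):
            conn.execute(statement)
        row = conn.execute(sql).fetchone()
        return row[0]
    finally:
        conn.close()


# Hypothetical registry: one executor per kind of data source.
EXECUTORS = {"sqlite": run_on_sqlite}


def execute_task(config: dict, sql: str):
    """Dispatch the configuration and query statement to the matching executor."""
    engine = config.get("engine")
    if engine not in EXECUTORS:
        raise ValueError(f"no executor registered for engine {engine!r}")
    return EXECUTORS[engine](config, sql)
```

A Spark-based or JDBC-based executor could be registered in the same way under another key, leaving the calling code unchanged.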
The embodiment of the invention also provides a data quality detection device, which comprises:
the generation module is used for generating a quality detection task based on the metadata acquired in advance;
the execution module is used for executing the quality detection task and obtaining a task execution result;
and the alarm module is used for sending an alarm notification if the task execution result does not meet the preset constraint condition.
The embodiment of the invention also provides a terminal device which comprises a memory, a processor and a data quality detection program stored in the memory and capable of running on the processor, wherein the data quality detection program realizes the steps of the data quality detection method when being executed by the processor.
The embodiment of the invention also provides a computer readable storage medium, wherein the computer readable storage medium stores a data quality detection program, and the data quality detection program realizes the steps of the data quality detection method when being executed by a processor.
The embodiment of the invention provides a data quality detection method, a device, terminal equipment and a storage medium, in which a quality detection task is generated based on metadata acquired in advance; the quality detection task is executed and a task execution result is obtained; and, if the task execution result does not meet a preset constraint condition, an alarm notification is sent out. Because the quality detection task is generated from metadata and executed to obtain a detection result, and the detection result is then analyzed against the constraint condition so that an alarm notification is sent out in time, detection of and alarming on data quality are realized. This solves the problems that multiple data sources are not adapted, the detection dimension is single, manual coding is required and alarms cannot be raised in time, and improves the efficiency of data quality detection.
Drawings
FIG. 1 is a schematic diagram of functional blocks of a terminal device to which a data quality detection apparatus of the present invention belongs;
FIG. 2 is a flow chart of an exemplary embodiment of a data quality detection method according to the present invention;
FIG. 3 is a schematic diagram illustrating a method for detecting data quality according to the present invention;
FIG. 4 is another overall schematic diagram of the data quality detection method of the present invention;
FIG. 5 is a flow chart of another exemplary embodiment of a data quality detection method of the present invention;
FIG. 6 is a schematic diagram of a data quality detection method of the present invention involving the generation of quality detection tasks;
FIG. 7 is a flow chart of another exemplary embodiment of a data quality detection method of the present invention;
FIG. 8 is a schematic diagram of a data quality detection method according to the present invention involving acquisition of task execution results;
FIG. 9 is a flow chart of another exemplary embodiment of a data quality detection method of the present invention;
FIG. 10 is a diagram of a data quality detection method of the present invention involving the sending of alert notifications;
FIG. 11 is a flow chart of a data quality detection method of the present invention involving the generation of quality detection tasks;
FIG. 12 is a flow chart of a data quality detection method of the present invention involving acquisition of task configuration information and query statements;
FIG. 13 is a flow chart of a data quality detection method of the present invention involving acquisition of a task execution result.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The main solution of the embodiments of the present invention is as follows: extracting metadata acquired in advance to acquire structural information of the metadata; converting and adapting the structural information through a preset query statement generator to obtain a meta-model; and generating a quality detection task according to the meta-model. Detection task configuration information and a detection task query statement are acquired according to the quality detection task, and quality detection is performed on the detection task configuration information and the detection task query statement to acquire a task execution result. The task execution result is read through an abnormality alarm module to obtain a reading result; a judgment is made according to the reading result and a preset constraint condition; and, if the constraint condition is not satisfied, an alarm notification is sent out. A metadata service interface is generated according to the meta-model, and the metadata is converted into the input configuration of a quality task according to the metadata service interface to generate the quality detection task. The quality detection task is read through the metadata service interface to obtain the detection task configuration information, and is converted through the query statement generator to obtain the detection task query statement. The detection task configuration information and the detection task query statement are sent to a data cluster, and a preset calculation program performs the calculation on the data cluster to obtain the task execution result. In this way, the problems that multiple data sources are not adapted, the detection dimension is single, manual coding is required and alarms cannot be raised in time are solved, detection of and alarming on data quality are realized, and the efficiency of data quality detection is improved.
The scheme of the invention starts from the practical problems in data quality detection that multiple data sources are not adapted, the detection dimension is single, manual coding is required and alarms cannot be raised in time. A data quality detection method is designed to address these problems, its effectiveness is verified when detecting data quality, and the efficiency of data quality detection is markedly improved as a result.
Technical terms related to the embodiment of the invention:
Metadata service: a metadata service provides management and querying of data set metadata. Metadata is data that describes data; it contains information such as the definition, structure, attributes, origin and quality of the data. It is important to data management and to the value of data assets, and helps organizations better understand and use their data. A metadata service generally includes the following functions. Metadata collection and registration: metadata of a data set is collected through various channels and registered and recorded, and may include the source system, table structure, field definitions, data quality rules, data ownership and so on. Metadata storage and management: metadata is stored in a suitable database or metadata warehouse, with management interfaces and tools that let users browse, search, edit and update it. Metadata query and retrieval: flexible query and retrieval functions let users find and access the required metadata according to specific conditions and requirements, which may include queries based on keywords, attributes and tags, as well as advanced query and filtering functions. Metadata lineage and relationship analysis: users are helped to understand the lineage and dependency relationships between data sets; by analyzing the relationships between metadata, the sources, flow paths and effects of data across different systems and applications can be tracked. Metadata supplementation and maintenance: users can supplement, modify and maintain metadata, including adding, deleting and modifying metadata information, with functions such as approval and version control. Metadata visualization and reporting: intuitive visual interfaces and reports help users understand and present metadata, and may include generating data dictionaries, relationship graphs and data flow charts as well as statistics and analysis reports. Through a metadata service, data assets can be better managed and used, improving the credibility, accessibility and reusability of the data.
SQL: SQL (Structured Query Language) is a standardized language for managing and manipulating relational databases. It is widely used in database management systems (DBMSs) to query, insert, update and delete data, and to create and manage database structures. An SQL generator is a tool or library that generates the code of SQL query statements. It helps developers construct and combine SQL queries more conveniently and reduces the work of writing SQL by hand.
Spark: Spark is an open-source big data processing framework that provides a unified data processing engine and can process massive data efficiently. It was originally developed by the AMPLab at the University of California, Berkeley, and was open-sourced in 2010. Its main characteristics are as follows. Spark executes tasks in memory: compared with the traditional disk read-write approach, it uses the concept of resilient distributed datasets (RDDs) to cache data in memory, which reduces disk I/O overhead and increases processing speed. Spark supports multiple data processing modes, such as batch processing, interactive query and stream processing, and provides rich APIs for tasks such as data manipulation, machine learning and graph computation. Spark has good fault tolerance: when a node fails, it can automatically recover the computation and continue executing tasks without restarting the job. Spark provides easy-to-use programming interfaces, including APIs for Scala, Java, Python, R and other languages, so developers can choose a suitable programming language according to their preferences and needs. Spark is scalable: it supports horizontal scaling, can run on distributed computing clusters, and can be integrated with Hadoop, using Hadoop's resource manager (such as YARN) to manage and schedule tasks. Spark has a rich ecosystem of components, including Spark SQL, Spark Streaming, MLlib and GraphX, which extend its functions so that it can perform a wider range of data processing and analysis. Spark is suitable for scenarios such as large-scale data processing, complex data processing and real-time data processing. It is used in fields such as finance, telecommunications, healthcare and e-commerce for applications such as data mining, data analysis, machine learning and real-time recommendation. Spark is also an important component of the Hadoop ecosystem and, together with tools such as Hadoop and Hive, provides a powerful solution for big data processing.
JDBC executor: a JDBC executor is a tool or library for performing Java Database Connectivity (JDBC) operations; JDBC is a standard Java API for interacting with relational databases. A JDBC executor simplifies and unifies access to and operations on databases, and typically provides the following functions. Connection management: the executor manages the creation, release and reuse of database connections; it provides a connection pool mechanism that can manage multiple database connections effectively, improving performance and resource utilization. SQL execution: the executor provides methods for executing SQL statements, including query statements (SELECT), update statements (INSERT, UPDATE, DELETE) and stored procedure calls; a developer passes SQL statements in through the executor's API and obtains the execution result or the number of affected rows. Parameterized queries: the executor can process SQL statements that contain placeholders and pass parameter values to precompiled SQL statements, which avoids SQL injection attacks and improves execution efficiency and security. Transaction management: the executor can handle database transactions and supports starting, committing and rolling back transactions; a developer can use the executor's API to manage transaction boundaries, ensuring that a group of related database operations is either committed as a whole or rolled back as a whole. Exception handling: the executor handles exception conditions that may occur in database operations and provides a corresponding exception handling mechanism, so that developers can deal with errors in database operations by catching and handling exceptions. Result set processing: the executor can wrap query results into Java objects that are convenient for the developer to process and operate on, and provides methods and APIs for obtaining data from the result set, traversing it, reading column values and so on. A JDBC executor is a common choice for database operations based on JDBC. It provides a simple and flexible way to access and operate relational databases: developers can use it to execute SQL statements of various types and to handle transactions and exception conditions, and it can also manage a connection pool, which improves the performance and efficiency of database operations.
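The parameterized-query and transaction-management functions described above for a JDBC executor can be illustrated with a short example. JDBC itself is a Java API; the sketch below instead uses Python's standard `sqlite3` module as an analogous database interface, so it is an analogy and not JDBC code. The `accounts` table and the `transfer` and `balance` helpers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
# Parameterized insert: values are bound to the ? placeholders
# and are never spliced into the SQL text.
conn.executemany(
    "INSERT INTO accounts (id, balance) VALUES (?, ?)",
    [(1, 100), (2, 50)],
)
conn.commit()


def transfer(conn, src: int, dst: int, amount: int) -> bool:
    """Move `amount` between accounts; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance.
        return False


def balance(conn, account_id: int) -> int:
    row = conn.execute(
        "SELECT balance FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()
    return row[0]
```

When a transfer would overdraw the source account, the second update fails, the first update is rolled back with it, and both balances remain unchanged.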
Hive: Hive is a Hadoop-based data warehouse infrastructure that provides an SQL-like query language (HiveQL) for processing large-scale distributed data. Its design goal is to give developers who are familiar with relational databases and SQL a simplified way to process and analyze big data. The main features and functions of Hive are as follows. SQL-based query language: Hive uses the SQL-like HiveQL, so developers can query and analyze data with familiar SQL syntax; Hive converts HiveQL into underlying MapReduce tasks or into a more efficient execution engine, which makes it possible to process large-scale data. Data abstraction: Hive supports abstracting and organizing data, using concepts such as tables, partitions and buckets to manage it; through these abstractions, developers can organize data into meaningful structures that are convenient to query and analyze. Scalability: Hive can run on distributed computing clusters and uses Hadoop's resource manager (such as YARN) to manage and schedule tasks; it can process large-scale data and cope with growing data volumes through horizontal scaling. Data storage formats: Hive supports a variety of storage formats, including text, sequence files, RCFile and Parquet, and a format can be chosen according to the characteristics and requirements of the data to improve query performance and storage efficiency. Execution optimization: Hive optimizes query execution plans through techniques such as cost-based optimization (CBO) and statistics collection; it can automatically infer correlations within a query and improve query performance through optimization. Extensible ecosystem: Hive has a rich ecosystem, including Hive extensions, user-defined functions (UDFs) and Hive plug-ins, which enable it to handle more types of data and more complex analysis tasks. Hive is widely used in the big data field, particularly in scenarios such as data warehousing, data integration and data analysis. It can process structured and semi-structured data and supports flexible and powerful query and analysis through HiveQL. Together with other tools of the Hadoop ecosystem (such as HBase and Spark), it provides a complete big data processing solution.
The embodiment of the invention takes into account that, in the related art, data quality detection has to go through a complicated process, including determining detection indicators, developing code for data quality detection tasks, reviewing data and analyzing results, which is time-consuming and labor-intensive, has poor real-time performance and is inefficient.
Therefore, the embodiment of the invention starts from the practical problems in data quality detection that multiple data sources are not adapted, the detection dimension is single, manual coding is required and alarms cannot be raised in time. A data quality detection method is designed to address these problems, its effectiveness is verified when detecting data quality, and the efficiency of data quality detection is markedly improved as a result.
Specifically, referring to FIG. 1, FIG. 1 is a schematic diagram of functional blocks of a terminal device to which the data quality detection device of the present invention belongs. The data quality detection device may be a device, independent of the terminal device, that is capable of data quality detection, and it may be carried on the terminal device in the form of hardware or software. The terminal device may be a smart mobile device with a data processing function, such as a mobile phone or a tablet computer, or may be a fixed terminal device or a server with a data processing function.
In this embodiment, the terminal device to which the data quality detection apparatus belongs at least includes an output module 110, a processor 120, a memory 130, and a communication module 140.
The memory 130 stores an operating system and a data quality detection program. The data quality detection device may generate a quality detection task based on metadata acquired in advance; execute the quality detection task and obtain a task execution result; and, if the task execution result does not meet a preset constraint condition, send out an alarm notification. Data quality is detected by the data quality detection program, and information such as the detection result is stored in the memory 130. The output module 110 may be a display screen or the like. The communication module 140 may include a Wi-Fi module, a mobile communication module, a Bluetooth module and the like, and the terminal device communicates with an external device or a server through the communication module 140.
Wherein the data quality detection program in the memory 130 when executed by the processor performs the steps of:
generating a quality detection task based on the metadata acquired in advance;
executing the quality detection task and obtaining a task execution result;
and if the task execution result does not meet the preset constraint condition, sending out an alarm notification.
Further, the data quality detection program in the memory 130, when executed by the processor, further performs the steps of:
extracting metadata acquired in advance to acquire structural information of the metadata;
converting and adapting through a preset query statement generator according to the structure information to obtain a meta model;
and generating a quality detection task according to the meta-model.
Further, the data quality detection program in the memory 130, when executed by the processor, further performs the steps of:
acquiring detection task configuration information and a detection task query statement according to the quality detection task;
and performing quality detection on the detection task configuration information and the detection task query statement to acquire a task execution result.
Further, the data quality detection program in the memory 130, when executed by the processor, further performs the steps of:
reading the task execution result through an abnormality alarming module to obtain a reading result;
judging according to the reading result and a preset constraint condition;
and if the constraint condition is not satisfied, sending out an alarm notification.
Further, the data quality detection program in the memory 130, when executed by the processor, further performs the steps of:
generating a metadata service interface according to the meta model;
and converting the metadata into input configuration of a quality task according to the metadata service interface, and generating a quality detection task.
Further, the data quality detection program in the memory 130, when executed by the processor, further performs the steps of:
reading the quality detection task through the metadata service interface to obtain detection task configuration information;
and converting the quality detection task through the query statement generator to obtain a detection task query statement.
Further, the data quality detection program in the memory 130, when executed by the processor, further performs the steps of:
sending the detection task configuration information and the detection task query statement to a data cluster;
and calculating according to the data cluster through a preset calculation program to obtain a task execution result.
According to the above scheme, this embodiment generates a quality detection task based on metadata acquired in advance; executes the quality detection task and obtains a task execution result; and, if the task execution result does not meet a preset constraint condition, sends out an alarm notification. Because the quality detection task is generated from metadata and executed to obtain a detection result, and the detection result is then analyzed against the constraint condition so that an alarm notification is sent out in time, the problems that multiple data sources are not adapted, the detection dimension is single, manual coding is required and alarms cannot be raised in time can be solved. The scheme of the invention starts from these practical problems in data quality detection: a data quality detection method is designed to address them, its effectiveness is verified when detecting data quality, and the efficiency of data quality detection is markedly improved as a result.
The method embodiments of the present invention are presented based on the above-described terminal device architecture but not limited to the above-described framework.
Referring to fig. 2, fig. 2 is a flowchart of an exemplary embodiment of a data quality detection method according to the present invention. The data quality detection method comprises the following steps:
step S01, generating a quality detection task based on metadata acquired in advance;
The method of this embodiment may be executed by a data quality detection device, a data quality detection terminal device, or a server. This embodiment takes the data quality detection device as an example; the device may be integrated into a terminal device with a data processing function.
In order to detect the data quality, a quality detection task is acquired first, and the quality detection task is realized through the following steps:
first, the data quality detection indicators are decomposed in a structured way based on the data physical model and the data quality standard. The data physical model is an important concept in database design: it describes how data is organized and structured at the physical storage level, that is, how data objects in the logical model (such as tables, columns, and relationships) map onto specific physical storage structures (such as disks, files, and indexes). The data quality standard is a set of criteria or indicators for measuring and evaluating data quality. These criteria aim to ensure the accuracy, integrity, consistency, reliability, and timeliness of data so that high-quality data can support decisions and business needs. They serve as reference indicators for evaluating data quality and can be adjusted and extended according to specific business needs and data characteristics. By periodically checking and monitoring data quality, an organization can ensure the reliability and availability of its data and thus better support decisions and business activities.
Finally, based on the metadata and the SQL generator, the data source configuration and the quality detection SQL are generated according to the selected indicators. Metadata is data that describes data: it provides information and attributes about the data and helps users understand, manage, organize, discover, and use it. SQL is a standardized language for managing relational databases that provides a grammar and specification for defining, manipulating, and querying databases. Almost all relational database systems follow the SQL standard; individual database management systems may differ in some syntax, but the basic SQL statements and concepts are common to all. Through SQL, users can conveniently manage and query the data in a database and perform complex data operations and analyses. In this embodiment, SQL serves as the query statement used to detect data quality.
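As an illustration of generating quality detection SQL from metadata and selected indicators, the following is a minimal sketch only. The names `METRIC_TEMPLATES`, `ColumnMeta`, and `build_quality_sql`, the three indicators, and the table and column names are all hypothetical; the patent does not disclose the actual generator or its indicator set.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical indicator templates; the real indicator set is not disclosed.
METRIC_TEMPLATES = {
    "row_count": "COUNT(*)",
    "null_count": "SUM(CASE WHEN {col} IS NULL THEN 1 ELSE 0 END)",
    "distinct_count": "COUNT(DISTINCT {col})",
}


@dataclass
class ColumnMeta:
    """Minimal metadata about the column under test (assumed shape)."""
    table: str
    column: str


def build_quality_sql(meta: ColumnMeta, metrics: List[str],
                      where: Optional[str] = None) -> str:
    """Assemble one SELECT that computes every requested indicator."""
    select_items = [
        f"{METRIC_TEMPLATES[m].format(col=meta.column)} AS {m}"
        for m in metrics
    ]
    sql = f"SELECT {', '.join(select_items)} FROM {meta.table}"
    if where:
        # The WHERE clause limits the scan range, e.g. to one partition.
        sql += f" WHERE {where}"
    return sql


sql = build_quality_sql(
    ColumnMeta(table="orders", column="customer_id"),
    ["row_count", "null_count"],
    where="dt = '2023-08-22'",
)
print(sql)
```

The sketch interpolates identifiers directly, so it assumes that table and column names come from trusted metadata rather than from user input.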
Step S02, executing the quality detection task and obtaining a task execution result;
after the quality detection task is acquired, acquiring a task execution result through the following steps:
first, the SQL statements (that is, the query statements) produced while generating the quality detection task are obtained, and the configuration produced by converting the data source into a quality task is read;
Finally, the quality detection task is executed using Spark computation, a timing scheduler, and a JDBC executor to obtain the detection result. Spark is a fast, general-purpose big data processing framework whose distributed computing capability handles large-scale data sets and supports complex data analysis and computation tasks; it also provides rich APIs and tools that let developers process and analyze data conveniently. The JDBC executor is a Java class library or tool for performing database operations; it provides a set of APIs and methods that let developers connect to a database and execute SQL statements or stored procedures.
Step S03, if the task execution result does not meet the preset constraint condition, sending out an alarm notification.
After the data quality detection is completed, sending an alarm notification according to the detection result:
first, the output result is evaluated against the predefined constraint condition;
then, if the constraint condition is met, the quality job is normal; the current data information is recorded, but no alarm is sent;
and finally, if the constraint condition is not met, the quality job is abnormal; an alarm notification is then sent to the configured application or to designated persons, according to the user's settings.
Specifically, as shown in fig. 3, fig. 3 is an overall schematic diagram of the data quality detection method of the present invention.
Firstly, generating a data quality detection task based on metadata;
then, executing a data quality detection task to obtain a detection result;
and finally, carrying out abnormality judgment according to the detection result, and sending out an alarm notification when the alarm condition is met.
More specifically, as shown in fig. 4, fig. 4 is another overall schematic diagram of the data quality detection method according to the present invention.
First, a SELECT statement is used to query and obtain the detection content. SELECT is the keyword in Structured Query Language (SQL) for retrieving data from a database and is typically combined with other SQL clauses to define the required data set and retrieval conditions;
then, the FROM keyword is used to select the data source; FROM is the clause of the SELECT statement that specifies the database table or view from which data is retrieved;
then, the WHERE keyword is used to determine the scan range; WHERE is the clause of the SELECT statement that specifies filter conditions, allowing the user to filter the retrieved data by specific conditions;
then, executing by using a Spark task, and obtaining a detection result;
then, the alarm conditions are split, and each split alarm condition is compared with the detection result for analysis;
and finally, when the detection result is analyzed to meet the alarm condition, sending out an alarm notification.
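The splitting and comparison of alarm conditions described above might look like the following sketch. It assumes that each alarm condition is a string of the form `indicator operator threshold`; the patent does not specify the condition format, so `split_condition`, `fired_conditions`, and the indicator names are hypothetical.

```python
import operator
import re

# Comparison operators allowed in an alarm condition (assumed set).
OPS = {
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
    "==": operator.eq, "!=": operator.ne,
}

# Two-character operators are listed first so ">=" is not read as ">".
COND_RE = re.compile(r"^\s*(\w+)\s*(>=|<=|==|!=|>|<)\s*(-?\d+(?:\.\d+)?)\s*$")


def split_condition(text):
    """Split 'null_count > 0' into ('null_count', '>', 0.0)."""
    match = COND_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized alarm condition: {text!r}")
    metric, op, threshold = match.groups()
    return metric, op, float(threshold)


def fired_conditions(conditions, result):
    """Return the alarm conditions that the detection result satisfies."""
    fired = []
    for text in conditions:
        metric, op, threshold = split_condition(text)
        if metric in result and OPS[op](result[metric], threshold):
            fired.append(text)
    return fired


detection_result = {"null_count": 3, "row_count": 1200}
fired = fired_conditions(["null_count > 0", "row_count < 100"], detection_result)
if fired:
    print("ALARM:", fired)
```

Here an alarm is raised when at least one condition fires; whether the real system combines conditions with AND or OR is not stated in the source.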
According to this scheme, a quality detection task is generated based on metadata acquired in advance; the quality detection task is executed and a task execution result is obtained; and if the task execution result does not meet the preset constraint condition, an alarm notification is sent. Data quality detection and alarm notification are thereby realized, the problems of poor adaptation to multiple data sources, a single detection dimension, the need for manual coding, and the inability to raise alarms in time are solved, and the efficiency of data quality detection is improved.
Referring to fig. 5, fig. 5 is a flowchart illustrating a data quality detection method according to another exemplary embodiment of the present invention.
Based on the embodiment shown in fig. 2, the step S01, the step of generating the quality detection task based on the metadata acquired in advance includes:
step S011, extracting the metadata acquired in advance to acquire the structural information of the metadata;
step S012, converting and adapting through a preset query statement generator according to the structure information to obtain a meta model;
And step S013, generating a quality detection task according to the meta-model.
Specifically, to generate a quality detection task, it is achieved by:
first, metadata is collected by manual entry and automatic extraction. Manual entry accurately attributes each physical table (including the table's subject tag, responsible person, additional information, and the like). Automatic extraction collects the production metadata physical model: for relational databases such as MySQL and Hive, the Data Definition Language (DDL) is obtained through SQL commands that retrieve metadata; for non-relational databases such as Elasticsearch and Redis, the JSON schema is extracted to obtain the structure information;
then, the heterogeneous data sources are converted, through the conversion and adaptation of the SQL generator, into a meta-model described in Hive SQL. Specific steps include, but are not limited to, the following: for relational data, the table structure and field information are extracted, and the table's fields are selected with a SELECT statement in Hive SQL to obtain the corresponding view; for non-relational data, the JSON is parsed, the JSON schema is obtained, the JSON structure is extracted, the JSON data is traversed to collect field information such as field names and field types, and the JSON fields are flattened with a SELECT statement in Hive SQL to obtain the corresponding view. This completes construction of the meta-model;
Then, the metadata service is defined and managed through the meta-model to obtain the corresponding metadata service;
and finally, the lookup tables provided by the metadata service are used to join metadata such as configuration information, field information, and index information; the metadata service is consumed through open API calls to convert the data source into the input configuration of a quality task and to generate the quality task execution SQL.
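The JSON traversal and flattening step for non-relational sources can be sketched as follows. This is an assumption-laden illustration: `collect_fields` and `flatten_to_view_sql` are hypothetical helper names, the table `es_orders` and column `doc` are made up, field types are inferred from a single sample document rather than from a real JSON schema, and Hive's `get_json_object` function is used here as one plausible way to flatten fields in a SELECT statement.

```python
import json


def collect_fields(node, prefix=""):
    """Walk a parsed JSON object and return (path, type name) per leaf field."""
    fields = []
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Nested object: recurse and extend the dotted path.
            fields.extend(collect_fields(value, path))
        else:
            # Leaf value (arrays are treated as leaves in this sketch).
            fields.append((path, type(value).__name__))
    return fields


def flatten_to_view_sql(table, json_col, sample):
    """Build a Hive-style SELECT that exposes each JSON leaf as a flat column."""
    items = [
        f"get_json_object({json_col}, '$.{path}') AS {path.replace('.', '_')}"
        for path, _ in collect_fields(json.loads(sample))
    ]
    return f"SELECT {', '.join(items)} FROM {table}"


sample = '{"id": 7, "user": {"name": "a", "age": 30}}'
fields = collect_fields(json.loads(sample))
view_sql = flatten_to_view_sql("es_orders", "doc", sample)
print(fields)
print(view_sql)
```

A production generator would also need to handle arrays, type mapping to Hive column types, and name collisions after flattening, none of which the source describes.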
As shown in fig. 6, fig. 6 is a schematic diagram of generating a quality detection task in the data quality detection method of the present invention.
Firstly, metadata is collected, and metadata structure information is obtained through manual input and automatic extraction;
then, building a meta model through the structural information of the meta data;
and finally, defining and managing the metadata service through the metadata model, acquiring the corresponding metadata service, and using the metadata service through the corresponding interface.
According to this scheme, the metadata acquired in advance is extracted to obtain its structure information; the structure information is converted and adapted by a preset query statement generator to obtain a meta-model; and a quality detection task is generated according to the meta-model. Generation of the quality detection task is thereby completed: the acquired metadata is extracted and converted, and the corresponding query statement and quality detection task are generated. This solves the problems of poor adaptation to multiple data sources, a single detection dimension, and the need for manual coding, and improves the generality of data quality detection.
Referring to fig. 7, fig. 7 is a flowchart illustrating a data quality detection method according to another exemplary embodiment of the present invention.
Based on the embodiment shown in fig. 2, the step S02 of executing the quality detection task, and the step of obtaining the task execution result includes:
step S021, according to the quality detection task, acquiring detection task configuration information and detection task query sentences;
step S022, performing quality detection on the detection task configuration information and the detection task query statement, and obtaining a task execution result.
Specifically, in order to execute the quality detection task, a task execution result is obtained, and the following steps are adopted:
firstly, obtaining configuration information of a detection task and a detection task query statement according to a quality detection task, wherein the configuration information of the detection task is obtained from metadata service, the detection task query statement is an SQL statement and is obtained through an SQL generator, and therefore reading of the quality detection task is completed;
finally, according to the read task configuration information and the detection task query statement, computation is performed by the big data processing framework Spark; the timing scheduler automatically triggers execution of the task according to a schedule or rules set by the user; and the JDBC executor connects to the database and executes the SQL statement or stored procedure. Execution of the quality detection task is thereby completed and the detection result is obtained.
Further, computing with Spark includes, but is not limited to, the following. Data loading: Spark can load data from various data sources (such as Hadoop HDFS, relational databases, and file systems), reading and parsing the data with the APIs Spark provides. Conversion operations: data can be processed and converted with operations such as map, filter, and reduce, which may be applied to the entire data set or to each partition of it. Caching and persistence: to improve computing performance, repeatedly accessed data is stored in memory or on disk, which avoids recomputation and speeds up subsequent computation. Parallel computing: Spark uses a distributed computing model that divides data into multiple partitions and computes on multiple nodes of the cluster in parallel; performance can be tuned by adjusting the number of partitions and the degree of parallelism. Aggregation and summarization: Spark provides aggregation operations and functions such as sum, avg, max, and min for computing statistical indicators and summary results. Result output: finally, the computation results are saved to a target such as a file system, a database, or another data storage system.
Operations performed with the JDBC executor may include the following. Establishing a database connection: the JDBC executor connects to the database using connection information such as the URL, user name, and password; once the connection succeeds, database operations can begin. Executing SQL statements: statements such as SELECT, INSERT, UPDATE, and DELETE are executed by passing SQL statement strings or by using precompiled statements (PreparedStatement). Processing query results: if a query statement is executed, the JDBC executor obtains the returned result set, iterates over it with a ResultSet object, and extracts the required data. Error handling: the JDBC executor provides an error handling mechanism that captures and handles exceptions raised during execution, such as connection failures and SQL statement errors. Resource release: after the database operations complete, related resources must be released, for example by closing the connection and the result set; the JDBC executor provides methods that ensure resources are released correctly, avoiding problems such as resource leaks and memory overflow.
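The executor steps described above (connect, execute a parameterized statement, read the result set, handle errors, release resources) can be sketched in outline. Note the substitution: this sketch uses Python's standard `sqlite3` module with an in-memory database as a stand-in for a JDBC connection, not JDBC itself. The helper `run_check` and the `orders` table are hypothetical.

```python
import sqlite3
from contextlib import closing


def run_check(conn, sql, params=()):
    """Execute one parameterized query and return all rows as tuples."""
    try:
        # closing() releases the cursor even if execution raises.
        with closing(conn.cursor()) as cur:
            cur.execute(sql, params)  # placeholders play the role of a prepared statement
            return cur.fetchall()
    except sqlite3.Error as exc:
        # Surface database errors (bad SQL, missing table, ...) uniformly.
        raise RuntimeError(f"quality check failed: {exc}") from exc


# closing() guarantees the connection is closed when the block exits.
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 10), (2, None), (3, 12)],
    )
    rows = run_check(
        conn,
        "SELECT COUNT(*) FROM orders WHERE customer_id IS NULL AND id >= ?",
        (1,),
    )

print(rows)
```

In a Java implementation the same roles would be played by `Connection`, `PreparedStatement`, and `ResultSet`, typically inside try-with-resources blocks.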
More specifically, as shown in fig. 8, fig. 8 is a schematic diagram of a data quality detection method according to the present invention, which involves acquiring a task execution result.
Firstly, reading a quality detection task, converting the task, and obtaining quality task input configuration and quality task execution SQL (quality task execution statement), wherein the quality task input configuration is obtained through metadata service, and the quality task execution SQL is obtained through an SQL generator;
and finally, according to the acquired quality task input configuration and quality task execution statement, outputting results through Spark calculation, a timing scheduler and a JDBC executor to obtain a task execution result.
According to the scheme, the detection task configuration information and the detection task query statement are obtained according to the quality detection task; and performing quality detection on the detection task configuration information and the detection task query statement to acquire a task execution result. Therefore, the detection of the data quality is completed, the acquisition of the execution result of the data quality detection task is realized, the problems that the detection dimension is single and manual coding is needed are solved, and the efficiency of the data quality detection is improved.
Referring to fig. 9, fig. 9 is a flowchart illustrating another exemplary embodiment of a data quality detection method according to the present invention.
Based on the embodiment shown in fig. 2, in step S03, if the task execution result does not meet the preset constraint condition, the step of sending an alarm notification includes:
step S031, the coding information of the quality detection task is read through an abnormality alarming module, and task execution results and preset constraint conditions are obtained;
step S032, judging according to the task execution result and the constraint condition;
step S033, if the task execution result does not meet the constraint condition, sending an alarm notification.
Specifically, in order to raise alarms for data that fails the quality requirements, the method is realized by the following steps:
firstly, acquiring coding information of a quality detection task through an abnormality alarm module;
then, the task execution result and the corresponding constraint conditions are obtained through the coding information. The constraint conditions are configured in advance and can be set according to current business needs; for example, for data in a testing stage, data with quality problems may be kept deliberately for testing, in which case the constraint conditions can be configured in advance so that no alarm is raised;
then, judging a task execution result by using constraint conditions, wherein the constraint conditions are acquired in advance and can be set according to specific service scenes, and in the embodiment, the constraint conditions comprise, but are not limited to, data type constraint, field constraint, null constraint, uniqueness constraint, format constraint and the like;
And finally, when the task execution result meets the constraint conditions, the detected data quality is considered qualified or to meet the business requirement, and no alarm notification is sent; when the task execution result does not meet the constraint conditions, the detected data quality is considered unqualified or not to meet the business requirement, and an alarm notification is sent. The recipients and the manner of the alarm notification can be set according to business conditions.
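To make the constraint types mentioned above concrete (null constraint, uniqueness constraint, format constraint), here is a small sketch that evaluates them over in-memory records. It is illustrative only: in the described system such checks would presumably be expressed as SQL and evaluated on the cluster, and the function names, the sample records, and the 11-digit phone pattern are all assumptions.

```python
import re
from collections import Counter


def check_not_null(rows, field):
    """Indexes of rows whose field is missing or None."""
    return [i for i, r in enumerate(rows) if r.get(field) is None]


def check_unique(rows, field):
    """Indexes of rows whose field value occurs more than once."""
    counts = Counter(r.get(field) for r in rows)
    return [i for i, r in enumerate(rows) if counts[r.get(field)] > 1]


def check_format(rows, field, pattern):
    """Indexes of rows whose non-null field does not fully match the pattern."""
    rx = re.compile(pattern)
    return [
        i for i, r in enumerate(rows)
        if r.get(field) is not None and not rx.fullmatch(str(r[field]))
    ]


rows = [
    {"id": 1, "phone": "13800000000"},
    {"id": 2, "phone": None},
    {"id": 2, "phone": "12-34"},
]
violations = {
    "null:phone": check_not_null(rows, "phone"),
    "unique:id": check_unique(rows, "id"),
    "format:phone": check_format(rows, "phone", r"\d{11}"),
}
# An alarm is needed if any constraint reports at least one violating row.
alarm_needed = any(violations.values())
print(violations, alarm_needed)
```

A pre-configured exemption, such as the testing-stage case described above, could be modeled by simply omitting the relevant constraint from the dictionary.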
More specifically, as shown in fig. 10, fig. 10 is a schematic diagram of the data quality detection method according to the present invention, which involves sending out an alarm notification.
Firstly, reading coding information of a quality task, and acquiring a task execution result and constraint conditions;
then, the task execution result is judged against the constraint conditions to determine whether it meets them; if so, no alarm notification is sent, and if not, an alarm notification is sent.
According to the scheme, the coded information of the quality detection task is read through the abnormality alarming module, and a task execution result and preset constraint conditions are obtained; judging according to the task execution result and the constraint condition; and if the task execution result does not meet the constraint condition, sending out an alarm notification. Therefore, analysis and judgment of task execution results and notification of the results are completed, alarm notification of data abnormality is realized, the problem that timely alarm cannot be performed is solved, and the efficiency of data quality detection is improved.
Referring to fig. 11, fig. 11 is a schematic flow chart of a data quality detection method according to the present invention, which relates to a quality detection task generation.
Based on the embodiment shown in fig. 5, the step S013, the step of generating the quality detection task according to the meta-model includes:
step S0131, generating a metadata service interface according to the meta model;
and step S0132, converting the metadata into input configuration of a quality task according to the metadata service interface, and generating a quality detection task.
Specifically, for generating a quality detection task, this is achieved by:
first, a metadata service interface is generated according to the meta-model. The meta-model is a model for describing and defining metadata: it defines the structure, attributes, and relationships of metadata and specifies how metadata is organized and how items of metadata are linked and constrained. The metadata service is a service for managing, storing, querying, and manipulating metadata; it lets users manage metadata through an interface or tool, including operations such as creation, modification, deletion, query, and verification. The two are related as follows: the metadata service implements metadata access, query, and manipulation on the basis of the meta-model and ensures the accuracy, consistency, and reliability of the metadata; users interact with the metadata service through its interfaces and manage metadata within the structure and constraints defined by the meta-model, for example by creating metadata instances from the attributes and relationships the meta-model defines and then creating, deleting, modifying, and querying them through the metadata service interface;
And finally, converting the metadata into input configuration of the quality task according to the metadata service interface, and generating a quality detection task.
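One way to picture the relationship between the meta-model, the metadata service interface, and the quality task input configuration is the sketch below. Everything in it is hypothetical: the `FieldMeta` and `TableMeta` classes stand in for the meta-model, `MetadataService` for the service interface, and the shape of the returned configuration dictionary is an assumption, since the patent does not define these structures.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FieldMeta:
    """One field of a table, as described by the (assumed) meta-model."""
    name: str
    type: str


@dataclass
class TableMeta:
    """Table-level metadata: source system, owner, and fields."""
    source: str
    table: str
    owner: str
    fields: List[FieldMeta] = field(default_factory=list)


class MetadataService:
    """In-memory stand-in for the metadata service interface."""

    def __init__(self):
        self._tables: Dict[str, TableMeta] = {}

    def register(self, meta: TableMeta) -> None:
        self._tables[meta.table] = meta

    def get(self, table: str) -> TableMeta:
        return self._tables[table]

    def to_task_config(self, table: str, metrics: List[str]) -> dict:
        """Convert stored metadata into the input configuration of a quality task."""
        meta = self.get(table)
        return {
            "source": meta.source,
            "table": meta.table,
            "owner": meta.owner,
            "columns": [f.name for f in meta.fields],
            "metrics": list(metrics),
        }


svc = MetadataService()
svc.register(TableMeta(
    source="hive", table="orders", owner="data-team",
    fields=[FieldMeta("id", "bigint"), FieldMeta("customer_id", "bigint")],
))
config = svc.to_task_config("orders", ["row_count", "null_count"])
print(config)
```

In the described system this interface would be exposed through open API calls rather than direct method calls.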
According to the scheme, the metadata service interface is generated according to the meta model; and converting the metadata into input configuration of a quality task according to the metadata service interface, and generating a quality detection task. Therefore, the generation of the quality detection task is completed, the problems that the detection dimension is single and manual coding is needed are solved, and the efficiency of data quality detection is improved.
Referring to fig. 12, fig. 12 is a flow chart of a data quality detection method according to the present invention, which relates to acquiring task configuration information and query sentences.
Based on the embodiment shown in fig. 7, step S021, the step of obtaining the detection task configuration information and the detection task query statement according to the quality detection task includes:
step S0211, reading the quality detection task through the metadata service interface to obtain detection task configuration information;
and step S0212, converting the quality detection task through the query statement generator to obtain a detection task query statement.
Specifically, in order to obtain the detection task configuration information and the detection task query statement, the method is implemented by the following steps:
First, the metadata acquired in advance is stored in the detection task. The detection task query statement is the query statement obtained by converting the metadata once its structure information has been acquired; it is mainly an SQL statement and is used later by Spark computation, the timing scheduler, and the JDBC executor;
then, reading the quality detection task through a metadata service interface to obtain detection task configuration information;
and finally, converting the quality detection task through a query statement generator to obtain a detection task query statement.
According to the scheme, the quality detection task is read through the metadata service interface to obtain detection task configuration information; and converting the quality detection task through the query statement generator to obtain a detection task query statement. Therefore, the acquisition of the configuration information of the detection task and the inquiry statement of the detection task is completed, the problems that the detection dimension is single and manual coding is needed are solved, and the efficiency of data quality detection is improved.
Referring to fig. 13, fig. 13 is a schematic flow chart of a data quality detection method according to the present invention, which relates to obtaining a task execution result.
Based on the embodiment shown in fig. 7, in step S022, the step of performing quality detection on the detection task configuration information and the detection task query statement, and the step of obtaining the task execution result includes:
step S0221, sending the detection task configuration information and the detection task query statement to a data cluster;
step S0222, according to the data cluster, performing calculation by a preset calculation program, and obtaining a task execution result.
Specifically, in order to obtain the task execution result, the method is realized by the following steps:
firstly, submitting the generated detection task configuration information and detection task query statement to a data cluster with Spark through a timing scheduler, wherein the data cluster provides underlying infrastructure and resources, including computing nodes, a storage system, network connection and the like, and is used for storing and processing large-scale data; spark is used as a distributed computing framework, can be deployed and run on a big data cluster, and can process and analyze data in a distributed manner by utilizing the computing power and storage resources of the big data cluster; the big data cluster provides a distributed computing base environment for Spark, and supports the Spark to distribute tasks to a plurality of computing nodes for parallel execution, so that the data processing speed is increased; the Spark utilizes the storage system of the big data cluster, so that the data stored in the cluster can be directly read and processed, the data movement or copying is not needed, and the data access efficiency is improved;
And finally, calculating according to a calculation program with a Spark framework in the data cluster, and obtaining a task execution result.
According to the scheme, the detection task configuration information and the detection task query statement are sent to the data cluster; and calculating according to the data cluster through a preset calculation program to obtain a task execution result. Therefore, the acquisition of the task execution result is completed, the task execution by detecting the task configuration information and detecting the task query statement is realized, the problems that the detection dimension is single and manual coding is needed are solved, and the efficiency of data quality detection is improved.
In addition, an embodiment of the present invention further provides a data quality detection apparatus, where the data quality detection apparatus includes:
the generation module is used for generating a quality detection task based on the metadata acquired in advance;
the execution module is used for executing the quality detection task and obtaining a task execution result;
and the alarm module is used for sending an alarm notification if the task execution result does not meet the preset constraint condition.
In addition, the embodiment of the invention also provides a terminal device, which comprises a memory, a processor and a data quality detection program stored on the memory and capable of running on the processor, wherein the data quality detection program realizes the steps of the data quality detection method when being executed by the processor.
Because all the technical solutions of all the embodiments are adopted when the data quality detection program is executed by the processor, the data quality detection program at least has all the beneficial effects brought by all the technical solutions of all the embodiments, and is not described in detail herein.
In addition, the embodiment of the invention also provides a computer readable storage medium, wherein the computer readable storage medium stores a data quality detection program, and the data quality detection program realizes the steps of the data quality detection method when being executed by a processor.
Because all the technical solutions of all the embodiments are adopted when the data quality detection program is executed by the processor, the data quality detection program at least has all the beneficial effects brought by all the technical solutions of all the embodiments, and is not described in detail herein.
Compared with the prior art, the data quality detection method, device, terminal equipment, and storage medium provided by the embodiments of the invention generate a quality detection task based on metadata acquired in advance; execute the quality detection task and obtain a task execution result; and send an alarm notification if the task execution result does not meet the preset constraint condition. This solves the problems of poor adaptation to multiple data sources, a single detection dimension, the need for manual coding, and the inability to raise alarms in time; it realizes detection of and alarms on data quality and improves the efficiency of data quality detection. The scheme of the invention starts from these practical problems, designs a data quality detection method, verifies its effectiveness in data quality detection, and thereby markedly improves the efficiency of data quality detection.
Compared with the prior art, the embodiment of the invention has the following advantages:
1. the method is suitable for various heterogeneous data sources, including but not limited to MySQL, HBase, Hive, and Elasticsearch; detection of consistency, repeatability, uniqueness, timeliness, and the like can be selected as needed within one quality task; and the method suits different scenarios and has a degree of generality;
2. based on the metadata service and the SQL generator, multiple kinds of data sources are converted into Hive SQL, so fully automatic data quality detection is achieved through simple configuration, with no hand-written SQL;
3. all software constructs of the present solution, including but not limited to source code, compiled class files, structured software packages, running scripts, configuration files, related documents, generated data, interface shots, and the like.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or alternatively by means of hardware, although in many cases the former is the preferred implementation. Based on such understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium as described above (e.g. ROM/RAM, magnetic disk, optical disk), comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a controlled terminal, a network device, or the like) to perform the method of each embodiment of the present invention.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the scope of the invention; any equivalent structure or equivalent process derived from the contents of this description and the drawings, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of protection of the invention.

Claims (10)

1. A data quality detection method, characterized in that the data quality detection method comprises the steps of:
generating a quality detection task based on the metadata acquired in advance;
executing the quality detection task and obtaining a task execution result;
and if the task execution result does not meet the preset constraint condition, sending out an alarm notification.
2. The data quality detection method according to claim 1, wherein the step of generating a quality detection task based on the metadata acquired in advance includes:
extracting metadata acquired in advance to acquire structural information of the metadata;
converting and adapting through a preset query statement generator according to the structure information to obtain a meta-model;
and generating a quality detection task according to the meta-model.
3. The method for detecting data quality according to claim 2, wherein the step of performing the quality detection task and obtaining the task execution result includes:
acquiring detection task configuration information and a detection task query statement according to the quality detection task;
and performing quality detection on the detection task configuration information and the detection task query statement to acquire a task execution result.
4. The data quality detection method according to claim 3, wherein the step of issuing an alarm notification if the task execution result does not satisfy a preset constraint condition comprises:
reading the coding information of the quality detection task through an abnormality alarming module to obtain a task execution result and preset constraint conditions;
judging according to the task execution result and the constraint condition;
and if the task execution result does not meet the constraint condition, sending out an alarm notification.
5. The data quality detection method according to claim 3, wherein the step of generating a quality detection task according to the meta-model comprises:
generating a metadata service interface according to the meta-model;
and converting the metadata into input configuration of a quality task according to the metadata service interface, and generating a quality detection task.
6. The method according to claim 5, wherein the step of acquiring detection task configuration information and a detection task query statement according to the quality detection task comprises:
reading the quality detection task through the metadata service interface to obtain detection task configuration information;
and converting the quality detection task through the query statement generator to obtain a detection task query statement.
7. The method for detecting data quality according to claim 3, wherein the step of detecting the quality of the detection task configuration information and the detection task query statement, and obtaining the task execution result includes:
sending the detection task configuration information and the detection task query statement to a data cluster;
and calculating according to the data cluster through a preset calculation program to obtain a task execution result.
8. A data quality detection apparatus, characterized in that the data quality detection apparatus comprises:
the generation module is used for generating a quality detection task based on the metadata acquired in advance;
the execution module is used for executing the quality detection task and obtaining a task execution result;
and the alarm module is used for sending an alarm notification if the task execution result does not meet the preset constraint condition.
9. A terminal device, characterized in that the terminal device comprises a memory, a processor and a data quality detection program stored on the memory and executable on the processor, which data quality detection program, when executed by the processor, implements the steps of the data quality detection method according to any of claims 1-7.
10. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a data quality detection program, which when executed by a processor, implements the steps of the data quality detection method according to any of claims 1-7.
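For illustration only, the flow recited in claims 2 and 5 (extract structure information from metadata, adapt it into a meta-model, then convert the result into the input configuration of a quality task) might be sketched as follows. The `Column` and `MetaModel` classes, the `TYPE_MAP` table, the metadata layout and the function names are hypothetical; in particular, the claims obtain the meta-model through a query statement generator, whereas this sketch substitutes a simple type-mapping table.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Column:
    name: str
    data_type: str  # type expressed in a common, source-independent vocabulary


@dataclass
class MetaModel:
    source_type: str
    table: str
    columns: List[Column]


# Hypothetical mapping from source-specific types to common types.
TYPE_MAP: Dict[str, Dict[str, str]] = {
    "mysql": {"varchar": "string", "int": "int", "datetime": "timestamp"},
    "elasticsearch": {"keyword": "string", "long": "bigint", "date": "timestamp"},
}


def extract_structure(raw_metadata: dict) -> List[dict]:
    """Pull field names and lower-cased source types out of raw metadata."""
    return [
        {"name": f["name"], "type": f["type"].lower()}
        for f in raw_metadata["fields"]
    ]


def to_meta_model(raw_metadata: dict) -> MetaModel:
    """Adapt source-specific structure information into one common model."""
    source = raw_metadata["source_type"]
    mapping = TYPE_MAP[source]
    columns = [
        Column(s["name"], mapping.get(s["type"], "string"))  # default: string
        for s in extract_structure(raw_metadata)
    ]
    return MetaModel(source, raw_metadata["table"], columns)


def to_task_config(model: MetaModel, checks: Dict[str, str]) -> dict:
    """Build task input configuration; `checks` maps column name to check type."""
    known = {c.name for c in model.columns}
    for column in checks:
        if column not in known:
            raise KeyError(f"column not in meta-model: {column}")
    return {
        "source_type": model.source_type,
        "table": model.table,
        "checks": [{"column": c, "type": t} for c, t in checks.items()],
    }
```

Normalizing every source into one model before building the task configuration is what would let later stages stay independent of the original data source; whether the patented system is organized this way is not stated in the claims.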
CN202311063670.3A 2023-08-22 2023-08-22 Data quality detection method, device, terminal equipment and storage medium Pending CN117131027A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311063670.3A CN117131027A (en) 2023-08-22 2023-08-22 Data quality detection method, device, terminal equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311063670.3A CN117131027A (en) 2023-08-22 2023-08-22 Data quality detection method, device, terminal equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117131027A true CN117131027A (en) 2023-11-28

Family

ID=88857679

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311063670.3A Pending CN117131027A (en) 2023-08-22 2023-08-22 Data quality detection method, device, terminal equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117131027A (en)

Similar Documents

Publication Publication Date Title
US11409764B2 (en) System for data management in a large scale data repository
CN108959433B (en) Method and system for extracting knowledge graph from software project data and asking for questions and answers
US10831726B2 (en) System for importing data into a data repository
Khayyat et al. Bigdansing: A system for big data cleansing
US20100017395A1 (en) Apparatus and methods for transforming relational queries into multi-dimensional queries
US11941034B2 (en) Conversational database analysis
JP2017157229A (en) Scalable analysis platform for semi-structured data
EP3671526B1 (en) Dependency graph based natural language processing
US10915535B2 (en) Optimizations for a behavior analysis engine
US20100293161A1 (en) Automatically avoiding unconstrained cartesian product joins
CN114461603A (en) Multi-source heterogeneous data fusion method and device
US9063957B2 (en) Query systems
Swarna et al. Apache Pig-a data flow framework based on Hadoop Map Reduce
CN117093599A (en) Unified SQL query method for heterogeneous data sources
Al Mahruqi et al. A semi-automated framework for migrating web applications from SQL to document oriented NoSQL database.
CN117421302A (en) Data processing method and related equipment
CN116821098A (en) Data warehouse management method, service system and storage medium
US11681721B2 (en) Systems and methods for spark lineage data capture
KR101162468B1 (en) Automatic data store architecture detection
CN117131027A (en) Data quality detection method, device, terminal equipment and storage medium
CN106484706B (en) Method and apparatus for executing procedural SQL statements for distributed systems
KR102605930B1 (en) Method for processing structured data and unstructured data on database and data processing platform providing the method
Kvet et al. Enhancing Analytical Select Statements Using Reference Aliases
KR102605931B1 (en) Method for processing structured data and unstructured data on a plurality of databases and data processing platform providing the method
CN115952203B (en) Data query method, device, system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination