CN113010488B - Data acquisition method, device, equipment and storage medium - Google Patents

Data acquisition method, device, equipment and storage medium

Info

Publication number
CN113010488B
Authority
CN
China
Prior art keywords
data
configuration information
target
audited
data configuration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911320945.0A
Other languages
Chinese (zh)
Other versions
CN113010488A (en)
Inventor
徐攀登
黄晓婧
韩翼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201911320945.0A
Publication of CN113010488A
Application granted
Publication of CN113010488B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the application provides a data acquisition method, device, equipment and storage medium. The method comprises the following steps: acquiring data configuration information corresponding to a target database, wherein the data configuration information is used for determining data to be audited in the target database; parsing the data configuration information according to the target database to obtain instantiated data configuration information corresponding to the target database; and generating a data acquisition task according to the instantiated data configuration information, wherein the data acquisition task is used for acquiring the data to be audited from the target database. The method is broadly applicable and makes acquiring the data to be audited more convenient.

Description

Data acquisition method, device, equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data acquisition method and apparatus, an electronic device, and a storage medium.
Background
Data quality auditing is the quality control of data at every stage of its life cycle, including production, processing, transmission, storage, use, and exchange. Whether data quality meets the requirements of data users is generally checked against configured data quality rules. At present, such rules are configured and audited with respect to data integrity, consistency, timeliness, accuracy, logic, and the like, and rules that fail are monitored and trigger alarms.
In existing data quality audit methods, the data to be audited is first acquired and then compared with a threshold to judge its quality. The data to be audited is mainly acquired by executing a preset data acquisition program against the target database.
However, with this approach a data acquisition program can only be applied to databases of a single storage type. If data in multiple databases of different storage types needs to be acquired, a separate data acquisition program must be set up for each storage type, which makes acquiring the data to be audited very inconvenient.
Disclosure of Invention
The embodiment of the application provides a data acquisition method for conveniently acquiring data to be audited.
Correspondingly, embodiments of the application also provide a data acquisition device, an electronic device, and a storage medium to ensure the implementation and application of the method.
In order to solve the above problems, an embodiment of the present application discloses a data acquisition method, including: acquiring data configuration information corresponding to a target database, wherein the data configuration information is used for determining data to be audited in the target database; analyzing the data configuration information according to the target database to obtain instantiation data configuration information corresponding to the target database; generating a data acquisition task according to the instantiation data configuration information, wherein the data acquisition task is used for acquiring the data to be audited from the target database.
The embodiment of the application also discloses a data acquisition method, which comprises the following steps: acquiring data configuration information corresponding to a target data source, wherein the data configuration information is used for determining data to be audited in the target data source; analyzing the data configuration information according to the target data source to obtain instantiation data configuration information corresponding to the target data source; generating a data acquisition task according to the instantiation data configuration information, wherein the data acquisition task is used for acquiring the data to be audited from the target data source.
The embodiment of the application also discloses a data acquisition device, which comprises: the configuration information acquisition module is used for acquiring data configuration information corresponding to a target database, wherein the data configuration information is used for determining data to be audited in the target database; the analysis module is used for analyzing the data configuration information according to the target database to obtain instantiation data configuration information corresponding to the target database; the task generating module is used for generating a data acquisition task according to the instantiation data configuration information, and the data acquisition task is used for acquiring the data to be audited from the target database.
The embodiment of the application also discloses a data acquisition device, which comprises: the configuration information acquisition module is used for acquiring data configuration information corresponding to a target data source, wherein the data configuration information is used for determining data to be audited in the target data source; the analysis processing module is used for analyzing the data configuration information according to the target data source to obtain instantiation data configuration information corresponding to the target data source; the task obtaining module is used for generating a data obtaining task according to the instantiation data configuration information, and the data obtaining task is used for obtaining the data to be audited from the target data source.
The embodiment of the application also discloses an electronic device, which comprises: a processor; and a memory having executable code stored thereon that, when executed, causes the processor to perform the data acquisition method as described in one or more embodiments above.
Embodiments of the present application also disclose one or more machine readable media having executable code stored thereon that, when executed, cause a processor to perform the data acquisition method as described in one or more of the embodiments above.
Compared with the prior art, the embodiment of the application has the following advantages:
In the embodiments of the application, the data configuration information is parsed according to the target database to obtain instantiated data configuration information corresponding to the target database; a data acquisition task is then generated from the instantiated data configuration information, and the data to be audited is acquired from the target database by that task. In this scheme, general data configuration information is parsed according to the category of the target database to generate instantiated data configuration information for that database. Because the same configuration information can be parsed for each database category to generate the corresponding task, the scheme is broadly applicable and makes acquiring the data to be audited more convenient.
Drawings
FIG. 1 is a block diagram of a data quality auditing system according to an embodiment of the present application;
FIG. 2 is a block diagram of a data quality auditing system of an embodiment of the present application;
FIG. 3 is a flow chart of a data acquisition method according to an embodiment of the present application;
FIG. 4 is a flowchart of the processing steps of the data configuration parsing module of one embodiment of the present application;
FIG. 5 is a flowchart of the processing steps of a data parser in one embodiment of the present application;
FIG. 6 is a flow chart of a data acquisition method of one embodiment of the present application;
FIG. 7 is a flow chart of a data acquisition method according to another embodiment of the present application;
FIG. 8 is a schematic diagram of a data acquisition device according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a data acquisition device according to another embodiment of the present application;
FIG. 10 is a schematic structural diagram of an exemplary device according to an embodiment of the present application.
Detailed Description
In order that the above objects, features, and advantages of the present application may be more readily understood, the application is described in further detail below with reference to the accompanying drawings and specific embodiments.
FIG. 1 shows an architecture diagram of a data quality auditing system according to an embodiment of the present application. The system mainly includes a configuration server, an auditing server, a computing processing end, and a database. The configuration server defines the rules, data, and other items used for data quality auditing. The auditing server performs data auditing according to the data and rules configured by the configuration server; for example, it generates the corresponding data configuration information and auditing rules. The data configuration information can be understood as information related to the auditing indexes (Metrics): it determines statistical indexes of the data along different dimensions, so that a data acquisition task can be generated from those indexes to obtain the corresponding index values as the data to be audited. The auditing server generates a data acquisition task for the target database from the data configuration information corresponding to the auditing rule and the corresponding target database, and sends the task to the computing processing end. The computing processing end executes the task, acquiring and analyzing the stored data from the target database to obtain the data to be audited, and sends that data to the auditing server, which audits it according to the auditing rules.
In some examples, the configuration server, the auditing server, the computing processing end, and the database are different processing ends. In other examples, the configuration server and the auditing server are the same server, which completes the configuration and then performs the data quality auditing based on it. If the computing processing end and the database are the same processing end, the database can receive the task, extract the data, and compute the corresponding data to be audited itself. The arrangement can be set according to actual requirements.
FIG. 2 is a block diagram of a data quality auditing system according to an embodiment of the present application. A specific embodiment of the system is described below with reference to FIG. 2. As shown in FIG. 2, the system is divided by hierarchy and may include a capability layer, a calculation layer, and a storage layer.
The capability layer provides the capabilities required by the auditing process and may comprise a model definition module, a rule engine, and an execution engine. The model definition module defines the storage rules of the databases in the storage layer; the rule engine defines the data to be audited and the auditing rules; the execution engine acquires the data to be audited and audits its quality according to the auditing rules. Specifically:
The model definition module comprises a storage rule definition module and a data constraint module. The storage rule definition module defines the storage rule corresponding to a database. The data constraint module determines the data constraints that apply to stored data; the data constraints include definitions of the primary key, partition, data format, and data value range. For example, if storage location A holds clothing size data, the data constraint is that values must be one of S, M, L, or XL; other types of data, such as the clothing brand, do not satisfy the data constraint of storage location A and are not stored there.
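The value-range constraint in the clothing size example can be illustrated with a minimal sketch. The class name `DataConstraint`, the field name `size`, and the sample records are hypothetical and are not taken from the application; the sketch only shows how a record that falls outside the defined value range would be rejected for a storage location.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataConstraint:
    """Hypothetical value-range constraint attached to one storage location."""
    field: str
    allowed_values: frozenset

    def accepts(self, record: dict) -> bool:
        # A record is accepted only if it carries the constrained field
        # and the field's value lies inside the defined value range.
        return record.get(self.field) in self.allowed_values


# Storage location A holds clothing sizes only (values from the example above).
size_constraint = DataConstraint("size", frozenset({"S", "M", "L", "XL"}))

size_ok = size_constraint.accepts({"size": "M"})          # size data fits
brand_ok = size_constraint.accepts({"brand": "SomeBrand"})  # brand data does not
print(size_ok, brand_ok)
```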
The rule engine comprises a rule template module and a rule management module. The rule template module comprises a data definition module and a rule definition module. The data definition module defines the auditing indexes, which come in multiple types so as to correspond to different calculation engines.
An auditing index (Metrics) can be understood as a data index used for auditing; the corresponding data to be audited can be determined from it. An auditing index may yield single-return-value data or multiple-return-value data. Single-return-value data may be a mean, variance, standard deviation, maximum, minimum, count of null values, and the like.
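As an illustration of single-return-value Metrics, the following sketch computes several of the listed indexes over one column using Python's standard `statistics` module. The column values are made up, and the use of the population variance and standard deviation is an assumption, since the application does not say which variant is intended.

```python
import statistics

# Hypothetical column values; None stands for a null value.
column = [12.0, 15.0, None, 9.0, None]
present = [v for v in column if v is not None]

# Each entry is a single-return-value metric over the column.
metrics = {
    "mean": statistics.mean(present),
    "variance": statistics.pvariance(present),  # population variance (assumption)
    "std_dev": statistics.pstdev(present),      # population standard deviation (assumption)
    "max": max(present),
    "min": min(present),
    "null_count": sum(1 for v in column if v is None),
}
print(metrics)
```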
So that unified Metrics can be used in different computing environments, the auditing indexes can be defined by expressions. A definition may include a Metrics name, a Metrics expression, an operator, and a Metrics type.
Metrics expressions may be created in a variety of ways. In one example, they may be uniformly represented as a query SQL expression:
select ${operator} from ${table_name} where ${where_expression} [group by column, column, …]
A unified SQL query syntax allows any data source that supports SQL to be supported. The operator is the calculation method for a particular Metrics and may be one or more functions acting on a field.
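The unified query template above can be filled in mechanically. The following is a minimal sketch using Python's standard `string.Template`, whose `${name}` placeholder syntax happens to match the template. The function name `render_metrics_sql`, the table `orders`, and the columns `price`, `ds`, and `shop_id` are hypothetical.

```python
from string import Template

# The unified Metrics query template, without the optional group-by clause.
METRICS_SQL = Template(
    "select ${operator} from ${table_name} where ${where_expression}"
)


def render_metrics_sql(operator, table_name, where_expression, group_by=()):
    """Fill the template; append the optional group-by columns if any are given."""
    sql = METRICS_SQL.substitute(
        operator=operator,
        table_name=table_name,
        where_expression=where_expression,
    )
    if group_by:
        sql += " group by " + ", ".join(group_by)
    return sql


sql = render_metrics_sql("max(price)", "orders", "ds = '20191219'")
grouped_sql = render_metrics_sql(
    "count(1)", "orders", "ds = '20191219'", group_by=("shop_id",)
)
print(sql)
print(grouped_sql)
```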
As another example, a Metrics expression may be uniformly represented as a user-defined function (UDF) expression, for example:
Metrics = [UDF]/SQL/Flink API program/MR/…
"Data amount of non-unique primary keys =
select count(1) from (select ${unique_key} from ${table_name} ${where_expression} group by ${unique_key} having count(*) > 1) t;"
"Data amount METRICSID =
select count(1) from ${table_name} ${where_expression};"
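To make the two example Metrics concrete, the sketch below fills in both expressions and runs them against an in-memory SQLite database from Python's standard library. SQLite, the `orders` table, its columns, and the sample rows are assumptions chosen only for illustration; the application does not name a specific database. Because the templates place `${where_expression}` directly after the table name, the sketch assumes that the placeholder carries the `where` keyword itself.

```python
import sqlite3
from string import Template

NON_UNIQUE_KEYS = Template(
    "select count(1) from (select ${unique_key} from ${table_name} "
    "${where_expression} group by ${unique_key} having count(*) > 1) t"
)
ROW_COUNT = Template("select count(1) from ${table_name} ${where_expression}")

# Hypothetical table: order_id is meant to be unique within a day partition.
conn = sqlite3.connect(":memory:")
conn.execute("create table orders (order_id integer, ds text)")
conn.executemany(
    "insert into orders values (?, ?)",
    [(1, "20191219"), (2, "20191219"), (2, "20191219"), (3, "20191218")],
)

params = {
    "unique_key": "order_id",
    "table_name": "orders",
    "where_expression": "where ds = '20191219'",
}
# Number of key values that occur more than once in the partition.
duplicate_keys = conn.execute(NON_UNIQUE_KEYS.substitute(params)).fetchone()[0]
# Total number of rows in the partition.
row_count = conn.execute(ROW_COUNT.substitute(params)).fetchone()[0]
conn.close()
print(duplicate_keys, row_count)
```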
The UDF is a function for custom computation of Metrics. It extends the capabilities of SQL, and by integrating the interfaces of multiple types of data sources it enables Metrics to be generated for different data sources. The Metrics definitions may be as shown in Table 1.
Table 1
The Metrics definitions in Table 1 only illustrate the kinds of content that can be defined; actual processing is not limited to them, and a given Metrics name is not limited to the definition shown in the table.
Multiple-return-value data may be discrete data such as product types (for example, water cup or vacuum cup), product sizes, and the like.
A rule is the standard by which the data to be audited is judged. The rule definition module defines rules over the auditing indexes, yielding the quality criteria for the data to be audited. A rule can be embodied as a rule expression.
For example, a rule expression may take the form:
[UDF] ${METRICSID} [operator] threshold/${METRICSID}
The two sides of the operator hold the actual data value and the expected data value, respectively. The data to be audited corresponding to a Metrics index, or that data after conversion by a function, is compared with a threshold to judge whether it is consistent with the expected value. This verifies whether the data to be audited meets the rule and thereby audits its quality.
For example, a rule that the period-over-period fluctuation of the data amount is less than 5% is defined as follows:
Function_same_period_ratio(function_cur(data amount METRICSID), function_last(data amount METRICSID)) < 0.05
The UDF here is a custom function acting on Metrics that performs secondary computation on Metrics or selectively filters them. Examples include a volatility function, a function that selects the Metrics value at a specified time, an inclusion function, and so on.
The operators include comparison operators, conditional operators, set operators, and the like. They perform operations between Metrics and thresholds or between Metrics, as shown in Table 2.
Table 2
Table 2 only illustrates the operators; actual processing is not limited to them.
In other examples, a rule result can also be determined by comparing the data to be audited corresponding to one Metrics index with the data to be audited corresponding to another Metrics index, either directly or after conversion by a function. Judging whether the Metrics values (the data to be audited) are consistent in this way enables data in different tables, or in tables held in different storage, to be compared; it is a way of auditing consistency rules between data.
The rule management module comprises a data configuration module, a rule configuration module, and a grammar checking module. The rule configuration module configures rules, which include general rules and custom rules: general rules are configured from the rules defined by the rule definition module and the data defined by the data definition module, while custom rules are configured from uploaded custom rules. The rule configuration module configures the defined auditing indexes and rules. The data configuration module configures the auditing indexes by category to generate data configuration information that corresponds to different types of databases and is used for acquiring the data to be audited; the data configuration information comprises the Metrics expression to be calculated, the operator, the Metrics type, the model storage type, and the like. The grammar checking module checks whether a custom rule meets the requirements.
For data quality auditing, the execution engine mainly performs data configuration analysis, rule analysis, rule calculation, and health degree calculation, as follows:
Data configuration analysis: first, the data configuration information configured by the data configuration module is acquired, and the target data configuration information corresponding to the target database is determined. A corresponding data parser is then determined and used to parse and instantiate the target data configuration information. Next, a task generator generates a data acquisition task according to the resource configuration corresponding to the target database. The task is sent to the computing processing end of the calculation layer, and one or more computing processing ends in the calculation layer acquire the data to be audited from the target database in the storage layer. The computing processing end can be selected and configured as required, for example according to the type of the database.
Rule analysis: the rules configured by the rule configuration module are acquired and parsed to obtain the auditing rules.
Rule calculation: rule calculation is performed on the data to be audited according to the auditing rules obtained by rule analysis. Specifically, the data to be audited can be compared with a threshold, or data to be audited from different types of databases can be compared with each other, to obtain a calculation result.
Health degree calculation: this can be understood as the auditing process. The health degree of the data to be audited is obtained from the calculation results of multiple rules, completing the audit of its quality.
For example, rule calculation and health degree calculation proceed as follows:
1. A quality rule expression of the data model is obtained.
2. Each METRICSID in the rule expression is replaced with the Metrics value of the data to be audited, which is generated according to the Metrics definition.
3. The rule expression from step 2 is executable; evaluating it yields a Boolean value, which is the rule result.
For example, model Table1 holds daily partitioned data, and a rule on the fluctuation of the daily data amount is configured. The rule expression requiring the period-over-period fluctuation of the daily data amount to be less than 5% is:
functionRatio(functionSelect(data amount METRICSID, 0), functionSelect(data amount METRICSID, -1)) < 0.05;
Based on this rule expression and the values of the data amount METRICSID, the rule expression is translated into:
Function_same_period_ratio(5000, 4000) < 0.05, where
Function_cur(data amount METRICSID) is the value of the data amount METRICSID acquired on the current day, 5000, and Function_last(data amount METRICSID) is the value of the data amount METRICSID acquired on the previous day, 4000.
Function_same_period_ratio computes the period-over-period ratio: (5000 - 4000)/4000 = 0.25. Since 0.25 is not less than 0.05, the expression evaluates to false and the rule does not pass. The period-over-period fluctuation of the daily data amount violates the rule, so the data health degree does not meet the requirement and the audit fails.
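The three steps and the worked example above can be sketched as follows. The Python names `function_select` and `function_ratio` mirror functionSelect and functionRatio in the rule expression, but the `history` store and the metric identifier `row_count` are hypothetical stand-ins for however the Metrics values are actually kept. The values 5000 and 4000 come from the example, and the comparison follows the rule expression as written, without taking an absolute value.

```python
# Hypothetical store of Metrics values, keyed by metric id and day offset
# (0 = current day, -1 = previous day).
history = {"row_count": {0: 5000, -1: 4000}}


def function_select(metrics_id: str, day_offset: int) -> float:
    """Step 2: replace a METRICSID reference with its stored Metrics value."""
    return history[metrics_id][day_offset]


def function_ratio(current: float, previous: float) -> float:
    """Period-over-period ratio: (current - previous) / previous."""
    return (current - previous) / previous


# Step 3: evaluate the instantiated rule expression to a Boolean rule result.
ratio = function_ratio(function_select("row_count", 0), function_select("row_count", -1))
rule_passed = ratio < 0.05
print(ratio, rule_passed)
```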
FIG. 3 is a flow chart of a data acquisition method according to an embodiment of the application. As shown in FIG. 3, the data acquisition method includes:
Step 302: obtaining data configuration information corresponding to a target database, wherein the data configuration information is used for determining data to be audited in the target database.
Specifically, the data configuration information may be understood as information including a Metrics expression, an operator, a Metrics type, and a model type. The Metrics expression can be understood as an expression for evaluating the operator; the operator can be understood as the auditing index, such as a maximum, minimum, variance, or mean; the Metrics type corresponds to the computing processing end; and the model type corresponds to the storage type of the database. Before the data configuration information is acquired, the model type of the target database is determined, and the corresponding data configuration information is determined from that model type. When the data configuration information is acquired, data configuration information of several different Metrics types corresponding to the storage type of the target database can be acquired according to Metrics type.
Step 304: parsing the data configuration information according to the target database to obtain the instantiated data configuration information corresponding to the target database.
Optionally, in step 304, parsing the data configuration information according to the target database to obtain the instantiated data configuration information corresponding to the target database includes:
selecting, according to the target database and a target computing processing end, data configuration information of the corresponding type as target data configuration information, wherein the target computing processing end is used for executing the data acquisition task to obtain the data to be audited; and
parsing the target data configuration information according to the target database to obtain the instantiated data configuration information.
Specifically, the target data configuration information is determined according to the target database and its corresponding target computing processing end; the target database may correspond to one or more target computing processing ends. When the target database corresponds to one target computing processing end, the target data configuration information for that end is selected, by type, from the one or more pieces of data configuration information. When the target database corresponds to multiple target computing processing ends, one of them can be selected and the corresponding target data configuration information determined; that information is parsed to generate instantiated data configuration information, a data acquisition task is generated from it, and the data to be audited is acquired from the target database through the selected end. Alternatively, an optimal computing processing end can be preset for each piece of data configuration information; each target computing processing end is then matched to its data configuration information according to this correspondence and executes the resulting data acquisition task to acquire the data to be audited from the target database.
When the target database corresponds to multiple target computing processing ends, the target computing processing end can also be determined by task load; that is, a target computing processing end with fewer tasks is preferred for data acquisition.
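Selection by task load can be sketched in a few lines. The function name `pick_processing_end`, the engine names, and the pending-task counts are hypothetical; the application only states that an end with fewer tasks is preferred, and how ties are broken is not specified (here the first end with the lowest count wins).

```python
def pick_processing_end(pending_tasks: dict) -> str:
    """Return the computing processing end with the fewest pending tasks."""
    return min(pending_tasks, key=pending_tasks.get)


# Hypothetical computing processing ends available for one target database,
# mapped to their current number of pending tasks.
pending = {"engine_a": 12, "engine_b": 3, "engine_c": 7}
chosen = pick_processing_end(pending)
print(chosen)
```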
Optionally, in one embodiment, parsing the target data configuration information according to the target database to obtain the instantiated data configuration information specifically includes:
converting the type of the universal operator in the target data configuration information according to the storage type of the target database to obtain an instance operator, wherein the instance operator is used for determining the data to be audited; and
obtaining a target storage location for the instance operator according to the storage rule of the target database, so as to generate the instantiated data configuration information.
Specifically, the instance operator corresponds to a Metrics index. The type of the universal operator in the target data configuration information is converted according to the storage type of the target database to obtain the instance operator, which is used for acquiring data in the target database. After the instance operator is determined, its target storage location is obtained according to the storage rule of the target database, and the instantiated data configuration information is then obtained. Obtaining the instantiated data configuration information from the target storage location can be understood as replacing the table names and field names in the Metrics expression with the corresponding table names and field names in the target database.
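The two parsing steps, converting the universal operator and substituting the target table and field names, might look like the following sketch. Everything named here is an assumption made for illustration: the storage types `storage_a` and `storage_b`, the operator `null_count` and its two SQL spellings, and the logical-to-physical name mappings. The application does not specify how operators or storage rules are represented.

```python
from string import Template

# Hypothetical per-storage-type forms of one universal operator.
OPERATOR_TEMPLATES = {
    "storage_a": {"null_count": "count(*) - count(${field})"},
    "storage_b": {"null_count": "sum(case when ${field} is null then 1 else 0 end)"},
}

# Hypothetical storage rules: logical names -> names in the target database.
STORAGE_RULES = {
    "storage_a": {"table": {"orders": "dw.orders_daily"}, "field": {"buyer": "buyer_id"}},
    "storage_b": {"table": {"orders": "orders_tbl"}, "field": {"buyer": "buyer"}},
}


def instantiate(config: dict, storage_type: str) -> dict:
    """Turn general data configuration into instantiated configuration."""
    rules = STORAGE_RULES[storage_type]
    # Resolve the logical field name to the field name in the target database.
    field = rules["field"][config["field"]]
    # Convert the universal operator into the instance operator for this storage type.
    template = Template(OPERATOR_TEMPLATES[storage_type][config["operator"]])
    return {
        "operator": template.substitute(field=field),
        "table_name": rules["table"][config["table"]],
    }


general = {"operator": "null_count", "table": "orders", "field": "buyer"}
instance_a = instantiate(general, "storage_a")
instance_b = instantiate(general, "storage_b")
print(instance_a)
print(instance_b)
```

The same general configuration yields a different instance for each storage type, which is the property the scheme relies on to avoid a separate acquisition program per database type.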
The data acquisition method shown in fig. 3 further includes:
Step 306, generating a data acquisition task according to the instantiated data configuration information, wherein the data acquisition task is used for acquiring data to be audited from the target database.
Specifically, according to the instantiated data configuration information, a data acquisition task is generated, and the data acquisition task is used for being executed by the target computing processing end to acquire data to be audited from the target database.
In this embodiment, the data configuration information is parsed according to the target database to obtain instantiated data configuration information corresponding to the target database; a data acquisition task is then generated according to the instantiated data configuration information, and the data to be audited is acquired from the target database according to the data acquisition task. That is, general data configuration information is parsed according to the category of the target database to generate instantiated data configuration information for that database, from which the data acquisition task is generated. The data to be audited can therefore be obtained simply by parsing the data configuration information according to the type of the database, and no separate data acquisition task needs to be set up for each type of database, so the data to be audited can be obtained more conveniently.
Optionally, in step 306, generating a data acquisition task according to the instantiated data configuration information includes:
Merging more than one piece of instantiated data configuration information according to a merging rule.
Generating a data acquisition task according to the merged instantiated data configuration information.
Specifically, after the instantiated data configuration information is generated, the pieces of instantiated data configuration information that conform to the merging rule are merged, and a data acquisition task is then generated from the merged instantiated data configuration information. Instantiated data configuration information of the same type may be information having the same statement structure, or the same statements such as the required data range of the target database. For example, for pieces of instantiated data configuration information whose content is identical except for the operators, the operators can be merged, so that the merged instantiated data configuration information can be used to compute all of the merged operators. As one example, the merging rule may be set to merge instantiated data configuration information that corresponds to the same type of database, the same Metrics type and the same Metrics expression. As another example, the merging rule may be set to merge instantiated data configuration information that corresponds to the same type of database, the same Metrics type, the same Metrics expression, and the same table name in the Metrics expression.
In this embodiment, the instantiated data configuration information is merged and the data acquisition task is generated from the merged information. Compared with not merging, fewer data acquisition tasks are generated, that is, the target computing processing end executes fewer data acquisition tasks, so the data to be audited can be acquired more conveniently.
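As a minimal sketch of such merging, the following groups instantiated configurations that differ only in their operators and emits one task per group. The class, its field names, the grouping key and the SQL-like task text are all assumptions made for illustration; the source does not prescribe a data structure or a query syntax.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

Key = Tuple[str, str, str, str]


@dataclass(frozen=True)
class InstantiatedConfig:
    db_type: str       # type of the target database
    metrics_type: str  # corresponds to the computing processing end
    table: str         # table name after instantiation
    data_range: str    # e.g. a partition or time filter
    operator: str      # instance operator, e.g. "max(price)"


def merge_configs(configs: List[InstantiatedConfig]) -> Dict[Key, List[str]]:
    """Group operators of configurations that are identical apart from the operator."""
    merged: Dict[Key, List[str]] = defaultdict(list)
    for cfg in configs:
        key = (cfg.db_type, cfg.metrics_type, cfg.table, cfg.data_range)
        if cfg.operator not in merged[key]:  # drop exact duplicates
            merged[key].append(cfg.operator)
    return dict(merged)


def to_task(key: Key, operators: List[str]) -> str:
    """Build one acquisition task that computes all merged operators."""
    _, _, table, data_range = key
    return f"SELECT {', '.join(operators)} FROM {table} WHERE {data_range}"


configs = [
    InstantiatedConfig("type_a", "sql", "orders", "ds = '20191219'", "max(price)"),
    InstantiatedConfig("type_a", "sql", "orders", "ds = '20191219'", "min(price)"),
    InstantiatedConfig("type_a", "sql", "users", "ds = '20191219'", "count(id)"),
]
tasks = [to_task(key, ops) for key, ops in merge_configs(configs).items()]
```

Here three configurations yield two tasks, since the two operators on the same table and data range are computed in a single task.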
Optionally, in step 306, generating a data acquisition task according to the instantiated data configuration information specifically includes:
Generating a data acquisition task according to the instantiated data configuration information and the data range of the data to be audited.
Specifically, after the instantiated data configuration information is generated, a data acquisition task is generated according to the data range of the data to be audited, wherein the data range may be a time range, a range of storage size occupied by the data, a data source, and the like.
For example, when the data range is a time range, a data acquisition task is generated according to the instantiated data configuration information, the time range and the current time, and the data acquisition task is used for acquiring the data to be audited from the storage position of the data to be audited according to the time range and the current time.
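The time-range case can be illustrated with the sketch below, which derives the window from the current time and appends it to an already instantiated statement. The function name, the time column `gmt_create`, the half-open window and the SQL-like filter are assumptions for illustration only.

```python
from datetime import datetime, timedelta


def build_task(instantiated_sql: str, time_column: str,
               window: timedelta, now: datetime) -> str:
    """Append a [now - window, now) time filter to an instantiated statement."""
    start = now - window
    fmt = "%Y-%m-%d %H:%M:%S"
    return (
        f"{instantiated_sql} WHERE {time_column} >= '{start.strftime(fmt)}' "
        f"AND {time_column} < '{now.strftime(fmt)}'"
    )


# A fixed "current time" keeps the example deterministic.
now = datetime(2019, 12, 19, 8, 0, 0)
task = build_task("SELECT max(price) FROM orders", "gmt_create",
                  timedelta(days=1), now)
```

In a real system the values would normally be passed as bound parameters rather than formatted into the statement text.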
Optionally, as an embodiment, the method shown in fig. 3 further includes:
Sending the data acquisition task to a target computing processing end corresponding to the instantiated data configuration information, wherein the target computing processing end is used for executing the data acquisition task to obtain the data to be audited.
Obtaining the data to be audited.
Specifically, after the data acquisition task is generated, it is sent to the target computing processing end corresponding to the instantiated data configuration information, and the target computing processing end executes the task to acquire the data to be audited. The data to be audited sent by the target computing processing end is then received. Because data acquisition tasks are sent to their corresponding computing processing ends for execution, data acquisition can be carried out by different computing processing ends, computation on various kinds of computing processing ends is flexibly supported, and the demands on the storage capacity and computing capacity of the auditing server are low.
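One way to picture this dispatch is a registry that maps each type of computing processing end to a callable that executes a task and returns rows. The `Dispatcher` class, the end name and the stub executor below are hypothetical; a real computing processing end would be a remote engine rather than a local function.

```python
from typing import Callable, Dict, List

Executor = Callable[[str], List[dict]]


class Dispatcher:
    """Routes data acquisition tasks to registered computing processing ends."""

    def __init__(self) -> None:
        self._executors: Dict[str, Executor] = {}

    def register(self, end_type: str, executor: Executor) -> None:
        self._executors[end_type] = executor

    def dispatch(self, end_type: str, task: str) -> List[dict]:
        try:
            executor = self._executors[end_type]
        except KeyError:
            raise LookupError(f"no computing end registered for {end_type!r}") from None
        # The end executes the task; the auditing side only receives the result.
        return executor(task)


def stub_end(task: str) -> List[dict]:
    # Stand-in for a real engine: echoes the task with a fixed value.
    return [{"task": task, "value": 42}]


dispatcher = Dispatcher()
dispatcher.register("end_a", stub_end)
rows = dispatcher.dispatch("end_a", "SELECT max(price) FROM orders")
```

Because the heavy computation runs inside the executor, the auditing side only needs to hold the small result set, which matches the low storage and computing requirements noted above.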
Optionally, as an embodiment, the method shown in fig. 3 further includes:
Obtaining an auditing rule corresponding to the data configuration information.
Auditing the data to be audited according to the auditing rule to obtain an auditing result.
Specifically, the auditing rule corresponds to the data configuration information and can be understood as a rule expression: the obtained data to be audited is substituted into the rule expression, and the data quality of the data to be audited is determined according to whether the rule expression holds. If the rule expression holds, the data quality of the data to be audited is good; if it does not hold, the data quality is poor.
In this embodiment, the data quality audit is divided into the data configuration information, which determines the data to be audited, and the auditing rules, which determine how that data is judged. The data to be audited is acquired from the target database by means of the data configuration information, and its quality is judged by means of the auditing rules. Because the data configuration information is parsed and converted according to the type of the target database, it is applicable to various types of target databases, so the data to be audited can be acquired more conveniently.
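A rule expression of this kind can be sketched as a comparison between one acquired value and a threshold. The `AuditRule` class, the metric name `null_ratio` and the set of supported comparators are illustrative assumptions; the source does not define the syntax of a rule expression.

```python
import operator
from dataclasses import dataclass
from typing import Mapping

_COMPARATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
}


@dataclass(frozen=True)
class AuditRule:
    metric: str       # name of the audited value in the acquired data
    comparator: str   # one of the keys of _COMPARATORS
    threshold: float

    def holds(self, data: Mapping[str, float]) -> bool:
        """Substitute the acquired value into the rule expression."""
        return _COMPARATORS[self.comparator](data[self.metric], self.threshold)


def audit(rule: AuditRule, data: Mapping[str, float]) -> str:
    # The rule holding means good quality; not holding means poor quality.
    return "good" if rule.holds(data) else "poor"


rule = AuditRule(metric="null_ratio", comparator="<=", threshold=0.05)
result_ok = audit(rule, {"null_ratio": 0.02})
result_bad = audit(rule, {"null_ratio": 0.20})
```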
FIG. 4 is a flowchart illustrating the processing steps of the data configuration parsing module according to an embodiment of the present application, where the processing flow of the data configuration parsing module shown in FIG. 4 mainly includes:
After the target data configuration information corresponding to the target database is determined, it is parsed by the data parser corresponding to it and instantiated to obtain the storage location of the data to be audited in the target database. The data parsers include a first parser, a second parser, a third parser and so on; different data parsers can be set up and selected as required, for example according to the database type or the type of the data to be audited.
A task generator corresponding to the parsed target data configuration information then generates a data acquisition task according to the data range of the data to be audited. For example, the data range may be a time range, such as the period from the current time of day yesterday to the current time today. The task generator generates the data acquisition task from parameter information consisting of the time range, the current time and the parsed target data configuration information. The task generators include a first task generator corresponding to the first parser, a second task generator corresponding to the second parser and a third task generator corresponding to the third parser.
The data acquisition task is executed by the computing processing end corresponding to it, so as to acquire data from the target database and parse it to obtain the data to be audited. After the data to be audited is obtained, it is evaluated according to the auditing rules to obtain the health degree of the data, completing the data quality audit.
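The pairing of each parser with its own task generator can be sketched as a lookup table keyed by database type. The registry, the function names and the string-based configuration below are assumptions chosen to keep the sketch small; they are not taken from FIG. 4.

```python
from typing import Callable, Dict, Tuple

Parser = Callable[[str], str]               # configuration -> parsed configuration
TaskGenerator = Callable[[str, str], str]   # (parsed configuration, data range) -> task

# Hypothetical registry: each database type maps to a parser/generator pair.
PIPELINES: Dict[str, Tuple[Parser, TaskGenerator]] = {}


def register(db_type: str, parser: Parser, generator: TaskGenerator) -> None:
    PIPELINES[db_type] = (parser, generator)


def first_parser(config: str) -> str:
    # Stand-in for instantiation: fill in a concrete table name.
    return config.replace("${TABLE_NAME}", "orders")


def first_task_generator(parsed: str, data_range: str) -> str:
    return f"{parsed} WHERE {data_range}"


def build_task(db_type: str, config: str, data_range: str) -> str:
    """Select the parser and its matching task generator, then run both."""
    parser, generator = PIPELINES[db_type]
    return generator(parser(config), data_range)


register("type_a", first_parser, first_task_generator)
task = build_task("type_a", "SELECT max(price) FROM ${TABLE_NAME}", "ds = '20191219'")
```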
FIG. 5 is a flowchart illustrating the processing steps of a data parser according to one embodiment of the present application. The data parsers include a first-class parser, a second-class parser and a third-class parser, each described below with reference to FIG. 5.
For the first-class parser, the processing flow includes:
The data configuration information is acquired, and it is then determined whether a model storage rule exists locally; if not, the model storage rule is obtained by subscription. The data configuration information may be understood as information including, among others, Metrics expressions, operators, Metrics types and model types.
According to the storage type of the database, type conversion is performed on the general operator in the data configuration information, converting it into an instance operator that can be executed by the target computing processing end and is used for determining the data to be audited. The Metrics expression in the data configuration information is then instantiated according to the model storage rule to obtain the instantiated data configuration information. For example, the type conversion of an operator may be:
The operator in the data configuration information is: group_concat(${COLUMN_NAME}),
The instance operator after conversion is: concat_ws(',', collect_set(${COLUMN_NAME})).
In addition, instantiating the Metrics expression in the data configuration information according to the storage rule can be understood as replacing the table name and field name placeholders of the Metrics expression, i.e., ${TABLE_NAME} and ${COLUMN_NAME}, with the table name and field name to be computed.
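The operator conversion and placeholder replacement just described can be sketched as follows. The conversion table, the storage type label "hive", and the table and column names are assumptions for illustration; the source gives only the operator before and after conversion and the two placeholders.

```python
import re
from string import Template

# Hypothetical conversion table: (storage type, general operator name) -> instance form.
# The "hive" label is an assumption; the source does not name the storage type.
OPERATOR_MAP = {
    ("hive", "group_concat"): "concat_ws(',', collect_set({arg}))",
}

_CALL = re.compile(r"^(\w+)\((.*)\)$")


def convert_operator(generic: str, storage_type: str) -> str:
    """Convert a general operator into the instance operator for a storage type."""
    match = _CALL.match(generic.strip())
    if match is None:
        raise ValueError(f"not an operator call: {generic!r}")
    name, arg = match.groups()
    template = OPERATOR_MAP.get((storage_type, name))
    # Operators with no registered conversion are passed through unchanged.
    return generic if template is None else template.format(arg=arg)


def instantiate(expression: str, table: str, column: str) -> str:
    """Replace ${TABLE_NAME} and ${COLUMN_NAME} with concrete names."""
    return Template(expression).substitute(TABLE_NAME=table, COLUMN_NAME=column)


instance_op = convert_operator("group_concat(${COLUMN_NAME})", "hive")
expression = "SELECT " + instance_op + " FROM ${TABLE_NAME}"
sql = instantiate(expression, table="orders", column="buyer_id")
```

Python's `string.Template` happens to use the same `${NAME}` placeholder form as the Metrics expression, which is why it is used here for the replacement step.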
After the data configuration information is instantiated to obtain instantiated data configuration information, the operators in instantiated data configuration information of the same type are merged according to a merging rule, and the merged instantiated data configuration information is used as first instantiated data configuration information. Instantiated data configuration information of the same type may be information having the same statement structure, or the same statements such as the required data range of the target database. For example, for pieces of instantiated data configuration information whose content is identical except for the operators, the operators can be merged, so that the merged instantiated data configuration information can be used to compute all of the merged operators.
For the second-class parser, the processing flow includes:
The data configuration information is acquired, a model definition rule is then acquired, and the data configuration information is instantiated according to the model type and the model storage rule of the target database to generate second instantiated data configuration information.
For the third-class parser, the processing flow includes:
The data configuration information is acquired, a model definition rule is then acquired, and the data configuration information is instantiated according to the model type and the model storage rule of the target database to generate third instantiated data configuration information.
A data quality auditing method according to an embodiment is described below with reference to FIG. 6. Specifically, the data quality auditing method includes:
Step 601, determining data configuration information corresponding to a target database. A plurality of types of data configuration information may be provided, each corresponding to a different type of target database. When the data to be audited in the target database is to be obtained, the data configuration information corresponding to the type of the target database is determined. In particular, data configuration information may be understood as information including Metrics expressions, operators, Metrics types and model types. A Metrics expression can be understood as an expression for acquiring an operator; an operator can be understood as an audit index, such as a maximum value, a minimum value, a variance or a mean value; the Metrics type corresponds to the computing processing end; and the model type corresponds to the storage type of the database.
Step 602, determining target data configuration information according to the computing processing end corresponding to the target database. Each type of data configuration information comprises a plurality of pieces of data configuration information corresponding to different computing processing ends, and the target data configuration information is determined from among them according to the computing processing end corresponding to the target database. Specifically, the target database may correspond to one or more computing processing ends. When it corresponds to only one computing processing end, the target data configuration information is determined according to that computing processing end. When it corresponds to a plurality of computing processing ends, the target data configuration information may be determined according to one of them.
Step 603, parsing the target data configuration information according to the target database to generate instantiated data configuration information. Specifically, type conversion is performed on the operators in the target data configuration information according to the storage type of the target database to obtain instance operators, and the target storage location of each instance operator is obtained according to the storage rule of the target database to obtain the instantiated data configuration information. Obtaining the instantiated data configuration information from the target storage location may be understood as replacing the table names and field names of the Metrics expression with the corresponding table names and field names in the target database.
Step 604, generating a data acquisition task according to the instantiated data configuration information and the data range of the data to be audited. The data range of the data to be audited may be a time range, a range of storage size occupied by the data, a data source, and the like. For example, when the data range is a time range, the data acquisition task is generated according to the instantiated data configuration information, the time range and the current time, and is used for acquiring the instance operator from the target storage location of the data to be audited according to the time range and the current time.
Step 605, the data acquisition task is sent to the corresponding computing processing end. Only one computing processing end is shown in the figure, but in other examples, multiple data acquisition tasks may be generated, and multiple data acquisition tasks may correspond to multiple different types of computing processing ends. When the data acquisition tasks are sent, the data acquisition tasks can be respectively sent to the corresponding computing processing ends.
Step 606, the computing processing end executes the data acquisition task: according to the data range of the data to be audited, it acquires the data corresponding to the instance operator from the target storage location in the target database, and parses that data to obtain the data to be audited.
Step 607, the auditing server receives the data to be audited sent by the computing processing end.
Step 608, the auditing server performs a quality audit on the data to be audited according to the auditing rules corresponding to the data configuration information. Specifically, for the auditing process reference may be made to the rule parsing, rule calculation and health calculation performed in the execution engine in FIG. 2. The auditing server evaluates the data to be audited according to the auditing rules to obtain a calculation result, and completes the quality audit according to that result. The evaluation includes comparing the data to be audited with a threshold value to determine the data quality of the data to be audited.
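Steps 601 to 608 can be tied together in a compact sketch such as the one below. Everything in it is a simplifying assumption: the `DataConfig` fields, the single-threshold audit rule, the SQL-like task text and the stub computing end are illustrative stand-ins, not the structures used in FIG. 6.

```python
from dataclasses import dataclass
from string import Template
from typing import Callable, Dict, List


@dataclass(frozen=True)
class DataConfig:
    db_type: str      # type of target database this configuration applies to
    end_type: str     # computing processing end this configuration targets
    expression: str   # Metrics expression with ${TABLE_NAME} / ${COLUMN_NAME}
    threshold: float  # hypothetical audit rule: value must not exceed this


def run_audit(configs: List[DataConfig], db_type: str, end_type: str,
              table: str, column: str, data_range: str,
              ends: Dict[str, Callable[[str], float]]) -> bool:
    # Steps 601-602: pick the configuration for this database type and end.
    matches = [c for c in configs
               if c.db_type == db_type and c.end_type == end_type]
    if not matches:
        raise LookupError("no data configuration for this database and end")
    target = matches[0]
    # Step 603: instantiate the Metrics expression.
    instantiated = Template(target.expression).substitute(
        TABLE_NAME=table, COLUMN_NAME=column)
    # Step 604: attach the data range to form the acquisition task.
    task = f"{instantiated} WHERE {data_range}"
    # Steps 605-607: send the task to the computing end and receive the result.
    value = ends[end_type](task)
    # Step 608: compare the returned value with the threshold.
    return value <= target.threshold


seen: List[str] = []


def stub_end(task: str) -> float:
    # Stand-in for a computing processing end: records the task, returns a value.
    seen.append(task)
    return 0.02


configs = [DataConfig("type_a", "end_a",
                      "SELECT avg(${COLUMN_NAME} IS NULL) FROM ${TABLE_NAME}", 0.05)]
passed = run_audit(configs, "type_a", "end_a", "orders", "buyer_id",
                   "ds = '20191219'", {"end_a": stub_end})
```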
One embodiment provides a data acquisition method. Referring to FIG. 7, the data acquisition method includes:
Step 702, obtaining data configuration information corresponding to a target data source, where the data configuration information is used to determine data to be audited in the target data source.
Step 704, according to the target data source, analyzing the data configuration information to obtain the instantiated data configuration information corresponding to the target data source.
Step 706, generating a data acquisition task according to the instantiated data configuration information, where the data acquisition task is used to acquire the data to be audited from the target data source.
Specifically, the data source in this embodiment includes various objects that store data, such as databases and Excel files. The steps of this embodiment and the other steps associated with them are similar to the corresponding steps of the embodiments described above; reference may be made to those descriptions, which are not repeated here.
It should be noted that, for simplicity of description, the method embodiments are presented as a series of acts, but those skilled in the art should understand that the embodiments are not limited by the described order of acts, as some steps may be performed in other orders or concurrently in accordance with the embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments, and that the acts involved are not necessarily all required by the embodiments of the application.
Referring to FIG. 8, a block diagram of an embodiment of a data acquisition device of the present application is shown. The device may specifically include the following modules:
A configuration information obtaining module 802, configured to obtain data configuration information corresponding to a target database, where the data configuration information is used to determine data to be audited in the target database;
The parsing module 804 is configured to parse the data configuration information according to the target database, so as to obtain instantiated data configuration information corresponding to the target database;
The task generating module 806 is configured to generate a data acquisition task according to the instantiated data configuration information, where the data acquisition task is configured to acquire the data to be audited from the target database.
In summary, the data configuration information is parsed according to the target database to obtain instantiated data configuration information corresponding to the target database; a data acquisition task is then generated according to the instantiated data configuration information, and the data to be audited is acquired from the target database according to the data acquisition task. That is, general data configuration information is parsed according to the category of the target database to generate instantiated data configuration information for that database, from which the data acquisition task is generated. The data to be audited can therefore be obtained simply by parsing the data configuration information according to the type of the database, and no separate data acquisition task needs to be set up for each type of database, so the data to be audited can be obtained more conveniently.
Optionally, as an embodiment, the parsing module 804 includes:
A screening sub-module, configured to select data configuration information of the corresponding type as target data configuration information according to the target database and the target computing processing end, where the target computing processing end is configured to execute the data acquisition task to obtain the data to be audited.
An analysis sub-module, configured to parse the target data configuration information according to the target database to obtain the instantiated data configuration information.
Optionally, as an embodiment, the parsing sub-module includes:
An operator analysis sub-module, configured to convert the type of the general operator in the target data configuration information according to the storage type of the target database to obtain an instance operator, where the instance operator is used for determining the data to be audited.
A storage position analysis sub-module, configured to obtain the target storage location of the instance operator according to the storage rule of the target database, so as to generate the instantiated data configuration information.
Optionally, as an embodiment, the task generating module 806 includes:
A merging sub-module, configured to merge more than one piece of instantiated data configuration information according to a merging rule.
A task generating sub-module, configured to generate the data acquisition task according to the merged instantiated data configuration information.
Optionally, as an embodiment, the task generating module 806 is specifically configured to:
Generate the data acquisition task according to the instantiated data configuration information and the data range of the data to be audited.
Optionally, as an embodiment, the apparatus further includes:
A task sending module, configured to send the data acquisition task to a target computing processing end corresponding to the instantiated data configuration information, where the target computing processing end is configured to execute the data acquisition task to obtain the data to be audited.
A data acquisition module, configured to obtain the data to be audited.
Optionally, as an embodiment, the apparatus further includes:
A rule acquisition module, configured to obtain the auditing rule corresponding to the data configuration information.
An auditing processing module, configured to audit the data to be audited according to the auditing rule to obtain an auditing result.
On the basis of the above embodiment, this embodiment further provides a data acquisition device, referring to fig. 9, the device includes:
The configuration information obtaining module 902 is configured to obtain data configuration information corresponding to a target data source, where the data configuration information is used to determine data to be audited in the target data source.
The analysis processing module 904 is configured to parse the data configuration information according to the target data source, so as to obtain instantiated data configuration information corresponding to the target data source.
The task obtaining module 906 is configured to generate a data acquisition task according to the instantiated data configuration information, where the data acquisition task is used to acquire the data to be audited from the target data source.
The embodiment of the application also provides a non-volatile readable storage medium in which one or more modules (programs) are stored; when the one or more modules are applied to a device, they may cause the device to execute the instructions of each method step in the embodiments of the application.
Embodiments of the application provide one or more machine-readable media having instructions stored thereon that, when executed by one or more processors, cause an electronic device to perform a method as described in one or more of the above embodiments. In the embodiment of the application, the electronic equipment comprises various types of equipment such as terminal equipment, servers (clusters) and the like.
Embodiments of the present disclosure may be implemented as an apparatus for performing a desired configuration using any suitable hardware, firmware, software, or any combination thereof, which may include electronic devices such as terminal devices, servers (clusters), etc. Fig. 10 schematically illustrates an exemplary apparatus 1000 that may be used to implement various embodiments described in the present disclosure.
For one embodiment, fig. 10 illustrates an example apparatus 1000 having one or more processors 1002, a control module (chipset) 1004 coupled to at least one of the processor(s) 1002, a memory 1006 coupled to the control module 1004, a non-volatile memory (NVM)/storage 1008 coupled to the control module 1004, one or more input/output devices 1010 coupled to the control module 1004, and a network interface 1012 coupled to the control module 1004.
The processor 1002 may include one or more single-core or multi-core processors, and the processor 1002 may include any combination of general-purpose or special-purpose processors (e.g., graphics processors, application processors, baseband processors, etc.). In some embodiments, the apparatus 1000 can be used as a terminal device, a server (cluster), or the like in the embodiments of the present application.
In some embodiments, the apparatus 1000 can include one or more computer-readable media (e.g., memory 1006 or NVM/storage 1008) having instructions 1014 and one or more processors 1002 in combination with the one or more computer-readable media configured to execute the instructions 1014 to implement the modules to perform the actions described in this disclosure.
For one embodiment, the control module 1004 may include any suitable interface controller to provide any suitable interface to at least one of the processor(s) 1002 and/or any suitable device or component in communication with the control module 1004.
The control module 1004 may include a memory controller module to provide an interface to the memory 1006. The memory controller modules may be hardware modules, software modules, and/or firmware modules.
Memory 1006 may be used to load and store data and/or instructions 1014 for device 1000, for example. For one embodiment, the memory 1006 may include any suitable volatile memory, such as a suitable DRAM. In some embodiments, the memory 1006 may comprise a double data rate type four synchronous dynamic random access memory (DDR 4 SDRAM).
For one embodiment, the control module 1004 may include one or more input/output controllers to provide an interface to the NVM/storage 1008 and the input/output device(s) 1010.
For example, NVM/storage 1008 may be used to store data and/or instructions 1014. NVM/storage 1008 may include any suitable nonvolatile memory (e.g., flash memory) and/or may include any suitable nonvolatile storage device(s) (e.g., one or more Hard Disk Drives (HDDs), one or more Compact Disc (CD) drives, and/or one or more Digital Versatile Disc (DVD) drives).
NVM/storage 1008 may include storage resources that are physically part of the device on which apparatus 1000 is installed, or may be accessible by the device without necessarily being part of the device. For example, NVM/storage 1008 may be accessed over a network via input/output device(s) 1010.
Input/output device(s) 1010 may provide an interface for apparatus 1000 to communicate with any other suitable device, and may include communication components, audio components, sensor components, and the like. Network interface 1012 may provide an interface for apparatus 1000 to communicate over one or more networks. Apparatus 1000 may communicate wirelessly with one or more components of a wireless network according to any of one or more wireless network standards and/or protocols, for example by accessing a wireless network based on a communication standard such as WiFi, 2G, 3G, 4G or 5G, or a combination thereof.
For one embodiment, at least one of the processor(s) 1002 may be packaged together with logic of one or more controllers (e.g., memory controller modules) of the control module 1004. For one embodiment, at least one of the processor(s) 1002 may be packaged together with logic of one or more controllers of the control module 1004 to form a System In Package (SiP). For one embodiment, at least one of the processor(s) 1002 may be integrated on the same die as logic of one or more controllers of the control module 1004. For one embodiment, at least one of the processor(s) 1002 may be integrated on the same die with logic of one or more controllers of the control module 1004 to form a system on chip (SoC).
In various embodiments, the apparatus 1000 may be, but is not limited to being: a server, a desktop computing device, or a mobile computing device (e.g., a laptop computing device, a handheld computing device, a tablet, a netbook, etc.), among other terminal devices. In various embodiments, device 1000 may have more or fewer components and/or different architectures. For example, in some embodiments, the apparatus 1000 includes one or more cameras, a keyboard, a Liquid Crystal Display (LCD) screen (including a touch screen display), a non-volatile memory port, multiple antennas, a graphics chip, an Application Specific Integrated Circuit (ASIC), and a speaker.
The detection device can adopt a main control chip as a processor or a control module, sensor data, position information and the like are stored in a memory or an NVM/storage device, a sensor group can be used as an input/output device, and a communication interface can comprise a network interface.
Because the device embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the description of the method embodiments.
In this specification, the embodiments are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the embodiments of the application.
Finally, it is further noted that relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal device that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such process, method, article, or terminal device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or terminal device that comprises the element.
The data acquisition method and apparatus, the electronic device, and the storage medium provided by the present application have been described in detail above. Specific examples are used herein to illustrate the principles and embodiments of the present application, and the above description of the embodiments is intended only to assist in understanding the method and core idea of the present application. Meanwhile, those skilled in the art may, in accordance with the ideas of the present application, make changes to the specific embodiments and to the scope of application. In view of the above, the content of this description should not be construed as limiting the present application.

Claims (10)

1. A method of data acquisition, comprising:
determining a model type of a target database, and acquiring data configuration information corresponding to the target database according to the model type of the target database, wherein the data configuration information is used for determining data to be audited in the target database; the data configuration information comprises information on an audit-index Metrics expression, an operator, a Metrics type, and the model type; the Metrics expression is an expression used for acquiring the operator; the operator is an audit index; the Metrics type corresponds to a computing processing end; and the model type corresponds to a storage type of the database;
selecting data configuration information of the corresponding type as target data configuration information according to the target database and a target computing processing end, wherein the target computing processing end is used for executing a data acquisition task to obtain the data to be audited;
converting, according to the storage type of the target database, the type of a universal operator in the target data configuration information to obtain an instance operator, wherein the instance operator is used for determining the data to be audited;
obtaining a target storage position of the instance operator according to a storage rule of the target database, so as to generate instantiated data configuration information;
generating the data acquisition task according to the instantiated data configuration information, wherein the data acquisition task is used for acquiring the data to be audited from the target database.
2. The method of claim 1, wherein generating a data acquisition task based on the instantiated data configuration information comprises:
merging more than one piece of the instantiated data configuration information according to a merging rule;
and generating the data acquisition task according to the merged instantiated data configuration information.
3. The method of claim 1, wherein generating a data acquisition task based on the instantiated data configuration information comprises:
generating the data acquisition task according to the instantiated data configuration information and a data range of the data to be audited.
4. The method as recited in claim 1, further comprising:
sending the data acquisition task to the target computing processing end corresponding to the instantiated data configuration information, wherein the target computing processing end is used for executing the data acquisition task to obtain the data to be audited;
and acquiring the data to be audited.
5. The method as recited in claim 1, further comprising:
obtaining auditing rules corresponding to the data configuration information;
and auditing the data to be audited according to the auditing rules to obtain auditing results.
6. A method of data acquisition, comprising:
determining a model type of a target data source, and acquiring data configuration information corresponding to the target data source according to the model type of the target data source, wherein the data configuration information is used for determining data to be audited in the target data source; the data configuration information comprises information on an audit-index Metrics expression, an operator, a Metrics type, and the model type; the Metrics expression is an expression used for acquiring the operator; the operator is an audit index; the Metrics type corresponds to a computing processing end; and the model type corresponds to a storage type of the data source;
selecting data configuration information of the corresponding type as target data configuration information according to the target data source and a target computing processing end, wherein the target computing processing end is used for executing a data acquisition task to obtain the data to be audited;
converting, according to the storage type of the target data source, the type of a universal operator in the target data configuration information to obtain an instance operator, wherein the instance operator is used for determining the data to be audited;
obtaining a target storage position of the instance operator according to a storage rule of the target data source, so as to generate instantiated data configuration information;
generating the data acquisition task according to the instantiated data configuration information, wherein the data acquisition task is used for acquiring the data to be audited from the target data source.
7. A data acquisition device, the device comprising:
a configuration information acquisition module, used for determining a model type of a target database and acquiring data configuration information corresponding to the target database according to the model type of the target database, wherein the data configuration information is used for determining data to be audited in the target database; the data configuration information comprises information on an audit-index Metrics expression, an operator, a Metrics type, and the model type; the Metrics expression is an expression used for acquiring the operator; the operator is an audit index; the Metrics type corresponds to a computing processing end; and the model type corresponds to a storage type of the database;
an analysis module, used for selecting data configuration information of the corresponding type as target data configuration information according to the target database and a target computing processing end, wherein the target computing processing end is used for executing a data acquisition task to obtain the data to be audited; converting, according to the storage type of the target database, the type of a universal operator in the target data configuration information to obtain an instance operator, wherein the instance operator is used for determining the data to be audited; and obtaining a target storage position of the instance operator according to a storage rule of the target database, so as to generate instantiated data configuration information;
a task generating module, used for generating the data acquisition task according to the instantiated data configuration information, wherein the data acquisition task is used for acquiring the data to be audited from the target database.
8. A data acquisition device, the device comprising:
a configuration information acquisition module, used for determining a model type of a target data source and acquiring data configuration information corresponding to the target data source according to the model type of the target data source, wherein the data configuration information is used for determining data to be audited in the target data source; the data configuration information comprises information on an audit-index Metrics expression, an operator, a Metrics type, and the model type; the Metrics expression is an expression used for acquiring the operator; the operator is an audit index; the Metrics type corresponds to a computing processing end; and the model type corresponds to a storage type of the data source;
an analysis processing module, used for selecting data configuration information of the corresponding type as target data configuration information according to the target data source and a target computing processing end, wherein the target computing processing end is used for executing a data acquisition task to obtain the data to be audited; converting, according to the storage type of the target data source, the type of a universal operator in the target data configuration information to obtain an instance operator, wherein the instance operator is used for determining the data to be audited; and obtaining a target storage position of the instance operator according to a storage rule of the target data source, so as to generate instantiated data configuration information;
a task obtaining module, used for generating the data acquisition task according to the instantiated data configuration information, wherein the data acquisition task is used for acquiring the data to be audited from the target data source.
9. An electronic device, comprising: a processor; and
a memory having executable code stored thereon that, when executed, causes the processor to perform the data acquisition method of one or more of claims 1-6.
10. One or more machine readable media having executable code stored thereon that, when executed, causes a processor to perform the data acquisition method of one or more of claims 1-6.
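
For illustration only, the steps recited in claims 1 and 3 might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation disclosed in the application: the class `DataConfig`, the lookup tables `CONFIGS`, `INSTANCE_OPERATORS`, and `STORAGE_RULES`, the function names, and the SQL-style expression template are all hypothetical and chosen only to make the flow concrete.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class DataConfig:
    """Hypothetical container for one piece of data configuration information."""
    metrics_expression: str  # template used to obtain the operator's value
    operator: str            # audit index (universal until instantiated)
    metrics_type: str        # identifies the computing processing end
    model_type: str          # identifies the storage type of the database
    location: str = ""       # target storage position, set on instantiation

# Assumed registry of configurations, one per (model type, Metrics type) pair.
CONFIGS = [
    DataConfig("SELECT {operator} FROM {location}", "row_count", "sql", "mysql"),
    DataConfig("SELECT {operator} FROM {location}", "row_count", "sql", "hive"),
]

# Assumed mapping from (universal operator, storage type) to an instance operator.
INSTANCE_OPERATORS = {
    ("row_count", "mysql"): "COUNT(*)",
    ("row_count", "hive"): "COUNT(1)",
}

# Assumed storage rules: how each storage type names the target location.
STORAGE_RULES = {
    "mysql": lambda db, table: f"{db}.{table}",
    "hive": lambda db, table: f"{db}.ods_{table}",
}

def select_target_config(configs, model_type, metrics_type):
    """Pick the configuration matching the database and the processing end."""
    for config in configs:
        if config.model_type == model_type and config.metrics_type == metrics_type:
            return config
    raise LookupError(f"no configuration for {model_type}/{metrics_type}")

def instantiate(config, db, table):
    """Convert the universal operator and resolve its target storage position."""
    instance_operator = INSTANCE_OPERATORS[(config.operator, config.model_type)]
    location = STORAGE_RULES[config.model_type](db, table)
    return replace(config, operator=instance_operator, location=location)

def generate_task(instantiated, data_range=None):
    """Build a data acquisition task, optionally limited to a data range."""
    query = instantiated.metrics_expression.format(
        operator=instantiated.operator, location=instantiated.location
    )
    if data_range:
        query += f" WHERE {data_range}"
    return {"engine": instantiated.metrics_type, "query": query}

target = select_target_config(CONFIGS, model_type="hive", metrics_type="sql")
task = generate_task(instantiate(target, "sales", "orders"), "dt = '2019-12-19'")
print(task["query"])
```

In this sketch the same universal operator `row_count` yields a different instance operator and storage position depending on the storage type, which is the separation between generic configuration and per-database instantiation that the claims describe.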
CN201911320945.0A 2019-12-19 2019-12-19 Data acquisition method, device, equipment and storage medium Active CN113010488B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911320945.0A CN113010488B (en) 2019-12-19 2019-12-19 Data acquisition method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911320945.0A CN113010488B (en) 2019-12-19 2019-12-19 Data acquisition method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113010488A CN113010488A (en) 2021-06-22
CN113010488B CN113010488B (en) 2024-05-28

Family

ID=76381390

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911320945.0A Active CN113010488B (en) 2019-12-19 2019-12-19 Data acquisition method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113010488B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105931016A (en) * 2016-04-14 2016-09-07 北京思特奇信息技术股份有限公司 Automatic daily audit method and system
CN110543483A (en) * 2019-08-30 2019-12-06 北京百分点信息科技有限公司 Data auditing method and device and electronic equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9830366B2 (en) * 2008-03-22 2017-11-28 Thomson Reuters Global Resources Online analytic processing cube with time stamping
CN104573115B (en) * 2015-02-04 2019-03-22 北京慧辰资道资讯股份有限公司 Support the realization method and system of the integrated interface of multi-type database operation

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
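A Security Framework for Database Auditing System;Wang Huijie;IEEE;20180201;full text *
Design of a base station power generation audit system based on JavaWeb and Android;Gao Lei;Yang Xujun;Tao Fangtao;Chen Liang;;Electronic Design Engineering;20160705(No. 13);full text *
Implementation scheme for an efficient distributed data audit system;Chen Xinyong;;Information & Communications;20160915(No. 09);full text *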

Also Published As

Publication number Publication date
CN113010488A (en) 2021-06-22

Similar Documents

Publication Publication Date Title
CN113010502B (en) Data quality auditing method, device, equipment and storage medium
CN107908672B (en) Application report realization method, device and storage medium based on Hadoop platform
US8874600B2 (en) System and method for building a cloud aware massive data analytics solution background
WO2023060878A1 (en) Data query method and system, heterogeneous acceleration platform, and storage medium
US7676523B2 (en) Method and system for managing data quality
CN107168977B (en) Data query optimization method and device
Bockermann et al. The streams framework
CN109491989B (en) Data processing method and device, electronic equipment and storage medium
US20080263062A1 (en) Method and system for including data quality in data streams
US11314808B2 (en) Hybrid flows containing a continous flow
US20180218019A1 (en) Processing messages of a plurality of devices
CN106951231B (en) Computer software development method and device
US20110314060A1 (en) Markup language based query and file generation
US9171051B2 (en) Data definition language (DDL) expression annotation
CN106293891B (en) Multidimensional investment index monitoring method
US10157213B1 (en) Data processing with streaming data
US10007702B2 (en) Processing an input query
CN111125199B (en) Database access method and device and electronic equipment
CN113806429A (en) Canvas type log analysis method based on large data stream processing framework
CN108287876B (en) Power quality data service quality detection method and device supporting multiple formats
CN115952203B (en) Data query method, device, system and storage medium
CN113010488B (en) Data acquisition method, device, equipment and storage medium
CN107430633B (en) System and method for data storage and computer readable medium
CN104090895B (en) Obtain the method for radix, device, server and system
CN111475505A (en) Data acquisition method and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant