CN114328486A - Data quality checking method and device based on model - Google Patents


Publication number
CN114328486A
Authority
CN
China
Prior art keywords
data
checking
checked
model
configuration information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111600387.0A
Other languages
Chinese (zh)
Inventor
秦晓宏
黄主斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Clinbrain Information Technology Co Ltd
Original Assignee
Shanghai Clinbrain Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Clinbrain Information Technology Co Ltd
Priority to CN202111600387.0A
Publication of CN114328486A
Legal status: Pending

Abstract

The embodiment of the invention discloses a data quality checking method and device based on a model. The method comprises the following steps: acquiring data to be checked in a target database; inputting data to be checked into a data checking model for data checking, wherein the data checking model is created in advance based on checking configuration information, and the checking configuration information is obtained by configuring based on metadata corresponding to a target database; and determining a checking result corresponding to the data to be checked based on the output of the data checking model. By the technical scheme of the embodiment of the invention, the situation that the database resources are excessively occupied due to data checking can be avoided, the service pressure of the database is reduced, and the normal use of the database is ensured.

Description

Data quality checking method and device based on model
Technical Field
The embodiment of the invention relates to computer technology, in particular to a data quality checking method and device based on a model.
Background
With the rapid development of computer technology, quality checks on data stored in databases are often required. For example, the data may be checked for consistency, relevance, normalization, integrity and the like.
At present, a data checking SQL statement is generally generated based on the specific structure of a database table, and the data check is performed by executing that SQL statement against the database. This checking method requires frequent read operations on the database, together with complex operations such as data comparison, sorting, type conversion and logical judgment, so that excessive database resources are occupied, the service pressure on the database is increased, and the normal use of database resources by users may even be affected.
Disclosure of Invention
The embodiment of the invention provides a data quality checking method and device based on a model, which are used for avoiding the situation that the database resources are excessively occupied due to data checking, reducing the service pressure of a database and ensuring the normal use of the database.
In a first aspect, an embodiment of the present invention provides a method for checking data quality based on a model, including:
acquiring data to be checked in a target database;
inputting the data to be checked into a data checking model for data checking, wherein the data checking model is created in advance based on checking configuration information, and the checking configuration information is obtained by configuring based on metadata corresponding to the target database;
and determining a checking result corresponding to the data to be checked based on the output of the data checking model.
In a second aspect, an embodiment of the present invention further provides a data quality checking apparatus based on a model, including:
the data to be checked acquisition module is used for acquiring data to be checked in the target database;
the data to be checked input module is used for inputting the data to be checked into a data checking model for data checking, wherein the data checking model is created in advance based on checking configuration information, and the checking configuration information is obtained by configuring based on metadata corresponding to the target database;
and the checking result determining module is used for determining the checking result corresponding to the data to be checked based on the output of the data checking model.
In a third aspect, an embodiment of the present invention further provides an electronic device, where the electronic device includes:
one or more processors;
a memory for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the model-based data quality checking method provided by any embodiment of the invention.
In a fourth aspect, the embodiments of the present invention further provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the model-based data quality checking method provided in any embodiment of the present invention.
According to the embodiment of the invention, checking configuration information is obtained by configuring based on the metadata corresponding to the target database, and the corresponding data checking model is created based on the obtained checking configuration information, thereby realizing dynamic creation of the data checking model and meeting the individualized requirements of data checking. The data to be checked in the target database is acquired and input into the data checking model for data checking, and the checking result corresponding to the data to be checked is determined based on the output of the data checking model. With the data checking model, the checking result can be obtained with a single read operation on the target database, without frequent repeated reads of the data, so that excessive occupation of database resources due to data checking is avoided, the service pressure on the database is reduced, and the normal use of the database is ensured.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, a brief description will be given below of the drawings required for the embodiments or the technical solutions in the prior art, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flowchart of a method for checking data quality based on a model according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for checking data quality based on a model according to a second embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a model-based data quality checking apparatus according to a third embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of a data quality checking method based on a model according to an embodiment of the present invention, which is applicable to quality checking of database table data stored in a database. The method may be performed by a model-based data quality checking apparatus, which may be implemented by software and/or hardware, integrated in an electronic device. As shown in fig. 1, the method specifically includes the following steps:
S110, acquiring data to be checked in the target database.
The target database may refer to a relational database whose data needs to be checked. The data to be checked may refer to data that is stored in the target database and whose quality needs to be checked.
S120, inputting the data to be checked into a data checking model for data checking, wherein the data checking model is created in advance based on checking configuration information, and the checking configuration information is obtained by configuring based on metadata corresponding to the target database.
The metadata corresponding to the target database may refer to data that describes the data stored in the target database. For example, for each data table in the target database, the metadata may refer to the fields in the header of the data table, such as name, gender, age, height and the like. Each field in the metadata may be an object of data checking configuration. The checking configuration information may include the data table fields to be checked in the target database and the data checking mode corresponding to each such field. A data table field to be checked may refer to a field in a data table that the user has configured in advance, based on service requirements, as needing to be checked. There may be one or more data table fields to be checked. The data checking mode may be pre-configured and is used for checking the data of the corresponding data table field to be checked. The data checking mode may include, but is not limited to: data checking rules and a checking trigger mode. The data checking rules may be expressed as data checking functions, which improves data checking efficiency. For example, the data checking rules may include at least one data checking function, which may include, but is not limited to: a non-null verification function, an enumerated value verification function, and a value range verification function. If the data checking rules corresponding to a data table field to be checked include two or more data checking functions, the checking order of the data checking functions may be dynamically configured or left as the default; that is, the data checking mode may further include function checking order information. The function checking order information may specify checking simultaneously or checking sequentially in the configured order.
The data checking model can be created based on the checking configuration information and is used for checking the data in a uniform way. For example, the data checking model may perform data checking on the corresponding field data in the data to be checked based on the data checking mode corresponding to the data table field to be checked.
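By way of illustration only, the checking configuration information described above might be sketched in Python as a mapping from each data table field to be checked to an ordered list of check functions. The names `not_null`, `in_enum`, `in_range`, `CHECK_CONFIG` and `check_value`, as well as the concrete allowed values and ranges, are assumptions made for this sketch and do not come from the disclosure.

```python
from typing import Any, Callable, Dict, List

# Hypothetical check functions; each returns True when the value passes.
def not_null(value: Any) -> bool:
    """Non-null verification."""
    return value is not None and value != ""

def in_enum(allowed: set) -> Callable[[Any], bool]:
    """Build an enumerated value verification function."""
    return lambda value: value in allowed

def in_range(low: float, high: float) -> Callable[[Any], bool]:
    """Build a value range verification function (bounds inclusive)."""
    def check(value: Any) -> bool:
        return isinstance(value, (int, float)) and low <= value <= high
    return check

# Assumed shape of the checking configuration information:
# field to be checked -> check functions, listed in the configured order.
CHECK_CONFIG: Dict[str, List[Callable[[Any], bool]]] = {
    "gender": [not_null, in_enum({"M", "F"})],
    "age": [not_null, in_range(0, 150)],
}

def check_value(field: str, value: Any) -> List[bool]:
    """Apply the check functions configured for `field`, in list order."""
    return [rule(value) for rule in CHECK_CONFIG[field]]
```

In this sketch the list order stands in for the function checking order information; a checking trigger mode is not modelled.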
Specifically, the field information of all data tables in the target database can be obtained by reading the metadata corresponding to the target database, and this field information can be displayed in a configuration interface. The user can then select, based on service requirements, each data table field to be checked from the displayed field information and configure a corresponding data checking mode for each selected field. This realizes dynamic and flexible configuration of the data checking mode, avoids over-dependence on the database and data table structures, and further improves checking efficiency. The corresponding data checking model can be dynamically created based on the checking configuration information configured by the user, so that the checking configuration information is solidified into the data checking model. The data checking model can then perform data checking on the corresponding field data in the input data to be checked, based on the data checking mode corresponding to each data table field to be checked in the checking configuration information, thereby realizing dynamic creation of the data checking model and meeting the individualized requirements of data checking.
Illustratively, S110 may include: acquiring, based on the data table fields to be checked in the checking configuration information, the data to be checked corresponding to those fields in the target database. Specifically, by acquiring only the data to be checked that corresponds to each data table field to be checked in the checking configuration information, acquiring data that does not need to be checked can be avoided, which further improves checking efficiency.
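As a rough sketch of reading the metadata and then acquiring only the configured fields, the following Python example uses an in-memory SQLite database as a stand-in for the target database. The table name `users`, the sample rows and the helper names `table_fields` and `fetch_data_to_check` are assumptions for illustration; a different relational database would expose its metadata through a different mechanism.

```python
import sqlite3

def table_fields(conn, table):
    """Read the column names of `table` from SQLite's table metadata."""
    cursor = conn.execute(f'PRAGMA table_info("{table}")')
    return [row[1] for row in cursor]  # row[1] is the column name

def fetch_data_to_check(conn, table, fields_to_check):
    """Read only the configured fields, using a single query."""
    known = table_fields(conn, table)
    unknown = [f for f in fields_to_check if f not in known]
    if unknown:
        raise ValueError(f"fields not present in metadata: {unknown}")
    columns = ", ".join(f'"{f}"' for f in fields_to_check)
    return conn.execute(f'SELECT {columns} FROM "{table}"').fetchall()

# Stand-in target database with one user table (sample data is invented).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, gender TEXT, age INTEGER, height REAL)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [("Ann", "F", 34, 1.70), ("Bob", None, 250, 1.82)],
)

# Only the fields named in the checking configuration are read.
rows = fetch_data_to_check(conn, "users", ["name", "age"])
```

Columns that are not configured for checking (here `gender` and `height`) are never read, which is the point of restricting S110 to the configured fields.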
S130, determining a checking result corresponding to the data to be checked based on the output of the data checking model.
Specifically, the data checking model performs data checking on the corresponding field data in the input data to be checked, based on the data checking mode corresponding to each data table field to be checked in the checking configuration information, and can then automatically output a data checking result corresponding to each data table field to be checked. With the data checking model, the checking result can therefore be obtained with a single read operation on the target database. Frequent repeated reads of the data are not required, so that excessive occupation of database resources due to data checking is avoided and the service pressure on the database is reduced. In addition, there is no need to perform multiple logical associations or to use excessive functions or judgment expressions in the database, so that the consumption of database CPU and memory is reduced and the normal use of the database is ensured.
According to the technical scheme of this embodiment, checking configuration information is obtained by configuring based on the metadata corresponding to the target database, and the corresponding data checking model is created based on the obtained checking configuration information, so that dynamic creation of the data checking model is realized and the individualized requirements of data checking are met. The data to be checked in the target database is acquired and input into the data checking model for data checking, and the checking result corresponding to the data to be checked is determined based on the output of the data checking model. With the data checking model, the checking result can be obtained with a single read operation on the target database, without frequent repeated reads of the data, so that excessive occupation of database resources due to data checking is avoided, the service pressure on the database is reduced, and the normal use of the database is ensured.
On the basis of the above technical solution, S110 may include: acquiring data to be checked that is newly added to the target database at the current moment; or acquiring data to be checked generated in the target database within a preset time period; or acquiring data to be checked that meets a preset data volume in the target database; or acquiring, according to the last checking result and/or the server occupancy, data to be checked generated in the target database in the period between two checks.
Specifically, the start time of data checking can be set based on service requirements. For example, by acquiring the data to be checked that is newly added to the target database at the current moment, the corresponding data checking operation can be executed each time data to be checked is added to the target database, thereby realizing online checking of the data. Alternatively, by acquiring the data to be checked generated in the target database within a preset time period, offline checking of the data can be realized. For example, each piece of data generated within the preset time period may be taken as data to be checked one at a time, so that checking is performed piece by piece; or all data generated within the preset time period may be taken together as data to be checked, so that checking is performed in batches and checking efficiency is improved. Alternatively, data to be checked that meets a preset data volume is acquired from the target database, so that the amount of data checked each time is controlled and the normal use of the database is further ensured. Alternatively, the data to be checked generated in the target database in the period between two checks is acquired according to the last checking result and/or the server occupancy, so that the data checking frequency can be adjusted intelligently. For example, if the last checking result indicates poor data quality, the data checking frequency can be increased so that problems are found in time and a reminder can be issued. If the server is heavily occupied, the data checking frequency can be reduced, so that occupation of the normal service resources of the target database is avoided and the normal use of the database is further ensured.
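The adjustment of checking frequency described above could, for example, be sketched as a function that returns the waiting time before the next check. The text does not specify thresholds, scaling factors or which signal takes precedence, so the function name `next_check_interval`, every default value below, and the rule that server load is considered first are assumptions chosen only to make the sketch concrete.

```python
def next_check_interval(current_s, last_failure_rate, server_load,
                        poor_quality=0.05, busy=0.8, min_s=60, max_s=3600):
    """Return the number of seconds to wait before the next check.

    current_s         -- current interval between checks, in seconds
    last_failure_rate -- share of records that failed the last check (0..1)
    server_load       -- server occupancy (0..1)
    All thresholds and limits are assumed values.
    """
    if server_load >= busy:
        # Server heavily occupied: check less often (longer interval).
        return min(current_s * 2, max_s)
    if last_failure_rate >= poor_quality:
        # Last result showed poor data quality: check more often.
        return max(current_s // 2, min_s)
    return current_s
```

For instance, with a current interval of 600 seconds, a failure rate of 0.2 on an idle server halves the interval to 300 seconds, while a server load of 0.9 doubles it to 1200 seconds.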
On the basis of the above technical solutions, S120 may include: creating a data checking model in the memory based on the checking configuration information; and reading the data to be checked into the data checking model in the memory for data checking.
Specifically, a data checking model can be dynamically created in the memory based on the checking configuration information, and the data to be checked is read into the data checking model in the memory for data checking. The data checking can thus be performed in the memory, which further improves data checking efficiency; and because there is no need to perform multiple logical associations or to use excessive functions or judgment expressions, memory consumption can be reduced.
Example two
Fig. 2 is a flowchart of a data quality checking method based on a model according to a second embodiment of the present invention, where the embodiment describes in detail a creation process of a data checking model based on the above embodiments, and further optimizes the data quality checking process based on the detailed description. Wherein explanations of the same or corresponding terms as those of the above embodiments are omitted.
Referring to fig. 2, the method for checking data quality based on a model provided in this embodiment specifically includes the following steps:
S310, acquiring the checking configuration information.
Specifically, a corresponding configuration file may be generated based on the checking information configured by the user on the configuration interface, so that the checking configuration information corresponding to the target database can be obtained by parsing the configuration file. The checking configuration information may include each data table field to be checked in the target database and the corresponding data checking mode.
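The format of the configuration file is not specified in the text. Purely as an assumption, the sketch below treats it as JSON and parses it into a table name plus a mapping from each field to be checked to its list of rules; the key names (`table`, `fields`, `check`, `values`, `min`, `max`) and the function `parse_check_config` are hypothetical.

```python
import json

# Hypothetical configuration file content generated from the configuration interface.
CONFIG_TEXT = """
{
  "table": "users",
  "fields": {
    "gender": [{"check": "not_null"}, {"check": "enum", "values": ["M", "F"]}],
    "age":    [{"check": "not_null"}, {"check": "range", "min": 0, "max": 150}]
  }
}
"""

KNOWN_CHECKS = {"not_null", "enum", "range"}

def parse_check_config(text):
    """Return (table, {field: [rule dict, ...]}); reject unknown check names."""
    raw = json.loads(text)
    fields = {}
    for field, rules in raw["fields"].items():
        for rule in rules:
            if rule["check"] not in KNOWN_CHECKS:
                raise ValueError(f"unknown check {rule['check']!r} for field {field!r}")
        fields[field] = rules
    return raw["table"], fields

table, fields = parse_check_config(CONFIG_TEXT)
```

Rejecting unknown check names at parse time keeps configuration errors out of the later model creation steps.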
S320, taking the to-be-checked data table field in the checking configuration information as column information, and creating a two-dimensional data structure object.
The data structure of the two-dimensional data structure object may be a table-type data structure or an array-type data structure.
Specifically, each data table field to be checked may be used as column information by means of object-relational mapping, and a two-dimensional data structure object may be created in the memory. The two-dimensional data structure object may be a second-order matrix represented by an X axis and a Y axis, where the X axis may represent the column information of the data to be checked and the Y axis may represent the row information of the data to be checked. The two-dimensional data structure object may also contain an ordered set of columns, each of which may hold a different value type (e.g., numeric, string, boolean, etc.). The two-dimensional data structure object has both a row index and a column index, and can be regarded as a dictionary of Series that share a common index. The row index may also be called the row label, and one row represents one group of data in the data table to be checked. The column index may also be called the column label, that is, each data table field to be checked in the checking configuration information.
Exemplarily, S320 may include: creating a two-dimensional data structure object in which data is associated by a row index and a column index, the column index being the data table fields to be checked in the checking configuration information.
Specifically, the two-dimensional data structure object may refer to a two-dimensional array structure constructed in the memory from a row index and a column index. The two-dimensional data structure object may be a single complete storage space in the memory, or may instead be a two-dimensional array structure formed by using the row index and the column index to associate storage areas that are not connected to each other. By creating a two-dimensional data structure object in which data is associated by a row index and a column index, the memory space can be used dynamically and fully, the occupation of and operating pressure on the memory are reduced, and data checking efficiency is further improved.
Illustratively, the number of rows of the two-dimensional data structure object is a preset number; the column number of the two-dimensional data structure object is the field number corresponding to the field of the data table to be checked configured in the checking configuration information.
The preset number may be a total number of rows included in a preset two-dimensional data structure object, or a preset number threshold for one-time data check, and is used to obtain a fixed number of data to be checked. Specifically, by creating a two-dimensional data structure object with fixed length and width, the subsequently created data verification model can have uniformity, and the data verification efficiency is further improved.
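A minimal sketch of such a fixed-size two-dimensional data structure object is given below, using only the Python standard library. The class name `TwoDimObject` and its methods are hypothetical; storing one list per column is one possible layout among several and is chosen here because it lets each column hold its own value type.

```python
class TwoDimObject:
    """Fixed-size table addressed by a row index and a column index."""

    def __init__(self, fields, preset_rows):
        self.columns = list(fields)             # column index: fields to be checked
        self.index = list(range(preset_rows))   # row index: row labels 0..preset_rows-1
        # One list per column, pre-filled with None for every row.
        self.data = {field: [None] * preset_rows for field in self.columns}

    @property
    def shape(self):
        """(number of rows, number of columns)."""
        return (len(self.index), len(self.columns))

    def set(self, row, field, value):
        self.data[field][row] = value

    def get(self, row, field):
        return self.data[field][row]

# 100 preset rows; the column count equals the number of configured fields.
obj = TwoDimObject(["name", "age"], preset_rows=100)
obj.set(0, "name", "Ann")
obj.set(0, "age", 34)
```

Here `obj.shape` is `(100, 2)`, matching a preset number of 100 rows and two configured fields.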
S330, binding the column information in the two-dimensional data structure object with a data checking mode corresponding to the corresponding to-be-checked data table field in the checking configuration information, and creating a data checking model.
Specifically, column information in the two-dimensional data structure object is bound with a corresponding data verification mode, so that data verification is performed on the corresponding column by using the bound data verification mode, and the verification configuration information is solidified into a corresponding data verification model.
Exemplarily, S330 may include: performing dimension raising on the two-dimensional data structure object to change it into a three-dimensional structure, in which two of the dimensions represent the data to be checked and the remaining dimension represents the data checking mode; and binding, through the three-dimensional structure, the column information in the two-dimensional data structure object with the data checking mode corresponding to the corresponding data table field to be checked in the checking configuration information, to create the data checking model.
Specifically, the two-dimensional data structure object is subjected to dimension raising to change it into a three-dimensional structure; for example, a second-order matrix is raised to a third-order matrix. The three dimensions can be represented by an X axis, a Y axis and a Z axis. The two-dimensional structure formed by the X axis and the Y axis represents the data to be checked: the X axis may represent the column information of the data to be checked and the Y axis may represent the row information of the data to be checked. The Z axis may represent the data checking mode. By setting the Z axis, the data checking mode can be bound with the corresponding column information in the two-dimensional data structure object. If the data checking mode corresponding to a data table field to be checked includes one or more data checking functions, each data checking function can be identified by an integer according to the preset checking order of the checking functions. For example, if the "age" field corresponds to a non-null verification function, a value range verification function, and a check function associated with the identification number field, the non-null verification function may be identified as 1, the value range verification function as 2, and the check function associated with the identification number field as 3, so that the values on the Z axis may be non-null integer data.
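One possible reading of this binding is sketched below: the X and Y axes hold the data, and the Z axis holds, for each column, the integer identifiers of the check functions bound to it, sorted into the checking order. The class `DataCheckModel`, the registry `CHECK_REGISTRY` and the two check functions are hypothetical; the check associated with the identification number field (identifier 3 in the example above) is omitted because its rule is not given in the text.

```python
def not_null(value):
    return value is not None

def age_in_range(value):
    # Assumed range for illustration only.
    return isinstance(value, int) and 0 <= value <= 150

# Integer identifiers assigned according to the preset checking order.
CHECK_REGISTRY = {1: not_null, 2: age_in_range}

class DataCheckModel:
    def __init__(self, fields, preset_rows, bindings):
        self.fields = list(fields)                                  # X axis: columns
        self.rows = [[None] * len(self.fields)
                     for _ in range(preset_rows)]                   # Y axis: rows
        # Z axis: for each column, the identifiers of its bound check functions.
        self.z_axis = {f: sorted(bindings.get(f, [])) for f in self.fields}

    def checks_for(self, field):
        """Return the check functions bound to `field`, in checking order."""
        return [CHECK_REGISTRY[i] for i in self.z_axis[field]]

model = DataCheckModel(
    ["name", "age"],
    preset_rows=100,
    bindings={"name": [1], "age": [2, 1]},  # order is normalised by sorting
)
```

Because the identifiers encode the checking order, sorting them is enough to recover the sequence in which the bound functions should be called.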
S340, acquiring data to be checked in the target database.
S350, storing the data to be checked into the two-dimensional data structure object in the data checking model to obtain the object to be checked.
Specifically, the data to be checked corresponding to the field of the data table to be checked is used as column data, and is stored in a corresponding column in the two-dimensional data structure object in the data checking model, so that the object to be checked containing the data to be checked is obtained. For example, the field data corresponding to the name field and the field data corresponding to the age field in the user table in the target database may be stored in corresponding columns in the two-dimensional data structure object, so as to obtain the object to be checked.
If the number of rows of the two-dimensional data structure object is larger than the number of rows of the data to be checked, the surplus rows are handled by repeated storage of the data to be checked, by filling, or by leaving them blank. For example, if the two-dimensional data structure object has 100 rows and the data to be checked has 90 rows, the surplus 10 rows may repeat data to be checked that has already been stored, may be filled with an irrelevant value such as 0, or may be left blank, that is, default to Null. In this way the format of the object to be checked obtained by the data checking model is uniform, much branch logic can be eliminated, input and output have a uniform format, and checking efficiency is further improved.
S360, performing data checking on the data to be checked stored in the corresponding column based on the data checking mode bound to the column information in the object to be checked.
Exemplarily, S360 may include: performing data checking on the data to be checked stored in the corresponding column by calling a check function corresponding to the data checking mode bound to the column information in the object to be checked.
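The three ways of handling surplus rows can be sketched as a single padding function. The function name `pad_rows`, the strategy names `"repeat"`, `"fill"` and `"null"`, and the choice to repeat rows cyclically from the first row are assumptions for this sketch.

```python
from itertools import cycle, islice

def pad_rows(rows, preset_rows, width, strategy="null", fill_value=0):
    """Pad `rows` (a list of tuples of length `width`) up to `preset_rows` rows."""
    if len(rows) > preset_rows:
        raise ValueError("more rows than the preset number")
    missing = preset_rows - len(rows)
    if strategy == "repeat":
        if missing and not rows:
            raise ValueError("nothing to repeat")
        # Repeat already stored rows, starting again from the first one.
        padding = list(islice(cycle(rows), missing))
    elif strategy == "fill":
        # Fill with an irrelevant value such as 0.
        padding = [(fill_value,) * width for _ in range(missing)]
    elif strategy == "null":
        # Leave blank: every cell defaults to None (Null).
        padding = [(None,) * width for _ in range(missing)]
    else:
        raise ValueError(f"unknown strategy {strategy!r}")
    return list(rows) + padding

# 90 rows of data to be checked, padded to a preset number of 100 rows.
data = [(f"user{i}", i) for i in range(90)]
padded = pad_rows(data, preset_rows=100, width=2, strategy="null")
```

A caller would normally keep the original row count (90 here) so that checking results for the padded rows can be discarded afterwards; that bookkeeping is not shown.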
Specifically, the checking function corresponding to the data checking mode bound to each column information in the object to be checked can be sequentially called according to the preset checking sequence of each checking function, and data checking is performed on the data to be checked stored in the corresponding column, so that the checking result can be quickly obtained, and the checking efficiency is further improved.
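Calling the bound check functions column by column might look like the following sketch. The function `run_checks`, the two check functions and the sample data are hypothetical; the object to be checked is represented simply as a dictionary of columns.

```python
def not_null(value):
    return value is not None

def age_range(value):
    # Assumed range for illustration only.
    return isinstance(value, int) and 0 <= value <= 150

def run_checks(columns, bindings):
    """Apply each column's bound check functions, in their listed order.

    columns  -- {field: [value for each row]}
    bindings -- {field: [check functions in checking order]}
    Returns {field: {function name: [True/False for each row]}}.
    """
    results = {}
    for field, functions in bindings.items():
        values = columns[field]
        results[field] = {fn.__name__: [fn(v) for v in values] for fn in functions}
    return results

columns = {"name": ["Ann", None], "age": [34, 250]}
bindings = {"name": [not_null], "age": [not_null, age_range]}
results = run_checks(columns, bindings)
```

All checks run against data already held in memory, so no further reads of the target database are needed at this step.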
S370, determining a checking result corresponding to the data to be checked based on the output of the data checking model.
Specifically, if the to-be-checked object in the data checking model stores batch data to be checked, the obtained batch checking results can be serialized, and the checking result corresponding to each piece of data to be checked is obtained, for example, non-empty verification is satisfied, value range verification is satisfied, and the like.
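Turning batch (column-wise) results into one result per piece of data could be sketched as follows. The function `per_record_results`, the `field.check` key format and the use of JSON as the serialized form are assumptions; the text does not state how the results are serialized.

```python
import json

def per_record_results(batch, n_rows):
    """Convert {field: {check name: [flag per row]}} into one dict per row."""
    records = []
    for i in range(n_rows):
        record = {
            f"{field}.{check}": flags[i]
            for field, checks in batch.items()
            for check, flags in checks.items()
        }
        records.append(record)
    return records

# Batch results for two rows (invented sample values).
batch = {
    "name": {"not_null": [True, False]},
    "age": {"not_null": [True, True], "in_range": [True, False]},
}
records = per_record_results(batch, n_rows=2)
serialized = json.dumps(records)
```

In this sample the second record fails the non-null check on `name` and the range check on `age`, while the first record passes every check.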
It should be noted that, in this embodiment, only when the database table data in the target database is checked for the first time, the steps S310 to S330 are executed, the corresponding data checking model is dynamically created based on the checking configuration information, and during the subsequent checking, the data checking can be directly performed based on the created data checking model without re-creating the data checking model. If the checking configuration information corresponding to the target database is changed, a corresponding data checking model can be created again based on the changed checking configuration information, so that dynamic flexible configuration of data checking is realized, and the accuracy of data checking is ensured.
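The create-once, recreate-on-change behaviour described above can be sketched with a small cache keyed by database and guarded by a fingerprint of the configuration. The names `get_model`, `build_model` and `config_fingerprint`, and the use of a SHA-256 hash of the JSON-serialized configuration to detect changes, are assumptions for illustration.

```python
import hashlib
import json

_model_cache = {}  # database name -> (configuration fingerprint, model)

def config_fingerprint(config):
    """Stable hash of the checking configuration (must be JSON-serializable)."""
    text = json.dumps(config, sort_keys=True)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def build_model(config):
    """Stand-in for steps S310 to S330; returns a placeholder model object."""
    return {"columns": list(config["fields"]), "bindings": dict(config["fields"])}

def get_model(database, config):
    """Reuse the cached model unless the checking configuration has changed."""
    fingerprint = config_fingerprint(config)
    cached = _model_cache.get(database)
    if cached is None or cached[0] != fingerprint:
        _model_cache[database] = (fingerprint, build_model(config))
    return _model_cache[database][1]

config_v1 = {"fields": {"age": ["not_null"]}}
config_v2 = {"fields": {"age": ["not_null", "range"]}}

first = get_model("target_db", config_v1)
second = get_model("target_db", config_v1)   # same configuration: model is reused
third = get_model("target_db", config_v2)    # changed configuration: model is rebuilt
```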
According to the technical scheme of the embodiment, the two-dimensional data structure object is created by taking the to-be-checked data table field in the acquired checking configuration information as column information, and the column information in the two-dimensional data structure object is bound with the data checking mode corresponding to the to-be-checked data table field in the checking configuration information, so that the data checking model is created, the dynamic creation of the data checking model can be further realized, the individualized requirement of data checking is met, and the accuracy of data checking is ensured.
The following is an embodiment of the model-based data quality checking apparatus provided in the embodiments of the present invention, and the apparatus and the model-based data quality checking method in the embodiments belong to the same inventive concept, and details that are not described in detail in the embodiments of the model-based data quality checking apparatus may refer to the embodiments of the model-based data quality checking method.
Example three
Fig. 3 is a schematic structural diagram of a model-based data quality checking apparatus according to a third embodiment of the present invention, which is applicable to quality checking of database table data stored in a database. As shown in fig. 3, the apparatus includes: a to-be-checked data acquisition module 710, a to-be-checked data input module 720 and a checking result determination module 730.
The to-be-checked data acquisition module 710 is configured to acquire data to be checked in a target database; the to-be-checked data input module 720 is configured to input the data to be checked into a data checking model for data checking, where the data checking model is created in advance based on checking configuration information, and the checking configuration information is obtained by configuring based on metadata corresponding to the target database; and the checking result determination module 730 is configured to determine a checking result corresponding to the data to be checked based on the output of the data checking model.
Optionally, the checking configuration information includes a data table field to be checked in the target database and a data checking mode corresponding to the data table field;
the to-be-checked data acquisition module 710 is specifically configured to: acquire data to be checked corresponding to the data table field to be checked in the target database based on the data table field to be checked in the checking configuration information;
the data checking model is used for carrying out data checking on corresponding field data in the data to be checked based on a data checking mode corresponding to the field of the data table to be checked.
Optionally, the apparatus further comprises: a data verification model creation module comprising:
the checking configuration information acquisition unit is used for acquiring the checking configuration information;
the two-dimensional data structure object creating unit is used for creating a two-dimensional data structure object by taking a data table field to be checked in the checking configuration information as column information;
and the data checking mode binding unit is used for binding the column information in the two-dimensional data structure object with the data checking mode corresponding to the corresponding to-be-checked data table field in the checking configuration information to create a data checking model.
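One possible reading of the three units above is sketched below in plain Python. The class `CheckModel`, the function `create_check_model` and the configuration layout are hypothetical names introduced for illustration; the embodiment does not prescribe a particular data structure library.

```python
from dataclasses import dataclass, field


@dataclass
class CheckModel:
    """A two-dimensional structure whose columns are bound to checking modes."""
    columns: list                              # column information (fields to check)
    modes: dict                                # column name -> data checking mode
    rows: list = field(default_factory=list)   # two-dimensional body, one list per row


def create_check_model(field_modes):
    """Create the model from configuration given as {field name: mode name}."""
    columns = list(field_modes)                       # fields become column information
    modes = {col: field_modes[col] for col in columns}  # bind each column to its mode
    return CheckModel(columns=columns, modes=modes)
```

The model holds no data at creation time; data to be checked is stored into `rows` later.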
Optionally, the two-dimensional data structure object creating unit is specifically configured to: creating a two-dimensional data structure object with data association by a row index and a column index; the column index is a to-be-checked data table field in the checking configuration information.
Optionally, the data checking mode binding unit is specifically configured to: raise the dimension of the two-dimensional data structure object so that it becomes a three-dimensional structure, in which two dimensions represent the data to be checked and the remaining dimension represents the data checking mode; and bind, through the three-dimensional structure, the column information in the two-dimensional data structure object with the data checking mode corresponding to the corresponding to-be-checked data table field in the checking configuration information, thereby creating the data checking model.
Optionally, the number of rows of the two-dimensional data structure object is a preset number; the number of columns of the two-dimensional data structure object is the number of to-be-checked data table fields configured in the checking configuration information.
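The dimension raising and the preset shape can be illustrated with nested lists. This is one possible interpretation, stated as an assumption: the third axis here holds the pair `[value, mode]` for every cell, and the helper names `raise_dimension` and `shape` are hypothetical.

```python
def raise_dimension(columns, modes, preset_rows):
    """Return a rows x columns x 2 nested list; the last axis is [value, mode]."""
    return [
        [[None, modes[col]] for col in columns]   # one [value, mode] cell per column
        for _ in range(preset_rows)               # preset number of rows
    ]


def shape(cube):
    """Report the size of each of the three axes."""
    return (len(cube), len(cube[0]), len(cube[0][0]))
```

With two configured fields and a preset row count of three, `shape` reports `(3, 2, 2)`: the first two axes carry the data to be checked and the last axis carries the bound checking mode.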
Optionally, the to-be-checked data input module 720 includes:
the to-be-checked object obtaining unit is used for storing the data to be checked into the two-dimensional data structure object to obtain an object to be checked; if the number of rows of the two-dimensional data structure object is larger than the number of rows of the data to be checked, the surplus rows are filled by repeatedly storing the data to be checked, are padded, or are left empty;
and the data checking unit is used for performing data checking on the data to be checked stored in the corresponding column based on the data checking mode bound by the column information in the object to be checked.
Optionally, the data checking unit is specifically configured to: and performing data check on the data to be checked stored in the corresponding column by calling a check function corresponding to the data check mode bound by the column information in the object to be checked.
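Storing the data and calling the bound check functions might look like the sketch below. The registry `CHECK_FUNCTIONS`, the mode names and both helper functions are hypothetical; of the three options for surplus rows, this sketch pads them with a fill value and skips them during checking.

```python
# Hypothetical registry: data checking mode name -> check function.
CHECK_FUNCTIONS = {
    "not_null": lambda v: v is not None,
    "range_0_150": lambda v: isinstance(v, int) and 0 <= v <= 150,
}


def load_rows(data, preset_rows, n_columns, fill=None):
    """Store data into a structure of preset_rows rows, padding surplus rows."""
    if len(data) > preset_rows:
        raise ValueError("more data rows than the preset row count")
    padded = [list(row) for row in data]
    padded += [[fill] * n_columns for _ in range(preset_rows - len(data))]
    return padded


def run_checks(columns, modes, rows, n_real):
    """Check each column with its bound function; padded rows are skipped."""
    failures = []
    for col_idx, col in enumerate(columns):
        check = CHECK_FUNCTIONS[modes[col]]       # function bound to this column
        for row_idx in range(n_real):
            value = rows[row_idx][col_idx]
            if not check(value):
                failures.append((row_idx, col, value))
    return failures
```

Each failure is reported as a `(row index, field, value)` tuple, so the result can be traced back to the offending record.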
Optionally, the to-be-checked data acquisition module 710 is specifically configured to:
acquire data to be checked that is newly added to the target database at the current moment; or,
acquire data to be checked generated in the target database within a preset time period; or,
acquire data to be checked in the target database that meets a preset data quantity; or,
acquire, according to the last checking result and/or the server occupancy, data to be checked generated in the target database in the time period between two checks.
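Two of the acquisition options above, the preset time period and the preset data quantity, can be sketched as parameterized queries. The table `record`, its columns and the helper names are hypothetical, and an in-memory SQLite database again stands in for the target database; timestamps are stored as ISO-style strings so that string comparison matches chronological order.

```python
import sqlite3


def fetch_time_window(conn, start, end):
    """Data generated in the half-open time period [start, end)."""
    return conn.execute(
        "SELECT id, value FROM record "
        "WHERE created_at >= ? AND created_at < ? ORDER BY id",
        (start, end),
    ).fetchall()


def fetch_batch(conn, last_id, batch_size):
    """At most batch_size rows added after the last checked id."""
    return conn.execute(
        "SELECT id, value FROM record WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, batch_size),
    ).fetchall()


def demo_conn():
    """Build a small stand-in target database with three records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE record (id INTEGER PRIMARY KEY, value TEXT, created_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO record VALUES (?, ?, ?)",
        [
            (1, "a", "2021-12-24 08:00:00"),
            (2, "b", "2021-12-24 09:00:00"),
            (3, "c", "2021-12-24 10:00:00"),
        ],
    )
    return conn
```

Remembering the last checked `id` between runs lets each check read only the data added since the previous one.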
Optionally, the to-be-checked data input module 720 is specifically configured to: creating a data checking model in the memory based on the checking configuration information; and reading the data to be checked into the data checking model in the memory for data checking.
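A minimal end-to-end sketch of in-memory checking is given below, under the assumption that the database is only read once and all checks then run in process memory. The function `check_in_memory`, the registry `CHECKS` and the shape of the returned summary are hypothetical.

```python
import sqlite3

# Hypothetical registry: data checking mode name -> check function.
CHECKS = {"not_null": lambda v: v is not None}


def check_in_memory(conn, table, field_modes):
    """Read the configured fields once, then check them without further queries."""
    columns = list(field_modes)
    # Single read from the database; identifiers are assumed to be trusted.
    rows = conn.execute(
        f"SELECT {', '.join(columns)} FROM {table} ORDER BY rowid"
    ).fetchall()
    failed = {col: 0 for col in columns}
    for row in rows:                                # all checking happens in memory
        for col, value in zip(columns, row):
            if not CHECKS[field_modes[col]](value):
                failed[col] += 1
    return {"rows_checked": len(rows), "failed_per_field": failed}
```

Because the checks are ordinary function calls on data already in memory, the database serves one read query per check run in this sketch.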
The model-based data quality checking device provided by the embodiment of the invention can execute the model-based data quality checking method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of executing the model-based data quality checking method.
It should be noted that, in the above embodiment of the model-based data quality checking apparatus, the included units and modules are divided only according to functional logic; the division is not limited to the one described, as long as the corresponding functions can be implemented. In addition, the specific names of the functional units are only for convenience of distinguishing them from each other, and are not used to limit the protection scope of the present invention.
Example four
Fig. 4 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention. Referring to fig. 4, the electronic device includes:
one or more processors 810;
a memory 820 for storing one or more programs;
when the one or more programs are executed by the one or more processors 810, the one or more processors 810 are caused to implement the model-based data quality checking method provided in any of the embodiments above, the method comprising:
acquiring data to be checked in a target database;
inputting data to be checked into a data checking model for data checking, wherein the data checking model is created in advance based on checking configuration information, and the checking configuration information is obtained by configuring based on metadata corresponding to a target database;
and determining a checking result corresponding to the data to be checked based on the output of the data checking model.
Fig. 4 takes one processor 810 as an example; the processor 810 and the memory 820 in the electronic device may be connected by a bus or by other means, and connection by a bus is taken as an example in Fig. 4.
The memory 820 is used as a computer readable storage medium and can be used for storing software programs, computer executable programs, and modules, such as program instructions/modules corresponding to the model-based data quality checking method in the embodiment of the present invention (for example, the to-be-checked data acquisition module 710, the to-be-checked data input module 720, and the checking result determination module 730 in the model-based data quality checking apparatus). The processor 810 executes various functional applications and data processing of the electronic device by executing software programs, instructions and modules stored in the memory 820, that is, implements the above-described model-based data quality checking method.
The memory 820 mainly includes a program storage area and a data storage area, wherein the program storage area can store an operating system and an application program required by at least one function; the data storage area may store data created according to use of the electronic device, and the like. Further, the memory 820 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the memory 820 may further include memory located remotely from the processor 810, which may be connected to the electronic device through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device proposed by the present embodiment belongs to the same inventive concept as the model-based data quality checking method proposed by the above embodiment, and the technical details that are not described in detail in the present embodiment can be referred to the above embodiment, and the present embodiment has the same beneficial effects as the execution of the model-based data quality checking method.
Example five
The present embodiment provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements a model-based data quality checking method according to any of the embodiments of the present invention, the method including:
acquiring data to be checked in a target database;
inputting data to be checked into a data checking model for data checking, wherein the data checking model is created in advance based on checking configuration information, and the checking configuration information is obtained by configuring based on metadata corresponding to a target database;
and determining a checking result corresponding to the data to be checked based on the output of the data checking model.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer-readable storage medium may be, for example but not limited to: an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It will be understood by those skilled in the art that the modules or steps of the invention described above may be implemented by a general-purpose computing device; they may be centralized on a single computing device or distributed across a network of computing devices. Optionally, they may be implemented by program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; or they may each be fabricated as a separate integrated circuit module, or several of the modules or steps may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (11)

1. A data quality checking method based on a model is characterized by comprising the following steps:
acquiring data to be checked in a target database;
inputting the data to be checked into a data checking model for data checking, wherein the data checking model is created in advance based on checking configuration information, and the checking configuration information is obtained by configuring based on metadata corresponding to the target database;
and determining a checking result corresponding to the data to be checked based on the output of the data checking model.
2. The method according to claim 1, wherein the checking configuration information includes a data table field to be checked in the target database and a data checking manner corresponding to the data table field;
the acquiring of the data to be checked in the target database includes: acquiring data to be checked corresponding to the data table field to be checked in the target database based on the data table field to be checked in the checking configuration information;
and the data checking model is used for carrying out data checking on corresponding field data in the data to be checked based on a data checking mode corresponding to the field of the data table to be checked.
3. The method of claim 2, wherein the creating the data verification model based on verification configuration information comprises:
acquiring the checking configuration information;
taking the data table field to be checked in the checking configuration information as column information, and creating a two-dimensional data structure object;
and binding the column information in the two-dimensional data structure object with a data checking mode corresponding to the corresponding to-be-checked data table field in the checking configuration information, and creating the data checking model.
4. The method according to claim 3, wherein the creating a two-dimensional data structure object by using the to-be-checked data table field in the checking configuration information as column information comprises:
creating a two-dimensional data structure object with data association by a row index and a column index; and the column index is a to-be-checked data table field in the checking configuration information.
5. The method according to claim 3, wherein the binding of the column information in the two-dimensional data structure object with the data verification manner corresponding to the corresponding to-be-verified data table field in the verification configuration information to create the data verification model comprises:
performing dimension raising on the two-dimensional data structure object to change the two-dimensional data structure object into a three-dimensional structure; and binding column information in the two-dimensional data structure object with a data checking mode corresponding to a corresponding data table field to be checked in the checking configuration information through the three-dimensional structure to create the data checking model.
6. The method of claim 3, wherein the number of rows of the two-dimensional data structure object is a preset number; the column number of the two-dimensional data structure object is the field number corresponding to the field of the data table to be checked configured in the checking configuration information.
7. The method of claim 3, wherein the inputting the data to be checked into a data checking model for data checking comprises:
storing the data to be checked into the two-dimensional data structure object to obtain the object to be checked; if the number of rows of the two-dimensional data structure object is larger than the number of rows of the data to be checked, the surplus rows are filled by repeatedly storing the data to be checked, are padded, or are left empty;
and performing data check on the data to be checked stored in the corresponding column based on the data check mode bound by the column information in the object to be checked.
8. The method according to claim 7, wherein the data checking the to-be-checked data stored in the corresponding column based on the data checking manner bound to the column information in the to-be-checked object includes:
and performing data check on the data to be checked stored in the corresponding column by calling a check function corresponding to the data check mode bound by the column information in the object to be checked.
9. The method of claim 1, wherein the obtaining of the data to be checked in the target database comprises:
acquiring data to be checked that is newly added to the target database at the current moment; or,
acquiring data to be checked generated in the target database within a preset time period; or,
acquiring data to be checked in the target database that meets a preset data quantity; or,
acquiring, according to the last checking result and/or the server occupancy, data to be checked generated in the target database in the time period between two checks.
10. The method according to any one of claims 1 to 9, wherein the inputting the data to be checked into a data checking model for data checking comprises:
creating the data checking model in a memory based on the checking configuration information;
and reading the data to be checked into the data checking model in the memory for data checking.
11. A model-based data quality verification apparatus, comprising:
the data to be checked acquisition module is used for acquiring data to be checked in the target database;
the data to be checked input module is used for inputting the data to be checked into a data checking model for data checking, wherein the data checking model is created in advance based on checking configuration information, and the checking configuration information is obtained by configuring based on metadata corresponding to the target database;
and the checking result determining module is used for determining the checking result corresponding to the data to be checked based on the output of the data checking model.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111600387.0A CN114328486A (en) 2021-12-24 2021-12-24 Data quality checking method and device based on model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111600387.0A CN114328486A (en) 2021-12-24 2021-12-24 Data quality checking method and device based on model

Publications (1)

Publication Number Publication Date
CN114328486A true CN114328486A (en) 2022-04-12

Family

ID=81012968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111600387.0A Pending CN114328486A (en) 2021-12-24 2021-12-24 Data quality checking method and device based on model

Country Status (1)

Country Link
CN (1) CN114328486A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115841302A (en) * 2022-11-15 2023-03-24 四川智慧高速科技有限公司 Data checking method, electronic device and readable medium
CN115841302B (en) * 2022-11-15 2023-11-21 四川智慧高速科技有限公司 Data checking method, electronic device and readable medium

Similar Documents

Publication Publication Date Title
CN112800095B (en) Data processing method, device, equipment and storage medium
CN112307122B (en) Data lake-based data management system and method
CN107870949A (en) Data analysis job dependence relation generation method and system
CN107480268A (en) Data query method and device
CN115905630A (en) Graph database query method, device, equipment and storage medium
CN103064991A (en) Mass data clustering method
CN108363741A (en) Big data unified interface method, apparatus, equipment and storage medium
CN103678591A (en) Device and method for automatically executing multi-service receipt statistical treatment
CN114328486A (en) Data quality checking method and device based on model
CN107977504A (en) A kind of asymmetric in-core fuel management computational methods, device and terminal device
CN114443680A (en) Database management system, related apparatus, method and medium
CN112346775B (en) Index data general processing method, electronic device and storage medium
CN113190576A (en) Data processing method and device, computer equipment and readable storage medium
AU2019241002B2 (en) Transaction processing method and system, and server
CN107844490A (en) A kind of database divides storehouse method and device
CN111767406A (en) Knowledge representation method and device for PLC engineering
CN114968917A (en) Method and device for rapidly importing file data
CN109284268A (en) A kind of method, system and the electronic equipment of fast resolving log
CN114595215A (en) Data processing method and device, electronic equipment and storage medium
CN107995301B (en) Rapid data receiving and transmitting method based on Internet
CN112650777A (en) Data warehouse manufacturing method and device, terminal equipment and computer storage medium
CN117252180B (en) Report generation method and device, electronic equipment and storage medium
CN111324434B (en) Configuration method, device and execution system of computing task
CN109670601B (en) Machine learning feature generation method and device, electronic device and storage medium
CN116561077A (en) dbf file importing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination