CN111651439A - Data auditing method and device and computer readable storage medium - Google Patents

Publication number
CN111651439A
Authority
CN
China
Prior art keywords
data
rule
auditing
audited
audit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010353392.5A
Other languages
Chinese (zh)
Inventor
任世民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Property and Casualty Insurance Company of China Ltd
Priority to CN202010353392.5A
Publication of CN111651439A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Abstract

The invention relates to big data and discloses a data auditing method, which comprises the following steps: acquiring data to be audited, performing data cleaning operation on the data to be audited to obtain standard data, and storing the standard data in an SQL data table to obtain a data table to be audited; summarizing one or more preselected data auditing rules to obtain a rule set, converting each data auditing rule in the rule set into an SQL rule script, and obtaining a rule dimensional table according to the data auditing rules and the corresponding SQL rule script; auditing the data table to be audited by using a preset scheduling script according to the rule dimensional table to obtain an auditing score of the data table to be audited; and performing alarm operation on the data table to be audited according to the audit score. The invention also relates to a block chain technology, and the data to be audited is stored in the block chain. The invention can improve the efficiency of data auditing.

Description

Data auditing method and device and computer readable storage medium
Technical Field
The present invention relates to big data processing, and in particular, to a method and an apparatus for data auditing, an electronic device, and a computer-readable storage medium.
Background
With the advent of the big data era, the amount of data stored in database systems and the amount of data fed into each system keep growing, and data must be audited before it is used. Meanwhile, as the industry's requirements for data quality gradually rise, data auditing is becoming more and more important. In the prior art, the rule script must be rewritten for every data audit, so data auditing is inefficient.
Disclosure of Invention
The invention provides a data auditing method, a data auditing device, electronic equipment and a computer readable storage medium, and mainly aims to improve the data auditing efficiency.
In order to achieve the above object, the present invention provides a data auditing method, which includes:
acquiring data to be audited, performing data cleaning operation on the data to be audited to obtain standard data, and storing the standard data in an SQL data table to obtain a data table to be audited;
summarizing one or more preselected data auditing rules to obtain a rule set, converting each data auditing rule in the rule set into an SQL rule script, and obtaining a rule dimensional table according to the data auditing rules and the corresponding SQL rule script;
auditing the data table to be audited by using a preset scheduling script according to the rule dimensional table to obtain an auditing score of the data table to be audited;
and performing alarm operation on the data table to be audited according to the audit score.
Optionally, the data to be audited is stored in a block chain, and the acquiring of the data to be audited and performing of the data cleaning operation on the data to be audited to obtain standard data includes:
carrying out abnormal data deletion processing, missing value filling processing and normalization processing on the data to be audited to obtain the standard data.
Optionally, the scheduling script includes a request object, a rule object and a response object, where the request object is used to receive a table name of a data table to be audited, which is input by a user, as a target table name, the rule object is used to audit the data table to be audited according to the target table name by using the rule dimensional table to obtain an audit score, and the response object is used to return the audit score to the user.
Optionally, the audit processing includes:
through traversal operation, each data auditing rule in the rule dimensional table is obtained, and according to the obtained data auditing rule, the data table to be audited is audited by using a corresponding SQL rule script to obtain an auditing result of each data auditing rule;
multiplying the audit result of each data audit rule by the preset weight of the data audit rule to obtain the weight fraction of each data audit rule; and
and obtaining the auditing score of the data table to be audited according to the weight score.
Optionally, the auditing the to-be-audited data table to obtain the auditing result of each data auditing rule includes:
if the data table to be audited has data meeting the data auditing rule, the auditing result of the data auditing rule is set as a first preset value;
if the data to be audited does not have the data meeting the data auditing rule, the auditing result of the data auditing rule is set as a second preset value.
In order to solve the above problem, the present invention further provides a data auditing apparatus, including:
the data cleaning module is used for acquiring data to be audited, performing data cleaning operation on the data to be audited to obtain standard data, and storing the standard data in an SQL data table to obtain a data table to be audited;
the rule dimensional table creating module is used for summarizing one or more preselected data auditing rules to obtain a rule set, converting each data auditing rule in the rule set into an SQL rule script, and obtaining a rule dimensional table according to the data auditing rules and the corresponding SQL rule script;
the script auditing module is used for auditing the data table to be audited by using a preset scheduling script according to the rule dimensional table to obtain the audit score of the data table to be audited;
and the auditing alarm module is used for carrying out alarm operation on the data table to be audited according to the audit score.
Optionally, the data to be audited is stored in a block chain, and the acquiring of the data to be audited and performing of the data cleaning operation on the data to be audited to obtain standard data includes:
carrying out abnormal data deletion processing, missing value filling processing and normalization processing on the data to be audited to obtain the standard data.
Optionally, the auditing the to-be-audited data table by using a preset scheduling script according to the rule dimension table to obtain the auditing score of the to-be-audited data table includes:
through traversal operation, each data auditing rule in the rule dimensional table is obtained, and according to the obtained data auditing rule, the data table to be audited is audited by using a corresponding SQL rule script to obtain an auditing result of each data auditing rule;
multiplying the audit result of each data audit rule by the preset weight of the data audit rule to obtain the weight fraction of each data audit rule; and
and obtaining the auditing score of the data table to be audited according to the weight score.
In order to solve the above problem, the present invention also provides an electronic device, including:
a memory storing at least one instruction; and
a processor that executes the instructions stored in the memory to implement the data auditing method.
In order to solve the above problem, the present invention further provides a computer-readable storage medium, which stores at least one instruction, where the at least one instruction is executed by a processor in an electronic device to implement the data auditing method.
The method summarizes one or more preselected data auditing rules to obtain a rule set, converts each data auditing rule in the rule set into an SQL rule script, and obtains a rule dimensional table according to the data auditing rules and the corresponding SQL rule scripts; it then audits the data table to be audited by using a preset scheduling script according to the rule dimensional table. Because the various rule scripts are solidified in the rule dimensional table, they do not need to be rewritten for each data audit; and because a preset scheduling script performs the audit, the rule scripts do not need to be rewritten for different data tables to be audited. The data auditing efficiency is thereby improved.
Drawings
FIG. 1 is a flowchart illustrating a data auditing method according to an embodiment of the present invention;
FIG. 2 is a diagram of a rule dimension table according to an embodiment of the present invention;
FIG. 3 is a block diagram of a data auditing apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating an internal structure of an electronic device implementing a data auditing method according to an embodiment of the present invention;
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides a data auditing method. Fig. 1 is a schematic flow chart illustrating a data auditing method according to an embodiment of the present invention. The method may be performed by an apparatus, which may be implemented by software and/or hardware.
In this embodiment, the data auditing method includes:
S1, obtaining data to be audited, performing data cleaning operation on the data to be audited to obtain standard data, and storing the standard data in an SQL data table to obtain a data table to be audited.
In detail, the data cleaning operation includes: carrying out abnormal data deletion processing, missing value filling processing and normalization processing on the data to be audited. It should be emphasized that, in order to further ensure the privacy and security of the data to be audited, the data to be audited may also be stored in a node of a block chain.
The abnormal value refers to an unreasonable value in the data to be audited. For example, the data interval of age data is generally [0, 120]; if an age value in the data to be audited is negative, that value is unreasonable. Embodiments of the invention may identify abnormal values in the data to be audited using an abnormal value judging method such as the 3σ principle judging method, a cluster judging method, or a statistical judging method, and then delete the abnormal values from the data to be audited.
In detail, the invention can adopt the following method to delete the abnormal values in the data to be audited: calculating the standard deviation and the average value of the data to be audited; and deleting the data whose distance from the average value exceeds a preset multiple, such as 3 times, of the standard deviation.
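The deletion step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `drop_outliers`, the use of the population standard deviation, and the sample ages are not from the patent.

```python
from statistics import mean, pstdev


def drop_outliers(values, k=3.0):
    """Keep only the values within k standard deviations of the mean.

    Assumption: the population standard deviation is used; the patent
    does not say which estimator is intended.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so there is nothing to delete.
        return list(values)
    return [v for v in values if abs(v - mu) <= k * sigma]


# Hypothetical age column with one unreasonable entry.
ages = [30, 31, 29, 32, 28, 30, 31, 29, 30, 33, 27, 30, 500]
cleaned = drop_outliers(ages, k=3.0)
```

Note that with very few samples a single extreme value may never exceed three standard deviations, so the preset multiple may need tuning for small tables.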
Furthermore, the missing values in the data to be audited can be filled using a fixed value, the median, the mean, or the mode. For example, when the missing values of a numerical data set A are filled with a fixed value, the default fixed value is 0, and the fillna function of the pandas module of Python can be used to fill the missing values of A with 0. The code is as follows:
A.fillna(0, inplace=True)
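The four filling modes named above (fixed value, median, mean, mode) can be compared side by side. The following sketch uses only the Python standard library rather than pandas so that it runs without extra packages; the function `fill_missing` and its `strategy` parameter are hypothetical names, and missing values are assumed to be represented as `None`.

```python
from statistics import mean, median, mode


def fill_missing(values, strategy="fixed", fixed_value=0):
    """Replace None entries using one of the four filling modes."""
    present = [v for v in values if v is not None]
    if strategy == "fixed":
        fill = fixed_value
    elif strategy == "median":
        fill = median(present)
    elif strategy == "mean":
        fill = mean(present)
    elif strategy == "mode":
        fill = mode(present)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


column = [1, None, 3, 3]
filled_fixed = fill_missing(column)              # default fixed value 0
filled_median = fill_missing(column, "median")
filled_mode = fill_missing(column, "mode")
```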
furthermore, in order to accelerate the data auditing speed, the embodiment of the invention performs normalization processing on the data to be audited. The normalization process is to scale the data to fall within a small specific interval. The embodiment of the present invention may use a normalization method including (0,1) normalization, Z-score normalization, and Sigmoid function to normalize the filling data to obtain the standard data, for example: when the z-score standardization method is used, the embodiment of the invention utilizes the following calculation formula to carry out normalization processing on the data to be audited, and obtains the standard data:
$$y = \frac{x_1 - \mu_1}{\sigma_1}$$
where $x_1$ is the data to be audited, $\mu_1$ is the average value of the data to be audited, $\sigma_1$ is the standard deviation of the data to be audited, and $y$ is the standard data.
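The three normalization methods mentioned above can be sketched as follows. The function names are illustrative, the population standard deviation is an assumption, and the (0,1) and Z-score variants assume that the input values are not all identical.

```python
import math
from statistics import mean, pstdev


def min_max(values):
    """(0,1) normalization: rescale values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Z-score normalization: subtract the mean, divide by the standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def sigmoid(values):
    """Sigmoid normalization: squash each value into the open interval (0, 1)."""
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


sample = [2, 4, 4, 4, 5, 5, 7, 9]   # mean 5, population standard deviation 2
standardized = z_score(sample)
```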
Further, in order to process the standard data by using the SQL rule script, the embodiment of the present invention stores the standard data in the SQL data table to obtain the data table to be audited.
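As a rough illustration of this storage step, the sketch below writes cleaned rows into an SQL table using Python's built-in `sqlite3` module. The patent does not name a database engine, so SQLite, the column layout (`date`, `A`, `B`, `C`), and the function name `store_standard_data` are all assumptions made for the example.

```python
import sqlite3


def store_standard_data(conn, table_name, rows):
    """Create the data table to be audited (if needed) and insert the rows."""
    # Table names cannot be bound as SQL parameters, so validate before
    # interpolating the name into the statement.
    if not table_name.isidentifier():
        raise ValueError(f"unsafe table name: {table_name!r}")
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{table_name}" '
        "(date TEXT, A REAL, B REAL, C REAL)"
    )
    conn.executemany(f'INSERT INTO "{table_name}" VALUES (?, ?, ?, ?)', rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
store_standard_data(
    conn,
    "xxx",
    [("today", 1.0, 2.0, 3.0), ("yesterday", 0.5, None, 1.0)],
)
row_count = conn.execute('SELECT COUNT(*) FROM "xxx"').fetchone()[0]
```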
S2, summarizing one or more preselected data auditing rules to obtain a rule set, converting each data auditing rule in the rule set into an SQL rule script, and obtaining a rule dimensional table according to the data auditing rules and the corresponding SQL rule script.
The embodiment of the invention selects one or more data auditing rules in advance for auditing the data in the data table to be audited.
Preferably, the data auditing rules include, for example:
rule 1: the absolute value of the data amount variation is less than a first preset value, such as 5%;
rule 2: the period-over-period ("ring") ratio of the sum of field A is smaller than a second preset value, such as 5%;
rule 3: field B is greater than a third preset value, such as 0;
rule 4: field C is not null; and the like.
For example: when the data table to be audited contains the yield data of the product a and the product b in the last 100 days, rule 1 indicates that the yield data of the product a or the product b is between 95 and 105; rule 2 indicates that (sum of the current day's production of product a and product b - sum of the previous day's production of product a and product b) / (sum of the previous day's production of product a and product b) × 100% < 5%; rule 3 indicates that the production data for product a or product b on any day is greater than zero; rule 4 indicates that the production data for product a or product b on any day cannot be empty.
Further, in the embodiment of the present invention, the selected data audit rules are summarized as a rule set f:
$$f = [a_1(p_1), a_2(p_2), \ldots, a_j(p_j)]$$
where $p_j$ is data in the data table to be audited and $a_j$ represents the different data auditing rules.
Furthermore, in the embodiment of the present invention, each data auditing rule in the rule set f is converted into an SQL rule script.
In detail, in the embodiment of the present invention, the SQL rule script obtained by converting rule 1 is: select (calculation of the change in data volume) from xxx where date in ('today', 'yesterday'); the SQL rule script of rule 2 is: select sum(A) from xxx where date = 'today' and stringsum(A) < 5%; the SQL rule script of rule 3 is: select count(B) from xxx where date = 'today' and B = 0; the SQL rule script of rule 4 is: select count(C) from xxx where date = 'today' and C is null.
Preferably, another embodiment of the present invention uses Reverse Polish Notation (RPN) to perform a post-order traversal of the data auditing rules contained in the rule set, so that the sequence of data auditing rules can be conveniently represented as an RPN array and converted into SQL rule scripts more efficiently. For example, the script defining rule 2 is: stringsum(X) < 5%, select sum(X) from xxx where date = 'today'.
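To make the RPN idea concrete, the following sketch turns a small expression tree for the condition of rule 2 into an RPN token array by post-order traversal (operands first, operator last). The nested-tuple tree encoding and the function name `to_rpn` are assumptions for illustration; the patent does not describe how the rules are represented before traversal.

```python
def to_rpn(node):
    """Post-order traversal of an expression tree into an RPN token list.

    A leaf is a plain string; an inner node is a tuple whose first item
    is the operator or function name and whose remaining items are its
    operands.
    """
    if isinstance(node, str):
        return [node]
    op, *children = node
    tokens = []
    for child in children:
        tokens.extend(to_rpn(child))
    tokens.append(op)
    return tokens


# Condition of rule 2: stringsum(X) < 5%
rule_2 = ("<", ("stringsum", "X"), "5%")
rpn_tokens = to_rpn(rule_2)
```

Because every operator follows its operands, the resulting array can be evaluated or translated with a simple stack and needs no parentheses.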
Furthermore, the embodiment of the invention presets the weight of the SQL rule script of each data auditing rule. For example: rule 1: the absolute value of the change of the data quantity is less than 5%, and the preset weight is 60; rule 2: the summary ring ratio of the field A is less than 5%, and the preset weight is 60; rule 3: the field B is larger than 0, and the preset weight is 10; rule 4: the field C is not empty, and the preset weight is 10. Further, the rule dimensional table is obtained according to the data auditing rules, the corresponding SQL rule scripts and the weights.
In detail, see the rule dimension table shown in FIG. 2. In the embodiment of the present invention, the function of each information column included in the rule dimension table is as follows:
The rule code column is used for naming and distinguishing different rules. For example: rule 1, rule 2, etc.
The target table name column is used for describing the name of the data table to be audited. For example: when auditing the data table xxx, the table name xxx is filled into the target table name column.
The rule description column is used for describing the specific contents of different rules. For example: the rule description corresponding to rule 1 is that the absolute value of the data amount variation is less than 5%.
The rule script column is used for describing the rule scripts corresponding to different rules.
The weight column is used for describing the weights corresponding to different rules. For example: rule 1 corresponds to a weight of 60.
In the embodiment of the invention, the rule dimensional table summarizes the data audit rules and the rule scripts, so that the data audit rules are prevented from being rewritten every time, and the audit efficiency is improved.
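One possible layout for such a rule dimensional table is sketched below with Python's built-in `sqlite3` module. The table name `rule_dim`, the column names, and the choice of SQLite are assumptions; the five columns simply mirror the rule code, target table name, rule description, rule script, and weight columns described above, and the stored scripts are cleaned-up transcriptions of the rule 3 and rule 4 scripts.

```python
import sqlite3

# Hypothetical schema mirroring the five columns of the rule dimensional table.
RULE_DIM_DDL = """
CREATE TABLE rule_dim (
    rule_code    TEXT PRIMARY KEY,  -- e.g. 'rule 1'
    target_table TEXT NOT NULL,     -- name of the data table to be audited
    rule_desc    TEXT NOT NULL,     -- human-readable rule description
    rule_script  TEXT NOT NULL,     -- SQL rule script stored as text
    weight       REAL NOT NULL      -- preset weight of the rule
)
"""

RULES = [
    ("rule 3", "xxx", "field B is greater than 0",
     "select count(B) from xxx where date = 'today' and B = 0", 10),
    ("rule 4", "xxx", "field C is not null",
     "select count(C) from xxx where date = 'today' and C is null", 10),
]


def build_rule_dim(conn, rules):
    """Create the rule dimensional table and load the rules into it."""
    conn.execute(RULE_DIM_DDL)
    conn.executemany("INSERT INTO rule_dim VALUES (?, ?, ?, ?, ?)", rules)
    conn.commit()


conn = sqlite3.connect(":memory:")
build_rule_dim(conn, RULES)
scripts = conn.execute(
    "SELECT rule_code, rule_script FROM rule_dim "
    "WHERE target_table = ? ORDER BY rule_code",
    ("xxx",),
).fetchall()
```

Keeping the scripts as rows means a new rule is added with an insert rather than a code change, which is the reuse the embodiment is aiming for.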
S3, auditing the data table to be audited by using a preset scheduling script according to the rule dimensional table to obtain the auditing score of the data table to be audited.
In detail, the scheduling script in the embodiment of the present invention includes a request object, a rule object, and a response object. The request object is used for receiving a table name of a data table to be audited input by a user as a target table name, the rule object is used for auditing the data table to be audited by utilizing the rule dimensional table according to the target table name to obtain an audit score, and the response object is used for returning the audit score to the user. For example: when the data table A to be audited is audited by using the scheduling script, the request object receives a parameter A input by a user as a target table name, the rule object audits the data table A to be audited by using the rule dimensional table according to the target table name A to obtain an audit score, and the response object returns the audit score to the user.
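The request, rule, and response objects can be pictured with a small Python sketch. Everything below is hypothetical scaffolding rather than the patent's implementation: the class names, the `rule_dim` table layout, the use of SQLite, and in particular the convention that every stored script returns 1 when the table conforms to the rule and 0 otherwise.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Request:
    target_table: str        # table name entered by the user


@dataclass
class Response:
    target_table: str
    audit_score: float       # score returned to the user


class RuleObject:
    """Looks up the rules for the target table and runs their scripts."""

    def __init__(self, conn):
        self.conn = conn

    def audit(self, request):
        rows = self.conn.execute(
            "SELECT rule_script, weight FROM rule_dim WHERE target_table = ?",
            (request.target_table,),
        ).fetchall()
        score = 0.0
        for script, weight in rows:
            # Assumed convention: each script yields a single 1 or 0.
            (result,) = self.conn.execute(script).fetchone()
            score += result * weight
        return Response(request.target_table, score)


def run_schedule(conn, table_name):
    """Scheduling entry point: request in, response out."""
    return RuleObject(conn).audit(Request(table_name))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xxx (B REAL, C REAL)")
conn.executemany("INSERT INTO xxx VALUES (?, ?)", [(1.0, 2.0), (3.0, None)])
conn.execute(
    "CREATE TABLE rule_dim "
    "(rule_code TEXT, target_table TEXT, rule_script TEXT, weight REAL)"
)
conn.executemany(
    "INSERT INTO rule_dim VALUES (?, ?, ?, ?)",
    [
        ("rule 3", "xxx", "SELECT COUNT(*) = 0 FROM xxx WHERE B <= 0", 10),
        ("rule 4", "xxx", "SELECT COUNT(*) = 0 FROM xxx WHERE C IS NULL", 10),
    ],
)
response = run_schedule(conn, "xxx")
```

In this toy table every B is positive, so rule 3 contributes its weight, while one C is null, so rule 4 contributes nothing.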
Further, the audit process comprises: through traversal operation, each data auditing rule in the rule dimensional table is obtained, and the data auditing rule extracted from the rule dimensional table is utilized to audit the data table to be audited, so that an auditing result of each data auditing rule is obtained; calculating the weight fraction of each data auditing rule in the rule dimensional table according to the auditing result of each data auditing rule; and summing the weight scores of all the data auditing rules of the rule dimensional table to obtain the auditing score.
In one application example of the invention, if the audit finds that the data in the data table to be audited conforms to a given data auditing rule, the audit result of that rule is set to a first preset value, such as 1; if the audit finds that the data does not conform to the rule, the audit result of that rule is set to a second preset value, such as 0.
Furthermore, in the embodiment of the present invention, the audit result of each data auditing rule is multiplied by the weight of that rule to obtain the weight score of each data auditing rule. The weight may be set according to an empirical value. For example: when the data table A to be audited is audited by using the scheduling script, if the absolute value of the change of the data quantity in the data table A is less than 5%, which conforms to rule 1, the audit result of rule 1 is set to 1; since the preset weight of rule 1 is 60, the weight score of rule 1 is calculated as 1 × 60 = 60.
Furthermore, in the embodiment of the present invention, the weight scores of all rules in the rule dimension table are summed to obtain the audit score.
For example: to audit a data table xxx, the table name xxx of the data table to be audited is obtained, and the data table is audited by traversing the rules of the rule dimensional table. The audit results are as follows: the absolute value of the change of the data quantity is not less than 5%, which does not conform to rule 1; the summary ring ratio of the field A is less than 5%, which conforms to rule 2; the field B is greater than 0, which conforms to rule 3; and the field C is empty, which does not conform to rule 4. Therefore, the audit result of rule 1 is set to 0, that of rule 2 to 1, that of rule 3 to 1, and that of rule 4 to 0. The weight of rule 1 is 60, the weight of rule 2 is 60, the weight of rule 3 is 10, and the weight of rule 4 is 10. The weight score of rule 1 is therefore 0 × 60 = 0, that of rule 2 is 1 × 60 = 60, that of rule 3 is 1 × 10 = 10, and that of rule 4 is 0 × 10 = 0. Summing the weight scores of all the rules in the rule dimensional table gives the audit score: 0 × 60 + 1 × 60 + 1 × 10 + 0 × 10 = 70.
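The arithmetic of this example can be checked with a few lines of Python. The dictionary layout and the function name `audit_score` are illustrative only; the weights and audit results are the ones given in the example.

```python
def audit_score(results, weights):
    """Sum of (audit result x weight) over all rules."""
    return sum(results[rule] * weights[rule] for rule in weights)


weights = {"rule 1": 60, "rule 2": 60, "rule 3": 10, "rule 4": 10}
results = {"rule 1": 0, "rule 2": 1, "rule 3": 1, "rule 4": 0}

total = audit_score(results, weights)   # 0*60 + 1*60 + 1*10 + 0*10
```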
Further, the above embodiment audits a single data table to be audited, in which the data to be audited generally belongs to the same department or industry. When data to be audited from different industries must be audited at the same time, the data is stored in different data tables to be audited, and multiple data tables must be audited simultaneously. Another embodiment of the present invention can audit multiple data tables to be audited at the same time, and the method further includes:
acquiring data to be audited;
storing data to be audited in a plurality of SQL data tables;
respectively defining auditing rules for the plurality of SQL data tables;
auditing the plurality of SQL data tables by using a preset executable script according to the rule dimension table to obtain auditing scores of the plurality of SQL data tables;
and summing the audit scores of the plurality of SQL data tables to obtain a final audit score.
The embodiment can simultaneously audit data of a plurality of departments or industries, and the efficiency and the range of data audit are improved.
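A minimal sketch of the multi-table variant follows, assuming some per-table audit function already exists. The names `audit_tables` and `audit_one`, the table names, and the stand-in scores are hypothetical.

```python
def audit_tables(table_names, audit_one):
    """Audit each table with `audit_one` and sum the per-table scores."""
    per_table = {name: audit_one(name) for name in table_names}
    return per_table, sum(per_table.values())


# Stand-in for a real per-table audit: precomputed scores for two departments.
fake_scores = {"dept_a": 70, "dept_b": 20}
per_table, final_score = audit_tables(["dept_a", "dept_b"], fake_scores.__getitem__)
```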
S4, performing alarm operation on the data table to be audited according to the audit score.
The embodiment of the invention can define different alarm rules according to different audit scores. For example: if the audit score is more than 0 points but not more than 10 points, a mail alarm can be issued; if the audit score is more than 10 points but not more than 60 points, a short message alarm can be issued; if the audit score is more than 60 points, a telephone alarm is issued, and the like.
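The example thresholds above map directly onto a small lookup function. The function name `alarm_channel` and the returned channel labels are illustrative; the boundaries (0, 10, 60) are the ones given in the example, and a score of 0 or below is assumed to trigger no alarm.

```python
def alarm_channel(score):
    """Map an audit score to the alarm channel from the example thresholds."""
    if score > 60:
        return "telephone"
    if score > 10:
        return "short message"
    if score > 0:
        return "mail"
    return None   # assumption: no alarm outside the listed ranges
```

For instance, the audit score of 70 from the earlier worked example would fall in the telephone alarm range.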
The method comprises the steps of obtaining data to be audited, carrying out data cleaning operation on the data to be audited to obtain standard data, and storing the standard data in an SQL data table to obtain a data table to be audited; summarizing one or more preselected data auditing rules to obtain a rule set, defining each data auditing rule in the rule set as an SQL rule script, and obtaining a rule dimensional table according to the data auditing rules and the corresponding SQL rule script; auditing the data table to be audited by using a preset scheduling script according to the rule dimensional table to obtain an auditing score of the data table to be audited; and performing alarm operation on the data table to be audited according to the audit score. Various rule scripts are solidified in the rule dimensional table, the rule scripts do not need to be rewritten every time data is audited, the preset scheduling scripts are used for auditing the data table to be audited, the rule scripts do not need to be rewritten for different data auditing tables, and the data auditing efficiency is improved.
FIG. 3 is a functional block diagram of the data auditing apparatus according to the present invention.
The data auditing device 100 of the present invention can be installed in an electronic device. According to the realized functions, the data auditing device can comprise a data cleaning module 101, a rule dimensional table creating module 102, a script auditing module 103 and an auditing alarm module 104. A module according to the present invention, which may also be referred to as a unit, refers to a series of computer program segments that can be executed by a processor of an electronic device and that can perform a fixed function, and that are stored in a memory of the electronic device.
In the present embodiment, the functions regarding the respective modules/units are as follows:
the data cleaning module 101 is configured to acquire data to be audited, perform data cleaning operation on the data to be audited to obtain standard data, and store the standard data in an SQL data table to obtain a data table to be audited. It should be emphasized that, in order to further ensure the privacy and security of the data to be audited, the data to be audited may also be stored in a node of a block chain.
In detail, the data cleaning operation includes: carrying out abnormal data deletion processing, missing value filling processing and normalization processing on the data to be audited.
The abnormal value is an unreasonable value in the data to be audited. For example, the data interval of age data is generally [0, 120]; if an age value in the data to be audited is negative, that value is unreasonable. Embodiments of the invention may identify abnormal values in the data to be audited using an abnormal value judging method such as the 3σ principle judging method, a cluster judging method, or a statistical judging method, and then delete the abnormal values from the data to be audited.
In detail, the invention can adopt the following method to delete the abnormal values in the data to be audited: calculating the standard deviation and the average value of the data to be audited; and deleting the data whose distance from the average value exceeds a preset multiple, such as 3 times, of the standard deviation. Furthermore, the missing values in the data to be audited can be filled using a fixed value, the median, the mean, or the mode. For example, when the missing values of a numerical data set A are filled with a fixed value, the default fixed value is 0, and the fillna function of the pandas module of Python can be used to fill the missing values of A with 0. The code is as follows:
A.fillna(0, inplace=True)
furthermore, in order to accelerate the data auditing speed, the embodiment of the invention performs normalization processing on the data to be audited. The normalization process is to scale the data to fall within a small specific interval. The embodiment of the present invention may use a normalization method including (0,1) normalization, Z-score normalization, and Sigmoid function to normalize the filling data to obtain the standard data, for example: when the z-score standardization method is used, the embodiment of the invention utilizes the following calculation formula to carry out normalization processing on the data to be audited, and obtains the standard data:
$$y = \frac{x_1 - \mu_1}{\sigma_1}$$
where $x_1$ is the data to be audited, $\mu_1$ is the average value of the data to be audited, $\sigma_1$ is the standard deviation of the data to be audited, and $y$ is the standard data.
Further, in order to process the standard data by using the SQL rule script, the embodiment of the present invention stores the standard data in the SQL data table to obtain the data table to be audited.
The rule dimensional table creating module 102 is configured to summarize one or more pre-selected data audit rules to obtain a rule set, convert each data audit rule in the rule set into an SQL rule script, and obtain a rule dimensional table according to the data audit rules and the corresponding SQL rule script.
The embodiment of the invention selects one or more data auditing rules in advance for auditing the data in the data table to be audited.
Preferably, the data auditing rules include, for example:
rule 1: the absolute value of the data amount variation is less than a first preset value, such as 5%;
rule 2: the period-over-period ("ring") ratio of the sum of field A is smaller than a second preset value, such as 5%;
rule 3: field B is greater than a third preset value, such as 0;
rule 4: field C is not null; and the like.
For example: when the data table to be audited contains the yield data of the product a and the product b in the last 100 days, rule 1 indicates that the yield data of the product a or the product b is between 95 and 105; rule 2 indicates that (sum of the current day's production of product a and product b - sum of the previous day's production of product a and product b) / (sum of the previous day's production of product a and product b) × 100% < 5%; rule 3 indicates that the production data for product a or product b on any day is greater than zero; rule 4 indicates that the production data for product a or product b on any day cannot be empty.
Further, in the embodiment of the present invention, the selected data audit rules are summarized as a rule set:
$f = [a_1(p_1), a_2(p_2), \ldots, a_j(p_j)]$
where $p_j$ is data in the data table to be audited and $a_j$ represents a data audit rule.
Furthermore, in the embodiment of the present invention, each data auditing rule in the rule set f is converted into an SQL rule script.
In detail, in the embodiment of the present invention, the SQL rule script of rule 1 is: select (calculation of the change in data volume) from xxx where date in ('today', 'yesterday'); the SQL rule script of rule 2 is: select sum(A) from xxx where date = 'today' and stringsum(A) < 5%; the SQL rule script of rule 3 is: select count(B) from xxx where date = 'today' and B = 0; the SQL rule script of rule 4 is: select count(C) from xxx where date = 'today' and C is null.
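As a rough illustration of how such rule scripts might be run, the sketch below uses Python's built-in sqlite3 module with an in-memory table named xxx. The column types, the sample rows, and the literal date labels are assumptions. Two deliberate deviations from the scripts in the text are also assumptions: the rule 3 script counts rows with B <= 0 as violations, and both scripts use COUNT(*) because COUNT(C) ignores NULL values and would therefore always return 0 for rows where C is null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xxx (date TEXT, A REAL, B REAL, C TEXT)")
conn.executemany(
    "INSERT INTO xxx VALUES (?, ?, ?, ?)",
    [
        ("yesterday", 100.0, 5.0, "ok"),
        ("today", 102.0, 0.0, "ok"),   # violates rule 3 (B is not greater than 0)
        ("today", 1.0, 3.0, None),     # violates rule 4 (C is null)
    ],
)

# Hypothetical rule scripts: each one counts the rows that violate a rule.
RULE_SCRIPTS = {
    "rule 3": "SELECT COUNT(*) FROM xxx WHERE date = 'today' AND B <= 0",
    "rule 4": "SELECT COUNT(*) FROM xxx WHERE date = 'today' AND C IS NULL",
}


def count_violations(connection, script):
    """Run one rule script and return the single count it produces."""
    return connection.execute(script).fetchone()[0]


violations = {name: count_violations(conn, sql) for name, sql in RULE_SCRIPTS.items()}
```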
Preferably, another embodiment of the present invention uses Reverse Polish Notation (RPN) to perform a post-order traversal of the data audit rules contained in the rule set, so that the sequence of data audit rules can be conveniently represented as an RPN array and the SQL rule scripts can be defined more efficiently. For example, the script defining rule 2 is: stringsum(X) < 5%, select sum(X) from xxx where date = 'today'.
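The source does not describe the layout of the RPN array, so the following is only a sketch of how a rule expressed in Reverse Polish Notation could be evaluated with a stack. The token format, the operator set, and the metric names `ring_ratio_sum_A` and `min_B` are all hypothetical.

```python
import operator

# Binary operators supported by this sketch.
OPERATORS = {
    "<": operator.lt,
    ">": operator.gt,
    "and": lambda a, b: bool(a) and bool(b),
    "or": lambda a, b: bool(a) or bool(b),
}


def evaluate_rpn(tokens, metrics):
    """Evaluate an RPN token list; string operands are looked up in metrics."""
    stack = []
    for token in tokens:
        if isinstance(token, str) and token in OPERATORS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPERATORS[token](left, right))
        elif isinstance(token, str):
            stack.append(metrics[token])  # named metric
        else:
            stack.append(token)           # numeric literal
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack[0]


metrics = {"ring_ratio_sum_A": 0.03, "min_B": 2.0}
# Infix form: (ring_ratio_sum_A < 0.05) and (min_B > 0)
rule = ["ring_ratio_sum_A", 0.05, "<", "min_B", 0, ">", "and"]
result = evaluate_rpn(rule, metrics)
```

Because operands always precede their operator, the array needs no parentheses, which is one reason a post-order representation is convenient for storing composite rules.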
Furthermore, the embodiment of the invention presets a weight for the SQL rule script of each data auditing rule. For example: rule 1: the absolute value of the change in data volume is less than 5%, with a preset weight of 60; rule 2: the ring ratio of the sum of field A is less than 5%, with a preset weight of 60; rule 3: field B is greater than 0, with a preset weight of 10; rule 4: field C is not null, with a preset weight of 10.
Further, the rule dimensional table is obtained according to the data auditing rule, the corresponding SQL rule script and the weight.
In detail, see the rule dimension table shown in FIG. 2. In the embodiment of the present invention, the function of each information column included in the rule dimension table is as follows:
The rule code column is used for naming and distinguishing different rules. For example: rule 1, rule 2, etc.
The target table name column is used for describing the name of the data table to be audited. For example: when the data table xxx is to be audited, its table name xxx is filled into the target table name column.
The rule description column is used for describing the specific contents of different rules. For example: the description of rule 1 is "the absolute value of the change in data volume is less than 5%".
The rule script column is used for describing the rule scripts corresponding to different rules.
The weight column is used for describing the weights corresponding to different rules. For example: rule 1 corresponds to a weight of 60.
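The five columns described above could be modeled in memory roughly as follows. The class name `RuleRow`, the field names, and the placeholder script strings are illustrative assumptions; the weights follow the example values given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleRow:
    rule_code: str      # rule code column
    target_table: str   # target table name column
    description: str    # rule description column
    script: str         # rule script column
    weight: int         # weight column


RULE_DIMENSION_TABLE = [
    RuleRow("rule 1", "xxx", "absolute value of data volume change < 5%", "<SQL script for rule 1>", 60),
    RuleRow("rule 2", "xxx", "ring ratio of the sum of field A < 5%", "<SQL script for rule 2>", 60),
    RuleRow("rule 3", "xxx", "field B > 0", "<SQL script for rule 3>", 10),
    RuleRow("rule 4", "xxx", "field C is not null", "<SQL script for rule 4>", 10),
]


def rules_for_table(table, target_table):
    """Select the rule rows that apply to one data table to be audited."""
    return [row for row in table if row.target_table == target_table]


rows_for_xxx = rules_for_table(RULE_DIMENSION_TABLE, "xxx")
```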
In the embodiment of the invention, the rule dimensional table summarizes the data audit rules and the rule scripts, so that the data audit rules are prevented from being rewritten every time, and the audit efficiency is improved.
The script auditing module 103 is used for auditing the data table to be audited by using a preset scheduling script according to the rule dimensional table to obtain the auditing score of the data table to be audited.
In detail, the scheduling script in the embodiment of the present invention includes a request object, a rule object, and a response object. The request object is used for receiving the table name of a data table to be audited, input by a user, as a target table name; the rule object is used for auditing the data table to be audited by using the rule dimensional table according to the target table name to obtain an audit score; and the response object is used for returning the audit score to the user. For example, when the data table A to be audited is audited by using the scheduling script, the request object receives the parameter A input by the user as the target table name, the rule object audits the data table A by using the rule dimensional table according to the target table name A to obtain an audit score, and the response object returns the audit score to the user.
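One possible shape for the three objects is sketched below. The class names, the `run_scheduling_script` helper, and the idea of passing the scoring logic in as a callable are assumptions for illustration; the source only names the three roles.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Request:
    target_table: str   # table name supplied by the user


@dataclass(frozen=True)
class Response:
    target_table: str
    audit_score: float  # score returned to the user


class RuleObject:
    """Audits a table by delegating to a scoring function built from the rule table."""

    def __init__(self, audit_table: Callable[[str], float]):
        self._audit_table = audit_table

    def audit(self, request: Request) -> Response:
        score = self._audit_table(request.target_table)
        return Response(request.target_table, score)


def run_scheduling_script(table_name: str, rule_object: RuleObject) -> Response:
    request = Request(target_table=table_name)
    return rule_object.audit(request)


# A stub scoring function stands in for the real rule evaluation.
response = run_scheduling_script("A", RuleObject(lambda name: 70.0))
```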
Further, the audit process comprises: through traversal operation, each data auditing rule in the rule dimensional table is obtained, and the data auditing rule extracted from the rule dimensional table is utilized to audit the data table to be audited, so that an auditing result of each data auditing rule is obtained; calculating the weight fraction of each data auditing rule in the rule dimensional table according to the auditing result of each data auditing rule; and summing the weight scores of all the data auditing rules of the rule dimensional table to obtain the auditing score.
In one application example of the invention, if the audit finds data in the data table to be audited that meets a given data auditing rule, the audit result of that data auditing rule is set to a first preset value, such as 1; if the audit finds no data in the data table to be audited that meets the data auditing rule, the audit result of that data auditing rule is set to a second preset value, such as 0.
Furthermore, in the embodiment of the present invention, the audit result of each data audit rule is multiplied by the weight of that data audit rule to obtain the weight score of each data audit rule. The weight may be set according to an empirical value. For example, when the data table A to be audited is audited by using the scheduling script, if the absolute value of the change in data volume in data table A is less than 5% and therefore meets rule 1, the audit result of rule 1 is set to 1; with the preset weight of rule 1 being 60, the weight score of rule 1 is calculated as: 1 × 60 = 60.
Furthermore, in the embodiment of the present invention, the weight scores of all rules in the rule dimension table are summed to obtain the audit score.
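The multiplication and summation can be written out as a short sketch. The weights follow the example values in the text; the audit results are made-up sample values, and the function name `audit_score` is hypothetical.

```python
def audit_score(audit_results, weights):
    """Sum of (audit result x weight) over all rules in the rule table."""
    return sum(audit_results[rule] * weights[rule] for rule in weights)


weights = {"rule 1": 60, "rule 2": 60, "rule 3": 10, "rule 4": 10}
# Sample audit results: 1 is the first preset value, 0 the second.
audit_results = {"rule 1": 1, "rule 2": 0, "rule 3": 1, "rule 4": 1}

score = audit_score(audit_results, weights)  # 60 + 0 + 10 + 10
```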
The auditing alarm module 104 is used for carrying out alarm operation on the data table to be audited according to the auditing scores.
The embodiment of the invention can define different alarm rules for different audit scores. For example: if the audit score is more than 0 points but not more than 10 points, a mail alarm can be sent; if the audit score is more than 10 points but not more than 60 points, a short message alarm can be sent; if the audit score is more than 60 points, a telephone alarm is made, and so on.
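The example thresholds map to a simple lookup, sketched here. The channel labels and the choice to return None for a score of 0 or less are assumptions, since the text does not say what happens in that case.

```python
def alarm_channel(score):
    """Map an audit score to an alarm channel using the example thresholds."""
    if score > 60:
        return "telephone"
    if score > 10:
        return "sms"
    if score > 0:
        return "mail"
    return None  # no alarm defined for this range (an assumption)
```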
FIG. 4 is a schematic structural diagram of an electronic device implementing the data auditing method according to the present invention.
The electronic device 1 may include a processor 10, a memory 11 and a bus, and may further include a computer program, such as a data auditing program 12, stored in the memory 11 and operable on the processor 10.
The memory 11 includes at least one type of readable storage medium, which includes flash memory, removable hard disk, multimedia card, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a removable hard disk of the electronic device 1. The memory 11 may also be an external storage device of the electronic device 1 in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only for storing application software installed in the electronic device 1 and various types of data, such as codes of a data auditing program, but also for temporarily storing data that has been output or will be output.
The processor 10 may be composed of an integrated circuit in some embodiments, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital Processing chips, graphics processors, and combinations of various control chips. The processor 10 is a Control Unit (Control Unit) of the electronic device, connects various components of the whole electronic device by using various interfaces and lines, and executes various functions and processes data of the electronic device 1 by operating or executing programs or modules (e.g., data auditing programs, etc.) stored in the memory 11 and calling data stored in the memory 11.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The bus is arranged to enable connection communication between the memory 11 and at least one processor 10 or the like.
FIG. 4 only shows an electronic device with certain components, and it will be understood by those skilled in the art that the structure shown in FIG. 4 does not constitute a limitation of the electronic device 1, which may comprise fewer or more components than those shown, combine some components, or use a different arrangement of components.
For example, although not shown, the electronic device 1 may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so as to implement functions of charge management, discharge management, power consumption management, and the like through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device 1 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
Further, the electronic device 1 may further include a network interface, and optionally, the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface, a Bluetooth interface, etc.), which are generally used for establishing a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may further comprise a user interface, which may be a Display (Display), an input unit (such as a Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying information processed in the electronic device 1 and for displaying a visualized user interface, among other things.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The data auditing program 12 stored in the memory 11 of the electronic device 1 is a combination of instructions that, when executed in the processor 10, implement:
acquiring data to be audited, performing data cleaning operation on the data to be audited to obtain standard data, and storing the standard data in an SQL data table to obtain a data table to be audited;
summarizing one or more preselected data auditing rules to obtain a rule set, converting each data auditing rule in the rule set into an SQL rule script, and obtaining a rule dimensional table according to the data auditing rules and the corresponding SQL rule script;
auditing the data table to be audited by using a preset scheduling script according to the rule dimensional table to obtain an auditing score of the data table to be audited;
and performing alarm operation on the data table to be audited according to the audit score.
Specifically, the specific implementation method of the processor 10 for the instruction may refer to the description of the relevant steps in the embodiment corresponding to fig. 1, which is not described herein again. It should be emphasized that, in order to further ensure the privacy and security of the data to be audited, the audit data may also be stored in a node of a block chain.
Further, the integrated modules/units of the electronic device 1, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. The computer-readable medium may include: any entity or device capable of carrying said computer program code, recording medium, U-disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM).
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
The block chain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
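The idea of data blocks linked by a cryptographic method can be illustrated with a toy hash chain. This is not the storage scheme of the invention, which the text does not detail; it omits consensus, networking, and signatures, and the block layout and record strings are hypothetical.

```python
import hashlib
import json


def make_block(records, previous_hash):
    """Build a block whose hash covers its records and the previous block's hash."""
    payload = json.dumps(
        {"records": records, "previous_hash": previous_hash}, sort_keys=True
    )
    block_hash = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return {"records": records, "previous_hash": previous_hash, "hash": block_hash}


def chain_is_valid(chain):
    """Check every block's own hash and its link to the preceding block."""
    for index, block in enumerate(chain):
        expected = make_block(block["records"], block["previous_hash"])["hash"]
        if block["hash"] != expected:
            return False
        if index > 0 and block["previous_hash"] != chain[index - 1]["hash"]:
            return False
    return True


genesis = make_block(["audit score for table A: 80"], "0" * 64)
second = make_block(["audit score for table B: 10"], genesis["hash"])
chain = [genesis, second]
```

Changing the records of any block changes its recomputed hash, so the check fails for a tampered block, which is the validity property the paragraph above refers to.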
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, not any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A method for auditing data, the method comprising:
acquiring data to be audited, performing data cleaning operation on the data to be audited to obtain standard data, and storing the standard data in an SQL data table to obtain a data table to be audited;
summarizing one or more preselected data auditing rules to obtain a rule set, converting each data auditing rule in the rule set into an SQL rule script, and obtaining a rule dimensional table according to the data auditing rules and the corresponding SQL rule script;
auditing the data table to be audited by using a preset scheduling script according to the rule dimensional table to obtain an auditing score of the data table to be audited;
and performing alarm operation on the data table to be audited according to the audit score.
2. The data auditing method of claim 1, where said obtaining data to be audited, performing data cleaning operations on said data to be audited to obtain standard data, comprises:
and carrying out abnormal data deletion processing, missing value filling processing and normalization processing on the data to be audited to obtain the standard data.
3. The data auditing method of claim 1, characterized in that the scheduling script comprises a request object, a rule object and a response object, wherein the request object is used for receiving a table name of a data table to be audited input by a user as a target table name, the rule object is used for auditing the data table to be audited according to the target table name by using the rule dimensional table to obtain an auditing score, and the response object is used for returning the auditing score to the user.
4. The data auditing method according to any one of claims 1 to 3, where the data to be audited is stored in a block chain, the auditing process comprising:
through traversal operation, each data auditing rule in the rule dimensional table is obtained, and according to the obtained data auditing rule, the data table to be audited is audited by using a corresponding SQL rule script to obtain an auditing result of each data auditing rule;
multiplying the audit result of each data audit rule by the preset weight of the data audit rule to obtain the weight score of each data audit rule; and
and obtaining the auditing score of the data table to be audited according to the weight score.
5. The data auditing method of claim 4, where auditing the data table to be audited to obtain an audit result for each data auditing rule includes:
if the data table to be audited has data meeting the data auditing rule, the auditing result of the data auditing rule is set as a first preset value;
if the data to be audited does not have the data meeting the data auditing rule, the auditing result of the data auditing rule is set as a second preset value.
6. A data auditing apparatus, the apparatus comprising:
the data cleaning module is used for acquiring data to be audited, performing data cleaning operation on the data to be audited to obtain standard data, and storing the standard data in an SQL data table to obtain a data table to be audited;
the rule dimensional table creating module is used for summarizing one or more preselected data auditing rules to obtain a rule set, converting each data auditing rule in the rule set into an SQL rule script, and obtaining a rule dimensional table according to the data auditing rules and the corresponding SQL rule script;
the script auditing module is used for auditing the data table to be audited by using a preset scheduling script according to the rule dimensional table to obtain the auditing score of the data table to be audited;
and the auditing alarm module is used for carrying out alarm operation on the data table to be audited according to the auditing scores.
7. The data auditing device of claim 6, where the data to be audited is stored in a block chain, where obtaining the data to be audited and performing a data cleaning operation on the data to be audited to obtain standard data comprises:
and carrying out abnormal data deletion processing, missing value filling processing and normalization processing on the data to be audited to obtain the standard data.
8. The data auditing device of claim 6, wherein the auditing the data table to be audited according to the rule dimensional table by using a preset scheduling script to obtain the auditing score of the data table to be audited comprises:
through traversal operation, each data auditing rule in the rule dimensional table is obtained, and according to the obtained data auditing rule, the data table to be audited is audited by using a corresponding SQL rule script to obtain an auditing result of each data auditing rule;
multiplying the audit result of each data audit rule by the preset weight of the data audit rule to obtain the weight score of each data audit rule; and
and obtaining the auditing score of the data table to be audited according to the weight score.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the data auditing method of any of claims 1-5.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the data auditing method according to any one of claims 1-5.
CN202010353392.5A 2020-04-29 2020-04-29 Data auditing method and device and computer readable storage medium Pending CN111651439A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010353392.5A CN111651439A (en) 2020-04-29 2020-04-29 Data auditing method and device and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN111651439A true CN111651439A (en) 2020-09-11

Family

ID=72347974

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010353392.5A Pending CN111651439A (en) 2020-04-29 2020-04-29 Data auditing method and device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111651439A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112508526A (en) * 2020-12-15 2021-03-16 中国联合网络通信集团有限公司 Data auditing method and device
CN112508526B (en) * 2020-12-15 2024-04-19 中国联合网络通信集团有限公司 Data auditing method and device
CN112785124A (en) * 2021-01-05 2021-05-11 科大国创云网科技有限公司 Method and system for auditing compliance of telecommunication service
CN113743749A (en) * 2021-08-20 2021-12-03 泰康保险集团股份有限公司 Medical institution inspection method and device and electronic equipment
CN117312314A (en) * 2023-09-26 2023-12-29 广州加之科技有限公司 Comprehensive auditing management method, device, terminal and medium for hospital business data

Similar Documents

Publication Publication Date Title
CN111651439A (en) Data auditing method and device and computer readable storage medium
CN112115145A (en) Data acquisition method and device, electronic equipment and storage medium
CN111400363A (en) Index data processing method and device, electronic equipment and storage medium
CN112883042A (en) Data updating and displaying method and device, electronic equipment and storage medium
CN111694844A (en) Enterprise operation data analysis method and device based on configuration algorithm and electronic equipment
CN112579621A (en) Data display method and device, electronic equipment and computer storage medium
CN113868529A (en) Knowledge recommendation method and device, electronic equipment and readable storage medium
CN113890712A (en) Data transmission method and device, electronic equipment and readable storage medium
CN114844844A (en) Delay message processing method, device, equipment and storage medium
CN112464619B (en) Big data processing method, device and equipment and computer readable storage medium
CN114490586A (en) Medical information safe storage cooperation system based on block chain
CN113868528A (en) Information recommendation method and device, electronic equipment and readable storage medium
CN113837631A (en) Employee evaluation method and device, electronic device and readable storage medium
CN113157853A (en) Problem mining method and device, electronic equipment and storage medium
CN112948380A (en) Data storage method and device based on big data, electronic equipment and storage medium
CN112580079A (en) Authority configuration method and device, electronic equipment and readable storage medium
CN112256472A (en) Distributed data calling method and device, electronic equipment and storage medium
CN114417998A (en) Data feature mapping method, device, equipment and storage medium
CN113407322B (en) Multi-terminal task allocation method and device, electronic equipment and readable storage medium
CN114547011A (en) Data extraction method and device, electronic equipment and storage medium
CN114881324A (en) Road transportation optimization method, device and equipment based on fuzzy double boundary model
CN114490137A (en) Service data real-time statistical method and device, electronic equipment and readable storage medium
CN114186540A (en) Mail content intelligent filling method and device, electronic equipment and storage medium
CN113869455A (en) Unsupervised clustering method and device, electronic equipment and medium
CN113259446A (en) APP message sending method, device, equipment and medium based on user loyalty

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination