CN117743396A - Data quality detection method, device, equipment and storage medium - Google Patents
- Publication number
- CN117743396A (application number CN202311669187.XA)
- Authority
- CN
- China
- Prior art keywords
- rule
- quality detection
- data
- instruction
- detection
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Landscapes
- Stored Programmes (AREA)
Abstract
The invention discloses a data quality detection method, apparatus, device and storage medium. The method comprises the following steps: constructing a rule generation model and acquiring data to be detected and a detection instruction, wherein the rule generation model comprises a correspondence between rule descriptions and rule programs; when the detection instruction is a rule-addition instruction, generating a quality detection task according to the rule generation model and the detection instruction; and generating a quality detection report according to the quality detection task and the data to be detected. The rule generation model is constructed with a large model algorithm, so that when a user wants to add a new rule, the user inputs a rule description into the rule generation model, which generates a rule program; a quality detection task is then generated to carry out the corresponding data quality detection and produce a quality detection report. Basing data quality detection on a large model keeps the quality of the generated programs controllable, speeds up the detection process, improves production efficiency, lowers the skill required of data quality detection personnel, and improves the quality of data quality detection.
Description
Technical Field
The present invention relates to the field of quality detection technologies, and in particular, to a method, an apparatus, a device, and a storage medium for detecting data quality.
Background
Data, as a strategic asset, is becoming a core competitive strength of enterprises, and building data management systems centered on data assets has become a clear development direction for them. However, data quality is subject to different requirements and constraints in different service scenarios, so a large number of data quality detection rule programs must be developed manually. The development workload is large, the demands on data quality implementation personnel are high, and program quality is hard to guarantee.
Conventional quality detection platforms in the prior art are essentially rule-based: some rules are built in, and most rules are developed manually later. This approach involves a large amount of program development and places high demands on data quality detection personnel.
Disclosure of Invention
The invention provides a data quality detection method, apparatus, device and storage medium that use AI large model technology to train the automatic generation of rule detection programs, so that data quality detection work can be completed quickly and accurately.
According to an aspect of the present invention, there is provided a data quality detection method, the method comprising:
constructing a rule generation model and acquiring data to be detected and a detection instruction, wherein the rule generation model comprises a correspondence between rule descriptions and rule programs;
when the detection instruction is a rule-addition instruction, generating a quality detection task according to the rule generation model and the detection instruction;
and generating a quality detection report according to the quality detection task and the data to be detected.
Optionally, constructing the rule generation model includes: acquiring historical rule descriptions and the corresponding historical rule programs as training samples, and dividing the training samples into a training set and a test set according to a specified ratio; building a model network structure and iteratively training it on the training set to generate an initial model; inputting the test set into the initial model to obtain test rule programs, and acquiring the verification results that the user marks for each test rule program, each result being either accurate or inaccurate; and determining the rule generation model according to the verification results.
Optionally, determining the rule generation model according to the verification results includes: determining the accuracy of the initial model from the verification results; judging whether the accuracy is greater than a preset threshold, and if so, taking the initial model directly as the rule generation model; otherwise, acquiring correction parameters based on the accuracy and generating the rule generation model according to the correction parameters.
Optionally, acquiring the data to be detected and the detection instruction includes: acquiring a user input table and storing it at a designated address to generate the data to be detected; acquiring a rule base and displaying it to the user, wherein the rule base comprises rule identifiers and the corresponding preset rule programs; acquiring a target rule identifier or a rule-addition instruction input by the user based on the rule base; and generating a rule selection instruction according to the target rule identifier, and taking the rule selection instruction or the rule-addition instruction as the detection instruction.
Optionally, generating a quality detection task according to the rule generation model and the detection instruction includes: acquiring a target rule description input by the user based on the rule-addition instruction; inputting the target rule description into the rule generation model to obtain a first target rule program as output; and acquiring a first adjustment instruction input by the user based on the first target rule program, and combining the first adjustment instruction with the first target rule program to generate the quality detection task.
Optionally, the method further comprises: when the detection instruction is a rule selection instruction, matching the target rule identifier against the rule base to obtain a second target rule program corresponding to the target rule identifier; and acquiring a second adjustment instruction input by the user based on the second target rule program, and combining the second adjustment instruction with the second target rule program to generate the quality detection task.
Optionally, generating a quality detection report according to the quality detection task and the data to be detected includes: checking the data to be detected with the quality detection task to generate a quality detection result comprising normal data and abnormal data; combining the quality detection result with a first preset template to generate an overall detection report, and combining the abnormal data with a second preset template to generate an abnormal detection report; and taking the overall detection report and the abnormal detection report as the quality detection report.
Optionally, the method further comprises: acquiring a new detection rule input by the user, wherein the new detection rule comprises a new rule description and the corresponding new rule program; and incrementally training the rule generation model on the new detection rule to generate an updated rule generation model.
According to another aspect of the present invention, there is provided a data quality detection apparatus comprising:
the model construction and data acquisition module is used for constructing a rule generation model and acquiring data to be detected and a detection instruction, wherein the rule generation model comprises a correspondence between rule descriptions and rule programs;
the quality detection task generation module is used for generating a quality detection task according to the rule generation model and the detection instruction when the detection instruction is a rule-addition instruction;
and the quality detection report generation module is used for generating a quality detection report according to the quality detection task and the data to be detected.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform a data quality detection method according to any one of the embodiments of the present invention.
According to the technical solution provided by the embodiments of the invention, the rule generation model is constructed with a large model algorithm. When a user wants to add a new rule, the user inputs the rule description into the rule generation model to generate a rule program, from which a quality detection task is generated to carry out the corresponding data quality detection and produce a quality detection report. Basing data quality detection on a large model keeps the quality of the generated programs controllable, speeds up the detection process, improves production efficiency, lowers the skill required of data quality detection personnel, and improves the quality of data quality detection.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a data quality detection method according to a first embodiment of the present invention;
Fig. 2 is a flowchart of another data quality detection method according to the second embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a data quality detection apparatus according to the third embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an electronic device implementing a data quality detection method according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art may better understand the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of protection of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
Fig. 1 is a flowchart of a data quality detection method according to an embodiment of the present invention, where the method may be performed by a data quality detection device, and the data quality detection device may be implemented in hardware and/or software, and the data quality detection device may be configured in a computer controller. As shown in fig. 1, the method includes:
S110, constructing a rule generation model, and acquiring data to be detected and a detection instruction, wherein the rule generation model comprises a correspondence between rule descriptions and rule programs.
The rule generation model is an AI large model built for quality detection rules: it is trained on descriptions of a large number of existing data quality rules, paired with the program code that implements each rule. The rule generation model contains the correspondence between rule descriptions and rule programs; that is, it translates a rule description into a program in a designated language and outputs a quality detection rule program for carrying out data quality detection. Data quality detection may cover dimensions such as completeness, normalization, consistency, accuracy, relevance, uniqueness and logic. The data to be detected is the object, input by the user, on which data quality detection is to be performed. The detection instruction determines the data quality detection mode selected by the user and may be either a rule selection instruction or a rule-addition instruction. Here, the user is the person who performs data quality detection.
By way of example, a completeness (integrity) check verifies whether data meets the intended goal, measuring which data is missing or unavailable. It includes checking whether required fields are filled in, whether entity tables and fields are missing, and whether the record counts of the source table and the target table are consistent. A rule description for the completeness check may be a non-null check (find fields of the current entity table whose values are null). A normalization check verifies field lengths against the format defined in the data standard document and measures which data is not stored in a unified format. It includes checking whether the data meets the data standard specification and whether a field length exceeds the specified length. Rule descriptions for the normalization check may include: a length check (entity table field length against the specified standard value); an ID card check (validating second-generation resident ID card numbers); a telephone number check (validating 11-digit mobile phone numbers); and a date format check (whether a value matches the "YYYY-MM-DD HH:MM:SS" format).
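To make the example rule descriptions concrete, the following is a minimal sketch of what generated rule programs for them might look like. The patent does not specify the target language or any function names; the Python functions below are hypothetical, and the "starts with 1" constraint on mobile numbers is an added assumption. The ID card check uses the standard 18-digit check-character scheme for second-generation resident ID numbers.

```python
import re
from datetime import datetime

def non_null_check(value):
    """Completeness: the field must not be null or blank."""
    return value is not None and str(value).strip() != ""

def record_count_check(source_rows, target_rows):
    """Completeness: source and target tables must have the same record count."""
    return len(source_rows) == len(target_rows)

def length_check(value, max_length):
    """Normalization: length must not exceed the specified standard value.
    Nulls pass here; they are the non-null check's concern."""
    return value is None or len(str(value)) <= max_length

# Weights and check characters for 18-digit second-generation ID numbers.
_ID_WEIGHTS = [7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2]
_ID_CHECK_CHARS = "10X98765432"

def id_card_check(value):
    """Normalization: 17 digits followed by the correct check character."""
    s = str(value).upper()
    if not re.fullmatch(r"\d{17}[\dX]", s):
        return False
    total = sum(int(d) * w for d, w in zip(s[:17], _ID_WEIGHTS))
    return s[17] == _ID_CHECK_CHARS[total % 11]

def phone_check(value):
    """Normalization: 11-digit mobile number (assumed here to start with 1)."""
    return re.fullmatch(r"1\d{10}", str(value)) is not None

def date_format_check(value):
    """Normalization: exact 'YYYY-MM-DD HH:MM:SS' layout that is also a real date-time."""
    s = str(value)
    # strptime alone would accept single-digit fields such as '2023-1-5', so check the layout first.
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}", s):
        return False
    try:
        datetime.strptime(s, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return False
    return True
```

Each function checks a single field value, so a rule program can be applied row by row, e.g. `[r for r in rows if not phone_check(r["phone"])]` collects the abnormal rows for the telephone number rule.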
Optionally, constructing the rule generation model includes: acquiring historical rule descriptions and the corresponding historical rule programs as training samples, and dividing the training samples into a training set and a test set according to a specified ratio; building a model network structure and iteratively training it on the training set to generate an initial model; inputting the test set into the initial model to obtain test rule programs, and acquiring the verification results that the user marks for each test rule program, each result being either accurate or inaccurate; and determining the rule generation model according to the verification results.
Specifically, when the model is constructed, a large number of historical rule descriptions and the corresponding historical rule programs are used as training samples and divided into a training set and a test set according to a specified ratio, which the user can set according to training requirements, for example 8:2. The training set is used for initial training: the model network structure is trained iteratively on it to generate an initial model. The test set is used to verify and optimize the model: the test set is input into the initial model to obtain test rule programs, which are then verified manually and any problems corrected, and this is repeated until the generated programs are correct and can perform data quality detection. The user manually verifies each test rule program output by the initial model and labels it as accurate or inaccurate.
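The sample split described above can be sketched as follows, assuming the training samples are held as (rule description, rule program) pairs. The function name, the shuffle, and the fixed seed are illustrative assumptions; the default `train_ratio=0.8` reproduces the 8:2 example.

```python
import random

def split_samples(samples, train_ratio=0.8, seed=0):
    """Shuffle (rule description, rule program) pairs and split them into train/test sets.

    train_ratio=0.8 reproduces the 8:2 example; the seed makes the split repeatable.
    """
    if not 0 < train_ratio < 1:
        raise ValueError("train_ratio must be between 0 and 1")
    shuffled = list(samples)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```

Shuffling before cutting avoids a test set that contains only the most recently collected rules; a fixed seed keeps repeated verification rounds comparable.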
Optionally, determining the rule generation model according to the verification results includes: determining the accuracy of the initial model from the verification results; judging whether the accuracy is greater than a preset threshold, and if so, taking the initial model directly as the rule generation model; otherwise, acquiring correction parameters based on the accuracy and generating the rule generation model according to the correction parameters.
Specifically, the accuracy of the initial model can be determined from the verification results labeled by the user. When the accuracy is greater than the preset threshold, model training is complete; when the accuracy is less than or equal to the preset threshold, correction parameters are obtained and training of the model continues iteratively. Once the accuracy exceeds the preset threshold, the model is ready to be used for data quality detection.
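The accept-or-retrain decision can be sketched as a loop over labeled verification rounds. This is a minimal illustration, not the patent's implementation: `verify` and `retrain` are hypothetical callbacks standing in for the user's manual labeling and for retraining with correction parameters, and the 0.9 threshold and round limit are assumed values.

```python
def labeled_accuracy(labels):
    """labels: one bool per test rule program, True where the user marked it accurate."""
    if not labels:
        raise ValueError("no verification labels")
    return sum(labels) / len(labels)

def build_rule_generation_model(model, verify, retrain, threshold=0.9, max_rounds=10):
    """Iterate until the labeled accuracy is strictly greater than the threshold.

    verify(model) -> list of user labels on the test set (hypothetical callback).
    retrain(model, accuracy) -> model retrained with correction parameters (hypothetical).
    """
    for _ in range(max_rounds):
        accuracy = labeled_accuracy(verify(model))
        if accuracy > threshold:  # strictly greater, as in the method
            return model, accuracy
        model = retrain(model, accuracy)
    raise RuntimeError("accuracy did not exceed the threshold within max_rounds")
```

The round limit is a safeguard added for the sketch so that a model that never improves does not loop forever.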
Optionally, acquiring the data to be detected and the detection instruction includes: acquiring a user input table and storing it at a designated address to generate the data to be detected; acquiring a rule base and displaying it to the user, wherein the rule base comprises rule identifiers and the corresponding preset rule programs; acquiring a target rule identifier or a rule-addition instruction input by the user based on the rule base; and generating a rule selection instruction according to the target rule identifier, and taking the rule selection instruction or the rule-addition instruction as the detection instruction.
Specifically, the user may select a private table or a shared table of the service system as the data object to be detected, i.e., the user input table, and the controller stores the user input table at the designated address to generate the data to be detected; here, the controller is the computer controller that performs data quality detection. To avoid the user having to re-enter rule programs every time data quality is checked, frequently used general-purpose rule programs can be placed in a rule base in advance. The user can configure the rule base as required; the configured rule base comprises rule identifiers and the corresponding preset rule programs and can be displayed to the user through a user terminal. When selecting a data quality detection rule, the user can consult the rule base. If the rule already exists in the rule base, the user can input its target rule identifier to use it directly, and the controller generates a rule selection instruction from the target rule identifier. If the rule does not exist in the rule base, the user can input a rule-addition instruction, and the quality rule is then generated by the rule generation model.
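The two kinds of detection instruction can be modeled as two small types, with one dispatch function that either looks up the rule base or asks the rule generation model. A minimal sketch under stated assumptions: the class names, the `generate` callable (a stand-in for the rule generation model), and storing rule programs as strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union

@dataclass
class RuleSelection:
    """Rule selection instruction: reuse a preset program from the rule base."""
    rule_id: str

@dataclass
class RuleAddition:
    """Rule-addition instruction: describe a rule the rule base does not contain yet."""
    description: str

Instruction = Union[RuleSelection, RuleAddition]

def resolve_rule_program(instruction: Instruction,
                         rule_base: Dict[str, str],
                         generate: Callable[[str], str]) -> str:
    """Return the rule program source for a detection instruction."""
    if isinstance(instruction, RuleSelection):
        # Match the target rule identifier against the rule base.
        if instruction.rule_id not in rule_base:
            raise KeyError(f"unknown rule id: {instruction.rule_id}")
        return rule_base[instruction.rule_id]
    # New rule: the hypothetical rule generation model turns the description into a program.
    return generate(instruction.description)
```

Keeping the lookup and the generation path behind one function means the rest of the pipeline (adjustment, scheduling, reporting) does not need to know which path produced the program.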
S120, when the detection instruction is a rule-addition instruction, generating a quality detection task according to the rule generation model and the detection instruction.
Optionally, generating a quality detection task according to the rule generation model and the detection instruction includes: acquiring a target rule description input by the user based on the rule-addition instruction; inputting the target rule description into the rule generation model to obtain a first target rule program as output; and acquiring a first adjustment instruction input by the user based on the first target rule program, and combining the first adjustment instruction with the first target rule program to generate the quality detection task.
Specifically, when the user wants to add a rule, the user inputs a target rule description to the controller, and the controller outputs the corresponding first target rule program through the rule generation model. Data quality detection personnel can then adjust the rule description or the program online until the program can perform the required data quality detection, after which the program is packaged and loaded into the rule base. The user can also set a detection time according to detection requirements. The controller combines the first adjustment instruction with the first target rule program to generate the quality detection task, which comprises the adjusted rule program and the detection time. The controller executes the quality detection task at the detection time and stores the adjusted rule program from the task in the rule base, so the user can call it whenever needed without entering it again.
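The following sketch shows one way to combine a generated rule program, a user's adjustment, and a detection time into a task. The patent does not define the format of an adjustment instruction; here it is assumed, purely for illustration, to be a list of (old, new) text edits. The class and function names and the SQL-like program text are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple

@dataclass
class QualityDetectionTask:
    rule_id: str
    program: str       # rule program after the user's online adjustments
    run_at: datetime   # detection time chosen by the user

    def is_due(self, now: datetime) -> bool:
        return now >= self.run_at

def apply_adjustment(program: str, edits: List[Tuple[str, str]]) -> str:
    """Apply an adjustment instruction, modeled here as (old, new) text edits."""
    for old, new in edits:
        if old not in program:
            raise ValueError(f"adjustment target not found: {old!r}")
        program = program.replace(old, new)
    return program

def create_task(rule_id: str, generated_program: str, edits: List[Tuple[str, str]],
                run_at: datetime, rule_base: Dict[str, str]) -> QualityDetectionTask:
    adjusted = apply_adjustment(generated_program, edits)
    rule_base[rule_id] = adjusted  # save to the rule base so the user need not re-enter it
    return QualityDetectionTask(rule_id, adjusted, run_at)
```

Rejecting an edit whose target text is absent surfaces mistakes immediately instead of silently scheduling an unchanged program.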
S130, generating a quality detection report according to the quality detection task and the data to be detected.
Optionally, generating a quality detection report according to the quality detection task and the data to be detected includes: checking the data to be detected with the quality detection task to generate a quality detection result comprising normal data and abnormal data; combining the quality detection result with a first preset template to generate an overall detection report, and combining the abnormal data with a second preset template to generate an abnormal detection report; and taking the overall detection report and the abnormal detection report as the quality detection report.
Specifically, the quality detection task is a scheduled task: it executes the corresponding data quality detection rule on the data to be detected to generate a quality detection result comprising normal data and abnormal data. A data quality detection report is then generated from the result. The report comprises an abnormal detection report and an overall detection report. The overall detection report lets the user understand the overall outcome of the quality detection; it is generated by combining the quality detection result with the first preset template and lists, for each detection method, the corresponding problem data. The abnormal detection report contains the abnormal data found during quality detection and is generated by combining the abnormal data with the second preset template. The user can browse and download the quality detection report.
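Executing a rule and filling the two preset templates can be sketched with Python's `string.Template`. The template wording, the pass-rate figure, and the function names are assumptions; the patent does not give the report layouts.

```python
from string import Template

# Hypothetical preset templates; the real report layouts are not specified in the text.
OVERALL_TEMPLATE = Template("Rule $rule_id: $total rows checked, $abnormal abnormal ($pass_rate% passed)")
ABNORMAL_TEMPLATE = Template("Rule $rule_id abnormal row: $row")

def run_detection(rows, check):
    """Execute a rule check over every row, splitting normal from abnormal data."""
    normal, abnormal = [], []
    for row in rows:
        (normal if check(row) else abnormal).append(row)
    return normal, abnormal

def build_reports(rule_id, normal, abnormal):
    """Fill the first template with the overall result and the second with each abnormal row."""
    total = len(normal) + len(abnormal)
    pass_rate = round(100 * len(normal) / total, 1) if total else 100.0
    overall = OVERALL_TEMPLATE.substitute(
        rule_id=rule_id, total=total, abnormal=len(abnormal), pass_rate=pass_rate)
    abnormal_report = [ABNORMAL_TEMPLATE.substitute(rule_id=rule_id, row=row) for row in abnormal]
    return overall, abnormal_report
```

For example, checking two rows where one phone value is null yields the overall line `Rule R001: 2 rows checked, 1 abnormal (50.0% passed)` and one abnormal-report line for the null row.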
Optionally, the method further comprises: acquiring a new detection rule input by the user, wherein the new detection rule comprises a new rule description and the corresponding new rule program; and incrementally training the rule generation model on the new detection rule to generate an updated rule generation model.
Specifically, when the user needs to optimize the model, the rule generation model can be trained incrementally. For example, when the programs that the AI large model generates for a new type of quality detection rule require extensive manual online adjustment, the model can be trained incrementally before further use. In that case the user inputs a new detection rule comprising a new rule description and the corresponding new rule program, and the controller incrementally trains the rule generation model on it to generate an updated rule generation model, further improving the accuracy of the model and the quality of data quality detection.
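The bookkeeping around incremental training can be sketched as follows: only (rule description, rule program) pairs the model has not already seen are passed on for further training. The `fine_tune` callable is a hypothetical stand-in for whatever training routine the large model uses; the deduplication step is an assumption added for the sketch.

```python
def incremental_update(model, corpus, new_rules, fine_tune):
    """corpus / new_rules: lists of (rule description, rule program) pairs.

    fine_tune(model, samples) -> updated model (hypothetical training call).
    Returns the updated model and the extended training corpus.
    """
    seen = set(corpus)
    fresh = []
    for sample in new_rules:
        if sample not in seen:  # skip pairs the model was already trained on
            seen.add(sample)
            fresh.append(sample)
    if not fresh:
        return model, corpus  # nothing new: keep the current model
    return fine_tune(model, fresh), corpus + fresh
```

Training only on the fresh pairs keeps each update small; whether to also replay older samples to limit forgetting is a design choice the text leaves open.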
According to the technical solution provided by the embodiments of the invention, the rule generation model is constructed with a large model algorithm. When a user wants to add a new rule, the user inputs the rule description into the rule generation model to generate a rule program, from which a quality detection task is generated to carry out the corresponding data quality detection and produce a quality detection report. Basing data quality detection on a large model keeps the quality of the generated programs controllable, speeds up the detection process, improves production efficiency, lowers the skill required of data quality detection personnel, and improves the quality of data quality detection.
Example 2
Fig. 2 is a flowchart of a data quality detection method according to a second embodiment of the present invention, where a process of generating a quality detection task when a detection instruction is a rule selection instruction is added on the basis of the first embodiment. As shown in fig. 2, the method includes:
S210, constructing a rule generation model, and acquiring data to be detected and a detection instruction, wherein the rule generation model comprises a correspondence between rule descriptions and rule programs.
Optionally, constructing the rule generation model includes: acquiring historical rule descriptions and the corresponding historical rule programs as training samples, and dividing the training samples into a training set and a test set according to a specified ratio; building a model network structure and iteratively training it on the training set to generate an initial model; inputting the test set into the initial model to obtain test rule programs, and acquiring the verification results that the user marks for each test rule program, each result being either accurate or inaccurate; and determining the rule generation model according to the verification results.
Optionally, determining the rule generation model according to the verification results includes: determining the accuracy of the initial model from the verification results; judging whether the accuracy is greater than a preset threshold, and if so, taking the initial model directly as the rule generation model; otherwise, acquiring correction parameters based on the accuracy and generating the rule generation model according to the correction parameters.
Optionally, acquiring the data to be detected and the detection instruction includes: acquiring a user input table and storing it at a designated address to generate the data to be detected; acquiring a rule base and displaying it to the user, wherein the rule base comprises rule identifiers and the corresponding preset rule programs; acquiring a target rule identifier or a rule-addition instruction input by the user based on the rule base; and generating a rule selection instruction according to the target rule identifier, and taking the rule selection instruction or the rule-addition instruction as the detection instruction.
S220, when the detection instruction is a rule selection instruction, matching the target rule identifier against the rule base to acquire a second target rule program corresponding to the target rule identifier.
S230, acquiring a second adjustment instruction input by the user based on the second target rule program, and combining the second adjustment instruction with the second target rule program to generate a quality detection task.
Specifically, when selecting a data quality detection rule, the user can use a rule that already exists in the system directly. To do so, the user only needs to input a rule selection instruction containing a target rule identifier; the target rule identifier is matched against the rule base preset by the user to obtain the second target rule program that the user wants to select. The user can also edit and change existing rule programs in the rule base by inputting the desired changes to the controller as an adjustment instruction. The controller acquires the second adjustment instruction input by the user based on the second target rule program and combines it with the second target rule program to generate a quality detection task that meets the user's quality detection requirements.
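One way to let users adjust preset rules without rewriting them is to store each preset program as a parameterized template with default values, so that a second adjustment instruction becomes a set of parameter overrides. This representation is an assumption for illustration only; the rule identifier, template text and parameter names below are hypothetical.

```python
from string import Template

# Hypothetical rule base: identifier -> (parameterized preset program, default parameters).
RULE_BASE = {
    "LEN_CHECK": (
        Template("SELECT * FROM $table WHERE LENGTH($column) > $max_len"),
        {"max_len": 64},
    ),
}

def adjust_preset_rule(rule_base, rule_id, overrides):
    """Match rule_id in the rule base, then apply the user's adjustment as parameter overrides."""
    if rule_id not in rule_base:
        raise KeyError(f"unknown rule id: {rule_id}")
    template, defaults = rule_base[rule_id]
    params = {**defaults, **overrides}  # user values win over preset defaults
    return template.substitute(params)  # raises KeyError if a parameter is still missing
```

Because the values are substituted directly into program text, table and column names should come from trusted configuration rather than free-form input.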
S240, generating a quality detection report according to the quality detection task and the data to be detected.
Optionally, generating a quality detection report according to the quality detection task and the data to be detected includes: detecting the data to be detected using the quality detection task to generate a quality detection result, wherein the quality detection result includes normal data and abnormal data; combining the quality detection result with a first preset template to generate an overall detection report, and combining the abnormal data with a second preset template to generate an abnormal detection report; and taking the overall detection report and the abnormal detection report together as the quality detection report.
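The report step can be sketched as follows, assuming the quality detection task is a row predicate and the two preset templates are simple `string.Template` strings. The template wording and the function names are hypothetical; the patent does not specify the template contents.

```python
from string import Template
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]

# Hypothetical first and second preset templates.
OVERALL_TEMPLATE = Template("Checked $total rows: $normal normal, $abnormal abnormal.")
ABNORMAL_TEMPLATE = Template("Abnormal row #$index: $row")


def run_task(check: Callable[[Row], bool], data: List[Row]) -> Tuple[List[Row], List[Tuple[int, Row]]]:
    # Split the data into normal rows and (index, row) pairs for abnormal rows.
    normal: List[Row] = []
    abnormal: List[Tuple[int, Row]] = []
    for index, row in enumerate(data):
        if check(row):
            normal.append(row)
        else:
            abnormal.append((index, row))
    return normal, abnormal


def build_reports(check: Callable[[Row], bool], data: List[Row]) -> Dict[str, str]:
    normal, abnormal = run_task(check, data)
    # Overall report: the whole detection result filled into the first template.
    overall = OVERALL_TEMPLATE.substitute(
        total=len(data), normal=len(normal), abnormal=len(abnormal)
    )
    # Abnormal report: only the abnormal rows, each filled into the second template.
    abnormal_report = "\n".join(
        ABNORMAL_TEMPLATE.substitute(index=i, row=row) for i, row in abnormal
    )
    return {"overall": overall, "abnormal": abnormal_report}
```

Keeping the row index in the abnormal report makes it easy to trace a flagged record back to the stored form.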
Optionally, the method further includes: acquiring a newly added detection rule input by the user, wherein the newly added detection rule includes a new rule description and a corresponding new rule program; and performing incremental training on the rule generation model according to the newly added detection rule to generate an updated rule generation model.
According to the technical solution provided by this embodiment of the invention, the rule generation model is constructed using a large model algorithm. When the user wants to add a new rule, the user can input a rule description into the rule generation model to generate a rule program, from which a quality detection task is generated to perform the corresponding data quality detection and produce a quality detection report. Data quality detection based on a large model keeps the quality of the generated programs controllable, speeds up the data quality detection process, improves production efficiency, lowers the skill requirements for data quality detection personnel, and improves the quality of data quality detection.
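The patent's rule generation model is a large model, and its incremental training procedure is not disclosed. Purely to illustrate the update cycle (old model plus newly added description/program pairs gives an updated model, without discarding prior knowledge), the sketch below swaps in a toy retrieval model that returns the stored program whose description has the highest token Jaccard overlap with the query. `RetrievalRuleModel` and `incremental_update` are hypothetical names, and this stand-in is not a neural fine-tuning method.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Example = Tuple[str, str]  # (rule description, rule program source)


def _tokens(text: str) -> Set[str]:
    return set(text.lower().split())


@dataclass
class RetrievalRuleModel:
    """Toy stand-in for the rule generation model (not a large model)."""

    examples: List[Example] = field(default_factory=list)

    def generate(self, description: str) -> str:
        if not self.examples:
            raise ValueError("model has no examples")
        query = _tokens(description)

        def score(example: Example) -> float:
            # Jaccard similarity between query and stored description tokens.
            tokens = _tokens(example[0])
            union = query | tokens
            return len(query & tokens) / len(union) if union else 0.0

        return max(self.examples, key=score)[1]


def incremental_update(model: RetrievalRuleModel, new_rules: List[Example]) -> RetrievalRuleModel:
    # Return an updated model that keeps every old example and adds the new ones;
    # the original model object is left unchanged.
    return RetrievalRuleModel(examples=model.examples + list(new_rules))
```

Returning a new model object rather than mutating the old one mirrors the "generate an updated rule generation model" wording and leaves the previous version available for rollback.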
Example III
Fig. 3 is a schematic structural diagram of a data quality detecting device according to a third embodiment of the present invention.
As shown in fig. 3, the apparatus includes:
the model construction and data acquisition module 310 is configured to construct a rule generation model, and acquire data to be detected and a detection instruction, where the rule generation model includes a corresponding relationship between rule descriptions and rule programs;
the quality detection task generating module 320 is configured to generate a quality detection task according to the rule generating model and the detection instruction when the detection instruction is a rule adding instruction;
the quality detection report generating module 330 is configured to generate a quality detection report according to the quality detection task and the data to be detected.
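One way to picture how modules 310, 320 and 330 fit together is the minimal sketch below. It assumes the rule generation model can be treated as any callable that maps a rule description to a row predicate, and it folds the optional rule selection task generation module into the same dispatch for brevity. `DataQualityDevice`, `DetectionInstruction` and the `"add"`/`"select"` kinds are hypothetical names, not the apparatus as claimed.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]
Check = Callable[[Row], bool]  # a rule program: True means the row is normal


@dataclass(frozen=True)
class DetectionInstruction:
    kind: str                      # "add" = rule adding, "select" = rule selection
    description: str = ""          # target rule description (for "add")
    rule_id: Optional[str] = None  # target rule identifier (for "select")


class DataQualityDevice:
    def __init__(self, rule_model: Callable[[str], Check], rule_base: Dict[str, Check]) -> None:
        # Module 310 would construct the model; here it is injected as a callable.
        self.rule_model = rule_model
        self.rule_base = rule_base

    def generate_task(self, instruction: DetectionInstruction) -> Check:
        # Module 320 handles rule adding instructions via the model; the optional
        # rule selection module handles rule selection instructions via the rule base.
        if instruction.kind == "add":
            return self.rule_model(instruction.description)
        if instruction.kind == "select" and instruction.rule_id is not None:
            return self.rule_base[instruction.rule_id]
        raise ValueError(f"unsupported instruction: {instruction}")

    def generate_report(self, task: Check, data: List[Row]) -> Dict[str, int]:
        # Module 330: count normal and abnormal rows.
        abnormal = sum(1 for row in data if not task(row))
        return {"total": len(data), "normal": len(data) - abnormal, "abnormal": abnormal}
```

Injecting the model as a callable keeps the device independent of how the model is built or retrained.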
Optionally, the model construction and data acquisition module 310 specifically includes: a training sample generation and division unit, configured to acquire historical rule descriptions and the corresponding historical rule programs as training samples, and divide the training samples into a training set and a test set according to a specified ratio; an initial model generation unit, configured to build a model network structure and iteratively train it on the training set to generate an initial model; a verification condition acquisition unit, configured to input the test set into the initial model to obtain each test rule program, and acquire the verification conditions marked by the user for each test rule program, wherein each verification condition is either accurate or inaccurate; and a rule generation model construction unit, configured to determine the rule generation model according to the verification conditions.
Optionally, the rule generation model construction unit is specifically configured to: determine the accuracy of the initial model according to the verification conditions; judge whether the accuracy is greater than a preset threshold and, if so, directly take the initial model as the rule generation model; otherwise, acquire correction parameters based on the accuracy and generate the rule generation model according to the correction parameters.
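The split-and-verify logic can be sketched as below, assuming the "specified ratio" is a training fraction, each user mark is a boolean (True for accurate), and accuracy is the fraction of accurate marks. The patent does not say how correction parameters are derived, so the sketch only returns the accept/correct decision. All function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_samples(samples: Sequence[T], train_ratio: float, seed: int = 0) -> Tuple[List[T], List[T]]:
    # Shuffle reproducibly, then cut into a training set and a test set.
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be in (0, 1)")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def accuracy_from_marks(marks: Sequence[bool]) -> float:
    # Each mark is the user's verdict on one generated test rule program.
    if not marks:
        raise ValueError("no verification marks")
    return sum(marks) / len(marks)


def accept_or_correct(marks: Sequence[bool], threshold: float) -> Tuple[str, float]:
    # Strictly greater than the preset threshold: keep the initial model as-is.
    accuracy = accuracy_from_marks(marks)
    if accuracy > threshold:
        return "accept", accuracy
    # Otherwise correction is needed (how to correct is not specified in the source).
    return "correct", accuracy
```

With an 80/20 split of ten samples, `split_samples(range(10), 0.8)` gives eight training and two test samples; three accurate marks out of four (accuracy 0.75) pass a 0.7 threshold.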
Optionally, the model construction and data acquisition module 310 specifically includes a detection data and instruction acquisition unit, configured to: acquire a form input by the user and store it at a designated address to generate the data to be detected; acquire a rule base and display it to the user, wherein the rule base includes rule identifiers and corresponding preset rule programs; acquire a target rule identifier or a rule adding instruction input by the user based on the rule base; and generate a rule selection instruction according to the target rule identifier, and take either the rule selection instruction or the rule adding instruction as the detection instruction.
Optionally, the quality detection task generating module 320 specifically includes a rule adding task generation unit, configured to: acquire a target rule description input by the user based on the rule adding instruction; input the target rule description into the rule generation model to obtain a first target rule program as output; and acquire a first adjustment instruction input by the user based on the first target rule program, and combine the first adjustment instruction with the first target rule program to generate the quality detection task.
Optionally, the apparatus further includes a rule selection task generation module, configured to: when the detection instruction is a rule selection instruction, match the target rule identifier against the rule base to obtain a second target rule program corresponding to the target rule identifier; and acquire a second adjustment instruction input by the user based on the second target rule program, and combine the second adjustment instruction with the second target rule program to generate the quality detection task.
Optionally, the quality detection report generating module 330 is specifically configured to: detect the data to be detected using the quality detection task to generate a quality detection result, wherein the quality detection result includes normal data and abnormal data; combine the quality detection result with a first preset template to generate an overall detection report, and combine the abnormal data with a second preset template to generate an abnormal detection report; and take the overall detection report and the abnormal detection report together as the quality detection report.
Optionally, the apparatus further includes a model incremental training module, configured to: acquire a newly added detection rule input by the user, wherein the newly added detection rule includes a new rule description and a corresponding new rule program; and perform incremental training on the rule generation model according to the newly added detection rule to generate an updated rule generation model.
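The rule adding path can be sketched as follows. Two assumptions are made here and are not stated in the patent: the generated rule program is an SQL query template with `{table}` and `{column}` placeholders that selects abnormal rows, and the first adjustment instruction binds those placeholders and may optionally replace the generated text with a user-edited version. `stub_model` is a hypothetical stand-in for the large-model-based rule generation model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class QualityDetectionTask:
    description: str
    program: str  # final rule program: an SQL query selecting abnormal rows


def task_from_rule_adding(
    description: str, model: Callable[[str], str], adjustment: Dict[str, str]
) -> QualityDetectionTask:
    # First target rule program produced by the rule generation model.
    first_program = model(description)
    # If the user rewrote the program outright, use the edited text instead.
    edited: Optional[str] = adjustment.get("program")
    program = edited if edited is not None else first_program
    # Bind the placeholders to the user's table and column.
    program = program.format(table=adjustment["table"], column=adjustment["column"])
    return QualityDetectionTask(description=description, program=program)


def stub_model(description: str) -> str:
    # Hypothetical stand-in for the rule generation model.
    if "null" in description.lower():
        return "SELECT * FROM {table} WHERE {column} IS NULL"
    return "SELECT * FROM {table} WHERE 1 = 0"
```

Keeping the generated program as a template and applying the user's adjustment afterwards lets the same generated rule be reused across tables.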
According to the technical solution provided by this embodiment of the invention, the rule generation model is constructed using a large model algorithm. When the user wants to add a new rule, the user can input a rule description into the rule generation model to generate a rule program, from which a quality detection task is generated to perform the corresponding data quality detection and produce a quality detection report. Data quality detection based on a large model keeps the quality of the generated programs controllable, speeds up the data quality detection process, improves production efficiency, lowers the skill requirements for data quality detection personnel, and improves the quality of data quality detection.
The data quality detection device provided by this embodiment of the invention can execute the data quality detection method provided by any embodiment of the invention, and has the functional modules and beneficial effects corresponding to that method.
Example IV
Fig. 4 shows a schematic structural diagram of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processing devices, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only and are not meant to limit implementations of the invention described and/or claimed herein.
As shown in Fig. 4, the electronic device 10 includes at least one processor 11 and a memory communicatively connected to the at least one processor 11, such as a read-only memory (ROM) 12 and a random access memory (RAM) 13. The memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the ROM 12 or the computer program loaded from the storage unit 18 into the RAM 13. The RAM 13 may also store various programs and data required for the operation of the electronic device 10. The processor 11, the ROM 12, and the RAM 13 are connected to each other via a bus 14, to which an input/output (I/O) interface 15 is also connected.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the processor 11 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various processors running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as the data quality detection method, namely: constructing a rule generation model and acquiring data to be detected and a detection instruction, wherein the rule generation model includes a correspondence between rule descriptions and rule programs; when the detection instruction is a rule adding instruction, generating a quality detection task according to the rule generation model and the detection instruction; and generating a quality detection report according to the quality detection task and the data to be detected.
In some embodiments, a data quality detection method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of a data quality detection method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform a data quality detection method in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or a middleware component (e.g., an application server), or a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), blockchain networks, and the Internet.
The computing system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that overcomes the drawbacks of difficult management and weak service scalability found in traditional physical hosts and virtual private server (VPS) services.
It should be understood that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solution of the present invention can be achieved; no limitation is imposed herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.
Claims (10)
1. A method for detecting data quality, comprising:
constructing a rule generation model and acquiring data to be detected and a detection instruction, wherein the rule generation model comprises a corresponding relation between rule description and a rule program;
when the detection instruction is a rule adding instruction, generating a quality detection task according to the rule generation model and the detection instruction;
and generating a quality detection report according to the quality detection task and the data to be detected.
2. The method of claim 1, wherein the building a rule generation model comprises:
acquiring a history rule description and a corresponding history rule program as training samples, and dividing the training samples into a training set and a testing set according to a specified proportion;
building a model network structure, and performing iterative training on the model network structure through the training set to generate an initial model;
inputting the test set into the initial model to obtain each test rule program, and acquiring verification conditions marked by the user for each test rule program, wherein each verification condition is either accurate or inaccurate;
and determining the rule generation model according to the verification conditions.
3. The method of claim 2, wherein determining the rule generation model according to the verification conditions comprises:
determining the accuracy of the initial model according to the verification condition;
judging whether the accuracy is greater than a preset threshold, if so, directly taking the initial model as the rule generation model;
otherwise, acquiring correction parameters based on the accuracy, and generating the rule generation model according to the correction parameters.
4. The method of claim 1, wherein the acquiring the data to be detected and the detection instruction comprises:
acquiring a user input form, and storing the user input form into a designated address to generate the data to be detected;
acquiring a rule base and displaying the rule base to a user, wherein the rule base comprises rule identifiers and corresponding preset rule programs;
acquiring a target rule identifier or a rule adding instruction input by a user based on the rule base;
and generating a rule selection instruction according to the target rule identifier, and taking the rule selection instruction or the rule adding instruction as the detection instruction.
5. The method of claim 1, wherein generating the quality detection task according to the rule generation model and the detection instruction comprises:
acquiring a target rule description input by a user based on the rule adding instruction;
inputting the target rule description into the rule generation model to obtain an output first target rule program;
and acquiring a first adjustment instruction input by a user based on the first target rule program, and combining the first adjustment instruction and the first target rule program to generate the quality detection task.
6. The method according to claim 4, wherein the method further comprises:
when the detection instruction is a rule selection instruction, matching the target rule identifier against the rule base to obtain a second target rule program corresponding to the target rule identifier;
and acquiring a second adjustment instruction input by a user based on the second target rule program, and combining the second adjustment instruction and the second target rule program to generate the quality detection task.
7. The method of claim 6, wherein generating a quality detection report based on the quality detection task and the data to be detected comprises:
detecting the data to be detected through the quality detection task to generate a quality detection result, wherein the quality detection result comprises normal data and abnormal data;
combining the quality detection result with a first preset template to generate an overall detection report, and combining the abnormal data with a second preset template to generate an abnormal detection report;
and taking the overall detection report and the abnormal detection report as the quality detection report.
8. The method according to claim 1, wherein the method further comprises:
acquiring a new detection rule input by a user, wherein the new detection rule comprises a new rule description and a corresponding new rule program;
and performing incremental training on the rule generation model according to the newly added detection rule to generate an updated rule generation model.
9. A data quality detection apparatus, comprising:
the model construction and data acquisition module is used for constructing a rule generation model and acquiring data to be detected and detection instructions, wherein the rule generation model comprises a corresponding relation between rule description and rule program;
the quality detection task generating module is used for generating a quality detection task according to the rule generating model and the detection instruction when the detection instruction is a rule adding instruction;
and the quality detection report generation module is used for generating a quality detection report according to the quality detection task and the data to be detected.
10. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311669187.XA (CN117743396A) | 2023-12-06 | 2023-12-06 | Data quality detection method, device, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117743396A | 2024-03-22 |
Family ID: 90253604
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN202311669187.XA (CN117743396A) | Pending | 2023-12-06 | 2023-12-06 | Data quality detection method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117743396A |
- 2023-12-06: CN application CN202311669187.XA (CN117743396A), status: active, pending
Similar Documents
Publication | Title |
---|---|
CN115422617A | Frame image size measuring method, device and medium based on CAD |
CN117724980A | Method and device for testing software framework performance, electronic equipment and storage medium |
CN116303013A | Source code analysis method, device, electronic equipment and storage medium |
CN116486125A | Equipment detection method, device, equipment and medium |
CN116228301A | Method, device, equipment and medium for determining target user |
CN117743396A | Data quality detection method, device, equipment and storage medium |
CN114443493A | Test case generation method and device, electronic equipment and storage medium |
CN114741291A | Method, device, equipment and medium for automatically submitting vulnerability information |
CN117150215B | Assessment result determining method and device, electronic equipment and storage medium |
CN117492822B | Change contrast method, device, electronic equipment and storage medium |
CN115495380A | Test case generation method and device, electronic equipment and storage medium |
CN117331924A | Data model matching degree checking method, device, equipment and storage medium |
CN116644586A | Construction target detection method, device, equipment and medium |
CN116303071A | Interface testing method and device, electronic equipment and storage medium |
CN118467579A | Database structure processing method, device, equipment and storage medium |
CN117455684A | Data processing method, device, electronic equipment, storage medium and product |
CN115567624A | Message processing method and device, electronic equipment and medium |
CN117312168A | Interface test case generation method, device, equipment and storage medium |
CN116860652A | Method and device for evaluating software quality, electronic equipment and storage medium |
CN117370399A | Form comparison method and device, target database and storage medium |
CN117406562A | Current compensation method and device for exposure machine, electronic equipment and storage medium |
CN116089499A | Data statistics method, device and medium based on kafka data volume |
CN117667693A | Automatic message testing method, device, equipment and storage medium |
CN116894645A | Electric power engineering acceptance method and device, electronic equipment and medium |
CN117472751A | Vehicle system function analysis method, device, equipment and medium |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | Country or region after: China. Address after: No. 1599, Chuangxin Road, Songbei District, Harbin, Heilongjiang Province, 150028. Applicant after: Harbin Sihe Information Technology Co.,Ltd. Address before: No. 1599, Chuangxin Road, Songbei District, Harbin, Heilongjiang Province, 150028. Applicant before: HARBIN INSTITUTE OF TECHNOLOGY SOFTWARE ENGINEERING Co.,Ltd. Country or region before: China |