CN113434485A - Data quality health degree analysis method and system based on multidimensional analysis technology - Google Patents

Data quality health degree analysis method and system based on multidimensional analysis technology

Info

Publication number
CN113434485A
Authority
CN
China
Legal status
Granted
Application number
CN202110725686.0A
Other languages
Chinese (zh)
Other versions
CN113434485B
Inventor
金震
王兆君
康进港
李明
曹朝辉
Current Assignee
Beijing SunwayWorld Science and Technology Co Ltd
Original Assignee
Beijing SunwayWorld Science and Technology Co Ltd
Application filed by Beijing SunwayWorld Science and Technology Co Ltd
Priority to CN202110725686.0A
Publication of CN113434485A
Application granted
Publication of CN113434485B
Current legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/217 Database tuning
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models

Abstract

The invention discloses a data quality health degree analysis method and system based on a multidimensional analysis technology. The method comprises: obtaining a first number of target service data samples; constructing a data analysis model by using a preset similarity comparison rule, a preset integrity evaluation rule, a preset uniqueness evaluation rule and a preset relevance evaluation rule; receiving a target evaluation type selected by a target user; analyzing and evaluating the first number of target service data samples with the data analysis model according to the target evaluation type to generate a quality health degree analysis report; and displaying the quality health degree analysis report in a graphical format. The method and system avoid the labor wasted by manual inspection, analyze the data quality of service data samples accurately, comprehensively and efficiently, and eliminate useless data in time so that it does not occupy storage. Users are thus spared interference from useless data, and their experience is improved.

Description

Data quality health degree analysis method and system based on multidimensional analysis technology
Technical Field
The invention relates to the technical field of data processing, in particular to a data quality health degree analysis method and system based on a multidimensional analysis technology.
Background
In the routine operation of enterprise data standardization, enterprises expect standardization to feed value back to the business, and the importance of data quality cannot be overstated. Some low-quality data is inevitably generated during routine operation: the quality of a standard data coding library is degraded by large-batch data initialization, by problems that spread from unprocessed historical data, and by low-quality data created to serve urgent business needs. Producing no low-quality data at all is only a theoretical goal. What an enterprise can realistically organize is to understand data quality management correctly and, through scientific, effective and professional management and technical support, to reduce and control the rate at which low-quality data is generated and retained, to find and handle it promptly, and to keep the standard coding library at a high health degree. Because the coding library holds a huge volume of data, the information is complex and the professional requirements are high, quality assurance cannot practically be performed by hand. Instead, the standard data coding library is inspected with a professional quality management tool to find missing and duplicate data that must be removed, noise data that must be removed, and abnormal (but genuine) data that must be handled. The data health degree analysis provided by a specialized data quality management platform supplies the basis for data cleaning and governance, and a data cleaning platform then performs the cleaning and governance, ensuring data quality in terms of integrity, uniqueness, consistency, accuracy, legality and timeliness. However, prior-art data quality management methods cannot analyze data quality comprehensively and efficiently, so useless data is not fully cleaned; it occupies data storage, interferes with users' data calls and seriously degrades the user experience.
Disclosure of Invention
In view of the above problems, the present invention provides a data quality health degree analysis method and system based on a multidimensional analysis technology, which solve the problem described in the Background: prior-art data quality management methods cannot analyze data quality comprehensively and efficiently, so useless data is not fully cleaned, occupies data storage, interferes with users' data calls and seriously degrades the user experience.
A data quality health degree analysis method based on a multidimensional analysis technology comprises the following steps:
acquiring a first number of target service data samples;
constructing a data analysis model by utilizing a preset similarity comparison rule, a preset integrity evaluation rule, a preset uniqueness evaluation rule and a preset relevance evaluation rule;
receiving a target evaluation type selected by a target user, and analyzing and evaluating the first number of target service data samples by using the data analysis model according to the target evaluation type to generate a quality health degree analysis report;
displaying the quality health degree analysis report in a graphical format;
wherein the target evaluation type is: one or more of a similarity assessment, an integrity assessment, a uniqueness assessment, and an association assessment.
Preferably, before obtaining the first number of target service data samples, the method further includes:
determining a first number of data samples according to a preset condition;
determining a state function based on the first number;
determining a screening condition according to the state function, and screening a first number of initial service data samples meeting the screening condition from a second number of initial service data samples, wherein the second number is greater than the first number;
determining the first number of initial service data samples as the first number of target service data samples.
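The screening steps above can be sketched as follows. The text does not define the preset condition or the state function, so the predicate here (every required field present and non-empty) is a hypothetical stand-in, and the names `ServiceSample`, `make_screening_condition` and `screen_samples` are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass
class ServiceSample:
    """A simplified service data sample: an id plus named field values."""
    sample_id: str
    fields: Dict[str, str] = field(default_factory=dict)


def make_screening_condition(required_fields: Sequence[str]) -> Callable[[ServiceSample], bool]:
    """Hypothetical stand-in for the state function: a sample qualifies
    when every required field is present and non-empty."""
    def condition(sample: ServiceSample) -> bool:
        return all(sample.fields.get(name) for name in required_fields)
    return condition


def screen_samples(initial: List[ServiceSample], first_number: int,
                   condition: Callable[[ServiceSample], bool]) -> List[ServiceSample]:
    """Select `first_number` target samples from a larger pool of initial samples."""
    # The pool size (the "second number") must exceed the first number.
    if len(initial) <= first_number:
        raise ValueError("the second number must be greater than the first number")
    qualifying = [s for s in initial if condition(s)]
    if len(qualifying) < first_number:
        raise ValueError("too few samples satisfy the screening condition")
    # Keep the first `first_number` qualifying samples, preserving input order.
    return qualifying[:first_number]
```

Taking qualifying samples in input order is one simple policy; a real screening step might rank or randomly draw them instead.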
Preferably, the constructing a data analysis model by using a preset similarity comparison rule, a preset integrity evaluation rule, a preset uniqueness evaluation rule and a preset association evaluation rule includes:
constructing an initial network model;
setting four network nodes in the initial network model;
respectively corresponding the preset similarity comparison rule, the preset integrity evaluation rule, the preset uniqueness evaluation rule and the preset relevance evaluation rule to the four network nodes;
after the correspondence is finished, detecting the stability of each network node;
and when the stability of each network node is qualified, confirming the convergence of the initial network model, and obtaining the data analysis model.
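A minimal sketch of the four-node structure, assuming each "network node" can be modeled as a scoring function bound to one preset rule and that "convergence" simply means every node has passed its stability check. The class `DataAnalysisModel` and its methods are hypothetical; the text does not specify the network model's internals.

```python
from typing import Callable, Dict, List

# A rule takes the list of samples and returns a score in [0, 1].
Rule = Callable[[List[dict]], float]

NODE_NAMES = ("similarity", "integrity", "uniqueness", "association")


class DataAnalysisModel:
    """Four rule 'nodes'; usable only after every node passes a stability check."""

    def __init__(self, rules: Dict[str, Rule]) -> None:
        missing = [name for name in NODE_NAMES if name not in rules]
        if missing:
            raise ValueError(f"no rule assigned to node(s): {missing}")
        # One node per preset rule, so each node evaluates a single dimension.
        self.nodes: Dict[str, Rule] = {name: rules[name] for name in NODE_NAMES}
        self.converged = False

    def confirm_convergence(self, stability_ok: Dict[str, bool]) -> bool:
        # The model counts as converged only when all four nodes are stable.
        self.converged = all(stability_ok.get(name, False) for name in NODE_NAMES)
        return self.converged

    def evaluate(self, node: str, samples: List[dict]) -> float:
        if not self.converged:
            raise RuntimeError("model has not converged; check node stability first")
        return self.nodes[node](samples)
```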
Preferably, before receiving the target evaluation type selected by the target user, analyzing and evaluating the first number of target service data samples with the data analysis model according to the target evaluation type, and generating the quality health degree analysis report, the method further includes performing authenticity detection on the first number of target service data samples, comprising the following steps:
segmenting each target service data sample to obtain a plurality of data segments;
applying a hash function to each data segment of each target service data sample to obtain a hash value of each data segment;
acquiring a source weighted value of each target business data sample according to the plurality of hash values of each target business data sample;
calculating the target truth of each target business data sample by utilizing a preset truth algorithm according to the plurality of hash values and the source weighted value of each target business data sample;
deleting the first target service data samples whose target truth is smaller than the preset truth, and retaining the second target service data samples whose target truth is greater than or equal to the preset truth;
and counting the number of the second target service data samples to obtain a third number of second target service data samples.
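The authenticity detection steps can be sketched as below. SHA-256 is used as the hash function, segments have a fixed byte length, and the source weight is passed in directly; the text says only that the weight is derived from the hash values, and it does not give the preset truth algorithm, so `truth_degree` (share of segment hashes found in a trusted reference set, scaled by the source weight) is a hypothetical placeholder.

```python
import hashlib
from typing import Dict, List, Set, Tuple


def segment(data: bytes, size: int) -> List[bytes]:
    """Split a serialized sample into fixed-size segments (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def segment_hashes(data: bytes, size: int) -> List[str]:
    """SHA-256 hex digest of every segment."""
    return [hashlib.sha256(chunk).hexdigest() for chunk in segment(data, size)]


def truth_degree(hashes: List[str], source_weight: float, trusted: Set[str]) -> float:
    """Hypothetical truth algorithm: the share of segment hashes found in a
    trusted reference set, scaled by the sample's source weight."""
    if not hashes:
        return 0.0
    matched = sum(1 for h in hashes if h in trusted)
    return source_weight * matched / len(hashes)


def authenticity_filter(samples: Dict[str, bytes], source_weights: Dict[str, float],
                        trusted: Set[str], preset_truth: float,
                        size: int = 4) -> Tuple[List[str], int]:
    """Keep samples whose truth degree >= preset_truth; return their ids and count."""
    retained = [
        sid for sid, data in samples.items()
        if truth_degree(segment_hashes(data, size), source_weights[sid], trusted) >= preset_truth
    ]
    # The count of retained samples corresponds to the "third number".
    return retained, len(retained)
```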
Preferably, before receiving the target evaluation type selected by the target user, analyzing and evaluating the first number of target service data samples with the data analysis model according to the target evaluation type, and generating the quality health degree analysis report, the method further includes inspecting the data analysis model, comprising the following steps:
acquiring a fourth number of preset service data samples;
predetermining the first integrity of each preset service data sample, the first similarity of each preset service data sample and other preset service data samples, the first uniqueness of each preset service data sample and the first relevance of each preset service data sample and other preset service data samples, and obtaining a first determination result;
inputting the fourth number of preset service data samples into the data analysis model, and receiving the second integrity of each preset service data sample, the second similarity between each preset service data sample and the other preset service data samples, the second uniqueness of each preset service data sample, and the second relevance between each preset service data sample and the other preset service data samples, as output by the data analysis model, to obtain a second determination result;
and confirming whether the first determination result is the same as the second determination result; if so, confirming that the data analysis model is accurate; otherwise, confirming that the output of the data analysis model deviates, and sending the target user a prompt to repair the data analysis model.
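The model inspection compares predetermined results with model outputs, metric by metric. The sketch below assumes both determination results are nested dictionaries (sample id, then metric name, then value); `find_deviations` and `inspect_model` are hypothetical names. The default `abs_tol=0.0` reproduces the "same result" test in the text, while a small tolerance could be passed for floating-point scores.

```python
import math
from typing import Dict, List

# sample id -> metric name (integrity, similarity, uniqueness, relevance) -> value
Determination = Dict[str, Dict[str, float]]


def find_deviations(first: Determination, second: Determination,
                    abs_tol: float = 0.0) -> List[str]:
    """List 'sample.metric' entries where the model output differs from the
    predetermined result. abs_tol=0.0 demands exact equality."""
    deviations = []
    for sample_id, expected in first.items():
        actual = second.get(sample_id, {})
        for metric, value in expected.items():
            if metric not in actual or not math.isclose(
                    value, actual[metric], rel_tol=0.0, abs_tol=abs_tol):
                deviations.append(f"{sample_id}.{metric}")
    return deviations


def inspect_model(first: Determination, second: Determination) -> str:
    """Return an 'accurate' message or a repair prompt for the target user."""
    deviations = find_deviations(first, second)
    if deviations:
        return ("Model output deviates at " + ", ".join(deviations)
                + "; please repair the data analysis model.")
    return "Data analysis model is accurate."
```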
Preferably, receiving the target evaluation type selected by the target user, and analyzing and evaluating the first number of target service data samples with the data analysis model according to the target evaluation type to generate the quality health degree analysis report, includes:
recommending four preset evaluation types to the target user;
receiving a target evaluation type selected by the user from four preset evaluation types;
when the target evaluation type is similarity evaluation, extracting the classification code and metadata of each of the first number of target service data samples, and evaluating the similarity between the classification code and metadata of each target service data sample and those of the other target service data samples with a similarity algorithm based on lexical analysis and syntactic analysis, to generate a first evaluation result;
when the target evaluation type is integrity evaluation, performing integrity detection on the classification code and metadata of each target service data sample, including null-value detection, data length detection, data enumeration value detection and data consistency detection, to generate a second evaluation result;
when the target evaluation type is uniqueness evaluation, detecting whether the classification code and metadata of each target service data sample are unique; if so, confirming that the first number of target service data samples pass the uniqueness detection; otherwise, extracting the duplicated target classification codes and target metadata together with the defective target service data samples to which they belong, and generating a third evaluation result;
when the target evaluation type is relevance evaluation, evaluating the relevance between the classification code and metadata of each target service data sample and those of the other target service data samples to obtain a fourth evaluation result;
and performing comprehensive analysis by using the first evaluation result, the second evaluation result, the third evaluation result and the fourth evaluation result to obtain the quality health degree analysis report.
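The similarity, integrity and uniqueness checks can be sketched on simplified records of the form (classification code, metadata). This is a sketch under stated assumptions, not the claimed algorithm: similarity uses the standard library's `difflib.SequenceMatcher` character-level ratio as a swapped-in measure rather than lexical and syntactic analysis; the integrity rules (fixed code length, allowed two-character prefixes as the enumeration) are hypothetical; consistency detection, relevance evaluation and the comprehensive analysis are omitted because their rules are not given.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import Dict, List, Set, Tuple

Record = Tuple[str, str]  # (classification code, metadata), simplified


def similar_pairs(records: List[Record], threshold: float = 0.8) -> List[Tuple[int, int, float]]:
    """Similarity: flag record pairs whose 'code|metadata' strings are close.
    Uses difflib's character-level ratio, not lexical/syntactic analysis."""
    flagged = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            ratio = SequenceMatcher(None, "|".join(records[i]), "|".join(records[j])).ratio()
            if ratio >= threshold:
                flagged.append((i, j, round(ratio, 3)))
    return flagged


def integrity_issues(records: List[Record], code_length: int,
                     allowed_prefixes: Set[str]) -> List[Tuple[int, str]]:
    """Integrity: null check, length check and enumeration-value check."""
    issues = []
    for idx, (code, meta) in enumerate(records):
        if not code or not meta:
            issues.append((idx, "empty"))
            continue
        if len(code) != code_length:
            issues.append((idx, "length"))
        if code[:2] not in allowed_prefixes:
            issues.append((idx, "enumeration"))
    return issues


def duplicate_records(records: List[Record]) -> Dict[Record, List[int]]:
    """Uniqueness: map each repeated (code, metadata) pair to the indices holding it."""
    counts = Counter(records)
    return {rec: [i for i, r in enumerate(records) if r == rec]
            for rec, n in counts.items() if n > 1}
```

The pairwise similarity loop is quadratic in the number of samples, which is acceptable for a sketch but would need blocking or indexing on a large coding library.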
Preferably, displaying the quality health degree analysis report in a graphical format includes:
drawing and displaying the first evaluation result, the second evaluation result, the third evaluation result and the fourth evaluation result in a first radar chart format respectively;
and drawing and displaying, in a second radar chart format, the quality health degree analysis report obtained by comprehensive analysis of the first, second, third and fourth evaluation results.
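A radar chart places each evaluation score on its own axis radiating from a common center. The sketch below computes the polygon vertices for scores normalized to [0, 1], assuming the first axis points straight up and the others follow clockwise at equal angles; `radar_vertices` is a hypothetical helper, and the resulting points could be handed to any plotting library.

```python
import math
from typing import Dict, List, Tuple


def radar_vertices(scores: Dict[str, float],
                   radius: float = 1.0) -> List[Tuple[str, float, float]]:
    """Place each score (0..1) on its own axis; axis 0 points straight up and
    the rest follow clockwise at equal angles."""
    n = len(scores)
    if n < 3:
        raise ValueError("a radar chart needs at least three axes")
    vertices = []
    for k, (axis, value) in enumerate(scores.items()):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"score for {axis!r} must be in [0, 1]")
        angle = math.pi / 2 - 2 * math.pi * k / n
        r = radius * value
        vertices.append((axis, r * math.cos(angle), r * math.sin(angle)))
    return vertices
```

With four evaluation types the axes sit at 90 degree intervals, so a score of 0.5 on the second axis lands at (0.5, 0).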
Preferably, detecting the stability of each network node after the correspondence is completed includes:
acquiring the number of heartbeat detection timeouts of each network node within a preset time length;
sorting the four network nodes by the number of heartbeat detection timeouts in descending order to obtain a sorting result;
determining the network connection state of each network node in the sorting result;
when every network node is connected, judging that the working states of the four network nodes are normal; when any network node is disconnected, determining the disconnected node as a first target network node, judging that its working state is abnormal, generating an abnormality report for display, and judging that the stability of the first target network node is poor;
when the working state of each network node is judged to be normal, each network node is used as an initiating node;
sending the first resource occupation state of each initiating node to the adjacent network nodes;
forcibly closing the first resource occupation state of each initiating node and confirming whether the first resource occupation state received by the adjacent network node is changed;
if the change occurs, detecting whether a second resource occupation state of the adjacent network node is the same as a first resource occupation state, if so, determining that the adjacent network node is abnormal, and judging that the stability of the adjacent network node is poor, otherwise, determining that the network node is normal;
when the network nodes are confirmed to be normal, the four network nodes are started simultaneously, whether interference conditions occur among the network nodes is confirmed, if yes, second target network nodes with the interference conditions occur are marked, the stability of the second target network nodes is judged to be poor, and otherwise, the network nodes are confirmed to be normal in working mode;
detecting the difference between the target data output by each network node and the preset data, if the target data output by each network node is the same as the preset data, confirming that the precision of the output data of the network node is normal, judging that the stability of each network node is excellent, if the target data output by any network node is different from the preset data, extracting a third target network node with the output target data different from the preset data, and judging that the stability of the third target network node is poor.
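Part of the stability detection can be sketched as follows: sorting nodes by heartbeat timeouts in descending order, flagging disconnected nodes, and comparing each node's output with its preset data. The resource-occupation and interference checks are omitted because the text does not describe how they are measured; `NodeStatus` and `assess_stability` are hypothetical names, and the status values would come from a real monitoring system.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class NodeStatus:
    name: str
    heartbeat_timeouts: int      # timeouts within the preset time length
    connected: bool              # network connection state
    output: Sequence[float]      # data the node produced for a check input


def rank_by_timeouts(nodes: List[NodeStatus]) -> List[NodeStatus]:
    """Sort nodes by heartbeat timeouts, most to fewest (stable for ties)."""
    return sorted(nodes, key=lambda n: n.heartbeat_timeouts, reverse=True)


def assess_stability(nodes: List[NodeStatus],
                     preset_outputs: Dict[str, Sequence[float]]) -> Dict[str, str]:
    """Rate each node: disconnected or wrong output -> poor, otherwise excellent."""
    ratings = {}
    for node in rank_by_timeouts(nodes):
        if not node.connected:
            ratings[node.name] = "poor: disconnected"
        elif list(node.output) != list(preset_outputs[node.name]):
            ratings[node.name] = "poor: output differs from preset data"
        else:
            ratings[node.name] = "excellent"
    return ratings
```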
Preferably, after obtaining the first number of target service data samples and before constructing the data analysis model by using the preset similarity comparison rule, the preset integrity evaluation rule, the preset uniqueness evaluation rule and the preset relevance evaluation rule, the method further includes performing qualification detection on the first number of target service data samples, specifically comprising the following steps:
acquiring a confidentiality coefficient of each target service data sample;
calculating a target security index of each target service data sample according to its confidentiality coefficient:
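[Formula image BDA0003138540800000061, not reproduced in the source text: expression for the target security index $P_i$ in terms of $S_i$, $\Gamma(\cdot)$, $\pi$, $\ln$ and $X_i$]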
where $P_i$ is the target security index of the $i$-th target service data sample, $S_i$ is the degree of freedom of the $i$-th target service data sample, $\Gamma(\cdot)$ is the gamma function, $\pi$ is the ratio of a circle's circumference to its diameter, $\ln$ is the natural logarithm, and $X_i$ is the confidentiality coefficient of the $i$-th target service data sample;
scanning the sample data content of each target service data sample, and determining the integrity and the truth of each target service data sample according to the sample data content of each target service data sample;
calculating a target qualification coefficient of each target service data sample from its target security index, integrity and truth:
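[Formula image BDA0003138540800000071, not reproduced in the source text: expression for the target qualification coefficient $W_i$ in terms of $P_i$, $Q_i$, $U_i$, $\theta_{i1}$, $\theta_{i2}$, $\theta_{i3}$, $N$, $M_i$ and $a$]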
where $\theta_{i1}$ is the weight of the target security index of the $i$-th target service data sample in its target qualification coefficient, $Q_i$ is the integrity of the $i$-th target service data sample, $\theta_{i2}$ is the weight of the integrity in the qualification coefficient, $U_i$ is the truth of the $i$-th target service data sample, $\theta_{i3}$ is the weight of the truth in the qualification coefficient, $N$ is the first number, $M_i$ is the score assigned to the $i$-th target service data sample by a preset scoring rule, with a value in $[0.5, 1]$, $a$ is an error factor of the calculation with a value in $[0.05, 0.1]$, and $W_i$ is the target qualification coefficient of the $i$-th target service data sample;
determining whether the target qualification coefficient of each target service data sample is greater than or equal to a preset qualification coefficient, and counting the third target service data samples whose target qualification coefficient is smaller than the preset qualification coefficient to obtain a target number;
confirming that the target number of third target service data samples fail the qualification detection, and generating a detection report;
and displaying the detection report.
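Because the formula for $W_i$ survives only as a missing image, the sketch below does not attempt to compute it. It covers only the steps that are stated in text: checking the stated ranges of $M_i$ and $a$, counting samples whose coefficient is below the preset qualification coefficient, and producing a report string. The coefficients are assumed to be supplied by the caller, and the function names are hypothetical.

```python
from typing import Dict, Tuple


def validate_parameters(scores: Dict[str, float], error_factor: float) -> None:
    """Check the ranges stated in the text: M_i in [0.5, 1] and a in [0.05, 0.1]."""
    for sample_id, m in scores.items():
        if not 0.5 <= m <= 1.0:
            raise ValueError(f"score M for {sample_id} is outside [0.5, 1]")
    if not 0.05 <= error_factor <= 0.1:
        raise ValueError("error factor a is outside [0.05, 0.1]")


def qualification_report(coefficients: Dict[str, float],
                         preset_coefficient: float) -> Tuple[int, str]:
    """Count samples whose W_i is below the preset coefficient and build a report."""
    failed = sorted(sid for sid, w in coefficients.items() if w < preset_coefficient)
    report = f"{len(failed)} of {len(coefficients)} samples failed qualification detection"
    if failed:
        report += ": " + ", ".join(failed)
    return len(failed), report
```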
A data quality health analysis system based on multidimensional analysis techniques, the system comprising:
the acquisition module is used for acquiring a first number of target service data samples;
the construction module is used for constructing a data analysis model by utilizing a preset similarity contrast rule, a preset integrity evaluation rule, a preset uniqueness evaluation rule and a preset relevance evaluation rule;
the generation module is used for receiving a target evaluation type selected by a target user, analyzing and evaluating the first number of target service data samples by using the data analysis model according to the target evaluation type and generating a quality health degree analysis report;
the display module is used for displaying the quality health degree analysis report in a graphical format;
wherein the target evaluation type is: one or more of a similarity assessment, an integrity assessment, a uniqueness assessment, and an association assessment.
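The four modules of the system can be sketched as small classes wired together in order. This is an illustrative skeleton, not the claimed implementation: the "model" is reduced to one scoring function per evaluation type, and the display module prints a text bar chart as a stand-in for the graphical radar chart output. All class and function names are hypothetical.

```python
from typing import Callable, Dict, Iterable, List

Rule = Callable[[List[dict]], float]
EVALUATION_TYPES = ("similarity", "integrity", "uniqueness", "association")


class AcquisitionModule:
    def acquire(self, source: Iterable[dict], first_number: int) -> List[dict]:
        # Take the first `first_number` samples from the source.
        return list(source)[:first_number]


class ConstructionModule:
    def build(self, rules: Dict[str, Rule]) -> Dict[str, Rule]:
        # The "model" here is just one scoring rule per evaluation type.
        missing = [t for t in EVALUATION_TYPES if t not in rules]
        if missing:
            raise ValueError(f"missing rules: {missing}")
        return {t: rules[t] for t in EVALUATION_TYPES}


class GenerationModule:
    def generate(self, model: Dict[str, Rule], samples: List[dict],
                 selected: List[str]) -> Dict[str, float]:
        # Run only the evaluation types the user selected (one or more).
        unknown = [t for t in selected if t not in model]
        if not selected or unknown:
            raise ValueError(f"invalid evaluation type selection: {selected}")
        return {t: model[t](samples) for t in selected}


class DisplayModule:
    def render(self, report: Dict[str, float]) -> str:
        # Text bar chart as a stand-in for the graphical (radar chart) display.
        return "\n".join(f"{t:<12} {'#' * round(score * 10):<10} {score:.2f}"
                         for t, score in report.items())
```

Keeping each module behind its own small interface mirrors the module split in the claim and lets the scoring rules be swapped without touching acquisition or display.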
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
FIG. 1 is a flowchart of the data quality health degree analysis method based on a multidimensional analysis technology provided by the present invention;
FIG. 2 is another flowchart of the data quality health degree analysis method based on a multidimensional analysis technology provided by the present invention;
FIG. 3 is yet another flowchart of the data quality health degree analysis method based on a multidimensional analysis technology provided by the present invention;
FIG. 4 is a workflow screenshot of the data quality health degree analysis platform based on a multidimensional analysis technology according to the present invention;
FIG. 5 is a functional diagram of the data quality health degree analysis platform based on a multidimensional analysis technology according to the present invention;
FIG. 6 is a screenshot of the data quality health degree analysis dimensions of the data quality health degree analysis platform based on a multidimensional analysis technology according to the present invention;
FIG. 7 is a schematic structural diagram of the data quality health degree analysis system based on a multidimensional analysis technology according to the present invention.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
To solve the problems described in the Background, the present embodiment discloses a data quality health degree analysis method based on a multidimensional analysis technology.
A data quality health degree analysis method based on a multidimensional analysis technology is shown in FIG. 1, and comprises the following steps:
step S101, obtaining a first number of target service data samples;
step S102, constructing a data analysis model by utilizing a preset similarity contrast rule, a preset integrity evaluation rule, a preset uniqueness evaluation rule and a preset relevance evaluation rule;
step S103, receiving a target evaluation type selected by a target user, and analyzing and evaluating the first number of target service data samples by using the data analysis model according to the target evaluation type to generate a quality health degree analysis report;
step S104, displaying the quality health degree analysis report in a graphical format;
wherein the target evaluation type is: one or more of a similarity assessment, an integrity assessment, a uniqueness assessment, and an association assessment.
The working principle of the above technical solution is as follows: a first number of target service data samples is obtained; a data analysis model is constructed by using a preset similarity comparison rule, a preset integrity evaluation rule, a preset uniqueness evaluation rule and a preset relevance evaluation rule; a target evaluation type selected by a target user is received; the first number of target service data samples is analyzed and evaluated with the data analysis model according to the target evaluation type to generate a quality health degree analysis report; and the report is displayed in a graphical format.
The beneficial effects of the above technical solution are as follows. Analyzing the quality health degree of the service data samples as a whole with the data analysis model avoids the labor wasted by manual inspection and allows the data quality of the service data samples to be analyzed accurately, comprehensively and efficiently. Useless data can be eliminated in time so that it does not occupy storage, sparing users its interference and improving their experience. Furthermore, users can choose which aspect of the service data samples to analyze, which further improves the experience, and analyzing one dimension at a time makes the final data quality health degree result more accurate and improves stability.
In one embodiment, as shown in FIG. 2, before obtaining the first number of target service data samples, the method further comprises:
step S201, determining a first number of data samples according to preset conditions;
step S202, determining a state function based on the first number;
step S203, determining a screening condition according to the state function, and screening a first number of initial service data samples meeting the screening condition from a second number of initial service data samples, wherein the second number is greater than the first number;
step S204, determining the first number of initial service data samples as the first number of target service data samples.
The beneficial effects of the above technical solution are: using the state function to determine the screening condition allows the first number of target service data samples that meet the condition to be reasonably screened out of the customer's data, which makes the selected samples more practical and representative, ensures data accuracy, and provides good samples for the subsequent data quality health degree analysis.
In one embodiment, as shown in FIG. 3, constructing the data analysis model by using the preset similarity comparison rule, the preset integrity evaluation rule, the preset uniqueness evaluation rule and the preset association evaluation rule includes:
step S301, constructing an initial network model;
step S302, four network nodes are set in the initial network model;
step S303, respectively corresponding the preset similarity comparison rule, the preset integrity evaluation rule, the preset uniqueness evaluation rule and the preset association evaluation rule to the four network nodes;
step S304, after the correspondence is finished, the stability of each network node is detected;
and S305, when the stability of each network node is qualified, confirming the convergence of the initial network model, and obtaining the data analysis model.
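A minimal sketch of steps S301 to S305 follows, assuming each network node can be modelled as an object that hosts one rule and exposes a stability probe. `RuleNode`, `evaluate`, `is_stable` and `build_data_analysis_model` are hypothetical names; the patent does not say here how a node's stability is measured (one possible procedure is the heartbeat-based embodiment described later).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class RuleNode:
    """A network node hosting exactly one preset rule (step S303)."""
    rule_name: str
    evaluate: Callable[[List[dict]], float]  # hypothetical rule interface
    is_stable: Callable[[], bool]            # hypothetical stability probe (step S304)


def build_data_analysis_model(nodes: List[RuleNode]) -> Dict[str, RuleNode]:
    # Step S302: the initial network model holds four network nodes.
    if len(nodes) != 4:
        raise ValueError("the initial network model must contain four network nodes")
    model = {node.rule_name: node for node in nodes}
    # Step S304: probe every node once the rules have been mapped.
    unstable = [name for name, node in model.items() if not node.is_stable()]
    # Step S305: the model is treated as converged only if every node is stable.
    if unstable:
        raise RuntimeError(f"unstable network nodes: {unstable}")
    return model


rule_names = ["similarity", "integrity", "uniqueness", "association"]
nodes = [RuleNode(name, evaluate=lambda samples: 1.0, is_stable=lambda: True) for name in rule_names]
model = build_data_analysis_model(nodes)
print(sorted(model))  # ['association', 'integrity', 'similarity', 'uniqueness']
```

Keeping one rule per node mirrors the stated goal of letting each node complete one analysis item independently.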
The beneficial effects of the above technical scheme are as follows: mapping each rule to its own network node lets each node independently complete one analysis item for the service data samples, which prevents the final analysis result from being confused by mixing several analysis items together and further improves stability.
In one embodiment, before receiving the target evaluation type selected by the target user, analyzing and evaluating the first number of target service data samples with the data analysis model according to the target evaluation type, and generating the quality health degree analysis report, the method further includes performing authenticity detection on the first number of target service data samples, which comprises:
segmenting each target service data sample to obtain a plurality of data segments;
applying hash-function processing to each data segment of each target service data sample to obtain a hash value of each data segment;
acquiring a source weight of each target service data sample according to the plurality of hash values of that sample;
calculating the target truth degree of each target service data sample with a preset truth-degree algorithm according to the plurality of hash values and the source weight of that sample;
deleting the first target service data samples whose target truth degree is smaller than a preset truth degree, and retaining the second target service data samples whose target truth degree is greater than or equal to the preset truth degree;
counting the second target service data samples to obtain a third number of second target service data samples.
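The authenticity detection steps can be sketched as below. Several details are assumptions rather than the patent's method: SHA-256 from Python's standard `hashlib` stands in for the unspecified hash function, fixed-size character chunks stand in for the segmentation rule, the source weight is passed in directly instead of being derived from the hash values, and the preset truth-degree algorithm is replaced by a hypothetical score (the share of segment hashes found in a trusted registry, scaled by the source weight).

```python
import hashlib
from typing import Dict, List, Set, Tuple


def segment_hashes(content: str, size: int = 8) -> List[str]:
    """Split a serialized sample into fixed-size segments and hash each one."""
    segments = [content[i:i + size] for i in range(0, len(content), size)]
    # SHA-256 stands in for the unspecified hash function.
    return [hashlib.sha256(seg.encode("utf-8")).hexdigest() for seg in segments]


def truth_degree(hashes: List[str], trusted: Set[str], source_weight: float) -> float:
    """Hypothetical truth-degree score: share of trusted segment hashes times source weight."""
    if not hashes:
        return 0.0
    matched = sum(1 for h in hashes if h in trusted)
    return source_weight * matched / len(hashes)


def authenticity_filter(
    samples: Dict[str, str],
    source_weights: Dict[str, float],
    trusted: Set[str],
    preset_truth: float,
) -> Tuple[List[str], int]:
    kept = []
    for sample_id, content in samples.items():
        score = truth_degree(segment_hashes(content), trusted, source_weights[sample_id])
        # Samples below the preset truth degree are dropped; the others are retained.
        if score >= preset_truth:
            kept.append(sample_id)
    # The number of retained samples plays the role of the "third number".
    return kept, len(kept)


reference = "code=A01;name=pump;unit=pcs"
trusted = set(segment_hashes(reference))
samples = {"s1": reference, "s2": "code=A01;name=pmp;unit=kg"}
kept, third_number = authenticity_filter(samples, {"s1": 1.0, "s2": 1.0}, trusted, preset_truth=0.9)
print(kept, third_number)  # ['s1'] 1
```

Only the first 8-character segment of `s2` matches the reference, so its score is 0.25 and it is dropped.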
The beneficial effects of the above technical scheme are as follows: authenticity detection on the service data samples further ensures data accuracy, and because the authenticity evaluation uses the unique hash values of each target service data sample, the authenticity of each sample can be calculated more truthfully and accurately, which improves security.
In one embodiment, before receiving the target evaluation type selected by the target user, analyzing and evaluating the first number of target service data samples with the data analysis model according to the target evaluation type, and generating the quality health degree analysis report, the method further includes inspecting the data analysis model, which comprises:
acquiring a fourth number of preset service data samples;
predetermining the first integrity of each preset service data sample, the first similarity between each preset service data sample and the other preset service data samples, the first uniqueness of each preset service data sample, and the first relevance between each preset service data sample and the other preset service data samples, to obtain a first determination result;
inputting the fourth number of preset service data samples into the data analysis model, and receiving the second integrity of each preset service data sample, the second similarity between each preset service data sample and the other preset service data samples, the second uniqueness of each preset service data sample, and the second relevance between each preset service data sample and the other preset service data samples output by the data analysis model, to obtain a second determination result;
confirming whether the first determination result is the same as the second determination result; if so, confirming that the data analysis model is accurate; otherwise, confirming that the data output by the data analysis model deviates, and sending a prompt to the target user to repair the data analysis model.
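The inspection of the data analysis model reduces to comparing two result tables. A minimal sketch, assuming each determination result is stored as a mapping from sample ID to metric values (a hypothetical layout) and using `inspect_model` as an illustrative name:

```python
from typing import Dict, List, Tuple

# Hypothetical layout: sample id -> {metric name -> value}.
Results = Dict[str, Dict[str, object]]


def inspect_model(first_result: Results, second_result: Results) -> Tuple[bool, str]:
    """Compare predetermined results (first) with the model's outputs (second)."""
    mismatches: List[Tuple[str, str]] = sorted(
        (sample_id, metric)
        for sample_id, expected in first_result.items()
        for metric, value in expected.items()
        if second_result.get(sample_id, {}).get(metric) != value
    )
    if not mismatches:
        return True, "data analysis model is accurate"
    # Any deviation produces a repair prompt for the target user.
    return False, f"model output deviates at {mismatches}; please repair the data analysis model"


expected = {"p1": {"integrity": 1.0, "uniqueness": True}, "p2": {"integrity": 0.5, "uniqueness": False}}
observed = {"p1": {"integrity": 1.0, "uniqueness": True}, "p2": {"integrity": 0.75, "uniqueness": False}}
ok, message = inspect_model(expected, observed)
print(ok)  # False
```

Listing the mismatching (sample, metric) pairs, rather than returning a bare flag, gives the repair prompt something concrete to point at.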
The beneficial effects of the above technical scheme are as follows: checking the analysis model ensures that its final quality health degree analysis results match the actual results, avoids failures to identify useless data, and further improves stability and the user experience.
In one embodiment, receiving the target evaluation type selected by the target user, and analyzing and evaluating the first number of target service data samples with the data analysis model according to the target evaluation type to generate the quality health degree analysis report, includes:
recommending four preset evaluation types to the target user;
receiving the target evaluation type selected by the target user from the four preset evaluation types;
when the target evaluation type is similarity evaluation, extracting the classification code and metadata of each of the first number of target service data samples, and evaluating the similarity between the classification code and metadata of each target service data sample and those of the other target service data samples with a similarity algorithm based on lexical and syntactic analysis, to generate a first evaluation result;
when the target evaluation type is integrity evaluation, performing integrity detection on the classification code and metadata of each target service data sample, the detection comprising a null check, a data length check, a data enumeration value check and a data consistency check, to generate a second evaluation result;
when the target evaluation type is uniqueness evaluation, detecting whether the classification code and metadata of each target service data sample are unique; if so, confirming that the first number of target service data samples pass the uniqueness detection; otherwise, extracting the repeated target classification codes and target metadata together with the defective target service data samples to which they belong, to generate a third evaluation result;
when the target evaluation type is relevance evaluation, evaluating the relevance between the classification code and metadata of each target service data sample and those of the other target service data samples, to obtain a fourth evaluation result;
performing a comprehensive analysis of the first, second, third and fourth evaluation results to obtain the quality health degree analysis report.
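The per-type evaluations can be sketched as below. This is an illustration under assumptions, not the patent's algorithms: `difflib.SequenceMatcher` from the Python standard library is swapped in for the unspecified lexical/syntactic similarity algorithm; the record fields (`code`, `metadata`, `unit`, `category`), the similarity threshold, the length limit, the enumeration set and the consistency rule are all hypothetical; and relevance evaluation is omitted because no criterion is given.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Set, Tuple


def _text(sample: dict) -> str:
    # Classification code and metadata compared together as one string.
    return f"{sample['code']} {sample['metadata']}"


def similarity_evaluation(samples: List[dict], threshold: float = 0.8) -> List[Tuple[str, str]]:
    """Pairs whose difflib similarity ratio reaches the (assumed) threshold."""
    return [
        (a["code"], b["code"])
        for a, b in combinations(samples, 2)
        if SequenceMatcher(None, _text(a), _text(b)).ratio() >= threshold
    ]


def integrity_evaluation(sample: dict, max_len: int, allowed_units: Set[str]) -> Dict[str, bool]:
    """Null, length, enumeration-value and consistency checks (all rules assumed)."""
    return {
        "not_empty": bool(sample["code"]) and bool(sample["metadata"]),
        "length_ok": len(sample["metadata"]) <= max_len,
        "enum_ok": sample.get("unit") in allowed_units,
        # Hypothetical consistency rule: code prefix agrees with the category initial.
        "consistent": sample["code"][:1] == sample.get("category", "")[:1],
    }


def uniqueness_evaluation(samples: List[dict]) -> List[dict]:
    """Return the defective samples whose (code, metadata) pair is repeated."""
    counts = Counter((s["code"], s["metadata"]) for s in samples)
    return [s for s in samples if counts[(s["code"], s["metadata"])] > 1]


samples = [
    {"code": "P001", "metadata": "centrifugal pump 5kW", "unit": "pcs", "category": "Pump"},
    {"code": "P002", "metadata": "centrifugal pump 5 kW", "unit": "pcs", "category": "Pump"},
    {"code": "V001", "metadata": "gate valve DN50", "unit": "set", "category": "Valve"},
    {"code": "V001", "metadata": "gate valve DN50", "unit": "box", "category": "Valve"},
]
print(similarity_evaluation(samples))  # [('P001', 'P002'), ('V001', 'V001')]
print([s["unit"] for s in uniqueness_evaluation(samples)])  # ['set', 'box']
```

An empty list from `uniqueness_evaluation` corresponds to all samples passing the uniqueness detection.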
The beneficial effects of the above technical scheme are as follows: analyzing the target service data samples from all angles, obtaining several evaluation results, and then generating the quality health degree analysis report from a comprehensive analysis of those results keeps the evaluation of each item independent and unaffected by the other items. This guarantees the accuracy of each evaluation result and, in turn, of the final quality health degree analysis report.
In one embodiment, displaying the quality health degree analysis report in a graphical format comprises:
drawing and displaying each of the first, second, third and fourth evaluation results as a first radar chart;
drawing and displaying, as a second radar chart, the quality health degree analysis report obtained from the comprehensive analysis of the first, second, third and fourth evaluation results.
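Plotting evaluation results as a radar chart amounts to placing one normalised score on each spoke. The sketch below computes only the polygon vertices and leaves rendering to any plotting library; it assumes scores in [0, 1], with spokes starting at 12 o'clock and running clockwise. `radar_vertices` and the dimension names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def radar_vertices(scores: Dict[str, float], radius: float = 1.0) -> List[Tuple[str, float, float]]:
    """Return (dimension, x, y) for each score placed on its own radar-chart spoke."""
    n = len(scores)
    vertices = []
    for k, (name, score) in enumerate(scores.items()):
        # Spokes start at 12 o'clock and proceed clockwise.
        angle = math.pi / 2 - 2 * math.pi * k / n
        # Clamp to [0, 1] so an out-of-range score cannot leave the chart.
        r = radius * min(max(score, 0.0), 1.0)
        vertices.append((name, round(r * math.cos(angle), 6), round(r * math.sin(angle), 6)))
    return vertices


scores = {"similarity": 0.9, "integrity": 0.8, "uniqueness": 1.0, "association": 0.5}
for vertex in radar_vertices(scores):
    print(vertex)
```

The same helper serves both chart types: per-result scores for the first radar chart and the combined scores for the second.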
The beneficial effects of the above technical scheme are as follows: the radar charts display the detection items in the evaluation results of the first number of target service data samples accurately and comprehensively, so that the user can read and understand the quality health degree analysis report at a glance, further improving the user experience.
In one embodiment, detecting the stability of each network node after the mapping is finished includes:
acquiring the number of heartbeat-detection timeouts of each network node within a preset duration;
sorting the four network nodes in descending order of heartbeat-detection timeouts to obtain a sorting result;
determining the network connection state of each network node in the sorting result;
when every network node is connected normally, judging that the working states of the four network nodes are normal; when any network node is disconnected, identifying the disconnected node as a first target network node, judging its working state to be abnormal, generating and displaying an anomaly report, and judging the stability of the first target network node to be poor;
when the working state of every network node is judged normal, using each network node as an initiating node;
sending the first resource occupation state of each initiating node to its adjacent network nodes;
forcibly closing the first resource occupation state of each initiating node, and confirming whether the first resource occupation state received by the adjacent network node changes;
if it changes, detecting whether the second resource occupation state of the adjacent network node is the same as the first resource occupation state; if so, determining that the adjacent network node is abnormal and judging its stability to be poor; otherwise, determining that the network node is normal;
when the network nodes are confirmed normal, starting the four network nodes simultaneously and confirming whether interference occurs between them; if so, marking the second target network nodes where interference occurs and judging their stability to be poor; otherwise, confirming that the network nodes work normally;
comparing the target data output by each network node with preset data; if the target data output by every network node is the same as the preset data, confirming that the output precision of the network nodes is normal and judging the stability of every network node to be excellent; if the target data output by any network node differs from the preset data, extracting that node as a third target network node and judging its stability to be poor.
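A simplified sketch of these stability checks, assuming each node's heartbeat timeouts, connection state, interference flag and output have already been collected into a `NodeStatus` record (a hypothetical type). It ranks nodes by timeouts and then applies the disconnection, interference and output-comparison tests in turn. The resource-occupation probe is omitted, and unlike the described flow each node is judged independently rather than gating later tests on all nodes passing the earlier ones.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NodeStatus:
    name: str
    heartbeat_timeouts: int   # timeouts counted within the preset duration
    connected: bool           # current network connection state
    output: object = None     # target data produced by the node on a test input
    interferes: bool = False  # result of the simultaneous start-up test (assumed given)


def assess_stability(nodes: List[NodeStatus], preset_outputs: Dict[str, object]) -> Dict[str, str]:
    verdict: Dict[str, str] = {}
    # Rank nodes from the most to the fewest heartbeat timeouts.
    for node in sorted(nodes, key=lambda n: n.heartbeat_timeouts, reverse=True):
        if not node.connected:
            verdict[node.name] = "poor: disconnected"
        elif node.interferes:
            verdict[node.name] = "poor: interference"
        elif node.output != preset_outputs.get(node.name):
            verdict[node.name] = "poor: output differs from preset data"
        else:
            verdict[node.name] = "excellent"
    return verdict


nodes = [
    NodeStatus("similarity", 0, True, 0.9),
    NodeStatus("integrity", 3, False, 0.8),
    NodeStatus("uniqueness", 1, True, 1.0, interferes=True),
    NodeStatus("association", 2, True, 0.4),
]
preset = {"similarity": 0.9, "integrity": 0.8, "uniqueness": 1.0, "association": 0.5}
for name, result in assess_stability(nodes, preset).items():
    print(name, result)
```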
The beneficial effects of the above technical scheme are as follows: judging the stability of each target network node from multiple angles determines, at a macro level, whether its work meets actual requirements, which reduces risk and guarantees the working performance of each node. This further guarantees the accuracy of the subsequent quality health degree evaluation of the target service data samples, improves the stability of the model, and enables the data analysis model to evaluate the quality health degree of large numbers of service data samples, improving working efficiency.
In one embodiment, after the first number of target service data samples is obtained and before the data analysis model is constructed using the preset similarity comparison rule, the preset integrity evaluation rule, the preset uniqueness evaluation rule and the preset association evaluation rule, the method further includes performing qualification detection on the first number of target service data samples, which specifically comprises:
acquiring a confidentiality coefficient of each target service data sample;
calculating a target security index of each target service data sample according to its confidentiality coefficient:
[Equation image: target security index $P_i$, expressed in terms of $S_i$, $X_i$, $\Gamma(\cdot)$, $\pi$ and $\ln$]
where $P_i$ denotes the target security index of the $i$-th target service data sample, $S_i$ denotes the degree of freedom of the $i$-th target service data sample, $\Gamma(\cdot)$ denotes the gamma function, $\pi$ denotes the ratio of a circle's circumference to its diameter, $\ln$ denotes the natural logarithm, and $X_i$ denotes the confidentiality coefficient of the $i$-th target service data sample;
scanning the sample data content of each target service data sample, and determining the integrity and truth degree of each target service data sample from that content;
calculating a target qualification coefficient of each target service data sample by using its target security index, integrity and truth degree:
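[Equation image: target qualification coefficient $W_i$, expressed in terms of $P_i$, $Q_i$, $U_i$, $\theta_{i1}$, $\theta_{i2}$, $\theta_{i3}$, $N$, $M_i$ and $a$]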
where $\theta_{i1}$ denotes the weight of the target security index of the $i$-th target service data sample in the calculation of its target qualification coefficient, $Q_i$ denotes the integrity of the $i$-th target service data sample, $\theta_{i2}$ denotes the weight of the integrity of the $i$-th target service data sample in that calculation, $U_i$ denotes the truth degree of the $i$-th target service data sample, $\theta_{i3}$ denotes the weight of the truth degree of the $i$-th target service data sample in that calculation, $N$ denotes the first number, $M_i$ denotes the score assigned to the $i$-th target service data sample by a preset scoring rule, with $M_i \in [0.5, 1]$, $a$ denotes an error factor in the calculation, with $a \in [0.05, 0.1]$, and $W_i$ denotes the target qualification coefficient of the $i$-th target service data sample;
determining whether the target qualification coefficient of each target service data sample is greater than or equal to a preset qualification coefficient, and counting the third target service data samples whose target qualification coefficient is smaller than the preset qualification coefficient, the count being a target number;
confirming that the target number of third target service data samples fail the qualification detection, and generating a detection report;
and displaying the detection report.
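The final threshold step of the qualification detection can be sketched independently of the $W_i$ formula. In this minimal sketch the target qualification coefficients are taken as inputs rather than computed, and `qualification_detection` and the report field names are hypothetical.

```python
from typing import Dict


def qualification_detection(coefficients: Dict[str, float], preset_coefficient: float) -> Dict[str, object]:
    """Flag samples whose target qualification coefficient W_i is below the preset value."""
    failed = sorted(sample_id for sample_id, w in coefficients.items() if w < preset_coefficient)
    return {
        "total": len(coefficients),
        "failed_count": len(failed),  # the "target number" of third samples
        "failed_samples": failed,     # candidates the user may replace
        "passed": not failed,
    }


report = qualification_detection({"s1": 0.92, "s2": 0.41, "s3": 0.75}, preset_coefficient=0.6)
print(report)  # {'total': 3, 'failed_count': 1, 'failed_samples': ['s2'], 'passed': False}
```

Returning the failing sample IDs, not just their count, is what lets the displayed report support selective replacement.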
The beneficial effects of the above technical scheme are as follows: the target security index of each target service data sample gives a rough estimate of its integrity and truth degree, since data with higher security tends to have higher integrity and truth degree. Calculating the qualification coefficient from the integrity, truth degree and security index therefore combines an external perspective with the sample's own parameters to decide whether the sample is qualified, ensuring the accuracy of the final qualification detection. Furthermore, displaying the target number of unqualified third target service data samples lets the user selectively replace them, which further ensures the accuracy of the subsequent quality health degree evaluation and provides qualified, complete data samples for it.
In one embodiment, as shown in figs. 4-6, the following is included:
a data quality health degree analysis platform based on the multidimensional analysis technology uses the method of the invention. Its workflow comprises obtaining service data through an entity data model, determining the dynamic data, namely the main data, within the service data, analyzing the health of the service data according to the similarity rule, integrity rule, uniqueness rule and association rule in the data analysis model, displaying the analysis result in a graphical format, and generating a data quality analysis report.
The platform also provides the following functions:
supporting configuration of matching conditions for duplicate codes;
supporting periodic duplicate-code checks on main data and providing a list of duplicate main-data codes;
supporting exact duplicate checking with configurable duplicate-checking rules;
supporting the establishment of a unified review process;
supporting publication of the duplicate code list and collection of opinions on it: the main-data duplicate list is disclosed only to the subsidiaries or business units whose business systems use the main data to be deleted;
performing multiple checks on the data through configurable data-checking conditions;
supporting batch export of the main-data duplicate code list;
supporting tracking of how each business system handles the issued duplicate code list: establishing a mapping between the original and recoded main-data codes, and tracking the business processing status of the deleted main data (including uncleared business and the main-data processing state);
establishing data constraint rules;
implementing a mandatory-field check;
implementing a related-field check;
supporting periodic main-data health degree analysis, checking for duplicate main-data codes, and providing a list of duplicate main-data codes;
supporting review, publication, opinion collection, release and export of the duplicate code list;
supporting processing and tracking of the issued duplicate code list.
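Duplicate-code checking on main data and the code mapping used to track deleted entries can be sketched as follows. The matching condition (descriptions equal after lower-casing and collapsing whitespace) and the policy of keeping the lexicographically smallest code in each group are assumptions chosen for illustration; `duplicate_code_list` and `recode_mapping` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List


def duplicate_code_list(records: Dict[str, str]) -> List[List[str]]:
    """Group main-data codes whose normalised descriptions coincide."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for code, description in records.items():
        # Assumed matching condition: equal after lower-casing and collapsing whitespace.
        key = " ".join(description.lower().split())
        groups[key].append(code)
    return [sorted(codes) for codes in groups.values() if len(codes) > 1]


def recode_mapping(duplicates: List[List[str]]) -> Dict[str, str]:
    """Map each code to be deleted onto the retained code of its group."""
    # Assumed policy: keep the smallest code; the rest are deleted and tracked.
    return {code: group[0] for group in duplicates for code in group[1:]}


records = {"M003": "Gate Valve DN50", "M001": "gate valve  dn50", "M002": "Pump 5kW"}
duplicates = duplicate_code_list(records)
print(duplicates)                  # [['M001', 'M003']]
print(recode_mapping(duplicates))  # {'M003': 'M001'}
```

A business system holding references to `M003` could use the mapping to find the retained code while its outstanding business is cleared.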
The data management platform supports multiple check rules, which can also be customized, such as value-range checks, associated auxiliary-table checks, regular-expression checks, homonym-library checks and custom rule checks.
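Configurable check rules can be modelled as named predicates over a record. The sketch below covers value-range, regular-expression, auxiliary-table and custom checks using Python's standard `re` module; the homonym-library check is not shown, and every function, rule and field name is hypothetical.

```python
import re
from typing import Callable, Dict, List, Tuple

Rule = Callable[[dict], bool]


def value_range(field: str, low: float, high: float) -> Rule:
    def check(record: dict) -> bool:
        value = record.get(field)
        return isinstance(value, (int, float)) and low <= value <= high
    return check


def pattern(field: str, regex: str) -> Rule:
    compiled = re.compile(regex)
    return lambda record: compiled.fullmatch(str(record.get(field, ""))) is not None


def in_aux_table(field: str, table: set) -> Rule:
    return lambda record: record.get(field) in table


def run_checks(records: List[dict], rules: Dict[str, Rule]) -> List[Tuple[int, str]]:
    """Return (record index, rule name) for every failed check."""
    return [(i, name) for i, record in enumerate(records) for name, rule in rules.items() if not rule(record)]


rules: Dict[str, Rule] = {
    "price_range": value_range("price", 0, 10_000),
    "code_format": pattern("code", r"[A-Z]\d{3}"),
    "unit_table": in_aux_table("unit", {"pcs", "set"}),
    "custom_name": lambda record: bool(record.get("name", "").strip()),  # custom rule
}
records = [
    {"code": "P001", "price": 120, "unit": "pcs", "name": "pump"},
    {"code": "p01", "price": -5, "unit": "box", "name": " "},
]
print(run_checks(records, rules))
# [(1, 'price_range'), (1, 'code_format'), (1, 'unit_table'), (1, 'custom_name')]
```

Because rules are plain entries in a dictionary, adding a customized check means registering one more named predicate.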
Input selection supports value-list template selection, user-defined auxiliary-table selection, and uploading of arbitrary attachments for data health degree analysis.
Health degree analysis parameters can be configured, enabling routine status monitoring and analysis of the standard code library; status analysis reports for each main-data code library are generated according to the health degree parameter model, and a list of data to be processed is provided as a basis for data cleansing.
The beneficial effects of the above technical scheme are as follows: the data quality management platform configures quality control and analysis parameters for different types of data models, performs routine quality monitoring and management of different types of standard data, supports both exact and fuzzy duplicate checking, and provides a variety of configurable data-checking functions, including checks of data uniqueness, integrity and consistency.
The embodiment also discloses a data quality health degree analysis system based on the multidimensional analysis technology. As shown in fig. 7, the system includes:
an obtaining module 701, configured to obtain a first number of target service data samples;
a building module 702, configured to build a data analysis model by using a preset similarity comparison rule, a preset integrity evaluation rule, a preset uniqueness evaluation rule, and a preset association evaluation rule;
a generating module 703, configured to receive a target evaluation type selected by a target user, analyze and evaluate the first number of target service data samples with the data analysis model according to the target evaluation type, and generate a quality health degree analysis report;
a display module 704, configured to display the quality health analysis report in a graphical format;
wherein the target evaluation type is: one or more of a similarity assessment, an integrity assessment, a uniqueness assessment, and an association assessment.
The working principle and beneficial effects of the above technical solution have been explained in the method claims and are not repeated here.
Those skilled in the art will understand that the terms "first" and "second" in the present invention refer to different stages of application.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (6)

1. A data quality health degree analysis method based on a multidimensional analysis technology is characterized by comprising the following steps:
acquiring a first number of target service data samples;
constructing a data analysis model by utilizing a preset similarity comparison rule, a preset integrity evaluation rule, a preset uniqueness evaluation rule and a preset relevance evaluation rule;
receiving a target evaluation type selected by a target user, and analyzing and evaluating the first number of target service data samples by using the data analysis model according to the target evaluation type to generate a quality health degree analysis report;
displaying the quality health degree analysis report in a graphical format;
wherein the target evaluation type is: one or more of a similarity assessment, an integrity assessment, a uniqueness assessment, and an association assessment;
after obtaining the first number of target service data samples and before constructing the data analysis model using the preset similarity comparison rule, the preset integrity evaluation rule, the preset uniqueness evaluation rule and the preset association evaluation rule, the method further includes performing qualification detection on the first number of target service data samples, which specifically comprises:
acquiring a confidentiality coefficient of each target service data sample;
calculating a target security index of each target service data sample according to its confidentiality coefficient:
[Equation image: target security index $P_i$, expressed in terms of $S_i$, $X_i$, $\Gamma(\cdot)$, $\pi$ and $\ln$]
where $P_i$ denotes the target security index of the $i$-th target service data sample, $S_i$ denotes the degree of freedom of the $i$-th target service data sample, $\Gamma(\cdot)$ denotes the gamma function, $\pi$ denotes the ratio of a circle's circumference to its diameter, $\ln$ denotes the natural logarithm, and $X_i$ denotes the confidentiality coefficient of the $i$-th target service data sample;
scanning the sample data content of each target service data sample, and determining the integrity and truth degree of each target service data sample from that content;
calculating a target qualification coefficient of each target service data sample by using its target security index, integrity and truth degree:
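[Equation image: target qualification coefficient $W_i$, expressed in terms of $P_i$, $Q_i$, $U_i$, $\theta_{i1}$, $\theta_{i2}$, $\theta_{i3}$, $N$, $M_i$ and $a$]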
where $\theta_{i1}$ denotes the weight of the target security index of the $i$-th target service data sample in the calculation of its target qualification coefficient, $Q_i$ denotes the integrity of the $i$-th target service data sample, $\theta_{i2}$ denotes the weight of the integrity of the $i$-th target service data sample in that calculation, $U_i$ denotes the truth degree of the $i$-th target service data sample, $\theta_{i3}$ denotes the weight of the truth degree of the $i$-th target service data sample in that calculation, $N$ denotes the first number, $M_i$ denotes the score assigned to the $i$-th target service data sample by a preset scoring rule, with $M_i \in [0.5, 1]$, $a$ denotes an error factor in the calculation, with $a \in [0.05, 0.1]$, and $W_i$ denotes the target qualification coefficient of the $i$-th target service data sample;
determining whether the target qualification coefficient of each target service data sample is greater than or equal to a preset qualification coefficient, and counting the third target service data samples whose target qualification coefficient is smaller than the preset qualification coefficient, the count being a target number;
confirming that the target number of third target service data samples fail the qualification detection, and generating a detection report;
and displaying the detection report.
2. The method of claim 1, wherein prior to obtaining the first number of target business data samples, the method further comprises:
determining a first number of data samples according to a preset condition;
determining a state function based on the first number;
determining a screening condition according to the state function, and screening a first number of initial service data samples meeting the screening condition from a second number of initial service data samples, wherein the second number is greater than the first number;
determining the first number of initial service data samples as the first number of target service data samples.
3. The method of claim 1, wherein before receiving a target evaluation type selected by a target user, analyzing and evaluating the first number of target service data samples with the data analysis model according to the target evaluation type, and generating a quality health degree analysis report, the method further comprises inspecting the data analysis model, which comprises the steps of:
acquiring a fourth number of preset service data samples;
predetermining the first integrity of each preset service data sample, the first similarity of each preset service data sample and other preset service data samples, the first uniqueness of each preset service data sample and the first relevance of each preset service data sample and other preset service data samples, and obtaining a first determination result;
inputting the fourth number of preset service data samples into the data analysis model, and receiving the second integrity of each preset service data sample, the second similarity between each preset service data sample and the other preset service data samples, the second uniqueness of each preset service data sample, and the second relevance between each preset service data sample and the other preset service data samples output by the data analysis model, to obtain a second determination result;
confirming whether the first determination result is the same as the second determination result; if so, confirming that the data analysis model is accurate; otherwise, confirming that the data output by the data analysis model deviates, and sending a prompt to the target user to repair the data analysis model.
4. The method as claimed in claim 1, wherein receiving a target evaluation type selected by a target user, and analyzing and evaluating the first number of target service data samples with the data analysis model according to the target evaluation type to generate a quality health degree analysis report, includes:
recommending four preset evaluation types to the target user;
receiving the target evaluation type selected by the target user from the four preset evaluation types;
when the target evaluation type is similarity evaluation, extracting the classification code and metadata of each of the first number of target service data samples, and evaluating the similarity between the classification code and metadata of each target service data sample and those of the other target service data samples with a similarity algorithm based on lexical and syntactic analysis, to generate a first evaluation result;
when the target evaluation type is integrity evaluation, performing integrity detection on the classification code and metadata of each target service data sample, the detection comprising a null check, a data length check, a data enumeration value check and a data consistency check, to generate a second evaluation result;
when the target evaluation type is uniqueness evaluation, detecting whether the classification code and metadata of each target service data sample are unique; if so, confirming that the first number of target service data samples pass the uniqueness detection; otherwise, extracting the repeated target classification codes and target metadata together with the defective target service data samples to which they belong, to generate a third evaluation result;
when the target evaluation type is relevance evaluation, evaluating the relevance between the classification code and metadata of each target service data sample and those of the other target service data samples, to obtain a fourth evaluation result;
performing a comprehensive analysis of the first, second, third and fourth evaluation results to obtain the quality health degree analysis report.
5. The method of claim 4, wherein the displaying the quality health analysis report in a graphical format comprises:
drawing and displaying each of the first, second, third and fourth evaluation results as a first radar chart;
drawing and displaying, as a second radar chart, the quality health degree analysis report obtained from the comprehensive analysis of the first, second, third and fourth evaluation results.
6. A data quality health analysis system based on multidimensional analysis techniques, the system comprising:
the acquisition module is used for acquiring a first number of target service data samples;
the construction module is used for constructing a data analysis model by utilizing a preset similarity comparison rule, a preset integrity evaluation rule, a preset uniqueness evaluation rule and a preset relevance evaluation rule;
the generation module is used for receiving a target evaluation type selected by a target user, analyzing and evaluating the first number of target service data samples by using the data analysis model according to the target evaluation type and generating a quality health degree analysis report;
the display module is used for displaying the quality health degree analysis report in a graphical format;
wherein the target evaluation type is: one or more of a similarity assessment, an integrity assessment, a uniqueness assessment, and an association assessment;
after obtaining the first number of target service data samples, before constructing a data analysis model using a preset similarity comparison rule, a preset integrity evaluation rule, a preset uniqueness evaluation rule, and a preset association evaluation rule, the method further includes: and performing qualification detection on the first number of target service data samples, wherein the method specifically comprises the following steps:
acquiring a confidentiality coefficient of each target service data sample;
calculating a target security index of each target service data sample from its confidentiality coefficient:
[Formula image FDA0003138540790000051: $P_i$ as a function of $S_i$ and $X_i$]
wherein $P_i$ denotes the target security index of the i-th target service data sample, $S_i$ denotes the degrees of freedom of the i-th target service data sample, $\Gamma$ denotes the gamma function, $\pi$ denotes the constant pi, $\ln$ denotes the natural logarithm, and $X_i$ denotes the confidentiality coefficient of the i-th target service data sample;
scanning the sample data content of each target service data sample, and determining the integrity and authenticity of each target service data sample from its sample data content;
calculating a target qualification coefficient of each target service data sample from its target security index, integrity, and authenticity:
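[Formula image FDA0003138540790000061: $W_i$ in terms of $P_i$, $Q_i$, $U_i$, the weights $\theta_{i1}$, $\theta_{i2}$, $\theta_{i3}$, $N$, $M_i$, and $a$]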
wherein $\theta_{i1}$ denotes the weight of the target security index of the i-th target service data sample in the calculation of its qualification coefficient, $Q_i$ denotes the integrity of the i-th target service data sample, $\theta_{i2}$ denotes the weight of the integrity of the i-th target service data sample in the calculation of its qualification coefficient, $U_i$ denotes the authenticity of the i-th target service data sample, $\theta_{i3}$ denotes the weight of the authenticity of the i-th target service data sample in the calculation of its qualification coefficient, $N$ denotes the first number, $M_i$ denotes the score assigned to the i-th target service data sample by a preset scoring rule, with a value in [0.5, 1], $a$ denotes an error factor in the calculation, with a value in [0.05, 0.1], and $W_i$ denotes the target qualification coefficient of the i-th target service data sample;
determining whether the target qualification coefficient of each target service data sample is greater than or equal to a preset qualification coefficient, and counting the third target service data samples whose target qualification coefficient is less than the preset qualification coefficient;
confirming that this target number of third target service data samples fail the qualification check, and generating a detection report;
and displaying the detection report.
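The final steps of the qualification check in claim 6 (comparing each $W_i$ with the preset value and counting the failures) are simple enough to sketch. Because the claim gives $W_i$ only as a formula image, the Python sketch below assumes the qualification coefficients have already been computed and takes them as input. The names `qualification_check` and `DetectionReport`, the sample IDs, and the threshold value are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DetectionReport:
    """Hypothetical shape of the detection report in claim 6."""
    total: int             # N, the first number of samples checked
    failed_ids: List[str]  # the 'third' samples whose W_i is below the threshold

    @property
    def failed_count(self) -> int:
        """The 'target number' of samples that fail the qualification check."""
        return len(self.failed_ids)

    def render(self) -> str:
        """Plain-text summary suitable for display."""
        listed = ", ".join(self.failed_ids) or "none"
        return f"{self.failed_count} of {self.total} samples failed the qualification check: {listed}"


def qualification_check(coefficients: Dict[str, float], threshold: float) -> DetectionReport:
    """Flag every sample whose qualification coefficient W_i is below `threshold`.

    `coefficients` maps a sample ID to its precomputed W_i; samples with
    W_i >= threshold pass, mirroring the claim's "greater than or equal to" test.
    """
    failed = [sample_id for sample_id, w in coefficients.items() if w < threshold]
    return DetectionReport(total=len(coefficients), failed_ids=failed)


# Hypothetical W_i values and threshold, for illustration only.
report = qualification_check({"s1": 0.92, "s2": 0.61, "s3": 0.75, "s4": 0.40}, threshold=0.7)
print(report.render())
# 2 of 4 samples failed the qualification check: s2, s4
```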
CN202110725686.0A 2020-11-27 2020-11-27 Data quality health degree analysis method and system based on multidimensional analysis technology Active CN113434485B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110725686.0A CN113434485B (en) 2020-11-27 2020-11-27 Data quality health degree analysis method and system based on multidimensional analysis technology

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011362385.8A CN112380190B (en) 2020-11-27 2020-11-27 Data quality health degree analysis method and system based on multidimensional analysis technology
CN202110725686.0A CN113434485B (en) 2020-11-27 2020-11-27 Data quality health degree analysis method and system based on multidimensional analysis technology

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN202011362385.8A Division CN112380190B (en) 2020-11-27 2020-11-27 Data quality health degree analysis method and system based on multidimensional analysis technology

Publications (2)

Publication Number Publication Date
CN113434485A true CN113434485A (en) 2021-09-24
CN113434485B CN113434485B (en) 2021-12-07

Family

ID=74588495

Family Applications (3)

Application Number Title Priority Date Filing Date
CN202110724753.7A Active CN113407517B (en) 2020-11-27 2020-11-27 Data quality health degree analysis method and system based on multidimensional analysis technology
CN202110725686.0A Active CN113434485B (en) 2020-11-27 2020-11-27 Data quality health degree analysis method and system based on multidimensional analysis technology
CN202011362385.8A Active CN112380190B (en) 2020-11-27 2020-11-27 Data quality health degree analysis method and system based on multidimensional analysis technology

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202110724753.7A Active CN113407517B (en) 2020-11-27 2020-11-27 Data quality health degree analysis method and system based on multidimensional analysis technology

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202011362385.8A Active CN112380190B (en) 2020-11-27 2020-11-27 Data quality health degree analysis method and system based on multidimensional analysis technology

Country Status (1)

Country Link
CN (3) CN113407517B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113110981B (en) * 2021-03-26 2024-04-09 北京中大科慧科技发展有限公司 Air conditioner room health energy efficiency detection method for data center
CN114722434B (en) * 2022-06-09 2022-08-16 江苏荣泽信息科技股份有限公司 Block chain-based ledger data control method and device
CN115543973B (en) * 2022-09-19 2023-06-13 北京三维天地科技股份有限公司 Data quality rule recommendation method based on knowledge spectrogram and machine learning
CN115409419B (en) * 2022-09-26 2023-12-05 河南星环众志信息科技有限公司 Method and device for evaluating value of business data, electronic equipment and storage medium
CN116842211B (en) * 2023-07-05 2024-03-15 北京能量时光教育科技有限公司 User analysis method and system based on live big data

Citations (7)

Publication number Priority date Publication date Assignee Title
CN103247008A (en) * 2013-05-07 2013-08-14 国家电网公司 Quality evaluation method of electricity statistical index data
US20150310055A1 (en) * 2014-04-29 2015-10-29 Microsoft Corporation Using lineage to infer data quality issues
CN107491381A (en) * 2017-07-04 2017-12-19 广西电网有限责任公司电力科学研究院 A kind of equipment condition monitoring quality of data evaluating system
CN107545349A (en) * 2016-06-28 2018-01-05 国网天津市电力公司 A kind of Data Quality Analysis evaluation model towards electric power big data
CN108898311A (en) * 2018-06-28 2018-11-27 国网湖南省电力有限公司 A kind of data quality checking method towards intelligent distribution network repairing dispatching platform
CN110728437A (en) * 2019-09-26 2020-01-24 华南师范大学 Quality evaluation method and system for open data
CN111339215A (en) * 2019-05-31 2020-06-26 北京东方融信达软件技术有限公司 Structured data set quality evaluation model generation method, evaluation method and device

Family Cites Families (11)

Publication number Priority date Publication date Assignee Title
US10248672B2 (en) * 2011-09-19 2019-04-02 Citigroup Technology, Inc. Methods and systems for assessing data quality
US9268776B2 (en) * 2012-06-25 2016-02-23 International Business Machines Corporation Methods and apparatus for data collection
US20150220868A1 (en) * 2014-02-03 2015-08-06 Patient Profiles, LLC Evaluating Data Quality of Clinical Trials
US10049128B1 (en) * 2014-12-31 2018-08-14 Symantec Corporation Outlier detection in databases
CN107403257A (en) * 2017-07-04 2017-11-28 广西电网有限责任公司电力科学研究院 One kind production basic data index analysing system
CN107368957A (en) * 2017-07-04 2017-11-21 广西电网有限责任公司电力科学研究院 A kind of construction method of equipment condition monitoring quality of data evaluation and test system
CN108229784A (en) * 2017-11-09 2018-06-29 中国电力科学研究院有限公司 The multidimensional data quality evaluating method and system of a kind of intelligent distribution network
CN108536744A (en) * 2018-03-09 2018-09-14 中国地质大学(武汉) A kind of VGI Data Quality Assessment Methodologies and system based on historical data feature
CN108681814A (en) * 2018-05-10 2018-10-19 北京鼎泰智源科技有限公司 A kind of big data quality standard management control method
CN108966210A (en) * 2018-06-22 2018-12-07 西京学院 A kind of design method of wireless network Trust Valuation Model
CN111343003A (en) * 2020-02-11 2020-06-26 广州智乐物联网技术有限公司 Data analysis method and device based on block chain and SDN edge computing network system

Non-Patent Citations (3)

Title
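尹榕慧 et al.: "Research on a Data Quality Evaluation Framework for Multi-Domain Standards", 《标准科学》 (Standard Science) *
张立芬 et al.: "Research on Key Technologies for Data Quality Evaluation of NQI Integrated Services: Taking Inspection and Testing Basic Data as an Example", 《中国标准化》 (China Standardization) *
荀华 et al.: "Design and Implementation of a Rule-Based Power Data Indicator Checking System", 《东北电力技术》 (Northeast Electric Power Technology) *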

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN117056576A (en) * 2023-10-13 2023-11-14 太极计算机股份有限公司 Data quality flexible verification method based on big data platform
CN117056576B (en) * 2023-10-13 2024-04-05 太极计算机股份有限公司 Data quality flexible verification method based on big data platform

Also Published As

Publication number Publication date
CN113407517B (en) 2022-02-11
CN113434485B (en) 2021-12-07
CN112380190A (en) 2021-02-19
CN112380190B (en) 2021-08-17
CN113407517A (en) 2021-09-17

Similar Documents

Publication Publication Date Title
CN113434485B (en) Data quality health degree analysis method and system based on multidimensional analysis technology
CN109784758B (en) Engineering quality supervision early warning system and method based on BIM model
CN108073517B (en) Management method, device, medium and computer equipment for third-party software test
CN114519498A (en) Quality evaluation method and system based on BIM (building information modeling)
CN114090556B (en) Electric power marketing data acquisition method and system
CN111898359A (en) Intelligent quality detection report generation method and system
CN116414815A (en) Data quality detection method, device, computer equipment and storage medium
CN111858236B (en) Knowledge graph monitoring method and device, computer equipment and storage medium
CN112037007A (en) Credit approval method for small and micro enterprises and electronic equipment
CN112651433B (en) Abnormal behavior analysis method for privileged account
CN113791980A (en) Test case conversion analysis method, device, equipment and storage medium
CN117076454B (en) Engineering quality acceptance form data structured storage method and system
CN116991746B (en) Method and device for evaluating general quality characteristics of software
CN114665986B (en) Bluetooth key testing system and method
CN113434946B (en) Building model checking system and method based on BIM
CN116881824A (en) Anomaly detection method and system for official vehicle audit
CN117076634A (en) Corpus data management method and related equipment
CN113190805A (en) Code asset management system
CN117421248A (en) Software testing method and device, electronic equipment and computer readable medium
CN115599970A (en) Method for determining coverage area of welding work test piece and related assembly
CN113111073A (en) Abnormal data sorting method and device, computing equipment and computer storage medium
CN116167544A (en) Material error prevention method, system and storage medium based on MES system
CN115170315A (en) Monitoring report generation method and device, storage medium and electronic equipment
CN112000361A (en) Software security development capability maturity difference analysis method and system
CN117520204A (en) Bank product testing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant