CN114580982A - Method, device and equipment for evaluating data quality of industrial equipment - Google Patents
Method, device and equipment for evaluating data quality of industrial equipment
- Publication number
- CN114580982A, CN202210491936.3A
- Authority
- CN
- China
- Prior art keywords
- time sequence
- evaluated
- similarity
- sequence data
- evaluation index
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06395—Quality analysis or management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2462—Approximate or statistical queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2474—Sequence data queries, e.g. querying versioned data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Human Resources & Organizations (AREA)
- Economics (AREA)
- General Physics & Mathematics (AREA)
- Strategic Management (AREA)
- Probability & Statistics with Applications (AREA)
- Marketing (AREA)
- Educational Administration (AREA)
- Tourism & Hospitality (AREA)
- Entrepreneurship & Innovation (AREA)
- General Business, Economics & Management (AREA)
- Development Economics (AREA)
- General Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Public Health (AREA)
- Water Supply & Treatment (AREA)
- Primary Health Care (AREA)
- General Health & Medical Sciences (AREA)
- Game Theory and Decision Science (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- General Factory Administration (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a method, a device and equipment for evaluating the data quality of industrial equipment, wherein the method comprises the following steps: acquiring at least two time series data curves to be evaluated from the time series data of the industrial equipment; obtaining, under at least one evaluation index, the relative volatility of the at least two time series data curves to be evaluated and the normal time series data curves in a preset case base; determining at least one target evaluation index among the at least one evaluation index according to the relative volatility under the at least one evaluation index; obtaining, according to the at least one target evaluation index, the similarity between the at least two time series data curves to be evaluated and the abnormal time series data curves in the preset case base; and obtaining an evaluation result of the data quality of the at least two time series data curves to be evaluated according to the similarity. The scheme of the invention can improve the accuracy and efficiency of quality evaluation of the time series data of industrial equipment.
Description
Technical Field
The present invention relates to the field of data processing technology for industrial equipment, and in particular, to a method, an apparatus, and a device for evaluating data quality of industrial equipment.
Background
When processing the time series data of industrial equipment, the quality of the time series data is important. Because the time series data generated by different industrial equipment cases are large in volume, varied in type, and differ greatly from one another, general study-and-judgment logic for quality problems in time series data is lacking, and the quality of the time series data cannot be evaluated and analyzed in a timely and effective manner. Anomaly analysis performed by inexperienced data analysts is highly subjective, so whether a quality problem exists in the time series data cannot be determined accurately.
Disclosure of Invention
The technical problem to be solved by the invention is how to provide a method, a device and equipment for evaluating the data quality of industrial equipment so as to overcome the problems in the prior art.
To solve the above technical problem, an embodiment of the present invention provides a method for evaluating data quality of an industrial device, where the method includes:
acquiring at least two time sequence data curves to be evaluated of the time sequence data of the industrial equipment;
obtaining relative volatility of at least two time sequence data curves to be evaluated and normal time sequence data curves in a preset case base under at least one evaluation index;
determining at least one target evaluation index in at least one evaluation index according to the relative volatility of at least one evaluation index;
according to at least one target evaluation index, obtaining the similarity between at least two time sequence data curves to be evaluated and abnormal time sequence data curves in the preset case library;
and obtaining the evaluation result of the data quality of at least two time sequence data curves to be evaluated according to the similarity.
Optionally, obtaining relative volatility of the at least two time series data curves to be evaluated and the normal time series data curve in the preset case base under at least one evaluation index includes:
acquiring a first similarity between at least two time series data curves to be evaluated under at least one evaluation index;
acquiring a second similarity between any one to-be-evaluated time sequence data curve of at least two to-be-evaluated time sequence data curves under at least one evaluation index and a normal time sequence data curve in a preset case library;
obtaining the relative volatility of the time sequence data curve to be evaluated and a normal time sequence data curve in a preset case base under at least one evaluation index according to the index value of the first similarity and the index value of the second similarity; the index value includes: mean and variance of similarity.
Optionally, obtaining, according to the index value of the first similarity and the index value of the second similarity, relative volatility of the time series data curve to be evaluated and a normal time series data curve in a preset case base under at least one evaluation index includes:
obtaining the relative volatility according to the formula: relative volatility = (mean of second similarity / variance of second similarity) / (mean of first similarity / variance of first similarity).
Optionally, determining at least one target evaluation index in the at least one evaluation index according to the relative fluctuation of the at least one evaluation index includes:
and determining at least one target evaluation index in at least one evaluation index according to the relative volatility of at least one evaluation index and a preset volatility threshold value.
Optionally, determining at least one target evaluation index in the at least one evaluation index according to the relative volatility of the at least one evaluation index and a preset volatility threshold includes:
when the relative volatility under the at least one evaluation index is larger than a first preset volatility threshold value, filtering out the corresponding time sequence data curve to be evaluated to obtain the remaining time sequence data curves to be evaluated;
and when, among the remaining time sequence data curves to be evaluated, the relative volatility under at least one evaluation index is smaller than a second preset volatility threshold value, determining the corresponding evaluation index as a target evaluation index.
Optionally, obtaining the similarity between at least two time series data curves to be evaluated and an abnormal time series data curve in the preset case library according to at least one target evaluation index includes:
and acquiring a third similarity between an abnormal time sequence data curve in the preset case base and at least two time sequence data curves to be evaluated under at least one target evaluation index.
Optionally, obtaining an evaluation result of the data quality of at least two time series data curves to be evaluated according to the similarity includes:
carrying out normalization processing on the third similarity to obtain a normalized similarity index of the third similarity;
when the normalized similarity index is greater than or equal to a preset similarity index threshold, obtaining an evaluation result that the corresponding time series data curve to be evaluated, among the at least two time series data curves to be evaluated, is a time series data curve having the same or similar quality problem as the abnormal time series data curve;
and when the normalized similarity index is smaller than the preset similarity index threshold, obtaining an evaluation result that the corresponding time series data curve to be evaluated, among the at least two time series data curves to be evaluated, is a normal time series data curve.
An embodiment of the present invention also provides an apparatus for evaluating data quality of an industrial device, the apparatus including:
the acquisition module is used for acquiring at least two time sequence data curves to be evaluated of the time sequence data of the industrial equipment;
the processing module is used for obtaining, under at least one evaluation index, the relative volatility of at least two time sequence data curves to be evaluated and normal time sequence data curves in a preset case base; determining at least one target evaluation index in at least one evaluation index according to the relative volatility of at least one evaluation index; according to at least one target evaluation index, obtaining the similarity between at least two time sequence data curves to be evaluated and abnormal time sequence data curves in the preset case library; and obtaining the evaluation result of the data quality of at least two time sequence data curves to be evaluated according to the similarity.
Embodiments of the present invention also provide a computing device, comprising: a processor, a memory storing a computer program which, when executed by the processor, performs the method according to any of the preceding claims.
Embodiments of the present invention also provide a computer-readable storage medium storing instructions that, when executed on a computer, cause the computer to perform the method of any of the above.
The scheme of the invention at least comprises the following beneficial effects:
by obtaining at least two time sequence data curves to be evaluated of the time sequence data of the industrial equipment; obtaining relative volatility of at least two time sequence data curves to be evaluated and normal time sequence data curves in a preset case base under at least one evaluation index; determining at least one target evaluation index in at least one evaluation index according to the relative volatility of at least one evaluation index; obtaining, according to at least one target evaluation index, the similarity between at least two time sequence data curves to be evaluated and abnormal time sequence data curves in the preset case library; and obtaining the evaluation results of the data quality of at least two time sequence data curves to be evaluated according to the similarity, the accuracy and timeliness of the quality problem analysis of the time sequence data of the industrial equipment are improved.
Drawings
FIG. 1 is a schematic flow chart of a method for evaluating data quality of an industrial device according to an embodiment of the present invention;
FIG. 2 is a graph of time series data under different evaluation indexes provided by the embodiment of the present invention;
FIG. 3 is a schematic diagram of a specific implementation flow of the method for evaluating data quality of industrial equipment according to the embodiment of the present invention;
FIG. 4 is a schematic diagram of a specific application of the method for evaluating data quality of an industrial device according to an embodiment of the present invention;
FIG. 5 is a block diagram of an apparatus for evaluating data quality of an industrial device according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As shown in fig. 1, an embodiment of the present invention provides a method for evaluating data quality of an industrial device, the method including:
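step 11, acquiring at least two time sequence data curves to be evaluated of the time sequence data of the industrial equipment;
step 12, obtaining relative volatility of at least two time sequence data curves to be evaluated and normal time sequence data curves in a preset case base under at least one evaluation index;
step 13, determining at least one target evaluation index in at least one evaluation index according to the relative volatility of at least one evaluation index;
step 14, obtaining, according to at least one target evaluation index, the similarity between at least two time sequence data curves to be evaluated and abnormal time sequence data curves in the preset case library;
step 15, obtaining the evaluation result of the data quality of at least two time sequence data curves to be evaluated according to the similarity.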
In this embodiment, the time series data curve to be evaluated may be obtained by processing the original time series data of the industrial equipment; for example, an experienced data analyst plots the data points of the original time series data in chronological order to form the corresponding time series data curve; the time series data forming the curve to be evaluated may be long or short time series data; long time series data may be divided into short time series data by means of a sliding window; the time series data forming the curve to be evaluated may be univariate or multivariate;
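The sliding-window split mentioned above can be sketched as follows. This is only an illustration: the function name `sliding_windows` and the `window` and `step` parameters are assumptions, and the text does not specify window length, overlap, or how a trailing remainder is handled (here it is dropped).

```python
from typing import List, Sequence


def sliding_windows(series: Sequence[float], window: int, step: int) -> List[List[float]]:
    """Split a long series into fixed-length segments of `window` points.

    Consecutive segments start `step` points apart. A trailing remainder
    shorter than `window` is dropped (an assumption of this sketch).
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        list(series[start:start + window])
        for start in range(0, len(series) - window + 1, step)
    ]


long_series = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
segments = sliding_windows(long_series, window=3, step=2)
print(segments)  # [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0], [4.0, 5.0, 6.0]]
```

Each segment can then be treated as one short time series data curve to be evaluated.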
the preset case base stores cases formed by determining the historical industrial equipment time sequence data as a normal time sequence data curve or an abnormal time sequence data curve, wherein the cases comprise a plurality of different cases; meanwhile, the abnormal time sequence data curve cases in the preset case base correspond to respective labels so as to distinguish different abnormal problems;
the evaluation index may be an index for evaluating the correlation characteristics between two time series data curves from different dimensions; multiple evaluation indexes may be enumerated, such as frequency domain and time domain evaluation indexes; under at least one evaluation index, the relative volatility of each of the at least two time series data curves to be evaluated and at least one normal curve in the preset case base is acquired; it should be understood that different evaluation indexes yield different relative volatilities, so the evaluation indexes and the relative volatilities are in one-to-one correspondence; when the actual value of the relative volatility corresponding to an evaluation index meets a preset condition, that evaluation index is determined as a target evaluation index; subsequently, based on the at least one target evaluation index, the similarity between each of the at least two time series data curves to be evaluated and at least one abnormal curve in the preset case base is acquired under the target evaluation index; an evaluation result of the time series data quality of each of the at least two time series data curves to be evaluated is then obtained according to the similarity; in this way the quality problem of the time series data of the industrial equipment is located quickly and accurately, and the working process of the industrial equipment can be further optimized;
in other words, the time series data curves to be evaluated are first compared with the normal time series data curves in the preset case base, and the target evaluation index is determined according to the volatility of the similarity under different indexes; determining the target evaluation index according to the relative volatility ensures the accuracy of the subsequent quality evaluation based on that index; the time series data curves to be evaluated are then compared with the abnormal time series data curves in the preset case base under the target evaluation index, so that the quality problem of the time series data curves to be evaluated can be located quickly.
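One way to picture "evaluation indexes from different dimensions" is as a set of named transforms that map a curve into the representation in which it is compared. The sketch below is an assumption, not the patent's definition: the names `time_domain`, `frequency_domain` and `EVALUATION_INDEXES` are hypothetical, and the frequency-domain index is taken to be the magnitude spectrum of a plain discrete Fourier transform.

```python
import cmath
from typing import Callable, Dict, List, Sequence


def time_domain(curve: Sequence[float]) -> List[float]:
    """Time-domain index: compare the raw samples directly."""
    return [float(v) for v in curve]


def frequency_domain(curve: Sequence[float]) -> List[float]:
    """Frequency-domain index: DFT magnitudes for bins 0 .. n//2."""
    n = len(curve)
    return [
        abs(sum(v * cmath.exp(-2j * cmath.pi * k * t / n) for t, v in enumerate(curve)))
        for k in range(n // 2 + 1)
    ]


# Hypothetical registry: one entry per evaluation index.
EVALUATION_INDEXES: Dict[str, Callable[[Sequence[float]], List[float]]] = {
    "time_domain": time_domain,
    "frequency_domain": frequency_domain,
}

curve = [1.0, 0.0, -1.0, 0.0]
for name, transform in EVALUATION_INDEXES.items():
    print(name, [round(x, 6) for x in transform(curve)])
```

Under this reading, a relative volatility is computed once per registry entry, which matches the one-to-one correspondence between evaluation indexes and relative volatilities described above.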
In an optional embodiment of the present invention, the step 12 may include:
step 121, obtaining a first similarity between at least two time series data curves to be evaluated under at least one evaluation index;
step 122, acquiring a second similarity between any one to-be-evaluated time series data curve of the at least two to-be-evaluated time series data curves under at least one evaluation index and a normal time series data curve in a preset case library;
step 123, obtaining the relative volatility of the time sequence data curve to be evaluated and a normal time sequence data curve in a preset case base under at least one evaluation index according to the index value of the first similarity and the index value of the second similarity; the index value includes: mean and variance of similarity.
In this embodiment, the first similarity and the second similarity may be calculated by Euclidean distance or dynamic time warping (DTW); as shown in FIG. 2, since different evaluation indexes emphasize different properties, the similarities calculated under different evaluation indexes are different; it should be noted that, under the same evaluation index, the similarities calculated for different pairs of data curves are also generally different;
the first similarity can reflect the similarity between any two time sequence data curves to be evaluated in at least two time sequence data curves to be evaluated under the same evaluation index, so that the first similarity is actually a set of similarities between different time sequence data curves to be evaluated in at least two time sequence data curves to be evaluated under the same index;
the second similarity can reflect the similarity between at least two time sequence data curves to be evaluated and the normal time sequence data curves in the preset case library under the same evaluation index, so that the second similarity is a set of the similarities between the time sequence data curves to be evaluated under the same index and the normal time sequence data curves in the preset case library;
obtaining the relative volatility under any evaluation index according to index values by respectively calculating the index values of the first similarity and the second similarity; the index value may be a mean and a variance of the similarity, and it should be understood that the index value is not limited to the mean and the variance.
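The first and second similarity sets described above can be sketched as follows. The text names Euclidean distance and DTW but does not say how a distance becomes a similarity, so the mapping `1 / (1 + distance)` is an assumption of this sketch, as are all function names. Curves are assumed to be already expressed under one evaluation index.

```python
import math
from itertools import combinations
from typing import Callable, List, Sequence

Curve = Sequence[float]


def euclidean_distance(a: Curve, b: Curve) -> float:
    """Point-wise Euclidean distance; both curves must have equal length."""
    if len(a) != len(b):
        raise ValueError("curves must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(a: Curve, b: Curve) -> float:
    """Classic O(n*m) dynamic time warping with absolute-difference cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def similarity(a: Curve, b: Curve, distance: Callable[[Curve, Curve], float]) -> float:
    """Assumed mapping from distance to similarity in (0, 1]."""
    return 1.0 / (1.0 + distance(a, b))


def first_similarity(curves: List[Curve], distance=euclidean_distance) -> List[float]:
    """Similarities between every pair of curves to be evaluated."""
    return [similarity(a, b, distance) for a, b in combinations(curves, 2)]


def second_similarity(curves: List[Curve], normal: List[Curve], distance=euclidean_distance) -> List[float]:
    """Similarities between each curve to be evaluated and each normal curve."""
    return [similarity(c, n, distance) for c in curves for n in normal]


curves = [[0.0, 1.0, 2.0], [0.0, 1.0, 3.0], [0.0, 2.0, 2.0]]
normal_curves = [[0.0, 1.0, 2.0]]
print(first_similarity(curves))
print(second_similarity(curves, normal_curves))
print(dtw_distance([0.0, 1.0, 2.0], [0.0, 1.0, 1.0, 2.0]))  # 0.0
```

DTW tolerates curves of different lengths or with local time shifts, whereas the Euclidean variant requires aligned, equal-length curves.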
Further, the relative volatility may be obtained by the formula: relative volatility = (mean of the second similarity / variance of the second similarity) / (mean of the first similarity / variance of the first similarity);
in this embodiment, it should be noted that one evaluation index corresponds to one relative volatility; and reflecting the volatility of the similarity degree of at least two time sequence data curves to be evaluated and the normal time sequence data curves in the preset case library under the corresponding evaluation indexes through the relative volatility.
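As a worked illustration of the formula above, the sketch below computes the relative volatility from two similarity sets. The function name is hypothetical, and the use of population variance (`statistics.pvariance`) rather than sample variance is an assumption, since the text does not say which is intended. The sketch rejects inputs for which the formula is undefined.

```python
from statistics import fmean, pvariance
from typing import Sequence


def relative_volatility(first_sims: Sequence[float], second_sims: Sequence[float]) -> float:
    """(mean2 / var2) / (mean1 / var1), following the formula in the text."""
    mean1, var1 = fmean(first_sims), pvariance(first_sims)
    mean2, var2 = fmean(second_sims), pvariance(second_sims)
    if var1 == 0 or var2 == 0 or mean1 == 0:
        raise ValueError("formula is undefined for zero variance or zero first mean")
    return (mean2 / var2) / (mean1 / var1)


# First similarity: among the curves to be evaluated.
first = [0.2, 0.4, 0.6]   # mean 0.4, population variance about 0.0267
# Second similarity: curves to be evaluated vs. normal curves.
second = [0.3, 0.5]       # mean 0.4, population variance 0.01
print(round(relative_volatility(first, second), 4))  # 2.6667
```

In this example the two means are equal, so the result reduces to the ratio of the variances, 0.0267 / 0.01.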
In an optional embodiment of the present invention, the step 13 may include:
step 131, determining at least one target evaluation index in at least one evaluation index according to the relative volatility of at least one evaluation index and a preset volatility threshold.
In this embodiment, the preset volatility threshold may be set according to the requirements of the actual industrial equipment, and at least one target evaluation index in the at least one evaluation index is determined by comparing the relative volatility under the at least one evaluation index with the preset volatility threshold; determining the target evaluation index according to the relative volatility ensures the accuracy of the subsequent evaluation of the time series data curve based on the target evaluation index.
In an optional embodiment of the present invention, the step 131 may include:
step 1311, when the relative volatility of at least one evaluation index is greater than a first preset volatility threshold, filtering out a corresponding time series data curve to be evaluated, and obtaining a remaining time series data curve to be evaluated;
step 1312, determining at least one corresponding evaluation index in the remaining time series data curve to be evaluated as a target evaluation index when the relative volatility is smaller than a second preset volatility threshold value.
In this embodiment, both the first preset volatility threshold and the second preset volatility threshold may be set according to actual needs, and it should be understood that the first volatility threshold is greater than the second volatility threshold;
when the relative volatility under at least one evaluation index is greater than the first preset volatility threshold, it indicates that, under the corresponding evaluation index, the similarity between the at least two time series data curves to be evaluated and the normal time series data curves in the preset case library fluctuates strongly, which in turn indicates that the curve similarity is low, so the corresponding sample under that evaluation index should be filtered out;
in the remaining time series data curves to be evaluated, when the relative volatility under at least one evaluation index is smaller than the second preset volatility threshold, it indicates that, under that evaluation index, the similarity between the remaining time series data curves to be evaluated and the normal time series data curves in the preset case library fluctuates only slightly, which in turn indicates that the curve similarity is high, so the evaluation index whose relative volatility is smaller than the second preset volatility threshold should be determined as a target evaluation index; selecting the target evaluation indexes through this screening further improves the accuracy of the subsequent quality evaluation of the time series data curves to be evaluated.
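The two-threshold screening of steps 1311 and 1312 can be sketched as below. The text leaves some details open, so this is one possible reading and not the patent's definitive procedure: relative volatility is assumed to be available per curve and per evaluation index, a curve is dropped if it exceeds the first threshold under any index, and an index becomes a target index only if every remaining curve falls below the second threshold. All names are hypothetical.

```python
from typing import Dict, List, Tuple

# volatility[index_name][curve_id] -> relative volatility (assumed layout);
# every index is assumed to hold a value for every curve.
Volatility = Dict[str, Dict[str, float]]


def select_target_indexes(
    volatility: Volatility, first_threshold: float, second_threshold: float
) -> Tuple[List[str], List[str]]:
    """Return (remaining curve ids, target evaluation index names)."""
    if first_threshold <= second_threshold:
        raise ValueError("first threshold must be greater than second threshold")
    all_curves = {cid for per_curve in volatility.values() for cid in per_curve}
    # Step 1311: filter out curves whose volatility exceeds the first threshold.
    filtered = {
        cid
        for per_curve in volatility.values()
        for cid, value in per_curve.items()
        if value > first_threshold
    }
    remaining = sorted(all_curves - filtered)
    # Step 1312: keep indexes under which all remaining curves are stable.
    targets = [
        name
        for name, per_curve in volatility.items()
        if remaining and all(per_curve[cid] < second_threshold for cid in remaining)
    ]
    return remaining, targets


volatility = {
    "time_domain": {"c1": 0.8, "c2": 1.1, "c3": 5.0},
    "frequency_domain": {"c1": 2.5, "c2": 2.0, "c3": 1.0},
}
print(select_target_indexes(volatility, first_threshold=4.0, second_threshold=1.5))
# (['c1', 'c2'], ['time_domain'])
```

Here curve `c3` is removed by the first threshold, and only `time_domain` keeps both remaining curves below the second threshold.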
In an optional embodiment of the present invention, the step 14 may include:
step 141, obtaining a third similarity between the abnormal time series data curve in the preset case base and the at least two time series data curves to be evaluated under the at least one target evaluation index.
In this embodiment, the third similarity may also be calculated by methods such as Euclidean distance or dynamic time warping (DTW); the third similarity is the similarity, calculated under a plurality of different target evaluation indexes, between each of the at least two time series data curves to be evaluated and an abnormal time series data curve in the preset case library; for each target evaluation index in turn, the third similarity between the corresponding time series data curve to be evaluated and the abnormal time series data curve in the preset case library is calculated, yielding a third similarity set of the same time series data curve to be evaluated under the different target evaluation indexes; the third similarity sets of the other time series data curves to be evaluated are obtained in the same way.
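The loop over target evaluation indexes can be sketched as follows. The two index transforms (`raw` and `first_difference`), the label `stuck_value`, and the distance-to-similarity mapping `1 / (1 + distance)` are illustrative assumptions; the text only states that Euclidean distance or DTW may be used and that abnormal cases carry labels.

```python
import math
from typing import Callable, Dict, List, Sequence

Curve = Sequence[float]
Transform = Callable[[Curve], List[float]]


def raw(curve: Curve) -> List[float]:
    return [float(v) for v in curve]


def first_difference(curve: Curve) -> List[float]:
    return [float(b - a) for a, b in zip(curve, curve[1:])]


def similarity(a: Curve, b: Curve) -> float:
    """Assumed mapping: Euclidean distance -> similarity in (0, 1]."""
    return 1.0 / (1.0 + math.dist(a, b))


def third_similarity(
    curves: Dict[str, Curve],
    abnormal_cases: Dict[str, Curve],
    target_indexes: Dict[str, Transform],
) -> Dict[str, Dict[str, List[float]]]:
    """result[curve_id][abnormal_label] -> one similarity per target index."""
    return {
        curve_id: {
            label: [
                similarity(transform(curve), transform(abnormal))
                for transform in target_indexes.values()
            ]
            for label, abnormal in abnormal_cases.items()
        }
        for curve_id, curve in curves.items()
    }


curves = {"c1": [0.0, 1.0, 2.0, 3.0], "c2": [1.0, 1.0, 1.0, 1.0]}
abnormal_cases = {"stuck_value": [1.0, 1.0, 1.0, 1.0]}
target_indexes = {"raw": raw, "first_difference": first_difference}
result = third_similarity(curves, abnormal_cases, target_indexes)
print(result["c2"]["stuck_value"])  # [1.0, 1.0]
print([round(s, 3) for s in result["c1"]["stuck_value"]])
```

Each inner list is the third similarity set of one curve to be evaluated against one abnormal case, with one entry per target evaluation index.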
In an optional embodiment of the present invention, the step 15 may include:
step 151, performing normalization processing on the third similarity to obtain a normalized similarity index of the third similarity;
step 152a, when the normalized similarity index is greater than or equal to a preset similarity index threshold, obtaining an evaluation result that the corresponding time series data curve to be evaluated, among the at least two time series data curves to be evaluated, is a time series data curve having the same or similar quality problem as the abnormal time series data curve;
step 152b, when the normalized similarity index is smaller than the preset similarity index threshold, obtaining an evaluation result that the corresponding time series data curve to be evaluated, among the at least two time series data curves to be evaluated, is a normal time series data curve.
In this embodiment, since the values of the third similarity calculated under different target evaluation indexes differ, the third similarities calculated under the different target evaluation indexes are normalized to obtain a normalized similarity index, so that the quality problem to which the time series data to be evaluated belongs can be reflected truly, accurately and in a timely manner by a unified index; preferably, the normalized similarity index may be calculated by the formula: normalized similarity index = mean value of the third similarity / absolute value of the third similarity;
when the normalized similarity index is greater than or equal to the preset similarity index threshold, it indicates that the similarity between the time series data curve to be evaluated and the abnormal time series data curve in the preset case library is high, and therefore that the time series data curve to be evaluated has the same or similar quality problem as the abnormal time series data curve; when the number of cases in the preset case base is large enough and the abnormal curve of each case carries a label of the corresponding quality problem, the actual quality problem of the time series data curve to be evaluated can further be obtained, and corresponding measures can be taken for that problem;
when the normalized similarity index is smaller than the preset similarity index threshold, it indicates that the similarity between the time sequence data curve to be evaluated and the abnormal time sequence data curve in the preset case library is low, and at this time, it indicates that the time sequence data curve to be evaluated is not similar to the abnormal time sequence data curve, and the time sequence data curve is a normal time sequence data curve.
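The threshold decision of steps 152a and 152b can be sketched as below. Note that this sketch does not implement the source's normalization formula: it assumes the third similarities are already scaled to (0, 1] and simply averages them across target evaluation indexes to get a single index per abnormal case. The function name, the labels, and the threshold value are hypothetical.

```python
from statistics import fmean
from typing import Dict, List, Tuple


def evaluate_curve(third_sims_by_label: Dict[str, List[float]], threshold: float) -> Tuple[str, float]:
    """Return (result label, unified similarity index) for one curve.

    third_sims_by_label maps each abnormal-case label to the curve's third
    similarities under the target evaluation indexes. Raises ValueError if
    the mapping is empty.
    """
    # Stand-in for normalization: average the similarities per abnormal case.
    best_label, best_index = max(
        ((label, fmean(sims)) for label, sims in third_sims_by_label.items()),
        key=lambda pair: pair[1],
    )
    if best_index >= threshold:
        # Step 152a: same or similar quality problem as the abnormal case.
        return best_label, best_index
    # Step 152b: not similar to any abnormal case, so treated as normal.
    return "normal", best_index


print(evaluate_curve({"stuck_value": [0.9, 0.8], "spike": [0.2, 0.4]}, threshold=0.7))
print(evaluate_curve({"spike": [0.2, 0.4]}, threshold=0.7))
```

Returning the label of the best-matching abnormal case reflects the labelled case base described above, from which the actual quality problem of the curve can be read off.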
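The method is described below with a specific example. As shown in FIG. 3, the specific flow is as follows:
step 31, acquiring relative volatility of a time sequence data curve to be evaluated under multiple evaluation indexes and a normal time sequence data curve in a preset case library;
step 32, determining target evaluation indexes among the multiple evaluation indexes according to the relative volatility;
step 33, obtaining the similarity between the time sequence data curve to be evaluated and the abnormal time sequence data curve in the preset case library under the target evaluation indexes;
step 34, obtaining an evaluation result that the time sequence data curve to be evaluated is a normal or abnormal time sequence data curve according to the similarity.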
In the embodiment of the invention, target evaluation indexes are screened out of a plurality of evaluation indexes using the time sequence data curve to be evaluated and the normal time sequence data curve in the preset case base, and the time sequence data curve to be evaluated is then evaluated against the abnormal time sequence data curve in the preset case base according to the target evaluation indexes, to obtain the evaluation result of the time sequence data curve to be evaluated; when quality problems are to be detected in a large amount of time sequence data of industrial equipment, the method can locate them quickly and accurately.
In practical application, as shown in fig. 4, the at least two acquired time series data curves to be evaluated, a quality evaluation index set, and the normal and abnormal time series data curves in the preset case library of the industrial equipment time series data may be input into a quality problem identification function code generator. The generator then produces quality problem identification function code from a chosen programming language, code templates, and a similarity volatility function package and a similarity function package for time series data curves, so that quality problems in new time series data can subsequently be identified rapidly using the generated code; the code template may be written in different programming languages, such as the R language, and other programming languages may of course also be used.
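As a rough illustration of template-based code generation, the sketch below fills a code template with a problem name, a similarity function name and a threshold. Everything in it is an assumption made for illustration: the template text, the placeholder names and the generated function are hypothetical, and Python is used here although the text names the R language as one option.

```python
from string import Template

# Hypothetical code template; the placeholders are illustrative only.
TEMPLATE = Template(
    "def identify_${problem}(curve, reference, threshold=${threshold}):\n"
    "    score = ${similarity_fn}(curve, reference)\n"
    "    return score >= threshold\n"
)


def generate_identifier(problem, similarity_fn, threshold):
    """Fill the template and return the generated source code as text."""
    return TEMPLATE.substitute(
        problem=problem, similarity_fn=similarity_fn, threshold=threshold
    )


# Generate source for a hypothetical "drift" quality problem.
source = generate_identifier("drift", "similarity", 0.8)
print(source)
```

The generated text can then be stored or loaded like any other source file, provided a function with the chosen similarity name is available where it runs.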
According to the embodiment of the invention, the evaluation results of the data quality of the at least two time series data curves to be evaluated are obtained according to the similarity. If the similarity between a time sequence data curve to be evaluated and an abnormal time sequence data curve in the case base is greater than a preset value, the curve to be evaluated has the same quality problem as that abnormal curve. When a large amount of industrial equipment time sequence data is detected and analyzed, the method enables automatic detection and improves the accuracy and timeliness of the quality problem analysis.
As shown in fig. 5, an embodiment of the present invention further provides an apparatus 50 for evaluating data quality of an industrial device, where the apparatus 50 includes:
the acquisition module 51 is used for acquiring at least two time sequence data curves to be evaluated of the time sequence data of the industrial equipment;
the processing module 52 is configured to obtain, according to at least one evaluation index, the relative volatility of the at least two time series data curves to be evaluated and a normal time series data curve in a preset case base under the at least one evaluation index; determine at least one target evaluation index among the at least one evaluation index according to the relative volatility under the at least one evaluation index; obtain, according to the at least one target evaluation index, the similarity between the at least two time sequence data curves to be evaluated and an abnormal time sequence data curve in the preset case library; and obtain the evaluation result of the data quality of the at least two time sequence data curves to be evaluated according to the similarity.
Optionally, the processing module 52 being configured to obtain the relative volatility of the at least two time series data curves to be evaluated and a normal time series data curve in a preset case library under at least one evaluation index includes:
acquiring a first similarity between at least two time series data curves to be evaluated under at least one evaluation index;
acquiring a second similarity between any one to-be-evaluated time sequence data curve of at least two to-be-evaluated time sequence data curves under at least one evaluation index and a normal time sequence data curve in a preset case library;
obtaining the relative volatility of the time sequence data curve to be evaluated and a normal time sequence data curve in a preset case base under at least one evaluation index according to the index value of the first similarity and the index value of the second similarity; the index value includes: mean and variance of similarity.
Optionally, the processing module 52 being configured to obtain, according to the index value of the first similarity and the index value of the second similarity, the relative volatility of the time series data curve to be evaluated and a normal time series data curve in a preset case base under at least one evaluation index includes:
obtaining the relative volatility according to the formula: relative volatility = (mean of second similarity / variance of second similarity) / (mean of first similarity / variance of first similarity).
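The relative-volatility formula above can be sketched in a few lines. This is an illustrative reading rather than the implementation of the embodiment: the function name and sample values are hypothetical, and the use of the population variance (rather than the sample variance) is an assumption, since the text does not say which is meant.

```python
from statistics import mean, pvariance


def relative_volatility(first_similarities, second_similarities):
    """Return (mean2 / var2) / (mean1 / var1), following the stated formula."""
    mean1 = mean(first_similarities)
    var1 = pvariance(first_similarities)   # population variance: an assumption
    var2 = pvariance(second_similarities)
    if var1 == 0 or var2 == 0 or mean1 == 0:
        raise ValueError("formula is undefined for zero variance or zero first mean")
    return (mean(second_similarities) / var2) / (mean1 / var1)


# Hypothetical similarities among the curves to be evaluated (first) and
# between those curves and a normal case-library curve (second).
first = [1.0, 2.0, 3.0]
second = [2.0, 4.0, 6.0]
print(relative_volatility(first, second))
```

For these sample values the first ratio is 2 / (2/3) = 3 and the second is 4 / (8/3) = 1.5, so the relative volatility is about 0.5.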
Optionally, the processing module 52 being configured to determine at least one target evaluation index among the at least one evaluation index according to the relative volatility under the at least one evaluation index includes:
and determining at least one target evaluation index in at least one evaluation index according to the relative volatility of at least one evaluation index and a preset volatility threshold value.
Optionally, the processing module 52 being configured to determine at least one target evaluation index among the at least one evaluation index according to the relative volatility under the at least one evaluation index and a preset volatility threshold includes:
when the relative volatility under at least one evaluation index is greater than a first preset volatility threshold value, filtering out the corresponding time sequence data curve to be evaluated to obtain the remaining time sequence data curves to be evaluated;
and when the relative volatility of the remaining time sequence data curves to be evaluated is smaller than a second preset volatility threshold value, determining the at least one corresponding evaluation index as a target evaluation index.
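One possible reading of the two-threshold screening above is sketched below. The data layout (a mapping from evaluation index name to per-curve relative volatility), the function name and the threshold values are all hypothetical, and the rule that an index qualifies only when every remaining curve stays below the second threshold is an assumption, since the text leaves this open.

```python
def select_target_indexes(volatility, first_threshold, second_threshold):
    """Filter curves, then pick target evaluation indexes.

    volatility maps index name -> {curve id -> relative volatility}; every
    index is assumed to cover the same set of curves.
    """
    all_curves = set()
    for per_curve in volatility.values():
        all_curves.update(per_curve)
    # Step 1: drop curves whose volatility exceeds the first threshold
    # under any evaluation index.
    filtered_out = {
        curve
        for per_curve in volatility.values()
        for curve, value in per_curve.items()
        if value > first_threshold
    }
    remaining = sorted(all_curves - filtered_out)
    # Step 2: keep indexes under which all remaining curves stay below
    # the second threshold.
    targets = [
        name
        for name, per_curve in volatility.items()
        if remaining and all(per_curve[c] < second_threshold for c in remaining)
    ]
    return remaining, targets


volatility = {
    "index_a": {"c1": 0.4, "c2": 0.5, "c3": 9.0},
    "index_b": {"c1": 2.0, "c2": 0.3, "c3": 0.2},
}
remaining, targets = select_target_indexes(volatility, 5.0, 1.0)
print(remaining, targets)
```

Here curve c3 is filtered out under index_a, and only index_a keeps both remaining curves below the second threshold.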
Optionally, the processing module 52 being configured to obtain, according to the at least one target evaluation index, the similarity between the at least two time series data curves to be evaluated and an abnormal time series data curve in the preset case library includes:
and acquiring a third similarity between an abnormal time sequence data curve in the preset case base and at least two time sequence data curves to be evaluated under at least one target evaluation index.
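The computation of the third similarity under each target evaluation index can be sketched as below. This section does not define the evaluation indexes themselves, so the two measures used here (Euclidean distance and mean absolute difference) are hypothetical stand-ins; both are distance-like, so smaller values mean closer curves. Curves are assumed to be equal-length lists of floats.

```python
import math


def euclidean_distance(a, b):
    """Hypothetical evaluation index: point-wise Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_absolute_difference(a, b):
    """Hypothetical evaluation index: mean absolute point-wise difference."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def third_similarities(curves, abnormal_curve, target_indexes):
    """Return {index name: [value for each curve against the abnormal curve]}."""
    return {
        name: [measure(curve, abnormal_curve) for curve in curves]
        for name, measure in target_indexes.items()
    }


curves = [[0.0, 0.0, 0.0], [1.0, 2.0, 2.0]]
abnormal = [1.0, 2.0, 2.0]
result = third_similarities(
    curves,
    abnormal,
    {"euclidean": euclidean_distance, "mad": mean_absolute_difference},
)
print(result)
```

The second curve is identical to the abnormal curve, so both measures return 0.0 for it, while the first curve yields 3.0 under the Euclidean measure.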
Optionally, the processing module 52 being configured to obtain, according to the similarity, the evaluation result of the data quality of the at least two time series data curves to be evaluated includes:
carrying out normalization processing on the third similarity to obtain a normalized similarity index of the third similarity;
when the normalized similarity index is greater than or equal to a preset similarity index threshold, obtaining an evaluation result that the corresponding one of the at least two time series data curves to be evaluated is a time series data curve having the same or a similar quality problem as the abnormal time series data curve;
and when the normalized similarity index is smaller than the preset similarity index threshold, obtaining an evaluation result that the corresponding one of the at least two time series data curves to be evaluated is a normal time series data curve.
It should be noted that the apparatus is an apparatus corresponding to the above method, and all the implementations in the above method embodiment are applicable to the embodiment of the apparatus, and the same technical effects can be achieved.
Embodiments of the present invention also provide a computing device, comprising: a processor, a memory storing a computer program which, when executed by the processor, performs the method as described above. All the implementation manners in the above method embodiment are applicable to this embodiment, and the same technical effect can be achieved.
Embodiments of the present invention also provide a computer-readable storage medium comprising instructions which, when executed on a computer, cause the computer to perform the method as described above. All the implementation manners in the above method embodiment are applicable to this embodiment, and the same technical effect can be achieved.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described apparatus embodiments are merely illustrative. For example, the division into units is only one logical division, and other divisions are possible in practice: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling, direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
Furthermore, it is to be noted that in the device and method of the invention, it is obvious that the individual components or steps can be decomposed and/or recombined. These decompositions and/or recombinations are to be regarded as equivalents of the present invention. Also, the steps of performing the series of processes described above may naturally be performed chronologically in the order described, but need not necessarily be performed chronologically, and some steps may be performed in parallel or independently of each other. It will be understood by those skilled in the art that all or any of the steps or elements of the method and apparatus of the present invention may be implemented in any computing device (including processors, storage media, etc.) or network of computing devices, in hardware, firmware, software, or any combination thereof, which can be implemented by those skilled in the art using their basic programming skills after reading the description of the present invention.
Thus, the objects of the invention may also be achieved by running a program or a set of programs on any computing device. The computing device may be a well-known general-purpose device. The objects of the invention may thus also be achieved merely by providing a program product comprising program code for implementing the method or the apparatus. That is, such a program product also constitutes the present invention, and a storage medium storing such a program product also constitutes the present invention. The storage medium may be any known storage medium or any storage medium developed in the future.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (10)
1. A method of assessing data quality of an industrial device, the method comprising:
acquiring at least two time sequence data curves to be evaluated of the time sequence data of the industrial equipment;
obtaining the relative volatility of the at least two time sequence data curves to be evaluated and a normal time sequence data curve in a preset case base under at least one evaluation index;
determining at least one target evaluation index among the at least one evaluation index according to the relative volatility under the at least one evaluation index;
according to at least one target evaluation index, obtaining the similarity between at least two time sequence data curves to be evaluated and abnormal time sequence data curves in the preset case library;
and obtaining the evaluation result of the data quality of at least two time sequence data curves to be evaluated according to the similarity.
2. The method for evaluating the data quality of the industrial equipment according to claim 1, wherein obtaining the relative volatility of at least two time series data curves to be evaluated and a normal time series data curve in a preset case base under at least one evaluation index comprises:
acquiring a first similarity between at least two time series data curves to be evaluated under at least one evaluation index;
acquiring a second similarity between any one to-be-evaluated time sequence data curve of at least two to-be-evaluated time sequence data curves under at least one evaluation index and a normal time sequence data curve in a preset case base;
obtaining the relative volatility of the time sequence data curve to be evaluated and a normal time sequence data curve in a preset case base under at least one evaluation index according to the index value of the first similarity and the index value of the second similarity; the index value includes: mean and variance of similarity.
3. The method according to claim 2, wherein obtaining the relative volatility of the time series data curve to be evaluated and a normal time series data curve in a preset case base under at least one evaluation index according to the index value of the first similarity and the index value of the second similarity comprises:
obtaining the relative volatility according to the formula: relative volatility = (mean of second similarity / variance of second similarity) / (mean of first similarity / variance of first similarity).
4. The method for evaluating the data quality of the industrial equipment according to claim 2, wherein determining at least one target evaluation index among the at least one evaluation index according to the relative volatility under the at least one evaluation index comprises:
and determining at least one target evaluation index in at least one evaluation index according to the relative volatility of at least one evaluation index and a preset volatility threshold value.
5. The method for evaluating the data quality of the industrial equipment according to claim 4, wherein determining at least one target evaluation index in at least one evaluation index according to the relative volatility of the at least one evaluation index and a preset volatility threshold comprises:
when the relative volatility under at least one evaluation index is greater than a first preset volatility threshold value, filtering the corresponding time sequence data curve to be evaluated to obtain the remaining time sequence data curve to be evaluated;
and determining at least one corresponding evaluation index as a target evaluation index when the relative volatility in the remaining time sequence data curve to be evaluated is smaller than a second preset volatility threshold value.
6. The method for evaluating data quality of industrial equipment according to claim 1, wherein obtaining a similarity between at least two time series data curves to be evaluated and an abnormal time series data curve in the preset case base according to at least one target evaluation index comprises:
and acquiring a third similarity between an abnormal time sequence data curve in the preset case base and at least two time sequence data curves to be evaluated under at least one target evaluation index.
7. The method for evaluating the data quality of the industrial equipment according to claim 6, wherein obtaining the evaluation results of the data quality of at least two time series data curves to be evaluated according to the similarity comprises:
carrying out normalization processing on the third similarity to obtain a normalized similarity index of the third similarity;
when the normalized similarity index is greater than or equal to a preset similarity index threshold, obtaining an evaluation result that the corresponding one of the at least two time series data curves to be evaluated is a time series data curve having the same or a similar quality problem as the abnormal time series data curve;
and when the normalized similarity index is smaller than the preset similarity index threshold, obtaining an evaluation result that the corresponding one of the at least two time series data curves to be evaluated is a normal time series data curve.
8. An apparatus for evaluating data quality of an industrial device, the apparatus comprising:
the acquisition module is used for acquiring at least two time sequence data curves to be evaluated of the time sequence data of the industrial equipment;
the processing module is used for obtaining, according to at least one evaluation index, the relative volatility of the at least two time sequence data curves to be evaluated and a normal time sequence data curve in a preset case base under the at least one evaluation index; determining at least one target evaluation index among the at least one evaluation index according to the relative volatility under the at least one evaluation index; obtaining, according to the at least one target evaluation index, the similarity between the at least two time sequence data curves to be evaluated and an abnormal time sequence data curve in the preset case library; and obtaining the evaluation result of the data quality of the at least two time sequence data curves to be evaluated according to the similarity.
9. A computing device, comprising: a processor, a memory storing a computer program which, when executed by the processor, performs the method of any of claims 1 to 7.
10. A computer-readable storage medium storing instructions that, when executed on a computer, cause the computer to perform the method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210491936.3A CN114580982B (en) | 2022-05-07 | 2022-05-07 | Method, device and equipment for evaluating data quality of industrial equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114580982A (en) | 2022-06-03 |
CN114580982B (en) | 2022-08-05 |
Family
ID=81769267
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210491936.3A Active CN114580982B (en) | 2022-05-07 | 2022-05-07 | Method, device and equipment for evaluating data quality of industrial equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114580982B (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9390112B1 (en) * | 2013-11-22 | 2016-07-12 | Groupon, Inc. | Automated dynamic data quality assessment |
CN110188090A (en) * | 2019-06-17 | 2019-08-30 | 合肥优尔电子科技有限公司 | A kind of distribution topological data method for evaluating quality and device based on data mining |
CN111898903A (en) * | 2020-07-28 | 2020-11-06 | 北京科技大学 | Method and system for evaluating uniformity and comprehensive quality of steel product |
CN112800116A (en) * | 2021-04-08 | 2021-05-14 | 腾讯科技(深圳)有限公司 | Method and device for detecting abnormity of service data |
US20210216386A1 (en) * | 2018-07-23 | 2021-07-15 | Mitsubishi Electric Corporation | Time-sequential data diagnosis device, additional learning method, and recording medium |
CN113434970A (en) * | 2021-06-01 | 2021-09-24 | 北京交通大学 | Health index curve extraction and service life prediction method for mechanical equipment |
CN113468034A (en) * | 2021-07-07 | 2021-10-01 | 浙江大华技术股份有限公司 | Data quality evaluation method and device, storage medium and electronic equipment |
WO2021212752A1 (en) * | 2020-04-23 | 2021-10-28 | 平安科技(深圳)有限公司 | Device index data-based anomaly detection method and apparatus, device, and storage medium |
CN113986908A (en) * | 2021-12-24 | 2022-01-28 | 昆仑智汇数据科技(北京)有限公司 | Industrial equipment data processing method, device and equipment |
CN114153926A (en) * | 2021-11-26 | 2022-03-08 | 中国船级社 | Data quality evaluation method and device, computer equipment and storage medium |
CN114331195A (en) * | 2021-12-27 | 2022-04-12 | 北京科技大学 | Process curve risk evaluation method for influencing overall length quality of hot-rolled strip steel |
CN114444608A (en) * | 2022-02-08 | 2022-05-06 | 中国电信股份有限公司 | Data set quality evaluation method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN114580982B (en) | 2022-08-05 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||