CN114580982A - Method, device and equipment for evaluating data quality of industrial equipment - Google Patents

Publication number
CN114580982A
Authority
CN
China
Prior art keywords
time sequence
evaluated
similarity
sequence data
evaluation index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210491936.3A
Other languages
Chinese (zh)
Other versions
CN114580982B (en)
Inventor
田春华
张硕
徐地
袁文飞
孟越
胡坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunlun Intellectual Exchange Data Technology Beijing Co ltd
Original Assignee
Kunlun Intellectual Exchange Data Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunlun Intellectual Exchange Data Technology Beijing Co ltd
Priority to CN202210491936.3A
Publication of CN114580982A
Application granted
Publication of CN114580982B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395 Quality analysis or management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Probability & Statistics with Applications (AREA)
  • Marketing (AREA)
  • Educational Administration (AREA)
  • Tourism & Hospitality (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • Primary Health Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • General Factory Administration (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method, a device and equipment for evaluating the data quality of industrial equipment. The method comprises the following steps: acquiring at least two time series data curves to be evaluated from the time series data of the industrial equipment; obtaining, under at least one evaluation index, the relative volatility of the at least two time series data curves to be evaluated with respect to normal time series data curves in a preset case base; determining at least one target evaluation index among the at least one evaluation index according to the relative volatility under the at least one evaluation index; obtaining, according to the at least one target evaluation index, the similarity between the at least two time series data curves to be evaluated and abnormal time series data curves in the preset case base; and obtaining an evaluation result of the data quality of the at least two time series data curves to be evaluated according to the similarity. The scheme of the invention can improve the accuracy and efficiency of quality evaluation of the time series data of industrial equipment.

Description

Method, device and equipment for evaluating data quality of industrial equipment
Technical Field
The present invention relates to the field of data processing technology for industrial equipment, and in particular, to a method, an apparatus, and a device for evaluating data quality of industrial equipment.
Background
When the time series data of industrial equipment is processed, the quality of the time series data is important. Because the time series data generated by different industrial equipment cases is large in volume, varied in type and highly heterogeneous, there is no general logic for studying and judging quality problems of time series data, and the quality of the time series data cannot be evaluated and analyzed in a timely and effective manner. In addition, anomaly analysis performed by an inexperienced data analyst is highly subjective, so whether the time series data has a quality problem cannot be determined accurately.
Disclosure of Invention
The technical problem to be solved by the invention is how to provide a method, a device and equipment for evaluating the data quality of industrial equipment so as to overcome the problems in the prior art.
To solve the above technical problem, an embodiment of the present invention provides a method for evaluating data quality of an industrial device, where the method includes:
acquiring at least two time series data curves to be evaluated from the time series data of the industrial equipment;
obtaining, under at least one evaluation index, the relative volatility of the at least two time series data curves to be evaluated with respect to normal time series data curves in a preset case base;
determining at least one target evaluation index among the at least one evaluation index according to the relative volatility under the at least one evaluation index;
obtaining, according to the at least one target evaluation index, the similarity between the at least two time series data curves to be evaluated and abnormal time series data curves in the preset case base;
and obtaining an evaluation result of the data quality of the at least two time series data curves to be evaluated according to the similarity.
Optionally, obtaining the relative volatility of the at least two time series data curves to be evaluated with respect to the normal time series data curves in the preset case base under at least one evaluation index includes:
acquiring, under the at least one evaluation index, a first similarity between the at least two time series data curves to be evaluated;
acquiring, under the at least one evaluation index, a second similarity between any one of the at least two time series data curves to be evaluated and a normal time series data curve in the preset case base;
obtaining, according to the index value of the first similarity and the index value of the second similarity, the relative volatility of the time series data curve to be evaluated with respect to the normal time series data curve in the preset case base under the at least one evaluation index, where the index value includes the mean and the variance of the similarity.
Optionally, obtaining, according to the index value of the first similarity and the index value of the second similarity, the relative volatility of the time series data curve to be evaluated with respect to the normal time series data curve in the preset case base under the at least one evaluation index includes:
obtaining the relative volatility according to the formula: relative volatility = (mean of the second similarity / variance of the second similarity) / (mean of the first similarity / variance of the first similarity).
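For reference, the formula can also be written in symbols. The notation is an assumption introduced here for readability and does not appear in the original text: for an evaluation index k, \mu_{1,k} and \sigma^2_{1,k} denote the mean and variance of the first similarity, and \mu_{2,k} and \sigma^2_{2,k} denote the mean and variance of the second similarity.

```latex
\[
\mathrm{RV}_k \;=\; \frac{\mu_{2,k} \,/\, \sigma^{2}_{2,k}}{\mu_{1,k} \,/\, \sigma^{2}_{1,k}}
\]
```

The ratio is only defined when both variances and the mean of the first similarity are non-zero; the text does not state how degenerate cases are handled.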
Optionally, determining at least one target evaluation index among the at least one evaluation index according to the relative volatility under the at least one evaluation index includes:
determining at least one target evaluation index among the at least one evaluation index according to the relative volatility under the at least one evaluation index and a preset volatility threshold.
Optionally, determining at least one target evaluation index among the at least one evaluation index according to the relative volatility under the at least one evaluation index and a preset volatility threshold includes:
when the relative volatility under the at least one evaluation index is greater than a first preset volatility threshold, filtering out the corresponding time series data curve to be evaluated to obtain the remaining time series data curves to be evaluated;
and, among the remaining time series data curves to be evaluated, determining the at least one evaluation index whose relative volatility is smaller than a second preset volatility threshold as a target evaluation index.
Optionally, obtaining the similarity between the at least two time series data curves to be evaluated and the abnormal time series data curves in the preset case base according to the at least one target evaluation index includes:
acquiring, under the at least one target evaluation index, a third similarity between an abnormal time series data curve in the preset case base and the at least two time series data curves to be evaluated.
Optionally, obtaining the evaluation result of the data quality of the at least two time series data curves to be evaluated according to the similarity includes:
normalizing the third similarity to obtain a normalized similarity index of the third similarity;
when the normalized similarity index is greater than or equal to a preset similarity index threshold, obtaining an evaluation result that the corresponding time series data curve to be evaluated has the same or a similar quality problem as the abnormal time series data curve;
and when the normalized similarity index is smaller than the preset similarity index threshold, obtaining an evaluation result that the corresponding time series data curve to be evaluated is a normal time series data curve.
An embodiment of the present invention also provides an apparatus for evaluating data quality of an industrial device, the apparatus including:
the acquisition module is used for acquiring at least two time sequence data curves to be evaluated of the time sequence data of the industrial equipment;
the processing module is used for obtaining, under at least one evaluation index, the relative volatility of the at least two time series data curves to be evaluated with respect to normal time series data curves in a preset case base; determining at least one target evaluation index among the at least one evaluation index according to the relative volatility under the at least one evaluation index; obtaining, according to the at least one target evaluation index, the similarity between the at least two time series data curves to be evaluated and abnormal time series data curves in the preset case base; and obtaining an evaluation result of the data quality of the at least two time series data curves to be evaluated according to the similarity.
Embodiments of the present invention also provide a computing device, comprising: a processor, a memory storing a computer program which, when executed by the processor, performs the method according to any of the preceding claims.
Embodiments of the present invention also provide a computer-readable storage medium storing instructions that, when executed on a computer, cause the computer to perform the method of any of the above.
The scheme of the invention at least comprises the following beneficial effects:
at least two time series data curves to be evaluated are acquired from the time series data of the industrial equipment; the relative volatility of the at least two time series data curves to be evaluated with respect to normal time series data curves in a preset case base is obtained under at least one evaluation index; at least one target evaluation index is determined among the at least one evaluation index according to the relative volatility under the at least one evaluation index; the similarity between the at least two time series data curves to be evaluated and abnormal time series data curves in the preset case base is obtained according to the at least one target evaluation index; and an evaluation result of the data quality of the at least two time series data curves to be evaluated is obtained according to the similarity. The accuracy and timeliness of quality problem analysis of the time series data of industrial equipment are thereby improved.
Drawings
FIG. 1 is a schematic flow chart of a method for evaluating data quality of an industrial device according to an embodiment of the present invention;
FIG. 2 is a graph of time series data under different evaluation indexes provided by the embodiment of the present invention;
FIG. 3 is a schematic diagram of a specific implementation flow of the method for evaluating data quality of industrial equipment according to the embodiment of the present invention;
FIG. 4 is a schematic diagram of a specific application of the method for evaluating data quality of an industrial device according to an embodiment of the present invention;
FIG. 5 is a block diagram of an apparatus for evaluating data quality of an industrial device according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As shown in FIG. 1, an embodiment of the present invention provides a method for evaluating data quality of an industrial device, the method including:
step 11, acquiring at least two time series data curves to be evaluated from the time series data of the industrial equipment;
step 12, obtaining, under at least one evaluation index, the relative volatility of the at least two time series data curves to be evaluated with respect to normal time series data curves in a preset case base;
step 13, determining at least one target evaluation index among the at least one evaluation index according to the relative volatility under the at least one evaluation index;
step 14, obtaining, according to the at least one target evaluation index, the similarity between the at least two time series data curves to be evaluated and abnormal time series data curves in the preset case base;
step 15, obtaining an evaluation result of the data quality of the at least two time series data curves to be evaluated according to the similarity.
In this embodiment, the time series data curve to be evaluated may be obtained by processing the original time series data of the industrial equipment; for example, an experienced data analyst plots the data points of the original time series data in time order to form the corresponding time series data curve. The time series data forming the curve to be evaluated may be long or short time series data; long time series data may be divided into short time series data by a sliding window. The time series data forming the curve to be evaluated may be univariate or multivariate.
The preset case base stores cases in which historical time series data of industrial equipment has been determined to be a normal time series data curve or an abnormal time series data curve, and it contains a plurality of different cases. Each abnormal time series data curve case in the preset case base carries its own label so that different abnormal problems can be distinguished.
An evaluation index is an index that evaluates, from a particular dimension, the correlation characteristics between two time series data curves. Multiple evaluation indexes may be listed by enumeration, for example frequency-domain and time-domain evaluation indexes. Under at least one evaluation index, the relative volatility of each of the at least two time series data curves to be evaluated with respect to at least one normal curve in the preset case base is obtained. Different evaluation indexes yield different relative volatilities, so evaluation indexes and relative volatilities are in one-to-one correspondence. When the actual value of the relative volatility corresponding to an evaluation index meets a preset condition, that evaluation index is determined as a target evaluation index. Based on the at least one target evaluation index, the similarity between each of the at least two time series data curves to be evaluated and at least one abnormal curve in the preset case base is then obtained, and the evaluation result of the time series data quality of each curve to be evaluated is obtained according to the similarity. In this way the quality problem of the time series data of the industrial equipment is located quickly and accurately, and the working process of the industrial equipment can be further optimized.
In summary, the method first compares the time series data curves to be evaluated with the normal time series data curves in the preset case base and determines the target evaluation indexes according to the relative volatility of the similarity under the different indexes, which ensures the accuracy of the subsequent quality evaluation based on the target evaluation indexes. It then compares the curves to be evaluated with the abnormal time series data curves in the preset case base under the target evaluation indexes, so that the quality problem of a curve to be evaluated can be located quickly.
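The sliding-window segmentation and the labelled case base described in this embodiment can be sketched as follows. This is a minimal illustration only: the names `Case`, `CaseBase` and `sliding_windows`, the window parameters and the sample label are all hypothetical, and the patent does not prescribe any particular data structure.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class Case:
    """One historical curve stored in the case base (hypothetical structure)."""
    curve: List[float]
    is_normal: bool
    label: Optional[str] = None  # quality-problem label, abnormal cases only


@dataclass
class CaseBase:
    cases: List[Case] = field(default_factory=list)

    def normal_curves(self) -> List[List[float]]:
        return [c.curve for c in self.cases if c.is_normal]

    def abnormal_cases(self) -> List[Case]:
        return [c for c in self.cases if not c.is_normal]


def sliding_windows(series: Sequence[float], width: int, step: int) -> List[List[float]]:
    """Split a long series into fixed-width segments; a trailing partial segment is dropped."""
    if width <= 0 or step <= 0:
        raise ValueError("width and step must be positive")
    return [list(series[i:i + width]) for i in range(0, len(series) - width + 1, step)]


# Illustrative data only.
case_base = CaseBase([
    Case([1.0, 1.1, 0.9], is_normal=True),
    Case([1.0, 1.0, 1.0], is_normal=False, label="stuck_sensor"),
])
segments = sliding_windows([1, 2, 3, 4, 5, 6, 7], width=3, step=2)
```

Overlapping windows (step smaller than width) are a design choice here; the text only says that a sliding window may be used.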
In an optional embodiment of the present invention, the step 12 may include:
step 121, acquiring, under at least one evaluation index, a first similarity between the at least two time series data curves to be evaluated;
step 122, acquiring, under the at least one evaluation index, a second similarity between any one of the at least two time series data curves to be evaluated and a normal time series data curve in the preset case base;
step 123, obtaining, according to the index value of the first similarity and the index value of the second similarity, the relative volatility of the time series data curve to be evaluated with respect to the normal time series data curve in the preset case base under the at least one evaluation index, where the index value includes the mean and the variance of the similarity.
In this embodiment, the first similarity and the second similarity may be calculated by Euclidean distance or dynamic time warping (DTW). As shown in FIG. 2, different evaluation indexes emphasize different properties, so the similarities calculated under different evaluation indexes differ. It should be noted that, under the same evaluation index, the similarity calculated between data curves taken in different orders is also different.
The first similarity reflects the similarity between any two of the at least two time series data curves to be evaluated under the same evaluation index. The first similarity is therefore a set of similarities between the different curves to be evaluated under the same index.
The second similarity reflects the similarity between the at least two time series data curves to be evaluated and the normal time series data curves in the preset case base under the same evaluation index. The second similarity is therefore a set of similarities between the curves to be evaluated and the normal time series data curves in the preset case base under the same index.
The index values of the first similarity and of the second similarity are calculated separately, and the relative volatility under any evaluation index is obtained from these index values. The index value may be the mean and the variance of the similarity, but it is not limited to the mean and the variance.
Further, the relative volatility may be obtained by the formula: relative volatility = (mean of the second similarity / variance of the second similarity) / (mean of the first similarity / variance of the first similarity).
In this embodiment, one evaluation index corresponds to one relative volatility. The relative volatility reflects how much the degree of similarity between the at least two time series data curves to be evaluated and the normal time series data curves in the preset case base fluctuates under the corresponding evaluation index.
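Steps 121 to 123 can be sketched in a few lines. The sketch below makes several assumptions that the text does not fix: the Euclidean distance is used as the similarity measure, the first similarity set is built from unordered pairs of curves, the population variance is used, and degenerate cases (zero variance or zero mean) raise an error. The function names are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, pvariance
from typing import Callable, List, Sequence

Curve = Sequence[float]


def euclidean(a: Curve, b: Curve) -> float:
    """Euclidean distance between two equal-length curves."""
    if len(a) != len(b):
        raise ValueError("curves must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def relative_volatility(to_evaluate: List[Curve], normal: List[Curve],
                        measure: Callable[[Curve, Curve], float] = euclidean) -> float:
    """(mean2 / var2) / (mean1 / var1) under one evaluation index."""
    if len(to_evaluate) < 2 or not normal:
        raise ValueError("need at least two curves to evaluate and one normal curve")
    # Step 121: first similarity set, every pair of curves to be evaluated.
    first = [measure(a, b) for a, b in combinations(to_evaluate, 2)]
    # Step 122: second similarity set, every curve to be evaluated vs. every normal curve.
    second = [measure(c, n) for c in to_evaluate for n in normal]
    # Step 123: index values (mean and variance) and their ratio.
    mean1, var1 = mean(first), pvariance(first)
    mean2, var2 = mean(second), pvariance(second)
    if var1 == 0 or var2 == 0 or mean1 == 0:
        raise ValueError("relative volatility is undefined for zero variance or zero mean")
    return (mean2 / var2) / (mean1 / var1)


# Illustrative data only: three curves to evaluate, one normal curve.
rv = relative_volatility([[0, 0], [3, 4], [6, 8]], [[0, 0]])
```

With exactly two curves to evaluate the first set holds a single value and its variance is zero, so this sketch needs at least three curves (or ordered pairs with an asymmetric measure) to return a number.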
In an optional embodiment of the present invention, the step 13 may include:
step 131, determining at least one target evaluation index among the at least one evaluation index according to the relative volatility under the at least one evaluation index and a preset volatility threshold.
In this embodiment, the preset volatility threshold may be set according to the requirements of the actual industrial equipment, and the at least one target evaluation index is determined by comparing the relative volatility under the at least one evaluation index with the preset volatility threshold. Determining the target evaluation index according to the relative volatility ensures the accuracy of the subsequent evaluation of the time series data curves based on the target evaluation index.
In an optional embodiment of the present invention, the step 131 may include:
step 1311, when the relative volatility under at least one evaluation index is greater than a first preset volatility threshold, filtering out the corresponding time series data curve to be evaluated to obtain the remaining time series data curves to be evaluated;
step 1312, among the remaining time series data curves to be evaluated, determining the at least one evaluation index whose relative volatility is smaller than a second preset volatility threshold as a target evaluation index.
In this embodiment, both the first preset volatility threshold and the second preset volatility threshold may be set according to actual needs, and the first preset volatility threshold is greater than the second preset volatility threshold.
When the relative volatility under at least one evaluation index is greater than the first preset volatility threshold, the degree of similarity between the at least two time series data curves to be evaluated and the normal time series data curves in the preset case base fluctuates strongly under that evaluation index, which indicates that the curve similarity is low; the samples under that evaluation index should therefore be filtered out.
Among the remaining time series data curves to be evaluated, when the relative volatility under at least one evaluation index is smaller than the second preset volatility threshold, the degree of similarity between the remaining curves and the normal time series data curves in the preset case base fluctuates little under that evaluation index, which indicates that the curve similarity is high; the evaluation index whose relative volatility is smaller than the second preset volatility threshold is therefore determined as a target evaluation index. Screening out the target evaluation indexes in this way further improves the accuracy of the subsequent quality evaluation of the time series data curves to be evaluated.
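The two-threshold screening of steps 1311 and 1312 can be read in more than one way. The sketch below shows one possible reading, stated as an assumption: a relative volatility is available per curve and per evaluation index, a curve is filtered out if any of its values exceeds the first threshold, and an index becomes a target index if its value is below the second threshold for every remaining curve. The function name and the sample values are hypothetical.

```python
from typing import Dict, List, Tuple


def select_target_indexes(rv: Dict[str, Dict[str, float]],
                          first_threshold: float,
                          second_threshold: float) -> Tuple[List[str], List[str]]:
    """Return (remaining curve ids, target evaluation index names)."""
    if first_threshold <= second_threshold:
        raise ValueError("the first threshold must be greater than the second")
    # Step 1311: filter out curves whose relative volatility exceeds the first threshold.
    remaining = {
        curve_id: per_index
        for curve_id, per_index in rv.items()
        if all(value <= first_threshold for value in per_index.values())
    }
    # Step 1312: keep indexes that stay below the second threshold on all remaining curves.
    index_names = sorted({name for per_index in remaining.values() for name in per_index})
    targets = [
        name for name in index_names
        if all(name in per_index and per_index[name] < second_threshold
               for per_index in remaining.values())
    ]
    return sorted(remaining), targets


# Illustrative values only: relative volatility per curve and per evaluation index.
rv_table = {
    "c1": {"time_domain": 0.2, "frequency_domain": 0.9},
    "c2": {"time_domain": 0.3, "frequency_domain": 0.4},
    "c3": {"time_domain": 5.0, "frequency_domain": 0.1},
}
remaining_ids, target_indexes = select_target_indexes(rv_table, 2.0, 0.5)
```

In this example curve `c3` is filtered out by the first threshold, and only `time_domain` stays below the second threshold on both remaining curves.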
In an optional embodiment of the present invention, the step 14 may include:
step 141, acquiring, under the at least one target evaluation index, a third similarity between an abnormal time series data curve in the preset case base and the at least two time series data curves to be evaluated.
In this embodiment, the third similarity may also be calculated by methods such as Euclidean distance or dynamic time warping (DTW). The third similarity is the similarity, calculated under a plurality of different target evaluation indexes, between each of the at least two time series data curves to be evaluated and an abnormal time series data curve in the preset case base. For each target evaluation index in turn, the third similarity between a given curve to be evaluated and the abnormal time series data curves in the preset case base is calculated, which yields the third similarity set of that curve under the different target evaluation indexes. The third similarity sets of the other curves to be evaluated are obtained in the same way.
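The loop over target evaluation indexes can be sketched as follows, here with a textbook DTW distance as the measure. Everything beyond the loop structure is an assumption: an evaluation index is modelled as a transform applied to a curve before comparison, the two transforms `raw` and `first_difference` are hypothetical stand-ins for real time-domain or frequency-domain indexes, and the labels are invented for illustration. Because DTW is a distance, a smaller value means a closer match.

```python
import math
from typing import Callable, Dict, List, Sequence

Curve = Sequence[float]


def dtw(a: Curve, b: Curve) -> float:
    """Classic dynamic time warping distance with absolute-difference cost."""
    cost = [[math.inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]


def raw(curve: Curve) -> List[float]:
    return list(curve)


def first_difference(curve: Curve) -> List[float]:
    return [y - x for x, y in zip(curve, curve[1:])]


# Hypothetical target evaluation indexes, each modelled as a curve transform.
TARGET_INDEXES: Dict[str, Callable[[Curve], List[float]]] = {
    "raw": raw,
    "first_difference": first_difference,
}


def third_similarity_sets(to_evaluate: Dict[str, Curve],
                          abnormal: Dict[str, Curve],
                          indexes: Dict[str, Callable[[Curve], List[float]]]):
    """curve id -> target index name -> abnormal-case label -> DTW distance."""
    result: Dict[str, Dict[str, Dict[str, float]]] = {}
    for curve_id, curve in to_evaluate.items():
        result[curve_id] = {}
        for index_name, transform in indexes.items():
            result[curve_id][index_name] = {
                label: dtw(transform(curve), transform(case))
                for label, case in abnormal.items()
            }
    return result


# Illustrative data only.
sets = third_similarity_sets(
    {"c1": [0, 1, 2, 3]},
    {"stuck_sensor": [0, 0, 0, 0], "ramp": [0, 1, 1, 2, 3]},
    TARGET_INDEXES,
)
```

DTW tolerates curves of different lengths, which is why `c1` and the five-point `ramp` case can be compared directly.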
In an optional embodiment of the present invention, the step 15 may include:
step 151, normalizing the third similarity to obtain a normalized similarity index of the third similarity;
step 152a, when the normalized similarity index is greater than or equal to a preset similarity index threshold, obtaining an evaluation result that the corresponding time series data curve to be evaluated has the same or a similar quality problem as the abnormal time series data curve;
step 152b, when the normalized similarity index is smaller than the preset similarity index threshold, obtaining an evaluation result that the corresponding time series data curve to be evaluated is a normal time series data curve.
In this embodiment, the values of the third similarity calculated under different target evaluation indexes differ. The different third similarities are therefore normalized to obtain normalized similarity indexes, so that a unified index can truly, accurately and promptly reflect the quality problem to which the time series data to be evaluated belongs. Preferably, the normalized similarity index is calculated by the formula: normalized similarity index = mean of the third similarity / absolute value of the third similarity.
When the normalized similarity index is greater than or equal to the preset similarity index threshold, the similarity between the time series data curve to be evaluated and the abnormal time series data curve in the preset case base is high, which indicates that the two curves have the same or a similar quality problem. When the preset case base contains enough cases and the abnormal curve of each case carries the label of its quality problem, the actual quality problem of the time series data curve to be evaluated can further be obtained, and corresponding measures can be taken.
When the normalized similarity index is smaller than the preset similarity index threshold, the similarity between the time series data curve to be evaluated and the abnormal time series data curve in the preset case base is low, which indicates that the curve to be evaluated does not resemble the abnormal time series data curve and is a normal time series data curve.
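Steps 151, 152a and 152b can be sketched as below. Two points are assumptions rather than statements from the text: the third similarity is taken to be a distance, so that dividing the mean by an individual value gives a larger index for a closer match, and the mean is taken over the abnormal cases compared against one curve. The function names, labels and threshold are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, Optional, Tuple


def normalized_indexes(third: Dict[str, float]) -> Dict[str, float]:
    """normalized index = mean of the third similarities / |third similarity|."""
    avg = mean(third.values())
    return {
        label: (math.inf if value == 0 else avg / abs(value))
        for label, value in third.items()
    }


def evaluate(third: Dict[str, float], threshold: float) -> Tuple[str, Optional[str]]:
    """Return ("abnormal", label of the closest abnormal case) or ("normal", None)."""
    norm = normalized_indexes(third)
    best_label = max(norm, key=norm.get)
    if norm[best_label] >= threshold:   # step 152a
        return "abnormal", best_label
    return "normal", None               # step 152b


# Illustrative third similarities (distances) of one curve to three labelled abnormal cases.
result_a = evaluate({"stuck_sensor": 6.0, "ramp": 1.0, "spike": 5.0}, threshold=2.0)
result_b = evaluate({"stuck_sensor": 4.0, "ramp": 5.0, "spike": 3.0}, threshold=2.0)
```

Returning the label of the matching abnormal case mirrors the remark that, with a sufficiently large labelled case base, the actual quality problem can be read off from the case.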
The method is described below using a specific example. As shown in fig. 3, the specific flow is as follows:
step 31, acquiring, under multiple evaluation indexes, the relative volatility of the time sequence data curve to be evaluated and a normal time sequence data curve in a preset case library;
step 32, determining a target evaluation index in the multiple evaluation indexes according to the relative volatility;
step 33, obtaining the similarity between the time sequence data curve to be evaluated and the abnormal time sequence data curve in the preset case base according to the target evaluation index;
and step 34, obtaining an evaluation result that the time sequence data curve to be evaluated is a normal or abnormal time sequence data curve according to the similarity.
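Steps 31 to 34 can be sketched as one small orchestration function. This is a minimal sketch under stated assumptions: the callables `relative_volatility` and `normalized_similarity`, their signatures, the index names, and the use of `any()` to combine several target indexes are all hypothetical and not specified by the patent.

```python
def evaluate_curve(curve, indexes, relative_volatility, normalized_similarity,
                   volatility_threshold, similarity_threshold):
    """Run steps 31 to 34 for one curve.

    relative_volatility(curve, index) and normalized_similarity(curve, index)
    are caller-supplied callables (hypothetical signatures).
    """
    # Step 31: relative volatility versus the normal curves, per index.
    volatility = {index: relative_volatility(curve, index) for index in indexes}
    # Step 32: keep indexes whose relative volatility is below the threshold.
    targets = [i for i, v in volatility.items() if v < volatility_threshold]
    # Step 33: similarity to the abnormal curves under each target index.
    similarity = {i: normalized_similarity(curve, i) for i in targets}
    # Step 34: abnormal if any target index reaches the similarity threshold
    # (combining several indexes with any() is an assumption).
    abnormal = any(s >= similarity_threshold for s in similarity.values())
    return {"targets": targets, "result": "abnormal" if abnormal else "normal"}


# Hypothetical precomputed values standing in for real similarity functions.
vol = {"index_a": 0.2, "index_b": 3.0}
sim = {"index_a": 1.4, "index_b": 0.1}
report = evaluate_curve([1.0, 2.0], ["index_a", "index_b"],
                        lambda c, i: vol[i], lambda c, i: sim[i],
                        volatility_threshold=1.0, similarity_threshold=1.0)
```

Passing the two measures in as callables keeps the flow independent of any particular evaluation index.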
In the embodiment of the invention, the time sequence data curve to be evaluated is compared with the normal time sequence data curve in the preset case base to screen out the target evaluation indexes from a plurality of evaluation indexes, and the time sequence data curve to be evaluated is then compared with the abnormal time sequence data curve in the preset case base under the target evaluation indexes to obtain the evaluation result of the time sequence data curve to be evaluated; when a large amount of time sequence data of the industrial equipment is checked for quality problems, the method can quickly and accurately locate those problems.
In practical application, as shown in fig. 4, the at least two acquired time series data curves to be evaluated, a quality evaluation index set, and the normal and abnormal time series data curves in the preset case library of the time series data of the industrial equipment may be input into a quality problem identification function code generator; the generator then produces quality problem identification function code from a chosen programming language, code templates, and the similarity fluctuation function package and similarity function package of the time series data curves, so that quality problems in new time series data can subsequently be identified rapidly with the generated identification function code; the code template may be written in different programming languages, such as the R language, and other programming languages may also be used.
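One way such a code generator could work is sketched below with Python's standard `string.Template`. The patent names R as an example template language; Python is used here only for illustration. The template text, the `closeness` similarity function, the `generate_identifier` helper, and the case label are all hypothetical.

```python
from string import Template

# Hypothetical code template; $index_function and $threshold are placeholders.
CODE_TEMPLATE = Template('''\
def identify_quality_problem(curve, abnormal_cases):
    """Return the label of the first matching abnormal case, else normal."""
    for label, case_curve in abnormal_cases.items():
        if $index_function(curve, case_curve) >= $threshold:
            return label
    return "normal"
''')


def closeness(a, b):
    """Hypothetical similarity function: 1 / (1 + mean absolute difference)."""
    diff = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return 1.0 / (1.0 + diff)


def generate_identifier(index_function, threshold, function_package):
    """Render the template and compile it into a callable."""
    source = CODE_TEMPLATE.substitute(
        index_function=index_function, threshold=repr(threshold)
    )
    namespace = dict(function_package)  # functions visible to generated code
    exec(source, namespace)  # only use with trusted template inputs
    return source, namespace["identify_quality_problem"]


source, identify = generate_identifier("closeness", 0.8, {"closeness": closeness})
cases = {"stuck_sensor": [5.0, 5.0, 5.0]}  # hypothetical labelled abnormal case
```

The generated `source` string can be stored and reused on new data; the function package is modelled here as a plain dictionary of callables.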
According to the embodiment of the invention, the evaluation results of the data quality of the at least two time series data curves to be evaluated are obtained according to the similarity; if the similarity between a time sequence data curve to be evaluated and an abnormal time sequence data curve in the case base is greater than a preset value, the time sequence data curve to be evaluated has the same quality problem as that abnormal time sequence data curve. When a large amount of industrial equipment time sequence data is detected and analyzed, the method enables automatic detection and improves the accuracy and timeliness of the quality problem analysis of the industrial equipment time sequence data.
As shown in fig. 5, an embodiment of the present invention further provides an apparatus 50 for evaluating data quality of an industrial device, where the apparatus 50 includes:
the acquisition module 51 is used for acquiring at least two time sequence data curves to be evaluated of the time sequence data of the industrial equipment;
the processing module 52 is configured to obtain the relative volatility of the at least two time series data curves to be evaluated and a normal time series data curve in a preset case base under at least one evaluation index; determine at least one target evaluation index among the at least one evaluation index according to the relative volatility under the at least one evaluation index; obtain, according to the at least one target evaluation index, the similarity between the at least two time sequence data curves to be evaluated and abnormal time sequence data curves in the preset case library; and obtain the evaluation result of the data quality of the at least two time sequence data curves to be evaluated according to the similarity.
Optionally, the processing module 52 is configured to obtain relative volatility of at least two time series data curves to be evaluated and a normal time series data curve in a preset case library under at least one evaluation index, and includes:
acquiring a first similarity between at least two time series data curves to be evaluated under at least one evaluation index;
acquiring a second similarity between any one to-be-evaluated time sequence data curve of at least two to-be-evaluated time sequence data curves under at least one evaluation index and a normal time sequence data curve in a preset case library;
obtaining the relative volatility of the time sequence data curve to be evaluated and a normal time sequence data curve in a preset case base under at least one evaluation index according to the index value of the first similarity and the index value of the second similarity; the index value includes: mean and variance of similarity.
Optionally, the processing module 52 is configured to obtain, according to the index value of the first similarity and the index value of the second similarity, relative volatility of the time series data curve to be evaluated and a normal time series data curve in a preset case base under at least one evaluation index, and includes:
relative volatility = (mean of second similarity/variance of second similarity)/(mean of first similarity/variance of first similarity), according to the formula, the relative volatility is obtained.
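The relative volatility formula can be illustrated with a short sketch. It assumes a single normal curve, pairwise comparison for the first similarity, and population variance; the `mean_abs_difference` index and the sample curves are hypothetical stand-ins, since the patent does not fix a specific evaluation index here.

```python
from itertools import combinations
from statistics import mean, pvariance


def mean_abs_difference(a, b):
    """Hypothetical evaluation index: mean absolute pointwise difference."""
    return mean([abs(x - y) for x, y in zip(a, b)])


def mean_over_variance(values):
    """Return mean / variance, the index-value ratio used in the formula."""
    var = pvariance(values)  # population variance is an assumption
    if var == 0:
        raise ValueError("variance is zero; the ratio is undefined")
    return mean(values) / var


def relative_volatility(curves, normal_curve, index=mean_abs_difference):
    """(mean2 / var2) / (mean1 / var1) for the second and first similarities."""
    # First similarity: between each pair of curves to be evaluated.
    first = [index(a, b) for a, b in combinations(curves, 2)]
    # Second similarity: between each curve to be evaluated and the normal curve.
    second = [index(c, normal_curve) for c in curves]
    return mean_over_variance(second) / mean_over_variance(first)


curves = [[0.0] * 4, [1.0] * 4, [3.0] * 4]
value = relative_volatility(curves, normal_curve=[0.0] * 4)
```

With exactly two curves the first similarity contains a single value, so its variance is zero and this sketch raises `ValueError`; how that case should be handled is not stated in the text.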
Optionally, the processing module 52 is configured to determine at least one target evaluation indicator in the at least one evaluation indicator according to the relative volatility of the at least one evaluation indicator, and includes:
and determining at least one target evaluation index in at least one evaluation index according to the relative volatility of at least one evaluation index and a preset volatility threshold value.
Optionally, the processing module 52 is configured to determine at least one target evaluation indicator in the at least one evaluation indicator according to the relative volatility of the at least one evaluation indicator and a preset volatility threshold, and includes:
when the relative volatility under at least one evaluation index is greater than a first preset volatility threshold value, filtering the corresponding time sequence data curve to be evaluated to obtain the remaining time sequence data curve to be evaluated;
and determining at least one corresponding evaluation index as a target evaluation index when the relative volatility in the remaining time sequence data curve to be evaluated is smaller than a second preset volatility threshold value.
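The two-threshold selection can be read in more than one way; the sketch below shows one possible reading. It assumes a relative volatility value is available per curve and per evaluation index, and that an index becomes a target index only when every remaining curve falls below the second threshold. The function name, the data layout, and the index names `dtw` and `corr` are hypothetical.

```python
def select_target_indexes(volatility, first_threshold, second_threshold):
    """Select target evaluation indexes.

    volatility maps index name -> {curve id: relative volatility}.
    Returns (target index names, remaining curve ids per index).
    """
    targets = []
    remaining_by_index = {}
    for index_name, per_curve in volatility.items():
        # Filter out curves whose relative volatility exceeds the first threshold.
        remaining = [cid for cid, v in per_curve.items() if v <= first_threshold]
        remaining_by_index[index_name] = remaining
        # Assumption: the index qualifies only if all remaining curves are
        # below the second threshold.
        if remaining and all(per_curve[cid] < second_threshold for cid in remaining):
            targets.append(index_name)
    return targets, remaining_by_index


# Hypothetical relative volatility values for three curves under two indexes.
volatility = {
    "dtw": {"c1": 0.2, "c2": 5.0, "c3": 0.3},
    "corr": {"c1": 0.9, "c2": 0.8, "c3": 0.7},
}
targets, remaining = select_target_indexes(volatility, 1.0, 0.5)
```

Replacing `all` with `any` gives the looser reading in which a single remaining curve below the second threshold is enough.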
Optionally, the processing module 52 is configured to obtain, according to at least one of the target evaluation indexes, similarities between at least two time series data curves to be evaluated and an abnormal time series data curve in the preset case library, and includes:
and acquiring a third similarity between an abnormal time sequence data curve in the preset case base and at least two time sequence data curves to be evaluated under at least one target evaluation index.
Optionally, the processing module 52 is configured to obtain, according to the similarity, an evaluation result of data quality of at least two time series data curves to be evaluated, and includes:
carrying out normalization processing on the third similarity to obtain a normalized similarity index of the third similarity;
when the normalized similarity index is greater than or equal to a preset similarity index threshold, obtaining an evaluation result that the corresponding time series data curve to be evaluated, among the at least two time series data curves to be evaluated, has the same or a similar quality problem as the abnormal time series data curve;
and when the normalized similarity index is smaller than the preset similarity index threshold, obtaining an evaluation result that the corresponding time sequence data curve to be evaluated, among the at least two time sequence data curves to be evaluated, is a normal time sequence data curve.
It should be noted that the apparatus is an apparatus corresponding to the above method, and all the implementations in the above method embodiment are applicable to the embodiment of the apparatus, and the same technical effects can be achieved.
Embodiments of the present invention also provide a computing device, comprising: a processor, a memory storing a computer program which, when executed by the processor, performs the method as described above. All the implementation manners in the above method embodiment are applicable to this embodiment, and the same technical effect can be achieved.
Embodiments of the present invention also provide a computer-readable storage medium comprising instructions which, when executed on a computer, cause the computer to perform the method as described above. All the implementation manners in the above method embodiment are applicable to this embodiment, and the same technical effect can be achieved.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative: the division of the units is only one logical division, and other divisions are possible in practice; for instance, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling, direct coupling, or communication connection may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be in an electrical, mechanical, or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
Furthermore, it is to be noted that in the device and method of the invention, it is obvious that the individual components or steps can be decomposed and/or recombined. These decompositions and/or recombinations are to be regarded as equivalents of the present invention. Also, the steps of performing the series of processes described above may naturally be performed chronologically in the order described, but need not necessarily be performed chronologically, and some steps may be performed in parallel or independently of each other. It will be understood by those skilled in the art that all or any of the steps or elements of the method and apparatus of the present invention may be implemented in any computing device (including processors, storage media, etc.) or network of computing devices, in hardware, firmware, software, or any combination thereof, which can be implemented by those skilled in the art using their basic programming skills after reading the description of the present invention.
Thus, the objects of the invention may also be achieved by running a program or a set of programs on any computing device. The computing device may be a general purpose device as is well known. The object of the invention is thus also achieved solely by providing a program product comprising program code for implementing the method or the apparatus. That is, such a program product also constitutes the present invention, and a storage medium storing such a program product also constitutes the present invention. It is to be understood that the storage medium may be any known storage medium or any storage medium developed in the future. It is further noted that in the apparatus and method of the present invention, it is apparent that each component or step can be decomposed and/or recombined. These decompositions and/or recombinations are to be regarded as equivalents of the present invention. Also, the steps of executing the series of processes described above may naturally be executed chronologically in the order described, but need not necessarily be executed chronologically. Some steps may be performed in parallel or independently of each other.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (10)

1. A method of assessing data quality of an industrial device, the method comprising:
acquiring at least two time sequence data curves to be evaluated of the time sequence data of the industrial equipment;
obtaining relative volatility of at least two time sequence data curves to be evaluated and normal time sequence data curves in a preset case base under at least one evaluation index;
determining at least one target evaluation index in at least one evaluation index according to the relative volatility of at least one evaluation index;
according to at least one target evaluation index, obtaining the similarity between at least two time sequence data curves to be evaluated and abnormal time sequence data curves in the preset case library;
and obtaining the evaluation result of the data quality of at least two time sequence data curves to be evaluated according to the similarity.
2. The method for evaluating the data quality of the industrial equipment according to claim 1, wherein obtaining the relative volatility of at least two time series data curves to be evaluated and a normal time series data curve in a preset case base under at least one evaluation index comprises:
acquiring a first similarity between at least two time series data curves to be evaluated under at least one evaluation index;
acquiring a second similarity between any one to-be-evaluated time sequence data curve of at least two to-be-evaluated time sequence data curves under at least one evaluation index and a normal time sequence data curve in a preset case base;
obtaining the relative volatility of the time sequence data curve to be evaluated and a normal time sequence data curve in a preset case base under at least one evaluation index according to the index value of the first similarity and the index value of the second similarity; the index value includes: mean and variance of similarity.
3. The method according to claim 2, wherein obtaining the relative volatility of the time series data curve to be evaluated and a normal time series data curve in a preset case base under at least one evaluation index according to the index value of the first similarity and the index value of the second similarity comprises:
relative volatility = (mean of second similarity/variance of second similarity)/(mean of first similarity/variance of first similarity), according to the formula, the relative volatility is obtained.
4. The method for evaluating the data quality of the industrial equipment according to claim 2, wherein determining at least one target evaluation index of the at least one evaluation index according to the relative volatility of the at least one evaluation index comprises:
and determining at least one target evaluation index in at least one evaluation index according to the relative volatility of at least one evaluation index and a preset volatility threshold value.
5. The method for evaluating the data quality of the industrial equipment according to claim 4, wherein determining at least one target evaluation index in at least one evaluation index according to the relative volatility of the at least one evaluation index and a preset volatility threshold comprises:
when the relative volatility under at least one evaluation index is greater than a first preset volatility threshold value, filtering the corresponding time sequence data curve to be evaluated to obtain the remaining time sequence data curve to be evaluated;
and determining at least one corresponding evaluation index as a target evaluation index when the relative volatility in the remaining time sequence data curve to be evaluated is smaller than a second preset volatility threshold value.
6. The method for evaluating data quality of industrial equipment according to claim 1, wherein obtaining a similarity between at least two time series data curves to be evaluated and an abnormal time series data curve in the preset case base according to at least one target evaluation index comprises:
and acquiring a third similarity between an abnormal time sequence data curve in the preset case base and at least two time sequence data curves to be evaluated under at least one target evaluation index.
7. The method for evaluating the data quality of the industrial equipment according to claim 6, wherein obtaining the evaluation results of the data quality of at least two time series data curves to be evaluated according to the similarity comprises:
carrying out normalization processing on the third similarity to obtain a normalized similarity index of the third similarity;
when the normalized similarity index is greater than or equal to a preset similarity index threshold, obtaining an evaluation result that the corresponding time series data curve to be evaluated, among the at least two time series data curves to be evaluated, has the same or a similar quality problem as the abnormal time series data curve;
and when the normalized similarity index is smaller than the preset similarity index threshold, obtaining an evaluation result that the corresponding time sequence data curve to be evaluated, among the at least two time sequence data curves to be evaluated, is a normal time sequence data curve.
8. An apparatus for evaluating data quality of an industrial device, the apparatus comprising:
the acquisition module is used for acquiring at least two time sequence data curves to be evaluated of the time sequence data of the industrial equipment;
the processing module is used for acquiring the relative volatility of at least two time sequence data curves to be evaluated and a normal time sequence data curve in a preset case base under at least one evaluation index according to the at least one evaluation index; determining at least one target evaluation index in at least one evaluation index according to the relative volatility of at least one evaluation index; according to at least one target evaluation index, obtaining the similarity between at least two time sequence data curves to be evaluated and abnormal time sequence data curves in the preset case library; and obtaining the evaluation result of the data quality of at least two time sequence data curves to be evaluated according to the similarity.
9. A computing device, comprising: a processor, a memory storing a computer program which, when executed by the processor, performs the method of any of claims 1 to 7.
10. A computer-readable storage medium storing instructions that, when executed on a computer, cause the computer to perform the method of any one of claims 1 to 7.
CN202210491936.3A 2022-05-07 2022-05-07 Method, device and equipment for evaluating data quality of industrial equipment Active CN114580982B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210491936.3A CN114580982B (en) 2022-05-07 2022-05-07 Method, device and equipment for evaluating data quality of industrial equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210491936.3A CN114580982B (en) 2022-05-07 2022-05-07 Method, device and equipment for evaluating data quality of industrial equipment

Publications (2)

Publication Number Publication Date
CN114580982A true CN114580982A (en) 2022-06-03
CN114580982B CN114580982B (en) 2022-08-05

Family

ID=81769267

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210491936.3A Active CN114580982B (en) 2022-05-07 2022-05-07 Method, device and equipment for evaluating data quality of industrial equipment

Country Status (1)

Country Link
CN (1) CN114580982B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9390112B1 (en) * 2013-11-22 2016-07-12 Groupon, Inc. Automated dynamic data quality assessment
CN110188090A (en) * 2019-06-17 2019-08-30 合肥优尔电子科技有限公司 A kind of distribution topological data method for evaluating quality and device based on data mining
CN111898903A (en) * 2020-07-28 2020-11-06 北京科技大学 Method and system for evaluating uniformity and comprehensive quality of steel product
CN112800116A (en) * 2021-04-08 2021-05-14 腾讯科技(深圳)有限公司 Method and device for detecting abnormity of service data
US20210216386A1 (en) * 2018-07-23 2021-07-15 Mitsubishi Electric Corporation Time-sequential data diagnosis device, additional learning method, and recording medium
CN113434970A (en) * 2021-06-01 2021-09-24 北京交通大学 Health index curve extraction and service life prediction method for mechanical equipment
CN113468034A (en) * 2021-07-07 2021-10-01 浙江大华技术股份有限公司 Data quality evaluation method and device, storage medium and electronic equipment
WO2021212752A1 (en) * 2020-04-23 2021-10-28 平安科技(深圳)有限公司 Device index data-based anomaly detection method and apparatus, device, and storage medium
CN113986908A (en) * 2021-12-24 2022-01-28 昆仑智汇数据科技(北京)有限公司 Industrial equipment data processing method, device and equipment
CN114153926A (en) * 2021-11-26 2022-03-08 中国船级社 Data quality evaluation method and device, computer equipment and storage medium
CN114331195A (en) * 2021-12-27 2022-04-12 北京科技大学 Process curve risk evaluation method for influencing overall length quality of hot-rolled strip steel
CN114444608A (en) * 2022-02-08 2022-05-06 中国电信股份有限公司 Data set quality evaluation method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN114580982B (en) 2022-08-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant