CN112559602A - Method and system for determining target sample of industrial equipment symptom


Info

Publication number
CN112559602A
Authority
CN
China
Prior art keywords
sample
evaluated
similarity
determining
symptom
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110194751.1A
Other languages
Chinese (zh)
Other versions
CN112559602B (en)
Inventor
田春华
李闯
马国�
张�浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Innovation Center For Industrial Big Data Co ltd
Original Assignee
Beijing Innovation Center For Industrial Big Data Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Innovation Center For Industrial Big Data Co ltd
Priority to CN202110194751.1A
Publication of CN112559602A
Application granted
Publication of CN112559602B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and a system for determining a target sample of an industrial equipment symptom. The method comprises: acquiring the similarity between samples to be evaluated of a symptom and a reference sample of the symptom; and evaluating the samples to be evaluated according to the similarity to obtain a target sample set, wherein the target sample set comprises target samples of the symptom, being those samples to be evaluated whose confusion degree is greater than a preset value. With this technical scheme, 'plausible' cases are automatically searched for in historical data starting from a small number of cases, so that experts can distinguish valid symptoms from false symptoms in a targeted manner, and more exact cases are provided for modeling.

Description

Method and system for determining target sample of industrial equipment symptom
Technical Field
The invention relates to the technical field of industry, in particular to a method and a system for determining a target sample of an industrial equipment symptom.
Background
When characterizing the logic of a device failure, a description of the symptoms is usually involved. Symptoms are intuitive to recognize visually but hard to describe quantitatively; a symptom is a time series of a key technical index (such as temperature, leakage amount, or 1X vibration amplitude) that exhibits a certain trend (such as a slow rise or fall) or form (such as burrs or oscillation).
Current means of collecting cases cover only a very limited range of situations, which hinders precise characterization during modeling. They include the following:
1) Past failure cases: the number is usually very limited, and so is the coverage of scenarios.
2) Domain experts' own memory or manual review: coverage is too low for IT modeling or implementation.
3) Discovery by IT experts during testing: the iteration period is long, and the demands on the IT experts' intelligence and sense of responsibility are high.
Disclosure of Invention
The embodiment of the invention provides a method and a system for determining a target sample of an industrial equipment symptom. Starting from a small number of cases, 'plausible' cases are automatically searched for in historical data as samples to be evaluated, from which the target samples are determined; with the target samples in the target sample set, it becomes easier to distinguish which symptoms are valid and which are false.
In order to solve the above technical problem, an embodiment of the present invention provides the following technical solutions:
a method of determining a target sample of industrial equipment symptoms, comprising:
acquiring the similarity of a sample to be evaluated of a symptom and a reference sample of the symptom;
evaluating the samples to be evaluated of the symptom according to the similarity to obtain a target sample set, wherein the target sample set comprises: target samples of the symptom, being those samples to be evaluated whose confusion degree is greater than a preset value.
Optionally, the obtaining of the similarity between the sample to be evaluated of the symptom and the reference sample of the symptom includes:
performing time sequence decomposition on the original time sequence of the reference sample to obtain a time sequence decomposition result;
determining a similarity evaluation index set according to the time sequence decomposition result;
and according to the similarity evaluation index set, obtaining the similarity between the sample to be evaluated and the reference sample.
Optionally, performing time sequence decomposition on the original time sequence of the reference sample to obtain a time sequence decomposition result, including:
performing time sequence decomposition on the original time sequence of the reference sample to obtain a trend item, a period item and a residual error item;
and determining a similarity evaluation index set according to the trend item, the period item and the residual error item of the reference sample.
Optionally, obtaining the similarity between the sample to be evaluated and the reference sample according to the similarity evaluation index set includes:
and acquiring a similarity matrix of the sample to be evaluated and the reference sample according to a time sequence decomposition result by adopting a preset time sequence similarity algorithm.
Optionally, determining a similarity evaluation index set according to the trend term, the period term, and the residual term of the reference sample, including:
if the sum of the energy ratios of the period term and the residual term is greater than a first threshold value, determining to use an oscillation type similarity evaluation index; otherwise, performing linear regression on the trend term to obtain a linear regression model, and determining to use a trend type similarity evaluation index if the hypothesis-test p-value of the linear regression model is smaller than a second threshold value; otherwise, determining to use the preset shape type similarity evaluation index.
Optionally, the energy ratio of the period term and the residual term is obtained through the following processes:
acquiring the integral variance of the original time sequence of the sample to be evaluated, the variance of the period item and the variance of the residual error item;
periodic term energy ratio = variance of periodic term/overall variance;
residual term energy ratio = variance of residual term/overall variance.
Optionally, according to the similarity evaluation index, evaluating the sample to be evaluated of the symptom to obtain a target sample set, including:
and if the maximum value or the mean value of the similarity evaluation index is larger than a third threshold value, filtering out the sample to be evaluated corresponding to the similarity evaluation index, and determining the remaining samples to be evaluated as the preliminarily screened target sample set.
Optionally, the method further includes:
and acquiring a plurality of evaluation indexes of the sample to be evaluated, carrying out inconsistency evaluation on the plurality of evaluation indexes by a similarity sorting variance method, and reserving the sample to be evaluated with the evaluation index variance larger than a fourth threshold value to obtain a target sample set.
Optionally, the method for determining a target sample of the industrial equipment symptom further includes:
and clustering the samples in the target sample set to obtain a clustering result.
Embodiments of the present invention also provide a system for determining a target sample of industrial equipment symptoms, comprising:
the acquisition module is used for acquiring the similarity between a sample to be evaluated of a symptom and a reference sample of the symptom;
a processing module, configured to perform evaluation processing on the samples to be evaluated of the symptom according to the similarity, so as to obtain a target sample set, where the target sample set includes: target samples of the symptom, being those samples to be evaluated whose confusion degree is greater than a preset value.
The embodiment of the invention has the following technical effects:
according to the technical scheme of the invention, different evaluation indexes emphasize different characteristics, so the sequence similarity between a sample to be evaluated and the reference sample differs from index to index, and some samples are easily confused: they are similar to the case under some indexes but not under others. By obtaining, among the samples to be evaluated, the samples whose confusion degree is greater than the preset value, the technical scheme determines the target sample set more accurately, so that the symptom judgment rule is optimized.
Drawings
FIG. 1 is a schematic flow chart of a method for determining a target sample of an industrial equipment symptom provided by an embodiment of the invention;
FIG. 2 is a schematic diagram of a reference sample provided in an embodiment of the present invention;
FIG. 3 is a schematic diagram of a first sample to be evaluated according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a second sample to be evaluated according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a third sample to be evaluated according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a fourth sample to be evaluated according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a fifth sample to be evaluated according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a sixth sample to be evaluated according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of a seventh sample to be evaluated according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of an eighth sample to be evaluated according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of a ninth sample to be evaluated according to an embodiment of the present invention;
FIG. 12 is a schematic diagram of a tenth sample to be evaluated according to an embodiment of the present invention;
FIG. 13 is a schematic diagram of an eleventh sample to be evaluated according to an embodiment of the present invention;
FIG. 14 is an enlarged schematic view of a reference sample provided in an embodiment of the present invention;
FIG. 15 is an enlarged schematic view of a second sample under evaluation provided by an embodiment of the present invention;
FIG. 16 shows the performance of the reference sample and the eleven samples to be evaluated under 18 different distance metric functions according to an embodiment of the present invention.
Detailed Description
The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
As shown in fig. 1, an embodiment of the present invention provides a method for determining a target sample of an industrial equipment symptom, including:
S1, acquiring the similarity between samples to be evaluated of the symptom and a reference sample of the symptom;
S2, evaluating the samples to be evaluated of the symptom according to the similarity to obtain a target sample set, wherein the target sample set comprises: target samples of the symptom, being those samples to be evaluated whose confusion degree is greater than a preset value.
In the embodiment of the invention, different evaluation indexes emphasize different characteristics, so the sequence similarity between a sample to be evaluated and the reference sample differs from index to index, and some samples are easily confused: they are similar to the case under some indexes but not under others. By obtaining, among the samples to be evaluated, the samples whose confusion degree is greater than the preset value, the embodiment determines the target sample set more accurately, so that the symptom judgment rule is optimized.
In an optional embodiment of the present invention, in step S1, the obtaining of the similarity between the sample to be evaluated of the symptom and the reference sample of the symptom includes:
performing time sequence decomposition on the original time sequence of the reference sample to obtain a time sequence decomposition result;
determining a similarity evaluation index set according to the time sequence decomposition result;
and according to the similarity evaluation index set, obtaining the similarity between the sample to be evaluated and the reference sample.
In an alternative embodiment of the present invention, in step S1, performing time sequence decomposition on the original time sequence of the reference sample to obtain a time sequence decomposition result, including:
performing time sequence decomposition on the original time sequence of the reference sample to obtain a trend item, a period item and a residual error item;
and determining a similarity evaluation index set according to the trend item, the period item and the residual error item of the reference sample.
Specifically, STL (Seasonal and Trend decomposition using Loess), SSA (Singular Spectrum Analysis), or EMD (Empirical Mode Decomposition) may be used to perform the time sequence decomposition on the original time sequence of the sample to be evaluated.
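The split into a trend item, a period item and a residual error item can be illustrated with a minimal sketch. It assumes a classical additive moving-average decomposition, which is a simpler stand-in for the STL, SSA, and EMD methods named above; the function name `decompose` and the `period` parameter are hypothetical.

```python
from statistics import mean

def decompose(series, period):
    """Additive split: series[i] = trend[i] + periodic[i] + residual[i].

    Assumption: a plain centred moving average is used for the trend.
    This is not STL, SSA, or EMD; it only illustrates the three terms.
    """
    n = len(series)
    half = period // 2
    # Trend: centred moving average; the window shrinks near the edges.
    trend = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        trend.append(mean(series[lo:hi]))
    detrended = [x - t for x, t in zip(series, trend)]
    # Period term: mean of the detrended values at each phase, centred on zero.
    phase_means = [mean(detrended[p::period]) for p in range(period)]
    centre = mean(phase_means)
    periodic = [phase_means[i % period] - centre for i in range(n)]
    # Residual: whatever the trend and period terms do not explain.
    residual = [x - t - s for x, t, s in zip(series, trend, periodic)]
    return trend, periodic, residual
```

The sketch requires `period` to be no larger than the series length; a production implementation would more likely call an established decomposition routine.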
In an optional embodiment of the present invention, in step S1, obtaining the similarity between the sample to be evaluated and the reference sample according to the similarity evaluation index set includes:
and acquiring a similarity matrix of the sample to be evaluated and the reference sample according to a time sequence decomposition result by adopting a preset time sequence similarity algorithm.
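As an illustration of the similarity matrix, the sketch below computes the distance from the reference sample to each sample to be evaluated under two metrics. The labels "EUCL" and "DTWARP" are taken from the text, but the implementations are generic textbook versions written for this example and are an assumption about what those labels compute; `distance_matrix` is a hypothetical helper.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw(a, b):
    """Dynamic time warping distance with absolute-difference cost."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)
    for x in a:
        cur = [inf]
        for j, y in enumerate(b, start=1):
            # Extend the cheapest of the three admissible predecessor cells.
            cur.append(abs(x - y) + min(prev[j], prev[j - 1], cur[j - 1]))
        prev = cur
    return prev[-1]

# Assumed mapping from the document's metric labels to generic functions.
METRICS = {"EUCL": euclidean, "DTWARP": dtw}

def distance_matrix(reference, samples, metrics=METRICS):
    """One row per sample to be evaluated, one column per metric."""
    return [[fn(reference, s) for fn in metrics.values()] for s in samples]
```

A sample that is a time-shifted copy of the reference tends to score a smaller distance under DTW than under the Euclidean metric, which is the kind of disagreement between indexes the method relies on.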
In an optional embodiment of the present invention, in step S2, determining a similarity evaluation index set according to a trend term, a period term, and a residual term of a reference sample, includes:
if the sum of the energy ratios of the period term and the residual term is greater than a first threshold value, determining to use an oscillation type similarity evaluation index; otherwise, performing linear regression on the trend term to obtain a linear regression model, and determining to use a trend type similarity evaluation index if the hypothesis-test p-value of the linear regression model is smaller than a second threshold value; otherwise, determining to use the preset shape type similarity evaluation index.
Specifically, the first threshold may be 75%, and the second threshold may be 0.05;
the linear regression may include polynomials of degree 1 or 3, a logarithmic transformation, an exponential transformation, and the like.
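The selection rule can be sketched as a small decision function. The default thresholds of 0.75 and 0.05 are the example values given above; the function name, the string labels it returns, and the assumption that the regression p-value is computed elsewhere and passed in are all hypothetical.

```python
def select_index_type(period_ratio, residual_ratio, trend_p_value,
                      first_threshold=0.75, second_threshold=0.05):
    """Pick the family of similarity evaluation indexes for a reference sample.

    trend_p_value is assumed to come from a linear regression fitted
    to the trend term by some other routine.
    """
    # Period and residual terms dominate the variance: treat as oscillation.
    if period_ratio + residual_ratio > first_threshold:
        return "oscillation"
    # Otherwise a significant regression on the trend term means a trend symptom.
    if trend_p_value < second_threshold:
        return "trend"
    # Neither oscillating nor clearly trending: compare by preset shape.
    return "preset_shape"
```

For instance, energy ratios of 0.5 and 0.3 sum to 0.8, which exceeds 0.75, so the oscillation type indexes would be used regardless of the p-value.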
In an alternative embodiment of the present invention, in step S2, the energy ratio of the period term and the residual term is obtained through the following processes:
acquiring the integral variance of the original time sequence of the sample to be evaluated, the variance of the period item and the variance of the residual error item;
periodic term energy ratio = variance of periodic term/overall variance;
residual term energy ratio = variance of residual term/overall variance.
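The two ratios above can be computed directly from the decomposition result. This is a minimal sketch that assumes population variance; the text does not say whether population or sample variance is intended, and the function name `energy_ratios` is hypothetical.

```python
from statistics import pvariance

def energy_ratios(original, periodic, residual):
    """Return (period term energy ratio, residual term energy ratio)."""
    total = pvariance(original)  # overall variance of the original sequence
    if total == 0:
        raise ValueError("a constant sequence has zero overall variance")
    return pvariance(periodic) / total, pvariance(residual) / total
```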
In an optional embodiment of the present invention, in step S2, according to the similarity evaluation index, the evaluating the sample to be evaluated of the symptom to obtain a target sample set, including:
and if the maximum value or the mean value of the similarity evaluation index is larger than a third threshold value, filtering out the sample to be evaluated corresponding to the similarity evaluation index, and determining the remaining samples to be evaluated as the preliminarily screened target sample set.
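The preliminary screening can be sketched as follows. The sketch assumes the index values are distances that have already been scaled so that values from different metrics are comparable; the text does not specify any scaling, and the name `prefilter` and its parameters are hypothetical.

```python
from statistics import mean

def prefilter(distances, third_threshold, use_mean=False):
    """Drop samples whose max (or mean) distance exceeds the threshold.

    distances maps a sample id to its list of distances to the
    reference sample, one entry per evaluation index.
    """
    kept = {}
    for sample_id, values in distances.items():
        score = mean(values) if use_mean else max(values)
        if score <= third_threshold:
            kept[sample_id] = values
    return kept
```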
An optional embodiment of the present invention, further comprising:
and acquiring a plurality of evaluation indexes of the sample to be evaluated, carrying out inconsistency evaluation on the plurality of evaluation indexes by a similarity sorting variance method, and reserving the sample to be evaluated with the evaluation index variance larger than a fourth threshold value to obtain a target sample set.
Specifically, samples to be evaluated that have low similarity to the reference sample are filtered out, and several criteria are possible, for example: 1) the maximum value of the multiple evaluation indexes > a threshold; 2) the samples to be evaluated are ranked (from small to large) under each evaluation index, and the maximum rank of a sample to be evaluated across the multiple indexes > a threshold.
The consistency of a sample to be evaluated under various indexes can be evaluated by adopting methods such as similarity sorting variance, collaborative filtering and the like, and the sample to be evaluated with high inconsistency is selected as a target sample.
The size of the threshold and the degree of inconsistency can be set according to actual needs.
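One possible reading of the similarity sorting variance method is sketched below: each sample is ranked under every evaluation index, and the variance of a sample's ranks measures how inconsistently the indexes judge it. The function names are hypothetical, and the use of population variance and of insertion order to break ties are assumptions.

```python
from statistics import pvariance

def rank_variances(distances):
    """Variance of each sample's rank across the evaluation indexes.

    distances maps a sample id to its list of distances to the
    reference sample, one entry per evaluation index.
    """
    ids = list(distances)
    n_metrics = len(next(iter(distances.values())))
    ranks = {sid: [] for sid in ids}
    for m in range(n_metrics):
        # Rank from small to large distance; ties keep insertion order.
        ordered = sorted(ids, key=lambda sid: distances[sid][m])
        for rank, sid in enumerate(ordered, start=1):
            ranks[sid].append(rank)
    return {sid: pvariance(r) for sid, r in ranks.items()}

def select_confusing(distances, fourth_threshold):
    """Keep the samples whose rank variance exceeds the threshold."""
    variances = rank_variances(distances)
    return [sid for sid, v in variances.items() if v > fourth_threshold]
```

A sample ranked first under every index has a rank variance of zero and is dropped, while a sample ranked near the top under some indexes and near the bottom under others is retained as a target sample.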
The following is a specific example of the time-series similarity evaluation index:
the similarity evaluation index types may include: a trend type similarity evaluation index, an oscillation type similarity evaluation index and a preset shape type similarity evaluation index;
trend type similarity evaluation index: the corresponding main factors include slope and rise/fall; the corresponding alternative distance indicators include "COR" and "CORT";
oscillation type similarity evaluation index: the corresponding main factors include period, correlation, and amplitude; the corresponding alternative distance indicators include "ACF", "AR.LPC.CEPS", "AR.MAH", "AR.PIC", "PACF", "SPEC.LLR", "SPEC.GLK", and "PER";
preset shape type similarity evaluation index: the corresponding main factors include mean value, amplitude, and phase, for which the alternative distance indicators include "DTWARP" and "EUCL"; and degree of morphological identity, for which the alternative distance indicators include "MINDIST.SAX";
the above "COR", "CORT", "ACF", "AR.LPC.CEPS", "AR.MAH", "AR.PIC", "PACF", "SPEC.LLR", "SPEC.GLK", "PER", "DTWARP", "EUCL", and "MINDIST.SAX" each denote a different distance metric function.
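The correspondence between index types and alternative distance indicators can be held in a simple lookup table. The indicator names are those listed above; the dictionary keys and the helper `candidate_metrics` are hypothetical names chosen for this sketch.

```python
# Candidate distance indicators per index type, as listed in the text.
INDEX_METRICS = {
    "trend": ["COR", "CORT"],
    "oscillation": ["ACF", "AR.LPC.CEPS", "AR.MAH", "AR.PIC",
                    "PACF", "SPEC.LLR", "SPEC.GLK", "PER"],
    "preset_shape": ["DTWARP", "EUCL", "MINDIST.SAX"],
}

def candidate_metrics(index_type):
    """Return the distance indicator names for one index type."""
    try:
        return INDEX_METRICS[index_type]
    except KeyError:
        raise ValueError(f"unknown index type: {index_type!r}") from None
```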
In an alternative embodiment of the present invention, step S2 further includes:
and clustering the samples in the target sample set to obtain a clustering result.
Specifically, algorithms such as k-means, PAM, and hierarchical clustering can be used for clustering, which reduces the complexity.
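As one illustration of the clustering step, the sketch below performs single-linkage hierarchical clustering cut at a fixed height, which is equivalent to grouping samples connected by chains of pairwise distances no larger than that height. The choice of single linkage, the `cut_height` parameter, and the function name are assumptions; the text equally allows k-means or PAM.

```python
def single_linkage_clusters(items, distance, cut_height):
    """Cluster items by single linkage, cutting the dendrogram at cut_height."""
    parent = list(range(len(items)))

    def find(i):
        # Follow parent links to the cluster root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair whose distance does not exceed the cut height.
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if distance(items[i], items[j]) <= cut_height:
                parent[find(i)] = find(j)

    clusters = {}
    for i, item in enumerate(items):
        clusters.setdefault(find(i), []).append(item)
    return list(clusters.values())
```

In practice `items` would be the target samples and `distance` one of the time sequence distance functions, so that an expert reviews one representative per cluster instead of every confusing sample.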
The method for determining the target sample is described below by an embodiment:
for example, the depiction of "straw hat wind" in wind power:
1) fig. 2 to 13 are schematic diagrams of a reference sample and 11 samples to be evaluated; tables 2 and 3 show the distance values between the 11 samples to be evaluated and the reference sample under 18 different distance metric functions, and fig. 16 is a schematic diagram showing the distances between the 11 samples to be evaluated and the reference sample under the 18 different distance metric functions;
the 18 different distance metric functions (as shown in tables 2 and 3) include: "ACF", "AR.LPC.CEPS", "AR.MAH", "AR.PIC", "CDM", "CID", "COR", "CORT", "DTWARP", "EUCL", "INT.PER", "NCD", "PACF", "PDC", "PER", "MINDIST.SAX", "SPEC.LLR", "SPEC.GLK"; "Seq" is the code number of the sample to be evaluated;
TABLE 1
[table image not reproduced]
TABLE 2
[table image not reproduced]
Comparing the data in table 2 and table 3 shows that the similarity (distance value) between the 11 samples to be evaluated and the reference sample differs under different distance metric functions; for example, as shown in fig. 13 and 14, under the Euclidean distance (EUCL), the second sample to be evaluated differs only slightly from the reference sample, with a distance of 60.6, compared to the other samples to be evaluated, but at the int.
2) Eliminating dissimilar samples to be evaluated;
comparing the data in table 2 and table 3 shows that the eleventh sample to be evaluated has the worst similarity to the reference sample under the 18 different distance metric functions, so the eleventh sample to be evaluated is eliminated and the remaining 10 samples to be evaluated are retained.
3) As shown in tables 3 and 4 below, from the remaining 10 samples to be evaluated in step 2), samples with strong inconsistency were selected.
TABLE 3
[table image not reproduced]
TABLE 4
[table image not reproduced]
TABLE 5
[table image not reproduced]
As shown in table 5, using the similarity ranking variance method, the mean and variance of the distances of the 10 samples to be evaluated under the 18 different distance metric functions are calculated, and the variances of the 10 samples are compared. The variance values of the fifth, seventh, and tenth samples to be evaluated are 3.5, 3.9, and 3.5, respectively, which are larger than those of the remaining samples to be evaluated; samples 5, 7, and 10 are therefore retained as target samples because of their high variance.
It can thus be seen that, in judging the straw hat wind, the width of the top of the curve of a sample to be evaluated is not important, and its left and right sides need not be symmetrical; the method can therefore select the target samples accurately.
According to the technical scheme of the embodiment of the invention, the similarity between the sample to be evaluated and the reference sample is evaluated using a plurality of similarity evaluation indexes; suitable time sequence distance functions are selected automatically based on time sequence characteristic analysis; among similar samples, those with high confusion (inconsistency across the multiple indexes) are preferentially selected; and, depending on how concentrated the target samples are, the high-confusion samples are clustered to reduce the complexity of the data, so that more exact cases can be provided for modeling.
Embodiments of the present invention also provide a system for determining a target sample of industrial equipment symptoms, comprising:
the acquisition module is used for acquiring the similarity between a sample to be evaluated of a symptom and a reference sample of the symptom;
a processing module, configured to perform evaluation processing on the samples to be evaluated of the symptom according to the similarity, so as to obtain a target sample set, where the target sample set includes: target samples of the symptom, being those samples to be evaluated whose confusion degree is greater than a preset value.
Optionally, the obtaining of the similarity between the sample to be evaluated of the symptom and the reference sample of the symptom includes:
performing time sequence decomposition on the original time sequence of the reference sample to obtain a time sequence decomposition result;
determining a similarity evaluation index set according to the time sequence decomposition result;
and according to the similarity evaluation index set, obtaining the similarity between the sample to be evaluated and the reference sample.
Optionally, performing time sequence decomposition on the original time sequence of the reference sample to obtain a time sequence decomposition result, including:
performing time sequence decomposition on the original time sequence of the reference sample to obtain a trend item, a period item and a residual error item;
and determining a similarity evaluation index set according to the trend item, the period item and the residual error item of the reference sample.
Optionally, obtaining the similarity between the sample to be evaluated and the reference sample according to the similarity evaluation index set includes:
and acquiring a similarity matrix of the sample to be evaluated and the reference sample according to a time sequence decomposition result by adopting a preset time sequence similarity algorithm.
Optionally, obtaining a plurality of similarity evaluation indexes according to the time sequence decomposition result of the reference sample, including:
if the sum of the energy ratios of the period term and the residual term is greater than a first threshold value, determining to use an oscillation type similarity evaluation index; otherwise, performing linear regression on the trend term to obtain a linear regression model, and determining to use a trend type similarity evaluation index if the hypothesis-test p-value of the linear regression model is smaller than a second threshold value; otherwise, determining to use the preset shape type similarity evaluation index.
Optionally, the energy ratio of the period term and the residual term is obtained through the following processes:
acquiring the integral variance of the original time sequence of the sample to be evaluated, the variance of the period item and the variance of the residual error item;
periodic term energy ratio = variance of periodic term/overall variance;
residual term energy ratio = variance of residual term/overall variance.
Optionally, according to the similarity evaluation index, evaluating the sample to be evaluated of the symptom to obtain a target sample set, including:
and if the maximum value or the mean value of the similarity evaluation index is larger than a third threshold value, filtering out the sample to be evaluated corresponding to the similarity evaluation index, and determining the remaining samples to be evaluated as the preliminarily screened target sample set.
Optionally, the method further includes:
and acquiring a plurality of evaluation indexes of the sample to be evaluated, carrying out inconsistency evaluation on the plurality of evaluation indexes by a similarity sorting variance method, and reserving the sample to be evaluated with the evaluation index variance larger than a fourth threshold value to obtain a target sample set.
Optionally, the method for determining a target sample of the industrial equipment symptom further includes:
and clustering the samples in the target sample set to obtain a clustering result.
The technical scheme of the invention can realize that the 'plausible' case is automatically searched in historical data through a small number of cases, so that an expert can distinguish effective symptoms and false symptoms in a more targeted way, and a more exact case is provided for modeling.
An embodiment of the present invention also provides a processor-readable storage medium, which stores a computer program for causing a processor to execute the method as described above. All the implementation manners in the above method embodiment are applicable to the embodiment of the system, and the same technical effect can be achieved.
Further, it is noted that in the system and method of the present invention, it is apparent that each component or each step may be decomposed and/or recombined. These decompositions and/or recombinations are to be regarded as equivalents of the present invention. Also, the steps of performing the series of processes described above may naturally be performed chronologically in the order described, but need not necessarily be performed chronologically, and some steps may be performed in parallel or independently of each other. It will be understood by those of ordinary skill in the art that all or any of the steps or elements of the method and system of the present invention may be implemented in any computing system (including processors, storage media, etc.) or network of computing systems, in hardware, firmware, software, or any combination thereof, which can be implemented by those of ordinary skill in the art using their basic programming skills after reading the description of the present invention.
Thus, the objects of the invention may also be achieved by running a program or a set of programs on any computing system. The computing system may be a well known general purpose system. Thus, the objects of the invention may also be realized by providing only a program product comprising program code for implementing the method or system. That is, such a program product also constitutes the present invention, and a storage medium storing such a program product also constitutes the present invention. It is to be understood that the storage medium may be any known storage medium or any storage medium developed in the future. It is also noted that, in the systems and methods of the present invention, it is apparent that individual components or steps may be disassembled and/or reassembled. These decompositions and/or recombinations are to be regarded as equivalents of the present invention. Also, the steps of executing the series of processes described above may naturally be executed chronologically in the order described, but need not necessarily be executed chronologically. Some steps may be performed in parallel or independently of each other.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (10)

1. A method for determining a target sample of an industrial equipment symptom, comprising:
acquiring a similarity between a sample to be evaluated of a symptom and a reference sample of the symptom; and
evaluating the sample to be evaluated of the symptom according to the similarity to obtain a target sample set, wherein the target sample set comprises target samples of the symptom whose degree of confusion with the sample to be evaluated of the symptom is greater than a preset value.
2. The method for determining a target sample of an industrial equipment symptom according to claim 1, wherein acquiring the similarity between the sample to be evaluated of the symptom and the reference sample of the symptom comprises:
performing time series decomposition on the original time series of the reference sample to obtain a time series decomposition result;
determining a similarity evaluation index set according to the time series decomposition result; and
obtaining the similarity between the sample to be evaluated and the reference sample according to the similarity evaluation index set.
3. The method for determining a target sample of an industrial equipment symptom according to claim 2, wherein performing time series decomposition on the original time series of the reference sample to obtain a time series decomposition result comprises:
performing time series decomposition on the original time series of the reference sample to obtain a trend term, a periodic term and a residual term; and
determining the similarity evaluation index set according to the trend term, the periodic term and the residual term of the reference sample.
4. The method for determining a target sample of an industrial equipment symptom according to claim 3, wherein obtaining the similarity between the sample to be evaluated and the reference sample according to the similarity evaluation index set comprises:
acquiring a similarity matrix of the sample to be evaluated and the reference sample from the time series decomposition result by using a preset time series similarity algorithm.
5. The method for determining a target sample of an industrial equipment symptom according to claim 3, wherein determining the similarity evaluation index set according to the trend term, the periodic term and the residual term of the reference sample comprises:
if the sum of the energy ratios of the periodic term and the residual term is greater than a first threshold, determining to use an oscillation-type similarity evaluation index; otherwise, performing linear regression on the trend term to obtain a linear regression model, and, if the hypothesis-test p-value of the linear regression model is less than a second threshold, determining to use a trend-type similarity evaluation index; otherwise, determining to use a preset shape similarity evaluation index.
6. The method for determining a target sample of an industrial equipment symptom according to claim 5, wherein the energy ratios of the periodic term and the residual term are obtained as follows:
acquiring the overall variance of the original time series of the sample to be evaluated, the variance of the periodic term and the variance of the residual term;
periodic term energy ratio = variance of the periodic term / overall variance;
residual term energy ratio = variance of the residual term / overall variance.
7. The method for determining a target sample of an industrial equipment symptom according to claim 1, wherein evaluating the sample to be evaluated of the symptom according to the similarity evaluation index to obtain a target sample set comprises:
if the maximum or the mean of the similarity evaluation index is greater than a third threshold, filtering out the sample to be evaluated corresponding to that similarity evaluation index, and determining the remaining samples to be evaluated as the preliminarily screened target sample set.
8. The method for determining a target sample of an industrial equipment symptom according to claim 7, further comprising:
acquiring a plurality of evaluation indexes of the sample to be evaluated, performing inconsistency evaluation on the plurality of evaluation indexes by a similarity-ranking variance method, and retaining the samples to be evaluated whose evaluation index variance is greater than a fourth threshold to obtain the target sample set.
9. The method for determining a target sample of an industrial equipment symptom according to claim 1, further comprising:
clustering the samples in the target sample set to obtain a clustering result.
10. A system for determining a target sample of an industrial equipment symptom, comprising:
an acquisition module, configured to acquire a similarity between a sample to be evaluated of a symptom and a reference sample of the symptom; and
a processing module, configured to evaluate the sample to be evaluated of the symptom according to the similarity to obtain a target sample set, wherein the target sample set comprises target samples of the symptom whose degree of confusion with the sample to be evaluated of the symptom is greater than a preset value.
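
As a non-limiting illustration, the index-type selection recited in claims 5 and 6 might be sketched in Python as follows. The sketch assumes the time series has already been decomposed into its components and that the p-value of the linear regression on the trend component has been computed elsewhere (for example with a routine such as `scipy.stats.linregress`). The function names, the default threshold values and the use of the population variance are assumptions made for illustration only; the claims do not specify them.

```python
from statistics import pvariance


def energy_ratios(original, periodic, residual):
    """Return (periodic_ratio, residual_ratio) in the sense of claim 6.

    Each ratio is the variance of one component divided by the overall
    variance of the original series. Population variance is an assumption.
    """
    overall = pvariance(original)
    if overall == 0:
        raise ValueError("original series has zero variance")
    return pvariance(periodic) / overall, pvariance(residual) / overall


def choose_index_type(original, periodic, residual, trend_p_value,
                      first_threshold=0.5, second_threshold=0.05):
    """Pick the similarity evaluation index type following claim 5.

    trend_p_value is the p-value of a linear regression fitted to the
    trend component; both default thresholds are hypothetical values.
    """
    periodic_ratio, residual_ratio = energy_ratios(original, periodic, residual)
    # Oscillation dominates when the periodic and residual components
    # together carry more than the first threshold of the total energy.
    if periodic_ratio + residual_ratio > first_threshold:
        return "oscillation"
    # Otherwise a significant linear fit of the trend selects the trend type.
    if trend_p_value < second_threshold:
        return "trend"
    # Fall back to the preset shape similarity evaluation index.
    return "shape"
```

For a purely oscillating series the first branch is taken regardless of the p-value, whereas a series dominated by a slowly rising trend is classified by the regression p-value alone.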
CN202110194751.1A 2021-02-21 2021-02-21 Method and system for determining target sample of industrial equipment symptom Active CN112559602B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110194751.1A CN112559602B (en) 2021-02-21 2021-02-21 Method and system for determining target sample of industrial equipment symptom

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110194751.1A CN112559602B (en) 2021-02-21 2021-02-21 Method and system for determining target sample of industrial equipment symptom

Publications (2)

Publication Number Publication Date
CN112559602A true CN112559602A (en) 2021-03-26
CN112559602B CN112559602B (en) 2021-07-13

Family

ID=75034395

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110194751.1A Active CN112559602B (en) 2021-02-21 2021-02-21 Method and system for determining target sample of industrial equipment symptom

Country Status (1)

Country Link
CN (1) CN112559602B (en)

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101135601A (en) * 2007-10-18 2008-03-05 北京英华达电力电子工程科技有限公司 Rotating machinery vibrating failure diagnosis device and method
US20130338965A1 (en) * 2012-06-14 2013-12-19 International Business Machines Corporation Anomaly Detection Method, Program, and System
CN104572985A (en) * 2015-01-04 2015-04-29 大连理工大学 Industrial data sample screening method based on complex network community discovery
CN107194430A (en) * 2017-05-27 2017-09-22 北京三快在线科技有限公司 A kind of screening sample method and device, electronic equipment
CN108197638A (en) * 2017-12-12 2018-06-22 阿里巴巴集团控股有限公司 The method and device classified to sample to be assessed
CN109508558A (en) * 2018-10-31 2019-03-22 阿里巴巴集团控股有限公司 A kind of verification method and device of data validity
US20190213447A1 (en) * 2017-02-08 2019-07-11 Nanjing University Of Aeronautics And Astronautics Sample selection method and apparatus and server
CN110135492A (en) * 2019-05-13 2019-08-16 山东大学 Equipment fault diagnosis and method for detecting abnormality and system based on more Gauss models
CN111324637A (en) * 2020-02-05 2020-06-23 北京工业大数据创新中心有限公司 Fault symptom searching method and system for industrial time sequence data
CN111340144A (en) * 2020-05-15 2020-06-26 支付宝(杭州)信息技术有限公司 Risk sample detection method and device, electronic equipment and storage medium
CN111897695A (en) * 2020-07-31 2020-11-06 平安科技(深圳)有限公司 Method and device for acquiring KPI abnormal data sample and computer equipment
CN111931872A (en) * 2020-09-27 2020-11-13 北京工业大数据创新中心有限公司 Method and device for determining abnormity of trend symptom
CN112200273A (en) * 2020-12-07 2021-01-08 长沙海信智能系统研究院有限公司 Data annotation method, device, equipment and computer storage medium
CN112257423A (en) * 2020-10-21 2021-01-22 北京工业大数据创新中心有限公司 Equipment symptom information acquisition method and device and equipment operation and maintenance system
CN112270379A (en) * 2020-11-13 2021-01-26 北京百度网讯科技有限公司 Training method of classification model, sample classification method, device and equipment
CN112381185A (en) * 2021-01-15 2021-02-19 北京工业大数据创新中心有限公司 Industrial equipment characteristic curve similarity obtaining method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
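Chang Tianqing et al.: "Symptom misjudgment and confusion problems in test algorithms and their solutions", Computer Engineering and Applications (《计算机工程与应用》) *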

Also Published As

Publication number Publication date
CN112559602B (en) 2021-07-13

Similar Documents

Publication Publication Date Title
CN109387712B (en) Non-invasive load detection and decomposition method based on state matrix decision tree
Schoknecht et al. Similarity of business process models—a state-of-the-art analysis
US10033694B2 (en) Method and device for recognizing an IP address of a specified category, a defense method and system
Brockhoff et al. Time-aware concept drift detection using the earth mover’s distance
Abd-El-Hafiz A metrics-based data mining approach for software clone detection
US10311067B2 (en) Device and method for classifying and searching data
Wang et al. Decision tree based control chart pattern recognition
Taufiq Classification method of multi-class on C4.5 algorithm for fish diseases
Banda et al. An experimental evaluation of popular image parameters for monochromatic solar image categorization
CN115456107A (en) Time series abnormity detection system and method
Wu et al. Multiscale jump testing and estimation under complex temporal dynamics
CN117170915A (en) Data center equipment fault prediction method and device and computer equipment
CN111401420A (en) Abnormal data clustering method and device for wafer test, electronic equipment and medium
CN112559602B (en) Method and system for determining target sample of industrial equipment symptom
CN110688846A (en) Periodic word mining method, system, electronic equipment and readable storage medium
CN112632000A (en) Log file clustering method and device, electronic equipment and readable storage medium
CN117290404A (en) Method and system for rapidly searching and practical main distribution network fault processing method
CN112734072A (en) Power load prediction method, system, terminal device and medium
Kumar et al. Preprocessing and symbolic representation of stock data
König et al. Towards algorithm-agnostic uncertainty estimation: Predicting classification error in an automated machine learning setting
García et al. Benchmarking research performance at the university level with information theoretic measures
Lines Time Series classification through transformation and ensembles
Vadim et al. Temporal decision trees in diagnostics systems
CN111538669A (en) Test case extraction method and device based on historical problem backtracking analysis
CN114580982B (en) Method, device and equipment for evaluating data quality of industrial equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant