CN112559602A

CN112559602A - Method and system for determining target sample of industrial equipment symptom

Info

Publication number: CN112559602A
Application number: CN202110194751.1A
Authority: CN
Inventors: 田春华; 李闯; 马国�; 张�浩
Original assignee: Beijing Innovation Center For Industrial Big Data Co ltd
Current assignee: Beijing Innovation Center For Industrial Big Data Co ltd
Priority date: 2021-02-21
Filing date: 2021-02-21
Publication date: 2021-03-26
Anticipated expiration: 2041-02-21
Also published as: CN112559602B

Abstract

The invention provides a method and a system for determining a target sample of an industrial equipment symptom. The method comprises: acquiring the similarity between a sample to be evaluated of a symptom and a reference sample of the symptom; and evaluating the sample to be evaluated according to the similarity to obtain a target sample set, wherein the target sample set comprises those samples to be evaluated of the symptom whose confusion degree is greater than a preset value. With this technical solution, 'plausible' cases are automatically searched for in historical data starting from a small number of cases, so that experts can distinguish valid symptoms from false symptoms in a targeted manner, and more precise cases are provided for modeling.

Description

Method and system for determining target sample of industrial equipment symptom

Technical Field

The invention relates to the field of industrial technology, and in particular to a method and a system for determining a target sample of an industrial equipment symptom.

Background

When characterizing the logic of an equipment failure, a description of the symptoms is usually involved. Symptoms are easy to recognize visually but hard to describe quantitatively: a symptom is a time series of a key technical index (such as temperature, leakage amount, or the 1X vibration amplitude) that exhibits a certain trend (such as a slow rise or fall) or shape (such as burrs or oscillation).

Current means of collecting cases cover only a very limited range of situations, which hinders precise characterization during modeling. They include the following:

1) Past failure cases: usually very few in number, with limited scenario coverage;

2) Domain experts' own memory or manual review: coverage is too low for IT modeling or implementation;

3) Cases found by IT experts during testing: the iteration cycle is long, and the demands on the IT experts' intelligence and sense of responsibility are high.

Disclosure of Invention

The embodiments of the invention provide a method and a system for determining a target sample of an industrial equipment symptom, which start from a small number of cases and automatically search historical data for 'plausible' cases, used as samples to be evaluated, in order to determine target samples; the target samples in the resulting target sample set make it easier to distinguish which symptoms are valid and which are false.

In order to solve the above technical problem, an embodiment of the present invention provides the following technical solutions:

A method for determining a target sample of an industrial equipment symptom, comprising:

acquiring the similarity between a sample to be evaluated of a symptom and a reference sample of the symptom;

evaluating the sample to be evaluated of the symptom according to the similarity to obtain a target sample set, wherein the target sample set comprises those samples to be evaluated of the symptom whose confusion degree is greater than a preset value.

Optionally, acquiring the similarity between the sample to be evaluated of the symptom and the reference sample of the symptom includes:

performing time-series decomposition on the original time series of the reference sample to obtain a time-series decomposition result;

determining a similarity evaluation index set according to the time-series decomposition result;

and obtaining the similarity between the sample to be evaluated and the reference sample according to the similarity evaluation index set.

Optionally, performing time-series decomposition on the original time series of the reference sample to obtain a time-series decomposition result includes:

performing time-series decomposition on the original time series of the reference sample to obtain a trend term, a periodic term, and a residual term;

and determining the similarity evaluation index set according to the trend term, the periodic term, and the residual term of the reference sample.

Optionally, obtaining the similarity between the sample to be evaluated and the reference sample according to the similarity evaluation index set includes:

acquiring a similarity matrix between the sample to be evaluated and the reference sample from the time-series decomposition result by using a preset time-series similarity algorithm.

Optionally, determining the similarity evaluation index set according to the trend term, the periodic term, and the residual term of the reference sample includes:

if the sum of the energy ratios of the periodic term and the residual term is greater than a first threshold, using an oscillation-type similarity evaluation index; otherwise, performing linear regression on the trend term to obtain a linear regression model and, if the p-value of the linear regression model is smaller than a second threshold, using a trend-type similarity evaluation index; otherwise, using a preset shape-type similarity evaluation index.

Optionally, the energy ratios of the periodic term and the residual term are obtained as follows:

acquiring the overall variance of the original time series of the sample to be evaluated, the variance of the periodic term, and the variance of the residual term;

periodic term energy ratio = variance of periodic term/overall variance;

residual term energy ratio = variance of residual term/overall variance.
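As a minimal sketch of the two ratios above, the Python function below (the name `energy_ratios` is hypothetical) divides the variance of each decomposed component by the variance of the original series. Population variance is an assumption; because the periodic term, the residual term, and the original series all have the same length, using sample variance instead would give identical ratios.

```python
from statistics import pvariance


# Sketch only: `energy_ratios` is a hypothetical helper, not the patent's code.
def energy_ratios(original, periodic, residual):
    """Return (periodic term energy ratio, residual term energy ratio)."""
    overall = pvariance(original)  # overall variance of the original series
    if overall == 0:
        raise ValueError("the original series is constant, so the ratios are undefined")
    return pvariance(periodic) / overall, pvariance(residual) / overall
```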

Optionally, evaluating the sample to be evaluated of the symptom according to the similarity evaluation index to obtain a target sample set includes:

if the maximum or the mean of the similarity evaluation indexes of a sample to be evaluated is greater than a third threshold, filtering out that sample, and taking the remaining samples to be evaluated as the preliminarily screened target sample set.
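The preliminary screening can be sketched as follows, assuming each sample to be evaluated is represented by a list of its distances to the reference sample (one per distance function, a larger value meaning less similar) and that these distances have already been put on a comparable scale. The function name `preliminary_screen` and its `use` parameter are hypothetical.

```python
from statistics import mean


# Sketch only: hypothetical helper; assumes distances are on comparable scales.
def preliminary_screen(distances, third_threshold, use="max"):
    """Return the ids of the samples to keep.

    distances: dict mapping sample id -> list of distances to the reference sample,
    one per similarity evaluation index (larger means less similar).
    A sample is filtered out when the max (or mean) of its distances exceeds
    third_threshold.
    """
    if use not in ("max", "mean"):
        raise ValueError("use must be 'max' or 'mean'")
    aggregate = max if use == "max" else mean
    return [sid for sid, d in distances.items() if aggregate(d) <= third_threshold]
```

The maximum is the stricter criterion: a single index on which a sample looks very different is enough to remove it, whereas the mean tolerates one outlying index.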

Optionally, the method further includes:

acquiring a plurality of evaluation indexes of each sample to be evaluated, evaluating the inconsistency among these evaluation indexes by the similarity ranking variance method, and retaining the samples to be evaluated whose evaluation-index variance is greater than a fourth threshold, to obtain the target sample set.
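One plausible reading of the similarity ranking variance method is sketched below under stated assumptions: under each evaluation index, the samples to be evaluated are ranked by distance from small to large; the variance of each sample's ranks across the indexes measures its inconsistency; and samples whose rank variance exceeds the fourth threshold are retained. Ordinal ranks (ties broken by input order) and population variance are assumptions, and the names `rank_variance` and `select_confusable` are hypothetical.

```python
from statistics import pvariance


# Sketch only: hypothetical helpers illustrating a rank-variance inconsistency score.
def rank_variance(distances):
    """distances: dict sample id -> list of distances (same index order for every sample).
    Returns dict sample id -> variance of that sample's ranks across the indexes."""
    ids = list(distances)
    n_indexes = len(next(iter(distances.values())))
    ranks = {sid: [] for sid in ids}
    for j in range(n_indexes):
        # Rank from small to large distance under index j (ties keep input order).
        ordered = sorted(ids, key=lambda sid: distances[sid][j])
        for rank, sid in enumerate(ordered, start=1):
            ranks[sid].append(rank)
    return {sid: pvariance(r) for sid, r in ranks.items()}


def select_confusable(distances, fourth_threshold):
    """Keep samples whose rank variance is greater than fourth_threshold."""
    variances = rank_variance(distances)
    return [sid for sid, v in variances.items() if v > fourth_threshold]


example = {"a": [0.1, 0.1, 0.9], "b": [0.2, 0.2, 0.2], "c": [0.3, 0.3, 0.3]}
print(select_confusable(example, 0.5))  # ['a']
```

In the example, sample "a" is the closest to the reference under two indexes but the farthest under the third, which is the kind of confusable sample the method is meant to keep.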

Optionally, the method for determining a target sample of an industrial equipment symptom further includes:

clustering the samples in the target sample set to obtain a clustering result.

Embodiments of the present invention also provide a system for determining a target sample of industrial equipment symptoms, comprising:

the acquisition module is used for acquiring the similarity between a sample to be evaluated of a symptom and a reference sample of the symptom;

a processing module, configured to evaluate the sample to be evaluated of the symptom according to the similarity to obtain a target sample set, wherein the target sample set comprises those samples to be evaluated of the symptom whose confusion degree is greater than a preset value.

The embodiment of the invention has the following technical effects:

According to the technical solution of the invention, different evaluation indexes emphasize different characteristics, so the similarity between a sample to be evaluated and the reference sample differs from index to index, and some samples are easily confused: they resemble the case under some indexes but not under others. By selecting, from the samples to be evaluated, those whose confusion degree is greater than the preset value, the technical solution determines the target sample set more accurately and thereby helps optimize the rules for studying and judging the symptom.

Drawings

FIG. 1 is a schematic flow chart of a method for determining a target sample of an industrial equipment symptom provided by an embodiment of the invention;

FIG. 2 is a schematic diagram of a reference sample provided in an embodiment of the present invention;

FIG. 3 is a schematic diagram of a first sample to be evaluated according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a second sample to be evaluated according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of a third sample to be evaluated according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of a fourth sample to be evaluated according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of a fifth sample to be evaluated according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of a sixth sample to be evaluated according to an embodiment of the present invention;

FIG. 9 is a schematic diagram of a seventh sample to be evaluated according to an embodiment of the present invention;

FIG. 10 is a schematic diagram of an eighth sample to be evaluated according to an embodiment of the present invention;

FIG. 11 is a schematic diagram of a ninth sample to be evaluated according to an embodiment of the present invention;

FIG. 12 is a schematic diagram of a tenth sample to be evaluated according to an embodiment of the present invention;

FIG. 13 is a schematic diagram of an eleventh sample to be evaluated according to an embodiment of the present invention;

FIG. 14 is an enlarged schematic view of the reference sample provided in an embodiment of the present invention;

FIG. 15 is an enlarged schematic view of the second sample to be evaluated provided in an embodiment of the present invention;

FIG. 16 shows the distances between the reference sample and the eleven samples to be evaluated under 18 different distance metric functions according to an embodiment of the present invention.

Detailed Description

The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

As shown in fig. 1, an embodiment of the present invention provides a method for determining a target sample of an industrial equipment symptom, including:

S1: acquiring the similarity between the sample to be evaluated of the symptom and the reference sample of the symptom;

S2: evaluating the sample to be evaluated of the symptom according to the similarity to obtain a target sample set, wherein the target sample set comprises those samples to be evaluated of the symptom whose confusion degree is greater than a preset value.

In the embodiment of the invention, different evaluation indexes emphasize different characteristics, so the similarity between a sample to be evaluated and the reference sample differs from index to index, and some samples are easily confused: they resemble the case under some indexes but not under others. By selecting, from the samples to be evaluated, those whose confusion degree is greater than the preset value, the embodiment determines the target sample set more accurately and thereby helps optimize the rules for studying and judging the symptom.

In an optional embodiment of the present invention, in step S1, acquiring the similarity between the sample to be evaluated of the symptom and the reference sample of the symptom includes:

In an alternative embodiment of the present invention, in step S1, performing time-series decomposition on the original time series of the reference sample to obtain a time-series decomposition result includes:

performing decomposition on the original time series of the reference sample to obtain a trend term, a periodic term, and a residual term;

Specifically, STL (Seasonal-Trend decomposition using Loess), SSA (Singular Spectrum Analysis), EMD (Empirical Mode Decomposition), or similar methods may be used to decompose the original time series of the sample to be evaluated.
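Production code would more likely call an STL, SSA, or EMD implementation. To keep the idea visible, the sketch below swaps in a simpler technique, classical additive decomposition by centred moving average, which is not one of the methods named above. It assumes the period length is known, and it holds the trend flat over the first and last few points that the moving average cannot reach (a simplification). The function name `classical_decompose` is hypothetical.

```python
from statistics import mean


# Sketch only: classical additive decomposition (a stand-in for STL/SSA/EMD);
# `classical_decompose` is a hypothetical name and the period is assumed known.
def classical_decompose(series, period):
    """Split `series` into (trend, periodic, residual) lists with
    series[t] == trend[t] + periodic[t] + residual[t] for every t."""
    n = len(series)
    if period < 2 or n < 2 * period:
        raise ValueError("need period >= 2 and at least two full periods of data")
    half = period // 2

    # Trend: centred moving average; an even period uses half weights at both ends (2 x MA).
    trend = [None] * n
    for t in range(half, n - half):
        w = series[t - half : t + half + 1]
        if period % 2 == 0:
            trend[t] = (0.5 * w[0] + sum(w[1:-1]) + 0.5 * w[-1]) / period
        else:
            trend[t] = sum(w) / period

    # Periodic term: mean detrended value for each phase, shifted so the phases sum to zero.
    buckets = [[] for _ in range(period)]
    for t in range(half, n - half):
        buckets[t % period].append(series[t] - trend[t])
    phase = [mean(b) for b in buckets]
    offset = mean(phase)
    periodic = [phase[t % period] - offset for t in range(n)]

    # Simplification: hold the trend flat over the edge points the moving average cannot cover.
    for t in range(half):
        trend[t] = trend[half]
    for t in range(n - half, n):
        trend[t] = trend[n - half - 1]

    residual = [series[t] - trend[t] - periodic[t] for t in range(n)]
    return trend, periodic, residual
```

On a series that is exactly a straight line plus a repeating zero-mean pattern, this recovers the pattern as the periodic term and leaves a zero residual away from the edges.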

In an optional embodiment of the present invention, in step S1, obtaining the similarity between the sample to be evaluated and the reference sample according to the similarity evaluation index set includes:

In an optional embodiment of the present invention, in step S2, determining a similarity evaluation index set according to a trend term, a period term, and a residual term of a reference sample, includes:

Specifically, the first threshold may be 75%, and the second threshold may be 0.05;

The linear regression may include polynomials of degree 1, 3, a logarithmic transformation, an exponential transformation, and the like.
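The decision rule for choosing the index type can be sketched as below, using the example thresholds given above (75% and 0.05). Assumptions: only the degree-1 polynomial fit is implemented (the cubic, logarithmic, and exponential variants are omitted), the trend term is regressed on the sample index 0..n-1, and the two-sided p-value of the slope uses a normal approximation to Student's t distribution, which is reasonable for long series but understates the p-value for short ones. The function names are hypothetical.

```python
from statistics import NormalDist, mean


# Sketch only: hypothetical helpers; degree-1 fit with a normal approximation to t.
def slope_p_value(y):
    """Two-sided p-value for the slope of an OLS line fitted to y against t = 0..n-1."""
    n = len(y)
    if n < 3:
        raise ValueError("need at least three points")
    t_bar, y_bar = (n - 1) / 2, mean(y)
    sxx = sum((t - t_bar) ** 2 for t in range(n))
    slope = sum((t - t_bar) * (v - y_bar) for t, v in enumerate(y)) / sxx
    intercept = y_bar - slope * t_bar
    sse = sum((v - intercept - slope * t) ** 2 for t, v in enumerate(y))
    if sse == 0:  # perfect fit: significant unless the line is flat
        return 0.0 if slope != 0 else 1.0
    std_err = (sse / (n - 2) / sxx) ** 0.5
    z = abs(slope) / std_err
    return 2 * (1 - NormalDist().cdf(z))  # normal approximation (assumption)


def select_index_type(periodic_ratio, residual_ratio, trend,
                      first_threshold=0.75, second_threshold=0.05):
    """Return 'oscillation', 'trend' or 'shape' for the reference sample."""
    if periodic_ratio + residual_ratio > first_threshold:
        return "oscillation"
    if slope_p_value(trend) < second_threshold:
        return "trend"
    return "shape"
```

The order of the checks matters: oscillation is tested first, so a strongly periodic series is never classified as trend-type even if its trend term happens to have a significant slope.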

In an alternative embodiment of the present invention, in step S2, the energy ratios of the periodic term and the residual term are obtained as follows:

periodic term energy ratio = variance of periodic term/overall variance;

residual term energy ratio = variance of residual term/overall variance.

In an optional embodiment of the present invention, in step S2, evaluating the sample to be evaluated of the symptom according to the similarity evaluation index to obtain a target sample set includes:

An optional embodiment of the present invention further comprises:

Specifically, samples to be evaluated that have low similarity to the reference sample are filtered out, and multiple criteria are possible, for example: 1) the maximum value of the multiple evaluation indexes is greater than a threshold; 2) the samples to be evaluated are ranked (from small to large) under each evaluation index, and a sample is filtered out if the maximum of its ranks across the indexes is greater than a threshold.

The consistency of a sample to be evaluated across the various indexes can be evaluated by methods such as similarity ranking variance or collaborative filtering, and the samples to be evaluated with high inconsistency are selected as target samples.

The thresholds and the required degree of inconsistency can be set according to actual needs.

The following are specific examples of time-series similarity evaluation indexes:

the similarity evaluation index types may include a trend-type similarity evaluation index, an oscillation-type similarity evaluation index, and a preset shape-type similarity evaluation index;

Trend-type similarity evaluation index: the main factors are slope and rise/fall; the alternative distance indicators include "COR" and "CORT". Oscillation-type similarity evaluation index: the main factors are period, correlation, and amplitude; the alternative distance indicators include "ACF", "AR.LPC.CEPS", "AR.MAH", "AR.PIC", "PACF", "SPEC.LLR", "SPEC.GLK", and "PER";

Preset shape-type similarity evaluation index: the main factors are mean, amplitude, and phase, with alternative distance indicators "DTWARP" and "EUCL"; for the degree of morphological identity, the alternative distance indicator is "MINDIST.SAX";

The above "COR", "CORT", "ACF", "AR.LPC.CEPS", "AR.MAH", "AR.PIC", "PACF", "SPEC.LLR", "SPEC.GLK", "PER", "DTWARP", "EUCL", and "MINDIST.SAX" each denote a different distance metric function.
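The labels above match method names of the `diss()` function in the R package TSclust (an observation; the source does not name a library). As a rough, self-contained illustration, the sketch below implements four of them in plain Python: Euclidean distance (EUCL), the correlation-based distance sqrt(2(1 - rho)) (COR), the Euclidean distance scaled by the tuning function 2 / (1 + exp(k * CORT)) with k = 2 (CORT), and a basic dynamic time warping distance with absolute-difference cost (DTWARP). These follow the commonly published definitions and may not reproduce TSclust's numbers exactly; the function names and the `similarity_matrix` helper are hypothetical, and equal-length, non-constant series are assumed for COR and CORT.

```python
import math


# Sketch only: hypothetical re-implementations of four distance functions.
def eucl(x, y):
    """EUCL: Euclidean distance between equal-length series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def _pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # assumes neither series is constant


def cor(x, y):
    """COR: sqrt(2 * (1 - rho)); 0 when the series are perfectly correlated."""
    return math.sqrt(max(0.0, 2.0 * (1.0 - _pearson(x, y))))


def cort(x, y, k=2.0):
    """CORT: Euclidean distance scaled by 2 / (1 + exp(k * CORT)), where CORT is the
    first-order temporal correlation of successive differences."""
    dx = [b - a for a, b in zip(x, x[1:])]
    dy = [b - a for a, b in zip(y, y[1:])]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return 2.0 / (1.0 + math.exp(k * num / den)) * eucl(x, y)


def dtwarp(x, y):
    """DTWARP: basic dynamic time warping with |x_i - y_j| as the local cost."""
    n, m = len(x), len(y)
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


DISTANCES = {"EUCL": eucl, "COR": cor, "CORT": cort, "DTWARP": dtwarp}


def similarity_matrix(reference, samples, functions=DISTANCES):
    """Rows: sample ids; columns: distance to the reference under each function."""
    return {sid: [f(s, reference) for f in functions.values()]
            for sid, s in samples.items()}
```

Because DTWARP can align a time-shifted copy of a pattern, it may rate a shifted sample as close even when EUCL does not, which is one way a sample becomes "confusable" across indexes.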

In an alternative embodiment of the present invention, step S2 further includes:

Specifically, clustering may be performed with algorithms such as k-means, PAM, or hierarchical clustering, which reduces complexity.
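As an illustration of clustering the confusable samples, the sketch below is a simplified k-medoids (alternating assignment and medoid update) on a precomputed pairwise distance matrix. It is not the full PAM algorithm, which adds a BUILD initialisation and a SWAP search; the deterministic initialisation (the k samples with the smallest total distance to all others) and the function name `k_medoids` are assumptions made for reproducibility.

```python
# Sketch only: simplified k-medoids (not full PAM); `k_medoids` is a hypothetical name.
def k_medoids(dist, k, max_iter=100):
    """dist: square list-of-lists of pairwise distances. Returns (medoids, labels)."""
    n = len(dist)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of samples")
    # Deterministic start: the k samples with the smallest total distance to all others.
    medoids = sorted(range(n), key=lambda i: sum(dist[i]))[:k]
    for _ in range(max_iter):
        # Assign every sample to its nearest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        new = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c] or [medoids[c]]
            # New medoid: the member with the smallest total distance to its cluster.
            new.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new == medoids:
            break
        medoids = new
    labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
    return medoids, labels


points = [0, 1, 2, 10, 11, 12]
matrix = [[abs(a - b) for b in points] for a in points]
print(k_medoids(matrix, 2))  # ([1, 4], [0, 0, 0, 1, 1, 1])
```

Working from a distance matrix rather than raw coordinates lets any of the time-series distance functions listed earlier be plugged in, which is one reason medoid-based methods suit this setting better than plain k-means.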

The method for determining a target sample is described below by way of an embodiment:

For example, characterizing "straw hat wind" in wind power:

1) FIGS. 2 to 13 are schematic diagrams of the reference sample and the 11 samples to be evaluated; Tables 2 and 3 show the distance values between the 11 samples to be evaluated and the reference sample under 18 different distance metric functions, and FIG. 16 shows these distances graphically;

The 18 distance metric functions (as shown in Tables 2 and 3) are: "ACF", "AR.LPC.CEPS", "AR.MAH", "AR.PIC", "CDM", "CID", "COR", "CORT", "DTWARP", "EUCL", "INT.PER", "NCD", "PACF", "PDC", "PER", "MINDIST.SAX", "SPEC.LLR", and "SPEC.GLK"; "Seq" is the identifier of the sample to be evaluated;

TABLE 1

TABLE 2

Comparing the data in Tables 2 and 3 shows that the similarity (distance value) between the 11 samples to be evaluated and the reference sample varies with the distance metric function. For example, as shown in FIGS. 14 and 15, under the Euclidean distance (EUCL) the second sample to be evaluated differs only slightly from the reference sample (distance 60.6) compared with the other samples to be evaluated, but at the int.

2) Eliminating dissimilar samples to be evaluated;

Comparing the data in Tables 2 and 3 shows that the eleventh sample to be evaluated has the poorest similarity to the reference sample under the 18 distance metric functions, so it is eliminated and the remaining 10 samples to be evaluated are retained.

3) As shown in Tables 3 and 4 below, samples with strong inconsistency are selected from the 10 samples to be evaluated that remain after step 2).

TABLE 3

TABLE 4

TABLE 5

As shown in Table 5, the similarity ranking variance method is used to calculate the mean and variance of the distances of the 10 samples to be evaluated under the 18 distance metric functions. Comparing the variances of the 10 samples shows that the fifth, seventh, and tenth samples to be evaluated have variances of 3.5, 3.9, and 3.5, respectively, which are larger than those of the other samples; samples 5, 7, and 10, which have high variance, are therefore retained as target samples.

It can thus be seen that, in studying and judging the straw hat wind, the width of the top of the sample curve is not important, and the left and right sides need not be symmetric; the method can therefore select the target samples accurately.

According to the technical solution of the embodiment of the invention, the similarity between each sample to be evaluated and the reference sample is evaluated with multiple similarity evaluation indexes; a suitable time-series distance function is selected automatically on the basis of time-series feature analysis; among the similar samples, highly confusable samples (those whose evaluations under multiple indexes are inconsistent) are preferentially selected; and, depending on how concentrated the target samples are, the highly confusable samples are clustered to reduce data complexity, so that more precise cases can be provided for modeling.

Optionally, obtaining a plurality of similarity evaluation indexes according to the time-series decomposition result of the reference sample includes:

periodic term energy ratio = variance of periodic term/overall variance;

residual term energy ratio = variance of residual term/overall variance.

Optionally, the method further includes:

The technical solution of the invention can automatically search historical data for 'plausible' cases starting from a small number of cases, so that experts can distinguish valid symptoms from false symptoms in a more targeted way, and more precise cases are provided for modeling.

An embodiment of the present invention also provides a processor-readable storage medium, which stores a computer program for causing a processor to execute the method as described above. All the implementation manners in the above method embodiment are applicable to the embodiment of the system, and the same technical effect can be achieved.

Further, it is noted that in the system and method of the present invention, it is apparent that each component or each step may be decomposed and/or recombined. These decompositions and/or recombinations are to be regarded as equivalents of the present invention. Also, the steps of performing the series of processes described above may naturally be performed chronologically in the order described, but need not necessarily be performed chronologically, and some steps may be performed in parallel or independently of each other. It will be understood by those of ordinary skill in the art that all or any of the steps or elements of the method and system of the present invention may be implemented in any computing system (including processors, storage media, etc.) or network of computing systems, in hardware, firmware, software, or any combination thereof, which can be implemented by those of ordinary skill in the art using their basic programming skills after reading the description of the present invention.

Thus, the objects of the invention may also be achieved by running a program or a set of programs on any computing system. The computing system may be a well-known general-purpose system. Thus, the objects of the invention may also be realized by providing only a program product comprising program code for implementing the method or system. That is, such a program product also constitutes the present invention, and a storage medium storing such a program product also constitutes the present invention. It is to be understood that the storage medium may be any known storage medium or any storage medium developed in the future.

While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. A method of determining a target sample of industrial equipment symptoms, comprising:

2. The method for determining a target sample of an industrial equipment symptom according to claim 1, wherein acquiring the similarity between the sample to be evaluated of the symptom and the reference sample of the symptom comprises:

3. The method for determining a target sample of an industrial equipment symptom according to claim 2, wherein performing time-series decomposition on the original time series of the reference sample to obtain a time-series decomposition result comprises:

4. The method for determining a target sample of an industrial equipment symptom according to claim 3, wherein obtaining the similarity between the sample to be evaluated and the reference sample according to the similarity evaluation index set comprises:

5. The method for determining a target sample of an industrial equipment symptom according to claim 3, wherein determining the similarity evaluation index set according to the trend term, the periodic term, and the residual term of the reference sample comprises:

6. The method for determining a target sample of an industrial equipment symptom according to claim 5, wherein the energy ratios of the periodic term and the residual term are obtained as follows:

periodic term energy ratio = variance of periodic term/overall variance;

residual term energy ratio = variance of residual term/overall variance.

7. The method for determining a target sample of an industrial equipment symptom according to claim 1, wherein evaluating the sample to be evaluated of the symptom according to the similarity evaluation index to obtain a target sample set comprises:

8. The method for determining a target sample of an industrial equipment symptom according to claim 7, further comprising:

9. The method for determining a target sample of an industrial equipment symptom according to claim 1, further comprising:

10. A system for determining a target sample of industrial equipment symptoms, comprising: