CN112559602B - Method and system for determining target sample of industrial equipment symptom - Google Patents

Info

Publication number
CN112559602B
CN112559602B CN202110194751.1A CN202110194751A CN112559602B CN 112559602 B CN112559602 B CN 112559602B CN 202110194751 A CN202110194751 A CN 202110194751A CN 112559602 B CN112559602 B CN 112559602B
Authority
CN
China
Prior art keywords
sample
evaluated
evaluation index
similarity
similarity evaluation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110194751.1A
Other languages
Chinese (zh)
Other versions
CN112559602A (en)
Inventor
田春华
李闯
马国�
张�浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Innovation Center For Industrial Big Data Co ltd
Original Assignee
Beijing Innovation Center For Industrial Big Data Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Innovation Center For Industrial Big Data Co ltd
Priority to CN202110194751.1A
Publication of CN112559602A
Application granted
Publication of CN112559602B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G06F16/2474 Sequence data queries, e.g. querying versioned data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification

Abstract

The invention provides a method and a system for determining a target sample of an industrial equipment symptom. The method comprises the following steps: acquiring the similarity between a sample to be evaluated of a symptom and a reference sample of the symptom; and evaluating the sample to be evaluated of the symptom according to the similarity to obtain a target sample set, wherein the target sample set comprises target samples of the symptom whose degree of confusion with the sample to be evaluated of the symptom is greater than a preset value. Starting from a small number of cases, the technical solution automatically searches historical data for 'plausible' (easily confused) cases, so that experts can distinguish valid symptoms from false symptoms in a targeted way and more exact cases are provided for modeling.

Description

Method and system for determining target sample of industrial equipment symptom
Technical Field
The invention relates to the field of industrial technology, and in particular to a method and a system for determining a target sample of an industrial equipment symptom.
Background
Characterizing the logic of an equipment failure usually involves describing its symptoms. Symptoms are easy to recognize visually but hard to describe quantitatively. A symptom is a time series of a key technical indicator (such as temperature, leakage amount, or the 1X vibration amplitude) that exhibits a certain trend (such as a slow rise or fall) or shape (such as spikes or oscillation).
The current means of collecting cases cover very few situations, which hinders precise characterization during modeling. They include the following:
1) Past failure cases: the number is usually very limited, and so is the coverage of scenarios.
2) Domain experts' own memory or manual review: the coverage is too low for IT modeling or implementation.
3) Cases found by IT experts during testing: the iteration cycle is long, and high demands are placed on the insight and sense of responsibility of the IT experts.
Disclosure of Invention
The embodiments of the invention provide a method and a system for determining a target sample of an industrial equipment symptom. A small number of cases are used as samples to be evaluated, and 'plausible' (easily confused) cases are searched for automatically in historical data in order to determine the target samples. With the target samples in the target sample set, it becomes easier to distinguish which symptoms are valid symptoms and which are false symptoms.
In order to solve the above technical problem, an embodiment of the present invention provides the following technical solutions:
a method of determining a target sample of industrial equipment symptoms, comprising:
acquiring the similarity of a sample to be evaluated of a symptom and a reference sample of the symptom;
evaluating the sample to be evaluated of the symptom according to the similarity to obtain a target sample set, wherein the target sample set comprises: target samples of the symptom whose degree of confusion with the sample to be evaluated of the symptom is greater than a preset value.
Optionally, the obtaining of the similarity between the sample to be evaluated of the symptom and the reference sample of the symptom includes:
performing time sequence decomposition on the original time sequence of the reference sample to obtain a time sequence decomposition result;
determining a similarity evaluation index set according to the time sequence decomposition result;
and according to the similarity evaluation index set, obtaining the similarity between the sample to be evaluated and the reference sample.
Optionally, performing time sequence decomposition on the original time sequence of the reference sample to obtain a time sequence decomposition result, including:
performing time sequence decomposition on the original time sequence of the reference sample to obtain a trend item, a period item and a residual error item;
and determining a similarity evaluation index set according to the trend item, the period item and the residual error item of the reference sample.
Optionally, obtaining the similarity between the sample to be evaluated and the reference sample according to the similarity evaluation index set includes:
and acquiring a similarity matrix of the sample to be evaluated and the reference sample according to a time sequence decomposition result by adopting a preset time sequence similarity algorithm.
Optionally, determining a similarity evaluation index set according to the trend term, the period term, and the residual term of the reference sample, including:
if the sum of the energy ratios of the period term and the residual term is greater than a first threshold, determining to use an oscillation-type similarity evaluation index; otherwise, performing linear regression on the trend term to obtain a linear regression model and, if the p-value of the hypothesis test on the linear regression model is less than a second threshold, determining to use a trend-type similarity evaluation index; otherwise, determining to use a preset-shape-type similarity evaluation index.
Optionally, the energy ratio of the period term and the residual term is obtained through the following processes:
acquiring the overall variance of the original time sequence of the sample to be evaluated, the variance of the period term and the variance of the residual term;
period term energy ratio = variance of the period term / overall variance;
residual term energy ratio = variance of the residual term / overall variance.
Optionally, according to the similarity evaluation index, evaluating the sample to be evaluated of the symptom to obtain a target sample set, including:
if the maximum value or the mean value of the similarity evaluation index is greater than a third threshold, filtering out the sample to be evaluated corresponding to that similarity evaluation index, and determining the remaining samples to be evaluated as the preliminarily screened target sample set.
Optionally, the method further includes:
and acquiring a plurality of evaluation indexes of the sample to be evaluated, carrying out inconsistency evaluation on the plurality of evaluation indexes by a similarity sorting variance method, and reserving the sample to be evaluated with the evaluation index variance larger than a fourth threshold value to obtain a target sample set.
Optionally, the method for determining a target sample of the industrial equipment symptom further includes:
and clustering the samples in the target sample set to obtain a clustering result.
Embodiments of the present invention also provide a system for determining a target sample of industrial equipment symptoms, comprising:
the acquisition module is used for acquiring the similarity between a sample to be evaluated of a symptom and a reference sample of the symptom;
a processing module, configured to evaluate the sample to be evaluated of the symptom according to the similarity, so as to obtain a target sample set, where the target sample set includes: target samples of the symptom whose degree of confusion with the sample to be evaluated of the symptom is greater than a preset value.
The embodiment of the invention has the following technical effects:
According to the technical solution of the invention, different evaluation indexes emphasize different characteristics, so the sequence similarity between a sample to be evaluated and the reference sample differs from index to index. Some samples are easily confused: they are similar to the case under some indexes and dissimilar under others. By obtaining the samples to be evaluated whose degree of confusion is greater than the preset value, the technical solution determines the target sample set more accurately, which helps optimize the rules for judging symptoms.
Drawings
FIG. 1 is a schematic flow chart of a method for determining a target sample of an industrial equipment symptom provided by an embodiment of the invention;
FIG. 2 is a schematic diagram of a reference sample provided in an embodiment of the present invention;
FIG. 3 is a schematic diagram of a first sample to be evaluated according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a second sample to be evaluated according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a third sample to be evaluated according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a fourth sample to be evaluated according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a fifth sample to be evaluated according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a sixth sample to be evaluated according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of a seventh sample to be evaluated according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of an eighth sample to be evaluated according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of a ninth sample to be evaluated according to an embodiment of the present invention;
FIG. 12 is a schematic diagram of a tenth sample to be evaluated according to an embodiment of the present invention;
FIG. 13 is a schematic diagram of an eleventh sample to be evaluated according to an embodiment of the present invention;
FIG. 14 is an enlarged schematic view of a reference sample provided in an embodiment of the present invention;
FIG. 15 is an enlarged schematic view of a second sample under evaluation provided by an embodiment of the present invention;
FIG. 16 shows the performance of the reference sample and the eleven sequences to be evaluated under 18 different distance metric functions according to the embodiment of the present invention.
Detailed Description
The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
As shown in fig. 1, an embodiment of the present invention provides a method for determining a target sample of an industrial equipment symptom, including:
S1, acquiring the similarity between the sample to be evaluated of the symptom and the reference sample of the symptom;
S2, evaluating the sample to be evaluated of the symptom according to the similarity to obtain a target sample set, wherein the target sample set comprises: target samples of the symptom whose degree of confusion with the sample to be evaluated of the symptom is greater than a preset value.
In the embodiment of the invention, different evaluation indexes emphasize different characteristics, so the sequence similarity between a sample to be evaluated and the reference sample differs from index to index. Some samples are easily confused: they are similar to the case under some indexes and dissimilar under others. By obtaining the samples to be evaluated whose degree of confusion is greater than the preset value, the embodiment determines the target sample set more accurately, which helps optimize the rules for judging symptoms.
In an optional embodiment of the present invention, in step S1, the obtaining of the similarity between the symptom to be evaluated sample and the symptom reference sample includes:
performing time sequence decomposition on the original time sequence of the reference sample to obtain a time sequence decomposition result;
determining a similarity evaluation index set according to the time sequence decomposition result;
and according to the similarity evaluation index set, obtaining the similarity between the sample to be evaluated and the reference sample.
In an alternative embodiment of the present invention, in step S1, performing time sequence decomposition on the original time sequence of the reference sample to obtain a time sequence decomposition result, including:
performing time sequence decomposition on the original time sequence of the reference sample to obtain a trend item, a period item and a residual error item;
and determining a similarity evaluation index set according to the trend item, the period item and the residual error item of the reference sample.
Specifically, STL (Seasonal and Trend decomposition using Loess), SSA (Singular Spectrum Analysis) or EMD (Empirical Mode Decomposition) may be used to perform the time sequence decomposition on the original time sequence of the sample to be evaluated.
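The decomposition into a trend term, a period term and a residual term can be illustrated with a minimal sketch. The code below is not STL, SSA or EMD; it is a simplified classical additive decomposition (moving-average trend plus per-phase seasonal means) used here only as a stand-in, and the function name `decompose` and its `period` parameter are assumptions introduced for illustration.

```python
from statistics import mean

def decompose(series, period):
    """Additive decomposition into (trend, seasonal, residual) lists.

    Simplified stand-in for STL/SSA/EMD (an assumption, not the patent's
    algorithm): the trend is a centered moving average spanning about one
    period, with the window shrinking near the edges; the seasonal term is
    the per-phase mean of the detrended series, shifted to average zero.
    """
    n = len(series)
    half = period // 2
    trend = []
    for i in range(n):
        k = min(half, i, n - 1 - i)  # symmetric window that fits inside the series
        trend.append(mean(series[i - k:i + k + 1]))
    detrended = [x - t for x, t in zip(series, trend)]
    phase_means = [mean(detrended[p::period]) for p in range(period)]
    offset = mean(phase_means)
    seasonal = [phase_means[i % period] - offset for i in range(n)]
    residual = [x - t - s for x, t, s in zip(series, trend, seasonal)]
    return trend, seasonal, residual

# Example: a slow rise with a repeating pattern of length 3.
series = [0.5 * i + (1.0 if i % 3 == 0 else -0.5) for i in range(30)]
trend, seasonal, residual = decompose(series, period=3)
```

By construction the three components add back up to the original series, which is the property the later energy-ratio step relies on.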
In an optional embodiment of the present invention, in step S1, obtaining the similarity between the sample to be evaluated and the reference sample according to the similarity evaluation index set includes:
and acquiring a similarity matrix of the sample to be evaluated and the reference sample according to a time sequence decomposition result by adopting a preset time sequence similarity algorithm.
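One way to picture the similarity matrix is as a table with one row per sample to be evaluated and one column per distance function. The sketch below assumes that layout; the two metrics are hypothetical placeholders chosen for brevity and are not the distance functions named in this document.

```python
def similarity_matrix(reference, candidates, metrics):
    """Rows are candidate ids, columns are metric names, cells are distances."""
    return {
        sid: {name: fn(reference, series) for name, fn in metrics.items()}
        for sid, series in candidates.items()
    }

# Hypothetical placeholder metrics (assumptions, not the patent's functions).
def mean_abs_diff(x, y):
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)

def max_abs_diff(x, y):
    return max(abs(a - b) for a, b in zip(x, y))

reference = [0.0, 1.0, 2.0, 1.0, 0.0]
candidates = {
    "seq1": [0.0, 1.0, 2.0, 1.0, 0.0],
    "seq2": [1.0, 1.0, 1.0, 1.0, 1.0],
}
matrix = similarity_matrix(
    reference, candidates,
    {"mean_abs": mean_abs_diff, "max_abs": max_abs_diff},
)
```

Keeping the metrics in a dictionary makes it straightforward to swap in whichever distance functions the selected index type calls for.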
In an optional embodiment of the present invention, in step S2, determining a similarity evaluation index set according to a trend term, a period term, and a residual term of a reference sample, includes:
if the sum of the energy ratios of the period term and the residual term is greater than a first threshold, determining to use an oscillation-type similarity evaluation index; otherwise, performing linear regression on the trend term to obtain a linear regression model and, if the p-value of the hypothesis test on the linear regression model is less than a second threshold, determining to use a trend-type similarity evaluation index; otherwise, determining to use a preset-shape-type similarity evaluation index.
Specifically, the first threshold may be 75%, and the second threshold may be 0.05;
the linear regression may use, among others, first-degree or third-degree polynomial terms, a logarithmic transformation or an exponential transformation.
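The three-way decision described above can be written as a short function. This is a sketch under stated assumptions: the function and parameter names are hypothetical, the defaults are the example thresholds given in the text (75% and 0.05), and the regression p-value is taken as an input rather than computed here.

```python
def choose_index_type(period_ratio, residual_ratio, trend_p_value,
                      first_threshold=0.75, second_threshold=0.05):
    """Pick the family of similarity evaluation indexes for a reference sample.

    period_ratio / residual_ratio: energy ratios of the period and residual terms.
    trend_p_value: p-value of the regression fitted to the trend term
                   (assumed to be computed elsewhere).
    """
    if period_ratio + residual_ratio > first_threshold:
        return "oscillation"
    if trend_p_value < second_threshold:
        return "trend"
    return "preset_shape"

# Mostly periodic/noisy energy: oscillation-type indexes.
kind_a = choose_index_type(0.5, 0.3, 0.5)
# Little oscillation and a significant regression: trend-type indexes.
kind_b = choose_index_type(0.1, 0.1, 0.01)
# Neither condition holds: preset-shape-type indexes.
kind_c = choose_index_type(0.1, 0.1, 0.3)
```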
In an alternative embodiment of the present invention, in step S2, the energy ratio of the period term and the residual term is obtained through the following processes:
acquiring the overall variance of the original time sequence of the sample to be evaluated, the variance of the period item and the variance of the residual error item;
periodic term energy ratio = variance of periodic term/overall variance;
residual term energy ratio = variance of residual term/overall variance.
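The two ratios can be computed directly from the decomposed components. The sketch below uses the population variance from Python's standard library; the function name `energy_ratios` is an assumption, and because all three series have the same length, using the sample variance instead would give the same ratios.

```python
from statistics import pvariance

def energy_ratios(original, periodic, residual):
    """Return (period term energy ratio, residual term energy ratio)."""
    total = pvariance(original)
    if total == 0:
        raise ValueError("constant series: overall variance is zero")
    return pvariance(periodic) / total, pvariance(residual) / total

# A purely oscillating series: all of the variance sits in the period term.
original = [1.0, 3.0, 1.0, 3.0]
periodic = [-1.0, 1.0, -1.0, 1.0]
residual = [0.0, 0.0, 0.0, 0.0]
period_ratio, residual_ratio = energy_ratios(original, periodic, residual)
```

Note that the two ratios need not sum to one, since the trend term also carries variance and the components may be correlated.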
In an optional embodiment of the present invention, in step S2, according to the similarity evaluation index, the evaluating the sample to be evaluated of the symptom to obtain a target sample set, including:
if the maximum value or the mean value of the similarity evaluation index is greater than a third threshold, filtering out the sample to be evaluated corresponding to that similarity evaluation index, and determining the remaining samples to be evaluated as the preliminarily screened target sample set.
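The preliminary screening can be sketched as a filter over per-sample distance lists. The sketch assumes that the distances under the different indexes have already been brought to a comparable scale, which the text does not specify; the function name `prefilter` and the `use` parameter are hypothetical.

```python
from statistics import mean

def prefilter(distances, third_threshold, use="max"):
    """Drop samples whose aggregated distance exceeds the third threshold.

    distances: {sample_id: [distance under each evaluation index]}
    use: "max" or "mean", the aggregate compared against the threshold.
    Assumes the distances are on a comparable scale across indexes.
    """
    aggregate = max if use == "max" else mean
    return {sid: d for sid, d in distances.items()
            if aggregate(d) <= third_threshold}

distances = {"s1": [0.1, 0.2], "s2": [0.3, 0.9], "s3": [0.6, 0.6]}
kept_by_max = prefilter(distances, 0.7)               # s2 is dropped (max 0.9)
kept_by_mean = prefilter(distances, 0.65, use="mean")  # all three are kept
```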
An optional embodiment of the present invention, further comprising:
and acquiring a plurality of evaluation indexes of the sample to be evaluated, carrying out inconsistency evaluation on the plurality of evaluation indexes by a similarity sorting variance method, and reserving the sample to be evaluated with the evaluation index variance larger than a fourth threshold value to obtain a target sample set.
Specifically, samples to be evaluated that have low similarity to the reference sample are filtered out. Several criteria are possible, for example: 1) the maximum value of a sample's evaluation indexes is greater than a threshold; 2) the samples to be evaluated are ranked (from smallest to largest) under each evaluation index, and a sample is filtered out if the maximum of its ranks across the indexes is greater than a threshold.
The consistency of a sample to be evaluated across the various indexes can be evaluated using methods such as similarity ranking variance or collaborative filtering, and the samples to be evaluated with high inconsistency are selected as target samples.
The size of the threshold and the degree of inconsistency can be set according to actual needs.
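The similarity ranking variance idea can be sketched as follows: rank the samples under each index, then measure how much each sample's rank varies from index to index. This is one interpretation of the method, not a confirmed implementation; the function names are hypothetical, and ties are broken by input order.

```python
from statistics import pvariance

def rank_variances(distances):
    """distances: {sample_id: [distance under each index]}.

    Returns {sample_id: variance of that sample's ranks across indexes},
    where rank 1 is the smallest distance under a given index.
    """
    ids = list(distances)
    n_indexes = len(next(iter(distances.values())))
    ranks = {sid: [] for sid in ids}
    for m in range(n_indexes):
        ordered = sorted(ids, key=lambda sid: distances[sid][m])
        for rank, sid in enumerate(ordered, start=1):
            ranks[sid].append(rank)
    return {sid: pvariance(r) for sid, r in ranks.items()}

def select_confusable(distances, fourth_threshold):
    """Keep samples whose rank variance exceeds the fourth threshold."""
    variances = rank_variances(distances)
    return [sid for sid, v in variances.items() if v > fourth_threshold]

distances = {
    "a": [1.0, 1.0, 1.0],  # always closest: consistent
    "b": [2.0, 3.0, 2.0],  # rank changes between indexes
    "c": [3.0, 2.0, 3.0],
}
targets = select_confusable(distances, fourth_threshold=0.1)
```

Working on ranks rather than raw distances sidesteps the fact that different distance functions produce values on different scales.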
The following is a specific example of the time-series similarity evaluation index:
the similarity evaluation index types may include: a trend-type similarity evaluation index, an oscillation-type similarity evaluation index and a preset-shape-type similarity evaluation index;
for the trend-type similarity evaluation index, the main factors are slope and rise/fall, and the candidate distance metrics include: "COR", "CORT";
for the oscillation-type similarity evaluation index, the main factors are period, correlation and amplitude, and the candidate distance metrics include: "ACF", "AR.LPC.CEPS", "AR.MAH", "AR.PIC", "PACF", "SPEC.LLR", "SPEC.GLK", "PER";
for the preset-shape-type similarity evaluation index, the main factors are mean value, amplitude and phase, and the candidate distance metrics include: "DTWARP", "EUCL"; for the degree of morphological agreement, the candidate distance metrics include: "MINDIST.SAX";
the labels "COR", "CORT", "ACF", "AR.LPC.CEPS", "AR.MAH", "AR.PIC", "PACF", "SPEC.LLR", "SPEC.GLK", "PER", "DTWARP", "EUCL" and "MINDIST.SAX" each denote a different distance metric function.
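To make three of these labels concrete, the sketch below gives textbook versions of a Euclidean distance ("EUCL"), a correlation-based distance ("COR") and dynamic time warping ("DTWARP"). The labels resemble method names of the `diss` function in the R package TSclust, but the source does not name its implementation, so both that link and the exact formulas here are assumptions.

```python
import math

def eucl(x, y):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cor_dist(x, y):
    """sqrt(2 * (1 - r)) with r the Pearson correlation; 0 when r = 1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    return math.sqrt(max(0.0, 2.0 * (1.0 - r)))  # clamp rounding noise

def dtw(x, y):
    """Dynamic time warping cost with absolute difference as local cost."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(y)  # row 0 of the cumulative cost table
    for a in x:
        cur = [inf]
        for j, b in enumerate(y, start=1):
            cur.append(abs(a - b) + min(prev[j], prev[j - 1], cur[j - 1]))
        prev = cur
    return prev[-1]

reference = [0.0, 1.0, 2.0, 1.0, 0.0]
delayed = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]  # same shape, one step late
```

The `delayed` series shows why the choice of metric matters: `dtw` can align it with `reference` exactly, whereas a point-by-point metric such as `eucl` would penalize the shift.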
In an alternative embodiment of the present invention, step S2 further includes:
and clustering the samples in the target sample set to obtain a clustering result.
Specifically, algorithms such as k-means, PAM (Partitioning Around Medoids) and hierarchical clustering may be used for the clustering, which reduces the complexity of the target sample set.
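As an illustration of the hierarchical option, the sketch below performs agglomerative clustering with single linkage on a precomputed distance matrix. The linkage choice, the function name and the toy data are assumptions; the text does not say which variant or parameters are used.

```python
def single_linkage(dist, n_clusters):
    """Agglomerative clustering on a symmetric distance matrix.

    dist: list of lists with dist[i][j] the distance between samples i and j.
    Repeatedly merges the two clusters with the smallest minimum pairwise
    distance until n_clusters remain; returns clusters as sorted index lists.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters

# Toy example: four samples summarized by one number each.
positions = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in positions] for p in positions]
groups = single_linkage(dist, n_clusters=2)
```

Because the function only needs pairwise distances, it can be fed any of the time series distance functions discussed in this document.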
The method for determining the target sample is described below by an embodiment:
for example, the characterization of the "straw hat wind" in wind power:
1) Figs. 2 to 13 are schematic diagrams of a reference sample and 11 samples to be evaluated; tables 2 and 3 show the distance values between the 11 samples to be evaluated and the reference sample under 18 different distance metric functions, and Fig. 16 is a schematic diagram of the distances between the 11 samples to be evaluated and the reference sample under the 18 different distance metric functions;
the 18 different distance metric functions (as shown in tables 2 and 3) are: "ACF", "AR.LPC.CEPS", "AR.MAH", "AR.PIC", "CDM", "CID", "COR", "CORT", "DTWARP", "EUCL", "INT.PER", "NCD", "PACF", "PDC", "PER", "MINDIST.SAX", "SPEC.LLR", "SPEC.GLK"; "Seq" is the identifier of the sample to be evaluated;
TABLE 1
[table image not included in this text]
TABLE 2
[table image not included in this text]
Comparing the data in table 2 and table 3 shows that the similarity (distance value) between the 11 samples to be evaluated and the reference sample differs from one distance metric function to another. For example, as shown in fig. 13 and 14, under the Euclidean distance (EUCL) the second sample to be evaluated differs only slightly from the reference sample, with a distance of 60.6, compared with the other samples to be evaluated, but at the int.
2) Eliminating dissimilar samples to be evaluated;
as can be seen from comparing the data in table 2 and table 3, the eleventh sample to be evaluated is the sample with the worst similarity to the reference sample under the measurement functions of 18 different distances, so that the eleventh sample to be evaluated is eliminated, and the remaining 10 samples to be evaluated are retained.
3) As shown in tables 3 and 4 below, from the remaining 10 samples to be evaluated in step 2), samples with strong inconsistency were selected.
TABLE 3
[table image not included in this text]
TABLE 4
[table image not included in this text]
TABLE 5
[table image not included in this text]
As shown in table 5, using the similarity ranking variance method, the mean and variance of the distances of the 10 samples to be evaluated under the 18 different distance metric functions are calculated, and the variances of the 10 samples are compared. The variances of the fifth, seventh and tenth samples to be evaluated are 3.5, 3.9 and 3.5 respectively, larger than those of the remaining samples to be evaluated; samples 5, 7 and 10, which have high variance, are therefore retained as target samples.
It can thus be seen that, in judging the "straw hat wind", the width of the top of the curve of a sample to be evaluated is not important, and the left and right sides need not be symmetrical; the method therefore selects the target samples accurately.
According to the technical solution of the embodiment of the invention, the similarity between the sample to be evaluated and the reference sample is evaluated under a plurality of similarity evaluation indexes; suitable time series distance functions are selected automatically based on an analysis of the time series characteristics; among the similar samples, those with high confusion (inconsistency across the multiple indexes) are selected preferentially; and, depending on how concentrated the target samples are, the high-confusion samples are clustered to reduce the complexity of the data, so that more exact cases can be provided for modeling.
Embodiments of the present invention also provide a system for determining a target sample of industrial equipment symptoms, comprising:
the acquisition module is used for acquiring the similarity between a sample to be evaluated of a symptom and a reference sample of the symptom;
a processing module, configured to evaluate the sample to be evaluated of the symptom according to the similarity, so as to obtain a target sample set, where the target sample set includes: target samples of the symptom whose degree of confusion with the sample to be evaluated of the symptom is greater than a preset value.
Optionally, the obtaining of the similarity between the sample to be evaluated of the symptom and the reference sample of the symptom includes:
performing time sequence decomposition on the original time sequence of the reference sample to obtain a time sequence decomposition result;
determining a similarity evaluation index set according to the time sequence decomposition result;
and according to the similarity evaluation index set, obtaining the similarity between the sample to be evaluated and the reference sample.
Optionally, performing time sequence decomposition on the original time sequence of the reference sample to obtain a time sequence decomposition result, including:
performing time sequence decomposition on the original time sequence of the reference sample to obtain a trend item, a period item and a residual error item;
and determining a similarity evaluation index set according to the trend item, the period item and the residual error item of the reference sample.
Optionally, obtaining the similarity between the sample to be evaluated and the reference sample according to the similarity evaluation index set includes:
and acquiring a similarity matrix of the sample to be evaluated and the reference sample according to a time sequence decomposition result by adopting a preset time sequence similarity algorithm.
Optionally, obtaining a plurality of similarity evaluation indexes according to the time sequence decomposition result of the reference sample, including:
if the sum of the energy ratios of the period term and the residual term is greater than a first threshold, determining to use an oscillation-type similarity evaluation index; otherwise, performing linear regression on the trend term to obtain a linear regression model and, if the p-value of the hypothesis test on the linear regression model is less than a second threshold, determining to use a trend-type similarity evaluation index; otherwise, determining to use a preset-shape-type similarity evaluation index.
Optionally, the energy ratio of the period term and the residual term is obtained through the following processes:
acquiring the overall variance of the original time sequence of the sample to be evaluated, the variance of the period item and the variance of the residual error item;
periodic term energy ratio = variance of periodic term/overall variance;
residual term energy ratio = variance of residual term/overall variance.
Optionally, according to the similarity evaluation index, evaluating the sample to be evaluated of the symptom to obtain a target sample set, including:
if the maximum value or the mean value of the similarity evaluation index is greater than a third threshold, filtering out the sample to be evaluated corresponding to that similarity evaluation index, and determining the remaining samples to be evaluated as the preliminarily screened target sample set.
Optionally, the method further includes:
and acquiring a plurality of evaluation indexes of the sample to be evaluated, carrying out inconsistency evaluation on the plurality of evaluation indexes by a similarity sorting variance method, and reserving the sample to be evaluated with the evaluation index variance larger than a fourth threshold value to obtain a target sample set.
Optionally, the method for determining a target sample of the industrial equipment symptom further includes:
and clustering the samples in the target sample set to obtain a clustering result.
The technical solution of the invention makes it possible to search historical data automatically for 'plausible' (easily confused) cases starting from a small number of cases, so that experts can distinguish valid symptoms from false symptoms in a more targeted way and more exact cases are provided for modeling.
An embodiment of the present invention also provides a processor-readable storage medium, which stores a computer program for causing a processor to execute the method as described above. All the implementation manners in the above method embodiment are applicable to the embodiment of the system, and the same technical effect can be achieved.
Further, it is noted that in the system and method of the present invention, it is apparent that each component or each step may be decomposed and/or recombined. These decompositions and/or recombinations are to be regarded as equivalents of the present invention. Also, the steps of performing the series of processes described above may naturally be performed chronologically in the order described, but need not necessarily be performed chronologically, and some steps may be performed in parallel or independently of each other. It will be understood by those of ordinary skill in the art that all or any of the steps or elements of the method and system of the present invention may be implemented in any computing system (including processors, storage media, etc.) or network of computing systems, in hardware, firmware, software, or any combination thereof, which can be implemented by those of ordinary skill in the art using their basic programming skills after reading the description of the present invention.
Thus, the objects of the invention may also be achieved by running a program or a set of programs on any computing system. The computing system may be a well known general purpose system. Thus, the objects of the invention may also be realized by providing only a program product comprising program code for implementing the method or system. That is, such a program product also constitutes the present invention, and a storage medium storing such a program product also constitutes the present invention. It is to be understood that the storage medium may be any known storage medium or any storage medium developed in the future.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (8)

1. A method of determining a target sample of industrial equipment symptoms, comprising:
according to the similarity evaluation index set, obtaining the similarity of a sample to be evaluated of the symptom and a reference sample of the symptom; the similarity evaluation index set comprises: a trend type similarity evaluation index, an oscillation type similarity evaluation index and a preset shape type similarity evaluation index; the trend type similarity evaluation indexes comprise slope and elevation; the oscillation type similarity evaluation indexes comprise period, correlation and amplitude; the preset shape type similarity evaluation indexes comprise a mean value, an amplitude and a phase;
evaluating the sample to be evaluated of the symptom according to the similarity to obtain a target sample set, wherein the target sample set comprises: target samples of the symptom whose degree of confusion with the sample to be evaluated of the symptom is greater than a preset value;
wherein evaluating the sample to be evaluated of the symptom according to the similarity to obtain the target sample set comprises:
if the maximum value or the mean value of the similarity evaluation index is greater than a third threshold value, filtering out the sample to be evaluated corresponding to the similarity evaluation index, and determining the remaining samples to be evaluated as a preliminarily screened target sample set;
obtaining a plurality of evaluation indexes of the sample to be evaluated, performing inconsistency evaluation on the plurality of evaluation indexes by a similarity ranking variance method, and retaining the samples to be evaluated whose evaluation index variance is greater than a fourth threshold value, to obtain the target sample set;
wherein the trend type similarity evaluation index, the oscillation type similarity evaluation index and the preset shape type similarity evaluation index each correspond to a different distance index, each distance index represents a metric function, and the distance between a sample to be evaluated and the reference sample differs under different metric functions.
2. The method for determining the target sample of the industrial equipment symptom according to claim 1, wherein the similarity evaluation index set is determined by:
performing time sequence decomposition on the original time sequence of the reference sample to obtain a time sequence decomposition result;
and determining a similarity evaluation index set according to the time sequence decomposition result.
3. The method for determining the target sample of the industrial equipment symptom according to claim 2, wherein performing time sequence decomposition on the original time sequence of the reference sample to obtain a time sequence decomposition result comprises:
performing time sequence decomposition on the original time sequence of the reference sample to obtain a trend term, a period term and a residual term;
and determining a similarity evaluation index set according to the trend term, the period term and the residual term of the reference sample.
4. The method for determining the target sample of the industrial equipment symptom according to claim 3, wherein obtaining the similarity between the sample to be evaluated and the reference sample according to the similarity evaluation index set comprises:
and acquiring a similarity matrix of the sample to be evaluated and the reference sample according to a time sequence decomposition result by adopting a preset time sequence similarity algorithm.
5. The method for determining the target sample of the industrial equipment symptom according to claim 3, wherein determining the similarity evaluation index set according to the trend term, the period term and the residual term of the reference sample comprises:
if the sum of the energy ratios of the period term and the residual term is greater than a first threshold value, determining to use the oscillation type similarity evaluation index; otherwise, performing linear regression on the trend term to obtain a linear regression model, and, if the hypothesis-test p-value of the linear regression model is smaller than a second threshold value, determining to use the trend type similarity evaluation index; otherwise, determining to use the preset shape type similarity evaluation index.
6. The method for determining the target sample of the industrial equipment symptom according to claim 5, wherein the energy ratios of the period term and the residual term are obtained as follows:
acquiring the overall variance of the original time sequence of the sample to be evaluated, the variance of the period term and the variance of the residual term;
period term energy ratio = variance of period term / overall variance;
residual term energy ratio = variance of residual term / overall variance.
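As a rough illustration of the selection rule in claims 5 and 6, the following Python sketch computes the two energy ratios and picks an index type. It assumes the trend, period and residual terms have already been produced by some decomposition step. The function names, the string labels, and the choice to pass the regression p-value in as a precomputed argument are illustrative assumptions, not part of the claims.

```python
from statistics import pvariance


def energy_ratio(component, original):
    """Variance of one decomposition component divided by the overall variance."""
    overall = pvariance(original)
    if overall == 0:
        raise ValueError("original series has zero variance")
    return pvariance(component) / overall


def select_index_type(original, period, residual, trend_p_value,
                      first_threshold, second_threshold):
    """Pick an index type following the order of checks in claim 5.

    trend_p_value is assumed to be the p-value of a linear regression
    fitted to the trend term, computed elsewhere (hypothetical input).
    """
    # Sum of the period-term and residual-term energy ratios (claim 6).
    oscillation_energy = (energy_ratio(period, original)
                          + energy_ratio(residual, original))
    if oscillation_energy > first_threshold:
        return "oscillation"
    # Otherwise a significant linear trend selects the trend type.
    if trend_p_value < second_threshold:
        return "trend"
    # Fallback: preset shape type.
    return "preset_shape"
```

The p-value could come from any linear regression routine applied to the trend term; for example, `scipy.stats.linregress` reports one in the `pvalue` field of its result. The thresholds are left as parameters because the claims do not fix their values.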
7. The method for determining the target sample of the industrial equipment symptom according to claim 1, further comprising: clustering the samples in the target sample set to obtain a clustering result.
8. A system for determining a target sample of industrial equipment symptoms, comprising:
the acquisition module is used for acquiring the similarity of a sample to be evaluated of a symptom and a reference sample of the symptom according to the similarity evaluation index set; the similarity evaluation index set comprises: a trend type similarity evaluation index, an oscillation type similarity evaluation index and a preset shape type similarity evaluation index; the trend type similarity evaluation indexes comprise slope and elevation; the oscillation type similarity evaluation indexes comprise period, correlation and amplitude; the preset shape type similarity evaluation indexes comprise a mean value, an amplitude and a phase;
a processing module, configured to evaluate the sample to be evaluated of the symptom according to the similarity to obtain a target sample set, wherein the target sample set comprises: target samples of the symptom whose degree of confusion with the sample to be evaluated of the symptom is greater than a preset value;
wherein evaluating the sample to be evaluated of the symptom according to the similarity to obtain the target sample set comprises:
if the maximum value or the mean value of the similarity evaluation index is greater than a third threshold value, filtering out the sample to be evaluated corresponding to the similarity evaluation index, and determining the remaining samples to be evaluated as a preliminarily screened target sample set;
obtaining a plurality of evaluation indexes of the sample to be evaluated, performing inconsistency evaluation on the plurality of evaluation indexes by a similarity ranking variance method, and retaining the samples to be evaluated whose evaluation index variance is greater than a fourth threshold value, to obtain the target sample set;
wherein the trend type similarity evaluation index, the oscillation type similarity evaluation index and the preset shape type similarity evaluation index each correspond to a different distance index, each distance index represents a metric function, and the distance between a sample to be evaluated and the reference sample differs under different metric functions.
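The two screening steps recited in claims 1 and 8 can be sketched in Python as follows. This is only one possible reading, under stated assumptions: each sample to be evaluated carries one similarity value per evaluation index; the rank-based variance check is interpreted here as ranking the samples separately under each index and taking the variance of each sample's ranks; and the names `rank_descending` and `screen_samples` are hypothetical. The claims themselves do not spell out these details.

```python
from statistics import mean, pvariance


def rank_descending(values):
    """1-based rank of each value, largest first (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def screen_samples(similarities, third_threshold, fourth_threshold,
                   use_mean=False):
    """similarities maps a sample id to a list of similarity values,
    one per evaluation index (assumed layout)."""
    # Step 1: drop samples whose maximum (or mean) similarity exceeds
    # the third threshold; the rest form the preliminary set.
    aggregate = mean if use_mean else max
    kept = {sid: vals for sid, vals in similarities.items()
            if aggregate(vals) <= third_threshold}
    if not kept:
        return []
    ids = list(kept)
    n_indexes = len(kept[ids[0]])
    # Step 2: rank the kept samples under each index separately.
    per_index_ranks = [rank_descending([kept[sid][k] for sid in ids])
                       for k in range(n_indexes)]
    # Keep samples whose ranks disagree across indexes by more than
    # the fourth threshold (variance of the sample's ranks).
    result = []
    for pos, sid in enumerate(ids):
        sample_ranks = [per_index_ranks[k][pos] for k in range(n_indexes)]
        if pvariance(sample_ranks) > fourth_threshold:
            result.append(sid)
    return result
```

Under this reading, a sample that ranks high under one index and low under another is kept as inconsistent, while a sample that holds the same rank under every index is dropped.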

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110194751.1A CN112559602B (en) 2021-02-21 2021-02-21 Method and system for determining target sample of industrial equipment symptom

Publications (2)

Publication Number Publication Date
CN112559602A CN112559602A (en) 2021-03-26
CN112559602B true CN112559602B (en) 2021-07-13

Family

ID=75034395

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110194751.1A Active CN112559602B (en) 2021-02-21 2021-02-21 Method and system for determining target sample of industrial equipment symptom

Country Status (1)

Country Link
CN (1) CN112559602B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101135601A (en) * 2007-10-18 2008-03-05 北京英华达电力电子工程科技有限公司 Rotating machinery vibrating failure diagnosis device and method
CN104572985A (en) * 2015-01-04 2015-04-29 大连理工大学 Industrial data sample screening method based on complex network community discovery
CN107194430A (en) * 2017-05-27 2017-09-22 北京三快在线科技有限公司 A kind of screening sample method and device, electronic equipment
CN108197638A (en) * 2017-12-12 2018-06-22 阿里巴巴集团控股有限公司 The method and device classified to sample to be assessed
CN110135492A (en) * 2019-05-13 2019-08-16 山东大学 Equipment fault diagnosis and method for detecting abnormality and system based on more Gauss models
CN111324637A (en) * 2020-02-05 2020-06-23 北京工业大数据创新中心有限公司 Fault symptom searching method and system for industrial time sequence data
CN111340144A (en) * 2020-05-15 2020-06-26 支付宝(杭州)信息技术有限公司 Risk sample detection method and device, electronic equipment and storage medium
CN111897695A (en) * 2020-07-31 2020-11-06 平安科技(深圳)有限公司 Method and device for acquiring KPI abnormal data sample and computer equipment
CN111931872A (en) * 2020-09-27 2020-11-13 北京工业大数据创新中心有限公司 Method and device for determining abnormity of trend symptom
CN112257423A (en) * 2020-10-21 2021-01-22 北京工业大数据创新中心有限公司 Equipment symptom information acquisition method and device and equipment operation and maintenance system
CN112270379A (en) * 2020-11-13 2021-01-26 北京百度网讯科技有限公司 Training method of classification model, sample classification method, device and equipment
CN112381185A (en) * 2021-01-15 2021-02-19 北京工业大数据创新中心有限公司 Industrial equipment characteristic curve similarity obtaining method and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013257251A (en) * 2012-06-14 2013-12-26 Internatl Business Mach Corp <Ibm> Anomaly detection method, program, and system
CN108399414B (en) * 2017-02-08 2021-06-01 南京航空航天大学 Sample selection method and device applied to cross-modal data retrieval field
CN109508558B (en) * 2018-10-31 2022-11-18 创新先进技术有限公司 Data validity verification method, device and equipment
CN112200273B (en) * 2020-12-07 2021-05-07 长沙海信智能系统研究院有限公司 Data annotation method, device, equipment and computer storage medium

Also Published As

Publication number Publication date
CN112559602A (en) 2021-03-26

Similar Documents

Publication Publication Date Title
CN109387712B (en) Non-invasive load detection and decomposition method based on state matrix decision tree
Chow et al. Estimating optimal feature subsets using efficient estimation of high-dimensional mutual information
Schoknecht et al. Similarity of business process models—a state-of-the-art analysis
Sun et al. Identifying and correcting mislabeled training instances
CN112148955A (en) Method and system for detecting abnormal time sequence data of Internet of things
EP3311311A1 (en) Automatic entity resolution with rules detection and generation system
Abd-El-Hafiz A metrics-based data mining approach for software clone detection
Zhao et al. A novel multivariate time-series anomaly detection approach using an unsupervised deep neural network
Brockhoff et al. Time-aware concept drift detection using the earth mover’s distance
US10311067B2 (en) Device and method for classifying and searching data
Taufiq Classification method of multi-class on C4.5 algorithm for fish diseases
Banda et al. An experimental evaluation of popular image parameters for monochromatic solar image categorization
CN111782491B (en) Disk failure prediction method, device, equipment and storage medium
Azzalini et al. A minimally supervised approach based on variational autoencoders for anomaly detection in autonomous robots
Pett et al. Stability of product-line sampling in continuous integration
CN112463848A (en) Method, system, device and storage medium for detecting abnormal user behavior
CN115456107A (en) Time series abnormity detection system and method
Wu et al. Multiscale jump testing and estimation under complex temporal dynamics
CN112559602B (en) Method and system for determining target sample of industrial equipment symptom
US20120109860A1 (en) Enhanced Training Data for Learning-To-Rank
CN117290404A (en) Method and system for rapidly searching and practical main distribution network fault processing method
CN117170915A (en) Data center equipment fault prediction method and device and computer equipment
Achar et al. Statistical significance of episodes with general partial orders
CN112734072A (en) Power load prediction method, system, terminal device and medium
CN113962216A (en) Text processing method and device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant