CN112559602A - Method and system for determining target sample of industrial equipment symptom - Google Patents
Method and system for determining target sample of industrial equipment symptom
- Publication number
- CN112559602A CN202110194751.1A
- Authority
- CN
- China
- Prior art keywords
- sample
- evaluated
- similarity
- determining
- symptom
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2474—Sequence data queries, e.g. querying versioned data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a method and a system for determining a target sample of an industrial equipment symptom. The method comprises the following steps: acquiring the similarity between a sample to be evaluated of a symptom and a reference sample of the symptom; and evaluating the sample to be evaluated of the symptom according to the similarity to obtain a target sample set, wherein the target sample set comprises target samples of the symptom whose degree of confusion with the sample to be evaluated of the symptom is greater than a preset value. With this technical scheme, "plausible" cases are automatically searched for in historical data starting from a small number of cases, so that experts can distinguish effective symptoms from false symptoms in a targeted manner, and more exact cases are provided for modeling.
Description
Technical Field
The invention relates to the technical field of industry, in particular to a method and a system for determining a target sample of an industrial equipment symptom.
Background
Describing the logic of an equipment failure usually involves describing its symptoms. Symptoms are intuitive to recognize but difficult to describe quantitatively: a symptom is a time series of a key technical index (such as temperature, leakage amount, or the 1X amplitude of vibration) that exhibits a certain trend (such as a slow rise or fall) or form (such as the presence of burrs or oscillation).
The current means of collecting cases are listed below. The situations they cover are very limited, which hinders precise characterization during modeling:
1) Past failure cases: the number is usually very limited, and so is the coverage of scenarios;
2) Domain experts' own memory or manual review: coverage is too low for IT modeling or implementation;
3) Cases found by IT experts during testing: the iteration period is long, and high demands are placed on the insight and sense of responsibility of the IT experts.
Disclosure of Invention
The embodiment of the invention provides a method and a system for determining a target sample of industrial equipment symptoms. A small number of cases are used as reference samples to automatically search historical data for "plausible" cases among the samples to be evaluated, so as to determine the target samples. Based on the target samples in the target sample set, it becomes easier to distinguish which symptoms are effective symptoms and which are false symptoms.
In order to solve the above technical problem, an embodiment of the present invention provides the following technical solutions:
a method of determining a target sample of industrial equipment symptoms, comprising:
acquiring the similarity of a sample to be evaluated of a symptom and a reference sample of the symptom;
evaluating the sample to be evaluated of the symptom according to the similarity to obtain a target sample set, wherein the target sample set comprises: target samples of the symptom whose degree of confusion with the sample to be evaluated of the symptom is greater than a preset value.
Optionally, the obtaining of the similarity between the sample to be evaluated of the symptom and the reference sample of the symptom includes:
performing time sequence decomposition on the original time sequence of the reference sample to obtain a time sequence decomposition result;
determining a similarity evaluation index set according to the time sequence decomposition result;
and according to the similarity evaluation index set, obtaining the similarity between the sample to be evaluated and the reference sample.
Optionally, performing time sequence decomposition on the original time sequence of the reference sample to obtain a time sequence decomposition result, including:
performing time sequence decomposition on the original time sequence of the reference sample to obtain a trend item, a period item and a residual error item;
and determining a similarity evaluation index set according to the trend item, the period item and the residual error item of the reference sample.
Optionally, obtaining the similarity between the sample to be evaluated and the reference sample according to the similarity evaluation index set includes:
and acquiring a similarity matrix of the sample to be evaluated and the reference sample according to a time sequence decomposition result by adopting a preset time sequence similarity algorithm.
Optionally, determining a similarity evaluation index set according to the trend term, the period term, and the residual term of the reference sample, including:
if the sum of the energy ratios of the period term and the residual term is greater than a first threshold value, determining to use an oscillation-type similarity evaluation index; otherwise, performing linear regression on the trend term to obtain a linear regression model and, if the p-value of the hypothesis test of the linear regression model is smaller than a second threshold value, determining to use a trend-type similarity evaluation index; otherwise, determining to use a preset-shape similarity evaluation index.
Optionally, the energy ratio of the period term and the residual term is obtained through the following processes:
acquiring the overall variance of the original time sequence of the sample to be evaluated, the variance of the period term and the variance of the residual term;
periodic term energy ratio = variance of periodic term/overall variance;
residual term energy ratio = variance of residual term/overall variance.
Optionally, according to the similarity evaluation index, evaluating the sample to be evaluated of the symptom to obtain a target sample set, including:
and if the maximum value or the mean value of the similarity evaluation index is larger than a third threshold value, filtering out the sample to be evaluated corresponding to the similarity evaluation index, and determining the remaining samples to be evaluated as the preliminarily screened target sample set.
Optionally, the method further includes:
and acquiring a plurality of evaluation indexes of the sample to be evaluated, evaluating the inconsistency of the plurality of evaluation indexes by a similarity ranking variance method, and retaining the samples to be evaluated whose evaluation index variance is larger than a fourth threshold value to obtain a target sample set.
Optionally, the method for determining a target sample of the industrial equipment symptom further includes:
and clustering the samples in the target sample set to obtain a clustering result.
Embodiments of the present invention also provide a system for determining a target sample of industrial equipment symptoms, comprising:
the acquisition module is used for acquiring the similarity between a sample to be evaluated of a symptom and a reference sample of the symptom;
a processing module, configured to perform evaluation processing on the to-be-evaluated sample of the symptom according to the similarity, so as to obtain a target sample set, where the target sample set includes: target samples of the symptom whose degree of confusion with the sample to be evaluated of the symptom is greater than a preset value.
The embodiment of the invention has the following technical effects:
different evaluation indexes emphasize different characteristics, so the similarity ranking of the samples to be evaluated relative to the reference sample differs from index to index, and some samples are easily confused: they appear similar to the reference sample under some indexes and dissimilar under others. By obtaining the samples to be evaluated whose degree of confusion is greater than the preset value, the technical scheme of the invention determines the target sample set more accurately, so that the rules for identifying the symptom can be optimized.
Drawings
FIG. 1 is a schematic flow chart of a method for determining a target sample of an industrial equipment symptom provided by an embodiment of the invention;
FIG. 2 is a schematic diagram of a reference sample provided in an embodiment of the present invention;
FIG. 3 is a schematic diagram of a first sample to be evaluated according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a second sample to be evaluated according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a third sample to be evaluated according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a fourth sample to be evaluated according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a fifth sample to be evaluated according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a sixth sample to be evaluated according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of a seventh sample to be evaluated according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of an eighth sample to be evaluated according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of a ninth sample to be evaluated according to an embodiment of the present invention;
FIG. 12 is a schematic diagram of a tenth sample to be evaluated according to an embodiment of the present invention;
FIG. 13 is a schematic diagram of an eleventh sample to be evaluated according to an embodiment of the present invention;
FIG. 14 is an enlarged schematic view of a reference sample provided in an embodiment of the present invention;
FIG. 15 is an enlarged schematic view of a second sample under evaluation provided by an embodiment of the present invention;
FIG. 16 shows the performance of the reference sample and the eleven sequences to be evaluated under 18 different distance metric functions according to the embodiment of the present invention.
Detailed Description
The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
As shown in fig. 1, an embodiment of the present invention provides a method for determining a target sample of an industrial equipment symptom, including:
S1, acquiring the similarity between the sample to be evaluated of the symptom and the reference sample of the symptom;
S2, evaluating the sample to be evaluated of the symptom according to the similarity to obtain a target sample set, wherein the target sample set comprises: target samples of the symptom whose degree of confusion with the sample to be evaluated of the symptom is greater than a preset value.
In the embodiment of the invention, different evaluation indexes emphasize different characteristics, so the similarity ranking of the samples to be evaluated relative to the reference sample differs from index to index, and some samples are easily confused: they appear similar to the reference sample under some indexes and dissimilar under others. By acquiring the samples to be evaluated whose degree of confusion is greater than the preset value, the embodiment of the invention determines the target sample set more accurately, so that the rules for identifying the symptom can be optimized.
In an optional embodiment of the present invention, in step S1, the obtaining of the similarity between the symptom to be evaluated sample and the symptom reference sample includes:
performing time sequence decomposition on the original time sequence of the reference sample to obtain a time sequence decomposition result;
determining a similarity evaluation index set according to the time sequence decomposition result;
and according to the similarity evaluation index set, obtaining the similarity between the sample to be evaluated and the reference sample.
In an alternative embodiment of the present invention, in step S1, performing time sequence decomposition on the original time sequence of the reference sample to obtain a time sequence decomposition result, including:
performing time sequence decomposition on the original time sequence of the reference sample to obtain a trend term, a period term and a residual term;
and determining a similarity evaluation index set according to the trend term, the period term and the residual term of the reference sample.
Specifically, STL (Seasonal and Trend decomposition using Loess), SSA (Singular Spectrum Analysis), or EMD (Empirical Mode Decomposition) may be used to decompose the original time sequence of the sample to be evaluated.
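As a non-limiting illustration, the decomposition into a trend term, a period term and a residual term may be sketched in Python as follows. The sketch uses a simple additive moving-average decomposition as a stand-in; it is not STL, SSA or EMD, and the function name `decompose` and the parameter `period` are assumptions introduced only for this example.

```python
from statistics import mean


def decompose(series, period):
    """Additively split a series into trend, periodic and residual terms.

    Hypothetical helper: a plain moving-average decomposition, used here
    only as a simplified stand-in for STL/SSA/EMD.
    Assumes len(series) >= period.
    """
    n = len(series)
    half = period // 2
    # Trend: centred moving average; the window shrinks near the edges.
    trend = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        trend.append(mean(series[lo:hi]))
    # Periodic term: mean detrended value at each phase, centred on zero.
    detrended = [x - t for x, t in zip(series, trend)]
    phase_means = [mean(detrended[p::period]) for p in range(period)]
    offset = mean(phase_means)
    periodic = [phase_means[i % period] - offset for i in range(n)]
    # Residual: what the trend and periodic terms leave unexplained.
    residual = [x - t - s for x, t, s in zip(series, trend, periodic)]
    return trend, periodic, residual
```

By construction, the three returned terms sum back to the original series at every time point, which is the property that the subsequent energy-ratio calculation relies on.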
In an optional embodiment of the present invention, in step S1, obtaining the similarity between the sample to be evaluated and the reference sample according to the similarity evaluation index set includes:
and acquiring a similarity matrix of the sample to be evaluated and the reference sample according to a time sequence decomposition result by adopting a preset time sequence similarity algorithm.
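As a non-limiting illustration, a similarity matrix of this kind may be sketched in Python as follows. Only two metrics are shown; the names `euclidean`, `correlation_distance` and `distance_matrix` are assumptions introduced for this example, and the correlation-based distance uses the form sqrt(2(1 - r)), which is an assumed definition rather than one stated in this description.

```python
import math
from statistics import mean


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def correlation_distance(a, b):
    """Assumed correlation-based distance: sqrt(2 * (1 - Pearson r)).

    Raises ZeroDivisionError if either sequence is constant.
    """
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    norm = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    r = cov / norm
    return math.sqrt(max(0.0, 2.0 * (1.0 - r)))


def distance_matrix(reference, candidates, metrics):
    """Return {sample_id: {metric_name: distance to the reference}}."""
    return {
        sid: {name: fn(reference, seq) for name, fn in metrics.items()}
        for sid, seq in candidates.items()
    }
```

Each row of the result corresponds to one sample to be evaluated and each column to one distance metric function, so further metrics can be added by extending the `metrics` dictionary.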
In an optional embodiment of the present invention, in step S2, determining a similarity evaluation index set according to a trend term, a period term, and a residual term of a reference sample, includes:
if the sum of the energy ratios of the period term and the residual term is greater than a first threshold value, determining to use an oscillation-type similarity evaluation index; otherwise, performing linear regression on the trend term to obtain a linear regression model and, if the p-value of the hypothesis test of the linear regression model is smaller than a second threshold value, determining to use a trend-type similarity evaluation index; otherwise, determining to use a preset-shape similarity evaluation index.
Specifically, the first threshold may be 75%, and the second threshold may be 0.05;
the linear regression may use a polynomial of degree 1 or 3, a logarithmic transformation, an exponential transformation, and the like.
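As a non-limiting illustration, the selection rule above may be sketched in Python as follows. The function name, the argument names and the returned labels are assumptions introduced for this example; the regression p-value is assumed to have been computed beforehand (for example by a statistics package), and the default thresholds are the example values of 75% and 0.05 given above.

```python
def select_index_type(period_ratio, residual_ratio, trend_p_value,
                      first_threshold=0.75, second_threshold=0.05):
    """Choose which family of similarity evaluation indexes to use.

    period_ratio / residual_ratio: energy ratios of the period and
    residual terms; trend_p_value: p-value of the linear regression
    fitted to the trend term (computed elsewhere).
    """
    # Mostly periodic or noisy energy: compare oscillation characteristics.
    if period_ratio + residual_ratio > first_threshold:
        return "oscillation"
    # A statistically significant regression on the trend term: compare trends.
    if trend_p_value < second_threshold:
        return "trend"
    # Otherwise fall back to comparing the overall shape.
    return "preset_shape"
```

For instance, `select_index_type(0.5, 0.3, 0.5)` selects the oscillation type because 0.5 + 0.3 exceeds 0.75, regardless of the p-value.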
In an alternative embodiment of the present invention, in step S2, the energy ratio of the period term and the residual term is obtained through the following processes:
acquiring the overall variance of the original time sequence of the sample to be evaluated, the variance of the period term and the variance of the residual term;
periodic term energy ratio = variance of periodic term/overall variance;
residual term energy ratio = variance of residual term/overall variance.
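As a non-limiting illustration, the two energy ratios may be computed in Python as follows. The function name `energy_ratios` is an assumption introduced for this example, and the population variance is used; whether the population or the sample variance is intended is not stated in this description.

```python
from statistics import pvariance


def energy_ratios(original, periodic, residual):
    """Return (period term energy ratio, residual term energy ratio).

    Each ratio is the variance of the term divided by the overall
    variance of the original series. Raises ZeroDivisionError when the
    original series is constant.
    """
    overall = pvariance(original)
    return pvariance(periodic) / overall, pvariance(residual) / overall
```

The sum of the two returned values is the quantity compared against the first threshold when choosing the similarity evaluation index type.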
In an optional embodiment of the present invention, in step S2, according to the similarity evaluation index, the evaluating the sample to be evaluated of the symptom to obtain a target sample set, including:
and if the maximum value or the mean value of the similarity evaluation index is larger than a third threshold value, filtering out the sample to be evaluated corresponding to the similarity evaluation index, and determining the remaining samples to be evaluated as the preliminarily screened target sample set.
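As a non-limiting illustration, this preliminary screening may be sketched in Python as follows. The function name `prefilter` and its parameters are assumptions introduced for this example, and the distances under the different metrics are assumed to have been brought to a comparable scale beforehand, a step that this description does not specify.

```python
from statistics import mean


def prefilter(distances, third_threshold, use="max"):
    """Drop samples that are clearly dissimilar to the reference sample.

    distances: {sample_id: {metric_name: distance}}, assumed comparable
    across metrics. A sample is filtered out when the maximum (or mean)
    of its distances exceeds third_threshold.
    """
    kept = {}
    for sid, per_metric in distances.items():
        values = list(per_metric.values())
        score = max(values) if use == "max" else mean(values)
        if score <= third_threshold:
            kept[sid] = per_metric
    return kept
```

Using the maximum is the stricter choice, since a single large distance removes the sample, whereas the mean tolerates one outlying metric.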
An optional embodiment of the present invention, further comprising:
and acquiring a plurality of evaluation indexes of the sample to be evaluated, evaluating the inconsistency of the plurality of evaluation indexes by a similarity ranking variance method, and retaining the samples to be evaluated whose evaluation index variance is larger than a fourth threshold value to obtain a target sample set.
Specifically, the samples to be evaluated that have low similarity to the reference sample are filtered out, and several criteria may be used, for example: 1) the maximum value of the multiple evaluation indexes is greater than a threshold; 2) the samples to be evaluated are ranked (from small to large) under each evaluation index, and the maximum rank of a sample to be evaluated across the multiple indexes is greater than a threshold.
The consistency of a sample to be evaluated under the various indexes can be evaluated by methods such as similarity ranking variance and collaborative filtering, and the samples to be evaluated with high inconsistency are selected as target samples.
The size of the threshold and the degree of inconsistency can be set according to actual needs.
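As a non-limiting illustration, the similarity ranking variance method may be sketched in Python as follows: each sample to be evaluated is ranked under every distance metric, and the variance of its ranks measures how inconsistently the metrics judge it. The function names are assumptions introduced for this example, the population variance is used, and ties are broken by input order, which is a simplification.

```python
from statistics import pvariance


def rank_variance(distances):
    """Map each sample to the variance of its ranks across the metrics.

    distances: {sample_id: {metric_name: distance}}; rank 1 is the
    sample closest to the reference under a given metric.
    """
    sample_ids = list(distances)
    metric_names = list(next(iter(distances.values())))
    ranks = {sid: [] for sid in sample_ids}
    for name in metric_names:
        ordered = sorted(sample_ids, key=lambda sid: distances[sid][name])
        for rank, sid in enumerate(ordered, start=1):
            ranks[sid].append(rank)
    return {sid: pvariance(r) for sid, r in ranks.items()}


def select_confusable(distances, fourth_threshold):
    """Keep the samples whose rank variance exceeds fourth_threshold."""
    variances = rank_variance(distances)
    return [sid for sid, v in variances.items() if v > fourth_threshold]
```

A sample that is ranked near the top under some metrics and near the bottom under others receives a large variance and is therefore retained as an easily confused target sample.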
The following is a specific example of the time-series similarity evaluation index:
the similarity evaluation index types may include: a trend-type similarity evaluation index, an oscillation-type similarity evaluation index and a preset-shape similarity evaluation index;
trend-type similarity evaluation index: the corresponding main factors comprise the slope and the rise or fall; the corresponding candidate distance metrics include "COR" and "CORT";
oscillation-type similarity evaluation index: the corresponding main factors comprise the period, correlation and amplitude; the corresponding candidate distance metrics include "ACF", "AR.LPC.CEPS", "AR.MAH", "AR.PIC", "PACF", "SPEC.LLR", "SPEC.GLK" and "PER";
preset-shape similarity evaluation index: where the main factors are the mean, amplitude and phase, the corresponding candidate distance metrics include "DTWARP" and "EUCL"; where the main factor is the degree of morphological consistency, the corresponding candidate distance metrics include "MINDIST.SAX";
the above "COR", "CORT", "ACF", "AR.LPC.CEPS", "AR.MAH", "AR.PIC", "PACF", "SPEC.LLR", "SPEC.GLK", "PER", "DTWARP", "EUCL" and "MINDIST.SAX" each denote a different distance metric function.
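As a non-limiting illustration, the correspondence between index types and candidate distance metrics listed above may be held in a simple lookup table, sketched here in Python. The dictionary keys and the function name are labels assumed for this example; the metric names are those given in this description.

```python
# Candidate distance metrics per similarity evaluation index type.
# The keys are hypothetical labels; the values are the names listed above.
CANDIDATE_DISTANCES = {
    "trend": ["COR", "CORT"],
    "oscillation": ["ACF", "AR.LPC.CEPS", "AR.MAH", "AR.PIC",
                    "PACF", "SPEC.LLR", "SPEC.GLK", "PER"],
    "preset_shape": ["DTWARP", "EUCL", "MINDIST.SAX"],
}


def candidate_distances(index_type):
    """Return a copy of the candidate metric names for an index type."""
    return list(CANDIDATE_DISTANCES[index_type])
```

Returning a copy keeps callers from accidentally modifying the shared table.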
In an alternative embodiment of the present invention, step S2 further includes:
and clustering the samples in the target sample set to obtain a clustering result.
Specifically, algorithms such as k-means, PAM, and hierarchical clustering can be adopted for clustering, so that the complexity of the data is reduced.
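As a non-limiting illustration, clustering of the target samples around representative members may be sketched in Python as follows. The sketch is a simple alternating k-medoids procedure with a deterministic start; it is not the full PAM swap algorithm, and the function names and parameters are assumptions introduced for this example.

```python
def _assign(items, medoids, dist):
    """Label every item with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist(x, items[medoids[c]]))
            for x in items]


def k_medoids(items, k, dist, max_iter=100):
    """Cluster items around k medoids; returns (medoid_indices, labels).

    Simplified alternating scheme (assign, then re-pick medoids),
    starting from the first k items.
    """
    medoids = list(range(k))
    for _ in range(max_iter):
        labels = _assign(items, medoids, dist)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # empty cluster: keep the previous medoid
                new_medoids.append(medoids[c])
                continue
            # The new medoid minimises the total distance to its cluster.
            best = min(members,
                       key=lambda i: sum(dist(items[i], items[j]) for j in members))
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, _assign(items, medoids, dist)
```

Because only a pairwise distance function is required, any of the time-series distance metrics discussed above could be passed as `dist`, and each medoid can serve as the representative case of its cluster.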
The method for determining the target sample is described below by an embodiment:
for example, the characterization of the "straw hat wind" in wind power:
1) FIG. 2 to FIG. 13 are schematic diagrams of a reference sample and 11 samples to be evaluated; tables 2 and 3 show the distance values between the 11 samples to be evaluated and the reference sample under 18 different distance metric functions, and FIG. 16 is a schematic diagram showing the distances between the 11 samples to be evaluated and the reference sample under the 18 different distance metric functions;
the 18 different distance metric functions (as shown in tables 2 and 3) include: "ACF", "AR.LPC.CEPS", "AR.MAH", "AR.PIC", "CDM", "CID", "COR", "CORT", "DTWARP", "EUCL", "INT.PER", "NCD", "PACF", "PDC", "PER", "MINDIST.SAX", "SPEC.LLR", "SPEC.GLK"; "Seq" is the code number of the sample to be evaluated;
TABLE 1
TABLE 2
As can be seen by comparing the data in table 2 and table 3, the similarity (distance value) between the 11 samples to be evaluated and the reference sample differs under the different distance metric functions; for example, as shown in FIG. 13 and FIG. 14, under the Euclidean distance (EUCL), the second sample to be evaluated differs only slightly from the reference sample compared with the other samples to be evaluated, with a distance of 60.6, but at the int.
2) Eliminating dissimilar samples to be evaluated;
as can be seen from comparing the data in table 2 and table 3, the eleventh sample to be evaluated is the sample least similar to the reference sample under the 18 different distance metric functions, so the eleventh sample to be evaluated is eliminated and the remaining 10 samples to be evaluated are retained.
3) As shown in tables 3 and 4 below, from the remaining 10 samples to be evaluated in step 2), samples with strong inconsistency were selected.
TABLE 3
TABLE 4
TABLE 5
As shown in table 5, using the similarity ranking variance method, the mean and variance of the distances of the 10 samples to be evaluated under the 18 different distance metric functions are calculated, and the variances of the 10 samples are compared. The variance values of the fifth, seventh, and tenth samples to be evaluated are 3.5, 3.9, and 3.5, respectively, which are larger than those of the remaining samples to be evaluated; samples 5, 7, and 10 are therefore retained as target samples on the basis of their high variance.
It can thus be seen that, in identifying the straw hat wind, the width of the top of the curve of the sample to be evaluated is not important, and the left and right sides need not be symmetrical; the target samples can therefore be accurately selected by the method.
According to the technical scheme of the embodiment of the invention, the similarity between the sample to be evaluated and the reference sample is evaluated using a plurality of similarity evaluation indexes; a suitable time-series distance function is selected automatically according to an analysis of the time-series characteristics; among similar samples, the samples with high confusion (inconsistency across the multiple indexes) are preferentially selected; and, depending on how concentrated the target samples are, the samples with high confusion are clustered to reduce the complexity of the data, so that more exact cases can be provided for modeling.
Embodiments of the present invention also provide a system for determining a target sample of industrial equipment symptoms, comprising:
the acquisition module is used for acquiring the similarity between a sample to be evaluated of a symptom and a reference sample of the symptom;
a processing module, configured to perform evaluation processing on the to-be-evaluated sample of the symptom according to the similarity, so as to obtain a target sample set, where the target sample set includes: target samples of the symptom whose degree of confusion with the sample to be evaluated of the symptom is greater than a preset value.
Optionally, the obtaining of the similarity between the sample to be evaluated of the symptom and the reference sample of the symptom includes:
performing time sequence decomposition on the original time sequence of the reference sample to obtain a time sequence decomposition result;
determining a similarity evaluation index set according to the time sequence decomposition result;
and according to the similarity evaluation index set, obtaining the similarity between the sample to be evaluated and the reference sample.
Optionally, performing time sequence decomposition on the original time sequence of the reference sample to obtain a time sequence decomposition result, including:
performing time sequence decomposition on the original time sequence of the reference sample to obtain a trend item, a period item and a residual error item;
and determining a similarity evaluation index set according to the trend item, the period item and the residual error item of the reference sample.
Optionally, obtaining the similarity between the sample to be evaluated and the reference sample according to the similarity evaluation index set includes:
and acquiring a similarity matrix of the sample to be evaluated and the reference sample according to a time sequence decomposition result by adopting a preset time sequence similarity algorithm.
Optionally, obtaining a plurality of similarity evaluation indexes according to the time sequence decomposition result of the reference sample, including:
if the sum of the energy ratios of the period term and the residual term is greater than a first threshold value, determining to use an oscillation-type similarity evaluation index; otherwise, performing linear regression on the trend term to obtain a linear regression model and, if the p-value of the hypothesis test of the linear regression model is smaller than a second threshold value, determining to use a trend-type similarity evaluation index; otherwise, determining to use a preset-shape similarity evaluation index.
Optionally, the energy ratio of the period term and the residual term is obtained through the following processes:
acquiring the overall variance of the original time sequence of the sample to be evaluated, the variance of the period term and the variance of the residual term;
periodic term energy ratio = variance of periodic term/overall variance;
residual term energy ratio = variance of residual term/overall variance.
Optionally, according to the similarity evaluation index, evaluating the sample to be evaluated of the symptom to obtain a target sample set, including:
and if the maximum value or the mean value of the similarity evaluation index is larger than a third threshold value, filtering out the sample to be evaluated corresponding to the similarity evaluation index, and determining the remaining samples to be evaluated as the preliminarily screened target sample set.
Optionally, the method further includes:
and acquiring a plurality of evaluation indexes of the sample to be evaluated, evaluating the inconsistency of the plurality of evaluation indexes by a similarity ranking variance method, and retaining the samples to be evaluated whose evaluation index variance is larger than a fourth threshold value to obtain a target sample set.
Optionally, the method for determining a target sample of the industrial equipment symptom further includes:
and clustering the samples in the target sample set to obtain a clustering result.
The technical scheme of the invention can realize that the 'plausible' case is automatically searched in historical data through a small number of cases, so that an expert can distinguish effective symptoms and false symptoms in a more targeted way, and a more exact case is provided for modeling.
An embodiment of the present invention also provides a processor-readable storage medium, which stores a computer program for causing a processor to execute the method as described above. All the implementation manners in the above method embodiment are applicable to the embodiment of the system, and the same technical effect can be achieved.
Further, it is noted that in the system and method of the present invention, it is apparent that each component or each step may be decomposed and/or recombined. These decompositions and/or recombinations are to be regarded as equivalents of the present invention. Also, the steps of performing the series of processes described above may naturally be performed chronologically in the order described, but need not necessarily be performed chronologically, and some steps may be performed in parallel or independently of each other. It will be understood by those of ordinary skill in the art that all or any of the steps or elements of the method and system of the present invention may be implemented in any computing system (including processors, storage media, etc.) or network of computing systems, in hardware, firmware, software, or any combination thereof, which can be implemented by those of ordinary skill in the art using their basic programming skills after reading the description of the present invention.
Thus, the objects of the invention may also be achieved by running a program or a set of programs on any computing system. The computing system may be a well known general purpose system. Thus, the objects of the invention may also be realized by providing only a program product comprising program code for implementing the method or system. That is, such a program product also constitutes the present invention, and a storage medium storing such a program product also constitutes the present invention. It is to be understood that the storage medium may be any known storage medium or any storage medium developed in the future. It is also noted that, in the systems and methods of the present invention, it is apparent that individual components or steps may be disassembled and/or reassembled. These decompositions and/or recombinations are to be regarded as equivalents of the present invention. Also, the steps of executing the series of processes described above may naturally be executed chronologically in the order described, but need not necessarily be executed chronologically. Some steps may be performed in parallel or independently of each other.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (10)
1. A method for determining a target sample of an industrial equipment symptom, comprising:
acquiring the similarity between a sample to be evaluated of a symptom and a reference sample of the symptom; and
evaluating the sample to be evaluated of the symptom according to the similarity to obtain a target sample set, wherein the target sample set comprises target samples of the symptom whose degree of confusion with the sample to be evaluated of the symptom is greater than a preset value.
2. The method for determining a target sample of an industrial equipment symptom according to claim 1, wherein acquiring the similarity between the sample to be evaluated of the symptom and the reference sample of the symptom comprises:
performing time sequence decomposition on the original time sequence of the reference sample to obtain a time sequence decomposition result;
determining a similarity evaluation index set according to the time sequence decomposition result; and
obtaining the similarity between the sample to be evaluated and the reference sample according to the similarity evaluation index set.
3. The method for determining a target sample of an industrial equipment symptom according to claim 2, wherein performing time sequence decomposition on the original time sequence of the reference sample to obtain the time sequence decomposition result comprises:
performing time sequence decomposition on the original time sequence of the reference sample to obtain a trend term, a period term and a residual term; and
determining the similarity evaluation index set according to the trend term, the period term and the residual term of the reference sample.
4. The method for determining a target sample of an industrial equipment symptom according to claim 3, wherein obtaining the similarity between the sample to be evaluated and the reference sample according to the similarity evaluation index set comprises:
acquiring a similarity matrix of the sample to be evaluated and the reference sample according to the time sequence decomposition result by using a preset time sequence similarity algorithm.
5. The method for determining a target sample of an industrial equipment symptom according to claim 3, wherein determining the similarity evaluation index set according to the trend term, the period term and the residual term of the reference sample comprises:
if the sum of the energy ratios of the period term and the residual term is greater than a first threshold, determining to use an oscillation-type similarity evaluation index; otherwise, performing linear regression on the trend term to obtain a linear regression model, and, if the hypothesis-test p-value of the linear regression model is smaller than a second threshold, determining to use a trend-type similarity evaluation index; otherwise, determining to use a preset shape similarity evaluation index.
6. The method for determining a target sample of an industrial equipment symptom according to claim 5, wherein the energy ratios of the period term and the residual term are obtained as follows:
acquiring the overall variance of the original time sequence of the sample to be evaluated, the variance of the period term and the variance of the residual term;
period term energy ratio = variance of the period term / overall variance;
residual term energy ratio = variance of the residual term / overall variance.
7. The method for determining a target sample of an industrial equipment symptom according to claim 1, wherein evaluating the sample to be evaluated of the symptom according to the similarity evaluation index to obtain a target sample set comprises:
if the maximum value or the mean value of the similarity evaluation index is greater than a third threshold, filtering out the sample to be evaluated corresponding to that similarity evaluation index, and determining the remaining samples to be evaluated as a preliminarily screened target sample set.
8. The method for determining a target sample of an industrial equipment symptom according to claim 7, further comprising:
acquiring a plurality of evaluation indexes of the sample to be evaluated, evaluating the inconsistency of the plurality of evaluation indexes by a similarity ranking variance method, and retaining the samples to be evaluated whose evaluation index variance is greater than a fourth threshold to obtain the target sample set.
9. The method for determining a target sample of an industrial equipment symptom according to claim 1, further comprising:
clustering the samples in the target sample set to obtain a clustering result.
10. A system for determining a target sample of an industrial equipment symptom, comprising:
an acquisition module, configured to acquire the similarity between a sample to be evaluated of a symptom and a reference sample of the symptom; and
a processing module, configured to evaluate the sample to be evaluated of the symptom according to the similarity to obtain a target sample set, wherein the target sample set comprises target samples of the symptom whose degree of confusion with the sample to be evaluated of the symptom is greater than a preset value.
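As an illustration only, the index selection rule of claims 5 and 6 might be sketched as follows. The sketch assumes the trend, period and residual terms have already been produced by some decomposition step, which the claims do not name. The function names, the default thresholds of 0.5 and 0.05, and the normal approximation used for the regression p-value are all assumptions made for this example and are not taken from the patent.

```python
from math import sqrt
from statistics import NormalDist, pvariance


def energy_ratios(original, period, residual):
    """Claim 6: variance of each term divided by the overall variance."""
    overall = pvariance(original)
    if overall == 0:
        raise ValueError("overall variance is zero (constant series)")
    return pvariance(period) / overall, pvariance(residual) / overall


def trend_p_value(trend):
    """Two-sided p-value for a zero slope when regressing the trend on time.

    Assumption: the exact t distribution is replaced by a normal
    approximation, which is only reasonable for long series.
    """
    n = len(trend)
    if n < 3:
        raise ValueError("need at least three points")
    t_mean = (n - 1) / 2
    y_mean = sum(trend) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(trend))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    sse = sum((y - (intercept + slope * t)) ** 2 for t, y in enumerate(trend))
    if sse == 0:
        # Perfect fit: a non-zero slope is maximally significant.
        return 0.0 if slope != 0 else 1.0
    std_err = sqrt(sse / (n - 2) / sxx)
    z = abs(slope) / std_err
    return 2 * (1 - NormalDist().cdf(z))


def choose_index_type(original, trend, period, residual,
                      first_threshold=0.5, second_threshold=0.05):
    """Claim 5: pick the family of similarity evaluation indexes."""
    period_ratio, residual_ratio = energy_ratios(original, period, residual)
    if period_ratio + residual_ratio > first_threshold:
        return "oscillation"
    if trend_p_value(trend) < second_threshold:
        return "trend"
    return "shape"
```

Under these assumptions, a series dominated by its period term selects the oscillation-type index, a series with a significant linear trend selects the trend-type index, and anything else falls back to the shape index.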
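The two screening stages of claims 7 and 8 can be sketched in a similar hedged way. The claims do not spell out the "similarity ranking variance method", so this example assumes one plausible reading: each sample is ranked under every similarity index, and the variance of its ranks across indexes measures how inconsistently the indexes treat it. The data layout, function names and the index names `dtw` and `corr` are hypothetical.

```python
from statistics import mean, pvariance


def preliminary_screen(similarities, third_threshold, use_max=True):
    """Claim 7: filter out samples whose max (or mean) similarity is too high.

    `similarities` maps a sample id to its similarity values against the
    reference samples; the ids of the remaining samples are returned.
    """
    aggregate = max if use_max else mean
    return [sid for sid, values in similarities.items()
            if aggregate(values) <= third_threshold]


def rank_variance(scores_by_index):
    """Variance of each sample's rank across several similarity indexes.

    `scores_by_index` maps an index name to {sample id: score}.
    Assumption: rank 0 is the most similar sample; ties keep input order.
    """
    ranks = {}
    for scores in scores_by_index.values():
        ordered = sorted(scores, key=scores.get, reverse=True)
        for position, sid in enumerate(ordered):
            ranks.setdefault(sid, []).append(position)
    return {sid: pvariance(r) for sid, r in ranks.items()}


def inconsistent_samples(scores_by_index, fourth_threshold):
    """Claim 8: retain samples whose rank variance exceeds the threshold."""
    variances = rank_variance(scores_by_index)
    return [sid for sid, v in variances.items() if v > fourth_threshold]
```

For example, a sample ranked first by one index and last by another has a high rank variance and would be retained as a potentially confusable target sample, while a sample ranked identically by every index would be dropped.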
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110194751.1A CN112559602B (en) | 2021-02-21 | 2021-02-21 | Method and system for determining target sample of industrial equipment symptom |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112559602A | 2021-03-26 |
CN112559602B | 2021-07-13 |
Family
ID=75034395
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110194751.1A Active CN112559602B (en) | 2021-02-21 | 2021-02-21 | Method and system for determining target sample of industrial equipment symptom |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112559602B (en) |
Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101135601A (en) * | 2007-10-18 | 2008-03-05 | 北京英华达电力电子工程科技有限公司 | Rotating machinery vibrating failure diagnosis device and method |
US20130338965A1 (en) * | 2012-06-14 | 2013-12-19 | International Business Machines Corporation | Anomaly Detection Method, Program, and System |
CN104572985A (en) * | 2015-01-04 | 2015-04-29 | 大连理工大学 | Industrial data sample screening method based on complex network community discovery |
CN107194430A (en) * | 2017-05-27 | 2017-09-22 | 北京三快在线科技有限公司 | A kind of screening sample method and device, electronic equipment |
CN108197638A (en) * | 2017-12-12 | 2018-06-22 | 阿里巴巴集团控股有限公司 | The method and device classified to sample to be assessed |
CN109508558A (en) * | 2018-10-31 | 2019-03-22 | 阿里巴巴集团控股有限公司 | A kind of verification method and device of data validity |
US20190213447A1 (en) * | 2017-02-08 | 2019-07-11 | Nanjing University Of Aeronautics And Astronautics | Sample selection method and apparatus and server |
CN110135492A (en) * | 2019-05-13 | 2019-08-16 | 山东大学 | Equipment fault diagnosis and method for detecting abnormality and system based on more Gauss models |
CN111324637A (en) * | 2020-02-05 | 2020-06-23 | 北京工业大数据创新中心有限公司 | Fault symptom searching method and system for industrial time sequence data |
CN111340144A (en) * | 2020-05-15 | 2020-06-26 | 支付宝(杭州)信息技术有限公司 | Risk sample detection method and device, electronic equipment and storage medium |
CN111897695A (en) * | 2020-07-31 | 2020-11-06 | 平安科技(深圳)有限公司 | Method and device for acquiring KPI abnormal data sample and computer equipment |
CN111931872A (en) * | 2020-09-27 | 2020-11-13 | 北京工业大数据创新中心有限公司 | Method and device for determining abnormity of trend symptom |
CN112200273A (en) * | 2020-12-07 | 2021-01-08 | 长沙海信智能系统研究院有限公司 | Data annotation method, device, equipment and computer storage medium |
CN112257423A (en) * | 2020-10-21 | 2021-01-22 | 北京工业大数据创新中心有限公司 | Equipment symptom information acquisition method and device and equipment operation and maintenance system |
CN112270379A (en) * | 2020-11-13 | 2021-01-26 | 北京百度网讯科技有限公司 | Training method of classification model, sample classification method, device and equipment |
CN112381185A (en) * | 2021-01-15 | 2021-02-19 | 北京工业大数据创新中心有限公司 | Industrial equipment characteristic curve similarity obtaining method and device |
Non-Patent Citations (1)
Title |
---|
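Chang Tianqing et al.: "Symptom misjudgment and confusion problems in test algorithms and their solutions", Computer Engineering and Applications * |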
Also Published As
Publication number | Publication date |
---|---|
CN112559602B (en) | 2021-07-13 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||