CN111324637B

CN111324637B - Fault symptom searching method and system for industrial time sequence data

Info

Publication number: CN111324637B
Application number: CN202010080597.0A
Authority: CN
Inventors: 李闯; 田春华; 刘家扬; 王吉东; 徐地; 曾庆勇; 胡志勇
Original assignee: Beijing Innovation Center For Industrial Big Data Co ltd
Current assignee: Beijing Innovation Center For Industrial Big Data Co ltd
Priority date: 2020-02-05
Filing date: 2020-02-05
Publication date: 2021-01-15
Anticipated expiration: 2040-02-05
Also published as: CN111324637A

Abstract

The invention provides a fault symptom searching method and system for industrial time sequence data. The method comprises the following steps: acquiring symptom description of industrial time sequence data; carrying out normalization processing on the symptom description to obtain a processing result; determining the type of the fault symptom according to the processing result; acquiring the similarity between a fault symptom of a certain type and a fault symptom in a symptom library; and outputting the fault symptoms of which the similarity is greater than a preset value. The scheme of the invention can more accurately and efficiently determine the fault symptoms of the industrial time sequence data.

Description

Fault symptom searching method and system for industrial time sequence data

Technical Field

The invention relates to the technical field of processing of industrial time sequence data, in particular to a fault symptom searching method and system of industrial time sequence data.

Background

In large-scale industry, equipment health monitoring relies on common fault symptoms. For example, the blade failure modes of a wind turbine include a dominant energy component at 2 Hz in the rotation speed, a 1P/3P acceleration anomaly, an acceleration amplitude anomaly, and a pitch-angle following anomaly. As another example, the abnormal patterns of the pitch rate include: spike, asynchrony, high-frequency oscillation, low-frequency oscillation, small tooth, single-cycle jump, and acceleration overrun without pitch change.

Fault symptoms are the basic information for process monitoring and fault diagnosis: from known symptoms one can determine whether an equipment fault has occurred, its type and location, and even its severity.

Figs. 1 to 6 show some examples of symptoms, such as the curve in fig. 1, the peaks in figs. 2 and 3, the graph in fig. 4, the curve in fig. 5, and the point cluster distribution in fig. 6.

As shown in FIG. 7, the people most familiar with fault symptoms are the operation and maintenance personnel and field experts for the relevant equipment. A symptom search system can help them quickly find, in an existing symptom library, a candidate set of symptom operators suited to the business problem at hand; after manually selecting and confirming symptoms from that set, they can build a fault detection/diagnosis model.

In the prior art, picture retrieval systems rely mainly on the texture and hierarchical structure of a picture. Symptoms, however, are represented by curves, closed polygons and point clusters, which have no clear hierarchical structure, so picture search cannot handle symptom retrieval.

Target shape retrieval is mainly based on recognizing the outer contour of a two-dimensional closed region, whereas many symptoms are one-dimensional curves; in addition, symptom matching is mainly concerned with the overall form of the curve rather than strict matching.

Disclosure of Invention

The invention provides a fault symptom searching method and system for industrial time sequence data, which can more accurately and efficiently determine the fault symptom of the industrial time sequence data.

In order to solve the above technical problem, an embodiment of the present invention provides a method for searching fault symptoms of industrial time series data, including:

acquiring symptom description of industrial time sequence data;

carrying out normalization processing on the symptom description to obtain a processing result;

determining the type of the fault symptom according to the processing result;

acquiring the similarity between a fault symptom of a certain type and a fault symptom in a symptom library;

and outputting the fault symptoms of which the similarity is greater than a preset value.

Optionally, obtaining a symptom description of the industrial time series data includes:

and acquiring a symptom description of the industrial time series data in a time series form, a distribution form or a shape form.

Optionally, the normalization processing is performed on the symptom description to obtain a processing result, and the processing result includes:

according to a preset strategy, carrying out normalization processing on the symptom descriptions in a time sequence form, a distribution form or a shape form to obtain a processing result; the preset strategy includes at least one of: a score normalization method, a Min-Max strategy, and a non-linear transformation.

Optionally, the normalization processing is performed on the symptom description in the time sequence form, and includes:

and performing amplitude normalization processing and time normalization processing on the input sequence corresponding to the symptom description in the time sequence form.

Optionally, the normalization processing is performed on the symptom description in the form of a shape, and includes:

and performing scale normalization processing and rotation normalization processing on the symptom description in the form of the shape.

Optionally, determining the type of the fault symptom according to the processing result includes:

placing the time sequence curve or graph corresponding to the symptom description in a coordinate system, and sorting the points of the time sequence curve, or of the boundary curve of the graph, by their x-axis values from large to small into points 1 to N;

for each point i, taking i from 1 to N in order, acquiring all points in a local window on the x axis centered on point i;

calculating the difference between the points and the point i on the y axis;

according to the difference, carrying out numerical clustering on the symptom description;

if the number of classes is 1, then L = L + 1;

if the number of classes is 2 to 3, then S = S + 1;

if the number of classes is greater than 3, then P = P + 1;

if P is the maximum, determining the fault symptom as a point cluster;

if S is the maximum, determining the fault symptom as a polygon;

and if L is the maximum, determining the fault symptom as a curve.

Optionally, obtaining a similarity between a certain type of fault symptom and a fault symptom in the symptom bank includes:

if the fault symptom is in a curve type, arranging a curve corresponding to the fault symptom and a curve corresponding to the fault symptom in a symptom library from small to large according to an X axis to obtain two arrays;

and aligning the lengths of the two arrays, and obtaining the similarity between the fault symptom and the fault symptom in the symptom library by using a Euclidean distance evaluation method, a Dynamic Time Warping (DTW) distance evaluation method, or a method for calculating the optimal phase difference or rotation degree.

if the fault symptom is a polygon, the polygon corresponding to the fault symptom and the polygon corresponding to the fault symptom in the symptom library are expanded into an array according to the clockwise direction or the anticlockwise direction by taking the mass center as a reference point;

if the fault symptom is a point cluster, building a Gaussian mixture model (GMM) for each symptom in the symptom library that is a point cluster;

for the input fault symptom, calculating the likelihood of each GMM model on the input data;

and according to the likelihood, obtaining the similarity between the fault symptom and the fault symptom in the symptom bank.

The embodiment of the invention also provides a fault symptom searching system of industrial time series data, which comprises:

the input module is used for acquiring symptom description of industrial time sequence data;

the processing module is used for carrying out normalization processing on the symptom description to obtain a processing result;

the determining module is used for determining the type of the fault symptom according to the processing result;

the acquisition module is used for acquiring the similarity between the fault symptom of a certain type and the fault symptom in the symptom library;

and the output module is used for outputting the fault symptoms of which the similarity is greater than a preset value.

Embodiments of the present invention also provide a processor-readable storage medium having stored thereon processor-executable instructions for causing a processor to perform the method as described above.

The technical scheme of the invention has the beneficial effects that:

according to the embodiment of the invention, the symptom description of the industrial time sequence data is acquired; carrying out normalization processing on the symptom description to obtain a processing result; determining the type of the fault symptom according to the processing result; acquiring the similarity between a fault symptom of a certain type and a fault symptom in a symptom library; and outputting the fault symptoms of which the similarity is greater than a preset value. The fault symptoms of the industrial time series data can be determined more accurately and efficiently.

Drawings

Fig. 1 to 6 show examples of symptoms;

FIG. 7 illustrates a prior art fault symptom search flow;

FIG. 8 is a flow chart illustrating a method for searching for signs of failure of industrial time series data according to an embodiment of the present invention;

FIG. 9 is a block diagram of a fault symptom search system according to an embodiment of the present invention;

FIG. 10 is a schematic diagram illustrating scale normalization and rotation normalization of an embodiment of the present invention;

FIG. 11 is a flow chart illustrating an implementation of an embodiment of the present invention for determining the type of symptom of the fault;

FIGS. 12-14 are diagrams illustrating numerical clustering of symptom descriptions according to embodiments of the present invention;

FIG. 15 is a schematic diagram of the preprocessed curves corresponding to a symptom description in the time sequence form in an embodiment of the present invention;

FIG. 16 is a schematic diagram of the graph of FIG. 15 processed by a distance algorithm to obtain an accurate graph.

Fig. 17 is a block diagram showing a fault symptom search system for industrial time series data according to an embodiment of the present invention.

Detailed Description

In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.

As shown in fig. 8, an embodiment of the present invention provides a method for searching fault symptoms of industrial time series data, including:

step 81, acquiring symptom description of industrial time sequence data;

step 82, carrying out normalization processing on the symptom description to obtain a processing result;

step 83, determining the type of the fault symptom according to the processing result;

step 84, obtaining the similarity between the fault symptom of a certain type and the fault symptom in the symptom bank;

and step 85, outputting the fault symptoms with the similarity larger than a preset value.

In the above embodiments of the present invention, the symptom description may be a time sequence description, a distribution description, and/or a shape description, and it is input into the symptom search engine as a curve or a shape. As shown in fig. 9, the input symptom descriptions are compared with the time sequences, distributions and/or shapes in the symptom operator library, respectively, and the fault symptoms with higher similarity are output; optionally, 5 fault symptoms whose similarity is greater than a preset value are output.

In an alternative embodiment of the present invention, the step 81 may include: a symptom description in the form of a time series, distribution, or shape of the industrial time series data is obtained.

In an alternative embodiment of the present invention, the step 82 may include:

according to a preset strategy, carrying out normalization processing on the symptom descriptions in a time sequence form, a distribution form or a shape form to obtain a processing result; the preset strategy includes: a score normalization method, a Min-Max strategy, and a non-linear transformation (such as Log or arctan). Here, a symptom description in the time sequence form may correspond to a curve, such as the curves shown in figs. 1, 2, 3, and 5; a symptom description in the distribution form may correspond to a point cluster, as shown in fig. 6; and a symptom description in the shape form may correspond to a polygon, as shown in fig. 4.

In an optional embodiment of the present invention, the normalizing the symptom description in the time series form includes:

and carrying out amplitude normalization and time normalization processing on the input sequence corresponding to the symptom description in the time sequence form. Specifically, the amplitude value normalization is performed on the input sequence of the curve corresponding to the symptom description in the time series form, and the time normalization processing is performed according to an interpolation algorithm (such as spline interpolation and loess interpolation).

Here, the amplitude normalization method may be a score normalization method, a Min-Max strategy, or a non-linear transformation method. Taking the Min-Max strategy as an example, let y_i be the amplitude of the i-th point; the normalized amplitude y'_i is then:

y'_i = (y_i - y_min) / (y_max - y_min)

where y_min is the minimum amplitude of the input sequence and y_max is the maximum amplitude of the input sequence.
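
The Min-Max formula above can be sketched in a few lines of Python. This is only an illustration: the function name `min_max_normalize` and the handling of a constant sequence (mapped to zeros) are assumptions, not part of the described method.

```python
def min_max_normalize(amplitudes):
    """Rescale amplitudes to [0, 1] using y'_i = (y_i - y_min) / (y_max - y_min)."""
    y_min = min(amplitudes)
    y_max = max(amplitudes)
    span = y_max - y_min
    if span == 0:
        # Assumption: a constant sequence maps to all zeros to avoid dividing by zero.
        return [0.0 for _ in amplitudes]
    return [(y - y_min) / span for y in amplitudes]


print(min_max_normalize([2.0, 4.0, 6.0, 10.0]))  # [0.0, 0.25, 0.5, 1.0]
```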

The time normalization method aims to make the length of the input sequence equal to that of a sequence in the symptom library. Specifically, the x values of the input sequence are linearly transformed so that they align with the x values of the sequence in the symptom library (namely, the x_min and x_max of the two sequences are equal), and an interpolation algorithm is then used to calculate the amplitudes of the input sequence, between x_min and x_max, at the x values of the symptom library sequence.
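
As a rough sketch of this time normalization, the following Python function aligns the x range of an input sequence with that of a library sequence and resamples it on the library's x grid. Plain linear interpolation is used here as a simple stand-in for the spline or loess interpolation mentioned above, and the name `time_normalize` and its arguments are hypothetical. Both x arrays are assumed to be strictly increasing.

```python
def time_normalize(x_in, y_in, x_lib):
    """Resample (x_in, y_in) onto the x grid of a library sequence.

    Assumes x_in and x_lib are strictly increasing with at least two points.
    """
    lo_in, hi_in = x_in[0], x_in[-1]
    lo_lib, hi_lib = x_lib[0], x_lib[-1]
    scale = (hi_lib - lo_lib) / (hi_in - lo_in)
    # Linear transform so both sequences share the same x_min and x_max.
    x_aligned = [lo_lib + (x - lo_in) * scale for x in x_in]

    resampled = []
    j = 0
    for x in x_lib:
        # Advance to the segment [x_aligned[j], x_aligned[j + 1]] that contains x.
        while j < len(x_aligned) - 2 and x > x_aligned[j + 1]:
            j += 1
        x0, x1 = x_aligned[j], x_aligned[j + 1]
        t = (x - x0) / (x1 - x0)
        # Linear interpolation (stand-in for spline or loess interpolation).
        resampled.append(y_in[j] + t * (y_in[j + 1] - y_in[j]))
    return resampled


# A 3-point input is stretched onto a 5-point library grid.
print(time_normalize([0, 10, 20], [0.0, 1.0, 0.0], [0, 1, 2, 3, 4]))
```

After this step the input array and the library array have the same length, which is what the point-to-point distance measures described later require.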

In an optional embodiment of the present invention, normalizing the symptom description in the shape form includes performing scale normalization processing and rotation normalization processing on it.

As shown in fig. 10, the scaling normalization here is to scale the longest edge d of the input polygon uniformly to a constant L, where L is determined by the length of the longest edge of the polygon in the symptom library.

The rotation normalization applies a rotation transformation to the coordinate system of the input polygon by an angle alpha, where alpha is chosen so that the longest edge of the input polygon becomes parallel to the x axis of the coordinate system.
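
A minimal Python sketch of the scale and rotation normalization is given below. It assumes the polygon is supplied as a list of (x, y) vertices in boundary order and that the rotation is taken about the coordinate origin; the name `normalize_polygon` and the parameter `target_length` (the constant L) are hypothetical.

```python
import math


def normalize_polygon(vertices, target_length):
    """Scale the longest edge to target_length, then rotate it parallel to the x axis."""
    n = len(vertices)
    edges = [(vertices[i], vertices[(i + 1) % n]) for i in range(n)]
    # Longest edge d of the input polygon.
    (x0, y0), (x1, y1) = max(edges, key=lambda e: math.dist(e[0], e[1]))
    d = math.hypot(x1 - x0, y1 - y0)
    scale = target_length / d
    # Angle alpha of the longest edge; rotating by -alpha makes it parallel to the x axis.
    alpha = math.atan2(y1 - y0, x1 - x0)
    cos_a, sin_a = math.cos(-alpha), math.sin(-alpha)
    result = []
    for x, y in vertices:
        xs, ys = x * scale, y * scale
        result.append((xs * cos_a - ys * sin_a, xs * sin_a + ys * cos_a))
    return result


# A 3-4-5 right triangle: the hypotenuse becomes a horizontal edge of length 10.
print(normalize_polygon([(0.0, 0.0), (0.0, 4.0), (3.0, 0.0)], 10.0))
```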

In an optional embodiment of the present invention, the symptom description in the distribution form is normalized in the same way as the time sequence amplitude, using, for example, a score normalization method, a Min-Max strategy, Z-score normalization, or a non-linear transformation.
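
For illustration, two of the listed strategies can be sketched as follows: Z-score normalization and an arctan-based non-linear transformation. The function names are hypothetical, the population standard deviation is an assumed choice, and the 2/pi scaling of the arctan output is one common convention rather than something the text specifies.

```python
import math


def z_score_normalize(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        # Assumption: a constant input maps to all zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def arctan_normalize(values):
    """Non-linear squashing into (-1, 1) via arctan(v) * 2 / pi."""
    return [math.atan(v) * 2.0 / math.pi for v in values]


print(z_score_normalize([1.0, 2.0, 3.0]))
print(arctan_normalize([0.0, 1.0, 100.0]))
```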

As shown in fig. 11, in an alternative embodiment of the present invention, the step 83 may include:

step 101, placing the time sequence curve or graph corresponding to the symptom description in a coordinate system, and sorting the points of the time sequence curve, or of the boundary curve of the graph, by their x-axis values from large to small into points 1 to N;

step 102, acquiring all points in a local window of an x axis by taking a point i as a center according to the sequence from 1 to N;

step 103, calculating the difference between the points and the point i on the y axis; here, i is a point in the window which is located at the center, and the difference value between the amplitude value y of i and the amplitude value y of other points is the difference;

step 104, carrying out numerical clustering on the symptom description according to the difference. The clustering algorithm here may be a fuzzy C-means clustering algorithm, and the clustering results are shown in figs. 12 to 14: for the curve-type symptom in fig. 12, the points in a local window around point i form only 1 class; for the polygon-type symptom in fig. 13, they form 2 to 3 classes; and for the point-cluster-type symptom in fig. 14, they form more than 3 classes. The type of the input symptom can therefore be distinguished by the number of classes;

step 105, if the number of classes is 1, then L = L + 1;

if the number of classes is 2 to 3, then S = S + 1;

if the number of classes is greater than 3, then P = P + 1;

step 106, if P is the maximum, determining the fault symptom as a point cluster; if S is the maximum, determining the fault symptom as a polygon; and if L is the maximum, determining the fault symptom as a curve.
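
Steps 101 to 106 can be sketched in Python as follows. To keep the example short, the fuzzy C-means clustering is replaced by a much simpler one-dimensional gap rule (a new class starts wherever two sorted neighbouring differences are more than `gap` apart); this substitution, the function names, and the parameters `half_window` and `gap` are all assumptions made for illustration. The points are sorted in ascending x order, which does not change the class counts in this sketch.

```python
def count_classes(values, gap):
    """Count 1-D classes: a new class starts where sorted neighbours differ by more than gap."""
    ordered = sorted(values)
    return 1 + sum(1 for a, b in zip(ordered, ordered[1:]) if b - a > gap)


def classify_symptom(points, half_window, gap):
    """Return 'curve', 'polygon' or 'point cluster' for a list of (x, y) points."""
    points = sorted(points)  # order the points along the x axis
    counts = {"curve": 0, "polygon": 0, "point cluster": 0}  # counters L, S, P
    for xi, yi in points:
        # y differences between point i and every point in its local x window.
        diffs = [y - yi for x, y in points if abs(x - xi) <= half_window]
        k = count_classes(diffs, gap)
        if k == 1:
            counts["curve"] += 1          # L = L + 1
        elif k <= 3:
            counts["polygon"] += 1        # S = S + 1
        else:
            counts["point cluster"] += 1  # P = P + 1
    return max(counts, key=counts.get)


line = [(i, i) for i in range(20)]
box = [(i, 0) for i in range(20)] + [(i, 10) for i in range(20)]
print(classify_symptom(line, half_window=2, gap=1.5))
print(classify_symptom(box, half_window=2, gap=1.5))
```

In the `box` example every local window sees an upper and a lower boundary, so each point contributes to the polygon counter, whereas the straight line yields a single class per window.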

In an alternative embodiment of the present invention, the step 84 may include:

step 841, if the fault symptom is a curve type, arranging the curve corresponding to the fault symptom and the curve corresponding to the fault symptom in the symptom library from small to large according to the X axis to obtain two arrays; fig. 15 shows the preprocessed curves, where each connecting line between arrays 1 and 2 represents the distance between the corresponding points;

step 842, aligning the lengths of the two arrays, and obtaining the similarity between the fault symptom and the fault symptom in the symptom library by using a Euclidean distance evaluation method, a Dynamic Time Warping (DTW) distance evaluation method, or a method for calculating the optimal phase difference or rotation degree. Fig. 16 shows the accurate curve obtained by applying the distance evaluation method. The smaller the distance between the two arrays (i.e., the average distance over the point pairs), the more similar they are.
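
The two distance measures named in step 842 can be sketched as below. The Euclidean variant here returns the average point-pair distance, following the remark above; the DTW variant is the textbook dynamic-programming formulation with an absolute-difference local cost. Function names are hypothetical, and this is a sketch rather than the exact evaluation used by the method.

```python
def mean_pointwise_distance(a, b):
    """Average absolute difference between two equal-length arrays."""
    if len(a) != len(b):
        raise ValueError("arrays must be length-aligned first")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def dtw_distance(a, b):
    """Classic dynamic time warping distance with |a_i - b_j| as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


query = [0, 1, 2, 1, 0]
print(mean_pointwise_distance(query, [0, 1, 3, 1, 0]))
print(dtw_distance(query, [0, 0, 1, 2, 1, 0]))
```

DTW tolerates local stretching along the x axis, so the second call compares the query with a copy whose first sample is repeated and still finds a perfect alignment.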

In an alternative embodiment of the present invention, the step 84 may include:

step 843, if the fault symptom is a polygon, expanding the polygon corresponding to the fault symptom and the polygon corresponding to the fault symptom in the symptom library into arrays in the clockwise or counterclockwise direction, taking the centroid as a reference point; here, the polygon is a closed curve in a two-dimensional coordinate system, and it is expanded into an array in the clockwise direction starting from the point of the polygon closest to the origin of coordinates;

step 844, aligning the lengths of the two arrays, and obtaining the similarity between the fault symptom and the fault symptom in the symptom library by using a Euclidean distance evaluation method, a Dynamic Time Warping (DTW) distance evaluation method, or a method for calculating the optimal phase difference or rotation degree.
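
One possible reading of the polygon expansion in step 843 is sketched below: the vertices are ordered clockwise around the centroid, the sequence starts at the vertex closest to the coordinate origin, and each vertex is represented by its distance to the centroid. The use of the vertex average as the centroid, the centroid-distance representation, and the function names are assumptions; the sketch also assumes the polygon is star-shaped with respect to its centroid, so that sorting by angle follows the boundary.

```python
import math


def clockwise_order(vertices):
    """Order vertices clockwise around their centroid, starting nearest the origin."""
    cx = sum(x for x, _ in vertices) / len(vertices)
    cy = sum(y for _, y in vertices) / len(vertices)
    # Clockwise order corresponds to decreasing polar angle around the centroid.
    ordered = sorted(
        vertices,
        key=lambda p: math.atan2(p[1] - cy, p[0] - cx),
        reverse=True,
    )
    start = min(range(len(ordered)), key=lambda i: math.hypot(*ordered[i]))
    return ordered[start:] + ordered[:start]


def unroll_polygon(vertices):
    """Expand a polygon into a 1-D array of centroid distances."""
    cx = sum(x for x, _ in vertices) / len(vertices)
    cy = sum(y for _, y in vertices) / len(vertices)
    return [math.hypot(x - cx, y - cy) for x, y in clockwise_order(vertices)]


square = [(3, 3), (1, 1), (3, 1), (1, 3)]
print(clockwise_order(square))
print(unroll_polygon(square))
```

The resulting arrays can then be length-aligned and compared with the same distance measures used for curves.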

In an alternative embodiment of the present invention, the step 84 may include:

step 845, if the fault symptom is a point cluster, building a Gaussian mixture model (GMM) for each symptom in the symptom library that is a point cluster;

step 846, calculating, for the input fault symptom, the likelihood of each GMM on the input data. Here, a corresponding GMM needs to be established for each point cluster symptom in the symptom library; an input point cluster with n points is substituted into the GMM to obtain n likelihood values, and their average is taken as the similarity between the input point cluster and the GMM. The greater the likelihood value, the more similar the input point cluster is to that symptom.

Step 847, according to the likelihood, obtaining the similarity between the fault symptom and the fault symptom in the symptom bank.
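
The likelihood averaging of steps 845 to 847 can be sketched as follows, assuming the GMM for a library symptom has already been fitted. To stay self-contained, the sketch uses axis-aligned (diagonal-covariance) two-dimensional Gaussians and represents each component as a hypothetical `(weight, mean, variance)` tuple; a real system would more likely fit and evaluate the mixture with a statistics library.

```python
import math


def gaussian_pdf_2d(point, mean, var):
    """Density of an axis-aligned 2-D Gaussian with per-axis variances var."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    norm = 2.0 * math.pi * math.sqrt(var[0] * var[1])
    return math.exp(-0.5 * (dx * dx / var[0] + dy * dy / var[1])) / norm


def gmm_similarity(points, components):
    """Average GMM likelihood over the n input points.

    components: list of (weight, mean, var) tuples whose weights sum to 1.
    """
    total = 0.0
    for p in points:
        total += sum(w * gaussian_pdf_2d(p, mean, var) for w, mean, var in components)
    return total / len(points)


# Hypothetical library model with two components.
library_gmm = [
    (0.6, (0.0, 0.0), (1.0, 1.0)),
    (0.4, (5.0, 5.0), (0.5, 0.5)),
]
near = gmm_similarity([(0.1, -0.2), (4.9, 5.1)], library_gmm)
far = gmm_similarity([(10.0, -10.0), (12.0, -8.0)], library_gmm)
print(near > far)
```

Computing this value against the GMM of every point cluster symptom in the library gives one similarity per library symptom, which can then be compared with the preset value.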

In an alternative embodiment of the present invention, the step 85 may include:

distribution search for fault symptoms: outputting, by a data distribution test method, the distribution symptoms and distribution parameters that have high similarity to the distribution of the input sequence;

shape description search for fault symptoms: comparing a polygon freely drawn by the user or imported as text data with the polygon fault symptoms in the fault symptom library, and outputting the fault symptoms with high similarity;

for the time sequence description search of the fault symptom, 5 symptoms with the highest similarity with the input sequence are returned through a time sequence similarity calculation method.
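
The output step can be illustrated with a short Python sketch that keeps the library symptoms whose similarity exceeds the preset value and returns at most five of them, best first. The function name, the dictionary of similarities and the symptom names are hypothetical.

```python
def select_matches(similarities, threshold, limit=5):
    """Return up to `limit` (name, similarity) pairs above threshold, best first."""
    hits = [(name, s) for name, s in similarities.items() if s > threshold]
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits[:limit]


# Hypothetical similarity scores between an input symptom and library symptoms.
scores = {"spike": 0.91, "oscillation": 0.40, "jump": 0.75}
print(select_matches(scores, threshold=0.5))  # [('spike', 0.91), ('jump', 0.75)]
```

When the comparison yields a distance rather than a similarity, the distance would first have to be converted so that larger values mean closer matches; how that conversion is done is not specified here.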

As shown in fig. 17, an embodiment of the present invention further provides a system 170 for searching fault symptoms of industrial time series data, including:

an input module 171 for obtaining a symptom description of the industrial time series data;

the processing module 172 is configured to perform normalization processing on the symptom description to obtain a processing result;

a determining module 173, configured to determine the type of the fault symptom according to the processing result;

an obtaining module 174, configured to obtain a similarity between a certain type of fault symptom and a fault symptom in the symptom bank;

and an output module 175, configured to output the fault symptoms whose similarity is greater than a preset value.

Optionally, the input module 171 is specifically configured to obtain a symptom description in a time-series form, a distribution form, or a shape form of the industrial time-series data.

Optionally, the processing module 172 is specifically configured to perform normalization processing on the symptom descriptions in the time sequence form, the distribution form, or the shape form according to a preset strategy to obtain a processing result; the preset strategy includes at least one of: a score normalization method, a Min-Max strategy, and a non-linear transformation.

Optionally, the determining module 173 is specifically configured to place the time sequence curve or graph corresponding to the symptom description in a coordinate system, and sort the points of the time sequence curve, or of the boundary curve of the graph, by their x-axis values from large to small into points 1 to N;

calculating the difference between the points and the point i on the y axis;

if the number of classes is 1, then L = L + 1;

if the number of classes is 2 to 3, then S = S + 1;

if the number of classes is greater than 3, then P = P + 1;

if P is the maximum, determining the fault symptom as a point cluster;

if S is the maximum, determining the fault symptom as a polygon;

Optionally, the obtaining module 174 is specifically configured to, if the fault symptom is a curve type, arrange a curve corresponding to the fault symptom and a curve corresponding to the fault symptom in the symptom library from small to large according to an X axis to obtain two arrays;

Optionally, the obtaining module 174 is specifically configured to, if the fault symptom is a polygon, expand the polygon corresponding to the fault symptom and the polygon corresponding to the fault symptom in the symptom library into an array according to a clockwise or counterclockwise direction with the centroid as a reference point;

Optionally, the obtaining module 174 is specifically configured to, if the fault symptom is a point cluster, build a Gaussian mixture model (GMM) for each symptom in the symptom library that is a point cluster;

In a specific implementation, the system may be as shown in fig. 9. The fault symptom searching system includes a symptom description module, through which a user inputs a query into the search system in the form of text, a drawing or data. The system uses the description input by the user to search for relevant symptoms in the symptom operator library and outputs the query result, namely the n operators with the highest similarity to the input (n is 5 by default).

It should be noted that the system is a system corresponding to the above method, and all implementation schemes in the above method embodiments are applicable to the embodiment of the system, and the same technical effect can be achieved.

Furthermore, it is to be noted that in the device and method of the invention, it is obvious that the individual components or steps can be decomposed and/or recombined. These decompositions and/or recombinations are to be regarded as equivalents of the present invention. Also, the steps of performing the series of processes described above may naturally be performed chronologically in the order described, but need not necessarily be performed chronologically, and some steps may be performed in parallel or independently of each other. It will be understood by those skilled in the art that all or any of the steps or elements of the method and apparatus of the present invention may be implemented in any computing device (including processors, storage media, etc.) or network of computing devices, in hardware, firmware, software, or any combination thereof, which can be implemented by those skilled in the art using their basic programming skills after reading the description of the present invention.

Thus, the objects of the invention may also be achieved by running a program or a set of programs on any computing device. The computing device may be a general purpose device as is well known. The object of the invention is thus also achieved solely by providing a program product comprising program code for implementing the method or the apparatus. That is, such a program product also constitutes the present invention, and a storage medium storing such a program product also constitutes the present invention. It is to be understood that the storage medium may be any known storage medium or any storage medium developed in the future. It is further noted that in the apparatus and method of the present invention, it is apparent that each component or step can be decomposed and/or recombined. These decompositions and/or recombinations are to be regarded as equivalents of the present invention. Also, the steps of executing the series of processes described above may naturally be executed chronologically in the order described, but need not necessarily be executed chronologically. Some steps may be performed in parallel or independently of each other.

While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. A fault symptom searching method of industrial time series data is characterized by comprising the following steps:

acquiring symptom description of a time sequence form, a distribution form or a shape form of industrial time sequence data;

normalizing the symptom description in the time sequence form, the distribution form or the shape form to obtain a processing result;

calculating the difference between the points and the point i on the y axis;

if the number of classes is 1, then L = L + 1;

if the number of classes is 2 to 3, then S = S + 1;

if the number of classes is greater than 3, then P = P + 1;

if P is the maximum, determining the fault symptom as a point cluster;

if S is the maximum, determining the fault symptom as a polygon;

if L is the maximum, determining that the fault symptom is a curve;

acquiring similarity between a fault symptom of a determined type and a fault symptom in a symptom library, wherein the fault symptom of the determined type is one of a point cluster, a polygon and a curve;

2. The method for searching the fault symptom of the industrial time series data according to claim 1, wherein the normalization processing is performed on the symptom description to obtain a processing result, and the method comprises the following steps:

3. The method for searching the industrial time series data for the symptom of the fault according to claim 2, wherein the normalization processing of the symptom description in the time series form comprises:

4. The method for searching the industrial time series data for the symptom of the fault according to claim 2, wherein the normalization processing of the symptom description in the form of a shape includes:

5. The method of claim 1, wherein obtaining a similarity between a certain type of symptom and a symptom in the symptom bank comprises:

6. The method of claim 1, wherein obtaining a similarity between a certain type of symptom and a symptom in the symptom bank comprises:

7. The method of claim 1, wherein obtaining a similarity between a certain type of symptom and a symptom in the symptom bank comprises:

8. The method for searching the fault symptom of the industrial time series data according to claim 1, wherein outputting the fault symptom having the similarity degree greater than a preset value comprises:

9. A system for fault symptom search of industrial time series data, comprising:

the input module is used for acquiring symptom description of a time sequence form, a distribution form or a shape form of industrial time sequence data;

the processing module is used for carrying out normalization processing on the symptom description in the time sequence form, the distribution form or the shape form to obtain a processing result;

the determining module is used for placing a time sequence curve or a graph corresponding to the symptom description in a coordinate system, and sorting the points of the time sequence curve, or of the boundary curve of the graph, by their x-axis values from large to small into points 1 to N; for each point i, taking i from 1 to N in order, acquiring all points in a local window on the x axis centered on point i; calculating the difference between the points and the point i on the y axis; according to the difference, carrying out numerical clustering on the symptom description; if the number of classes is 1, then L = L + 1; if the number of classes is 2 to 3, then S = S + 1; if the number of classes is greater than 3, then P = P + 1; if P is the maximum, determining the fault symptom as a point cluster; if S is the maximum, determining the fault symptom as a polygon; if L is the maximum, determining that the fault symptom is a curve;

the acquisition module is used for acquiring the similarity between the fault symptom of a determined type and the fault symptom in the symptom library; the fault symptom of the determined type is one of a point cluster, a polygon, and a curve;

10. A processor-readable storage medium having stored thereon processor-executable instructions for causing a processor to perform the method of any one of claims 1 to 8.