CN116361351A - Data mining method for health management of industrial equipment - Google Patents


Info

Publication number
CN116361351A
Authority
CN
China
Prior art keywords
vector
monitoring
data
category
vectors
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211531923.0A
Other languages
Chinese (zh)
Other versions
CN116361351B (en)
Inventor
杨小强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Creation Vocational College
Original Assignee
Chongqing Creation Vocational College
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Creation Vocational College
Priority to CN202211531923.0A
Publication of CN116361351A
Application granted
Publication of CN116361351B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465Query processing support for facilitating data mining operations in structured databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • G06F16/285Clustering or classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/04Manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Manufacturing & Machinery (AREA)
  • Economics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

The present disclosure relates to a data mining method for industrial equipment health management, comprising: acquiring a monitoring data set of the industrial equipment, wherein the monitoring data set comprises a plurality of state monitoring data, and the plurality of state monitoring data comprises first monitoring data and second monitoring data; carrying out vectorization processing on each state monitoring data in the plurality of state monitoring data to obtain a state monitoring vector corresponding to each state monitoring data, wherein the state monitoring vector comprises a first monitoring vector and a second monitoring vector; dividing the state monitoring vectors with vector similarity meeting the first preset requirement into the same category aiming at each state monitoring vector to obtain a plurality of category vector clusters, wherein the category is determined as a target category under the condition that the quantity and the proportion of the first monitoring vectors in the vector cluster of any category meet the second preset requirement; and determining new fault data based on second monitoring data corresponding to a second monitoring vector in the vector cluster of the target class.

Description

Data mining method for health management of industrial equipment
Technical Field
The present disclosure relates generally to the field of industrial equipment monitoring technology, and more particularly, to a data mining method for industrial equipment health management.
Background
At present, emerging technologies such as digital intelligence and the equipment Internet of Things continue to appear in the manufacturing industry, and transformation into intelligent factories has become an inevitable direction for production lines. Industrial equipment in factories is often complex and highly automated; production equipment, inspection equipment, cutting-tool equipment and the like are high-end precision equipment with machining and inspection accuracy at the micrometer level, and massive amounts of data are generated during production. To grasp the state of products and equipment in time and to intervene promptly and efficiently, big data analysis must be performed on the various monitoring data of the industrial equipment, especially the data generated when the equipment fails, so as to realize health management analysis of the industrial equipment.
However, in the massive historical monitoring data of industrial equipment, data from normal operation usually accounts for the vast majority, while data generated when the equipment fails accounts for only a very small proportion. In addition, monitoring equipment produces many false alarms and missed alarms. As a result, fault data of industrial equipment has not been well stored or accumulated, and neither the basic data volume nor the data reliability is mature enough for health management analysis of the industrial equipment.
In this case, when health management analysis is performed on industrial equipment in a factory, bringing the amount of fault data up to the basic requirement by using open-source data, training senior data experts, or accumulating data over a short period generally introduces the problem of weak coupling between the acquired data and the actual business, and greatly increases cost. Therefore, when the amount of fault data for industrial equipment in a factory is small, the fault data needs to be sorted and mined in an economical and efficient manner.
Disclosure of Invention
The present disclosure provides a data mining method for industrial equipment health management that makes good use of the confirmed fault data in a historical fault database of the industrial equipment to mine new fault data, improves the economy and efficiency of mining and sorting fault data, and provides reliable data support for subsequent equipment health management work.
In one general aspect, there is provided a data mining method for industrial equipment health management, comprising: acquiring a monitoring data set of industrial equipment, wherein the monitoring data set comprises a plurality of state monitoring data, the plurality of state monitoring data comprise first monitoring data and second monitoring data, the first monitoring data are data confirmed to be fault data generated when the industrial equipment failed, and the second monitoring data are data whose status is yet to be determined; carrying out vectorization processing on each state monitoring data in the plurality of state monitoring data to obtain a state monitoring vector corresponding to each state monitoring data, wherein the state monitoring vector comprises a first monitoring vector and a second monitoring vector, the first monitoring data corresponds to the first monitoring vector, and the second monitoring data corresponds to the second monitoring vector; for each state monitoring vector, dividing state monitoring vectors whose vector similarity meets a first preset requirement into the same category to obtain vector clusters of a plurality of categories, wherein a category is determined as a target category when the number and the proportion of first monitoring vectors in the vector cluster of that category meet a second preset requirement; and determining new fault data based on second monitoring data corresponding to a second monitoring vector in the vector cluster of the target category.
Optionally, for each state monitoring vector, the classifying the state monitoring vectors with the vector similarity meeting the first preset requirement into the same category includes: for any one state monitoring vector, calculating the average vector similarity of the state monitoring vector relative to the current vector cluster of each category; when the maximum value in the average vector similarity of the state monitoring vector relative to the vector cluster of each current class is larger than a first threshold value, dividing the state monitoring vector into the class corresponding to the maximum value; when the maximum value is less than or equal to the first threshold value, a category is newly created and the state monitoring vector is divided into the newly created categories.
Optionally, for any one state monitoring vector, calculating the average vector similarity of the state monitoring vector relative to the current vector cluster of each category includes: for the vector cluster of any current category, calculating the vector similarity between the state monitoring vector and each seed vector in the vector cluster of the category, wherein the seed vectors comprise the first monitoring vectors and/or the state monitoring vector that was first divided into the category; and taking the average of the vector similarities between the state monitoring vector and all seed vectors in the vector cluster of the category as the average vector similarity of the state monitoring vector relative to the vector cluster of the category.
Optionally, in the case that the number and the proportion of the first monitoring vectors in the vector cluster of any category meet the second preset requirement, determining the category as the target category includes: when the number of first monitor vectors in the vector cluster of any category is greater than a second threshold and the ratio is greater than a third threshold, the category is determined to be the target category.
Optionally, the plurality of status monitoring data further includes third monitoring data, the third monitoring data being determined normal data when the industrial equipment is operating normally, wherein the status monitoring vector includes a third monitoring vector, and the third monitoring data corresponds to the third monitoring vector.
Optionally, the determining new fault data based on the second monitoring data corresponding to the second monitoring vector in the vector cluster of the target class includes: obtaining a first vector set based on each first monitoring vector and each third monitoring vector, and obtaining a second vector set based on the second monitoring vector in the vector cluster of each target class; training a classification model by using the first vector set, and predicting each second monitoring vector in the second vector set by using the trained classification model to obtain a prediction result of each second monitoring vector in the second vector set; and determining a fourth monitoring vector from the second vector set based on a prediction result of each second monitoring vector in the second vector set, and determining second monitoring data corresponding to the fourth monitoring vector as new fault data.
Optionally, the first vector set further includes label information corresponding to each first monitoring vector and each third monitoring vector, the label information is used for indicating that the corresponding first monitoring data belongs to fault data or the corresponding third monitoring data belongs to normal data, and the classification model is used for predicting a prediction probability that the second monitoring data corresponding to each second monitoring vector in the second vector set belongs to fault data.
Optionally, the determining a fourth monitoring vector from the second vector set includes: and selecting a second monitoring vector with the prediction probability larger than a fourth threshold value from the second vector set, and determining the second monitoring vector with the prediction probability larger than the fourth threshold value as the fourth monitoring vector.
Optionally, the training the classification model by using the first vector set, and predicting each second monitoring vector in the second vector set by using the trained classification model to obtain a prediction result of each second monitoring vector in the second vector set, includes: cross-training a binary classification model by using the first vector set to obtain a plurality of trained binary classification models; predicting each second monitoring vector in the second vector set by using each of the plurality of trained binary classification models to obtain a plurality of original prediction results for each second monitoring vector in the second vector set; and obtaining a prediction result of each second monitoring vector in the second vector set based on the plurality of original prediction results of that second monitoring vector, wherein, for any one second monitoring vector in the second vector set, the average of its plurality of original prediction results is taken as its prediction result.
Optionally, the cross training the classification model using the first set of vectors includes: randomly dividing the first set of vectors into a first number of vector subsets; and training the classification model by utilizing a second number of vector subsets in the first number of vector subsets during each training, wherein the second number is smaller than the first number, and the second number of vector subsets used during any training are not identical to the second number of vector subsets used during other training.
According to the data mining method for industrial equipment health management of the present disclosure, state monitoring data having a certain degree of similarity are grouped into the same category, and the portion of the to-be-determined data that is similar to the confirmed fault data is screened out. In this way, the confirmed fault data in the historical fault database of the industrial equipment can be well utilized to mine new fault data, the economy and efficiency of mining and sorting fault data are improved, and reliable data support is provided for subsequent equipment health management work. The data available for health management analysis of the industrial equipment thus becomes more sufficient and complete, so that more accurate analysis results can be obtained, equipment management can shift from preventive maintenance to leaner predictive maintenance, and both after-the-fact repair of sudden faults and the over-maintenance associated with preventive maintenance are reduced.
Additional aspects and/or advantages of the present general inventive concept will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the general inventive concept.
Drawings
The foregoing and other objects and features of embodiments of the present disclosure will become more apparent from the following description taken in conjunction with the accompanying drawings in which the embodiments are shown, in which:
FIG. 1 is a flow chart illustrating a data mining method for industrial equipment health management according to an embodiment of the present disclosure;
FIG. 2 is a flowchart illustrating step S104 in FIG. 1 according to an embodiment of the present disclosure.
Detailed Description
The following detailed description is provided to assist the reader in obtaining a thorough understanding of the methods, apparatus, and/or systems described herein. However, various changes, modifications, and equivalents of the methods, apparatuses, and/or systems described herein will be apparent after an understanding of the disclosure of the present application. For example, the order of operations described herein is merely an example and is not limited to those set forth herein, but may be altered as will be apparent after an understanding of the disclosure of the present application, except for operations that must occur in a particular order. Furthermore, descriptions of features known in the art may be omitted for clarity and conciseness.
The features described herein may be embodied in different forms and should not be construed as limited to the examples described herein. Rather, the examples described herein have been provided to illustrate only some of the many possible ways to implement the methods, devices, and/or systems described herein, which will be apparent after an understanding of the present disclosure.
Unless defined otherwise, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs after understanding this disclosure. Unless explicitly so defined herein, terms (such as those defined in a general dictionary) should be construed to have meanings consistent with their meanings in the context of the relevant art and the present disclosure, and should not be interpreted idealized or overly formal.
In addition, in the description of the examples, when it is considered that a detailed description of well-known related structures or functions would make the explanation of the present disclosure ambiguous, such detailed description will be omitted.
A data mining method for industrial equipment health management according to an embodiment of the present disclosure will be described in detail with reference to fig. 1 and 2.
Fig. 1 is a flowchart illustrating a data mining method for industrial equipment health management according to an embodiment of the present disclosure.
Referring to fig. 1, in step S101, a monitoring data set of industrial equipment may be acquired, wherein the monitoring data set includes a plurality of state monitoring data, the plurality of state monitoring data includes first monitoring data and second monitoring data, the first monitoring data is data confirmed to be fault data generated when the industrial equipment failed, and the second monitoring data is data whose status is yet to be determined. As an example, the industrial equipment may be large industrial equipment such as a numerically controlled machine tool, an engine, or a generator, or may be auxiliary equipment such as a heat sink, a servo motor, a self-priming pump, or a deep well pump, but the disclosure is not limited thereto. Further, the monitoring data may be signal data collected by monitoring devices such as various meters, photoelectric switches, sensors, and ammeters, including, but not limited to, rotational speed signals, vibration signals, shaft load signals, shaft position signals, torque signals, pressure signals, temperature signals, and current signals; those skilled in the art may select the signals according to the industrial equipment actually monitored in the factory, and the present disclosure is not limited thereto. Still further, any one piece of state monitoring data may be a set of the various signals of the industrial equipment collected by the monitoring devices at the same moment; the set may include the rotational speed signal, vibration signal, shaft load signal, shaft position signal, torque signal, pressure signal, temperature signal, current signal, and the like at that moment.
Next, in step S102, vectorization processing may be performed on each of the plurality of state monitoring data to obtain a state monitoring vector corresponding to each state monitoring data, where the state monitoring vector includes a first monitoring vector and a second monitoring vector, the first monitoring data corresponds to the first monitoring vector, and the second monitoring data corresponds to the second monitoring vector. Here, in one possible implementation, each signal in the state monitoring data may be normalized, and each normalized state monitoring data may then be arranged into vector form to obtain the corresponding state monitoring vector. In another possible implementation, a pre-trained BERT model may be used to vectorize each state monitoring data to obtain the corresponding state monitoring vector, so that the resulting vectors are reliable and the subsequent processing is more accurate and efficient.
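As a minimal sketch of the first implementation described for step S102 (per-signal normalization followed by arrangement into a vector), the following assumes min-max scaling with clamping. The signal names, their order, and the value ranges in `SIGNAL_RANGES` are hypothetical and are not specified in the disclosure.

```python
from typing import Dict, List, Tuple

# Hypothetical signal order and value ranges; the disclosure does not fix these.
SIGNAL_RANGES: Dict[str, Tuple[float, float]] = {
    "rotational_speed": (0.0, 6000.0),
    "vibration": (0.0, 10.0),
    "temperature": (0.0, 150.0),
    "current": (0.0, 50.0),
}


def vectorize(record: Dict[str, float]) -> List[float]:
    """Min-max normalize each signal to [0, 1] and pack the values in a fixed order."""
    vector = []
    for name, (low, high) in SIGNAL_RANGES.items():
        scaled = (record[name] - low) / (high - low)
        # Clamp so that out-of-range readings stay inside the unit interval.
        vector.append(min(1.0, max(0.0, scaled)))
    return vector


# One piece of state monitoring data: signals sampled at the same moment.
sample = {"rotational_speed": 3000.0, "vibration": 2.5, "temperature": 75.0, "current": 60.0}
state_vector = vectorize(sample)
```

The fixed iteration order of `SIGNAL_RANGES` keeps every state monitoring vector aligned component by component, which the later similarity calculations rely on.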
Next, in step S103, for each state monitoring vector, the state monitoring vectors with vector similarity satisfying the first preset requirement may be divided into the same category to obtain a plurality of category vector clusters, where in the case that the number and proportion of the first monitoring vectors in the vector cluster of any category satisfy the second preset requirement, the category is determined as the target category. Here, the category may be determined as the target category when the number of first monitor vectors in the vector cluster of any category is greater than the second threshold and the ratio is greater than the third threshold. Further, the second threshold value and the third threshold value may be set by those skilled in the art according to actual circumstances, for example, the second threshold value may be 10 and the third threshold value may be 80%, but the present disclosure is not limited thereto.
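The target-category condition of step S103 can be sketched as follows, using the example values from the text (second threshold 10, third threshold 80%). Representing a cluster as a list of string labels (`"first"`, `"second"`) is an assumption made for illustration only.

```python
from typing import List


def is_target_category(kinds: List[str],
                       second_threshold: int = 10,
                       third_threshold: float = 0.8) -> bool:
    """Return True when the cluster holds enough first monitoring vectors.

    kinds has one entry per vector in the cluster: "first" marks a confirmed
    fault vector, any other value marks a vector of another type.
    """
    if not kinds:
        return False
    first_count = sum(1 for kind in kinds if kind == "first")
    proportion = first_count / len(kinds)
    # Both the number and the proportion must exceed their thresholds.
    return first_count > second_threshold and proportion > third_threshold


dense_cluster = ["first"] * 12 + ["second"] * 2    # 12 of 14 are confirmed faults
diluted_cluster = ["first"] * 12 + ["second"] * 4  # 12 of 16 are confirmed faults
small_cluster = ["first"] * 8                      # too few confirmed faults
results = [is_target_category(c) for c in (dense_cluster, diluted_cluster, small_cluster)]
```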
According to an embodiment of the disclosure, for any one state monitoring vector, the average vector similarity of the state monitoring vector relative to the current vector cluster of each category can be calculated. When the maximum of these average vector similarities is larger than a first threshold, the state monitoring vector is divided into the category corresponding to the maximum; when the maximum is less than or equal to the first threshold, a new category is created and the state monitoring vector is divided into the newly created category. Here, the first threshold may be predetermined by a person skilled in the art or determined gradually through iterative testing, which is not limited by the present disclosure. Further, the similarity between two state monitoring vectors may be obtained by calculating a geometric distance between them, such as, but not limited to, a cosine distance, where a smaller geometric distance corresponds to a higher vector similarity. Alternatively, the similarity between a state monitoring vector to be classified and the state monitoring vectors of an existing category may be obtained by calculating the distance from the vector to be classified to the hyperplane in which the state monitoring vectors of that category lie, where a smaller distance corresponds to a higher vector similarity.
According to an embodiment of the disclosure, for the current vector cluster of any one category, the vector similarity between the state monitoring vector and each seed vector in the vector cluster of the category may be calculated, where the seed vectors include the first monitoring vectors and/or the state monitoring vector that was first divided into the category. The average of the vector similarities between the state monitoring vector and all the seed vectors in the vector cluster of the category can then be used as the average vector similarity of the state monitoring vector relative to the vector cluster of the category. By taking the first monitoring vectors, which correspond to confirmed fault data, as seed vectors and calculating the average vector similarity on that basis, the confirmed fault data in the historical fault database is further utilized during classification, so that the classification result meets the requirements.
Next, in step S104, new fault data may be determined based on the second monitoring data corresponding to the second monitoring vector in the vector cluster of the target category. Step S104 in fig. 1 according to an embodiment of the present disclosure is described below in conjunction with fig. 2. Here, the plurality of state monitoring data described above may further include third monitoring data, which is data confirmed to be normal data generated when the industrial equipment operates normally; accordingly, the state monitoring vector may include a third monitoring vector, and the third monitoring data corresponds to the third monitoring vector.
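The assignment rule above can be sketched as follows, assuming cosine similarity as the similarity measure (the text names cosine distance only as one option). The function names and the convention of returning `len(avg_similarities)` to signal a new category are illustrative choices, not part of the disclosure.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def choose_category(avg_similarities: List[float], first_threshold: float) -> int:
    """Pick the index of the category to join.

    avg_similarities[i] is the vector's average similarity to category i.
    Returning len(avg_similarities) means a new category must be created.
    """
    if avg_similarities:
        best = max(range(len(avg_similarities)), key=avg_similarities.__getitem__)
        # Strictly greater than the threshold: equality creates a new category.
        if avg_similarities[best] > first_threshold:
            return best
    return len(avg_similarities)


joined = choose_category([0.2, 0.9, 0.5], first_threshold=0.8)  # joins category 1
created = choose_category([0.2, 0.8], first_threshold=0.8)      # creates category 2
```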
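A minimal sketch of seed-based clustering follows, under two assumptions that the text leaves open: cosine similarity is the similarity measure, and the seeds of a category are the vector that founded it plus every first monitoring vector later assigned to it. The dictionary layout of a cluster is hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def cluster_vectors(items: List[Tuple[Vector, bool]], first_threshold: float) -> List[Dict]:
    """items holds (vector, is_first) pairs; is_first marks confirmed fault vectors."""
    clusters: List[Dict] = []
    for vector, is_first in items:
        best_index, best_average = -1, -math.inf
        for index, cluster in enumerate(clusters):
            sims = [cosine_similarity(vector, seed) for seed in cluster["seeds"]]
            average = sum(sims) / len(sims)  # average similarity over the seeds only
            if average > best_average:
                best_index, best_average = index, average
        if best_index >= 0 and best_average > first_threshold:
            clusters[best_index]["members"].append((vector, is_first))
            if is_first:
                clusters[best_index]["seeds"].append(vector)
        else:
            # The founding vector of a new category always acts as a seed.
            clusters.append({"seeds": [vector], "members": [(vector, is_first)]})
    return clusters


data = [((1.0, 0.0), True), ((0.9, 0.1), False), ((0.0, 1.0), False), ((0.1, 0.9), True)]
clusters = cluster_vectors(data, first_threshold=0.9)
```

Because only seeds enter the average, unconfirmed vectors that join a category do not shift what the category represents.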
Fig. 2 is a flowchart illustrating step S104 in fig. 1 according to an embodiment of the present disclosure.
Referring to fig. 2, in step S201, a first vector set may be obtained based on each of the first monitor vector and the third monitor vector, and a second vector set may be obtained based on a second monitor vector in the vector cluster of each target class.
Next, in step S202, the first vector set may be used to train the classification model, and each second monitoring vector in the second vector set is predicted by using the trained classification model, so as to obtain a prediction result of each second monitoring vector in the second vector set. Here, the classification model may include, but is not limited to, at least one of the following models: a random forest model, a support vector machine model, a Wide & Deep model, and the like; a person skilled in the art may train with an appropriate model according to the actual situation.
According to an embodiment of the present disclosure, in addition to the first monitor vector and the third monitor vector, the first vector set may further include tag information corresponding to each of the first monitor vector and the third monitor vector, the tag information may be used to indicate that the corresponding first monitor data belongs to fault data or that the corresponding third monitor data belongs to normal data, and further, a classification model may be used to predict a prediction probability that the second monitor data corresponding to each of the second monitor vectors in the second vector set belongs to fault data. By training the classification model by using the first vector set containing the label information, the classification model can learn the distribution of each first monitoring vector or each third monitoring vector based on the label information corresponding to each first monitoring vector or each third monitoring vector, so that the prediction result of the classification model can accurately represent whether the second monitoring data corresponding to the second monitoring vector belongs to fault data or not.
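The two vector sets can be assembled as sketched below. Encoding the label information as 1 for fault data and 0 for normal data, and tagging each cluster member with a `kind` string, are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def build_first_vector_set(first_vectors: List[Vector],
                           third_vectors: List[Vector]) -> Tuple[List[Vector], List[int]]:
    """Labelled training data: 1 marks confirmed fault data, 0 confirmed normal data."""
    features = list(first_vectors) + list(third_vectors)
    labels = [1] * len(first_vectors) + [0] * len(third_vectors)
    return features, labels


def build_second_vector_set(target_clusters: List[List[Tuple[Vector, str]]]) -> List[Vector]:
    """Collect the to-be-determined vectors found in the target-category clusters.

    Each cluster is a list of (vector, kind) pairs with kind in
    {"first", "second", "third"}.
    """
    return [vector
            for cluster in target_clusters
            for vector, kind in cluster
            if kind == "second"]


features, labels = build_first_vector_set([(0.9, 0.8)], [(0.1, 0.2), (0.2, 0.1)])
candidates = build_second_vector_set([[((0.9, 0.8), "first"), ((0.85, 0.7), "second")]])
```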
According to an embodiment of the disclosure, when training a binary classification model with the first vector set, the model may be cross-trained with the first vector set to obtain a plurality of trained binary classification models, so that a more stable result is obtained from the models as a whole. Each second monitoring vector in the second vector set is then predicted by each of the trained binary classification models to obtain a plurality of original prediction results for that second monitoring vector. The prediction result of each second monitoring vector in the second vector set can then be obtained from its plurality of original prediction results; for any one second monitoring vector, the average of its original prediction results can be used as its prediction result. This prevents the original prediction result of any single binary classification model from having an excessive influence on the final prediction result, which helps keep the prediction result stable and improves its reliability.
According to embodiments of the present disclosure, when cross-training the binary classification model with the first vector set, the first vector set may be randomly divided into a first number of vector subsets; in each training run, the binary classification model is trained with a second number of the vector subsets, where the second number is smaller than the first number. For example, in ten-fold cross training, the first number is 10 and the second number is 9, but the disclosure is not limited thereto, and the specific values of the first number and the second number may be set by those skilled in the art according to the actual situation. Further, the vector subsets used in any one training run are not exactly the same as those used in any other training run. In other words, some vector subsets are left out in each training run, and the subsets left out differ from run to run, so that excessive influence of any particular vector subsets on the training result is avoided to a certain extent and the plurality of binary classification models are reliable as a whole.
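The averaging step can be sketched as follows, assuming each trained model has already produced one fault probability per second monitoring vector. The nested-list layout `raw_predictions[m][i]` is a hypothetical convention.

```python
from typing import List


def average_predictions(raw_predictions: List[List[float]]) -> List[float]:
    """Average the original prediction results across models.

    raw_predictions[m][i] is model m's fault probability for second
    monitoring vector i; every inner list must have the same length.
    """
    model_count = len(raw_predictions)
    # zip(*...) regroups the values so each tuple holds one vector's predictions.
    return [sum(per_vector) / model_count for per_vector in zip(*raw_predictions)]


final = average_predictions([[0.5, 1.0], [1.0, 0.0]])
```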
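The subset selection for cross training can be sketched as below for the case where the second number equals the first number minus one, as in the ten-fold example. The function name, the fixed random seed, and the use of sample indices rather than vectors are assumptions; the sketch also assumes at least as many samples as subsets so that every run uses a different training set.

```python
import random
from typing import List


def cross_training_splits(sample_count: int, first_number: int = 10, seed: int = 0) -> List[List[int]]:
    """Return one list of training indices per run, each omitting a different subset."""
    indices = list(range(sample_count))
    random.Random(seed).shuffle(indices)  # random division into subsets
    subsets = [indices[i::first_number] for i in range(first_number)]
    splits = []
    for held_out in range(first_number):
        # Train on the other first_number - 1 subsets (the "second number").
        training = [idx
                    for position, subset in enumerate(subsets)
                    if position != held_out
                    for idx in subset]
        splits.append(training)
    return splits


splits = cross_training_splits(sample_count=20, first_number=10)
```

Each returned index list would be used to train one binary classification model, giving as many trained models as there are subsets.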
Next, in step S203, a fourth monitoring vector may be determined from the second vector set based on the prediction result of each of the second monitoring vectors in the second vector set, and second monitoring data corresponding to the fourth monitoring vector may be determined as new fault data. Here, a second monitor vector having a prediction probability greater than the fourth threshold value may be selected from the second vector set, and the second monitor vector having a prediction probability greater than the fourth threshold value may be determined as the fourth monitor vector. Further, the fourth threshold may be determined by one skilled in the art according to actual circumstances, for example, 0.9 or 0.95, but the present disclosure is not limited thereto. Since the training-derived classification model has already learned the distribution of each first monitor vector, the prediction result of each second monitor vector by the classification model can indicate whether the distribution of each second monitor vector is the same as or similar to the first monitor vector, so that the second monitor vector which is the same as or similar to the distribution of the first monitor vector can be determined as the fourth monitor vector.
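The selection in step S203 reduces to a threshold filter, sketched here with the example fourth threshold of 0.9 from the text. Pairing vectors and probabilities by position is an assumption about how the prediction results are stored.

```python
from typing import List, Sequence

Vector = Sequence[float]


def select_fourth_vectors(second_vectors: List[Vector],
                          probabilities: List[float],
                          fourth_threshold: float = 0.9) -> List[Vector]:
    """Keep the second monitoring vectors whose fault probability exceeds the threshold."""
    return [vector
            for vector, probability in zip(second_vectors, probabilities)
            if probability > fourth_threshold]


fourth_vectors = select_fourth_vectors(
    [(0.9, 0.8), (0.4, 0.5), (0.7, 0.9)],
    [0.97, 0.35, 0.9],
)
```

The monitoring data behind each selected vector would then be recorded as new fault data; a probability exactly equal to the threshold is not selected.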
According to the data mining method for industrial equipment health management of the present disclosure, state monitoring data having a certain data similarity are grouped into the same category, and the portion of the to-be-processed data that is similar to the determined fault data is screened out. In this way, the determined fault data in the historical fault database of the industrial equipment can be fully used to mine new fault data, which improves the economy and efficiency of mining and collating fault data and provides reliable data support for subsequent equipment health management work. The data available for industrial equipment health management analysis thus become more sufficient and complete, so that more accurate analysis results can be obtained, equipment management can shift from preventive maintenance to leaner predictive maintenance, and both after-the-fact repair of sudden faults and the over-maintenance associated with preventive maintenance are reduced.
A data mining method for industrial equipment health management according to embodiments of the present disclosure may be written as a computer program and stored on a computer-readable storage medium. The data mining method for industrial equipment health management described above may be implemented when the computer program is executed by a processor. Examples of the computer-readable storage medium include: read-only memory (ROM), programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, nonvolatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or optical disk storage, hard disk drives (HDD), solid state drives (SSD), card-type memories (such as multimedia cards, Secure Digital (SD) cards, or eXtreme Digital (xD) cards), magnetic tapes, floppy disks, magneto-optical data storage devices, and any other devices configured to store computer programs and any associated data, data files, and data structures in a non-transitory manner and to provide the computer programs and any associated data, data files, and data structures to a processor or computer to enable the processor or computer to execute the programs. In one example, the computer program and any associated data, data files, and data structures are distributed across networked computer systems such that the computer program and any associated data, data files, and data structures are stored, accessed, and executed in a distributed manner by one or more processors or computers.
Although a few embodiments of the present disclosure have been shown and described, it would be appreciated by those skilled in the art that changes may be made in these embodiments without departing from the principles and spirit of the disclosure, the scope of which is defined in the claims and their equivalents.

Claims (10)

1. A data mining method for health management of industrial equipment, comprising:
acquiring a monitoring data set of industrial equipment, wherein the monitoring data set comprises a plurality of state monitoring data, the plurality of state monitoring data comprise first monitoring data and second monitoring data, the first monitoring data are data determined to be fault data recorded when the industrial equipment failed, and the second monitoring data are data whose status is to be determined;
performing vectorization processing on each of the plurality of state monitoring data to obtain a state monitoring vector corresponding to each state monitoring data, wherein the state monitoring vector comprises a first monitoring vector and a second monitoring vector, the first monitoring data corresponds to the first monitoring vector, and the second monitoring data corresponds to the second monitoring vector;
for each state monitoring vector, classifying the state monitoring vectors whose vector similarity satisfies a first preset requirement into the same category to obtain vector clusters of a plurality of categories, wherein a category is determined as a target category in the case that the number and the proportion of the first monitoring vectors in the vector cluster of that category satisfy a second preset requirement; and
determining new fault data based on second monitoring data corresponding to a second monitoring vector in the vector cluster of the target category.
2. The data mining method according to claim 1, wherein the classifying, for each state monitoring vector, the state monitoring vectors whose vector similarity satisfies the first preset requirement into the same category comprises:
for any one state monitoring vector, calculating the average vector similarity of the state monitoring vector relative to the vector cluster of each current category;
when the maximum value among the average vector similarities of the state monitoring vector relative to the vector clusters of the current categories is greater than a first threshold, classifying the state monitoring vector into the category corresponding to the maximum value; and
when the maximum value is less than or equal to the first threshold, creating a new category and classifying the state monitoring vector into the newly created category.
3. The data mining method according to claim 2, wherein the calculating, for any one state monitoring vector, the average vector similarity of the state monitoring vector relative to the vector cluster of each current category comprises:
for the vector cluster of any current category, respectively calculating the vector similarity between the state monitoring vector and each seed vector in the vector cluster of the category, wherein the seed vectors comprise a first monitoring vector and/or the state monitoring vector first classified into each category; and
taking the average value of the vector similarities between the state monitoring vector and all seed vectors in the vector cluster of the category as the average vector similarity of the state monitoring vector relative to the vector cluster of the category.
4. The data mining method according to claim 1, wherein the determining the category as the target category in the case that the number and the proportion of the first monitoring vectors in the vector cluster of any category satisfy the second preset requirement comprises:
when the number of first monitoring vectors in the vector cluster of any category is greater than a second threshold and the proportion is greater than a third threshold, determining the category as the target category.
5. The data mining method of claim 1, wherein the plurality of state monitoring data further comprise third monitoring data, the third monitoring data being data determined to be normal data recorded during normal operation of the industrial equipment, wherein the state monitoring vector comprises a third monitoring vector, and the third monitoring data corresponds to the third monitoring vector.
6. The data mining method of claim 5, wherein the determining new fault data based on second monitoring data corresponding to a second monitoring vector in the vector cluster of the target category comprises:
obtaining a first vector set based on each first monitoring vector and each third monitoring vector, and obtaining a second vector set based on the second monitoring vectors in the vector cluster of each target category;
training a classification model with the first vector set, and predicting each second monitoring vector in the second vector set with the trained classification model to obtain a prediction result for each second monitoring vector in the second vector set; and
determining a fourth monitoring vector from the second vector set based on the prediction result of each second monitoring vector in the second vector set, and determining second monitoring data corresponding to the fourth monitoring vector as new fault data.
7. The data mining method of claim 6, wherein the first vector set further includes tag information corresponding to each first monitoring vector and each third monitoring vector, the tag information indicating that the corresponding first monitoring data belongs to fault data or that the corresponding third monitoring data belongs to normal data, and the classification model is used to predict, for each second monitoring vector in the second vector set, a prediction probability that the corresponding second monitoring data belongs to fault data.
8. The data mining method of claim 7, wherein the determining a fourth monitoring vector from the second set of vectors comprises:
and selecting a second monitoring vector with the prediction probability larger than a fourth threshold value from the second vector set, and determining the second monitoring vector with the prediction probability larger than the fourth threshold value as the fourth monitoring vector.
9. The data mining method of claim 6, wherein training the classification model with the first vector set and predicting each second monitoring vector in the second vector set with the trained classification model to obtain a prediction result for each second monitoring vector in the second vector set comprises:
cross-training a binary classification model with the first vector set to obtain a plurality of trained binary classification models;
predicting each second monitoring vector in the second vector set with each of the plurality of trained binary classification models to obtain a plurality of original prediction results for each second monitoring vector in the second vector set;
and obtaining a prediction result of each second monitoring vector in the second vector set based on a plurality of original prediction results of each second monitoring vector in the second vector set, wherein an average value of the plurality of original prediction results of the second monitoring vector is taken as the prediction result of the second monitoring vector for any one second monitoring vector in the second vector set.
10. The data mining method of claim 9, wherein cross-training the classification model with the first set of vectors comprises:
randomly dividing the first set of vectors into a first number of vector subsets;
training the classification model with a second number of subsets of vectors of the first number of subsets of vectors at each training time, wherein the second number is smaller than the first number,
wherein the second number of subsets of vectors used in any one training is not exactly the same as the second number of subsets of vectors used in other training.
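The incremental clustering of claims 2 and 3 and the target-category test of claim 4 can be sketched in Python as follows. This is only an illustration under stated assumptions: cosine similarity is used because the similarity measure is not specified in this section, seed vectors are read as the vector that opened a category plus any first monitoring vectors later added to it, and all function names, thresholds, and sample vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity (assumed metric; the claims do not name one)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster(vectors, is_fault, first_threshold):
    """Assign each vector to the category with the highest average similarity
    to its seed vectors, or open a new category if that value is too low."""
    clusters = []  # each cluster is a list of indices into `vectors`
    for i, vec in enumerate(vectors):
        best, best_sim = None, -1.0
        for c in clusters:
            # Seeds: the vector that opened the category plus known fault vectors.
            seeds = [c[0]] + [j for j in c[1:] if is_fault[j]]
            avg = sum(cosine(vec, vectors[j]) for j in seeds) / len(seeds)
            if avg > best_sim:
                best, best_sim = c, avg
        if best is not None and best_sim > first_threshold:
            best.append(i)
        else:
            clusters.append([i])
    return clusters


def target_categories(clusters, is_fault, second_threshold, third_threshold):
    """Keep categories whose fault-vector count and proportion exceed the thresholds."""
    targets = []
    for c in clusters:
        count = sum(1 for j in c if is_fault[j])
        if count > second_threshold and count / len(c) > third_threshold:
            targets.append(c)
    return targets


# Toy data: two known fault vectors followed by three undetermined vectors.
vectors = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.95, 0.05), (0.1, 0.9)]
is_fault = [True, True, False, False, False]
clusters = cluster(vectors, is_fault, first_threshold=0.9)
targets = target_categories(clusters, is_fault, second_threshold=1, third_threshold=0.5)
# Undetermined vectors inside target categories are candidate new fault data.
candidates = [j for c in targets for j in c if not is_fault[j]]
```

With this toy data the vectors fall into two categories, only the first of which contains enough known fault vectors to become a target category, so the single undetermined vector in it is the candidate.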
CN202211531923.0A 2022-12-01 2022-12-01 Data mining method for health management of industrial equipment Active CN116361351B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211531923.0A CN116361351B (en) 2022-12-01 2022-12-01 Data mining method for health management of industrial equipment

Publications (2)

Publication Number Publication Date
CN116361351A true CN116361351A (en) 2023-06-30
CN116361351B CN116361351B (en) 2024-05-17

Family

ID=86926955

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211531923.0A Active CN116361351B (en) 2022-12-01 2022-12-01 Data mining method for health management of industrial equipment

Country Status (1)

Country Link
CN (1) CN116361351B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104091035A (en) * 2014-07-30 2014-10-08 中国科学院空间应用工程与技术中心 Health monitoring method for effective loads of space station based on data-driven algorithm
CN104503874A (en) * 2014-12-29 2015-04-08 南京大学 Hard disk failure prediction method for cloud computing platform
CN105391579A (en) * 2015-11-25 2016-03-09 国家电网公司 Electric power communication network fault positioning method based on key alarm sets and supervised classification
US20160188876A1 (en) * 2014-12-30 2016-06-30 Battelle Memorial Institute Anomaly detection for vehicular networks for intrusion and malfunction detection
CN107403072A (en) * 2017-08-07 2017-11-28 北京工业大学 A kind of diabetes B prediction and warning method based on machine learning
CN107832896A (en) * 2017-11-29 2018-03-23 广东电网有限责任公司电力科学研究院 A kind of electric power factory equipment soft fault method for early warning and device
CN114969349A (en) * 2022-07-29 2022-08-30 北京达佳互联信息技术有限公司 Text processing method and device, electronic equipment and medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JUEUN KWAK等: ""An Incremental Clustering-Based Fault Detection Algorithm for Class-Imbalanced Process Data"", 《IEEE TRANSACTIONS ON SEMICONDUCTOR MANUFACTURING ( VOLUME: 28, ISSUE: 3, AUGUST 2015)》, 15 June 2015 (2015-06-15), pages 1 - 4 *
束洪春等: ""考虑互感器传变特性的输电线路暂态保护雷击干扰与线路故障识别方法"", 《电工技术学报》, 28 February 2015 (2015-02-28), pages 1 - 12 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117270479A (en) * 2023-11-21 2023-12-22 清远欧派集成家居有限公司 Method and system for monitoring multi-working-procedure production line of molding plate
CN117270479B (en) * 2023-11-21 2024-02-06 清远欧派集成家居有限公司 Method and system for monitoring multi-working-procedure production line of molding plate

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant