CN117131859A - Equipment fault mode extraction and identification method based on text mining technology

Info

Publication number
CN117131859A
CN117131859A CN202310991214.9A CN202310991214A CN117131859A CN 117131859 A CN117131859 A CN 117131859A CN 202310991214 A CN202310991214 A CN 202310991214A CN 117131859 A CN117131859 A CN 117131859A
Authority
CN
China
Prior art keywords
text
fault
word
fault mode
mining technology
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310991214.9A
Other languages
Chinese (zh)
Inventor
杨军
王宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2023-08-08
Filing date
2023-08-08
Publication date
2023-11-28
Application filed by Beihang University
Priority to CN202310991214.9A
Publication of CN117131859A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

The invention provides an equipment fault mode extraction and identification method based on text mining technology. First, maintenance text data is preprocessed by unifying data encoding formats, removing useless characters, segmenting words, and similar operations. Second, with the preprocessed fault mode data as input, the TF-IDF algorithm is used for text vectorization and feature extraction, and the K-Means clustering algorithm is used to obtain fault mode type labels. Third, with the preprocessed fault phenomenon data as input, the TF-IDF algorithm is again used for text vectorization and feature extraction, and an equipment fault mode identification framework is built on six machine learning classification algorithms, establishing the correspondence between fault phenomena and fault modes. The method makes full use of equipment maintenance text, uses text mining to extract the valuable information it contains, and can compensate for the scarcity of usable quantitative data during equipment development, which otherwise makes fault mode analysis difficult to carry out effectively.

Description

Equipment fault mode extraction and identification method based on text mining technology
Technical Field
The invention provides an equipment fault mode extraction and identification method based on text mining technology. It compensates for the scarcity of usable quantitative data during the development of large-scale equipment, which makes fault mode analysis difficult to carry out effectively. The method makes full use of equipment maintenance text and applies text mining to fault mode extraction and machine-learning-based fault mode identification, supporting rapid fault location and maintenance scheduling decisions. The method is suitable for equipment fault mode analysis and related fields.
Background
Reliability is an important technical attribute for measuring the operational effectiveness of weapon equipment and complex systems. With the rapid development of science and technology, high reliability and long service life have become general requirements for the development, production, and service of equipment. Advanced, scientific reliability verification and comprehensive evaluation have therefore become the basis for equipment design finalization and, in turn, for life-cycle management decisions. Fault mode analysis and fault mode identification are important links in reliability assessment: they supply the fault type information that the assessment requires, and their accuracy and efficiency directly determine the validity of its results. However, large equipment systems are characterized by high test costs, long development cycles, and little field test data, so the quantitative data that can be used directly for fault mode analysis is scarce. Extracting other data quickly and accurately, and thereby alleviating the shortage of field test data in the equipment test and evaluation stage, is therefore of great significance for carrying out fault mode analysis and identification effectively.
During the development of equipment systems, a large amount of data has not been used effectively. An equipment system accumulates a large amount of text during development, and the maintenance reports and similar records it contains provide useful data for fault mode analysis. The new challenge is how to extract valid fault modes from this recorded text and establish the relationship between fault phenomena and fault modes, so that maintenance personnel can be guided to locate fault sites quickly. In recent years, driven by the development of big data analysis, text mining methods have been widely studied. They use intelligent algorithms to extract the valuable information contained in text files and to organize and sort it, thereby mining the knowledge in the text. Text mining therefore offers a powerful tool for addressing problems in equipment fault mode analysis such as insufficient test data and the unknown relationship between fault phenomena and fault modes.
Accordingly, to address the scarcity of usable quantitative data during equipment development and the resulting difficulty of fault mode analysis, the invention makes full use of maintenance text and applies text mining to fault mode extraction and machine-learning-based fault mode identification. By extracting the fault information in the text, the correspondence between fault phenomena and fault modes is established, which guides the relevant staff in rapid fault location and in scheduling maintenance activities.
Disclosure of Invention
The purpose of the invention is as follows: to address the scarcity of usable quantitative data during equipment development and the resulting difficulty of fault mode analysis, the invention makes full use of maintenance text, applies an intelligent extraction algorithm and machine learning, and provides an equipment fault mode extraction and identification method based on text mining technology. The method establishes the correspondence between fault phenomena and fault modes and supports rapid fault location and maintenance scheduling decisions.
The technical scheme is as follows:
Based on the above approach, the invention provides an equipment fault mode extraction and identification method based on text mining technology, with the following implementation steps:
Step one: preprocessing text data;
Chinese text differs from English text: it has no spaces between words and contains a large amount of useless information. To mine the text effectively, the maintenance text is therefore first cleaned and preprocessed to obtain structured text data. The specific flow is as follows:
(1) Unifying data encoding formats;
Chinese text is often stored in several different encoding formats, which severely reduces data processing efficiency. Because UTF-8 is widely supported across programming languages, the invention adopts UTF-8 as the standard encoding and converts all Chinese text to it.
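The encoding unification described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the list of candidate source encodings and the function name `to_utf8` are assumptions, and trying encodings in order is only a heuristic that can misidentify short inputs.

```python
from typing import Iterable

# Candidate source encodings, tried in order. This list is an assumption for
# illustration; the source only states that texts arrive in mixed encodings.
CANDIDATE_ENCODINGS = ("utf-8", "gb18030", "big5")


def to_utf8(raw: bytes, candidates: Iterable[str] = CANDIDATE_ENCODINGS) -> bytes:
    """Decode raw bytes with the first candidate encoding that succeeds,
    then re-encode the resulting text as UTF-8."""
    for name in candidates:
        try:
            text = raw.decode(name)
        except UnicodeDecodeError:
            continue  # this encoding does not fit; try the next one
        return text.encode("utf-8")
    raise ValueError("none of the candidate encodings could decode the input")


if __name__ == "__main__":
    legacy = "舵机故障".encode("gbk")  # a record saved in a legacy encoding
    print(to_utf8(legacy).decode("utf-8"))
```

Placing `utf-8` first means files that are already in the target encoding pass through unchanged.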
(2) Removing useless characters;
Punctuation marks are normally used to break Chinese text into sentences, Arabic numerals carry model information and some quantitative information, and English letters carry model-specific component information. All characters in the maintenance text other than punctuation marks, Arabic numerals, and English letters are therefore removed.
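A character filter of this kind can be sketched with a regular expression. The sketch assumes that Chinese characters themselves are retained (they are the text being mined) and that a small, hand-picked punctuation set is sufficient; both the punctuation set and the name `strip_useless_chars` are illustrative choices, not taken from the source.

```python
import re

# Matches every character that is NOT in the keep-set:
#   - CJK ideographs U+4E00..U+9FFF (assumed to be retained),
#   - ASCII letters and digits,
#   - a small illustrative set of Chinese and ASCII punctuation, plus the hyphen.
_DROP = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9，。；：、！？,.;:!?\-]")


def strip_useless_chars(text: str) -> str:
    """Remove whitespace, symbols, and other characters outside the keep-set."""
    return _DROP.sub("", text)


if __name__ == "__main__":
    print(strip_useless_chars("舵机★失灵，型号 AB-12#。\t"))
```

Keeping the hyphen is a design choice so that model identifiers such as `AB-12` survive intact.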
(3) Word segmentation and part-of-speech tagging;
Word segmentation and part-of-speech tagging of the maintenance record text are performed with Jieba. To improve the segmentation quality and the accuracy of part-of-speech tagging, the stop-word list designed by Harbin Institute of Technology and the vocabulary built into Jieba are combined to serve as the stop-word list and custom dictionary.
Applying the above steps to the maintenance record text of the equipment system yields the preprocessing results for the fault phenomenon and fault mode data, which serve as input to the subsequent fault mode extraction and identification.
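The stop-word filtering that follows segmentation can be sketched in plain Python. The sketch assumes the text has already been split into tokens by a segmenter such as Jieba (for example with `jieba.lcut`), and the two-word stop list in the usage example is a hypothetical stand-in for the Harbin Institute of Technology list.

```python
from typing import Iterable, List


def remove_stop_words(tokens: Iterable[str], stop_words: Iterable[str]) -> List[str]:
    """Drop tokens that are empty, whitespace-only, or present in the stop list."""
    stop = set(stop_words)  # set lookup keeps the filter linear in the token count
    return [t for t in tokens if t.strip() and t not in stop]


if __name__ == "__main__":
    # Tokens as a segmenter might return them (hypothetical example).
    tokens = ["舵机", "的", "电路", " ", "短路"]
    print(remove_stop_words(tokens, ["的", "了"]))
```

The filtered token lists for each maintenance record are what the later vectorization step consumes.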
Step two: clustering fault modes;
(1) Fault mode word vector conversion;
Text data is unstructured and cannot be processed numerically by a computer directly; it must first be converted into vectors that express the semantics of the text. Word vector conversion is therefore first performed on the fault mode text using the TF-IDF algorithm, as follows:
(a) The term frequency is first extracted from the word segmentation result. It reflects how often a word appears in a document and, to remove the influence of document length, is generally defined as:
$$\mathrm{TF}(t,d)=\frac{df(t)}{N} \tag{1}$$
where TF(t, d) is the term frequency of word t in document d, df(t) is the number of times word t appears in the document, and N is the total number of words in the document.
(b) The inverse document frequency is then calculated; it represents the importance of the word. The more documents a word appears in, the larger the denominator and the closer the inverse document frequency is to 0. In its standard form it is calculated as:
$$\mathrm{IDF}(t)=\log\frac{n}{n_t} \tag{2}$$
where IDF(t) is the inverse document frequency of word t, n is the total number of documents in the corpus, and n_t is the number of documents that contain word t.
(c) The TF-IDF value is calculated as:
$$\text{TF-IDF}(t)=\mathrm{TF}(t,d)\times\mathrm{IDF}(t) \tag{3}$$
Applying these steps to the preprocessed fault mode text yields the fault mode feature matrix, which is the input to the subsequent fault mode clustering.
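The TF-IDF conversion can be sketched in plain Python as below. The sketch assumes the unsmoothed logarithmic form of the inverse document frequency, log(n / number of documents containing the word); the exact variant used in the patent is not recoverable from the text, and libraries such as scikit-learn apply smoothing and normalization by default, so their values would differ. The function name `tfidf_matrix` is illustrative.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def tfidf_matrix(docs: Sequence[Sequence[str]]) -> Tuple[List[str], List[List[float]]]:
    """Turn tokenized documents into a TF-IDF feature matrix.

    Returns the sorted vocabulary and one row of weights per document.
    """
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    # Number of documents in which each word occurs at least once.
    doc_freq = Counter(t for d in docs for t in set(d))
    rows = []
    for d in docs:
        counts = Counter(d)
        total = len(d)
        row = []
        for t in vocab:
            tf = counts[t] / total if total else 0.0   # term frequency
            idf = math.log(n / doc_freq[t])            # assumed unsmoothed IDF
            row.append(tf * idf)
        rows.append(row)
    return vocab, rows


if __name__ == "__main__":
    vocab, matrix = tfidf_matrix([["电路", "短路"], ["电路", "断路"]])
    for row in matrix:
        print(dict(zip(vocab, row)))
```

A word that occurs in every document receives weight 0 under this form, which matches the statement that frequent words have an inverse document frequency close to 0.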
(2) K-Means based fault mode clustering;
The fault mode feature matrix obtained in step (1) is clustered with the K-Means algorithm to obtain the potential fault mode categories. The specific process is as follows:
(a) Select the number of categories k and select k center points;
(b) For each sample point, find the nearest center point, and group the points that share the same nearest center into one class; this completes one round of clustering;
(c) Check whether the class assignments of the sample points are the same before and after this round; if so, stop the algorithm; otherwise, go to step (d);
(d) For each class, compute the centroid of its sample points and take it as the new center point of the class; then return to step (b);
These steps yield the potential fault mode types of the equipment, which are used as the class labels for the corresponding fault phenomenon feature matrix.
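Steps (a) to (d) above can be sketched in plain Python as follows. The random choice of initial centers, the squared Euclidean distance, the iteration cap, and the function name `kmeans` are assumptions made for the illustration; the source does not specify them.

```python
import random
from typing import List, Sequence, Tuple


def kmeans(points: Sequence[Sequence[float]], k: int, seed: int = 0,
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Cluster points into k groups; returns (labels, centers)."""
    rng = random.Random(seed)
    # (a) pick k distinct sample points as the initial centers (assumed strategy)
    centers = [list(p) for p in rng.sample(list(points), k)]
    labels = None
    for _ in range(max_iter):
        # (b) assign every point to its nearest center
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # (c) stop when the assignment no longer changes
        if new_labels == labels:
            break
        labels = new_labels
        # (d) move each center to the mean of its members
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


if __name__ == "__main__":
    labels, centers = kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], k=2)
    print(labels, centers)
```

In practice the rows of the fault mode feature matrix would be passed as `points`, and the returned labels would serve as the fault mode class labels.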
Step three: machine learning-based fault pattern recognition;
Once fault mode clustering is complete, staff who observe a fault phenomenon should be able to identify the fault mode type quickly so that maintenance can be managed efficiently. Based on the fault phenomena corresponding to each fault mode, the invention mines and establishes the relationship between the two and provides effective maintenance guidance for the relevant staff. The specific operations are as follows:
(1) Word vector conversion of fault phenomena;
To process the fault phenomena efficiently, they must also be converted into vectors that express the semantics of the text. In this operation, TF-IDF is again applied to the preprocessed fault phenomenon data for word vector conversion and feature extraction. The procedure is the same as operation (1) of step two and is not repeated here.
This word vector conversion yields the fault phenomenon feature matrix, which is the input for building the subsequent fault mode identification framework.
(2) Constructing a fault mode identification framework;
After the fault phenomenon feature matrix has been extracted, it is combined with the corresponding fault mode types, and a fault mode identification framework is built on machine learning classification algorithms. Classifiers commonly used in machine learning include KNN, SVM, decision tree, naive Bayes, random forest, and AdaBoost. The invention therefore builds the framework on each of these six classifiers in order to find the best-performing fault mode identification framework.
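As an illustration of one of the six classifiers named above, a minimal k-nearest-neighbours predictor can be written in plain Python. The squared Euclidean distance, the default of three neighbours, and the name `knn_predict` are assumptions; the source does not give classifier settings.

```python
from collections import Counter
from typing import Hashable, Sequence


def knn_predict(train_x: Sequence[Sequence[float]], train_y: Sequence[Hashable],
                query: Sequence[float], k: int = 3) -> Hashable:
    """Predict the label of `query` by majority vote among its k nearest
    training samples, using squared Euclidean distance."""
    # Pair each training sample's distance to the query with its label.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), label)
        for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy feature rows standing in for fault phenomenon vectors,
    # with cluster labels standing in for fault mode types.
    features = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
    modes = [0, 0, 0, 1, 1, 1]
    print(knn_predict(features, modes, [0.2, 0.1]))
```

The same feature and label inputs would be fed to the other five classifiers so that their results can be compared on equal terms.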
To verify and compare the classifiers, the evaluation indexes most commonly used in machine learning classification are adopted, namely accuracy, recall, and F1-score, defined as follows:
$$\mathrm{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN}$$
$$\mathrm{Recall}=\frac{TP}{TP+FN}$$
$$\mathrm{Precision}=\frac{TP}{TP+FP}$$
$$F1\text{-score}=\frac{2\times\mathrm{Precision}\times\mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}$$
where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives. Accuracy is the proportion of correctly classified test cases among all test cases; Recall is the proportion of correctly classified positive examples among all actual positive examples; Precision is the proportion of correctly classified positive examples among all examples classified as positive; and F1-score is the harmonic mean of recall and precision, so it evaluates the two together.
With the fault phenomenon feature matrix as the feature input and the corresponding fault mode type as the class label output, a fault mode identification framework is built on each of the six machine learning classification algorithms in turn. Finally, the prediction performance of each framework is verified using accuracy, recall, and F1-score.
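The evaluation indexes can be computed with a short plain-Python function. Because the fault mode labels form more than two classes, the sketch treats each class in turn as the positive class and averages the per-class values (macro averaging); that averaging choice and the name `evaluate` are assumptions, as the source does not state how the multi-class case is handled.

```python
from typing import Dict, Hashable, Sequence


def evaluate(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Dict[str, float]:
    """Return accuracy plus macro-averaged recall, precision, and F1-score."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    recalls, precisions, f1s = [], [], []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in pairs)  # true positives for class c
        fn = sum(t == c and p != c for t, p in pairs)  # false negatives
        fp = sum(t != c and p == c for t, p in pairs)  # false positives
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        recalls.append(recall)
        precisions.append(precision)
        f1s.append(f1)
    m = len(recalls)
    return {
        "accuracy": accuracy,
        "recall": sum(recalls) / m,
        "precision": sum(precisions) / m,
        "f1": sum(f1s) / m,
    }


if __name__ == "__main__":
    print(evaluate([0, 0, 1, 1], [0, 1, 1, 1]))
```

Running the same function on the predictions of each classifier gives directly comparable scores.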
The invention has the advantages that:
(1) To address the scarcity of usable quantitative data during equipment development and the resulting difficulty of fault mode analysis, the invention makes full use of maintenance text, applies an intelligent extraction algorithm and machine learning, and provides an equipment fault mode extraction and identification method based on text mining technology. The method establishes the correspondence between fault phenomena and fault modes, providing a reference for rapid fault location and maintenance scheduling decisions.
(2) The method is grounded in engineering practice. Its models are simple to build and easy to optimize and train, it requires no intervention based on expert experience, it is convenient for engineering staff to apply, and it is scientific and well specified.
Drawings
Fig. 1 is a flow chart of the method of the present invention.
Fig. 2 is a word cloud of the fault mode text of the steering engine equipment.
Fig. 3 shows the cluster analysis result for the fault modes of the steering engine equipment.
Detailed Description
The invention relates to an equipment fault mode extraction and identification method based on text mining technology; its technical flow chart is shown in Fig. 1. The specific implementation steps are described in detail below, taking the steering engine maintenance text data of a certain type of commercial ship as an example.
Step one: preprocessing text data;
The text preprocessing results for the fault phenomena and fault modes of the steering engine equipment are obtained by unifying data encoding formats, removing useless characters, segmenting words, and tagging parts of speech. They are shown in Table 1 and Table 2, respectively.
Table 1. Data preprocessing results for fault phenomena
Table 2. Data preprocessing results for fault modes
To visualize the word segmentation of the fault text more intuitively, a word cloud is generated for the fault mode text, as shown in Fig. 2.
The data preprocessing results are then used as input for the subsequent fault mode extraction and identification.
Step two: clustering fault modes;
First, the TF-IDF algorithm is applied to the preprocessed fault mode data in Table 2 for word vector conversion, yielding the fault mode feature matrix shown in Table 3.
Table 3. Fault mode feature matrix
In the table, the numbers 1 to 58 denote the 58 fault mode descriptions recorded in the maintenance text, and the remaining data are the extracted features. Based on this information, fault mode cluster analysis can be performed to extract the main fault mode types.
Next, the fault mode feature matrix is clustered with the K-Means algorithm to obtain the potential fault mode categories. Experiments show that when k is 4, every category contains at least two maintenance records. A total of 4 fault mode types is therefore set; the fault mode represented by each type and the numbers of the maintenance records it contains are shown in Table 4. To display the clustering visually, t-SNE is used to reduce the dimensionality of the features, and the result is shown in Fig. 3. The clustering model achieves a good clustering effect.
Table 4. Fault mode cluster analysis results
Combining Fig. 3 and Table 4, the main fault modes of the steering engine equipment fall into two major types: system faults caused by circuit problems and faults caused by system hardware problems. Faults caused by external environmental interference and faults caused by system software problems also account for a considerable proportion.
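The criterion mentioned above for k, that every category should contain at least two maintenance records, can be checked with a small helper. This is a sketch of one way to apply that criterion; how the authors actually searched over k is not described, and the names `smallest_cluster_size` and `acceptable_ks` as well as the toy labelings are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def smallest_cluster_size(labels: Sequence[Hashable]) -> int:
    """Size of the least populated cluster in one clustering result."""
    return min(Counter(labels).values())


def acceptable_ks(labelings: Dict[int, Sequence[Hashable]], min_size: int = 2) -> List[int]:
    """Given cluster labels for several values of k, return the k values
    for which every cluster has at least `min_size` members."""
    return [k for k, labels in sorted(labelings.items())
            if smallest_cluster_size(labels) >= min_size]


if __name__ == "__main__":
    # Hypothetical labelings of six records for k = 2, 3, 4.
    labelings = {
        2: [0, 0, 0, 1, 1, 1],
        3: [0, 0, 1, 1, 2, 2],
        4: [0, 0, 1, 1, 2, 3],
    }
    print(acceptable_ks(labelings))
```

With labelings produced by running the clustering for a range of k values, the helper lists the candidates that satisfy the minimum-size rule.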
Step three: machine learning-based fault pattern recognition;
First, as with the fault mode text, the TF-IDF algorithm is applied to the preprocessed fault phenomenon data in Table 1 for word vector conversion, yielding the fault phenomenon feature matrix shown in Table 5.
Table 5. Fault phenomenon feature matrix
In Table 5, the numbers 1 to 58 denote the 58 fault phenomenon descriptions recorded in the maintenance text, and the data in the table form the extracted fault phenomenon feature matrix.
Then, with the fault phenomenon feature matrix as input and the fault mode clustering result as the class output, fault mode identification frameworks are built on six common machine learning algorithms. The classification results are shown in Table 6.
Table 6. Classification results of the six steering engine fault mode identification frameworks
The frameworks based on KNN, SVM, decision tree, naive Bayes, and random forest all perform very well. Those based on decision tree and random forest achieve the best classification results, reaching 100%. A decision tree or random forest classification algorithm is therefore recommended for building the steering engine fault mode identification framework.

Claims (9)

1. An equipment fault mode extraction and identification method based on text mining technology, characterized in that the method comprises the following steps:
step one: preprocessing text data; performing data cleaning and preprocessing on the maintenance text to obtain structured text data, including: unifying data encoding formats, removing useless characters, word segmentation, and part-of-speech tagging;
step two: clustering fault modes; including: fault mode word vector conversion and K-Means based fault mode clustering;
step three: machine-learning-based fault mode identification; including: fault phenomenon word vector conversion and construction of a fault mode identification framework.
2. The text mining technology-based equipment fault mode extraction and identification method according to claim 1, wherein: UTF-8 is taken as the standard encoding format, and all Chinese text is converted to it.
3. The text mining technology-based equipment fault mode extraction and identification method according to claim 1, wherein: all characters in the maintenance text other than punctuation marks, Arabic numerals, and English letters are removed.
4. The text mining technology-based equipment fault mode extraction and identification method according to claim 1, wherein: the stop-word list designed by Harbin Institute of Technology and the vocabulary built into Jieba are used as the stop-word list and custom dictionary.
5. The text mining technology-based equipment fault mode extraction and identification method according to claim 1, 2, 3, or 4, wherein: word vector conversion is performed on the fault mode text based on the TF-IDF algorithm, with the following specific steps:
(a) The term frequency is extracted from the word segmentation result; it reflects how often a word appears in a document and, to remove the influence of document length, is defined as:
$$\mathrm{TF}(t,d)=\frac{df(t)}{N} \tag{1}$$
where TF(t, d) is the term frequency of word t in document d, df(t) is the number of times word t appears in the document, and N is the total number of words in the document;
(b) The inverse document frequency, which represents the importance of the word, is calculated; the more documents a word appears in, the larger the denominator and the closer the inverse document frequency is to 0; in its standard form it is calculated as:
$$\mathrm{IDF}(t)=\log\frac{n}{n_t} \tag{2}$$
where IDF(t) is the inverse document frequency of word t, n is the total number of documents in the corpus, and n_t is the number of documents that contain word t;
(c) The TF-IDF value is calculated as:
$$\text{TF-IDF}(t)=\mathrm{TF}(t,d)\times\mathrm{IDF}(t) \tag{3}$$
6. The text mining technology-based equipment fault mode extraction and identification method according to claim 5, wherein: the fault mode feature matrix is clustered with the K-Means clustering algorithm to obtain the potential fault mode categories, with the following specific process:
(a) Select the number of categories k and select k center points;
(b) For each sample point, find the nearest center point, and group the points that share the same nearest center into one class, completing one round of clustering;
(c) Check whether the class assignments of the sample points are the same before and after this round; if so, stop the algorithm; otherwise, go to step (d);
(d) For each class, compute the centroid of its sample points and take it as the new center point of the class; then return to step (b).
7. The text mining technology-based equipment fault mode extraction and identification method according to claim 1, wherein: after the fault phenomenon feature matrix has been extracted, it is combined with the corresponding fault mode types, and a fault mode identification framework is built on machine learning classification algorithms.
8. The text mining technology-based equipment fault mode extraction and identification method according to claim 7, wherein: the machine learning classifiers comprise KNN, SVM, decision tree, naive Bayes, random forest, and AdaBoost.
9. The text mining technology-based equipment fault mode extraction and identification method according to claim 8, wherein: verification is performed with accuracy, recall, and F1-score, as follows:
$$\mathrm{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN}$$
$$\mathrm{Recall}=\frac{TP}{TP+FN}$$
$$\mathrm{Precision}=\frac{TP}{TP+FP}$$
$$F1\text{-score}=\frac{2\times\mathrm{Precision}\times\mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}$$
where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives; Accuracy is the proportion of correctly classified test cases among all test cases; Recall is the proportion of correctly classified positive examples among all actual positive examples; Precision is the proportion of correctly classified positive examples among all examples classified as positive; and F1-score is the harmonic mean of recall and precision, so it evaluates the two together.
CN202310991214.9A 2023-08-08 2023-08-08 Equipment fault mode extraction and identification method based on text mining technology Pending CN117131859A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310991214.9A CN117131859A (en) 2023-08-08 2023-08-08 Equipment fault mode extraction and identification method based on text mining technology

Publications (1)

Publication Number Publication Date
CN117131859A true CN117131859A (en) 2023-11-28

Family

ID=88861940

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination