CN111858270A - Interlocking system fault positioning method based on data mining algorithm - Google Patents

Publication number
CN111858270A
Authority
CN
China
Legal status
Pending
Application number
CN202010475231.3A
Other languages
Chinese (zh)
Inventor
黄鲁江
成燚
Current Assignee
Casco Signal Ltd
Original Assignee
Casco Signal Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G06F11/3476 Data logging
    • G06F11/3447 Performance evaluation by modeling
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Hardware Design (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to an interlocking system fault positioning method based on a data mining algorithm, comprising the following steps: step 1) obtaining a fault log; step 2) extracting characteristic variables and a target variable; step 3) data processing; step 4) algorithm selection; step 5) model training and evaluation; step 6) obtaining the importance of the characteristic variables; and step 7) determining the fault cause. Compared with the prior art, the method greatly reduces the workload of engineers and improves working efficiency.

Description

Interlocking system fault positioning method based on data mining algorithm
Technical Field
The invention relates to a fault positioning method for an interlocking system, in particular to a fault positioning method for an interlocking system based on a data mining algorithm.
Background
Data mining is the process of discovering potentially valuable information or knowledge hidden in large, complex, noisy, or even incomplete data. Data mining algorithms have been widely used in commercial fields such as retail, insurance, finance, healthcare and transportation, and in industrial fields such as aerospace, electric power and machinery manufacturing. Data mining technology is also gradually being explored in railway signal systems, but it has not yet been applied to fault diagnosis or fault positioning of the interlocking system. Ensemble learning is an important class of algorithms in data mining technology.
The computer interlocking system is the core control equipment that ensures train operation safety in a railway signal system, and its reliable and stable operation is a prerequisite for train operation. Complex faults in computer interlocking systems are generally characterized by ambiguity, coupling and the like. At present, such faults are still located by manually checking a large number of log records; this approach depends on the experience and knowledge of the analyst, takes a large amount of time, and is inefficient.
Fault analysis of a computer interlocking system is a complex systems-engineering task involving various troubleshooting means and processing methods, and it is unrealistic to cover all fault analysis with a single algorithm.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide an interlocking system fault positioning method based on a data mining algorithm.
The purpose of the invention can be realized by the following technical scheme:
An interlocking system fault positioning method based on a data mining algorithm comprises the following steps:
step 1) obtaining a fault log;
step 2) extracting characteristic variables and a target variable;
step 3) data processing;
step 4) algorithm selection;
step 5) model training and evaluation;
step 6) obtaining the importance of the characteristic variables;
step 7) determining the fault cause.
Preferably, obtaining the fault log in step 1) specifically comprises:
acquiring database records covering a period of time in which dual-machine out-of-sync faults occurred multiple times, and dividing all fault logs by minute to obtain multiple groups of fault records.
Preferably, extracting the characteristic variables and the target variable in step 2) specifically comprises:
determining, from an analysis of the mechanism of dual-machine out-of-sync faults, which variables are characteristic variables and which variable is the target variable;
obtaining 18 characteristic variables and one target variable.
Preferably, the target variable is whether an out-of-sync alarm occurs.
Preferably, the 18 characteristic variables include acquisition information, drive information, station yard indication information and network state information.
Preferably, the data processing in step 3) specifically comprises:
(1) data type analysis;
(2) missing value handling;
(3) deletion of features with a variance of 0;
(4) outlier handling;
(5) deletion of all-0 value data;
(6) sample equalization;
(7) normalization;
(8) data set partitioning.
Preferably, the algorithm selection in step 4) is specifically: selecting the decision tree (DT), random forest (RF) or XGBT algorithm.
Preferably, the model training and evaluation in step 5) specifically comprises:
training each of the three algorithms on the training data set, and evaluating each of them on the test data set.
Preferably, the evaluation is specifically:
evaluating the three algorithms using three evaluation indices: recall, precision and F1 score.
Preferably, obtaining the importance of the characteristic variables in step 6) specifically comprises:
obtaining, from the feature importance scores, the characteristic variable that has the greatest influence on the target variable.
Compared with the prior art, the invention has the following advantages:
1. A data mining algorithm is applied to fault analysis of the computer interlocking system for the first time, providing a new approach and a feasible method for intelligent fault analysis of the computer interlocking system.
2. Feature selection in data mining is normally a series of operations performed to improve the accuracy of an algorithm; the method instead uses the feature selection algorithm as a fault positioning and screening strategy, which is a process of inferring the fault cause backwards from the fault result.
3. For the positioning and troubleshooting of one class of complex faults, the invention provides an intelligent method that saves time and labor and offers a certain degree of automation, thereby greatly reducing the workload of engineers and improving working efficiency.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic diagram of feature importance scores for a decision tree algorithm;
FIG. 3 is a schematic diagram of feature importance scores for a random forest algorithm;
FIG. 4 is a schematic diagram of feature importance scores for the XGBT algorithm.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, shall fall within the scope of protection of the present invention.
The present invention will be described in detail below with reference to FIG. 1 and an actual single fault type (the dual-machine out-of-sync fault).
1 Obtaining fault logs
First, database records covering a period of time (one month) in which dual-machine out-of-sync faults occurred multiple times are obtained, and all fault logs are divided by minute, giving 30 × 24 × 60 = 43,200 groups of fault records.
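The group count follows from 30 days × 24 hours × 60 minutes. The following is a minimal sketch of minute-level grouping, assuming each log record carries a timestamp and a message; the record layout and the helper name `bucket_by_minute` are hypothetical and are not taken from the patent.

```python
from collections import defaultdict
from datetime import datetime

# 30 days x 24 hours x 60 minutes, the one-month window described above.
MINUTES_IN_WINDOW = 30 * 24 * 60


def bucket_by_minute(records):
    """Group (timestamp, message) records by the minute they fall in.

    The (datetime, str) record layout is an assumption for illustration.
    """
    buckets = defaultdict(list)
    for timestamp, message in records:
        # Truncate the timestamp to the start of its minute.
        minute_key = timestamp.replace(second=0, microsecond=0)
        buckets[minute_key].append(message)
    return dict(buckets)


# Hypothetical log records, not real interlocking log content.
records = [
    (datetime(2020, 5, 1, 8, 0, 5), "signal indication changed"),
    (datetime(2020, 5, 1, 8, 0, 42), "out-of-sync alarm"),
    (datetime(2020, 5, 1, 8, 1, 3), "turnout driven"),
]
groups = bucket_by_minute(records)
```

Each key of `groups` then corresponds to one group of fault records in the sense used above.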
2 Extracting characteristic variables and target variables
A log file of the computer interlocking system stores a large amount of variable information. The large volume of unclassified data must first be classified according to domain understanding, and the characteristic variables to be used are then determined. The information that may affect an out-of-sync fault includes acquisition information, drive information, station yard indication information and network state information; the first three types can be further classified into turnout information, signal information, track information, system information and the like.
A total of 18 characteristic variables and one target variable are obtained. The target variable is whether an out-of-sync alarm occurs; information such as the indication, control and drive states of turnouts, signals and track circuits serves as the characteristic variables. 43,200 groups of statistical data are then extracted from the 43,200 groups of fault log data.
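One way to picture this extraction is sketched below: each one-minute group of messages becomes one row of feature statistics plus a binary target. The patent does not state how the statistics are computed, so counting keyword occurrences is purely an assumption, as are the helper `summarize_group`, the feature name `P_Out` and the message strings (`S_In` is the only variable name that appears in the description).

```python
def summarize_group(messages, feature_keywords, alarm_keyword="out-of-sync alarm"):
    """Turn one minute of log messages into (feature_counts, target).

    feature_keywords maps a feature name to the substring assumed to mark
    it in a message; the target is 1 if the alarm keyword appears at all.
    """
    features = {
        name: sum(keyword in message for message in messages)
        for name, keyword in feature_keywords.items()
    }
    target = int(any(alarm_keyword in message for message in messages))
    return features, target


# Hypothetical keyword mapping and messages, for illustration only.
feature_keywords = {"S_In": "signal indication", "P_Out": "turnout driven"}
messages = [
    "signal indication changed",
    "signal indication changed",
    "out-of-sync alarm",
]
features, target = summarize_group(messages, feature_keywords)
```

In the method described here there would be 18 such feature columns per row rather than two.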
3 Data processing
The large volume of log files is converted into statistical data that the algorithms can use, and the data are screened, filtered and grouped. The 43,200 groups of data are processed in the following steps:
(1) Data type analysis
(2) Missing value handling
(3) Deletion of features with a variance of 0
(4) Outlier handling
(5) Deletion of all-0 value data
(6) Sample equalization
(7) Normalization
(8) Data set partitioning
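Several of the steps above can be sketched with the standard library alone. The patent does not name the concrete techniques, so the choices here (random oversampling for sample equalization, min-max scaling for normalization, a shuffled split for partitioning) and all function names are assumptions for illustration.

```python
import random


def drop_zero_variance(rows):
    """Step (3): remove feature columns whose value never changes."""
    keep = [j for j in range(len(rows[0])) if len({row[j] for row in rows}) > 1]
    return [[row[j] for j in keep] for row in rows], keep


def drop_all_zero(rows, labels):
    """Step (5): remove samples whose features are all 0."""
    kept = [(r, y) for r, y in zip(rows, labels) if any(v != 0 for v in r)]
    return [r for r, _ in kept], [y for _, y in kept]


def oversample_minority(rows, labels, rng):
    """Step (6): balance classes by re-drawing minority samples."""
    by_class = {}
    for r, y in zip(rows, labels):
        by_class.setdefault(y, []).append(r)
    target_size = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target_size - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([y] * target_size)
    return out_rows, out_labels


def min_max_normalize(rows):
    """Step (7): scale each column to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [
        [(v - lo) / sp if sp else 0.0 for v, lo, sp in zip(row, lows, spans)]
        for row in rows
    ]


def train_test_split(rows, labels, test_ratio, rng):
    """Step (8): shuffle indices and cut into training and test sets."""
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    cut = int(len(indices) * (1 - test_ratio))
    train, test = indices[:cut], indices[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


# Toy data: column 0 is constant, the second sample is all zeros.
rng = random.Random(0)
rows = [[0, 3, 1], [0, 0, 0], [0, 5, 1], [0, 1, 4], [0, 2, 2]]
labels = [0, 0, 0, 1, 0]
rows, kept_columns = drop_zero_variance(rows)
rows, labels = drop_all_zero(rows, labels)
rows, labels = oversample_minority(rows, labels, rng)
rows = min_max_normalize(rows)
X_train, y_train, X_test, y_test = train_test_split(rows, labels, 0.25, rng)
```

The call order mirrors the listed steps, with equalization before partitioning. In practice resampling is often applied to the training split only, so that duplicated samples do not appear in both splits.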
4 Algorithm selection
Since the method needs to obtain feature importance during model training, the algorithm must be one that can produce feature importance scores. The decision tree (DT), random forest (RF) and XGBT algorithms are selected in this method.
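Tree-based models meet this requirement because every split reduces label impurity, and that reduction can be credited to the feature used for the split. The sketch below illustrates the idea with depth-one splits (stumps) and Gini impurity only; it is not the DT, RF or XGBT model itself, and the feature name `N_State` and the toy data are hypothetical. For full models, scikit-learn's tree estimators expose a `feature_importances_` attribute.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split_gain(values, labels):
    """Largest Gini impurity decrease over all thresholds on one feature."""
    parent = gini(labels)
    best = 0.0
    # Every distinct value except the largest is a candidate threshold.
    for threshold in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        best = max(best, parent - weighted)
    return best


def stump_importances(rows, labels, names):
    """Score each feature by its best single-split gain, normalized to sum to 1."""
    gains = [best_split_gain([r[j] for r in rows], labels) for j in range(len(names))]
    total = sum(gains) or 1.0
    return {name: g / total for name, g in zip(names, gains)}


# Toy data: the first feature separates the classes, the second does not.
rows = [[0, 1], [0, 0], [1, 1], [1, 0]]
labels = [0, 0, 1, 1]
scores = stump_importances(rows, labels, ["S_In", "N_State"])
```

Here the first feature receives the whole score because it alone separates the two classes.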
5 Model training and evaluation
The fault case is a binary classification problem with imbalanced samples. Three evaluation indices, namely recall, precision and F1 score, are used to evaluate the three algorithms, and the characteristic variable with the greatest influence on the target variable is obtained from the feature importance scores.
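With imbalanced samples, plain accuracy can look high even when most alarms are missed, which is why the positive-class indices are used. A minimal sketch of the three indices from true and predicted labels follows; the helper name and the label vectors are made up for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 means an out-of-sync alarm occurred in that minute.
y_true = [1, 0, 0, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

In this toy case there are two true positives, one false positive and one false negative, so all three indices equal 2/3.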
Each of the three algorithms was trained on the training data set and evaluated on the test data set; the results are shown in Table 1.
TABLE 1
[Table 1 is provided as an image in the original publication: Figure BDA0002515664450000051]
The feature importance scores are obtained while the models are trained. The five characteristic variables with the highest feature importance scores for each of the three algorithms are shown in FIGS. 2-4. In all three algorithms the highest-scoring characteristic variable is S_In (the indication-type variable of the signals), so S_In is determined to be the primary cause of the dual-machine out-of-sync fault.
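The final decision step can be sketched as ranking the features per model and checking whether all models agree on the top one. The scores below are illustrative placeholders, not the values behind FIGS. 2-4, and every name other than `S_In` is hypothetical.

```python
def top_features(scores, k=5):
    """Return the k feature names with the highest importance scores."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def consensus_top_feature(per_model_scores):
    """Return the feature ranked first by every model, or None if they differ."""
    leaders = {top_features(scores, k=1)[0] for scores in per_model_scores.values()}
    return leaders.pop() if len(leaders) == 1 else None


# Placeholder scores for illustration only; not results from the patent.
per_model_scores = {
    "DT": {"S_In": 0.41, "feature_b": 0.22, "feature_c": 0.12},
    "RF": {"S_In": 0.35, "feature_b": 0.18, "feature_c": 0.20},
    "XGBT": {"S_In": 0.38, "feature_b": 0.25, "feature_c": 0.10},
}
suspect = consensus_top_feature(per_model_scores)
```

Requiring agreement across models is a design choice of this sketch; when the leaders differ it returns `None` so that an engineer can review the rankings instead.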
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. An interlocking system fault positioning method based on a data mining algorithm, characterized by comprising the following steps:
step 1) obtaining a fault log;
step 2) extracting characteristic variables and a target variable;
step 3) data processing;
step 4) algorithm selection;
step 5) model training and evaluation;
step 6) obtaining the importance of the characteristic variables;
step 7) determining the fault cause.
2. The interlocking system fault positioning method based on the data mining algorithm according to claim 1, wherein obtaining the fault log in step 1) specifically comprises:
acquiring database records covering a period of time in which dual-machine out-of-sync faults occurred multiple times, and dividing all fault logs by minute to obtain multiple groups of fault records.
3. The interlocking system fault positioning method based on the data mining algorithm according to claim 1, wherein extracting the characteristic variables and the target variable in step 2) specifically comprises:
determining, from an analysis of the mechanism of dual-machine out-of-sync faults, which variables are characteristic variables and which variable is the target variable;
obtaining 18 characteristic variables and one target variable.
4. The interlocking system fault positioning method based on the data mining algorithm according to claim 3, wherein the target variable is whether an out-of-sync alarm occurs.
5. The interlocking system fault positioning method based on the data mining algorithm according to claim 3, wherein the 18 characteristic variables comprise acquisition information, drive information, station yard indication information and network state information.
6. The interlocking system fault positioning method based on the data mining algorithm according to claim 1, wherein the data processing in step 3) specifically comprises:
(1) data type analysis;
(2) missing value handling;
(3) deletion of features with a variance of 0;
(4) outlier handling;
(5) deletion of all-0 value data;
(6) sample equalization;
(7) normalization;
(8) data set partitioning.
7. The interlocking system fault positioning method based on the data mining algorithm according to claim 1, wherein the algorithm selection in step 4) is specifically: selecting the decision tree (DT), random forest (RF) or XGBT algorithm.
8. The interlocking system fault positioning method based on the data mining algorithm according to claim 1, wherein the model training and evaluation in step 5) specifically comprises:
training each of the three algorithms on the training data set, and evaluating each of them on the test data set.
9. The interlocking system fault positioning method based on the data mining algorithm according to claim 8, wherein the evaluation is specifically:
evaluating the three algorithms using three evaluation indices: recall, precision and F1 score.
10. The interlocking system fault positioning method based on the data mining algorithm according to claim 8, wherein obtaining the importance of the characteristic variables in step 6) specifically comprises:
obtaining, from the feature importance scores, the characteristic variable that has the greatest influence on the target variable.
CN202010475231.3A 2020-05-29 2020-05-29 Interlocking system fault positioning method based on data mining algorithm Pending CN111858270A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010475231.3A CN111858270A (en) 2020-05-29 2020-05-29 Interlocking system fault positioning method based on data mining algorithm

Publications (1)

Publication Number Publication Date
CN111858270A true CN111858270A (en) 2020-10-30

Family

ID=72985997

Country Status (1)

Country Link
CN (1) CN111858270A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101452287A (en) * 2007-12-08 2009-06-10 鞍山钢铁集团公司铁路运输公司 Railway signal microcomputer interlock system
CN103345207A (en) * 2013-05-31 2013-10-09 北京泰乐德信息技术有限公司 Mining analyzing and fault diagnosis system of rail transit monitoring data
CN104392756A (en) * 2014-10-08 2015-03-04 中国科学院合肥物质科学研究院 Reactor dynamic interlock system and method based on digital instrumentation and control system
CN107256219A (en) * 2017-04-24 2017-10-17 卡斯柯信号有限公司 Big data convergence analysis method applied to automatic train control system massive logs

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
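Liu Bohong et al.: "Research on a neural-network-based fault diagnosis expert system for interlocking systems", Railway Computer Application *
Ye Mingquan et al.: "Understanding XGBoost in Depth: Efficient Machine Learning Algorithms and Advanced Topics", 31 January 2020, Hefei: Anhui University Press *
Li Tao et al.: "Applications and Practice of Data Mining: Case Studies in the Big Data Era", 31 October 2013, Xiamen: Xiamen University Press *
Wang Haiyan: "Research and analysis of a railway station microcomputer interlocking fault diagnosis system based on data mining technology", Harbin Railway Science & Technology *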

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115185780A (en) * 2022-07-21 2022-10-14 宁夏正安诚意科贸有限公司 Data acquisition method and system based on industrial Internet
CN115185780B (en) * 2022-07-21 2023-10-24 北京国联视讯信息技术股份有限公司 Data acquisition method and system based on industrial Internet

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40032538; Country of ref document: HK)
RJ01 Rejection of invention patent application after publication (Application publication date: 20201030)