CN110825801A - Train signal system vehicle-mounted log analysis system and method based on distributed architecture - Google Patents

Publication number
CN110825801A
Authority
CN
China
Prior art keywords
data
log
analysis
vehicle
train
Legal status
Granted
Application number
CN201911076714.XA
Other languages
Chinese (zh)
Other versions
CN110825801B (en)
Inventor
谢飞
魏盛昕
程浩
李立
张奕男
朱存仁
付朗
杨辉
Current Assignee
Casco Signal Cherngdu Ltd
Original Assignee
Casco Signal Cherngdu Ltd
Application filed by Casco Signal Cherngdu Ltd
Priority to CN201911076714.XA
Publication of CN110825801A
Application granted
Publication of CN110825801B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/289 Object oriented databases

Abstract

The invention discloses a train signal system vehicle-mounted log big data analysis system and method based on a distributed database, and relates to the technical field of train data analysis.

Description

Train signal system vehicle-mounted log analysis system and method based on distributed architecture
Technical Field
The invention relates to the technical field of train data analysis, in particular to a train signal system vehicle-mounted log big data analysis system and method based on a distributed database.
Background
In urban rail transit, the signal system is key to guaranteeing operational safety and improving transport efficiency. Its vehicle-mounted equipment mainly comprises two subsystems: Automatic Train Operation (ATO) and Automatic Train Protection (ATP). The log data of the vehicle-mounted equipment records every running state of the equipment throughout train operation, together with fault alarms and other key information, and plays an important role in operating and maintaining the equipment.
In the data analysis and application process of the vehicle-mounted log of the existing signal system, the following problems mainly exist:
1. Analyzing massive vehicle-mounted logs: because the signal system has strict real-time requirements, each control end (located at a head of the train) of the ATO and ATP subsystems generates 10 log data packets per second. By estimate, one subway line with 40 trains generates nearly 100 million log records per day, which accumulates to roughly 30 billion records within one year. A traditional relational database, limited by single-machine storage space and computing capacity, cannot fully meet users' needs for analyzing and processing such massive log data.
2. Locating vehicle-mounted equipment faults: when a vehicle-mounted device or system module fails, the cause must be located. At present, subway companies mainly inspect the affected vehicle-mounted log files one by one with log viewing tools provided by signal system suppliers, and the whole process is inefficient and cumbersome.
3. Limitations of the signal equipment online monitoring system: the existing online monitoring system only reports alarm and fault information of the vehicle-mounted equipment and does not store the complete vehicle-mounted log. It therefore cannot diagnose and analyze alarms and faults intelligently, and cannot meet subway companies' need to move vehicle-mounted equipment maintenance in an intelligent direction.
4. Maintenance of network-level signal systems: in urban rail transit projects already in operation or under construction, signal systems come from different manufacturers, so each line often has its own Maintenance Support System (MSS). This creates serious information islands: maintenance information, tools and personnel cannot be shared across manufacturers, and faults in the network-level signal system cannot be located and repaired quickly.
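The volume estimate in item 1 can be checked with a short calculation. This is only a back-of-envelope sketch: the packet rate, the two subsystems and the 40 trains come from the text, while the number of control ends per train and the daily service hours are assumptions chosen for illustration.

```python
# Back-of-envelope estimate of daily and yearly log volume for one line.
TRAINS = 40                  # from the text
PACKETS_PER_SECOND = 10      # per control end, per subsystem (from the text)
SUBSYSTEMS = 2               # ATO and ATP (from the text)
CONTROL_ENDS_PER_TRAIN = 2   # assumption: one control end at each head of the train
SERVICE_HOURS_PER_DAY = 17   # assumption: hypothetical daily operating hours


def records_per_day(trains=TRAINS, hours=SERVICE_HOURS_PER_DAY):
    """Number of log packets produced by all trains in one operating day."""
    per_train_per_second = PACKETS_PER_SECOND * SUBSYSTEMS * CONTROL_ENDS_PER_TRAIN
    return trains * per_train_per_second * hours * 3600


daily = records_per_day()
yearly = daily * 365
print(f"{daily:,} records/day, {yearly:,} records/year")
```

Under these assumptions the line produces just under 100 million records per day and tens of billions per year, which is the scale the background section describes.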
Existing solutions for signal system vehicle-mounted log analysis mainly involve analysis algorithms for vehicle-mounted logs and log analysis systems based on big data technology.
The analysis algorithms for vehicle-mounted logs are mainly of two kinds: pattern recognition and fusion analysis. The pattern recognition algorithm parses the vehicle-mounted log (ATP log only), extracts the valid state data, and feeds those states into predefined behavior patterns for recognition and matching, ultimately recognizing and predicting subway faults. The fusion analysis algorithm defines basic data types and builds business models for the analyzable items in the massive logs of the train control system, collects log data according to open and standard principles, preprocesses and stores the log data according to the fusion analysis rules, and finally performs cross-system log correlation analysis based on the business model and displays the results visually. Although these algorithms can meet the needs of vehicle-mounted log analysis, the concrete analysis platforms still use a traditional relational database, so computing and storage performance cannot be guaranteed in the face of massive vehicle-mounted logs; an enterprise would have to build a distributed database suited to its own circumstances and carry out a large-scale data migration.
For example, the prior art discloses a big data fusion analysis method for the massive logs of an automatic train control system (publication number CN107256219A, published on October 17, 2017), comprising the following steps: (1) defining basic data types for the business-analyzable items in the system logs; (2) modeling the system fusion analysis business; (3) implementing a unified log collection process based on open and standard principles; (4) preprocessing and storing the log data according to the fusion analysis data processing rules; (5) performing cross-system log correlation analysis based on the business analysis model; (6) displaying the log analysis results visually through a unified interface. That method is stated to diagnose abnormalities between systems in time and to effectively reduce the workload of maintenance personnel.
Log analysis systems based on big data technology are mainly website log analysis systems built on the Hadoop platform. They use components such as Flume, Hive, HBase and Sqoop, include modules for file uploading, data cleaning, statistical analysis, data export and data display, and can achieve millisecond-level queries over massive data. For example, prior-art publication CN108123834A, published on June 5, 2018, discloses a log analysis system based on a distributed database, comprising a network data collector, a distributed real-time data transmission channel, a distributed log processing platform, a network data protocol feature library and a distributed database. Its main functions and processing flow are as follows: (1) data is transmitted through the distributed real-time data transmission channel: the network data collector captures network data packets on the network equipment and sends them to the distributed log processing platform as a real-time queue through the channel; (2) the distributed log processing platform processes the packets in real time: it parses the packets, matches data features against the network data protocol feature library, and sends network log data confirmed as abnormal to the distributed database for storage; (3) the distributed database performs cluster analysis and classification training on the network log data and dynamically updates the network data protocol feature library.
Although a Hadoop-based big data analysis platform can also store and analyze massive log data, its performance on the structured data of signal system vehicle-mounted logs is slightly inferior to that of a distributed database such as Greenplum, and building a complete Hadoop distributed platform and putting it into service imposes a relatively high cost on an enterprise.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a train signal system vehicle-mounted log analysis system based on a distributed architecture, offering an efficient and scalable solution for offline analysis of the massive vehicle-mounted logs of the train signal system.
The purpose of the invention is realized by the following technical scheme:
A train signal system vehicle-mounted log analysis system based on a distributed architecture, characterized in that the system comprises a data acquisition module, a data analysis module, a data storage module, a data cleaning module, a data statistical analysis module and a distributed database.
The data acquisition module and the data analysis module are built on message middleware with a Kafka + ZooKeeper architecture and are used to acquire and parse the vehicle-mounted log data of the train signal system.
To prevent an excessive number of data fields from reducing storage efficiency, the data analysis module, when parsing the vehicle-mounted log data collected by the data acquisition module, groups fields that are consecutive and belong to the same system module into one category, merges all fields of a category, in order, into one large field, and merges the corresponding data into one data block.
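The field merging rule can be illustrated with a minimal sketch. The field names, module labels, bit widths and the naming of the large fields are all hypothetical; the patent does not publish the actual log layout. The sketch only shows the idea of collapsing runs of consecutive same-module fields into one large field while recording how to split them again.

```python
from itertools import groupby

# Hypothetical field layout: (field name, system module, bit width).
FIELDS = [
    ("speed", "ATP", 16),
    ("eb_active", "ATP", 1),
    ("door_cmd", "ATO", 2),
    ("stop_flag", "ATO", 1),
]


def merge_fields(fields, bits):
    """Merge consecutive fields of the same module into one large field.

    `bits` is one record as a string of '0'/'1' characters laid out in
    field order. Returns (merged, rule): `merged` maps each large field
    to its bit block, `rule` records the original fields and widths.
    """
    merged, rule, offset = {}, {}, 0
    for index, (module, group) in enumerate(groupby(fields, key=lambda f: f[1])):
        group = list(group)
        width = sum(f[2] for f in group)
        name = f"{module}_{index}"  # hypothetical naming of the large field
        merged[name] = bits[offset:offset + width]
        rule[name] = [(f[0], f[2]) for f in group]
        offset += width
    return merged, rule


record = "0000000000101101" + "1" + "10" + "0"
merged, rule = merge_fields(FIELDS, record)
print(merged)
```

Four original fields become two columns here; with hundreds of fields per record, the same grouping would keep the column count of the original log table small.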
The data storage module is based on a GPSS (Greenplum Streaming Server) + Kafka + gpfdist architecture and loads the vehicle-mounted log data parsed by the data analysis module into the distributed database to form the original log table.
The data cleaning module processes the original log table after loading: following the field merging rule that the data analysis module applied during parsing, it splits each merged large field back into the original log fields in order, and converts the binary value of each field into the corresponding decimal or Boolean value to complete numerical conversion.
The data statistical analysis module performs statistical analysis on the cleaned log data: it aggregates the data by specific train operation events, calculates the corresponding key train operation and maintenance indexes, and stores the results in the distributed database.
The specific train operation events include arriving and stopping at a station, emergency braking (EB), beacon loss and train-ground wireless communication faults.
The key train operation and maintenance indexes include the station dwell time, the number and average duration of over-long stops, the stopping accuracy and the number of out-of-tolerance stops, the number and causes of emergency braking (EB) events, the number of beacon losses and the number of train-ground wireless communication faults.
The distributed database is a distributed cluster based on Greenplum. It provides the distributed data storage and computing platform for the data acquisition, data analysis, data storage, data cleaning and data statistical analysis modules; that is, it is responsible for storing and computing on the vehicle-mounted log data. It comprises a management (Master) node, a computing (Segment) node and a standby (Standby) node, and is the base platform on which the five modules are implemented.
Folders for storing vehicle-mounted log data are created on all three kinds of node. A MySQL database that stores acquisition and analysis records is installed on the management node; once MySQL is started, the data acquisition and analysis services are deployed in the system, and when newly uploaded vehicle-mounted log data is detected on the web server, an FTP download is started.
The management node does not store vehicle-mounted log data; it is responsible for parsing SQL, forming distributed tasks, collecting calculation results and managing the other nodes. The computing node and the standby node store the vehicle-mounted log data and execute the distributed tasks. The vehicle-mounted log data is stored in the distributed database with a random distribution policy, which avoids data skew.
Each node is configured with 2 CPUs with 8 cores, 32 GB of memory and 20 SAS hard disks, and the nodes are connected by a gigabit network. Two primary instances (Primary) and two mirror instances (Mirror) are deployed on each node, with mirrors cross-configured between nodes to improve the availability of the distributed cluster.
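As a rough illustration of the random distribution policy, the following sketch builds the DDL for one train's original log table. The table name and columns are hypothetical; `DISTRIBUTED RANDOMLY` is the Greenplum clause that spreads rows across segments without a distribution key, so no single key value can concentrate data on one segment.

```python
def raw_log_table_ddl(train_id, columns):
    """Build Greenplum DDL for one train's original log table.

    `columns` is a list of (name, SQL type) pairs. The table and column
    names are illustrative only, not taken from the patent.
    """
    cols = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TABLE raw_log_{train_id} (\n    {cols}\n)\n"
        "DISTRIBUTED RANDOMLY;"
    )


ddl = raw_log_table_ddl(
    "t01",
    [("log_time", "timestamp"), ("atp_block", "text"), ("ato_block", "text")],
)
print(ddl)
```

A hash distribution on a low-cardinality column such as the train number would be the obvious alternative, but with only a few dozen trains it could leave some segments far fuller than others.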
The data acquisition module and the data analysis module start their services only after a log topic (Topic) has been created for each train in the Kafka cluster and a corresponding GPSS instance has been created.
A train signal system vehicle-mounted log analysis method based on a distributed architecture, characterized by comprising the following steps:
A data acquisition step: first, a multi-threaded log scanning task is started on each node of the distributed cluster of the distributed database, and it periodically scans the vehicle-mounted log data folder of each train on the network-level log server. When an update is detected in a vehicle-mounted log data folder, the updated folder is locked and an FTP download task is created on the node running the scan task, completing the acquisition of the vehicle-mounted log data from the network-level log server to the local node.
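The scan, lock and download sequence of the acquisition step might look like the sketch below. It is a simplification under stated assumptions: folder state is represented as plain dictionaries of modification times, the lock is an in-memory set rather than a lock shared between cluster nodes, and the download task is only described, not executed.

```python
def updated_folders(previous, current):
    """Return folders whose newest modification time advanced since the last scan.

    Both arguments map a train's log folder to the latest modification
    timestamp seen in it (seconds since the epoch).
    """
    return sorted(
        folder
        for folder, mtime in current.items()
        if mtime > previous.get(folder, float("-inf"))
    )


def plan_downloads(previous, current, locked):
    """Lock each updated folder and return the download tasks to create."""
    tasks = []
    for folder in updated_folders(previous, current):
        if folder in locked:      # another scan task already claimed it
            continue
        locked.add(folder)        # lock before the download starts
        tasks.append({"folder": folder, "protocol": "ftp"})
    return tasks


previous = {"train01": 100.0, "train02": 100.0}
current = {"train01": 100.0, "train02": 160.0, "train03": 50.0}
locked = set()
tasks = plan_downloads(previous, current, locked)
print(tasks)
```

Locking before downloading is what keeps two scanning nodes from fetching the same folder twice; in a multi-node deployment the lock would have to live in a shared coordinator rather than in local memory.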
A data analysis step: the vehicle-mounted log data acquired to the local node in the data acquisition step is decompressed and parsed according to the message parsing rule, yielding vehicle-mounted log data classified by system module. The parsing results are encapsulated, per train, into Topic messages in JSON format and published to the Kafka cluster server (broker) of the data analysis module for storage. During parsing, the position and change of the timestamps in the log data are checked to determine whether packet loss has occurred that would make the parsed data abnormal. In other words, once a log file has been downloaded successfully, the data analysis module parses it and sends the result to a broker of the Kafka cluster for persistence. The broker is where the Kafka cluster stores the message data published by producers; each category of message is a topic, and Kafka removes stored messages automatically under its retention policy, after consumers have had the opportunity to read them.
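Two parts of this step lend themselves to a short sketch: checking timestamps for missing packets and packaging a parsed record as a per-train JSON message. The 100 ms packet period is an assumption derived from the 10 packets per second mentioned in the background; the topic naming scheme and JSON keys are hypothetical, and publishing to Kafka is left out.

```python
import json

EXPECTED_PERIOD_MS = 100  # assumption: 10 packets per second -> 100 ms apart


def find_gaps(timestamps_ms, period_ms=EXPECTED_PERIOD_MS):
    """Return (previous, current) timestamp pairs where packets appear missing."""
    return [
        (prev, cur)
        for prev, cur in zip(timestamps_ms, timestamps_ms[1:])
        if cur - prev > period_ms
    ]


def to_topic_message(train_id, module, timestamp_ms, merged_fields):
    """Package one parsed record as a (topic, JSON payload) pair."""
    topic = f"train-log-{train_id}"  # hypothetical topic naming
    payload = json.dumps(
        {"train": train_id, "module": module, "ts": timestamp_ms, **merged_fields},
        sort_keys=True,
    )
    return topic, payload


gaps = find_gaps([0, 100, 200, 500, 600])
topic, payload = to_topic_message("T01", "ATP", 0, {"ATP_0": "00000000001011011"})
print(gaps, topic, payload)
```

A production check would likely allow some jitter around the nominal period instead of flagging every spacing above exactly 100 ms.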
Parsing the log data according to the message parsing rule addresses the reduced storage efficiency caused by too many data fields: in the vehicle-mounted log data collected to the local node, fields that are consecutive and belong to the same system module are grouped into one category, all fields of a category are merged, in order, into one large field, and the corresponding data is merged into one data block.
A data storage step: a vehicle-mounted log data processing task is started, and GPSS, acting as a Kafka consumer in the data storage module, monitors the Topic messages stored in the Kafka cluster. When a Topic has new messages, the gpfdist program is started and writes the log data, with high concurrency and by way of a readable external table, into the original log table of the corresponding train. gpfdist is the parallel file distribution program that ships with Greenplum; it lets multiple instances write to the database quickly at the same time, and therefore provides high concurrency.
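The readable-external-table pattern mentioned in this step can be spelled out as SQL. The sketch below only generates the statements; the table names, host, port, file pattern and text format are hypothetical. In a GPSS deployment the streaming server would normally manage this loading from its own job configuration, so the code is meant to show the underlying mechanism, not the system's actual load path.

```python
def gpfdist_load_sql(train_id, host, port, file_pattern):
    """Return SQL statements that load gpfdist-served files into a raw log table."""
    target = f"raw_log_{train_id}"  # hypothetical table naming
    external = f"ext_{target}"
    return [
        # The external table reads rows in parallel from the gpfdist server.
        f"CREATE READABLE EXTERNAL TABLE {external} (LIKE {target})\n"
        f"LOCATION ('gpfdist://{host}:{port}/{file_pattern}')\n"
        "FORMAT 'TEXT' (DELIMITER '|');",
        # Each segment pulls its share of the data during this insert.
        f"INSERT INTO {target} SELECT * FROM {external};",
    ]


statements = gpfdist_load_sql("t01", "etl-host", 8081, "t01/*.txt")
for statement in statements:
    print(statement)
```

Because every segment instance reads from gpfdist directly, the load does not funnel through the management node, which is where the concurrency described in the text comes from.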
A data cleaning step: the vehicle-mounted log data loaded in the data storage step is held in the original log table. Following the field merging rule applied by the data analysis module during parsing, the data cleaning module splits each merged large field back into the original log fields in order, converts the binary value of each field into the corresponding decimal or Boolean value to complete numerical conversion, and writes the result into the intermediate log table.
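A minimal sketch of the cleaning step follows, assuming each large field is stored as a string of bits and that the merging rule lists the original fields with their bit widths and target types. The rule contents and field names are hypothetical.

```python
# Hypothetical split rule recorded at merge time:
# large field -> ordered list of (original field, bit width, kind).
RULE = {
    "ATP_0": [("speed", 16, "int"), ("eb_active", 1, "bool")],
    "ATO_1": [("door_cmd", 2, "int"), ("stop_flag", 1, "bool")],
}


def clean_row(raw_row, rule=RULE):
    """Split each merged bit block into original fields and convert the values."""
    cleaned = {}
    for large_field, parts in rule.items():
        block, offset = raw_row[large_field], 0
        for name, width, kind in parts:
            value = int(block[offset:offset + width], 2)  # binary -> decimal
            cleaned[name] = bool(value) if kind == "bool" else value
            offset += width
    return cleaned


raw = {"ATP_0": "00000000001011011", "ATO_1": "100"}
print(clean_row(raw))
```

The split walks the rule in the same order in which the fields were merged, which is why the merging rule has to be kept alongside the original log table.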
A data statistical analysis step: the intermediate log table produced by the data cleaning step is aggregated by specific train operation events to obtain the key indexes required for operating and maintaining the signal system, and the statistical results are stored in the distributed database.
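Counting indexes such as the number of emergency braking events or beacon losses could be derived as in the sketch below. It assumes cleaned rows sorted by time within each train, with one Boolean flag per event type; an occurrence is counted when the flag changes from False to True, so an event lasting many samples counts once. The row keys are hypothetical.

```python
from collections import Counter


def count_events(rows, flag):
    """Count occurrences of an event per train.

    `rows` are cleaned log rows sorted by time within each train; an
    occurrence is a transition of `flag` from False to True.
    """
    counts, last = Counter(), {}
    for row in rows:
        train = row["train"]
        if row[flag] and not last.get(train, False):
            counts[train] += 1
        last[train] = row[flag]
    return dict(counts)


rows = [
    {"train": "T01", "eb_active": False},
    {"train": "T01", "eb_active": True},
    {"train": "T01", "eb_active": True},
    {"train": "T01", "eb_active": False},
    {"train": "T01", "eb_active": True},
    {"train": "T02", "eb_active": False},
]
print(count_events(rows, "eb_active"))
```

In the described system this aggregation would run inside the distributed database; the Python version only makes the counting rule explicit.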
Compared with the prior art, the technical solution provided by the invention deploys the data acquisition and analysis tasks on every node in the cluster and synchronizes them through ZooKeeper, which coordinates the multiple acquisition and analysis tasks and avoids the performance bottleneck that single-machine downloading can cause. Log data is loaded using the gpfdist mechanism provided by Greenplum: the computing nodes of the distributed database load the data directly and concurrently, the load is balanced across nodes, highly concurrent loading is achieved, and the loading time for large data volumes is effectively shortened.
The invention also provides a field merging and splitting mechanism for the case where the number of log data fields exceeds the database limit: fields are merged during log parsing and split again during data cleaning, which improves loading efficiency and greatly shortens loading time. It further provides an aggregation algorithm for determining when the various time-series variables change during a train stop: by grouping on fields such as train number, train position, train speed and stop flag, and taking the maximum or minimum time at which each time-series variable changes, it accurately obtains the times of key events such as the train stopping, the train door/platform screen door open command, the doors opening, the door close command, the doors closing, the departure indicator lighting and the departure button being pressed, which improves the accuracy of the statistics to a certain extent.
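The minimum/maximum aggregation over time-series variables can be sketched as follows. The row keys (`train`, `station`, `ts`, `speed`, `stop_flag`, `door_open`) are hypothetical, and the sketch assumes the input window contains at most one stop per train and station; it is an illustration of the idea, not the patented algorithm.

```python
from collections import defaultdict


def stop_summaries(rows):
    """Aggregate cleaned rows into one summary per (train, station) stop."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["train"], row["station"])].append(row)

    summaries = {}
    for key, group in groups.items():
        stopped = [r["ts"] for r in group if r["stop_flag"] and r["speed"] == 0]
        doors = [r["ts"] for r in group if r["door_open"]]
        summaries[key] = {
            "stopped_at": min(stopped, default=None),        # first stopped sample
            "last_stopped_at": max(stopped, default=None),   # last stopped sample
            "doors_first_open_at": min(doors, default=None),
            "doors_last_open_at": max(doors, default=None),
        }
    return summaries


rows = [
    {"train": "T01", "station": "S1", "ts": 0, "speed": 5, "stop_flag": False, "door_open": False},
    {"train": "T01", "station": "S1", "ts": 1, "speed": 0, "stop_flag": True, "door_open": False},
    {"train": "T01", "station": "S1", "ts": 2, "speed": 0, "stop_flag": True, "door_open": True},
    {"train": "T01", "station": "S1", "ts": 3, "speed": 0, "stop_flag": True, "door_open": True},
    {"train": "T01", "station": "S1", "ts": 4, "speed": 0, "stop_flag": True, "door_open": False},
    {"train": "T01", "station": "S1", "ts": 5, "speed": 3, "stop_flag": False, "door_open": False},
]
print(stop_summaries(rows))
```

The dwell time then follows as the difference between the last and first stopped samples; expressed as a grouped min/max query, the same computation can run in parallel on the database segments.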
Drawings
The foregoing and following detailed description of the invention will be apparent when read in conjunction with the following drawings, in which:
FIG. 1 is a schematic diagram of a train signal system vehicle-mounted log big data analysis system of the present invention;
FIG. 2 is a logic diagram of a train signal system vehicle log big data analysis method;
FIG. 3 is a schematic diagram of a vehicle-mounted log big data analysis distributed database according to the present invention;
FIG. 4 is a logic diagram of a processing method for analyzing a vehicle log according to the present invention.
Detailed Description
The technical solutions for achieving the objects of the present invention are further illustrated by the following specific examples, and it should be noted that the technical solutions claimed in the present invention include, but are not limited to, the following examples.
Example 1
As a most basic implementation scheme of the present invention, as shown in fig. 1, this embodiment discloses a train signal system vehicle-mounted log analysis system based on a distributed architecture, which includes a data acquisition module, a data analysis module, a data storage module, a data cleaning module, a data statistical analysis module, and a distributed database.
The data acquisition module and the data analysis module are built on message middleware with a Kafka + ZooKeeper architecture and are used to acquire and parse the vehicle-mounted log data of the train signal system. To prevent an excessive number of data fields from reducing storage efficiency, the data analysis module, when parsing the vehicle-mounted log data collected by the data acquisition module, groups fields that are consecutive and belong to the same system module into one category, merges all fields of a category, in order, into one large field, and merges the corresponding data into one data block.
The data storage module is based on a GPSS (Greenplum Streaming Server) + Kafka + gpfdist architecture and loads the vehicle-mounted log data parsed by the data analysis module into the distributed database to form the original log table.
The data cleaning module processes the original log table after loading: following the field merging rule that the data analysis module applied during parsing, it splits each merged large field back into the original log fields in order, and converts the binary value of each field into the corresponding decimal or Boolean value to complete numerical conversion.
The data statistical analysis module performs statistical analysis on the cleaned log data: it aggregates the data by specific train operation events, calculates the corresponding key train operation and maintenance indexes, and stores the results in the distributed database.
The specific train operation events include arriving and stopping at a station, emergency braking (EB), beacon loss and train-ground wireless communication faults. The key train operation and maintenance indexes include the station dwell time, the number and average duration of over-long stops, the stopping accuracy and the number of out-of-tolerance stops, the number and causes of emergency braking (EB) events, the number of beacon losses and the number of train-ground wireless communication faults.
The distributed database is a distributed cluster based on Greenplum. It provides the distributed data storage and computing platform for the data acquisition, data analysis, data storage, data cleaning and data statistical analysis modules; that is, it is responsible for storing and computing on the vehicle-mounted log data. As shown in fig. 3, it comprises 1 management (Master) node, 2 computing (Segment) nodes and 1 standby (Standby) node, and is the base platform on which the five modules are implemented.
Folders for storing vehicle-mounted log data are created on all three kinds of node. A MySQL database that stores acquisition and analysis records is installed on the management node; once MySQL is started, the data acquisition and analysis services are deployed in the system, and when newly uploaded vehicle-mounted log data is detected on the web server, an FTP download is started.
The management node does not store vehicle-mounted log data; it is responsible for parsing SQL, forming distributed tasks, collecting calculation results and managing the other nodes. The computing nodes and the standby node store the vehicle-mounted log data and execute the distributed tasks. The vehicle-mounted log data is stored in the distributed database with a random distribution policy, which avoids data skew.
As shown in fig. 3, each node is configured with 2 CPUs with 8 cores, 32 GB of memory and 20 SAS hard disks, and the nodes are connected by a gigabit network. Two primary instances (Primary) and two mirror instances (Mirror) are deployed on each node, with mirrors cross-configured between nodes to improve the availability of the distributed cluster.
The data acquisition module and the data analysis module start their services only after a log topic (Topic) has been created for each train in the Kafka cluster and a corresponding GPSS instance has been created.
In addition, the embodiment also discloses a train signal system vehicle log analysis method based on the system, as shown in fig. 2, including the following steps:
A data acquisition step: first, a multi-threaded log scanning task is started on each node of the distributed cluster of the distributed database, and it periodically scans the vehicle-mounted log data folder of each train on the network-level log server. When an update is detected in a vehicle-mounted log data folder, the updated folder is locked and an FTP download task is created on the node running the scan task, completing the acquisition of the vehicle-mounted log data from the network-level log server to the local node.
A data analysis step: the vehicle-mounted log data acquired to the local node in the data acquisition step is decompressed and parsed according to the message parsing rule, yielding vehicle-mounted log data classified by system module. The parsing results are encapsulated, per train, into Topic messages in JSON format and published to the Kafka cluster server (broker) of the data analysis module for storage. During parsing, the position and change of the timestamps in the log data are checked to determine whether packet loss has occurred that would make the parsed data abnormal. In other words, once a log file has been downloaded successfully, the data analysis module parses it and sends the result to a broker of the Kafka cluster for persistence. The broker is where the Kafka cluster stores the message data published by producers; each category of message is a topic, and Kafka removes stored messages automatically under its retention policy, after consumers have had the opportunity to read them.
Parsing the log data according to the message parsing rule addresses the reduced storage efficiency caused by too many data fields: in the vehicle-mounted log data collected to the local node, fields that are consecutive and belong to the same system module are grouped into one category, all fields of a category are merged, in order, into one large field, and the corresponding data is merged into one data block.
A data storage step: a vehicle-mounted log data processing task is started, and GPSS, acting as a Kafka consumer in the data storage module, monitors the Topic messages stored in the Kafka cluster. When a Topic has new messages, the gpfdist program is started and writes the log data, with high concurrency and by way of a readable external table, into the original log table of the corresponding train. gpfdist is the parallel file distribution program that ships with Greenplum; it lets multiple instances write to the database quickly at the same time, and therefore provides high concurrency.
A data cleaning step: the vehicle-mounted log data loaded in the data storage step is held in the original log table. Following the field merging rule applied by the data analysis module during parsing, the data cleaning module splits each merged large field back into the original log fields in order, converts the binary value of each field into the corresponding decimal or Boolean value to complete numerical conversion, and writes the result into the intermediate log table.
A data statistical analysis step: the intermediate log table produced by the data cleaning step is aggregated by specific train operation events to obtain the key indexes required for operating and maintaining the signal system, and the statistical results are stored in the distributed database.
Example 2
As a preferred and specific implementation scheme of the technical solution of the present invention, the present embodiment discloses a train signal system vehicle-mounted log analysis system based on a distributed architecture, as shown in fig. 1, which includes a data acquisition module, a data analysis module, a data storage module, a data cleaning module, a data statistical analysis module, and a distributed database.
The data acquisition module downloads the vehicle-mounted log files: it first downloads the log files via FTP and then stores them in the distributed database. The log files are vehicle-mounted logs of a subway signal system, comprise ATP log files and ATO log files, and are in a binary format.
The data analysis module analyzes the log files downloaded into the distributed database, packages the analyzed data into JSON data as required for data warehousing, persists the JSON data to a Broker of the Kafka cluster, and handles log files that do not meet the requirements. Such files fall mainly into three types: damaged files, files with missing data packets, and files with abnormal data content. The JSON data are organized according to the mapping fields of the Topic configuration files in the Kafka cluster.
The data storage module reads the JSON data from the Broker of the Kafka cluster and writes them into the corresponding original log table in the distributed database.
The data cleaning module cleans and converts the data in the original log table and stores the cleaned data in the corresponding intermediate data table. Data cleaning includes field splitting, numerical conversion, valid-data screening and the like; the screening conditions for valid data include whether the data come from the master control end (Master CC-Core) and whether the train is operating on the line.
The data statistical analysis module performs aggregate analysis on the intermediate data table based on specific train operation events to obtain the required operation and maintenance indexes, and stores the statistical data in the distributed database. The specific train operation events include station stops, emergency braking (EB), beacon loss, train-ground wireless communication failure, and the like.
The distributed database is a distributed cluster based on Greenplum; it is responsible for storing and computing all data and is the base platform on which the five modules are implemented. As shown in fig. 3, the platform consists mainly of 4 nodes: 1 management node, 2 computing nodes, and 1 standby node. Each node is configured with two 8-core CPUs, 32 GB of memory and 20 SAS hard disks, and the nodes are connected by a gigabit network. Each node hosts 2 primary instances (Primary) and 2 mirror instances (Mirror), and the mirrors are cross-configured among the nodes, which improves the availability of the distributed cluster.
The vehicle-mounted log analysis system is constructed as follows.
The first step: a distributed cluster platform based on Greenplum is built, comprising three types of nodes: a management node, computing nodes and a standby node. The computing nodes and the standby node are data nodes, and the Greenplum service is started on the management node.
The second step: a Zookeeper cluster is first built in the platform and then a Kafka cluster, which together form a distributed task coordination and management cluster; a corresponding Topic is created in the cluster for the ATO log and for the ATP log of each train. The Zookeeper cluster service is started first, followed by the Kafka cluster service.
The third step: a folder for storing vehicle-mounted logs is created for each train on all computing nodes in the platform, with a directory structure that matches the file directory structure on the network server (/FTPOMAP/date/train number/). A MySQL database is deployed on the management node, and the data acquisition and analysis services are then deployed in the platform and started. As soon as the data scanning task detects that a vehicle-mounted log file has been uploaded to the network server, it starts an FTP download task and locks the file so that other threads cannot download it.
The fourth step: after a log file has been downloaded successfully, a data analysis task is started to analyze it; the analyzed data are packaged in JSON format and sent to a server (broker) of the Kafka cluster for persistence.
The fifth step: after detecting that a new message has been published to the Kafka cluster, the GPSS instance reads the log data stored in the corresponding Topic and persists them to the original log table in the platform by means of gpfdist. Original log tables, named ATO_log_[line name]_[train number] and ATP_log_[line name]_[train number], need to be created for the ATO log and the ATP log of each train respectively; because of the large data volume, the original log tables need to be partitioned by day.
The sixth step: after the log data have been warehoused, the data cleaning module is started to perform field splitting and numerical conversion on the original log table, and the cleaned and converted data are written into an intermediate data table. Intermediate data tables, named ATO_log_mid_[line name]_[train number] and ATP_log_mid_[line name]_[train number], need to be created for the ATO log and the ATP log of each train respectively.
The seventh step: after the data have been cleaned, the data statistical analysis module performs statistical analysis on them and writes the statistical results into the corresponding statistical tables for display and retrieval by the front end.
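As a non-limiting illustration of the table naming and daily partitioning described in the fifth and sixth steps, the following sketch builds the table names and a CREATE TABLE statement using Greenplum's range-partition syntax. The column names (log_time, log_date, payload), the choice of log_date as the partition key, and the helper function names are assumptions made only for this sketch; DISTRIBUTED RANDOMLY reflects the random distribution storage strategy recited in claim 6.

```python
import re
from datetime import date

_NAME = re.compile(r"[A-Za-z0-9_]+")  # conservative check before building SQL


def _check(part):
    """Reject name parts that are not plain identifiers."""
    if not _NAME.fullmatch(part):
        raise ValueError(f"unsupported identifier part: {part!r}")
    return part


def raw_table_name(log_type, line_name, train_no):
    """Original log table name, e.g. ATO_log_[line name]_[train number]."""
    if log_type not in ("ATO", "ATP"):
        raise ValueError("log_type must be 'ATO' or 'ATP'")
    return f"{log_type}_log_{_check(line_name)}_{_check(train_no)}"


def mid_table_name(log_type, line_name, train_no):
    """Intermediate data table name, e.g. ATO_log_mid_[line name]_[train number]."""
    if log_type not in ("ATO", "ATP"):
        raise ValueError("log_type must be 'ATO' or 'ATP'")
    return f"{log_type}_log_mid_{_check(line_name)}_{_check(train_no)}"


def create_raw_table_sql(table, start, end):
    """Build DDL for an original log table with one partition per day."""
    return (
        f"CREATE TABLE {table} (\n"
        "    log_time timestamp,  -- assumed column\n"
        "    log_date date,       -- assumed partition key\n"
        "    payload  text        -- assumed column for the merged data block\n"
        ")\n"
        "DISTRIBUTED RANDOMLY\n"
        "PARTITION BY RANGE (log_date)\n"
        f"( START (date '{start.isoformat()}') INCLUSIVE\n"
        f"  END (date '{end.isoformat()}') EXCLUSIVE\n"
        "  EVERY (INTERVAL '1 day') );"
    )


# Hypothetical line name and train number.
name = raw_table_name("ATO", "L1", "T01")
ddl = create_raw_table_sql(name, date(2019, 11, 1), date(2019, 12, 1))
```

Note that unquoted identifiers are folded to lower case by the database, so the table created from this statement would be addressed as ato_log_l1_t01 unless the name is quoted.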
Further, based on the above system and as shown in fig. 2, an analysis method of the train signal system vehicle-mounted log analysis system based on a distributed architecture includes the following steps:
Step 1, data acquisition: a multi-threaded data acquisition task is first started on each node in the distributed cluster, and the log folder of each train on the line-network log server is scanned at regular intervals. If an update to the log files of a train is detected, an FTP (File Transfer Protocol) download task is triggered on the node to acquire the log file, and the name of the downloaded log file is written into a log record table in the MySQL database to prevent repeated downloading.
Step 2, data analysis: after a log file has been downloaded successfully, a data analysis task is started; the log file is decompressed and analyzed according to its message protocol, and the analysis result is packaged as required into a JSON file, which is published to a broker of the Kafka cluster for storage.
A corresponding Topic message needs to be created in advance in the Kafka cluster for the vehicle-mounted log of each train, with content consistent with the fields in the JSON file. Field merging is carried out during data packaging, and the name of each analyzed log file is written into an analysis record table in the MySQL database to prevent repeated analysis.
Step 3, data warehousing: a log data processing task is started, and the GPSS instance of each train consumes, in real time, the log data of the corresponding Topic in the Kafka cluster and writes them with high concurrency, by means of gpfdist on each computing node, into the original log table of the distributed database. The corresponding GPSS instances need to be created in advance for the different trains.
Step 4, data cleaning: after the log data have been warehoused, field splitting and numerical conversion are performed on the original log table, the useful data fields and the valid log data in the original log are screened according to the actual operating scenario of the train, and the cleaned and converted data are stored in an intermediate data table in the distributed database.
Step 5, data statistics and analysis: aggregation operations are performed on the cleaned log data according to the specific operation events of the train to obtain the key indexes required for the maintenance decisions of the subway company, such as train stopping time, the number and average duration of overlong stops, train stopping accuracy and the number of out-of-tolerance stops, the number and causes of emergency braking (EB) events, the number of beacon losses, and the number of train-ground wireless communication failures; the result data are stored in the distributed database.
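As a non-limiting illustration of the aggregation in step 5, the following sketch counts events per train and averages the stop duration over rows of the intermediate data table. The row layout (train, event, duration_s) and the event labels are assumptions made only for this sketch; in the described system the aggregation would run inside the distributed database rather than in application code.

```python
from collections import defaultdict

# Hypothetical event labels for the specific train operation events.
EVENTS = ("stop", "EB", "beacon_loss", "radio_failure")


def aggregate(rows):
    """Return per-train event counts and the average stop duration in seconds."""
    counts = defaultdict(lambda: dict.fromkeys(EVENTS, 0))
    stop_durations = defaultdict(list)
    for row in rows:
        train, event = row["train"], row["event"]
        if event not in EVENTS:
            continue  # ignore rows that are not one of the tracked events
        counts[train][event] += 1
        if event == "stop":
            stop_durations[train].append(row["duration_s"])
    result = {}
    for train, per_event in counts.items():
        durations = stop_durations[train]
        avg = sum(durations) / len(durations) if durations else None
        result[train] = {**per_event, "avg_stop_s": avg}
    return result


# Hypothetical cleaned rows for two trains.
rows = [
    {"train": "T01", "event": "stop", "duration_s": 30},
    {"train": "T01", "event": "stop", "duration_s": 40},
    {"train": "T01", "event": "EB"},
    {"train": "T02", "event": "beacon_loss"},
]
stats = aggregate(rows)
```

A train with no recorded stops gets an average of None rather than zero, so that a missing measurement is not mistaken for a zero-second stop in downstream indexes.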

Claims (9)

1. A train signal system vehicle-mounted log analysis system based on a distributed architecture, characterized in that: the system comprises a data acquisition module, a data analysis module, a data storage module, a data cleaning module, a data statistical analysis module and a distributed database;
the data acquisition module and the data analysis module are message middleware based on a Kafka + Zookeeper architecture and are used for acquiring and analyzing vehicle-mounted log data of a train signal system;
the data storage module is based on a GPSS (Greenplum Stream Server) + Kafka + gpfdist architecture and is used for warehousing the vehicle-mounted log data analyzed by the data analysis module into the distributed database to form an original log table;
the data cleaning module is used for processing the warehoused original log table: following the field merging rule applied by the data analysis module when analyzing the vehicle-mounted log data, it splits each merged large field, in order, back into the original fields of the log data, and converts the binary value of each field into the corresponding decimal or Boolean value to complete numerical conversion;
the data statistical analysis module performs aggregate statistics based on specific train operation events, calculates the corresponding key operation and maintenance indexes of the train, and finally stores them in the distributed database; the data statistical analysis module is used for performing statistical analysis on the cleaned log data to obtain the required operation and maintenance indexes and storing the statistical results in the distributed database;
the distributed database is a distributed cluster based on Greenplum, provides a distributed data storage and calculation platform for the data acquisition module, the data analysis module, the data storage module, the data cleaning module and the data statistical analysis module, and comprises a management node, a computing node and a standby node.
2. The train signal system vehicle-mounted log analysis system based on the distributed architecture of claim 1, wherein: when analyzing the vehicle-mounted log data, the data analysis module groups into one class the consecutive fields that belong to the same system module in the data acquired by the data acquisition module, merges all fields of the same class in order into one large field, and merges the corresponding data into one data block.
3. The train signal system vehicle-mounted log analysis system based on the distributed architecture of claim 1, wherein: the specific train operation events comprise station stops, emergency braking (EB), beacon loss and train-ground wireless communication failure.
4. The train signal system vehicle-mounted log analysis system based on the distributed architecture of claim 1 or 3, wherein: the key operation and maintenance indexes of the train comprise train stopping time, the number and average duration of overlong stops, train stopping accuracy and the number of out-of-tolerance stops, the number and causes of emergency braking (EB) events, the number of beacon losses and the number of train-ground wireless communication failures.
5. The train signal system vehicle-mounted log analysis system based on the distributed architecture of claim 1, wherein: folders for storing vehicle-mounted log data are created on three nodes of the distributed database; a MySQL database for storing acquisition and analysis records is deployed on the management node, the data acquisition and analysis services are deployed in the system and started, and when it is detected that vehicle-mounted log data have been uploaded to the network server, an FTP download is started.
6. The train signal system vehicle-mounted log analysis system based on the distributed architecture of claim 5, wherein: the management node is responsible for parsing SQL, forming distributed tasks, collecting calculation results and managing the other nodes; the computing node and the standby node are responsible for storing vehicle-mounted log data and executing distributed tasks; and the vehicle-mounted log data are stored in the distributed database using a random distribution strategy.
7. The train signal system vehicle-mounted log analysis system based on the distributed architecture of claim 1, 5 or 6, wherein: each node is configured with two 8-core CPUs, 32 GB of memory and 20 SAS hard disks, the nodes are connected by a gigabit network, 2 primary instances and 2 mirror instances are deployed on each node, and the mirrors are cross-configured among the nodes.
8. An analysis method of the train signal system vehicle-mounted log analysis system based on the distributed architecture as claimed in claim 1, characterized by comprising the following steps:
a data acquisition step: a multi-threaded log scanning task is first started for each node in the distributed cluster of the distributed database, and the folder of vehicle-mounted log data of each train on the line-network log server is scanned at regular intervals; when an update to the data in a folder of vehicle-mounted log data is detected, the updated folder is locked and an FTP download task is created on the node running the log scanning task to acquire the vehicle-mounted log data from the line-network log server to the local node;
a data analysis step: the vehicle-mounted log data acquired to the local node in the data acquisition step are decompressed and analyzed according to the message analysis rule to obtain vehicle-mounted log data classified by system module; the analysis results are encapsulated, per train, into Topic messages in JSON format and published to a Kafka cluster server (broker) of the data analysis module for storage; and during log analysis, whether data packets have been lost, which would cause abnormal data analysis, is determined by checking the position of the timestamp in the log data and whether the timestamp changes;
a data warehousing step: a vehicle-mounted log data processing task is started, and the GPSS, acting as a Kafka cluster consumer in the data warehousing module, monitors the Topic messages stored in the Kafka cluster; if a Topic message is updated, the gpfdist program is started to write the log data, with high concurrency and by way of a readable external table, into the original log table of the corresponding train;
a data cleaning step: the vehicle-mounted log data stored by the data warehousing step reside in the original log table; following the field merging rule applied by the data analysis module when analyzing the vehicle-mounted log data, the data cleaning module splits each merged large field, in order, back into the original fields of the log data (field splitting), converts the binary value of each field into the corresponding decimal or Boolean value (numerical conversion), and writes the result into the intermediate log table;
and a data statistical analysis step: aggregation operations are performed on the intermediate log table produced by the data cleaning step according to specific train operation events to obtain the key indexes required for the operation and maintenance of the signal system, and the statistical results are stored in the distributed database.
9. The analysis method of the train signal system vehicle-mounted log analysis system based on the distributed architecture as claimed in claim 8, wherein: analyzing the log data according to the message analysis rule comprises grouping into one class the consecutive fields that belong to the same system module in the vehicle-mounted log data collected to the local node, merging all fields of the same class in order into one large field, and merging the corresponding data into one data block.
CN201911076714.XA 2019-11-06 2019-11-06 Train signal system vehicle-mounted log analysis system and method based on distributed architecture Active CN110825801B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911076714.XA CN110825801B (en) 2019-11-06 2019-11-06 Train signal system vehicle-mounted log analysis system and method based on distributed architecture

Publications (2)

Publication Number Publication Date
CN110825801A true CN110825801A (en) 2020-02-21
CN110825801B CN110825801B (en) 2023-03-10

Family

ID=69553214

Country Status (1)

Country Link
CN (1) CN110825801B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant