CN115733896A - Train data processing and classifying method and device, computer equipment and storage medium

Info

Publication number: CN115733896A
Application number: CN202110991202.7A
Authority: CN
Inventors: 邓健; 杨卫峰; 黄铖; 胡卫民; 杨永滔; 蓝凡; 雷静
Original assignee: Zhuzhou CRRC Times Electric Co Ltd
Current assignee: Zhuzhou CRRC Times Electric Co Ltd
Priority date: 2021-08-26
Filing date: 2021-08-26
Publication date: 2023-03-03

Abstract

The application relates to a train data processing and classifying method and device, computer equipment, and a storage medium. The method comprises the following steps: acquiring a real-time data packet sent by train equipment, encapsulating the real-time data packet into a binary data structure, and writing the binary data structure into a Kafka message queue to form an original data stream; reading binary data from the original data stream, parsing the data into point location information according to a preset rule, and writing the point location information into a Kafka message queue to form a parsed data stream; and acquiring data content from the parsed data stream, automatically screening out the data point locations required by different service processing programs, and writing the data point locations into Kafka message queues to form data streams of different service classifications. The method makes the service function modules independent of one another, improves the flexibility of system development and product function assembly, and increases the stability of the background processing system. It also lets each service subscribe to only the data it requires, which markedly reduces the complexity of the functional modules.

Description

Train data processing and classifying method and device, computer equipment and storage medium

Technical Field

The present application relates to the field of data processing, and in particular, to a train data processing and classifying method, apparatus, computer device, and storage medium.

Background

With the rapid development of the Internet of Things and big data, the data acquired from each sensor and module on a train are transmitted back to the ground data center in real time so that the data can better support the business domain; the real-time data yield their value only once they have been received, parsed, and computed. In current applications of real-time train data, deeper service mining keeps adding extended service functions, including services such as correlation calculation and logic analysis based on all or some categories of data, which provide functions such as state monitoring, health diagnosis, and fault early warning.

Because a train carries a large number of sensors and produces a large volume of equipment state data, the real-time data volume becomes very large when many trains report concurrently, and the traditional approach of transferring data through API (application program interface) calls cannot meet the real-time requirements of data processing. If a centralized stream processing framework is adopted, all services must be concentrated in one main program module. The coupling among the service modules is then too tight: the whole main program module has to be recompiled and re-released each time a service module is added or modified, which increases the complexity of development and debugging. The main program module also takes on complicated functions, and a problem in any functional module may cause the main program to behave abnormally.

Disclosure of Invention

In view of the above, it is necessary to provide a train data processing and classifying method, apparatus, computer device and storage medium for solving the above technical problems.

In a first aspect, an embodiment of the present invention provides a train data processing and classifying method, where the method includes:

acquiring a real-time data packet sent by train equipment, packaging the real-time data packet into a binary data structure, and writing the binary data structure into a kafka message queue to form an original data stream;

reading binary data from the original data stream, analyzing the data into point location information according to a preset rule, and writing the point location information into a kafka message queue to form an analyzed data stream;

and acquiring data content from the analyzed data stream, automatically screening data point positions required by different service processing programs, and writing the data point positions into a kafka message queue to form data streams of different service classifications.

Further, the obtaining of the real-time data packet sent by the train equipment, encapsulating the real-time data packet into a binary data structure, and writing the binary data structure into a kafka message queue to form an original data stream includes:

receiving a TCP/UDP real-time data packet sent by the train equipment through a network, and verifying, decrypting and decompressing data according to a communication protocol;

packaging the data into a binary data structure of a specific protocol after data verification is carried out, and forming the original data stream through writing of the kafka message queue;

and carrying out data stream transmission and data backup on the original data stream.

Further, the reading binary data from the original data stream, parsing the data into point location information according to a preset rule, and writing the point location information into a kafka message queue to form a parsed data stream, including:

reading and processing the original data stream in a distributed task mode through a spark streaming processing framework based on big data;

according to a preset rule, carrying out data length comparison, data analysis and unit conversion on the original data stream, and forming point location information;

and after the data is processed by the data analysis engine, packaging and writing the data to form an analysis data stream.

Further, the acquiring data content from the parsed data stream, automatically screening data points required by different service processing programs, and writing the data points into the kafka message queue to form data streams of different service classifications includes:

constructing a subscription configuration table according to data point positions and data stream names required by different services and writing corresponding kafka queue topic information;

automatically screening out data point positions required by different service processing programs from the analysis data stream according to data contents through the subscription configuration table;

and configuring the refresh frequency according to the project requirement to automatically refresh the classification of the data, and dynamically adjusting the required data point positions according to the service program.

On the other hand, an embodiment of the present invention further provides a train data processing and classifying system, including:

the train-ground data communication module is used for acquiring a real-time data packet sent by the train equipment, packaging the real-time data packet into a binary data structure, and writing the binary data structure into a kafka message queue to form an original data stream;

the data analysis module is used for reading binary data from the original data stream, analyzing the binary data into point location information according to a preset rule, and writing the point location information into a kafka message queue to form an analysis data stream;

and the data automatic classification module is used for acquiring data contents from the analyzed data stream, automatically screening data point positions required by different service processing programs, and writing the data point positions into the kafka message queue to form data streams of different service classifications.

Further, the train-ground data communication module comprises a preprocessing unit, and the preprocessing unit is used for:

carrying out network reception on the TCP/UDP real-time data packet sent by the train equipment, and carrying out data verification, decryption and decompression according to a communication protocol;

packaging into a binary data structure of a specific protocol after data verification is carried out, and forming the original data stream through writing in the kafka message queue;

Further, the data parsing module includes a distributed framework unit, and the distributed framework unit is configured to:

Further, the data automatic classification module includes a refresh classification unit, and the refresh classification unit is configured to:

An embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor implements the following steps when executing the computer program:

A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:

The train data processing and classifying method, the train data processing and classifying device, the computer equipment and the storage medium comprise the following steps: acquiring a real-time data packet sent by train equipment, packaging the real-time data packet into a binary data structure, and writing the binary data structure into a kafka message queue to form an original data stream; reading binary data from the original data stream, analyzing the data into point location information according to a preset rule, and writing the point location information into a kafka message queue to form an analyzed data stream; and acquiring data content from the analyzed data stream, automatically screening data point positions required by different service processing programs, and writing the data point positions into a kafka message queue to form data streams of different service classifications. Because a background processing architecture with the data stream at its core is adopted, the service function modules are independent of one another, which improves the flexibility of system development and product function assembly and the stability of the background processing system. Moreover, high-performance big-data message middleware is used as the data stream carrier, which keeps the latency of data interaction low under high data concurrency. Meanwhile, the automatic data classification module lets each service processing module subscribe on demand to the data it requires, which markedly reduces the complexity of the functional modules.

Drawings

FIG. 1 is a schematic flow chart of a train data processing and classification method according to an embodiment;

FIG. 2 is a schematic flow chart illustrating a preprocessing method in a train-ground data communication process according to an embodiment;

FIG. 3 is a schematic flow diagram of distributed data parsing in one embodiment;

FIG. 4 is a schematic flow chart diagram illustrating the automatic classification step of data in one embodiment;

FIG. 5 is a block diagram of the train data processing and classification system in one embodiment;

FIG. 6 is a diagram of the internal structure of a computer device in one embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein merely illustrate the present application and are not intended to limit it.

In one embodiment, as shown in fig. 1, there is provided a train data processing and classifying method, including the steps of:

step 101, acquiring a real-time data packet sent by train equipment, packaging the real-time data packet into a binary data structure, and writing the binary data structure into a kafka message queue to form an original data stream;

step 102, reading binary data from the original data stream, analyzing the data into point location information according to a preset rule, and writing the point location information into a kafka message queue to form an analyzed data stream;

step 103, acquiring data contents from the analyzed data stream, automatically screening data point positions required by different service processing programs, and writing the data point positions into a kafka message queue to form data streams of different service classifications.

Specifically, the data processing and classifying method is a real-time train data processing and automatic classification method based on a data stream architecture. The architecture takes the data stream as its core, and each service function module obtains the data it requires directly from a data stream, which decouples the service functions. The method uses big-data message middleware as the data stream carrier, which supports high-throughput, low-latency data interaction among different program modules. The automatic data classification module designed in the method can be configured to screen out the data point locations that a service processing program requires and to form a new service data stream for the dedicated program to process, providing on-demand data subscription and automatic distribution. With many vehicles and high data concurrency, the method allows service modules to be developed and assembled independently, increases the development flexibility of the functional modules, and enhances the stability of the background processing architecture.

The data processing and classifying method takes the data stream as its core and uses the big-data message middleware Kafka as the carrier of the data streams. A topic represents one real-time data stream; an upstream program module of the data stream corresponds to a producer of the Kafka topic, and a downstream program module corresponds to a consumer of the Kafka topic. Producers write data into the topic and consumers read data from it, so that the background data processing system forms a streaming framework built around the data. Kafka is an open-source distributed publish-subscribe log system that can be used as messaging middleware and provides the producer and consumer patterns.
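The producer and consumer roles described above can be illustrated without a running Kafka cluster. The following is a minimal sketch in which a hypothetical `InMemoryBroker` class stands in for Kafka; the class, its method names, and the topic name `raw` are assumptions made for illustration and are not part of the application or of any Kafka client API.

```python
from collections import defaultdict


class InMemoryBroker:
    """Hypothetical stand-in for Kafka: each topic is an append-only list."""

    def __init__(self):
        self._topics = defaultdict(list)

    def produce(self, topic, message):
        # An upstream module (producer) appends a message to the topic.
        self._topics[topic].append(message)

    def consume(self, topic, offset=0):
        # A downstream module (consumer) reads from its own offset onward.
        # Reading does not remove messages, so consumers stay independent.
        return list(self._topics[topic][offset:])


broker = InMemoryBroker()
broker.produce("raw", b"\x01\x02")
broker.produce("raw", b"\x03\x04")

backup_view = broker.consume("raw")            # e.g. a backup module reading everything
parser_view = broker.consume("raw", offset=1)  # e.g. a parser that already handled message 0
```

Because each consumer tracks its own read position, adding or removing a downstream module does not affect the producer or the other consumers, which is the decoupling property the architecture relies on.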

In one embodiment, as shown in fig. 2, the preprocessing method in the train-ground data communication process includes:

step 201, receiving a TCP/UDP real-time data packet sent by the train equipment through a network, and verifying, decrypting and decompressing data according to a communication protocol;

step 202, after data verification, packaging into a binary data structure of a specific protocol, and forming the original data stream through writing in the kafka message queue;

step 203, performing data stream transmission and data backup on the original data stream.

Specifically, the train-ground data communication program module receives over the network the TCP/UDP real-time data packets sent by the train equipment, verifies, decrypts, and decompresses them according to the communication protocol, encapsulates them into a binary data structure of a specific protocol, and writes that structure into a kafka message queue to form the original data stream. The original data stream carries raw binary data, which cannot be used directly by analysis, display, and other related service systems, but which can serve as the data source for the original data backup module.
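The verification, decompression, and encapsulation steps might look roughly like the sketch below. The application does not disclose the communication protocol, so the packet layout (a CRC32 prefix followed by a zlib-compressed body), the record header fields, and all function names here are assumptions; decryption is omitted, and a plain list stands in for the Kafka topic.

```python
import struct
import zlib

# Hypothetical record header: train id (uint32), timestamp in ms (uint64), payload length (uint16).
HEADER = struct.Struct(">IQH")


def decode_packet(packet: bytes) -> bytes:
    """Verify and decompress one received packet (decryption omitted in this sketch)."""
    if len(packet) < 4:
        raise ValueError("packet too short")
    (expected_crc,) = struct.unpack(">I", packet[:4])
    body = packet[4:]
    if zlib.crc32(body) != expected_crc:
        raise ValueError("checksum mismatch")
    return zlib.decompress(body)


def encapsulate(train_id: int, timestamp_ms: int, payload: bytes) -> bytes:
    """Wrap the verified payload in the binary record written to the original data stream."""
    return HEADER.pack(train_id, timestamp_ms, len(payload)) + payload


def make_packet(payload: bytes) -> bytes:
    """Example-only helper: build a packet the way the train side is assumed to."""
    body = zlib.compress(payload)
    return struct.pack(">I", zlib.crc32(body)) + body


raw_stream = []  # stand-in for the Kafka topic carrying the original data stream
payload = decode_packet(make_packet(b"\x00\x64\x01\xf4"))
raw_stream.append(encapsulate(1001, 1_629_936_000_000, payload))
```

Rejecting a packet before it reaches the stream keeps corrupted data out of both the backup module and the downstream parser.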

In one embodiment, as shown in fig. 3, the flow of distributed data parsing includes:

step 301, reading and processing the original data stream in a distributed task manner through a spark streaming processing framework based on big data;

step 302, according to a preset rule, performing data length comparison, data analysis and unit conversion on the original data stream, and forming point location information;

step 303, after the processing by the data parsing engine, performing data encapsulation and write streaming to form a parsed data stream.

Specifically, the data parsing program reads binary data from the original data stream, parses the data into point location information with specific meaning according to the corresponding rules (including operations such as data length comparison, data parsing, and unit conversion), and writes the point location information into the kafka message queue to form a parsed data stream. To improve data processing efficiency, the data parsing module adopts a big-data processing framework based on Spark Streaming and processes the data stream rapidly in a distributed task mode. Spark Streaming is a distributed stream processing framework implemented on Spark's in-memory, micro-batch computing model, and it offers efficient and scalable data processing. The parsed data stream carries the parsed Key-Value data, that is, data points with specific meanings such as voltage and current. This data can be used directly for data analysis, real-time display, and point location data storage.
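As an illustration of the per-record work (length comparison, parsing, unit conversion), the sketch below parses one binary payload into key-value point data. The rule table, field names, encodings, and divisors are all hypothetical, since the application does not disclose the preset rules; the function is the kind of logic a distributed job could apply to each record, and Spark Streaming itself is not used here.

```python
import struct

# Hypothetical preset rule: point name, raw encoding, divisor, and unit.
RULES = [
    ("voltage", "H", 10, "V"),   # raw value assumed to be in 0.1 V steps
    ("current", "H", 100, "A"),  # raw value assumed to be in 0.01 A steps
]
LAYOUT = struct.Struct(">" + "".join(fmt for _, fmt, _, _ in RULES))


def parse_points(payload: bytes) -> dict:
    """Turn one binary payload into key-value point location data."""
    # Data length comparison: reject payloads that do not match the rule.
    if len(payload) != LAYOUT.size:
        raise ValueError(f"expected {LAYOUT.size} bytes, got {len(payload)}")
    raw_values = LAYOUT.unpack(payload)
    # Unit conversion: scale each raw integer into its engineering unit.
    return {
        name: {"value": raw / divisor, "unit": unit}
        for (name, _, divisor, unit), raw in zip(RULES, raw_values)
    }


points = parse_points(b"\x00\x64\x01\xf4")
# points["voltage"] is {"value": 10.0, "unit": "V"}; points["current"] is {"value": 5.0, "unit": "A"}
```

Keeping the rules in a table rather than in code means a new point can be added by extending the table, which fits the "preset rule" wording of the method.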

In one embodiment, as shown in fig. 4, the flow of the data automatic classification step includes:

step 401, constructing a subscription configuration table according to data point locations and data stream names required by different services and writing corresponding kafka queue topic information;

step 402, automatically screening out data point locations required by different service processing programs from the analysis data stream according to data contents through the subscription configuration table;

step 403, configuring the refresh frequency according to project requirements so that the data classification is refreshed automatically, and dynamically adjusting the required data point locations according to the service program. When the system operation instruction is run for the first time, a system source code corresponding to the system in which the engineering project to be detected is located is acquired.

Specifically, during automatic classification, data are acquired from the parsed data stream and matched against the service requirement subscription configuration table, which records the data point locations and data stream name required by each service together with the corresponding kafka queue topic information to write to. According to the data content, the data point locations required by each service processing program are automatically screened out of the parsed data stream and written into the corresponding kafka message queues to form the individual service data streams. The module can also refresh its data classification rules automatically: the refresh frequency is configured according to project requirements, which allows a service program to adjust dynamically the data point locations it requires. A specific service data stream is a new data stream formed from the subset of data points screened out of the parsed data stream according to the requirements of a service processing program. Each such stream serves one specific service, and the number of data points it contains changes with the service requirements.

It should be understood that, although the steps in the above-described flowcharts are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and described and may be performed in other orders. Moreover, at least a part of the steps in the above flowcharts may include multiple sub-steps or multiple stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least a part of the sub-steps or stages of other steps.
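A subscription table with periodic refresh could be sketched as follows. The table schema, the `Classifier` class, the topic names, and the injected clock are assumptions for illustration; the application does not specify how the table is stored or reloaded. The result of `classify` maps each target topic to the subset of points that would be written to it.

```python
import time


class Classifier:
    """Routes parsed points to service topics using a reloadable subscription table."""

    def __init__(self, load_table, refresh_seconds, clock=time.monotonic):
        self._load_table = load_table          # callable returning the current table
        self._refresh_seconds = refresh_seconds
        self._clock = clock
        self._table = load_table()
        self._loaded_at = clock()

    def _maybe_refresh(self):
        # Reload the classification rules once the configured interval has elapsed.
        if self._clock() - self._loaded_at >= self._refresh_seconds:
            self._table = self._load_table()
            self._loaded_at = self._clock()

    def classify(self, points):
        """Return {topic: subset of points} for every service with at least one match."""
        self._maybe_refresh()
        routed = {}
        for entry in self._table:
            subset = {k: v for k, v in points.items() if k in entry["points"]}
            if subset:
                routed[entry["topic"]] = subset
        return routed


# Hypothetical subscription configuration table.
table = [
    {"service": "health_diagnosis", "topic": "svc-health", "points": {"voltage", "current"}},
    {"service": "speed_monitor", "topic": "svc-speed", "points": {"speed"}},
]
now = [0.0]  # fake clock so the example does not need to sleep
classifier = Classifier(lambda: [dict(e) for e in table], refresh_seconds=60, clock=lambda: now[0])

sample = {"voltage": 10.0, "current": 5.0, "speed": 80.0}
first = classifier.classify(sample)

table[1]["points"] = {"speed", "voltage"}      # a service changes its subscription
before_refresh = classifier.classify(sample)   # old rules still apply
now[0] = 61.0
after_refresh = classifier.classify(sample)    # rules reloaded after the interval
```

The refresh interval trades responsiveness against reload cost: a shorter interval picks up subscription changes sooner, while a longer one reads the table less often.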

In one embodiment, as shown in fig. 5, there is provided a train data processing and classification system, including:

the train-ground data communication module 501 is used for acquiring real-time data packets sent by train equipment, packaging the real-time data packets into a binary data structure, and writing the binary data structure into a kafka message queue to form an original data stream;

a data parsing module 502, configured to read binary data from the original data stream, parse the data into point location information according to a preset rule, and write the point location information into a kafka message queue to form a parsed data stream;

and the data automatic classification module 503 is configured to obtain data content from the parsed data stream, automatically screen out data point locations required by different service processing programs, and write the data point locations into the kafka message queue to form data streams of different service classifications.

In one embodiment, as shown in fig. 5, the train-ground data communication module 501 includes a preprocessing unit 5011, the preprocessing unit 5011 is configured to:

In one embodiment, as shown in fig. 5, the data parsing module 502 includes a distributed framework unit 5021, and the distributed framework unit 5021 is configured to:

In one embodiment, as shown in fig. 5, the data automatic classification module 503 includes a refresh classification unit 5031, and the refresh classification unit 5031 is configured to:

and configuring a refreshing frequency according to project requirements to automatically refresh the classification of the data, and dynamically adjusting the required data point positions according to a service program.

For specific limitations of the train data processing and classifying system, reference may be made to the above limitations of the train data processing and classifying method, which are not repeated here. All or part of the modules in the train data processing and classifying system can be implemented by software, hardware, or a combination thereof. The modules can be embedded in hardware form in, or be independent of, a processor in the computer device, or can be stored in software form in a memory of the computer device, so that the processor can call them and execute the operations corresponding to each module.

FIG. 6 is a diagram illustrating an internal structure of a computer device in one embodiment. As shown in fig. 6, the computer apparatus includes a processor, a memory, a network interface, an input device, and a display screen connected through a system bus. The memory comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may further store a computer program that, when executed by the processor, causes the processor to implement a train data processing classification method. The internal memory may also have a computer program stored therein, which when executed by the processor, causes the processor to perform a train data processing classification method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.

Those skilled in the art will appreciate that the architecture shown in fig. 6 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.

In one embodiment, a computer device is provided, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:

In one embodiment, the processor when executing the computer program further performs the steps of:

In one embodiment, the processor, when executing the computer program, further performs the steps of:

according to a preset rule, performing data length comparison, data analysis and unit conversion on the original data stream, and forming point location information;

In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, performs the steps of:

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program instructing the relevant hardware. The program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above.

The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described, but any combination of these technical features should be considered within the scope of the present specification as long as it involves no contradiction.

The above-mentioned embodiments express only several embodiments of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent application shall be subject to the appended claims.

Claims

1. A train data processing and classifying method is characterized by comprising the following steps:

2. The method of claim 1, wherein the obtaining real-time data packets sent by the train equipment, encapsulating the real-time data packets into a binary data structure, and writing the binary data structure into a kafka message queue to form an original data stream comprises:

3. The train data processing and classifying method according to claim 2, wherein the reading of binary data from the original data stream, parsing of the data into point location information according to a preset rule, and writing into a kafka message queue to form a parsed data stream comprises:

4. The train data processing and classifying method according to claim 2, wherein the obtaining data contents from the parsed data stream, automatically screening data point locations required by different service processing programs, and writing the data point locations into a kafka message queue to form data streams of different service classifications comprises:

5. A train data processing and classification system, comprising:

6. The train data processing and classification system of claim 5, wherein the train-to-ground data communication module comprises a preprocessing unit configured to:

7. The train data processing and classification system of claim 5, wherein the data parsing module comprises a distributed framework unit configured to:

8. The train data processing and classification system of claim 5, wherein the data automatic classification module comprises a refresh classification unit configured to:

9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 4 are implemented when the computer program is executed by the processor.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 4.