CN113625264A - Method and system for parallel processing of railway detection big data


Info

Publication number
CN113625264A
CN113625264A (application CN202110665774.6A)
Authority
CN
China
Prior art keywords
data
detection data
node
parallel
workflow
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110665774.6A
Other languages
Chinese (zh)
Inventor
杜翠
张千里
陈锋
刘杰
程远水
郭浏卉
张栋
张新冈
刘景宇
邓逆涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Academy of Railway Sciences Corp Ltd CARS
Railway Engineering Research Institute of CARS
Original Assignee
China Academy of Railway Sciences Corp Ltd CARS
Railway Engineering Research Institute of CARS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Academy of Railway Sciences Corp Ltd CARS, Railway Engineering Research Institute of CARS filed Critical China Academy of Railway Sciences Corp Ltd CARS
Priority to CN202110665774.6A priority Critical patent/CN113625264A/en
Publication of CN113625264A publication Critical patent/CN113625264A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S13/00Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
    • G01S13/88Radar or analogous systems specially adapted for specific applications
    • G01S13/885Radar or analogous systems specially adapted for specific applications for ground probing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/18Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Electromagnetism (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Algebra (AREA)
  • Operations Research (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Train Traffic Observation, Control, And Security (AREA)

Abstract

The invention provides a method and a system for parallel processing of railway detection big data. The method divides the collected detection data and stores the divided detection data according to a set storage strategy; according to the engineering property requirements and interpretation target requirements of the detection data, it creates combinations of the signal processing algorithms required by the detection data as conventional workflows and iterative workflows; it then calls the stored detection data in parallel on a purpose-designed parallel node architecture and carries out parallel processing of conventional operations and iterative operations respectively, according to the scheduling strategy matched to each workflow. This scheme solves the problems of unbalanced computing load and of a single operation module easily monopolizing memory in existing data processing techniques; based on the specific parallel node architecture, combined with the different scheduling strategies formulated for it, it improves load balancing and cluster resource utilization during processing and can meet the processing requirements of current and future large-scale detection data.

Description

Method and system for parallel processing of railway detection big data
Technical Field
The invention relates to the technical field of cluster resource processing and optimization, in particular to a method and a system for parallel processing of railway detection big data.
Background
Railway transportation brings convenience and support to everyday life and work, but guaranteeing the safe operation of railways is a core task that will demand constant attention now and in the future. With the rapid growth of railway operating mileage in China, how to inspect the infrastructure of an increasingly large railway network rapidly, accurately and non-destructively, and to grasp its health state in time, has become a major issue that urgently needs to be solved.
Ground penetrating radar (GPR) is a geophysical method that uses antennas to transmit and receive high-frequency electromagnetic waves in order to detect the characteristics and distribution of materials inside a medium; information related to the physical properties of the subsurface medium can be obtained by studying changes in the polarization of the radar waves. In practice, ground penetrating radar can be operated in a variety of detection modes, and the detection data obtained are correspondingly complex in type and large in scale, so they need to be processed and analyzed according to a well-defined strategy.
In the prior art, the processing of ground penetrating radar detection data has been studied, but most techniques are based on a single computing node, and the algorithms involved are usually highly serialized and only suitable for small-scale GPR data sets. With the rapid expansion of GPR data volumes, some practitioners have optimized existing processing techniques by drawing on the massive, high-speed parallel processing capacity of big-data cloud computing, typically managing data in a Hadoop-based distributed cluster that combines the HDFS distributed file system and MySQL relational database clusters with HBase to provide mass storage and efficient access to structured data. Although this guarantees the scalability and portability of cluster resources and allows data in various formats to be stored, radar data are repeatedly read and written during preprocessing and subsequent processing, large differences in the computational load of different parallel tasks easily cause load-balancing problems, and when the algorithm steps in a workflow are complicated the parallel computing nodes may fail to operate normally, so the processing requirements of large-scale GPR detection data cannot be met. It is therefore desirable to provide an efficient and reasonable data parallel processing method that meets the application requirements.
The information disclosed in this background section is only for enhancement of understanding of the general background of the invention and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person skilled in the art.
The rapid development of a new generation of high-performance hardware architectures creates a new opportunity for the rapid processing of massive GPR data. Large volumes of high-precision, large-area GPR detection data can be processed using parallel techniques, greatly improving processing efficiency.
Disclosure of Invention
To solve the above problems, the present invention provides a method for parallel processing of railway detection big data, which in one embodiment comprises:
a data storage step: dividing the acquired detection data, and storing the divided detection data according to a set storage strategy;
an algorithm combination formulation step: creating a corresponding operation workflow according to the engineering property requirements and interpretation target requirements of the detection data, the operation workflow being a combination of the signal processing algorithms required by the detection data and comprising a conventional workflow and an iterative workflow;
a parallel processing step: calling the stored detection data in parallel based on a pre-established parallel node architecture, and carrying out parallel processing of conventional operations and iterative operations respectively, according to the scheduling strategy matched with the operation workflow.
Preferably, in the data storage step, the railway detection data is divided according to the acquisition area and the acquisition equipment type to obtain detection data of each level with different processing priorities.
Further, in one embodiment, when the divided detection data are stored, parallel storage of detection data in various formats is realized by combining the storage modes of a relational database and a distributed database with the priority information;
the data body of a detection data file comprises track header data and data content; the track header data of each detection data record is stored in the relational database in association with the file header, and the data content of each detection data record is stored in the distributed database.
Further, in a preferred embodiment, the method further comprises a node architecture creation step: setting a master node as the task scheduling node, which forms a parallel node architecture together with a read-write node and N computing nodes.
Further, in one embodiment, in the parallel processing step, the parallel processing based on the normal class operation is implemented according to the following steps:
the task scheduling node judges whether the detection data needing to be processed exist or not, if yes, the total track number x of the detection data needing to be processed is counted, and the read-write node and the computing node are started;
sequentially and circularly executing the following steps until all the detection data to be processed are operated, and releasing the read-write nodes and the computing nodes:
reading x/K channels by a read-write node as the processing data of the current round based on a preset scheduling base number K; the task scheduling node averagely divides the processing data of the current round into N parts and distributes the N parts to each computing node;
each computing node executes operation in parallel according to the conventional workflow;
the parallel scheduling node confirms whether the processed detection data which is calculated is received or not, if the processed detection data is received, the processed detection data is transmitted to the read-write node, and the read-write node integrates the processed detection data according to the corresponding space structure and writes the integrated detection data into a storage area;
and K is a positive integer, K is more than or equal to 1 and less than or equal to N, or the optimization is further carried out by combining the channel number S corresponding to the unit marking data, and the scheduling base number K is determined.
Further, in one embodiment, the parallel processing based on the iterative class operation is implemented according to the following steps:
the task scheduling node judges whether the detection data needing to be processed exist or not, if yes, the total track number x of the detection data needing to be processed is counted, and the read-write node and the computing node are started;
setting a scheduling base number K according to hardware resources, x and the size of a calculation unit of the iterative algorithm, and reading x/K channels by a read-write node to serve as first round of processing data; the task scheduling node averagely divides the processing data of the current round into N parts and distributes the N parts to each computing node;
and then, sequentially and circularly executing the following steps until all the detection data to be processed are operated, and releasing the read-write nodes and the computing nodes:
the task scheduling node counts the number n of idle computing nodes in real time, and the read-write node reads the next x/K channels of the remaining data, which are divided into n equal parts and distributed to the idle computing nodes;
each computing node executes operation in parallel according to the iteration workflow and feeds back operation state information to the task scheduling node in real time;
and the parallel scheduling node confirms whether the processed detection data after calculation is received or not, if so, the processed detection data is transmitted to the read-write node, and the read-write node integrates according to the corresponding spatial structure and writes the integrated data into a storage area.
Based on the method for processing big railway detection data in parallel in any one or more of the above embodiments, the present invention further provides a storage medium, on which program codes for implementing the method in any one or more of the above embodiments are stored.
Based on other aspects of the method in any one or more of the embodiments, the present invention further provides a system for parallel processing of big railway inspection data, where the system includes:
the data storage module is configured to divide the acquired detection data and store the divided detection data according to a set storage strategy;
the system comprises an algorithm combination formulation module, a calculation workflow and a data analysis module, wherein the algorithm combination formulation module is configured to create a corresponding calculation workflow according to engineering property requirements and interpretation target requirements of detection data, and the calculation workflow is a combination of a plurality of signal processing algorithms required by the detection data and comprises a conventional workflow and an iterative workflow;
and the parallel processing module is configured to parallelly call the stored detection data based on a pre-established parallel node architecture, and respectively realize parallel processing based on conventional class operation and iterative class operation according to a scheduling strategy matched with the operation workflow.
Further, in one embodiment, the system further comprises: the node architecture creating module is configured to set a main node as a task scheduling node, and the main node, a read-write node and N computing nodes form a parallel node architecture together.
Compared with the closest prior art, the invention also has the following beneficial effects:
according to the method and the system for processing the railway detection big data in parallel, the acquired detection data are divided, so that the data with emergency processing requirements can be processed in the first time, and accidents or influences caused by untimely data processing can be avoided; furthermore, the data dividing and storing method provided by the invention carries out multi-granularity segmentation operation aiming at the data processing algorithm principle, realizes parallel processing aiming at the conventional algorithm and the iterative algorithm based on the parallel node architecture according to different scheduling strategies, improves the data processing rate from two aspects of load balancing performance and cluster resource utilization rate, can effectively meet the processing requirements of current and future large-scale data, and provides reliable assistance for the subsequent application of railway detection data.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a schematic flow chart of a method for parallel processing of big railway detection data according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a system for parallel processing of railway detection big data according to another embodiment of the present invention.
Detailed Description
The following detailed description will be provided for the embodiments of the present invention with reference to the accompanying drawings and examples, so that the practitioner of the present invention can fully understand how to apply the technical means to solve the technical problems, achieve the technical effects, and implement the present invention according to the implementation procedures. It should be noted that, unless otherwise conflicting, the embodiments and features of the embodiments of the present invention may be combined with each other, and the technical solutions formed are all within the scope of the present invention.
Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel, concurrently, or simultaneously. The order of the operations may be rearranged. A process may be terminated when its operations are completed, but may have additional steps not included in the figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc.
The computer equipment comprises user equipment and network equipment. The user equipment or the client includes but is not limited to a computer, a smart phone, a PDA, and the like; network devices include, but are not limited to, a single network server, a server group of multiple network servers, or a cloud based on cloud computing consisting of a large number of computers or network servers. The computer devices may operate individually to implement the present invention or may be networked and interoperate with other computer devices in the network to implement the present invention. The network in which the computer device is located includes, but is not limited to, the internet, a wide area network, a metropolitan area network, a local area network, a VPN network, and the like.
With the rapid increase of railway operation mileage in China, how to rapidly, accurately and nondestructively detect infrastructure of a huge railway network in the future and grasp the health state of the infrastructure in time becomes a major issue to be solved urgently at present.
The GPR (ground penetrating radar) data processing techniques of the prior art are relatively mature for small and medium-scale data sets. However, most of them are based on a single computing node, and the algorithms involved are usually highly serialized and cannot be applied effectively to large-scale railway detection data. With the rapid expansion of GPR data volumes, some practitioners have optimized existing processing techniques by drawing on the massive, high-speed parallel processing capacity of big-data cloud computing, typically managing data in a Hadoop-based distributed cluster that combines the HDFS distributed file system and MySQL relational database clusters with HBase to provide mass storage and efficient access to structured data. Although this guarantees the scalability and portability of cluster resources and allows data in various formats to be stored, no explicit data storage strategy is provided, radar data are repeatedly read and written during preprocessing and subsequent processing, and large differences in the computational load of different parallel tasks easily cause load-balancing problems; when the algorithm steps in a workflow are complex, the parallel computing nodes may not operate normally, and the processing requirements of large-scale GPR detection data cannot be met. It is therefore desirable to provide an efficient and reasonable data parallel processing method that meets the application requirements.
The rapid development of a new generation of high-performance hardware architectures creates a new opportunity for the rapid processing of massive GPR data, and high-precision, large-area GPR detection data can be processed in parallel to greatly improve processing efficiency. During parallel processing, however, the computational load of the parallel branches is difficult to balance; the effect of large load differences is hard to overcome, computing resources are easily left unevenly utilized while intermediate memory resources are strained, and the processing efficiency is difficult to guarantee.
To solve these problems, the invention provides a method and a system for parallel processing of railway detection big data. The method establishes a hybrid "data parallel + algorithm parallel" computing architecture, divides the radar signal processing algorithms into two categories, and formulates an applicable scheduling and computing strategy for each, so that load balancing is achieved across the whole workflow and the utilization efficiency of cluster resources is maximized.
The detailed flow of the method of the embodiments of the present invention is described in detail below based on the accompanying drawings, and the steps shown in the flow chart of the drawings can be executed in a computer system containing a computer-executable instruction such as a set of computer-executable instructions. Although a logical order of steps is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order different than presented herein.
Example one
Fig. 1 is a schematic flow chart illustrating a method for processing railway detection big data in parallel according to an embodiment of the present invention, and as can be seen from fig. 1, the method includes the following steps.
The data storage step, dividing the acquired detection data, and storing the divided detection data according to a set storage strategy;
the method comprises the steps of establishing an algorithm combination, establishing a corresponding operation workflow according to engineering property requirements and target interpretation requirements of detection data, wherein the operation workflow is a combination of a plurality of signal processing algorithms required by the detection data and comprises a conventional workflow and an iterative workflow;
and a parallel processing step, parallelly calling the stored detection data based on a pre-established parallel node architecture, and respectively realizing parallel processing based on conventional operation and iterative operation according to a scheduling strategy matched with the operation workflow.
In practical applications in the field of railway detection data, the ground penetrating radar data acquisition device comprises a ground penetrating radar host, transmitting and receiving antennas, and a system for determining and recording the spatial coordinates of measuring points and survey lines. The host is connected to and controls the transmitting and receiving antennas and the coordinate determination and recording system, and the latter determines and records the exact position of each ground penetrating radar data acquisition point.
The collected raw detection data are read by a ground penetrating radar data reader of the corresponding model together with a data converter, and are converted into a standard format suitable for the parallel processing of the invention.
The researchers of the invention observed that detection data obtained from different types of ground penetrating radar equipment, or from equipment deployed in different areas, differ in the urgency of their processing requirements; for example, if an earthquake occurs in a certain area, the real-time detection data from the railway-related ground penetrating radars in that area require urgent processing. Therefore, in one embodiment, in the data storage step, the railway detection data are divided according to acquisition area and acquisition equipment type to obtain detection data at different levels of processing priority.
Specifically, a priority mark can be added to each batch of detection data packets, and the batches are stored directly in the sub-storage areas corresponding to the different priority levels according to those marks; in this way the priority mark does not have to be carried in the detection data file header, where it would slow down reading and scheduling during subsequent parallel processing. Accordingly, the data held in the storage area with the highest processing priority have the highest scheduling priority and reading priority.
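To make the priority-based division concrete, the following Python sketch shows one possible way of routing acquisition batches into priority sub-storage areas; the area/device-to-priority mapping, the directory names and the priority levels are illustrative assumptions, not values specified by the invention.

```python
# Illustrative sketch only: the area/device-to-priority mapping and the
# sub-storage directory names are hypothetical, not taken from the patent.
from dataclasses import dataclass
from pathlib import Path

PRIORITY_DIRS = {0: Path("store/priority_0_urgent"),
                 1: Path("store/priority_1_normal"),
                 2: Path("store/priority_2_batch")}

@dataclass
class DetectionBatch:
    area: str          # acquisition area, e.g. a line or mileage section
    device_type: str   # e.g. "400MHz_gpr", "900MHz_gpr"
    payload: bytes     # raw detection data packet

def assign_priority(batch: DetectionBatch, urgent_areas: set[str]) -> int:
    """Derive a processing priority from acquisition area and device type."""
    if batch.area in urgent_areas:          # e.g. an area hit by an earthquake
        return 0
    if batch.device_type.endswith("_gpr"):  # routine GPR surveys
        return 1
    return 2

def store_batch(batch: DetectionBatch, urgent_areas: set[str]) -> Path:
    """Write the whole batch into the sub-storage area matching its priority,
    so the priority mark never has to live inside the data file header."""
    prio = assign_priority(batch, urgent_areas)
    target_dir = PRIORITY_DIRS[prio]
    target_dir.mkdir(parents=True, exist_ok=True)
    out = target_dir / f"{batch.area}_{batch.device_type}.bin"
    out.write_bytes(batch.payload)
    return out
```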
In an actual application scenario, a data set obtained by each detection is often a plurality of files, and the data set needs to be distributed to a plurality of parallel computing nodes and processed at the same time. The granularity of the segmentation scheduling operation directly influences the data processing amount and the processing efficiency of each parallel computing node, and has important influence on load balance.
In the prior art, splitting is performed in units of files; for example, 10 files distributed over 5 nodes gives each node 2 files. However, a single file is sometimes so large that the computation assigned to one node occupies all of its memory, the other nodes cannot run, and the advantage of parallel processing cannot be fully exploited. Figuratively speaking, several production lines are available, but they cannot all produce at the same time.
Fine-grained splitting therefore requires the radar files to be decomposed. The researchers of the invention recognized that, owing to the particular file format of ground penetrating radar detection data, the files cannot be decomposed directly, and they instead realize parallel partitioned storage based on the specific content of the file format so that the detection data can be split flexibly.
Specifically, the file format of the detection data is: file header + signal data (e.g. track 1 data + track 2 data + ...); the file header contains a number of acquisition parameters, and its format differs between radar devices.
Further, each track of signal data consists of two parts: a track header and the radar signal (the data content, e.g. 4 values + 512 values for one brand of device).
Because the subsequent signal processing algorithms need the parameters in the file header, the file header is stored in a relational database such as MySQL (those skilled in the art may select any reasonable relational database as required), while the radar signal data content is stored in a distributed database such as HDFS (again, any reasonable distributed database may be selected as required). On this basis, the signal data can be divided flexibly and evenly, with a minimum granularity of a single-track signal. By adopting this technical concept, the data processing granularity of each parallel processing node becomes smaller and more balanced, going from 1 file (a single subgrade detection file generally contains hundreds of thousands of tracks) down to a small gather (tens or hundreds of tracks) or even a single track of data.
In addition, the track header indicates whether the track carries mark data; this information is useful for subsequent processing and is also stored in the relational database. The mark data are used to match radar data to their true spatial position and are inserted by the acquisition software during acquisition, typically at equal distances or at locations where the geological features of the detection zone change.
Specifically, the ground penetrating radar data and their attribute characteristic values involve a variety of data formats, including multidimensional attribute data such as time-domain root-mean-square amplitude, time-domain coherence, frequency-domain −3 dB bandwidth average frequency, frequency-domain −3 dB bandwidth average phase, time-frequency-domain low-frequency gain area and time-frequency-domain high-frequency attenuation area. Therefore, in one embodiment, for each priority storage area, the divided detection data are stored using the MySQL relational database and the distributed database together, combined with the priority information, so as to realize parallel partitioned storage of detection data in multiple formats.
The detection data comprise a file header and a data body; the file header contains, for each file, the total number of tracks, the number of sampling points, the antenna frequency, the track spacing and the time window parameters at acquisition time. The file header of each detection data file is stored in the relational database.
The data body of a detection data file comprises track header data and data content; the track header data of each detection data record is stored in the relational database in association with the file header, and the data content of each detection data record is stored in the distributed database.
In specific implementation, attribute parameters during data acquisition corresponding to the file header can be stored in the MySQL relational database, such as parameters of the total number of channels, the number of sampling points, the antenna frequency, the channel spacing, the time window and the like.
The data body is stored in blocks in HDFS, and if the track header indicates a track number mark, the mark is stored in the relational database in association with the corresponding file.
In actual application, the storage format of the database can be as follows:
file attribute parameter table example:
filename Attribute 1 Attribute 2 Attribute 3 ……
Document 1 1 2 3
Document 2 1 2 3
Document 3 1 2 3
Example file tag table:
filename Marking track numbers
Document 1 Mark 1
Document 1 Mark 2
Document 1 Mark 3
Document 2 Mark 1
Document 2 Mark 2
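The following Python sketch illustrates the storage split described above, with sqlite3 standing in for the MySQL relational database and a local directory standing in for HDFS; the table names, column names and the track-wise file layout are illustrative assumptions rather than a prescribed schema.

```python
# Minimal sketch of the storage split: file header and marks go to a relational
# database (sqlite3 as a stand-in for MySQL), the data body goes to a distributed
# store (a local directory as a stand-in for HDFS), one file per track.
import sqlite3
from pathlib import Path

import numpy as np

def init_relational_db(path: str = "gpr_meta.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS file_attributes (
            file_name TEXT PRIMARY KEY,
            total_tracks INTEGER, sample_points INTEGER,
            antenna_freq_mhz REAL, track_spacing_m REAL, time_window_ns REAL);
        CREATE TABLE IF NOT EXISTS file_marks (
            file_name TEXT, mark_track_no INTEGER);
    """)
    return conn

def store_detection_file(conn: sqlite3.Connection, file_name: str, header: dict,
                         traces: np.ndarray, mark_tracks: list[int],
                         data_dir: str = "hdfs_stub") -> None:
    """Header and track-number marks go to the relational DB; the data body is
    written track by track so later reads can address any single track."""
    conn.execute("INSERT OR REPLACE INTO file_attributes VALUES (?,?,?,?,?,?)",
                 (file_name, header["total_tracks"], header["sample_points"],
                  header["antenna_freq_mhz"], header["track_spacing_m"],
                  header["time_window_ns"]))
    conn.executemany("INSERT INTO file_marks VALUES (?,?)",
                     [(file_name, m) for m in mark_tracks])
    conn.commit()
    out = Path(data_dir) / file_name
    out.mkdir(parents=True, exist_ok=True)
    for i, trace in enumerate(traces):          # minimum granularity: one track
        np.save(out / f"track_{i:06d}.npy", trace)
```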
The input and output of traditional algorithms are one or more radar files, so the minimum data volume of a single task during parallel computing is generally large; this easily leads to unbalanced parallel computing loads and insufficient memory on the computing modules, and in severe cases prevents the whole parallel computation from running normally.
Further, even if the minimum calculation unit of the parallel calculation is set as one radar data file in the prior art, since the iterative calculation process may need to be executed, and the number of iterative calculations is not uniform, the problem of uneven calculation load may still be caused.
Therefore, in one embodiment, in the algorithm combination formulation step, a corresponding operation workflow is created according to the engineering property requirements and interpretation target requirements of the detection data; the operation workflow is the combination of signal processing algorithms that the detection data must undergo, and comprises a conventional workflow and an iterative workflow. Based on their operating principles, operation workflows are classified as conventional workflows or iterative workflows according to whether they contain iterative algorithms.
It should be noted that the processing of a given radar file may involve operations with a prescribed execution order; the processing workflow of each radar file is therefore not limited to one conventional workflow and one iterative workflow. Researchers can plan the operation workflows for the different processing stages of each radar data file according to their needs; the workflow of each stage contains at least one algorithm, and the workflows are planned and stored in a unified manner. In practice, the operation workflows corresponding to batch radar data of the same type are consistent.
Furthermore, in order to ensure load balance and utilization rate of each parallel computing channel, researchers of the invention set the input/output unit and the minimum computing unit of the radar signal processing algorithm as a single channel or a gather.
The minimum calculation unit e of each algorithm is determined such that mutually independent calculations can be carried out with that unit as input and output, for example a single track or a gather, whose size can be specified.
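As one way of making the workflow concept concrete, the following Python sketch represents an operation workflow as an ordered list of signal-processing steps, each flagged as conventional or iterative and carrying its minimum calculation unit e; the step names, functions and e values are illustrative assumptions, not algorithms prescribed by the invention.

```python
# Sketch of one possible workflow representation: each step knows whether it is
# iterative and what its minimum calculation unit (in tracks) is.
from dataclasses import dataclass
from typing import Callable

import numpy as np

@dataclass
class ProcessingStep:
    name: str
    func: Callable[[np.ndarray], np.ndarray]  # operates on one calculation unit
    iterative: bool                           # True -> belongs to the iterative workflow
    min_unit_e: int = 1                       # 1 = single track, >1 = gather size

def split_workflow(steps: list[ProcessingStep]):
    """Group the requested algorithm combination into a conventional workflow
    and an iterative workflow, preserving execution order inside each group."""
    conventional = [s for s in steps if not s.iterative]
    iterative = [s for s in steps if s.iterative]
    return conventional, iterative

# Hypothetical combination for one engineering/interpretation requirement:
workflow = [
    ProcessingStep("dc_removal", lambda tr: tr - tr.mean(), iterative=False),
    ProcessingStep("linear_gain", lambda tr: tr, iterative=False),
    ProcessingStep("migration", lambda tr: tr, iterative=True, min_unit_e=100),
]
conventional_wf, iterative_wf = split_workflow(workflow)
```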
Taking the linear gain algorithm as an example, the calculation result for each channel of data is represented by multiplying each sampling point by a different coefficient:
P(i) = a·i·Δt + b·exp(c·i·Δt),  i = 1, 2, …, N1
in the formula: n is a radical of1The number of sampling points (readable in the file header) for each signal;
p (i) is a weighting factor corresponding to the ith sampling point;
Δ t is the sampling time interval (readable in the header);
a, b and c are coefficients, and the values of the coefficients can be set by a user and input in an interface;
therefore, for the linear gain algorithm, the minimum calculation unit is set to be single-channel data.
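As a concrete illustration of this minimum calculation unit, the following sketch applies the linear gain weighting above to one track; the numeric values of dt, a, b and c in the usage lines are arbitrary examples, not parameters prescribed by the invention.

```python
# Sketch of the linear gain weighting applied to a single track:
# each sampling point i is multiplied by P(i) = a*i*dt + b*exp(c*i*dt).
import numpy as np

def linear_gain(trace: np.ndarray, dt: float, a: float, b: float, c: float) -> np.ndarray:
    i = np.arange(1, trace.size + 1)              # i = 1 .. N1
    weights = a * i * dt + b * np.exp(c * i * dt)
    return trace * weights

# Usage: one track with N1 = 512 samples; dt would normally be read from the file header.
trace = np.random.randn(512)
gained = linear_gain(trace, dt=0.1953, a=0.5, b=1.0, c=0.002)
```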
In combination with practical applications, in one embodiment the parallel processing method provided by the invention further includes a node architecture creation step: a master node A serves as the task scheduling node and, together with a read-write node B and N computing nodes C1 to CN, forms the parallel node architecture.
Further, in the parallel processing step, the algorithms in the operation workflow are processed one by one:
If the algorithm is a conventional algorithm, a static scheduling method is adopted: node A divides the data of the current round into N equal parts and distributes them to nodes C1 to CN, and no further task scheduling is performed during parallel execution. Nodes C1 to CN each call the specified conventional algorithm to process their share in parallel and, once finished, feed the calculation results back to node A.
Thus, in one embodiment, parallel processing based on conventional class operations is implemented as follows:
the task scheduling node judges whether the detection data needing to be processed exist or not, if yes, the total track number x of the detection data needing to be processed is counted, and the read-write node and the computing node are started;
sequentially and circularly executing the following steps until all the detection data to be processed are operated, and releasing the read-write nodes and the computing nodes:
based on a preset scheduling base K, the read-write node reads x/K channels as the processing data of the current round; the task scheduling node divides the processing data of the current round into N equal parts and distributes one part to each computing node. The setting of K therefore determines the scheduling granularity, i.e. the amount of data calculated by each computing node in each round; in this scenario the scheduling granularity is x/(N·K);
each computing node executes operation in parallel according to the conventional workflow;
the parallel scheduling node confirms whether the processed detection data which is calculated is received or not, if the processed detection data is received, the processed detection data is transmitted to the read-write node, and the read-write node integrates the processed detection data according to the corresponding space structure and writes the integrated detection data into a storage area;
and K is a positive integer, K is more than or equal to 1 and less than or equal to N, or the optimization is further carried out by combining the channel number S corresponding to the unit marking data, and the scheduling base number K is determined.
In practical application, because the calculation processes of all parallel branches in a conventional operation are identical and their calculation times differ little, the retrieval and processing of the next batch starts only after the current batch is finished. It should be noted, however, that as soon as the parallel scheduling node recognizes that processed detection data have been received, the data are immediately transmitted to the read-write node, which integrates them according to the corresponding spatial structure and writes them into the storage area; this does not conflict with starting the reading and processing of the next batch after the current batch is completed, so congestion in data reading, writing and storage is effectively avoided.
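The static scheduling described above can be sketched as follows; Python's multiprocessing pool stands in for the N computing nodes, an in-memory array stands in for the read-write node's storage area, and the example step and the values of K and N are assumptions for illustration.

```python
# Static-scheduling sketch for a conventional workflow: each round reads x/K
# tracks, splits them into N equal shares, and runs the conventional algorithms
# on all shares in parallel before writing the round back in spatial order.
from multiprocessing import Pool

import numpy as np

def remove_dc(chunk: np.ndarray) -> np.ndarray:
    """Example conventional step: subtract each track's mean (DC removal)."""
    return chunk - chunk.mean(axis=1, keepdims=True)

def run_conventional(traces: np.ndarray, workflow, N: int = 4, K: int = 4) -> np.ndarray:
    x = traces.shape[0]                          # total track number x
    batch = max(1, x // K)                       # read x/K tracks per round
    out = np.empty_like(traces)
    with Pool(processes=N) as pool:              # N "computing nodes"
        for start in range(0, x, batch):
            round_data = traces[start:start + batch]
            parts = np.array_split(round_data, N)         # N equal shares
            for step in workflow:                         # conventional algorithms
                parts = pool.map(step, parts)
            out[start:start + batch] = np.concatenate(parts)  # write back in order
    return out

if __name__ == "__main__":
    processed = run_conventional(np.random.randn(10_000, 512), [remove_dc])
```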
The value of the calculation unit size e is set according to the total number x of data to be processed: the more data there are to be processed, the larger the corresponding calculation unit; the minimum value of e is 1, with a single track as the minimum calculation unit.
Further, considering the mark data used to match radar data to their true spatial position, in an optional embodiment the scheduling base for slicing the data is determined in combination with the positions of the mark data. Specifically, the number of tracks scheduled per unit is kept equal to, or a multiple of, the track number S corresponding to unit mark data, so that the data within every two marks can be processed with the same signal processing algorithm and the calculation signals of the corresponding data units do not differ too much. For example, if there is roughly one mark per 1000 tracks, the number of tracks scheduled per unit also refers to the mark-granularity track number (the track number corresponding to unit mark data) and is set equal to it or to a multiple of it.
Specifically, in practical application, the value of the target scheduling base may be determined according to the following policy:
if the unit scheduling track number N×e obtained from the initial scheduling base K' is smaller than or equal to S for the corresponding data unit, the target scheduling base K is set equal to S;
if the unit scheduling track number N×e is larger than S for the corresponding data unit, the target multiple m is obtained by rounding N×e/S up to the nearest integer, i.e. m = ⌈N·e/S⌉, and m×S is selected as the value of the target scheduling granularity K.
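A small helper corresponding to this policy might look as follows; the function name and the example values of N, e and S are illustrative assumptions.

```python
# Sketch of the scheduling-granularity rule above: keep the per-unit track count
# at S or a whole multiple of S (the track count per unit of mark data).
import math

def target_scheduling_granularity(N: int, e: int, S: int) -> int:
    """N: number of computing nodes; e: minimum calculation unit in tracks;
    S: track count corresponding to unit mark data."""
    unit_tracks = N * e                 # unit scheduling track number N*e
    if unit_tracks <= S:
        return S                        # schedule exactly one mark interval
    m = math.ceil(unit_tracks / S)      # target multiple m = ceil(N*e / S)
    return m * S                        # schedule m whole mark intervals

# e.g. 8 nodes, gathers of 200 tracks, one mark per 1000 tracks -> 2000 tracks
granularity = target_scheduling_granularity(N=8, e=200, S=1000)
```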
Further, if the algorithm is an iterative algorithm, a dynamic scheduling method of circular allocation is adopted:
and setting a corresponding parallel scheduling cardinal number K, wherein K is an integer.
The proportion of the first distributed data volume of the A node to the total data volume is 1/K, the data are divided into N parts and evenly distributed to C1~CNAnd (4) nodes. C1~CNAnd respectively calling the specified algorithms to perform parallel processing, and feeding back the calculation result to the node A after the processing is finished.
Node A receives node C1~CNThe method comprises the steps that feedback information of nodes is obtained, and the number of computing nodes in an idle state is counted in real time, and researchers consider that for algorithm combinations in an iteration workflow, due to the fact that iteration times are unknown, for algorithms with multiple iteration times, single operation time differences of different computing nodes can be increased along with the iteration times to form obvious time differences, and therefore time consumption of each computing node is obviously different); from the remaining calculated quantities, the data quantity (1/Kn) is continuously distributed to eachAn idle compute node;
and circularly executing the steps until all data are processed.
Therefore, in one embodiment, in the parallel processing step, the parallel processing based on the iterative class operation is implemented according to the following steps:
the task scheduling node judges whether the detection data needing to be processed exist or not, if yes, the total track number x of the detection data needing to be processed is counted, and the read-write node and the computing node are started;
setting a scheduling base number K according to hardware resources, x and the size of a calculation unit of the iterative algorithm, and reading x/K channels by a read-write node to serve as first round of processing data; the task scheduling node averagely divides the processing data of the current round into N parts and distributes the N parts to each computing node;
and then, sequentially and circularly executing the following steps until all the detection data to be processed are operated, and releasing the read-write nodes and the computing nodes:
the task scheduling node counts the number n of idle computing nodes in real time, and the read-write node reads the next x/K channels of the remaining data, which are divided into n equal parts and distributed to the idle computing nodes;
each computing node executes operation in parallel according to the iteration workflow and feeds back operation state information to the task scheduling node in real time;
and the parallel scheduling node confirms whether the processed detection data after calculation is received or not, if so, the processed detection data is transmitted to the read-write node, and the read-write node integrates according to the corresponding spatial structure and writes the integrated data into a storage area.
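The dynamic scheduling of the iterative workflow can be sketched as follows; a process pool stands in for the computing nodes, the toy damping loop stands in for a real iterative algorithm, and the per-dispatch chunk size of roughly x/(K·N) tracks per node is an assumption for illustration, since the exact dispatch formula is not reproduced here.

```python
# Dynamic-scheduling sketch for an iterative workflow: as soon as a worker
# ("computing node") becomes idle it receives the next chunk, mimicking the
# real-time counting of idle nodes by the task scheduling node.
from concurrent.futures import ProcessPoolExecutor, FIRST_COMPLETED, wait

import numpy as np

def iterative_step(chunk: np.ndarray, tol: float = 1e-6, max_iter: int = 500) -> np.ndarray:
    """Toy iterative algorithm: damp each track until the update is small."""
    current = chunk.copy()
    for _ in range(max_iter):
        update = 0.5 * (current - current.mean(axis=1, keepdims=True))
        current -= update
        if np.abs(update).max() < tol:
            break
    return current

def run_iterative(traces: np.ndarray, N: int = 4, K: int = 8) -> np.ndarray:
    x = traces.shape[0]
    chunk = max(1, x // (K * N))                     # assumed per-dispatch share per node
    out = np.empty_like(traces)
    pending = list(range(0, x, chunk))               # start offsets still to process
    with ProcessPoolExecutor(max_workers=N) as pool:
        running = {}
        while pending or running:
            while pending and len(running) < N:      # feed every idle node
                s = pending.pop(0)
                running[pool.submit(iterative_step, traces[s:s + chunk])] = s
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                s = running.pop(fut)
                out[s:s + chunk] = fut.result()      # read-write node writes back
    return out

if __name__ == "__main__":
    result = run_iterative(np.random.randn(5_000, 512))
```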
While, for purposes of simplicity of explanation, the foregoing method embodiments have been described as a series of acts or combination of acts, it will be appreciated by those skilled in the art that the present invention is not limited by the illustrated ordering of acts, as some steps may occur in other orders or concurrently with other steps in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
It should be noted that, in other embodiments of the present invention, the method may further obtain a new data parallel processing method by combining one or some of the above embodiments, so as to implement efficient analysis and operation on mass detection data.
By adopting the scheme in any one or more of the embodiments of the invention, the parallel processing of the railway detection data is realized, the load balance of each parallel computing channel is further improved on the basis of realizing the rapid and accurate operation, the utilization efficiency of parallel computing resources is furthest exerted, and the reliable data support is provided for the subsequent application research.
It should be noted that, based on the method in any one or more of the above embodiments of the present invention, the present invention further provides a storage medium, on which a program code capable of implementing the method in any one or more of the above embodiments is stored, and when the program code is executed by an operating system, the method for processing railway detection big data in parallel as described above can be implemented.
Example two
The method is described in detail in the embodiments disclosed in the present invention, and the method of the present invention can be implemented by using various forms of apparatuses or systems, so based on other aspects of the method described in any one or more of the embodiments, the present invention further provides a system for parallel processing of railway detection big data, which is used for executing the method for parallel processing of railway detection big data described in any one or more of the embodiments. Specific examples are given below for a detailed description.
Specifically, fig. 2 is a schematic structural diagram of a system for parallel processing of railway detection big data provided in an embodiment of the present invention, and as shown in fig. 2, the system includes:
the data storage module is configured to divide the acquired detection data and store the divided detection data according to a set storage strategy;
the system comprises an algorithm combination formulation module, a calculation workflow and a data analysis module, wherein the algorithm combination formulation module is configured to create a corresponding calculation workflow according to engineering property requirements and interpretation target requirements of detection data, and the calculation workflow is a combination of a plurality of signal processing algorithms required by the detection data and comprises a conventional workflow and an iterative workflow;
and the parallel processing module is configured to parallelly call the stored detection data based on a pre-established parallel node architecture, and respectively realize parallel processing based on conventional class operation and iterative class operation according to a scheduling strategy matched with the operation workflow.
In one embodiment, the data storage module includes a data dividing unit configured to divide the railway detection data according to the acquisition area and the acquisition device type to obtain detection data of each level with different processing priorities.
Further, in one embodiment, when storing the partitioned detection data, the data storage unit of the data storage module adopts multiple storage modes of a MySQL relational database and an HDFS, and implements partition storage of the detection data in multiple formats by combining priority information;
the detection data comprises a file header and a data body, wherein the file header comprises the total number of tracks, the number of sampling points, the antenna frequency, the track interval and the time window parameter during data acquisition.
The data body of a detection data file comprises track header data and data content; the track header data of each detection data record is stored in the relational database in association with the file header, and the data content of each detection data record is stored in the distributed database.
Further, in one embodiment, the system further comprises a node architecture creation module configured to set a master node as the task scheduling node, which forms a parallel node architecture together with a read-write node and N computing nodes.
Preferably, in an embodiment, the parallel processing module includes a regular operation processing unit configured to implement parallel processing based on regular class operations according to the following steps:
the task scheduling node judges whether the detection data needing to be processed exist or not, if yes, the total track number x of the detection data needing to be processed is counted, and the read-write node and the computing node are started;
sequentially and circularly executing the following steps until all the detection data to be processed are operated, and releasing the read-write nodes and the computing nodes:
reading x/K channels by a read-write node as the processing data of the current round based on a preset scheduling base number K; the task scheduling node averagely divides the processing data of the current round into N parts and distributes the N parts to each computing node;
each computing node executes operation in parallel according to the conventional workflow;
the parallel scheduling node confirms whether the processed detection data which is calculated is received or not, if the processed detection data is received, the processed detection data is transmitted to the read-write node, and the read-write node integrates the processed detection data according to the corresponding space structure and writes the integrated detection data into a storage area;
and K is a positive integer, K is more than or equal to 1 and less than or equal to N, or the optimization is further carried out by combining the channel number S corresponding to the unit marking data, and the scheduling base number K is determined.
In one embodiment, the parallel processing module includes an iterative operation processing unit configured to implement parallel processing based on iterative class operations according to the following steps:
the task scheduling node judges whether the detection data needing to be processed exist or not, if yes, the total track number x of the detection data needing to be processed is counted, and the read-write node and the computing node are started;
setting a scheduling base number K according to hardware resources, x and the size of a calculation unit of the iterative algorithm, and reading x/K channels by a read-write node to serve as first round of processing data; the task scheduling node averagely divides the processing data of the current round into N parts and distributes the N parts to each computing node;
and then, sequentially and circularly executing the following steps until all the detection data to be processed are operated, and releasing the read-write nodes and the computing nodes:
the task scheduling node counts the number n of idle computing nodes in real time, and the read-write node reads the next x/K channels of the remaining data, which are divided into n equal parts and distributed to the idle computing nodes;
each computing node executes operation in parallel according to the iteration workflow and feeds back operation state information to the task scheduling node in real time;
and the parallel scheduling node confirms whether the processed detection data after calculation is received or not, if so, the processed detection data is transmitted to the read-write node, and the read-write node integrates according to the corresponding spatial structure and writes the integrated data into a storage area.
In the system for processing the railway detection big data in parallel provided by the embodiment of the invention, each module or unit structure can independently operate or operate in a combined mode according to actual analysis and operation requirements so as to realize corresponding technical effects.
It is to be understood that the disclosed embodiments of the invention are not limited to the particular structures, process steps, or materials disclosed herein but are extended to equivalents thereof as would be understood by those ordinarily skilled in the relevant arts. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting.
Reference in the specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. Thus, appearances of the phrase "an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment.
Although the embodiments of the present invention have been described above, the above descriptions are only for the convenience of understanding the present invention, and are not intended to limit the present invention. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (9)

1. A method for parallel processing of big railway inspection data, the method comprising:
the data storage step, dividing the acquired detection data, and storing the divided detection data according to a set storage strategy;
the method comprises the steps of establishing an algorithm combination, establishing a corresponding operation workflow according to engineering property requirements and target interpretation requirements of detection data, wherein the operation workflow is a combination of a plurality of signal processing algorithms required by the detection data and comprises a conventional workflow and an iterative workflow;
and a parallel processing step, parallelly calling the stored detection data based on a pre-established parallel node architecture, and respectively realizing parallel processing based on conventional operation and iterative operation according to a scheduling strategy matched with the operation workflow.
2. The method according to claim 1, wherein in the data storage step, the railway detection data is divided according to the acquisition area and the acquisition equipment type to obtain detection data of each stage with different processing priorities.
3. The method according to claim 1, characterized in that when storing the divided detection data, a plurality of storage modes of a relational database and a distributed database are adopted, and the parallel storage of the detection data in a plurality of formats is realized by combining priority information;
the data body of the detection data file comprises track head data and data content, the track head data of each detection data and the file head are stored in a relational database in an associated mode, and the data content of each detection data is stored in a distributed database.
4. The method of claim 1, further comprising: and a node architecture creating step, namely setting a main node as a task scheduling node, and forming a parallel node architecture together with a read-write node and N computing nodes.
5. The method according to claim 1, wherein in the parallel processing step, parallel processing based on conventional-class operations is implemented according to the following steps:
the task scheduling node judges whether there are detection data to be processed, and if so, counts the total number x of channels of the detection data to be processed and starts the read-write node and the computing nodes;
the following steps are then executed cyclically in sequence until all detection data to be processed have been operated on, after which the read-write node and the computing nodes are released:
the read-write node reads x/K channels as the processing data of the current round based on a preset scheduling base K; the task scheduling node divides the processing data of the current round equally into N parts and distributes them to the computing nodes;
each computing node executes operations in parallel according to the conventional workflow;
the task scheduling node confirms whether processed detection data whose computation has been completed have been received, and if so, transmits the processed detection data to the read-write node, and the read-write node integrates the processed detection data according to the corresponding spatial structure and writes the integrated data into a storage area;
wherein K is a positive integer with 1 ≤ K ≤ N, or the scheduling base K is further optimized and determined in combination with the number of channels S corresponding to the unit marking data.
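The batching arithmetic of claim 5 can be sketched as follows, assuming ceiling rounding for x/K and an even split of each round across the N computing nodes; the rounding rule and the refinement of K from the channel count S are left open by the claim, so those details are assumptions.

import math

def split_evenly(items, n):
    # Divide a list into n chunks whose sizes differ by at most one.
    q, r = divmod(len(items), n)
    chunks, idx = [], 0
    for i in range(n):
        step = q + (1 if i < r else 0)
        chunks.append(items[idx:idx + step])
        idx += step
    return chunks

def conventional_schedule(x, N, K):
    # Each round: read x/K channels, then split them across the N computing nodes.
    assert 1 <= K <= N
    batch = math.ceil(x / K)
    start = 0
    while start < x:
        round_channels = list(range(start, min(start + batch, x)))
        start += batch
        yield split_evenly(round_channels, N)

# Example: x = 10 channels, N = 4 computing nodes, scheduling base K = 2
for rnd, parts in enumerate(conventional_schedule(10, 4, 2), start=1):
    print(f"round {rnd}: {parts}")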
6. The method according to claim 1, wherein in the parallel processing step, parallel processing based on iterative-class operations is implemented according to the following steps:
the task scheduling node judges whether there are detection data to be processed, and if so, counts the total number x of channels of the detection data to be processed and starts the read-write node and the computing nodes;
a scheduling base K is set according to the hardware resources, x, and the size of the computation unit of the iterative algorithm, and the read-write node reads x/K channels as the processing data of the first round; the task scheduling node divides the processing data of the current round equally into N parts and distributes them to the computing nodes;
the following steps are then executed cyclically in sequence until all detection data to be processed have been operated on, after which the read-write node and the computing nodes are released:
the task scheduling node counts the number n of idle computing nodes in real time, the read-write node reads the number of channels given by the formula of Figure FDA0003116769090000021, and the data are divided equally into n parts and distributed to the idle computing nodes;
each computing node executes operations in parallel according to the iterative workflow and feeds back operation status information to the task scheduling node in real time;
the task scheduling node confirms whether processed detection data whose computation has been completed have been received, and if so, transmits the processed detection data to the read-write node, and the read-write node integrates the data according to the corresponding spatial structure and writes the integrated data into a storage area.
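For the iterative-class scheduling of claim 6, the sketch below assumes that each later round reads a share of the base batch x/K proportional to the fraction n/N of idle computing nodes; the exact read amount is given by the formula of Figure FDA0003116769090000021 in the claim, so this proportional rule, like the simulated idle-node counts, is only an assumption.

import math
import random

def split_evenly(items, n):
    # Same helper as in the sketch after claim 5.
    q, r = divmod(len(items), n)
    chunks, idx = [], 0
    for i in range(n):
        step = q + (1 if i < r else 0)
        chunks.append(items[idx:idx + step])
        idx += step
    return chunks

def iterative_schedule(x, N, K, seed=0):
    # First round: x/K channels split across all N computing nodes; later
    # rounds read an amount assumed here to be base * n / N for n idle nodes
    # and split it among exactly those n nodes.
    rng = random.Random(seed)
    base = math.ceil(x / K)
    start = min(base, x)
    yield N, split_evenly(list(range(start)), N)
    while start < x:
        n = rng.randint(1, N)                 # simulated idle-node count fed
        amount = math.ceil(base * n / N)      # back by the computing nodes
        channels = list(range(start, min(start + amount, x)))
        start += amount
        yield n, split_evenly(channels, n)

# Example: x = 12 channels, N = 4 computing nodes, scheduling base K = 3
for n, parts in iterative_schedule(12, 4, 3):
    print(f"{n} idle node(s): {parts}")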
7. A storage medium having stored thereon program code for implementing the method according to any one of claims 1 to 6.
8. A system for parallel processing of railway detection big data, the system comprising:
a data storage module configured to divide acquired detection data and store the divided detection data according to a set storage strategy;
an algorithm combination formulation module configured to create a corresponding operation workflow according to the engineering property requirements and interpretation target requirements of the detection data, the operation workflow being a combination of the plurality of signal processing algorithms required by the detection data and comprising a conventional workflow and an iterative workflow;
a parallel processing module configured to call the stored detection data in parallel based on a pre-established parallel node architecture and to implement parallel processing based on conventional-class operations and iterative-class operations, respectively, according to a scheduling strategy matched with the operation workflow.
9. The system according to claim 8, further comprising: a node architecture creating module configured to set a master node as a task scheduling node, the master node forming a parallel node architecture together with a read-write node and N computing nodes.
CN202110665774.6A 2021-06-16 2021-06-16 Method and system for parallel processing of railway detection big data Pending CN113625264A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110665774.6A CN113625264A (en) 2021-06-16 2021-06-16 Method and system for parallel processing of railway detection big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110665774.6A CN113625264A (en) 2021-06-16 2021-06-16 Method and system for parallel processing of railway detection big data

Publications (1)

Publication Number Publication Date
CN113625264A true CN113625264A (en) 2021-11-09

Family

ID=78378122

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110665774.6A Pending CN113625264A (en) 2021-06-16 2021-06-16 Method and system for parallel processing of railway detection big data

Country Status (1)

Country Link
CN (1) CN113625264A (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5968109A (en) * 1996-10-25 1999-10-19 Navigation Technologies Corporation System and method for use and storage of geographic data on physical media
WO2000031663A1 (en) * 1998-11-24 2000-06-02 Matsushita Electric Industrial Co., Ltd. Data structure of digital map file
WO2002035359A2 (en) * 2000-10-26 2002-05-02 Prismedia Networks, Inc. Method and system for managing distributed content and related metadata
CN101110079A (en) * 2007-06-27 2008-01-23 中国科学院遥感应用研究所 Digital globe antetype system
CN103338261A (en) * 2013-07-04 2013-10-02 北京泰乐德信息技术有限公司 Storage and processing method and system of rail transit monitoring data
CN105083336A (en) * 2014-05-19 2015-11-25 塔塔顾问服务有限公司 System and method for generating vehicle movement plans in a large railway network
CN103969627A (en) * 2014-05-26 2014-08-06 苏州市数字城市工程研究中心有限公司 Ground penetrating radar large-scale three-dimensional forward modeling method based on FDTD
CN109588054A (en) * 2016-06-18 2019-04-05 分形工业公司 Accurate and detailed modeling using distributed simulation engine to the system with large complicated data set
CN107423338A (en) * 2017-04-28 2017-12-01 中国铁道科学研究院 A kind of railway combined detection data display method and device
CN110073301A (en) * 2017-08-02 2019-07-30 强力物联网投资组合2016有限公司 The detection method and system under data collection environment in industrial Internet of Things with large data sets
KR101950935B1 (en) * 2017-09-27 2019-02-22 (주) 퓨처젠 System for sniffing detection data of railway safety device, and program
CN108804220A (en) * 2018-01-31 2018-11-13 中国地质大学(武汉) A method of the satellite task planning algorithm research based on parallel computation
CN110059631A (en) * 2019-04-19 2019-07-26 中铁第一勘察设计院集团有限公司 The contactless monitoring defect identification method of contact net

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LIANG JIA et al.: "Impulse GPR Echo Parallel Acquisition System Design Based on FPGA", Applied Mechanics and Materials, vol. 3365, no. 602, pages 2917-2921 *
DU Cui et al.: "Research and Design of an Intelligent Management and Analysis System for Railway GPR Detection Data" (in Chinese), Railway Computer Application, vol. 28, no. 06, pages 78-82 *
LIANG Yincheng et al.: "Parallel Processing of Ground Penetrating Radar Data in Railway Subgrade Condition Detection" (in Chinese), China Railway Science, vol. 38, no. 2, pages 11-18 *

Similar Documents

Publication Publication Date Title
CN110199273B (en) System and method for loading, aggregating and bulk computing in one scan in a multidimensional database environment
US10049134B2 (en) Method and system for processing queries over datasets stored using hierarchical data structures
CN102915347B (en) A kind of distributed traffic clustering method and system
CN106547882A (en) A kind of real-time processing method and system of big data of marketing in intelligent grid
CN105630988A (en) Method and system for rapidly detecting space data changes and updating data
KR20110035899A (en) Dimensional reduction mechanisms for representing massive communication network graphs for structural queries
Duggan et al. Skew-aware join optimization for array databases
CN103678671A (en) Dynamic community detection method in social network
CN104156463A (en) Big-data clustering ensemble method based on MapReduce
Gupta et al. Faster as well as early measurements from big data predictive analytics model
Li et al. Parallelizing skyline queries over uncertain data streams with sliding window partitioning and grid index
CN106649828A (en) Data query method and system
Lei et al. An incremental clustering algorithm based on grid
US20230418824A1 (en) Workload-aware column inprints
Tran et al. Conditioning and aggregating uncertain data streams: Going beyond expectations
KR20180120570A (en) Method and apparatus for graph generation
Ji et al. Scalable nearest neighbor query processing based on inverted grid index
CN109657197A (en) A kind of pre-stack depth migration calculation method and system
CN113010597B (en) Ocean big data-oriented parallel association rule mining method
Lim et al. Lazy and eager approaches for the set cover problem
CN110058942B (en) Resource allocation system and method based on analytic hierarchy process
CN113625264A (en) Method and system for parallel processing of railway detection big data
US8832157B1 (en) System, method, and computer-readable medium that facilitates efficient processing of distinct counts on several columns in a parallel processing system
CN109633781B (en) Geological property acquisition method and device, electronic equipment and storage medium
Wu et al. Indexing blocks to reduce space and time requirements for searching large data files

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination