CN113625264A - Method and system for parallel processing of railway detection big data


Info

Publication number
CN113625264A
CN113625264A (application CN202110665774.6A)
Authority
CN
China
Prior art keywords
data
detection data
node
parallel
workflow
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110665774.6A
Other languages
Chinese (zh)
Inventor
杜翠
张千里
陈锋
刘杰
程远水
郭浏卉
张栋
张新冈
刘景宇
邓逆涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Academy of Railway Sciences Corp Ltd CARS
Railway Engineering Research Institute of CARS
Original Assignee
China Academy of Railway Sciences Corp Ltd CARS
Railway Engineering Research Institute of CARS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Academy of Railway Sciences Corp Ltd CARS, Railway Engineering Research Institute of CARS filed Critical China Academy of Railway Sciences Corp Ltd CARS
Priority to CN202110665774.6A priority Critical patent/CN113625264A/en
Publication of CN113625264A publication Critical patent/CN113625264A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S13/00Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
    • G01S13/88Radar or analogous systems specially adapted for specific applications
    • G01S13/885Radar or analogous systems specially adapted for specific applications for ground probing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/18Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Electromagnetism (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Algebra (AREA)
  • Operations Research (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Train Traffic Observation, Control, And Security (AREA)

Abstract

The invention provides a method and a system for parallel processing of railway detection big data. The method divides the collected detection data and stores the divided detection data according to a set storage strategy; according to the engineering property requirements and interpretation target requirements of the detection data, it creates combinations of the signal processing algorithms required by the detection data as conventional workflows and iterative workflows; it then calls the stored detection data in parallel on a purpose-designed parallel node architecture and carries out parallel processing of conventional operations and iterative operations respectively, according to the scheduling strategy matched to each workflow. This scheme solves the problems of unbalanced computing load and of a single operation module easily monopolizing memory in existing data processing techniques; based on the specific parallel node architecture, combined with the different scheduling strategies formulated for it, it improves load balancing and cluster resource utilization during processing and can meet the processing requirements of current and future large-scale detection data.

Description

Method and system for parallel processing of railway detection big data
Technical Field
The invention relates to the technical field of cluster resource processing and optimization, in particular to a method and a system for parallel processing of railway detection big data.
Background
Railway transportation brings convenience and support to everyday life and work, but guaranteeing the safe operation of railways is a core task that will demand constant attention now and in the future. With the rapid growth of railway operating mileage in China, how to inspect the infrastructure of an increasingly large railway network rapidly, accurately and non-destructively, and to grasp its health state in time, has become a major issue that urgently needs to be solved.
Ground penetrating radar (GPR) is a geophysical method that uses antennas to transmit and receive high-frequency electromagnetic waves in order to detect the characteristics and distribution of materials inside a medium; information related to the physical properties of the subsurface medium can be obtained by studying changes in the polarization of the radar waves. In practice, ground penetrating radar can be operated in a variety of detection modes, and the detection data obtained are correspondingly complex in type and large in scale, so they need to be processed and analyzed according to a well-defined strategy.
In the prior art, the processing of ground penetrating radar detection data has been studied, but most techniques are based on a single computing node, and the algorithms involved are usually highly serialized and only suitable for small-scale GPR data sets. With the rapid expansion of GPR data volumes, some practitioners have optimized existing processing techniques by drawing on the massive, high-speed parallel processing capacity of big-data cloud computing, typically managing data in a Hadoop-based distributed cluster that combines the HDFS distributed file system and MySQL relational database clusters with HBase to provide mass storage and efficient access to structured data. Although this guarantees the scalability and portability of cluster resources and allows data in various formats to be stored, radar data are repeatedly read and written during preprocessing and subsequent processing, large differences in the computational load of different parallel tasks easily cause load-balancing problems, and when the algorithm steps in a workflow are complicated the parallel computing nodes may fail to operate normally, so the processing requirements of large-scale GPR detection data cannot be met. It is therefore desirable to provide an efficient and reasonable data parallel processing method that meets the application requirements.
The information disclosed in this background section is only for enhancement of understanding of the general background of the invention and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person skilled in the art.
The rapid development of a new generation of high-performance hardware architectures creates a new opportunity for the rapid processing of massive GPR data. Large volumes of high-precision, large-area GPR detection data can be processed using parallel techniques, greatly improving processing efficiency.
Disclosure of Invention
To solve the above problems, the present invention provides a method for parallel processing of railway detection big data, which in one embodiment comprises:
a data storage step: dividing the acquired detection data, and storing the divided detection data according to a set storage strategy;
an algorithm combination formulation step: creating a corresponding operation workflow according to the engineering property requirements and interpretation target requirements of the detection data, the operation workflow being a combination of the signal processing algorithms required by the detection data and comprising a conventional workflow and an iterative workflow;
a parallel processing step: calling the stored detection data in parallel based on a pre-established parallel node architecture, and carrying out parallel processing of conventional operations and iterative operations respectively, according to the scheduling strategy matched with the operation workflow.
Preferably, in the data storage step, the railway detection data is divided according to the acquisition area and the acquisition equipment type to obtain detection data of each level with different processing priorities.
Further, in one embodiment, when the divided detection data are stored, parallel storage of detection data in various formats is realized by combining the storage modes of a relational database and a distributed database with the priority information;
the data body of a detection data file comprises track header data and data content; the track header data of each detection data record is stored in the relational database in association with the file header, and the data content of each detection data record is stored in the distributed database.
Further, in a preferred embodiment, the method further comprises a node architecture creation step: setting a master node as the task scheduling node, which forms a parallel node architecture together with a read-write node and N computing nodes.
Further, in one embodiment, in the parallel processing step, the parallel processing based on the normal class operation is implemented according to the following steps:
the task scheduling node judges whether the detection data needing to be processed exist or not, if yes, the total track number x of the detection data needing to be processed is counted, and the read-write node and the computing node are started;
sequentially and circularly executing the following steps until all the detection data to be processed are operated, and releasing the read-write nodes and the computing nodes:
reading x/K channels by a read-write node as the processing data of the current round based on a preset scheduling base number K; the task scheduling node averagely divides the processing data of the current round into N parts and distributes the N parts to each computing node;
each computing node executes operation in parallel according to the conventional workflow;
the parallel scheduling node confirms whether the processed detection data which is calculated is received or not, if the processed detection data is received, the processed detection data is transmitted to the read-write node, and the read-write node integrates the processed detection data according to the corresponding space structure and writes the integrated detection data into a storage area;
and K is a positive integer, K is more than or equal to 1 and less than or equal to N, or the optimization is further carried out by combining the channel number S corresponding to the unit marking data, and the scheduling base number K is determined.
Further, in one embodiment, the parallel processing based on the iterative class operation is implemented according to the following steps:
the task scheduling node judges whether the detection data needing to be processed exist or not, if yes, the total track number x of the detection data needing to be processed is counted, and the read-write node and the computing node are started;
setting a scheduling base number K according to hardware resources, x and the size of a calculation unit of the iterative algorithm, and reading x/K channels by a read-write node to serve as first round of processing data; the task scheduling node averagely divides the processing data of the current round into N parts and distributes the N parts to each computing node;
and then, sequentially and circularly executing the following steps until all the detection data to be processed are operated, and releasing the read-write nodes and the computing nodes:
the task scheduling node counts the number n of idle computing nodes in real time, and the read-write node reads the next x/K channels of the remaining data, which are divided into n equal parts and distributed to the idle computing nodes;
each computing node executes operation in parallel according to the iteration workflow and feeds back operation state information to the task scheduling node in real time;
and the parallel scheduling node confirms whether the processed detection data after calculation is received or not, if so, the processed detection data is transmitted to the read-write node, and the read-write node integrates according to the corresponding spatial structure and writes the integrated data into a storage area.
Based on the method for processing big railway detection data in parallel in any one or more of the above embodiments, the present invention further provides a storage medium, on which program codes for implementing the method in any one or more of the above embodiments are stored.
Based on other aspects of the method in any one or more of the embodiments, the present invention further provides a system for parallel processing of big railway inspection data, where the system includes:
the data storage module is configured to divide the acquired detection data and store the divided detection data according to a set storage strategy;
the system comprises an algorithm combination formulation module, a calculation workflow and a data analysis module, wherein the algorithm combination formulation module is configured to create a corresponding calculation workflow according to engineering property requirements and interpretation target requirements of detection data, and the calculation workflow is a combination of a plurality of signal processing algorithms required by the detection data and comprises a conventional workflow and an iterative workflow;
and the parallel processing module is configured to parallelly call the stored detection data based on a pre-established parallel node architecture, and respectively realize parallel processing based on conventional class operation and iterative class operation according to a scheduling strategy matched with the operation workflow.
Further, in one embodiment, the system further comprises: the node architecture creating module is configured to set a main node as a task scheduling node, and the main node, a read-write node and N computing nodes form a parallel node architecture together.
Compared with the closest prior art, the invention also has the following beneficial effects:
according to the method and the system for processing the railway detection big data in parallel, the acquired detection data are divided, so that the data with emergency processing requirements can be processed in the first time, and accidents or influences caused by untimely data processing can be avoided; furthermore, the data dividing and storing method provided by the invention carries out multi-granularity segmentation operation aiming at the data processing algorithm principle, realizes parallel processing aiming at the conventional algorithm and the iterative algorithm based on the parallel node architecture according to different scheduling strategies, improves the data processing rate from two aspects of load balancing performance and cluster resource utilization rate, can effectively meet the processing requirements of current and future large-scale data, and provides reliable assistance for the subsequent application of railway detection data.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a schematic flow chart of a method for parallel processing of big railway detection data according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a system for parallel processing of railway detection big data according to another embodiment of the present invention.
Detailed Description
The following detailed description will be provided for the embodiments of the present invention with reference to the accompanying drawings and examples, so that the practitioner of the present invention can fully understand how to apply the technical means to solve the technical problems, achieve the technical effects, and implement the present invention according to the implementation procedures. It should be noted that, unless otherwise conflicting, the embodiments and features of the embodiments of the present invention may be combined with each other, and the technical solutions formed are all within the scope of the present invention.
Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel, concurrently, or simultaneously. The order of the operations may be rearranged. A process may be terminated when its operations are completed, but may have additional steps not included in the figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc.
The computer equipment comprises user equipment and network equipment. The user equipment or the client includes but is not limited to a computer, a smart phone, a PDA, and the like; network devices include, but are not limited to, a single network server, a server group of multiple network servers, or a cloud based on cloud computing consisting of a large number of computers or network servers. The computer devices may operate individually to implement the present invention or may be networked and interoperate with other computer devices in the network to implement the present invention. The network in which the computer device is located includes, but is not limited to, the internet, a wide area network, a metropolitan area network, a local area network, a VPN network, and the like.
With the rapid increase of railway operation mileage in China, how to rapidly, accurately and nondestructively detect infrastructure of a huge railway network in the future and grasp the health state of the infrastructure in time becomes a major issue to be solved urgently at present.
The GPR (ground penetrating radar) data processing techniques of the prior art are relatively mature for small and medium-scale data sets. However, most of them are based on a single computing node, and the algorithms involved are usually highly serialized and cannot be applied effectively to large-scale railway detection data. With the rapid expansion of GPR data volumes, some practitioners have optimized existing processing techniques by drawing on the massive, high-speed parallel processing capacity of big-data cloud computing, typically managing data in a Hadoop-based distributed cluster that combines the HDFS distributed file system and MySQL relational database clusters with HBase to provide mass storage and efficient access to structured data. Although this guarantees the scalability and portability of cluster resources and allows data in various formats to be stored, no explicit data storage strategy is provided, radar data are repeatedly read and written during preprocessing and subsequent processing, and large differences in the computational load of different parallel tasks easily cause load-balancing problems; when the algorithm steps in a workflow are complex, the parallel computing nodes may not operate normally, and the processing requirements of large-scale GPR detection data cannot be met. It is therefore desirable to provide an efficient and reasonable data parallel processing method that meets the application requirements.
The rapid development of a new generation of high-performance hardware architectures creates a new opportunity for the rapid processing of massive GPR data, and high-precision, large-area GPR detection data can be processed in parallel to greatly improve processing efficiency. During parallel processing, however, the computational load of the parallel branches is difficult to balance; the effect of large load differences is hard to overcome, computing resources are easily left unevenly utilized while intermediate memory resources are strained, and the processing efficiency is difficult to guarantee.
To solve these problems, the invention provides a method and a system for parallel processing of railway detection big data. The method establishes a hybrid "data parallel + algorithm parallel" computing architecture, divides the radar signal processing algorithms into two categories, and formulates an applicable scheduling and computing strategy for each, so that load balancing is achieved across the whole workflow and the utilization efficiency of cluster resources is maximized.
The detailed flow of the method of the embodiments of the present invention is described in detail below based on the accompanying drawings, and the steps shown in the flow chart of the drawings can be executed in a computer system containing a computer-executable instruction such as a set of computer-executable instructions. Although a logical order of steps is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order different than presented herein.
Example one
Fig. 1 is a schematic flow chart illustrating a method for processing railway detection big data in parallel according to an embodiment of the present invention, and as can be seen from fig. 1, the method includes the following steps.
The data storage step, dividing the acquired detection data, and storing the divided detection data according to a set storage strategy;
the method comprises the steps of establishing an algorithm combination, establishing a corresponding operation workflow according to engineering property requirements and target interpretation requirements of detection data, wherein the operation workflow is a combination of a plurality of signal processing algorithms required by the detection data and comprises a conventional workflow and an iterative workflow;
and a parallel processing step, parallelly calling the stored detection data based on a pre-established parallel node architecture, and respectively realizing parallel processing based on conventional operation and iterative operation according to a scheduling strategy matched with the operation workflow.
In practical applications in the field of railway detection data, the ground penetrating radar data acquisition device comprises a ground penetrating radar host, transmitting and receiving antennas, and a system for determining and recording the spatial coordinates of measuring points and survey lines. The host is connected to and controls the transmitting and receiving antennas and the coordinate determination and recording system, and the latter determines and records the exact position of each ground penetrating radar data acquisition point.
The collected raw detection data are read by a ground penetrating radar data reader of the corresponding model together with a data converter, and are converted into a standard format suitable for the parallel processing of the invention.
The researchers of the invention observed that detection data obtained from different types of ground penetrating radar equipment, or from equipment deployed in different areas, differ in the urgency of their processing requirements; for example, if an earthquake occurs in a certain area, the real-time detection data from the railway-related ground penetrating radars in that area require urgent processing. Therefore, in one embodiment, in the data storage step, the railway detection data are divided according to acquisition area and acquisition equipment type to obtain detection data at different levels of processing priority.
Specifically, a priority mark can be added to each batch of detection data packets, and the batches are stored directly in the sub-storage areas corresponding to the different priority levels according to those marks; in this way the priority mark does not have to be carried in the detection data file header, where it would slow down reading and scheduling during subsequent parallel processing. Accordingly, the data held in the storage area with the highest processing priority have the highest scheduling priority and reading priority.
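To make the priority-based division concrete, the following Python sketch shows one possible way of routing acquisition batches into priority sub-storage areas; the area/device-to-priority mapping, the directory names and the priority levels are illustrative assumptions, not values specified by the invention.

```python
# Illustrative sketch only: the area/device-to-priority mapping and the
# sub-storage directory names are hypothetical, not taken from the patent.
from dataclasses import dataclass
from pathlib import Path

PRIORITY_DIRS = {0: Path("store/priority_0_urgent"),
                 1: Path("store/priority_1_normal"),
                 2: Path("store/priority_2_batch")}

@dataclass
class DetectionBatch:
    area: str          # acquisition area, e.g. a line or mileage section
    device_type: str   # e.g. "400MHz_gpr", "900MHz_gpr"
    payload: bytes     # raw detection data packet

def assign_priority(batch: DetectionBatch, urgent_areas: set[str]) -> int:
    """Derive a processing priority from acquisition area and device type."""
    if batch.area in urgent_areas:          # e.g. an area hit by an earthquake
        return 0
    if batch.device_type.endswith("_gpr"):  # routine GPR surveys
        return 1
    return 2

def store_batch(batch: DetectionBatch, urgent_areas: set[str]) -> Path:
    """Write the whole batch into the sub-storage area matching its priority,
    so the priority mark never has to live inside the data file header."""
    prio = assign_priority(batch, urgent_areas)
    target_dir = PRIORITY_DIRS[prio]
    target_dir.mkdir(parents=True, exist_ok=True)
    out = target_dir / f"{batch.area}_{batch.device_type}.bin"
    out.write_bytes(batch.payload)
    return out
```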
In an actual application scenario, a data set obtained by each detection is often a plurality of files, and the data set needs to be distributed to a plurality of parallel computing nodes and processed at the same time. The granularity of the segmentation scheduling operation directly influences the data processing amount and the processing efficiency of each parallel computing node, and has important influence on load balance.
In the prior art, splitting is performed in units of files; for example, 10 files distributed over 5 nodes gives each node 2 files. However, a single file is sometimes so large that the computation assigned to one node occupies all of its memory, the other nodes cannot run, and the advantage of parallel processing cannot be fully exploited. Figuratively speaking, several production lines are available, but they cannot all produce at the same time.
Fine-grained splitting therefore requires the radar files to be decomposed. The researchers of the invention recognized that, owing to the particular file format of ground penetrating radar detection data, the files cannot be decomposed directly, and they instead realize parallel partitioned storage based on the specific content of the file format so that the detection data can be split flexibly.
Specifically, the file format of the detection data is: file header + signal data (e.g. track 1 data + track 2 data + ...); the file header contains a number of acquisition parameters, and its format differs between radar devices.
Further, each track of signal data consists of two parts: a track header and the radar signal (the data content, e.g. 4 values + 512 values for one brand of device).
Because the subsequent signal processing algorithms need the parameters in the file header, the file header is stored in a relational database such as MySQL (those skilled in the art may select any reasonable relational database as required), while the radar signal data content is stored in a distributed database such as HDFS (again, any reasonable distributed database may be selected as required). On this basis, the signal data can be divided flexibly and evenly, with a minimum granularity of a single-track signal. By adopting this technical concept, the data processing granularity of each parallel processing node becomes smaller and more balanced, going from 1 file (a single subgrade detection file generally contains hundreds of thousands of tracks) down to a small gather (tens or hundreds of tracks) or even a single track of data.
In addition, the track header indicates whether the track carries mark data; this information is useful for subsequent processing and is also stored in the relational database. The mark data are used to match radar data to their true spatial position and are inserted by the acquisition software during acquisition, typically at equal distances or at locations where the geological features of the detection zone change.
Specifically, the ground penetrating radar data and their attribute characteristic values involve a variety of data formats, including multidimensional attribute data such as time-domain root-mean-square amplitude, time-domain coherence, frequency-domain −3 dB bandwidth average frequency, frequency-domain −3 dB bandwidth average phase, time-frequency-domain low-frequency gain area and time-frequency-domain high-frequency attenuation area. Therefore, in one embodiment, for each priority storage area, the divided detection data are stored using the MySQL relational database and the distributed database together, combined with the priority information, so as to realize parallel partitioned storage of detection data in multiple formats.
The detection data comprise a file header and a data body; the file header contains, for each file, the total number of tracks, the number of sampling points, the antenna frequency, the track spacing and the time window parameters at acquisition time. The file header of each detection data file is stored in the relational database.
The data body of a detection data file comprises track header data and data content; the track header data of each detection data record is stored in the relational database in association with the file header, and the data content of each detection data record is stored in the distributed database.
In specific implementation, attribute parameters during data acquisition corresponding to the file header can be stored in the MySQL relational database, such as parameters of the total number of channels, the number of sampling points, the antenna frequency, the channel spacing, the time window and the like.
The data body is stored in blocks in HDFS, and if the track header indicates a track number mark, the mark is stored in the relational database in association with the corresponding file.
In actual application, the storage format of the database can be as follows:
file attribute parameter table example:
filename Attribute 1 Attribute 2 Attribute 3 ……
Document 1 1 2 3
Document 2 1 2 3
Document 3 1 2 3
Example file tag table:
filename Marking track numbers
Document 1 Mark 1
Document 1 Mark 2
Document 1 Mark 3
Document 2 Mark 1
Document 2 Mark 2
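The following Python sketch illustrates the storage split described above, with sqlite3 standing in for the MySQL relational database and a local directory standing in for HDFS; the table names, column names and the track-wise file layout are illustrative assumptions rather than a prescribed schema.

```python
# Minimal sketch of the storage split: file header and marks go to a relational
# database (sqlite3 as a stand-in for MySQL), the data body goes to a distributed
# store (a local directory as a stand-in for HDFS), one file per track.
import sqlite3
from pathlib import Path

import numpy as np

def init_relational_db(path: str = "gpr_meta.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS file_attributes (
            file_name TEXT PRIMARY KEY,
            total_tracks INTEGER, sample_points INTEGER,
            antenna_freq_mhz REAL, track_spacing_m REAL, time_window_ns REAL);
        CREATE TABLE IF NOT EXISTS file_marks (
            file_name TEXT, mark_track_no INTEGER);
    """)
    return conn

def store_detection_file(conn: sqlite3.Connection, file_name: str, header: dict,
                         traces: np.ndarray, mark_tracks: list[int],
                         data_dir: str = "hdfs_stub") -> None:
    """Header and track-number marks go to the relational DB; the data body is
    written track by track so later reads can address any single track."""
    conn.execute("INSERT OR REPLACE INTO file_attributes VALUES (?,?,?,?,?,?)",
                 (file_name, header["total_tracks"], header["sample_points"],
                  header["antenna_freq_mhz"], header["track_spacing_m"],
                  header["time_window_ns"]))
    conn.executemany("INSERT INTO file_marks VALUES (?,?)",
                     [(file_name, m) for m in mark_tracks])
    conn.commit()
    out = Path(data_dir) / file_name
    out.mkdir(parents=True, exist_ok=True)
    for i, trace in enumerate(traces):          # minimum granularity: one track
        np.save(out / f"track_{i:06d}.npy", trace)
```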
The input and output of traditional algorithms are one or more radar files, so the minimum data volume of a single task during parallel computing is generally large; this easily leads to unbalanced parallel computing loads and insufficient memory on the computing modules, and in severe cases prevents the whole parallel computation from running normally.
Further, even if the minimum calculation unit of the parallel calculation is set as one radar data file in the prior art, since the iterative calculation process may need to be executed, and the number of iterative calculations is not uniform, the problem of uneven calculation load may still be caused.
Therefore, in one embodiment, in the algorithm combination formulation step, a corresponding operation workflow is created according to the engineering property requirements and interpretation target requirements of the detection data; the operation workflow is the combination of signal processing algorithms that the detection data must undergo, and comprises a conventional workflow and an iterative workflow. Based on their operating principles, operation workflows are classified as conventional workflows or iterative workflows according to whether they contain iterative algorithms.
It should be noted that the processing of a given radar file may involve operations with a prescribed execution order; the processing workflow of each radar file is therefore not limited to one conventional workflow and one iterative workflow. Researchers can plan the operation workflows for the different processing stages of each radar data file according to their needs; the workflow of each stage contains at least one algorithm, and the workflows are planned and stored in a unified manner. In practice, the operation workflows corresponding to batch radar data of the same type are consistent.
Furthermore, in order to ensure load balance and utilization rate of each parallel computing channel, researchers of the invention set the input/output unit and the minimum computing unit of the radar signal processing algorithm as a single channel or a gather.
The minimum calculation unit e of each algorithm is determined such that mutually independent calculations can be carried out with that unit as input and output, for example a single track or a gather, whose size can be specified.
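As one way of making the workflow concept concrete, the following Python sketch represents an operation workflow as an ordered list of signal-processing steps, each flagged as conventional or iterative and carrying its minimum calculation unit e; the step names, functions and e values are illustrative assumptions, not algorithms prescribed by the invention.

```python
# Sketch of one possible workflow representation: each step knows whether it is
# iterative and what its minimum calculation unit (in tracks) is.
from dataclasses import dataclass
from typing import Callable

import numpy as np

@dataclass
class ProcessingStep:
    name: str
    func: Callable[[np.ndarray], np.ndarray]  # operates on one calculation unit
    iterative: bool                           # True -> belongs to the iterative workflow
    min_unit_e: int = 1                       # 1 = single track, >1 = gather size

def split_workflow(steps: list[ProcessingStep]):
    """Group the requested algorithm combination into a conventional workflow
    and an iterative workflow, preserving execution order inside each group."""
    conventional = [s for s in steps if not s.iterative]
    iterative = [s for s in steps if s.iterative]
    return conventional, iterative

# Hypothetical combination for one engineering/interpretation requirement:
workflow = [
    ProcessingStep("dc_removal", lambda tr: tr - tr.mean(), iterative=False),
    ProcessingStep("linear_gain", lambda tr: tr, iterative=False),
    ProcessingStep("migration", lambda tr: tr, iterative=True, min_unit_e=100),
]
conventional_wf, iterative_wf = split_workflow(workflow)
```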
Taking the linear gain algorithm as an example, the calculation result for each channel of data is represented by multiplying each sampling point by a different coefficient:
P(i) = a·i·Δt + b·exp(c·i·Δt),  i = 1, 2, …, N1
in the formula: n is a radical of1The number of sampling points (readable in the file header) for each signal;
p (i) is a weighting factor corresponding to the ith sampling point;
Δ t is the sampling time interval (readable in the header);
a, b and c are coefficients, and the values of the coefficients can be set by a user and input in an interface;
therefore, for the linear gain algorithm, the minimum calculation unit is set to be single-channel data.
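As a concrete illustration of this minimum calculation unit, the following sketch applies the linear gain weighting above to one track; the numeric values of dt, a, b and c in the usage lines are arbitrary examples, not parameters prescribed by the invention.

```python
# Sketch of the linear gain weighting applied to a single track:
# each sampling point i is multiplied by P(i) = a*i*dt + b*exp(c*i*dt).
import numpy as np

def linear_gain(trace: np.ndarray, dt: float, a: float, b: float, c: float) -> np.ndarray:
    i = np.arange(1, trace.size + 1)              # i = 1 .. N1
    weights = a * i * dt + b * np.exp(c * i * dt)
    return trace * weights

# Usage: one track with N1 = 512 samples; dt would normally be read from the file header.
trace = np.random.randn(512)
gained = linear_gain(trace, dt=0.1953, a=0.5, b=1.0, c=0.002)
```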
In combination with practical applications, in one embodiment the parallel processing method provided by the invention further includes a node architecture creation step: a master node A serves as the task scheduling node and, together with a read-write node B and N computing nodes C1 to CN, forms the parallel node architecture.
Further, in the parallel processing step, the algorithms in the operation workflow are processed one by one:
If the algorithm is a conventional algorithm, a static scheduling method is adopted: node A divides the data of the current round into N equal parts and distributes them to nodes C1 to CN, and no further task scheduling is performed during parallel execution. Nodes C1 to CN each call the specified conventional algorithm to process their share in parallel and, once finished, feed the calculation results back to node A.
Thus, in one embodiment, parallel processing based on conventional class operations is implemented as follows:
the task scheduling node judges whether the detection data needing to be processed exist or not, if yes, the total track number x of the detection data needing to be processed is counted, and the read-write node and the computing node are started;
sequentially and circularly executing the following steps until all the detection data to be processed are operated, and releasing the read-write nodes and the computing nodes:
based on a preset scheduling base K, the read-write node reads x/K channels as the processing data of the current round; the task scheduling node divides the processing data of the current round into N equal parts and distributes one part to each computing node. The setting of K therefore determines the scheduling granularity, i.e. the amount of data calculated by each computing node in each round; in this scenario the scheduling granularity is x/(N·K);
each computing node executes operation in parallel according to the conventional workflow;
the parallel scheduling node confirms whether the processed detection data which is calculated is received or not, if the processed detection data is received, the processed detection data is transmitted to the read-write node, and the read-write node integrates the processed detection data according to the corresponding space structure and writes the integrated detection data into a storage area;
and K is a positive integer, K is more than or equal to 1 and less than or equal to N, or the optimization is further carried out by combining the channel number S corresponding to the unit marking data, and the scheduling base number K is determined.
In practical application, because the calculation processes of all parallel branches in a conventional operation are identical and their calculation times differ little, the retrieval and processing of the next batch starts only after the current batch is finished. It should be noted, however, that as soon as the parallel scheduling node recognizes that processed detection data have been received, the data are immediately transmitted to the read-write node, which integrates them according to the corresponding spatial structure and writes them into the storage area; this does not conflict with starting the reading and processing of the next batch after the current batch is completed, so congestion in data reading, writing and storage is effectively avoided.
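The static scheduling described above can be sketched as follows; Python's multiprocessing pool stands in for the N computing nodes, an in-memory array stands in for the read-write node's storage area, and the example step and the values of K and N are assumptions for illustration.

```python
# Static-scheduling sketch for a conventional workflow: each round reads x/K
# tracks, splits them into N equal shares, and runs the conventional algorithms
# on all shares in parallel before writing the round back in spatial order.
from multiprocessing import Pool

import numpy as np

def remove_dc(chunk: np.ndarray) -> np.ndarray:
    """Example conventional step: subtract each track's mean (DC removal)."""
    return chunk - chunk.mean(axis=1, keepdims=True)

def run_conventional(traces: np.ndarray, workflow, N: int = 4, K: int = 4) -> np.ndarray:
    x = traces.shape[0]                          # total track number x
    batch = max(1, x // K)                       # read x/K tracks per round
    out = np.empty_like(traces)
    with Pool(processes=N) as pool:              # N "computing nodes"
        for start in range(0, x, batch):
            round_data = traces[start:start + batch]
            parts = np.array_split(round_data, N)         # N equal shares
            for step in workflow:                         # conventional algorithms
                parts = pool.map(step, parts)
            out[start:start + batch] = np.concatenate(parts)  # write back in order
    return out

if __name__ == "__main__":
    processed = run_conventional(np.random.randn(10_000, 512), [remove_dc])
```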
The value of the calculation unit size e is set according to the total number x of data to be processed: the more data there are to be processed, the larger the corresponding calculation unit; the minimum value of e is 1, with a single track as the minimum calculation unit.
Further, considering the mark data used to match radar data to their true spatial position, in an optional embodiment the scheduling base for slicing the data is determined in combination with the positions of the mark data. Specifically, the number of tracks scheduled per unit is kept equal to, or a multiple of, the track number S corresponding to unit mark data, so that the data within every two marks can be processed with the same signal processing algorithm and the calculation signals of the corresponding data units do not differ too much. For example, if there is roughly one mark per 1000 tracks, the number of tracks scheduled per unit also refers to the mark-granularity track number (the track number corresponding to unit mark data) and is set equal to it or to a multiple of it.
Specifically, in practical application, the value of the target scheduling base may be determined according to the following policy:
if the unit scheduling track number N×e obtained from the initial scheduling base K' is smaller than or equal to S for the corresponding data unit, the target scheduling base K is set equal to S;
if the unit scheduling track number N×e is larger than S for the corresponding data unit, the target multiple m is obtained by rounding N×e/S up to the nearest integer, i.e. m = ⌈N·e/S⌉, and m×S is selected as the value of the target scheduling granularity K.
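A small helper corresponding to this policy might look as follows; the function name and the example values of N, e and S are illustrative assumptions.

```python
# Sketch of the scheduling-granularity rule above: keep the per-unit track count
# at S or a whole multiple of S (the track count per unit of mark data).
import math

def target_scheduling_granularity(N: int, e: int, S: int) -> int:
    """N: number of computing nodes; e: minimum calculation unit in tracks;
    S: track count corresponding to unit mark data."""
    unit_tracks = N * e                 # unit scheduling track number N*e
    if unit_tracks <= S:
        return S                        # schedule exactly one mark interval
    m = math.ceil(unit_tracks / S)      # target multiple m = ceil(N*e / S)
    return m * S                        # schedule m whole mark intervals

# e.g. 8 nodes, gathers of 200 tracks, one mark per 1000 tracks -> 2000 tracks
granularity = target_scheduling_granularity(N=8, e=200, S=1000)
```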
Further, if the algorithm is an iterative algorithm, a dynamic scheduling method of circular allocation is adopted:
and setting a corresponding parallel scheduling cardinal number K, wherein K is an integer.
The proportion of the first distributed data volume of the A node to the total data volume is 1/K, the data are divided into N parts and evenly distributed to C1~CNAnd (4) nodes. C1~CNAnd respectively calling the specified algorithms to perform parallel processing, and feeding back the calculation result to the node A after the processing is finished.
Node A receives node C1~CNThe method comprises the steps that feedback information of nodes is obtained, and the number of computing nodes in an idle state is counted in real time, and researchers consider that for algorithm combinations in an iteration workflow, due to the fact that iteration times are unknown, for algorithms with multiple iteration times, single operation time differences of different computing nodes can be increased along with the iteration times to form obvious time differences, and therefore time consumption of each computing node is obviously different); from the remaining calculated quantities, the data quantity (1/Kn) is continuously distributed to eachAn idle compute node;
and circularly executing the steps until all data are processed.
Therefore, in one embodiment, in the parallel processing step, the parallel processing based on the iterative class operation is implemented according to the following steps:
the task scheduling node judges whether the detection data needing to be processed exist or not, if yes, the total track number x of the detection data needing to be processed is counted, and the read-write node and the computing node are started;
setting a scheduling base number K according to hardware resources, x and the size of a calculation unit of the iterative algorithm, and reading x/K channels by a read-write node to serve as first round of processing data; the task scheduling node averagely divides the processing data of the current round into N parts and distributes the N parts to each computing node;
and then, sequentially and circularly executing the following steps until all the detection data to be processed are operated, and releasing the read-write nodes and the computing nodes:
the task scheduling node counts the number n of idle computing nodes in real time, and the read-write node reads the next x/K channels of the remaining data, which are divided into n equal parts and distributed to the idle computing nodes;
each computing node executes operation in parallel according to the iteration workflow and feeds back operation state information to the task scheduling node in real time;
and the parallel scheduling node confirms whether the processed detection data after calculation is received or not, if so, the processed detection data is transmitted to the read-write node, and the read-write node integrates according to the corresponding spatial structure and writes the integrated data into a storage area.
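The dynamic scheduling of the iterative workflow can be sketched as follows; a process pool stands in for the computing nodes, the toy damping loop stands in for a real iterative algorithm, and the per-dispatch chunk size of roughly x/(K·N) tracks per node is an assumption for illustration, since the exact dispatch formula is not reproduced here.

```python
# Dynamic-scheduling sketch for an iterative workflow: as soon as a worker
# ("computing node") becomes idle it receives the next chunk, mimicking the
# real-time counting of idle nodes by the task scheduling node.
from concurrent.futures import ProcessPoolExecutor, FIRST_COMPLETED, wait

import numpy as np

def iterative_step(chunk: np.ndarray, tol: float = 1e-6, max_iter: int = 500) -> np.ndarray:
    """Toy iterative algorithm: damp each track until the update is small."""
    current = chunk.copy()
    for _ in range(max_iter):
        update = 0.5 * (current - current.mean(axis=1, keepdims=True))
        current -= update
        if np.abs(update).max() < tol:
            break
    return current

def run_iterative(traces: np.ndarray, N: int = 4, K: int = 8) -> np.ndarray:
    x = traces.shape[0]
    chunk = max(1, x // (K * N))                     # assumed per-dispatch share per node
    out = np.empty_like(traces)
    pending = list(range(0, x, chunk))               # start offsets still to process
    with ProcessPoolExecutor(max_workers=N) as pool:
        running = {}
        while pending or running:
            while pending and len(running) < N:      # feed every idle node
                s = pending.pop(0)
                running[pool.submit(iterative_step, traces[s:s + chunk])] = s
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                s = running.pop(fut)
                out[s:s + chunk] = fut.result()      # read-write node writes back
    return out

if __name__ == "__main__":
    result = run_iterative(np.random.randn(5_000, 512))
```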
While, for purposes of simplicity of explanation, the foregoing method embodiments have been described as a series of acts or combination of acts, it will be appreciated by those skilled in the art that the present invention is not limited by the illustrated ordering of acts, as some steps may occur in other orders or concurrently with other steps in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
It should be noted that, in other embodiments of the present invention, the method may further obtain a new data parallel processing method by combining one or some of the above embodiments, so as to implement efficient analysis and operation on mass detection data.
By adopting the scheme in any one or more of the embodiments of the invention, the parallel processing of the railway detection data is realized, the load balance of each parallel computing channel is further improved on the basis of realizing the rapid and accurate operation, the utilization efficiency of parallel computing resources is furthest exerted, and the reliable data support is provided for the subsequent application research.
It should be noted that, based on the method in any one or more of the above embodiments of the present invention, the present invention further provides a storage medium, on which a program code capable of implementing the method in any one or more of the above embodiments is stored, and when the program code is executed by an operating system, the method for processing railway detection big data in parallel as described above can be implemented.
Example two
The method is described in detail in the embodiments disclosed in the present invention, and the method of the present invention can be implemented by using various forms of apparatuses or systems, so based on other aspects of the method described in any one or more of the embodiments, the present invention further provides a system for parallel processing of railway detection big data, which is used for executing the method for parallel processing of railway detection big data described in any one or more of the embodiments. Specific examples are given below for a detailed description.
Specifically, fig. 2 is a schematic structural diagram of a system for parallel processing of railway detection big data provided in an embodiment of the present invention, and as shown in fig. 2, the system includes:
the data storage module is configured to divide the acquired detection data and store the divided detection data according to a set storage strategy;
the system comprises an algorithm combination formulation module, a calculation workflow and a data analysis module, wherein the algorithm combination formulation module is configured to create a corresponding calculation workflow according to engineering property requirements and interpretation target requirements of detection data, and the calculation workflow is a combination of a plurality of signal processing algorithms required by the detection data and comprises a conventional workflow and an iterative workflow;
and the parallel processing module is configured to parallelly call the stored detection data based on a pre-established parallel node architecture, and respectively realize parallel processing based on conventional class operation and iterative class operation according to a scheduling strategy matched with the operation workflow.
In one embodiment, the data storage module includes a data dividing unit configured to divide the railway detection data according to the acquisition area and the acquisition device type to obtain detection data of each level with different processing priorities.
Further, in one embodiment, when storing the partitioned detection data, the data storage unit of the data storage module adopts multiple storage modes of a MySQL relational database and an HDFS, and implements partition storage of the detection data in multiple formats by combining priority information;
the detection data comprises a file header and a data body, wherein the file header comprises the total number of tracks, the number of sampling points, the antenna frequency, the track interval and the time window parameter during data acquisition.
The data body of a detection data file comprises track header data and data content; the track header data of each detection data record is stored in the relational database in association with the file header, and the data content of each detection data record is stored in the distributed database.
Further, in one embodiment, the system further comprises a node architecture creation module configured to set a master node as the task scheduling node, which forms a parallel node architecture together with a read-write node and N computing nodes.
Preferably, in an embodiment, the parallel processing module includes a regular operation processing unit configured to implement parallel processing based on regular class operations according to the following steps:
the task scheduling node judges whether the detection data needing to be processed exist or not, if yes, the total track number x of the detection data needing to be processed is counted, and the read-write node and the computing node are started;
sequentially and circularly executing the following steps until all the detection data to be processed are operated, and releasing the read-write nodes and the computing nodes:
reading x/K channels by a read-write node as the processing data of the current round based on a preset scheduling base number K; the task scheduling node averagely divides the processing data of the current round into N parts and distributes the N parts to each computing node;
each computing node executes operation in parallel according to the conventional workflow;
the parallel scheduling node confirms whether the processed detection data which is calculated is received or not, if the processed detection data is received, the processed detection data is transmitted to the read-write node, and the read-write node integrates the processed detection data according to the corresponding space structure and writes the integrated detection data into a storage area;
and K is a positive integer, K is more than or equal to 1 and less than or equal to N, or the optimization is further carried out by combining the channel number S corresponding to the unit marking data, and the scheduling base number K is determined.
In one embodiment, the parallel processing module includes an iterative operation processing unit configured to implement parallel processing based on iterative class operations according to the following steps:
the task scheduling node judges whether the detection data needing to be processed exist or not, if yes, the total track number x of the detection data needing to be processed is counted, and the read-write node and the computing node are started;
setting a scheduling base number K according to hardware resources, x and the size of a calculation unit of the iterative algorithm, and reading x/K channels by a read-write node to serve as first round of processing data; the task scheduling node averagely divides the processing data of the current round into N parts and distributes the N parts to each computing node;
and then, sequentially and circularly executing the following steps until all the detection data to be processed are operated, and releasing the read-write nodes and the computing nodes:
the task scheduling node counts the number n of idle computing nodes in real time, and the read-write node reads the next x/K channels of the remaining data, which are divided into n equal parts and distributed to the idle computing nodes;
each computing node executes operation in parallel according to the iteration workflow and feeds back operation state information to the task scheduling node in real time;
and the parallel scheduling node confirms whether the processed detection data after calculation is received or not, if so, the processed detection data is transmitted to the read-write node, and the read-write node integrates according to the corresponding spatial structure and writes the integrated data into a storage area.
In the system for processing the railway detection big data in parallel provided by the embodiment of the invention, each module or unit structure can independently operate or operate in a combined mode according to actual analysis and operation requirements so as to realize corresponding technical effects.
It is to be understood that the disclosed embodiments of the invention are not limited to the particular structures, process steps, or materials disclosed herein but are extended to equivalents thereof as would be understood by those ordinarily skilled in the relevant arts. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting.
Reference in the specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. Thus, appearances of the phrase "an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment.
Although the embodiments of the present invention have been described above, the above descriptions are only for the convenience of understanding the present invention, and are not intended to limit the present invention. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (9)

1. A method for parallel processing of big railway inspection data, the method comprising:
the data storage step, dividing the acquired detection data, and storing the divided detection data according to a set storage strategy;
the method comprises the steps of establishing an algorithm combination, establishing a corresponding operation workflow according to engineering property requirements and target interpretation requirements of detection data, wherein the operation workflow is a combination of a plurality of signal processing algorithms required by the detection data and comprises a conventional workflow and an iterative workflow;
and a parallel processing step, parallelly calling the stored detection data based on a pre-established parallel node architecture, and respectively realizing parallel processing based on conventional operation and iterative operation according to a scheduling strategy matched with the operation workflow.
2. The method according to claim 1, wherein in the data storage step, the railway detection data is divided according to the acquisition area and the acquisition equipment type to obtain detection data of each stage with different processing priorities.
3. The method according to claim 1, characterized in that when storing the divided detection data, a plurality of storage modes of a relational database and a distributed database are adopted, and the parallel storage of the detection data in a plurality of formats is realized by combining priority information;
the data body of the detection data file comprises track head data and data content, the track head data of each detection data and the file head are stored in a relational database in an associated mode, and the data content of each detection data is stored in a distributed database.
4. The method of claim 1, further comprising: and a node architecture creating step, namely setting a main node as a task scheduling node, and forming a parallel node architecture together with a read-write node and N computing nodes.
5. The method according to claim 1, wherein in the parallel processing step, parallel processing based on conventional-class operations is implemented according to the following steps:
the task scheduling node judges whether there are detection data to be processed, and if so, counts the total number x of channels of the detection data to be processed and starts the read-write node and the computing nodes;
the following steps are then executed cyclically in sequence until all detection data to be processed have been operated on, after which the read-write node and the computing nodes are released:
the read-write node reads x/K channels as the processing data of the current round based on a preset scheduling base K; the task scheduling node divides the processing data of the current round equally into N parts and distributes them to the computing nodes;
each computing node executes operations in parallel according to the conventional workflow;
the task scheduling node confirms whether processed detection data whose computation has been completed have been received, and if so, transmits the processed detection data to the read-write node, and the read-write node integrates the processed detection data according to the corresponding spatial structure and writes the integrated data into a storage area;
wherein K is a positive integer with 1 ≤ K ≤ N, or the scheduling base K is further optimized and determined in combination with the number of channels S corresponding to the unit marking data.
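The batching arithmetic of claim 5 can be sketched as follows, assuming ceiling rounding for x/K and an even split of each round across the N computing nodes; the rounding rule and the refinement of K from the channel count S are left open by the claim, so those details are assumptions.

import math

def split_evenly(items, n):
    # Divide a list into n chunks whose sizes differ by at most one.
    q, r = divmod(len(items), n)
    chunks, idx = [], 0
    for i in range(n):
        step = q + (1 if i < r else 0)
        chunks.append(items[idx:idx + step])
        idx += step
    return chunks

def conventional_schedule(x, N, K):
    # Each round: read x/K channels, then split them across the N computing nodes.
    assert 1 <= K <= N
    batch = math.ceil(x / K)
    start = 0
    while start < x:
        round_channels = list(range(start, min(start + batch, x)))
        start += batch
        yield split_evenly(round_channels, N)

# Example: x = 10 channels, N = 4 computing nodes, scheduling base K = 2
for rnd, parts in enumerate(conventional_schedule(10, 4, 2), start=1):
    print(f"round {rnd}: {parts}")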
6. The method according to claim 1, wherein in the parallel processing step, parallel processing based on iterative-class operations is implemented according to the following steps:
the task scheduling node judges whether there are detection data to be processed, and if so, counts the total number x of channels of the detection data to be processed and starts the read-write node and the computing nodes;
a scheduling base K is set according to the hardware resources, x, and the size of the computation unit of the iterative algorithm, and the read-write node reads x/K channels as the processing data of the first round; the task scheduling node divides the processing data of the current round equally into N parts and distributes them to the computing nodes;
the following steps are then executed cyclically in sequence until all detection data to be processed have been operated on, after which the read-write node and the computing nodes are released:
the task scheduling node counts the number n of idle computing nodes in real time, the read-write node reads the number of channels given by the formula of Figure FDA0003116769090000021, and the data are divided equally into n parts and distributed to the idle computing nodes;
each computing node executes operations in parallel according to the iterative workflow and feeds back operation status information to the task scheduling node in real time;
the task scheduling node confirms whether processed detection data whose computation has been completed have been received, and if so, transmits the processed detection data to the read-write node, and the read-write node integrates the data according to the corresponding spatial structure and writes the integrated data into a storage area.
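For the iterative-class scheduling of claim 6, the sketch below assumes that each later round reads a share of the base batch x/K proportional to the fraction n/N of idle computing nodes; the exact read amount is given by the formula of Figure FDA0003116769090000021 in the claim, so this proportional rule, like the simulated idle-node counts, is only an assumption.

import math
import random

def split_evenly(items, n):
    # Same helper as in the sketch after claim 5.
    q, r = divmod(len(items), n)
    chunks, idx = [], 0
    for i in range(n):
        step = q + (1 if i < r else 0)
        chunks.append(items[idx:idx + step])
        idx += step
    return chunks

def iterative_schedule(x, N, K, seed=0):
    # First round: x/K channels split across all N computing nodes; later
    # rounds read an amount assumed here to be base * n / N for n idle nodes
    # and split it among exactly those n nodes.
    rng = random.Random(seed)
    base = math.ceil(x / K)
    start = min(base, x)
    yield N, split_evenly(list(range(start)), N)
    while start < x:
        n = rng.randint(1, N)                 # simulated idle-node count fed
        amount = math.ceil(base * n / N)      # back by the computing nodes
        channels = list(range(start, min(start + amount, x)))
        start += amount
        yield n, split_evenly(channels, n)

# Example: x = 12 channels, N = 4 computing nodes, scheduling base K = 3
for n, parts in iterative_schedule(12, 4, 3):
    print(f"{n} idle node(s): {parts}")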
7. A storage medium having stored thereon program code for implementing the method according to any one of claims 1 to 6.
8. A system for parallel processing of railway detection big data, the system comprising:
a data storage module configured to divide acquired detection data and store the divided detection data according to a set storage strategy;
an algorithm combination formulation module configured to create a corresponding operation workflow according to the engineering property requirements and interpretation target requirements of the detection data, the operation workflow being a combination of the plurality of signal processing algorithms required by the detection data and comprising a conventional workflow and an iterative workflow;
a parallel processing module configured to call the stored detection data in parallel based on a pre-established parallel node architecture and to implement parallel processing based on conventional-class operations and iterative-class operations, respectively, according to a scheduling strategy matched with the operation workflow.
9. The system according to claim 8, further comprising: a node architecture creating module configured to set a master node as a task scheduling node, the master node forming a parallel node architecture together with a read-write node and N computing nodes.
CN202110665774.6A 2021-06-16 2021-06-16 Method and system for parallel processing of railway detection big data Pending CN113625264A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110665774.6A CN113625264A (en) 2021-06-16 2021-06-16 Method and system for parallel processing of railway detection big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110665774.6A CN113625264A (en) 2021-06-16 2021-06-16 Method and system for parallel processing of railway detection big data

Publications (1)

Publication Number Publication Date
CN113625264A true CN113625264A (en) 2021-11-09

Family

ID=78378122

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110665774.6A Pending CN113625264A (en) 2021-06-16 2021-06-16 Method and system for parallel processing of railway detection big data

Country Status (1)

Country Link
CN (1) CN113625264A (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5968109A (en) * 1996-10-25 1999-10-19 Navigation Technologies Corporation System and method for use and storage of geographic data on physical media
WO2000031663A1 (en) * 1998-11-24 2000-06-02 Matsushita Electric Industrial Co., Ltd. Data structure of digital map file
WO2002035359A2 (en) * 2000-10-26 2002-05-02 Prismedia Networks, Inc. Method and system for managing distributed content and related metadata
CN101110079A (en) * 2007-06-27 2008-01-23 中国科学院遥感应用研究所 Digital globe antetype system
CN103338261A (en) * 2013-07-04 2013-10-02 北京泰乐德信息技术有限公司 Storage and processing method and system of rail transit monitoring data
CN105083336A (en) * 2014-05-19 2015-11-25 塔塔顾问服务有限公司 System and method for generating vehicle movement plans in a large railway network
CN103969627A (en) * 2014-05-26 2014-08-06 苏州市数字城市工程研究中心有限公司 Ground penetrating radar large-scale three-dimensional forward modeling method based on FDTD
CN109588054A (en) * 2016-06-18 2019-04-05 分形工业公司 Accurate and detailed modeling using distributed simulation engine to the system with large complicated data set
CN107423338A (en) * 2017-04-28 2017-12-01 中国铁道科学研究院 A kind of railway combined detection data display method and device
CN110073301A (en) * 2017-08-02 2019-07-30 强力物联网投资组合2016有限公司 The detection method and system under data collection environment in industrial Internet of Things with large data sets
KR101950935B1 (en) * 2017-09-27 2019-02-22 (주) 퓨처젠 System for sniffing detection data of railway safety device, and program
CN108804220A (en) * 2018-01-31 2018-11-13 中国地质大学(武汉) A method of the satellite task planning algorithm research based on parallel computation
CN110059631A (en) * 2019-04-19 2019-07-26 中铁第一勘察设计院集团有限公司 The contactless monitoring defect identification method of contact net

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LIANG JIA et al.: "Impulse GPR Echo Parallel Acquisition System Design Based on FPGA", Applied Mechanics and Materials, vol. 3365, no. 602, pages 2917-2921 *
DU Cui et al.: "Research and Design of an Intelligent Management and Analysis System for Railway GPR Detection Data" (in Chinese), Railway Computer Application, vol. 28, no. 06, pages 78-82 *
LIANG Yincheng et al.: "Parallel Processing of Ground Penetrating Radar Data in Railway Subgrade Condition Detection" (in Chinese), China Railway Science, vol. 38, no. 2, pages 11-18 *

Similar Documents

Publication Publication Date Title
CN110199273B (en) System and method for loading, aggregating and bulk computing in one scan in a multidimensional database environment
US10049134B2 (en) Method and system for processing queries over datasets stored using hierarchical data structures
CN102915347B (en) A kind of distributed traffic clustering method and system
CN106547882A (en) A kind of real-time processing method and system of big data of marketing in intelligent grid
CN105630988A (en) Method and system for rapidly detecting space data changes and updating data
KR20110035899A (en) Dimensional reduction mechanisms for representing massive communication network graphs for structural queries
Duggan et al. Skew-aware join optimization for array databases
CN103678671A (en) Dynamic community detection method in social network
CN104156463A (en) Big-data clustering ensemble method based on MapReduce
Gupta et al. Faster as well as early measurements from big data predictive analytics model
Li et al. Parallelizing skyline queries over uncertain data streams with sliding window partitioning and grid index
CN106649828A (en) Data query method and system
Lei et al. An incremental clustering algorithm based on grid
US20230418824A1 (en) Workload-aware column inprints
Tran et al. Conditioning and aggregating uncertain data streams: Going beyond expectations
KR20180120570A (en) Method and apparatus for graph generation
Ji et al. Scalable nearest neighbor query processing based on inverted grid index
CN109657197A (en) A kind of pre-stack depth migration calculation method and system
CN113010597B (en) Ocean big data-oriented parallel association rule mining method
Lim et al. Lazy and eager approaches for the set cover problem
CN110058942B (en) Resource allocation system and method based on analytic hierarchy process
CN113625264A (en) Method and system for parallel processing of railway detection big data
US8832157B1 (en) System, method, and computer-readable medium that facilitates efficient processing of distinct counts on several columns in a parallel processing system
CN109633781B (en) Geological property acquisition method and device, electronic equipment and storage medium
Wu et al. Indexing blocks to reduce space and time requirements for searching large data files

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination