CN114936224A

CN114936224A - Rail inspection data service system based on Hadoop

Info

Publication number: CN114936224A
Application number: CN202210648201.7A
Authority: CN
Inventors: 朱洪涛; 朱磊; 陶捷; 王志勇; 吴维军; 张苗苗
Original assignee: Jiangxi Everbright Measurement And Control Technology Co ltd
Current assignee: Jiangxi Everbright Measurement And Control Technology Co ltd
Priority date: 2022-06-09
Filing date: 2022-06-09
Publication date: 2022-08-23

Abstract

The invention provides a Hadoop-based track inspection data service system comprising: a data acquisition module for acquiring the current track inspection data of each track inspection device; and a data processing module comprising a data alignment unit and a data processing unit. The data alignment unit uses historical track inspection data to locally interpolate and compress each set of current track inspection data, so that the feature points of the historical and current data are matched and data-aligned track inspection data are obtained. The data processing unit preprocesses the data-aligned track inspection data to obtain multiple groups of preprocessed data and processes those groups in a distributed manner, enabling information interaction between a user and each track inspection device. Because the data alignment unit aligns the current and historical track inspection data using feature points as markers, mileage errors no longer interfere with subsequent data analysis; distributed processing improves the efficient use and classified management of the track inspection data.

Description

Rail inspection data service system based on Hadoop

Technical Field

The invention relates to the technical field of data processing, in particular to a track inspection data service system based on Hadoop.

Background

"Internet+" has become a major strategy for the joint development of all industries in China. High-tech information technologies such as the internet, big data, and cloud computing are developing rapidly, and "Internet+" integrates these technologies with various fields, bringing traditional businesses online in digital form and driving the innovation and transformation of traditional industries.

In the past, the analysis and processing of track inspection data were mostly based on stand-alone or C/S (client/server) architectures. These easily form information islands, make it difficult to classify and store basic data systematically, and yield analysis results that are neither comprehensive nor macroscopic. With today's "Internet+" technology, large volumes of track inspection data can be gathered in one place over the internet for systematic classified storage and mining analysis. This breaks down information islands, forms a unified big data analysis cloud platform, and provides comprehensive, macroscopic visual analysis results; it has become the development direction and a market hotspot for track inspection data mining and analysis.

The track inspection data service platform is a data platform that matches the current level of information technology development and serves the reform of track inspection systems. Its main aims are to strengthen track inspection analysis, establish a standardized management system, promote data sharing across the track system and cooperation among its businesses, provide an efficient, accurate, and sound data basis for decision making, and improve the predictability and pertinence of track maintenance work. The current management of track inspection information has the following problems:

(1) the efficient storage and classified management of the various files generated by track inspection services cannot be achieved; different kinds of information are poorly compatible, and unified access and efficient, accurate query and statistics are not possible, which is a major shortcoming of current track inspection information management;

(2) various data are not managed hierarchically, are relatively disordered in use, and do not have a uniform data storage management center and a data application platform;

(3) the application service and the application mode have no good expandability, and the platform service function cannot be perfected in stages;

(4) the compatibility is poor, the system can not be well fused with other information systems, and the system operation and maintenance cost is high;

(5) the value of the data is not sufficiently mined: the patterns, rules, and relationships hidden in massive data cannot be explored in depth, knowledge with high value density cannot be extracted from data with low value density, and management remains at the stage of "abundant data but insufficient information";

(6) the data center has weak automatic instantaneous fault transfer capability and cannot ensure the safety of data.

Disclosure of Invention

Based on this, the invention aims to provide a track inspection data service system based on Hadoop to at least solve the defects in the technology.

The invention provides a track inspection data service system based on Hadoop, which comprises a data acquisition module and a data processing module, wherein:

the data acquisition module is used for acquiring current rail inspection data of each rail inspection device;

the data processing module comprises a data alignment unit and a data processing unit, wherein the data alignment unit is used for carrying out local interpolation and compression on each current rail detection data by using historical rail detection data so as to realize the matching of the historical rail detection data and the characteristic points of each current rail detection data and further obtain the rail detection data with aligned data;

the data processing unit is used for performing data preprocessing on the rail detection data with aligned data to obtain a plurality of groups of preprocessed rail detection data, performing distributed processing on the plurality of groups of preprocessed rail detection data to judge whether the rail detection data with aligned data meets preset requirements or not, and if the rail detection data with aligned data meets the preset requirements, generating a corresponding data report to realize information interaction between a user and each rail detection device.

Further, the track inspection data service system based on the Hadoop further comprises a database support module, wherein the database support module is used for generating a plurality of track inspection ledger databases according to the track inspection data acquired by each track inspection device and generating a plurality of authority management databases according to user authorities, the plurality of track inspection ledger databases at least comprise a plane curve database, a vertical curve database, a CP3 database and a sleeper information database, and the plurality of authority management databases at least comprise a standard management database.

Furthermore, the data processing module further comprises a file storage unit, wherein the file storage unit is used for uniformly distributing files with a size smaller than a preset storage size according to the occupied memory space, so that the combined files meet a preset maximum data block threshold.

Further, the file storage unit is specifically configured to:

when a file storage request is received, initializing a data queue, and creating a plurality of temporary queues, a plurality of merging queues and a plurality of file information mapping tables, wherein the number of the temporary queues is less than that of the merging queues;

acquiring a currently uploaded file, comparing the storage size of the currently uploaded file with the remaining storage space of the temporary queue, if the remaining storage space of the temporary queue is larger than the storage size of the currently uploaded file, storing the currently uploaded file into the temporary queue, and checking whether the currently uploaded file exists in the file information mapping table;

if the currently uploaded file exists in the file information mapping table, judging whether the space occupied by all files in the temporary queue reaches a first threshold value of a preset threshold value space of the temporary queue;

if the space occupied by all the files in the temporary queue reaches a first threshold value of a preset threshold value space of the temporary queue, merging all the files in the temporary queue until the space reaches the preset threshold value space of the temporary queue, and obtaining a merged queue corresponding to the temporary queue;

finding the file occupying the largest storage space in the temporary queue, and judging whether the storage size of the file occupying the largest storage space is larger than a second threshold value of a preset threshold value space of the temporary queue;

and if the storage size of the file occupying the largest storage space is larger than a second threshold value of a preset threshold value space of the temporary queue, storing the currently uploaded file in a merging queue corresponding to the temporary queue, searching for friendly files in other temporary queues, and storing all the found friendly files in the merging queue corresponding to the temporary queue for file packaging and uploading.

Further, the file storage unit is further configured to:

if the storage size of the file occupying the largest storage space is not larger than a second threshold value of a preset threshold value space of the temporary queue, continuously uploading the file to the temporary queue until the space occupied by all the files in the temporary queue is larger than or equal to the second threshold value of the preset threshold value space of the temporary queue;

and storing the currently uploaded files in the merging queue corresponding to the temporary queue, searching friendly files in other temporary queues, and storing all the searched friendly files in the merging queue corresponding to the temporary queue for file packaging and uploading.
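As an illustration of the small-file merging idea described above, the following Python sketch packs small files into merged batches that never exceed a block threshold. It is a simplified first-fit-decreasing packing under stated assumptions, not the two-threshold temporary-queue procedure itself: `BLOCK_SIZE`, `MergeQueue`, and `pack_small_files` are hypothetical names, the 128 MB threshold is an assumed value, and "friendly" files are modelled simply as files that still fit in a batch's remaining space.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 128  # assumed maximum data block threshold in MB (hypothetical value)


@dataclass
class MergeQueue:
    """One merged batch of small files, limited to `capacity` MB."""
    capacity: int
    files: list = field(default_factory=list)  # list of (name, size_mb) tuples

    @property
    def used(self):
        return sum(size for _, size in self.files)

    @property
    def free(self):
        return self.capacity - self.used


def pack_small_files(files, capacity=BLOCK_SIZE):
    """Pack (name, size_mb) files into batches whose total size stays <= capacity."""
    for name, size in files:
        if size > capacity:
            raise ValueError(f"{name} is larger than the block threshold")
    # Largest files first, so that each batch is seeded with a big file.
    pending = sorted(files, key=lambda f: f[1], reverse=True)
    batches = []
    while pending:
        queue = MergeQueue(capacity)
        queue.files.append(pending.pop(0))
        # Scan the remaining files for "friendly" ones that fit the free space.
        i = 0
        while i < len(pending):
            if pending[i][1] <= queue.free:
                queue.files.append(pending.pop(i))
            else:
                i += 1
        batches.append(queue)
    return batches
```

For example, packing files of 70, 60, 50, 40, 30, and 6 MB with a 128 MB threshold gives three batches that occupy 126, 100, and 30 MB, so fewer, fuller blocks are uploaded than if each file were stored on its own.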

Further, the data alignment unit is specifically configured to:

giving two identical characteristic sequences with equal time intervals in the historical track inspection data and each current track inspection data, and calculating the distance between any two points between the two identical characteristic sequences to obtain a distance matrix of the two identical characteristic sequences;

and calculating the optimal warping path through the distance matrix of the two identical characteristic sequences by dynamic programming, and calculating the minimum dynamic time warping distance of the two identical characteristic sequences according to that path, so as to align the data of the two identical characteristic sequences.

Furthermore, the track inspection data service system based on Hadoop further comprises an analysis module, wherein the analysis module is used for carrying out data analysis on the data acquired by the data acquisition module, and the data analysis at least comprises contour data analysis, corrugation data analysis, warping analysis and weld flatness analysis.

Furthermore, the track inspection data service system based on Hadoop also comprises a report generation module, wherein the report generation module is used for converting the data output by each module into a corresponding data report so as to enable a user to check the data report.

Furthermore, the track inspection data service system based on Hadoop also comprises a data sharing module, wherein the data sharing module is used for creating a data sharing rule and providing uniform data distribution export service for each platform by using the data sharing rule.

The system further comprises a data visualization module comprising a first visualization module and a second visualization module. The first visualization module connects the sampled data acquired by the data acquisition module at different times to show how the measurement signal of the data acquisition module varies over time; the second visualization module integrates the data signals sampled by the data acquisition module to display the current signal intensity of the data acquisition module.

Compared with the prior art, the invention has the beneficial effects that: data alignment is carried out on each current rail detection data and historical rail detection data by using the characteristic points as marks through a data alignment unit, and mileage error interference is eliminated for subsequent data analysis; the rail inspection data aligned with the data are split, and distributed processing is performed on the plurality of groups of preprocessed rail inspection data, so that efficient utilization and classification management of the rail inspection data are improved, and users can achieve unified access, efficient and accurate query and statistics.

Drawings

FIG. 1 is a block diagram of a Hadoop-based track inspection data service system according to an embodiment of the present invention;

FIG. 2 is a block diagram of a data processing module according to an embodiment of the present invention;

FIG. 3 is a model distribution diagram of a Hadoop-based track inspection data service system according to an embodiment of the present invention;

FIG. 4 is a table relating to databases for unified authentication and authorization control in an embodiment of the present invention;

FIG. 5 is an organizational chart of a railroad department in an embodiment of the invention;

FIG. 6 is an exemplary illustration of a star atlas of embodiments of the invention;

FIG. 7 is an inheritance graph of an object-oriented class structure in an embodiment of the present invention;

FIG. 8 is a diagram illustrating algorithm merging after optimization according to an embodiment of the present invention;

FIG. 9 is a block diagram of an overall architecture of the track inspection data service system according to an embodiment of the present invention;

FIG. 10 is a diagram of the technical architecture of the track inspection data service system in an embodiment of the present invention;

FIG. 11 is a diagram illustrating the rights management of the track inspection data service system according to an embodiment of the present invention;

FIG. 12 is an interface diagram of the service components of the Cloudera Manager in an embodiment of the present invention;

FIG. 13 is a user management interface diagram of the track inspection data service system in an embodiment of the present invention;

FIG. 14 is a diagram of an organization management interface of the track inspection data service system in an embodiment of the present invention;

FIG. 15 is a diagram of a role management interface of the track inspection data service system according to an embodiment of the present invention;

FIG. 16 is a diagram of a menu management interface of the track inspection data service system in accordance with an embodiment of the present invention;

FIG. 17 is a SQL monitoring interface diagram of the track inspection data service system in the embodiment of the present invention;

FIG. 18 is a diagram of an operation log interface of the track inspection data service system in an embodiment of the invention;

FIG. 19 is a diagram of an error log interface of the track inspection data service system according to an embodiment of the present invention.

Description of the main element symbols:

The following detailed description further illustrates the invention in conjunction with the above figures.

Detailed Description

To facilitate an understanding of the invention, the invention will now be described more fully with reference to the accompanying drawings. Several embodiments of the invention are shown in the drawings. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.

It will be understood that when an element is referred to as being "secured to" another element, it can be directly on the other element or intervening elements may also be present. When an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may also be present. The terms "vertical," "horizontal," "left," "right," and the like as used herein are for illustrative purposes only.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.

It should be noted that, with the rapid development of railways, mass data generated during train maintenance is not effectively utilized, and the data has great significance for self-diagnosis of equipment, intelligent track analysis, information fusion and data post-processing. The rail inspection data service platform conforms to the development level of the current informatization technology, constructs a rail inspection data intelligent service platform based on technologies such as big data analysis and cloud computing, and realizes efficient acquisition of various information, interconnection and intercommunication of various types of inspection equipment and data sharing of various information platforms in daily operation and inspection operation of trains.

In this application, the track inspection data service system covers the entry of daily inspection data, the storage of report files generated by inspections, and the processing of inspection data. A user can view the processed information on the system through a terminal such as a PC or a mobile phone, including viewing various reports online and displaying the working data of the track inspection instrument in real time.

Hadoop is an open-source distributed computing platform from the Apache Software Foundation that originated in the Apache Nutch project. It provides users with a distributed infrastructure while keeping the low-level details of the system transparent to them. Hadoop is developed mainly in Java and therefore inherits Java's cross-platform characteristics; most importantly, it can substantially reduce cost.

Referring to FIG. 1 to FIG. 3, the track inspection data service system based on Hadoop according to the embodiment of the present invention includes, but is not limited to, a data acquisition module, a database support module, a data processing module, an analysis module, a report generation module, a data sharing module, and a data visualization module. The data acquisition module is configured to acquire the track inspection data of each track inspection device; the track inspection devices include, but are not limited to, a track inspection tester, a track measurement instrument, a corrugation meter, a structure inspection tester, a contact network monitor, and a numerical scale. The data acquisition module transmits the track inspection data uploaded by the track inspection devices to the corresponding modules for analysis and processing, and the processed data are stored by category so that a user can view them through the corresponding user interface.

The rail detection data service system based on the Hadoop also has an early warning function, and when the input data are detected to be abnormal after the input data are analyzed and processed by the modules, early warning information is sent in time to remind an operator to detect again by using the corresponding rail detection equipment.

Specifically, the data processing module comprises a data alignment unit and a data processing unit, wherein the data alignment unit is used for performing local interpolation and compression on each piece of current rail detection data by using historical rail detection data so as to realize feature point matching of the historical rail detection data and each piece of current rail detection data and further obtain rail detection data with aligned data;

It should be noted that this embodiment adopts the dynamic time warping (DTW) algorithm: the previous measurement result is used as the template, the data from the current measurement are used as the data to be aligned, and the features of the current measurement data are locally interpolated and compressed to match feature points, achieving high-precision alignment of the two sets of measurement data.

Further, the data alignment unit is specifically configured to:

giving two identical characteristic sequences with equal time intervals in the historical track inspection data and each current track inspection data, and calculating the distance between any two points between the two identical characteristic sequences to obtain a distance matrix D of the two identical characteristic sequences;

In a specific implementation, two identical characteristic sequences with equal time intervals are given in the historical track inspection data and each current track inspection data: the query sequence $S = \{s_1, s_2, \ldots, s_m\}$ and the matching sequence $M = \{m_1, m_2, \ldots, m_n\}$. The distance between any point of the query sequence and any point of the matching sequence is calculated according to the following formula:

$$d(i, j) = \lVert s_i - m_j \rVert_w$$

where $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n$; when $w = 1$ it is the Manhattan distance, and when $w = 2$ it is the Euclidean distance.

The resulting distance matrix is:

$$D = \begin{bmatrix} d(1,1) & d(1,2) & \cdots & d(1,n) \\ d(2,1) & d(2,2) & \cdots & d(2,n) \\ \vdots & \vdots & \ddots & \vdots \\ d(m,1) & d(m,2) & \cdots & d(m,n) \end{bmatrix}$$

Further, an optimal warping path through the distance matrix $D$ is obtained by dynamic programming (DP):

$$P_{best} = \{p_1, p_2, \ldots, p_k, \ldots, p_K\}$$

where $p_k$ denotes a position on the path, i.e. $p_k = (i, j)_k$ indicates that $s_i$ is aligned with $m_j$, and $\max(m, n) \le K \le m + n - 1$.

To ensure that every searched path is meaningful, any searched path must satisfy the following constraints:

(1) Boundary condition: the start point and the end point of the path are fixed, i.e.:

$$p_1 = (1, 1), \quad p_K = (m, n)$$

(2) Monotonicity: within the same measurement, a feature from an earlier moment cannot appear after a feature from a later moment, i.e. when $p_k = (s_i, m_j)$ and $p_{k+1} = (s_{i'}, m_{j'})$, then $i \le i' \le i + 1$ and $j \le j' \le j + 1$.

The warping path $p_k$ between $p_1$ and $p_K$ is determined by constructing a cost matrix whose elements $\gamma(i, j)$ are defined as:

$$\gamma(i, j) = d(i, j) + \min[\gamma(i-1, j-1),\ \gamma(i-1, j),\ \gamma(i, j-1)]$$

where $i \in \{1, 2, \ldots, m\}$, $j \in \{1, 2, \ldots, n\}$, $\gamma(0, 0) = 0$, and $\gamma(i, 0) = \gamma(0, j) = \infty$.

The optimal warping path $P_{best}$ should minimize the cumulative distance between $S$ and $M$. The dynamic time warping distance $DTW(S, M)$ is calculated as:

$$DTW(S, M) = \min\left\{\sum_{k=1}^{K} d(p_k)\right\} = \gamma(m, n)$$

where $d(p_k)$ is the value of $d(i, j)$ at $p_k = (i, j)_k$.
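The cost-matrix recurrence for γ(i, j) and the backtracking of the warping path can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in the embodiment: the names `point_distance` and `dtw` are hypothetical, each sample is assumed to be a tuple of feature values, and the path is returned with 1-based indices to match the notation above. The returned distance is γ(m, n), which is the standard result of this recurrence.

```python
import math


def point_distance(a, b, w=2):
    """w-norm distance between two feature vectors (w=1 Manhattan, w=2 Euclidean)."""
    return sum(abs(x - y) ** w for x, y in zip(a, b)) ** (1.0 / w)


def dtw(s, m, w=2):
    """Return (DTW distance, warping path) for query s and matching sequence m."""
    rows, cols = len(s), len(m)
    if rows == 0 or cols == 0:
        raise ValueError("both sequences must be non-empty")
    # gamma[0][*] and gamma[*][0] hold the boundary values: infinity,
    # except gamma[0][0] = 0.
    gamma = [[math.inf] * (cols + 1) for _ in range(rows + 1)]
    gamma[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            d = point_distance(s[i - 1], m[j - 1], w)
            gamma[i][j] = d + min(gamma[i - 1][j - 1],
                                  gamma[i - 1][j],
                                  gamma[i][j - 1])
    # Backtrack from (m, n) to (1, 1), always stepping to the cheapest predecessor.
    i, j = rows, cols
    path = [(i, j)]
    while (i, j) != (1, 1):
        _, i, j = min((gamma[i - 1][j - 1], i - 1, j - 1),
                      (gamma[i - 1][j], i - 1, j),
                      (gamma[i][j - 1], i, j - 1))
        path.append((i, j))
    path.reverse()
    return gamma[rows][cols], path
```

Calling `dtw([(0,), (1,), (2,), (3,)], [(0,), (1,), (1,), (2,), (3,)])` returns a distance of 0.0 with the path (1,1), (2,2), (2,3), (3,4), (4,5): the repeated sample in the second sequence is absorbed by aligning it with the same point of the first, which is how a local stretch or compression in mileage is compensated.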

Further, the data processing unit is configured to perform data preprocessing on the rail inspection data with aligned data to obtain multiple sets of preprocessed rail inspection data, perform distributed processing on the multiple sets of preprocessed rail inspection data to determine whether the rail inspection data with aligned data meets preset requirements, and if the rail inspection data with aligned data meets the preset requirements, generate a corresponding data report to implement information interaction between a user and each rail inspection device.

In a specific implementation, the data processing unit splits the data-aligned track inspection data into multiple groups. Splitting generates multiple intermediate results in <key, value> form, and track inspection data with the same intermediate key are assigned to the same processing subunit. Each processing subunit judges whether its data meet the corresponding requirements; if they do, the track inspection data collected by the track inspection device are normal, that is, the inspected track is in a normal state. After the processing subunits finish, a corresponding data report is generated regardless of whether the data meet the preset requirements, enabling information interaction between the user and each track inspection device.
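The split, group-by-key, and per-group check described above follows the MapReduce pattern. The sketch below simulates that pattern in a single Python process rather than on a Hadoop cluster, purely to show the data flow. The record fields `section` and `gauge_dev_mm`, the threshold `limit_mm`, and the function names are all hypothetical and do not come from the embodiment.

```python
from collections import defaultdict


def map_phase(records):
    """Emit one <key, value> pair per record: key = track section, value = deviation."""
    for rec in records:
        yield rec["section"], rec["gauge_dev_mm"]


def shuffle(pairs):
    """Group values by key so that each key is handled by one processing subunit."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, limit_mm):
    """Check each group against the preset limit and build a report entry for it."""
    report = {}
    for key, values in groups.items():
        worst = max(abs(v) for v in values)
        report[key] = {
            "count": len(values),
            "max_abs_dev_mm": worst,
            "ok": worst <= limit_mm,  # True when the section meets the requirement
        }
    return report
```

A report entry is produced for every key whether or not its data pass the check, so sections that fail can be flagged to the user instead of being silently dropped.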

In some optional embodiments, the Hadoop-based track inspection data service system further includes a database support module, where the database support module is configured to generate a plurality of track inspection ledger databases according to the track inspection data acquired by each track inspection device, and generate a plurality of right management databases according to user rights, where the plurality of track inspection ledger databases at least include a plane curve database, a vertical curve database, a CP3 database, and a sleeper information database, and the plurality of right management databases at least include a specification management database.

The plane curve database and the vertical curve database are used for counting relevant elements and attributes related to the plane curve and the vertical curve according to the rail detection data acquired by each rail detection device. The information is as follows:

The related attributes of the plane curve are mainly as follows: the intersection point number, the weft distance (m), the warp distance (m) (or the north coordinate m, the east coordinate m, the elevation m), the steering, the radius (m), the slow length (m), the tangent length (m), the curve length (m), the straight point (km), the slow point (km), the straight line clamping length (m), the superelevation (mm), the track gauge widening (mm), the breakpoint mileage (km), the corrected mileage (km), the straight down slope (m), the total length (m) of the down slope, remarks and remarks 2. The plane curve database table structure is shown in table 1 below.

TABLE 1

The related attributes of the vertical curve are mainly as follows: serial number, grade change point mileage (km), rail surface elevation (m), gradient (‰), slope length (m), radius (m), tangent length T1 (m), tangent length T2 (m), external distance E (m), and remarks. The vertical curve database table structure is shown in table 2 below.

TABLE 2

The element information of the plane curve and the vertical curve has strong structural characteristics, and the background stores and maintains the data by adopting a relational database MySQL.

In addition to establishing data tables to store the plane curve and vertical curve information, specific line information (owning department, line name, row type, and the like) must be established and associated with that information so that the system can index and maintain the two data tables; the line information table has a one-to-many relationship with the plane curve and vertical curve tables. The line information database table structure is shown in table 3 below.

TABLE 3

Table 3 is associated with the system department table in a many-to-one relationship. After the data tables are established to store this information, the corresponding platform operation functions must be defined to maintain it.

Further, the CP3 database includes CP3 coordinate information, whose related attributes include: the CP3 control point number, north coordinate (m), east coordinate (m), elevation (m), and the like. The CP3 coordinate information database table structure is shown in table 4.

TABLE 4

The associated information tables are mainly the specific line information table and the CP3 control point detail table. Table 4 is associated with the CP3 control point detail table in a many-to-one relationship. The specific line information table is the same as table 3. The related attributes of the CP3 control point detail table include: the buried point name, starting mileage, ending mileage, length (m), number of CP III points (pairs), and the like. The CP3 control point detail database table structure is shown in table 5.

TABLE 5

The platform simultaneously provides a maintenance operation function similar to a plane curve and vertical curve service module, and supports a batch import function under the condition of meeting a specified template.

Further, the related attributes in the sleeper information data table include: design mileage, sleeper number, fastener type, fastener parameters (left, right, inner side, outer side, and height adjustment), line characteristics (bridge, culvert, tunnel, and the like), limit, and the like; this information is also associated with the specific line information.

The fastener type takes values from: WJ-7(A), WJ-7(B), WJ-8(A), WJ-8(B), SFC (inline), SFC (staggered), 300-1a, 300-1u, etc. The sleeper information database table structure is shown in table 6.

TABLE 6

The information in table 6 has a many-to-one relationship with the specific line information table through the foreign key line_id. The system also provides maintenance functions and supports batch import for data that conform to the specified template.
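The many-to-one relationship between the sleeper information table and the line information table can be sketched as follows. The example uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for the MySQL back end. Apart from `line_id`, every table and column name here is a hypothetical choice for illustration and does not reproduce the table structures of the embodiment.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with one line and three sleepers on it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript("""
        CREATE TABLE line_info (
            id         INTEGER PRIMARY KEY,
            department TEXT NOT NULL,
            line_name  TEXT NOT NULL
        );
        CREATE TABLE sleeper_info (
            id                INTEGER PRIMARY KEY,
            line_id           INTEGER NOT NULL REFERENCES line_info(id),
            design_mileage_km REAL NOT NULL,
            sleeper_no        TEXT NOT NULL,
            fastener_type     TEXT NOT NULL
        );
    """)
    conn.execute(
        "INSERT INTO line_info (id, department, line_name) VALUES (1, 'Dept A', 'Line X')"
    )
    conn.executemany(
        "INSERT INTO sleeper_info (line_id, design_mileage_km, sleeper_no, fastener_type) "
        "VALUES (?, ?, ?, ?)",
        [(1, 12.000, "S-1", "WJ-7(A)"),
         (1, 12.001, "S-2", "WJ-8(A)"),
         (1, 12.002, "S-3", "WJ-8(B)")],
    )
    conn.commit()
    return conn


def sleepers_per_line(conn):
    """Count the sleepers attached to each line (the 'many' side per 'one' line)."""
    return conn.execute(
        "SELECT l.line_name, COUNT(s.id) FROM line_info l "
        "LEFT JOIN sleeper_info s ON s.line_id = l.id "
        "GROUP BY l.id ORDER BY l.id"
    ).fetchall()
```

With the foreign key enforced, inserting a sleeper row whose `line_id` has no matching line raises `sqlite3.IntegrityError`, which keeps batch-imported rows tied to an existing line record.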

In this embodiment, the authority management database further includes a device authorization code management database, whose related attributes include: the applying user, authorization time, device model, device name, authorized computer model, authorized computer serial number (or machine code, etc.), authorization code, logout user, logout time, logout flag, and the like. This information is associated with the system department table in a many-to-one relationship, which identifies the owner of each device authorization code and facilitates later data query and statistics. The device authorization code management database table structure is shown in table 7.

TABLE 7

Table 7 carries additional information such as the applying user, application time, logout user, and logout time. Combined with the system log information, this lets a manager track the state of a device authorization code at any time throughout its life cycle, from application through use to logout.

The standard management database comprises a management system (web) and a portal website (web), a plurality of systems share one service back end, and the back end uses a shiro framework to perform unified login authentication and authority control. The tables relating to the database for unified authentication and rights control are shown in fig. 4.

The tables in fig. 4 are described in Table 8:

TABLE 8

Further, the track inspection data service system based on Hadoop further comprises an analysis module, wherein the analysis module is used for carrying out data analysis on the data acquired by the data acquisition module, and the data analysis at least comprises contour data analysis, corrugation data analysis, warping analysis and weld flatness analysis; the track inspection data service system based on the Hadoop further comprises a report generation module, and the report generation module is used for converting the data output by each module into a corresponding data report so as to enable a user to check the data report.

It should be noted that, while the system is running, the operating state of the whole system is monitored in real time, and errors generated during operation are stored in the database error log table through the report generation module, so that an administrator can conveniently locate platform bugs. The error log data table structure is shown in Table 9.

TABLE 9

It should be understood that a system log is kept while the system is running. It records the operations of all logged-in users of the whole system, so that an analyst can promptly identify the frequently accessed pages, the operations performed and the times at which they occurred, as well as some system bugs, and a developer can optimize the system according to the analysis result. The system log data table structure is shown in Table 10.

TABLE 10

The standard management database also includes organization management, wherein the organizational structure of a railway department (taking the Nanchang Railway Bureau as an example) is shown in FIG. 5:

The superior unit of each railway bureau is the China railway general company, and corresponding workshops and work areas are set up under each section. The rail inspection data service platform manages organizations in the form of a tree structure. The organization management data table structure is shown in Table 11.

TABLE 11
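As a non-limiting illustration of the tree-structured organization management described above, the following Python sketch builds a tree from flat rows and lists every unit below a given node. The row layout (org_id, parent_id, name), the unit names and the helper names are assumptions made for illustration only; they are not taken from Table 11.

```python
from collections import defaultdict

# Hypothetical flat rows: (org_id, parent_id, name). Names are illustrative.
ROWS = [
    (1, None, "China railway general company"),
    (2, 1, "Nanchang Railway Bureau"),
    (3, 2, "Section A"),
    (4, 3, "Workshop 1"),
    (5, 4, "Work area 1"),
    (6, 3, "Workshop 2"),
]


def build_tree(rows):
    """Index rows by parent so that each node can list its direct children."""
    children = defaultdict(list)
    names = {}
    for org_id, parent_id, name in rows:
        names[org_id] = name
        children[parent_id].append(org_id)
    return names, children


def descendants(children, org_id):
    """Return the ids of all units below org_id in depth-first order."""
    result = []
    stack = list(reversed(children.get(org_id, [])))
    while stack:
        node = stack.pop()
        result.append(node)
        stack.extend(reversed(children.get(node, [])))
    return result


names, children = build_tree(ROWS)
print([names[i] for i in descendants(children, 3)])
```

A query such as "all workshops and work areas under one section" then reduces to a single call to descendants, which is one reason a parent-pointer table is a common choice for this kind of hierarchy.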

In this embodiment, the track inspection data service system based on Hadoop further includes a data visualization module, where the data visualization module includes a first visualization module and a second visualization module. The first visualization module is configured to connect the sampling data acquired by the data acquisition module at different times, so as to show how the measurement signal of the data acquisition module changes over time; the second visualization module is used for performing integration processing on the data signals sampled by the data acquisition module, so as to display the current signal intensity of the data acquisition module.

It should be noted that the visualization used in the rail inspection data service system is mainly of three types, namely multidimensional data visualization, hierarchical information visualization and time series data visualization.

1. Multi-dimensional data visualization

Multi-dimensional data visualization is one way to present high-dimensional data in a two-dimensional plane. The traditional algorithm is complex, time-consuming and poor in compatibility. Multidimensional data visualization can be classified into a geometry-based visualization method and an icon-based visualization method based on data characteristics.

The geometry-based visualization method can use parallel vertical axes to represent different dimensions; the numerical values of the multi-dimensional data are plotted on the coordinate axes and the coordinate points on the axes are connected, so that the multi-dimensional data are displayed in a two-dimensional space.

The icon-based visualization method mainly uses a geometric figure as an icon to depict multi-dimensional data, the characteristic attribute of the icon represents the dimensionality of information, and the visualization effect is reflected by the link between the icon and the multi-dimensional data. A representative method of the icon-based visualization method is a star plotting method, in which information dimensions are mapped in a point-to-line manner, and the length of a line segment represents the size of a numerical value, as shown in fig. 6.
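As a non-limiting illustration of the star plotting method, the following Python sketch maps one multi-dimensional record to two-dimensional vertices: each dimension is assigned a ray at an evenly spaced angle, and the segment length along the ray is the value scaled by an assumed per-dimension maximum. The function name and the scaling rule are assumptions for illustration and are not taken from fig. 6.

```python
import math


def star_plot_vertices(values, max_values):
    """Map one record to star-plot vertices in the plane.

    Dimension k is drawn along a ray at angle 2*pi*k/n; the distance
    from the origin is the value divided by that dimension's maximum.
    """
    n = len(values)
    points = []
    for k, (value, vmax) in enumerate(zip(values, max_values)):
        radius = value / vmax
        angle = 2 * math.pi * k / n
        points.append((radius * math.cos(angle), radius * math.sin(angle)))
    return points


# Example record with four dimensions, each scaled against a maximum of 4.
vertices = star_plot_vertices([2.0, 1.0, 4.0, 3.0], [4.0, 4.0, 4.0, 4.0])
```

Connecting the returned vertices in order yields the closed star polygon for that record.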

2. Hierarchical information visualization

The computer file system and the inheritance of Java classes are typical examples of hierarchical information; the inheritance of Java classes is shown in FIG. 7. This type of hierarchical information has very distinct structural properties. The visualization of the hierarchical information is mainly realized through node connection.

The node connection is mainly used for drawing data contents of information represented by different nodes, and connecting lines among the nodes represent relations among the data. In fig. 7, each class name represents each node, the solid line in blue represents the parent class of the actual inheritance, and the dashed line in green represents the parent interface of the class implementation.

3. Time series data visualization

Time series visualization is visualization display of data sampled along with time, and the main display modes are three, namely a line graph, a stacking graph and a horizon graph.

Line graph: the sampling values at different times are connected, and the change degree of the measured signal along with the time can be visually displayed.

Stacking diagram: the signal sampled along with the time is subjected to integral processing to calculate the area, and the visualization mode can well show the magnitude of the current signal quantity, but when the signal has a negative number, the visualization effect is greatly reduced.

Horizon graph: the measured signal is divided into horizontal bands in the graph, so that the magnitude of the signal over time can be clearly observed, and the degree of variation is shown by light and dark colors.
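As a non-limiting illustration of the integration step behind the stacking diagram, the following Python sketch computes the area under a sampled signal with the trapezoidal rule. The trapezoidal rule is an assumed choice of integration method and the function name is hypothetical; the source does not specify how the area is calculated.

```python
def trapezoid_area(times, samples):
    """Integrate a sampled signal over time with the trapezoidal rule."""
    total = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        total += 0.5 * (samples[i] + samples[i - 1]) * dt
    return total


# A non-negative signal gives an area that reflects its magnitude.
positive_area = trapezoid_area([0, 1, 2], [0, 2, 2])

# With negative samples, positive and negative parts cancel each other.
mixed_area = trapezoid_area([0, 1, 2], [1, -1, 1])
```

The second call returns zero even though the signal is clearly non-zero, which is one way to see why the stacking diagram loses its effect when the signal takes negative values.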

In other optional embodiments, the data visualization module performs dynamic customization and fine control on the system menu and function buttons, and provides a standard expansion means for the function expansion of the system. The menu data table structure is shown in table 12.

TABLE 12

Wherein, the menu URL types are: (1) normal page (e.g. user management, /sys/user); (2) nested complete external page, a link starting with http(s); (3) nested server page, using the iframe: prefix + target URL (e.g. SQL monitor, iframe:/drive/region.html; the iframe: prefix is replaced with the server address).
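As a non-limiting illustration of the three menu URL types, the following Python sketch classifies a menu URL and, for the nested server page, replaces the iframe: prefix with the server address. The function name, the returned labels, the example server address and the example paths are assumptions for illustration only.

```python
IFRAME_PREFIX = "iframe:"


def resolve_menu_url(url, server_address):
    """Return (kind, target) for a menu URL.

    kind is one of the hypothetical labels "page", "external" or
    "server_iframe"; target is the URL that would actually be loaded.
    """
    if url.startswith(IFRAME_PREFIX):
        # Nested server page: the prefix is replaced by the server address.
        return "server_iframe", server_address + url[len(IFRAME_PREFIX):]
    if url.startswith("http://") or url.startswith("https://"):
        # Nested complete external page.
        return "external", url
    # Normal page handled by the front-end router.
    return "page", url


print(resolve_menu_url("/sys/user", "http://host:8080"))
print(resolve_menu_url("iframe:/monitor/index.html", "http://host:8080"))
```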

Specifically, in the visual interface of the data visualization module, system roles can be created and extended without limit, specific system service menus and functions can be dynamically assigned to any role, and the roles determine the specific services and range of functions of the system that are available. The role data table structure is shown in Table 13.

TABLE 13

In the system, roles and departments (organizations) have a many-to-many relationship. After a role is assigned to a department (organization), all persons belonging to that department (organization) have the platform services and functions owned by the role, which avoids binding roles to each person one by one. The role-to-organization correspondence table structure is shown in Table 14.

TABLE 14

It should be understood that, in the present system, roles and menus (function buttons) have a many-to-many relationship: the same role can have usage rights for multiple menus (function buttons), and a menu (function button) can be owned by different roles at the same time. The structure of the relationship table between roles and menus is shown in Table 15.

TABLE 15

In this embodiment, the system encrypts the user sensitive information, and a system super administrator can disable and enable the user. The relationship between users and departments (mechanisms) should be many-to-one, and a certain department includes a plurality of users, and a certain user only belongs to a certain department. The user information data table structure is shown in table 16.

TABLE 16

In the system, users and roles have a many-to-many relationship: the same user can have a plurality of roles, and a given role can be owned by different users at the same time. The roles owned by a user are the union of the roles assigned directly to the user and the roles assigned to the department (organization) to which the user belongs. The user-to-role correspondence table structure is shown in Table 17.

TABLE 17
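As a non-limiting illustration of the relationships described above, the following Python sketch computes the roles of a user as the union of the user's own roles and the roles of the user's department, and then derives the menus those roles may use. All user, department, role and menu names, and the in-memory dictionaries standing in for the correspondence tables, are assumptions for illustration only.

```python
# Hypothetical in-memory stand-ins for the correspondence tables.
USER_DEPT = {"alice": "track_section", "bob": "workshop_1"}   # many-to-one
USER_ROLES = {"alice": {"analyst"}}                            # many-to-many
DEPT_ROLES = {"track_section": {"viewer"},
              "workshop_1": {"viewer", "operator"}}            # many-to-many
ROLE_MENUS = {"analyst": {"/report"},
              "viewer": {"/data"},
              "operator": {"/device"}}                         # many-to-many


def effective_roles(user):
    """Union of roles bound to the user and roles bound to the department."""
    own = set(USER_ROLES.get(user, ()))
    inherited = set(DEPT_ROLES.get(USER_DEPT.get(user), ()))
    return own | inherited


def allowed_menus(user):
    """Menus reachable through any of the user's effective roles."""
    menus = set()
    for role in effective_roles(user):
        menus |= set(ROLE_MENUS.get(role, ()))
    return menus


print(sorted(effective_roles("alice")), sorted(allowed_menus("alice")))
```

In this sketch bob has no role of his own and still obtains the viewer and operator roles through his department, which mirrors the stated aim of avoiding one-by-one role binding.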

In this embodiment, after a user successfully logs in to the system, the platform returns an access code (the access code is unique and has an expiration time; where there are special control requirements, an IP or MAC field may be added for combined access control). The user must carry this unique access code in all subsequent access to platform services and functions, otherwise the platform rejects the access request. The user token table structure is shown in Table 18:

TABLE 18
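As a non-limiting illustration of the access code mechanism, the following Python sketch issues a unique random code with an expiration time and rejects missing or expired codes. The class name, the in-memory dictionary standing in for the user token table and the lifetime value are assumptions for illustration only.

```python
import secrets
import time

TOKEN_TTL_SECONDS = 12 * 3600  # assumed lifetime; not specified in the source


class TokenStore:
    """Minimal in-memory stand-in for the user token table."""

    def __init__(self, ttl=TOKEN_TTL_SECONDS):
        self.ttl = ttl
        self._tokens = {}  # token -> (user_id, expire_time)

    def issue(self, user_id, now=None):
        """Create a unique random access code for a logged-in user."""
        now = time.time() if now is None else now
        token = secrets.token_hex(16)
        self._tokens[token] = (user_id, now + self.ttl)
        return token

    def check(self, token, now=None):
        """Return the user id for a valid code, or None to reject access."""
        now = time.time() if now is None else now
        entry = self._tokens.get(token)
        if entry is None:
            return None
        user_id, expire_time = entry
        if now >= expire_time:
            del self._tokens[token]  # expired codes are discarded
            return None
        return user_id
```

An IP or MAC field, where required, could be stored alongside the user id and compared in check; that extension is omitted here.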

Furthermore, in order to overcome the waste of space in common storage methods and make better use of the space, the track inspection data service system adopts a small file storage method based on space optimization. The data processing unit further comprises a file storage unit; files with a storage size smaller than a preset storage size (5MB) are stored as small files, and the implementation process of the algorithm is shown in FIG. 8. The small files are distributed evenly according to the storage space they occupy, so that each merged file can reach the set maximum threshold of the data block. Compared with a simple merging algorithm, this algorithm makes better use of the space of each data block and reduces the storage overhead.

Specifically, the file storage unit is configured to:

if the storage size of the file occupying the largest storage space is larger than a second threshold value of a preset threshold value space of the temporary queue, storing the currently uploaded file in a merging queue corresponding to the temporary queue, searching friendly files in other temporary queues, and storing all the searched friendly files in the merging queue corresponding to the temporary queue for file packaging and uploading;

if the storage size of the file occupying the largest storage space is not larger than a second threshold value of the preset threshold value space of the temporary queue, continuously uploading the file to the temporary queue until the space occupied by all the files in the temporary queue is larger than or equal to the second threshold value of the preset threshold value space of the temporary queue;

In the specific implementation, the specific steps of the algorithm are as follows:

(1) after receiving a file storage request, initializing a data queue, reading configuration information including a merging threshold value and the like in a configuration file, and creating a temporary queue, a merging queue and a file information mapping table. And the number of the temporary queues is less than that of the merging queues.

(2) The size of the incoming file is compared with the remaining space of a temporary queue. If the remaining space of the temporary queue is larger than the space occupied by the incoming file, the file is stored in that queue; otherwise the file size is compared with the other temporary queues. If no temporary queue meets the requirement after all temporary queues have been compared, a new temporary queue is created and the file is stored in it.

(3) Whether the currently uploaded file exists in the file information mapping table is checked; if so, the file is merged, and if not, the file information is recorded in the file information mapping table.

(4) Whether the space occupied by the files in the temporary queue reaches 90% of the set threshold space is judged; if so, merging of the files in the temporary queue begins.

(5) Once the temporary queue reaches the threshold, the file occupying the largest space is found. If the size of that file is greater than 1/2 of the threshold, step (6) is entered; otherwise step (7) is entered.

(6) The file is stored in a merging queue, and friendly files, namely the files whose size best fills the remaining space of the current merging queue, are searched for in the remaining temporary queues until no friendly file can be found. Step (8) is then entered.

(7) The queue in which the file is located is left unchanged; when the total space of the queue is greater than or equal to 1/2 of the threshold, the process returns to step (6).

(8) The files in the merging queue are packaged and uploaded to HDFS.
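As a non-limiting illustration of steps (4) to (6) above, the following Python sketch checks whether a temporary queue is full enough to merge and then fills a merging queue with friendly files. Interpreting a friendly file as the largest file that still fits the remaining space (a best-fit choice) is an assumption, as are the function names, the (name, size) tuples and the threshold value of 10 units; the bookkeeping of steps (1) to (3) and the upload of step (8) are omitted.

```python
def should_merge(queue, threshold, fill_ratio=0.9):
    """Step (4): merging starts once the queue holds 90% of the threshold."""
    return sum(size for _, size in queue) >= fill_ratio * threshold


def largest_file(queue):
    """Step (5): the file occupying the largest space in the queue."""
    return max(queue, key=lambda f: f[1])


def pick_friendly_file(candidates, remaining):
    """Assumed best-fit rule: the largest candidate that still fits."""
    fitting = [f for f in candidates if f[1] <= remaining]
    return max(fitting, key=lambda f: f[1]) if fitting else None


def build_merge_queue(seed, other_files, threshold):
    """Step (6): start from the seed file and add friendly files until
    none fits. Returns the merging queue and the files left behind."""
    merge_queue = [seed]
    remaining = threshold - seed[1]
    pool = list(other_files)
    while True:
        friend = pick_friendly_file(pool, remaining)
        if friend is None:
            break
        merge_queue.append(friend)
        pool.remove(friend)
        remaining -= friend[1]
    return merge_queue, pool


queue = [("a", 6), ("b", 4)]
if should_merge(queue, threshold=10):
    seed = largest_file(queue)  # ("a", 6), larger than half the threshold
    merged, left = build_merge_queue(
        seed, [("b", 3), ("c", 2), ("d", 1), ("e", 5)], threshold=10)
    print(merged, left)
```

In this example the merging queue ends up exactly full (6 + 3 + 1 units), which shows how choosing files by remaining space can bring a merged file close to the block threshold.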

Furthermore, in this embodiment, the rail inspection data service system further includes a data sharing module. The data transmission security measures of the rail inspection data service system mainly build an encrypted transmission link through technical means such as HTTPS, and at the same time encrypt the data directly and transmit it in ciphertext form, thereby ensuring the security of the data transmission process.

In addition, the aim of security protection in the data use stage is to ensure that data is accessed and processed within an authorized range and to prevent the data from being stolen, leaked or damaged. In addition to network security protection measures such as dual-computer hot backup, firewall, intrusion detection, virus prevention, DDoS attack prevention and vulnerability detection, the following measures are also included:

(1) account rights management

A unified account authority management subsystem is established to manage the accounts and access ranges of the various service systems in a unified manner, ensure that data is used within the authorized range, and implement an account authority management and approval system.

(2) Data desensitization

From the perspective of protecting the confidentiality of sensitive data, the platform supports fuzzification (masking) of sensitive data during data display according to specific business requirements.

(3) Log management and auditing

The technical capability requirements for log management and audit are mainly to record and audit account management operation logs, authority approval logs, data access operation logs and the like, so as to support the enforcement of the related management system. In technical implementation, a unified log management and audit subsystem is built according to business requirements.

(4) Abnormal operation real-time supervision

Compared with the safety technical measures of log recording, safety audit and other 'after-the-fact' tracing properties, the real-time monitoring of abnormal behaviors is a necessary measure for realizing 'before-the-fact' and 'in-the-fact' link monitoring early warning and real-time disposal. The abnormal behavior monitoring system should be capable of monitoring dangerous behaviors such as unauthorized access of data, sensitive operation of data files and the like in real time.
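As a non-limiting illustration of the data desensitization measure in item (2) above, the following Python sketch masks the middle of a sensitive string before display. The function name, the number of characters kept at each end and the sample value are assumptions for illustration; the source does not specify the masking rule.

```python
def mask(value, keep_head=3, keep_tail=4, fill="*"):
    """Hide the middle of a sensitive string for display.

    Strings too short to keep both ends are masked entirely.
    """
    if len(value) <= keep_head + keep_tail:
        return fill * len(value)
    hidden = len(value) - keep_head - keep_tail
    tail = value[len(value) - keep_tail:]
    return value[:keep_head] + fill * hidden + tail


print(mask("13812345678"))  # an 11-digit number with the middle hidden
```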

The data sharing link relates to providing data service for a third party, establishing data sharing safety related management system rules, providing unified data distribution export service for the platform, effectively managing data sharing behaviors, and preventing safety risks such as data stealing and leakage. The data distribution outlet service needs to be designed according to the data sharing business requirements and the related data standards.

In the data destruction stage, provided that sufficient storage space is guaranteed, data is in principle not physically deleted; only a deletion flag bit on the data is set.
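As a non-limiting illustration of deletion by flag bit, the following Python sketch marks a record as deleted instead of removing it. The in-memory SQLite database is used only so that the sketch is self-contained (the system described here stores structured data in MySQL), and the table name, column names and helper names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inspection_record ("
    " id INTEGER PRIMARY KEY,"
    " mileage REAL,"
    " del_flag INTEGER NOT NULL DEFAULT 0)"  # 0 = active, 1 = deleted
)
conn.executemany(
    "INSERT INTO inspection_record (mileage) VALUES (?)",
    [(1.0,), (2.0,), (3.0,)],
)


def soft_delete(conn, record_id):
    """Set the deletion flag; the row itself stays in the table."""
    conn.execute(
        "UPDATE inspection_record SET del_flag = 1 WHERE id = ?",
        (record_id,),
    )


def active_ids(conn):
    """Ids of records that have not been flagged as deleted."""
    rows = conn.execute(
        "SELECT id FROM inspection_record WHERE del_flag = 0 ORDER BY id"
    )
    return [row[0] for row in rows]


soft_delete(conn, 2)
print(active_ids(conn))
```

Queries issued by ordinary users would filter on the flag, while the flagged rows remain available for audit or recovery.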

Referring to fig. 9, the overall architecture of the rail inspection data service system is shown. Based on a Hadoop distributed processing architecture, the rail inspection data service system adopts a layered, bottom-up system design and is divided into five layers: data layer, analysis layer, integration layer, business layer and visualization layer.

1. Data layer

The data layer is the basic layer of the whole track inspection data service system and is the source and guarantee of data resources; it comprises structured data and unstructured data.

Structured data is data that can be expressed and stored using a relational database, and is expressed in a two-dimensional form. The structured data involved in the system comprises the original data of devices such as the track inspection instrument and the corrugation tester, and BeiDou satellite positioning data. These data are stored in a MySQL relational database.

Unstructured data refers to data without a regular structure; various reports, images, videos, audio and the like belong to unstructured data. The unstructured data involved in the system mainly comprises the various reports generated by the rail inspection instrument, picture data acquired by the patrol inspection instrument, and audio/video data acquired by helmet equipment. Unstructured data is stored in HDFS as files, and links or paths pointing to the files are stored in the MySQL relational database.

2. Analysis layer

The analysis layer is established to solve the problems that the current track overhaul data volume is large, the data is scattered, and the data cannot be analyzed and applied. First, in the integration and cloud storage phases, data cleaning work is done on various types of raw data that are not processed. And then, specifically analyzing the actual requirements of the user, taking out the data of the data layer, analyzing and packaging the data, transmitting the data to the data analysis layer through Sqoop, and analyzing the data based on the actual requirements at the data analysis layer. The processed data have certain readability, and the data analysis layer encapsulates the data into JSON-form data and transmits the JSON-form data to the visualization layer for data display.

3. Integrated layer

The integration layer is used for establishing a support module with a unified standard, and realizing effective integration of related application components (historical data support, intelligent analysis support, information fusion support, post-processing support, satellite positioning support and the like) in an integrated mode.

4. Business layer

The business layer directly shows various functions of the rail inspection data service system, and various applications are realized on the basis of the integration layer.

The business layer strictly executes the classification standards of various applications, the development and management of the whole application are operated in a unified management mode, data among application modules can be shared, a system provides a unified data interface, and a richer data source is provided for the realization of the business.

5. Visualization layer

Various data analysis of the rail inspection data service system can be displayed in a Web site mode, based on the requirement of data security, after working personnel with different post identities log in the system, corresponding resources can be obtained on the platform according to the authority of the working personnel, and meanwhile, the management of the resources, such as importing, sharing, deleting and the like, can also be carried out.

Furthermore, the rail inspection data service system hosts the subsystems of each service (rail inspection instrument analysis, rail inspection data management, contour data analysis, corrugation data analysis, three-dimensional constraint, single pry operation analysis and the like). Considering that the subsystems share a unified system structure and are deployed in a distributed manner, a micro-service architecture is adopted for the system structure of each subsystem. The technical architecture is shown in fig. 10.

In the rail inspection data service system architecture, the Hadoop distributed computing framework is built on the server, and the distributed scheduling framework and the specific micro services (namely the modules that provide business logic services) are developed on the server. Functions such as the service registration center (Eureka Server), service configuration center (Spring Cloud Config), service monitoring center (Spring Admin), message bus (RabbitMQ), service API gateway (Spring Cloud Zuul), single sign-on and load balancing (Ribbon, Nginx) can be reused by all subsystems, so that a unified distributed system scheduling architecture is formed, and unified management and resource scheduling are performed on each internal micro service.

The system structure therefore need not be considered when each subsystem is designed; it is only necessary to split the business logic under the distributed system framework according to the service characteristics of the subsystem, and to design and deploy service modules that meet the requirements.

Furthermore, the rail inspection data service system holds a large amount of internal sensitive data. In view of the security requirements of the current network environment, the rail inspection data service system fully considers the security of authentication and authorization of the Web application, and the authority management framework is fully planned during the architecture design. Different system authorities are set according to elements such as user identity, department and post, and different authorities for adding, deleting and querying data and for issuing commands are granted.

Meanwhile, sensitive resources in the system are comprehensively protected, and various sensitive data can be encrypted and watermarked to prevent data leakage. The authority management design of the track inspection data service system is shown in FIG. 11.

In the present application, the Hadoop-based rail inspection data service system uses three servers to build a test cluster with three nodes, and the configuration of each server is shown in table 19:

TABLE 19

The rail inspection data service system adopts CDH (Cloudera's Distribution Including Apache Hadoop) for offline deployment. CDH integrates functions such as automatic cluster installation, fault monitoring and mail alarming, which effectively reduces the installation cost of the cluster, reduces later maintenance cost and improves management efficiency. The specific cluster deployment steps are as follows:

(1) The servers are planned; the server IPs and names are shown in Table 20.

TABLE 20

(2) The host names are modified, the hosts file of each server is configured, and the cm-server is configured so that it can log in to the other nodes without a password. Next, MariaDB is installed on the cm-server to store later data. Then the Java environment is configured, and finally the server requirements of each node are configured.

(3) Cloudera Manager is then installed. First, the relevant software packages are downloaded and decompressed, and the user and initialization data are created; next, the files modified on the cm-server are distributed to each node and a local source is created; finally, the Server and Agent services are started on the cm-server and the Agent service is started on the other nodes. After all services have started, Cloudera Manager can be accessed by logging in through port 7180 of the cm-server.

Corresponding components can be added according to needs, and the roles of the services can be divided in the component services, so that load balancing is realized. The cluster uses components such as HDFS, Yarn, Zookeeper, Hive, and Flume, and the interface of the components is shown in FIG. 12.

The track inspection data service system grants login authority according to departments, posts and roles of users. The setting of the authority is realized according to the identity of the user and the manual verification of a super administrator.

The authentication of the user mainly comprises the steps of searching and comparing information input by the user requesting login with information in a user information data table, and judging that the user login is legal if the user exists and the current account is in an available state. The information recorded in the user information data table is the detailed information of authorized users which can access the track inspection data platform, and is the most important table in user management.

The most important attributes in the user information table are the user name (name), the password (password) and the salt (salt). The password is stored in the user information table after salted encryption, which protects the user and prevents unauthorized or illegal users from logging in to the system and causing system damage or data leakage. Only when the user name and the password are both correct can the user enter the rail inspection data service platform; otherwise the user remains on the login interface. It is emphasized that a user password must contain upper and lower case letters, numbers and special symbols to be valid. After logging in, the user can query the corresponding content in the platform according to the user's authority.
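As a non-limiting illustration of the password rule and the salted storage described above, the following Python sketch checks password composition and derives a salted digest for storage. PBKDF2 with SHA-256 and the iteration count are assumed substitutes chosen for illustration; the source does not state which algorithm the back end uses, and the function names are hypothetical.

```python
import hashlib
import hmac
import os
import string


def is_strong(password):
    """Require lower case, upper case, digit and special symbol."""
    return (any(c.islower() for c in password)
            and any(c.isupper() for c in password)
            and any(c.isdigit() for c in password)
            and any(c in string.punctuation for c in password))


def hash_password(password, salt=None):
    """Return (salt, digest); a fresh random salt is drawn if none is given."""
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, 100_000)
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Recompute the digest with the stored salt and compare safely."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)


salt, stored = hash_password("Abc123!x")
print(is_strong("Abc123!x"), verify_password("Abc123!x", salt, stored))
```

Only the salt and the digest would be kept in the user information table, so the original password cannot be read back from the stored row.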

Further, according to the requirement analysis of the rail inspection data service platform, the platform implements functions such as user management, organization management, role management, menu management, SQL monitoring, interface documents and system logs for rail inspection data management.

In this embodiment, the system has a user management function, an organization management function, a role management function, and a menu management function, and a super administrator can implement the functions of adding, deleting, modifying and checking users on a user management interface. Legal users created by a super administrator can log in the rail inspection data service platform to inquire data. The user management interface is shown in fig. 13.

On the organization management page, an administrator can create each organization according to the actual organizational structure of the company, and different authorities are given through the organizations to the workers of different departments, so that the workers can obtain information within their authority range on the rail inspection data service system. The organization management interface is shown in fig. 14.

The roles in the system can be freely extended as long as the maximum capacity of the database is not reached, the specific service menus and functions of the system can be dynamically assigned to any role, and the roles determine the range and services of the platform that a user can use. The role management interface is shown in fig. 15.

Extended functionality may be added to the platform through menu management, the menu management interface being shown in FIG. 16. An expansion means of system standard is added in menu management, the process of unifying the standard can reduce the occurrence of non-standard definition of system expansion, and the efficiency of subsequent system function development can be improved.

In this embodiment, the system has an SQL monitoring function that can discover abnormal behavior in the system. For example, when queries are slow, the SQL monitoring report can be analyzed to determine which service has a problem, whether a user made an error when querying, or whether a developer wrote an SQL statement without considering indexing. The SQL monitoring interface is shown in fig. 17.

Meanwhile, the monitoring module has the functions of monitoring not only SQL but also Web application, URI monitoring, Session monitoring, Spring monitoring and the like, and detects dangerous behaviors such as unauthorized access, sensitive operation of data files and the like in real time.

It should be noted that any user using any function of the system leaves an operation log, which contains the user's IP, the URL the user accessed, the status code indicating whether the access succeeded, and the time of the access. With this information, the administrator can efficiently understand how each user uses the system. The operation log interface is shown in FIG. 18.

By analyzing the operation log, it can be determined which functions of the system are used with high frequency and which are rarely used. After the analysis, the resources of the platform can be allocated reasonably, with more resources provided to high-frequency services, thereby improving the service efficiency of the server and distributing server resources reasonably.
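As a non-limiting illustration of this analysis, the following Python sketch counts how often each URL appears in the operation log and lists the rarely used ones. The log rows, the field names and the cut-off for "rarely used" are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical operation-log rows; field names are illustrative.
LOG_ROWS = [
    {"ip": "10.0.0.5", "url": "/sys/user", "status": 200, "time": "2022-06-01 09:00:00"},
    {"ip": "10.0.0.6", "url": "/sys/user", "status": 200, "time": "2022-06-01 09:05:00"},
    {"ip": "10.0.0.5", "url": "/sys/user", "status": 403, "time": "2022-06-01 09:07:00"},
    {"ip": "10.0.0.7", "url": "/sys/menu", "status": 200, "time": "2022-06-01 09:10:00"},
    {"ip": "10.0.0.7", "url": "/sys/menu", "status": 200, "time": "2022-06-01 09:12:00"},
    {"ip": "10.0.0.8", "url": "/sys/role", "status": 200, "time": "2022-06-01 09:20:00"},
]


def url_frequency(rows):
    """Count accesses per URL, regardless of status code."""
    return Counter(row["url"] for row in rows)


def rarely_used(frequency, max_hits=1):
    """URLs accessed at most max_hits times, in alphabetical order."""
    return sorted(url for url, hits in frequency.items() if hits <= max_hits)


frequency = url_frequency(LOG_ROWS)
print(frequency.most_common(1), rarely_used(frequency))
```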

Meanwhile, if the system has an accident, such as data loss and other problems, a part of data can be recovered through the operation log, the disaster tolerance of the platform is improved, and the loss in the accident is reduced.

When an error occurs in the system, the back end records the error information in the error log database and displays it on the front-end page. The error logs help an administrator check bugs appearing in the rail inspection data service system, effectively improve the efficiency of finding and handling bugs, and promote the long-term stable operation of the platform. The error log interface is shown in FIG. 19.

In summary, in the track inspection data service system based on Hadoop in the above embodiment of the present invention, the data alignment unit aligns each current track inspection data with the historical track inspection data by using the feature point as a mark, so as to eliminate the mileage error interference for subsequent data analysis; the rail inspection data aligned with the data are split, and distributed processing is performed on the plurality of groups of preprocessed rail inspection data, so that efficient utilization and classification management of the rail inspection data are improved, and users can achieve unified access, efficient and accurate query and statistics.

The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. The rail inspection data service system based on Hadoop is characterized by comprising a data acquisition module and a data processing module:

the data processing module comprises a data alignment unit and a data processing unit, wherein the data alignment unit is used for performing local interpolation and compression on each current rail detection data by using historical rail detection data so as to realize the matching of characteristic points of the historical rail detection data and each current rail detection data and further obtain the rail detection data with aligned data;

the data processing unit is used for performing data preprocessing on the rail detection data with aligned data to obtain multiple groups of preprocessed rail detection data, performing distributed processing on the multiple groups of preprocessed rail detection data to judge whether the rail detection data with aligned data meets preset requirements, and if the rail detection data with aligned data meets the preset requirements, generating a corresponding data report to realize information interaction between a user and each rail detection device.

2. The Hadoop-based track inspection data service system according to claim 1, further comprising a database support module, wherein the database support module is configured to generate a plurality of track inspection ledger databases according to the track inspection data obtained by each track inspection device, and a plurality of rights management databases according to user rights, the plurality of track inspection ledger databases at least include a plane curve database, a vertical curve database, a CP3 database, and a sleeper information database, and the plurality of rights management databases at least include a specification management database.

3. The Hadoop-based track inspection data service system according to claim 1, wherein the data processing module further comprises a file storage unit, and the file storage unit is configured to uniformly distribute files with a size smaller than a preset storage size according to a memory space occupied by the files, so that the merged files meet a preset maximum threshold of data blocks.

4. The Hadoop-based track inspection data service system according to claim 1, wherein the file storage unit is specifically configured to:

if the currently uploaded files exist in the file information mapping table, judging whether the space occupied by all the files in the temporary queue reaches a first threshold value of a preset threshold value space of the temporary queue;

and if the storage size of the file occupying the largest storage space is larger than a second threshold value of the preset threshold value space of the temporary queue, storing the currently uploaded file in a merging queue corresponding to the temporary queue, searching friendly files in other temporary queues, and storing all the searched friendly files in the merging queue corresponding to the temporary queue for file packaging and uploading.

5. The Hadoop-based track inspection data service system according to claim 4, wherein the file storage unit is further configured to:

6. The Hadoop-based track inspection data service system according to claim 1, wherein the data alignment unit is specifically configured to:

7. The Hadoop-based rail inspection data service system according to claim 1, further comprising an analysis module for performing data analysis on the data collected by the data collection module, wherein the data analysis at least comprises contour data analysis, corrugation data analysis, warping analysis and weld flatness analysis.

8. The Hadoop-based track inspection data service system according to claim 1, further comprising a report generation module, wherein the report generation module is configured to convert the data output by each module into a corresponding data report for a user to view.

9. The Hadoop-based track inspection data service system according to claim 1, further comprising a data sharing module, wherein the data sharing module is configured to create data sharing rules and provide unified data distribution export services to the platforms by using the data sharing rules.

10. The Hadoop-based rail inspection data service system according to claim 1, further comprising a data visualization module, wherein the data visualization module comprises a first visualization module and a second visualization module, the first visualization module is used for connecting the sampling data acquired by the data acquisition module at different times so as to show the variation degree of the measurement signal of the data acquisition module with time; the second visualization module is used for carrying out integration processing on the data signals sampled by the data acquisition module so as to display the current signal intensity of the data acquisition module.