CN111930807A

CN111930807A - Rail transit data analysis method, device, equipment and storage medium

Info

Publication number: CN111930807A
Application number: CN202010828219.6A
Authority: CN
Inventors: 郑炜龙; 贾沛; 吴志秋
Original assignee: Guangzhou Xinke Jiadu Technology Co Ltd
Current assignee: Guangzhou Xinke Jiadu Technology Co Ltd
Priority date: 2020-08-17
Filing date: 2020-08-17
Publication date: 2020-11-13
Anticipated expiration: 2040-08-17
Also published as: CN111930807B

Abstract

The embodiments of the application disclose a rail transit data analysis method, apparatus, device and storage medium. In the technical solution provided by the embodiments, data collection is performed on the data sources, metadata, data sets and data standards in rail transit data, and data analysis, statistical analysis, data sharing and data governance are performed based on the data collection result. Rail transit data is thereby collected, analyzed, shared and governed, which breaks down data islands in rail transit, provides a unified data standard for collecting and organizing rail transit data, integrates cross-discipline data for in-depth application, and enables application innovation on fused multi-discipline heterogeneous data.

Description

Rail transit data analysis method, device, equipment and storage medium

Technical Field

The embodiments of the application relate to the field of computer technology, and in particular to a rail transit data analysis method, apparatus, device and storage medium.

Background

With the arrival of the cloud computing and big data era, the volume of rail transit data and information resources nationwide has grown rapidly in recent years. At the same time, the application requirements for all kinds of information are becoming increasingly complex, placing higher demands on data services and application support capabilities.

The foundational application databases that currently support smart management, smart operation and maintenance, smart operations, safety assurance, emergency handling and smart services for urban rail transit are still weak, and the resulting problems are becoming more prominent. The long-term development of urban rail transit has led to siloed (chimney-style) construction of many business systems, and the data island problem is serious.

Disclosure of Invention

The embodiments of the application provide a rail transit data analysis method, apparatus, device and storage medium, so as to solve the data island problem in rail transit data and achieve the fusion of multi-discipline heterogeneous data.

In a first aspect, an embodiment of the present application provides a rail transit data analysis method, including:

performing data collection on rail transit data, wherein the rail transit data comprises a data source, metadata, a data set and a data standard;

performing data analysis and statistical analysis based on the data collection result, wherein the data analysis comprises data flow management, project catalog management and function configuration, and the statistical analysis comprises interactive query, report tools and machine learning;

and performing data sharing and data governance based on the data collection result, wherein the data sharing comprises data sharing management and data sharing charging, and the data governance comprises quality analysis, data lineage analysis and metadata analysis.

Further, performing data collection on the rail transit data includes:

docking the data sources in the rail transit data and configuring and managing the data sources in the form of a resource directory, wherein the types of the data sources comprise one or more of JDBC/ODBC database, FTP, HTTP, Socket and ElasticSearch;

managing, in the form of a resource directory, the data structure metadata of structured and/or parsable data in the rail transit data, and generating metadata for relational databases and for files that need to be parsed, such as CSV files;

managing structured and/or parsable rail transit data in the form of data sets, and constructing a resource catalog and/or a data warehouse;

and converting rail transit data with different semantics into unified standard data according to a set data standard, wherein the set data standard is configured and managed in the form of a resource directory.

Further, performing data analysis based on the data collection result includes:

dragging and combining data processing units for the rail transit data into a directed acyclic graph to form a data flow, and orchestrating, running and monitoring batch processing flows, stream computing flows and hybrid invocation flows through a data flow management function;

accessing a directory management request through a Browser component, verifying it through the Gateway and distributing it to the AppServer, where a data source APP in the AppServer interacts with the metadata database to centralize data sets, metadata and data flows;

accessing a function configuration request through the Browser component, verifying it through the Gateway and distributing it to the PlatformServer, where the PlatformServer interacts with the metadata database to implement function configuration.

Further, performing statistical analysis based on the data collection result includes:

accessing an interactive query request through the Browser component, verifying it through the Gateway, and performing an interactive query on the rail transit data according to the query type;

accessing a report request through the Browser component, verifying it through the Gateway and distributing it to the AppServer, where a report tool APP in the AppServer interacts with the metadata database and executes report tool functions on report configuration parameters and index metadata;

accessing a learning request through the Browser component, verifying it through the Gateway and distributing it to the PlatformServer, where the PlatformServer interacts with the metadata database and executes the machine learning flow.

Further, performing data sharing based on the data collection result includes:

pushing, through a database exchange interface, incremental and/or full data of a data set customized by an exchange strategy to a set data exchange front-end server;

generating, through a file exchange interface, an exchange file from the data set customized by the exchange strategy according to a set format requirement, and pushing the exchange file to the set data exchange front-end server;

generating, through a RestAPI service exchange interface, a RESTful interface service from the data set customized by the exchange strategy according to a standard format requirement, and publishing the RESTful interface service on a set server;

applying for and/or subscribing to published data resources, wherein a sharing request is accessed through the Browser component, verified through the Gateway and distributed to the AppServer, and a data sharing consumption APP component in the AppServer performs the data sharing.

Further, performing data governance based on the data collection result includes:

accessing a quality analysis request through the Browser component, verifying it through the Gateway and distributing it to the AppServer, where a data quality APP in the AppServer interacts with the metadata database to perform quality analysis;

accessing a data lineage analysis request through the Browser component, verifying it through the Gateway and distributing it to the AppServer, where a lineage analysis APP in the AppServer interacts with the metadata database to perform data lineage analysis;

accessing a metadata analysis request through the Browser component, verifying it through the Gateway and distributing it to the AppServer, where a metadata analysis APP in the AppServer interacts with the metadata database to perform metadata analysis.

Further, after the data collection of the rail transit data, the method further includes:

and performing operation and maintenance monitoring and system management based on the data collection result, wherein the operation and maintenance monitoring comprises task monitoring, operation and maintenance management and control and access monitoring, and the system management comprises a unified portal, authority management and log audit.

Further, performing operation and maintenance monitoring based on the data collection result includes:

accessing a task monitoring request through the Browser component, verifying it through the Gateway and distributing it to the AppServer, where a task monitoring APP in the AppServer interacts with the metadata database and the log database to perform task monitoring;

accessing an operation and maintenance control request through the Browser component, verifying it through the Gateway and distributing it to the AppServer, where an operation and maintenance control APP in the AppServer interacts with the metadata database and the log database to perform operation and maintenance control;

accessing an access monitoring request through the Browser component, verifying it through the Gateway and distributing it to the AppServer, where an access monitoring APP in the AppServer interacts with the metadata database and the log database to perform access monitoring.

Further, performing system management based on the data collection result includes:

accessing, through the Browser component, a user login request sent by the front end, verifying and distributing the user login request through the Gateway, storing user information in the metadata database, and returning the response to the front end for display according to the user's authority;

accessing an authority management request through the Browser component, verifying it through the Gateway and distributing it to the metadata database for authority management;

accessing a log audit request through the Browser component, verifying it through the Gateway and distributing it to the AppServer, where a log audit APP in the AppServer interacts with the metadata database and the log database to perform the log audit.

In a second aspect, an embodiment of the present application provides a rail transit data analysis apparatus, including a data collection module, a data analysis module and a data management module, where:

the data collection module is configured to perform data collection on rail transit data, wherein the rail transit data comprises a data source, metadata, a data set and a data standard;

the data analysis module is configured to perform data analysis and statistical analysis based on the data collection result, wherein the data analysis comprises data flow management, project catalog management and function configuration, and the statistical analysis comprises interactive query, report tools and machine learning;

and the data management module is configured to perform data sharing and data governance based on the data collection result, wherein the data sharing comprises data sharing management and data sharing charging, and the data governance comprises quality analysis, data lineage analysis and metadata analysis.

Further, when the data collection module collects the rail transit data, the data collection module specifically includes:

Further, when the data analysis module performs data analysis based on the data collection result, the data analysis module specifically includes:

Further, when the data analysis module performs statistical analysis based on the data collection result, the method specifically includes:

Further, when the data management module performs data sharing based on the data aggregation result, the data management module specifically includes:

Further, when the data management module performs data governance based on the data collection result, the data management module specifically includes:

Furthermore, the device further comprises a system operation and maintenance module which is used for performing operation and maintenance monitoring and system management based on the data collection result, wherein the operation and maintenance monitoring comprises task monitoring, operation and maintenance management and control and access monitoring, and the system management comprises a unified portal, authority management and log audit.

Further, when the system operation and maintenance module performs operation and maintenance monitoring based on the data collection result, the method specifically includes:

Further, when the system operation and maintenance module performs system management based on the data collection result, the method specifically includes:

In a third aspect, an embodiment of the present application provides a rail transit data analysis system, which is built based on a lambda architecture and is used to implement the rail transit data analysis method according to the first aspect.

In a fourth aspect, an embodiment of the present application provides a computer device, including: a memory and one or more processors;

the memory is configured to store one or more programs;

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the rail transit data analysis method of the first aspect.

In a fifth aspect, the present application provides a storage medium containing computer-executable instructions, which when executed by a computer processor, are used for executing the rail transit data analysis method according to the first aspect.

According to the embodiments of the application, data collection is performed on the data sources, metadata, data sets and data standards in rail transit data, and data analysis, statistical analysis, data sharing and data governance are performed based on the data collection result. Rail transit data is thereby collected, analyzed, shared and governed, which breaks down data islands in rail transit, provides a unified data standard for collecting and organizing rail transit data, integrates cross-discipline data for in-depth application, and enables application innovation on fused multi-discipline heterogeneous data.

Drawings

Fig. 1 is a flowchart of a rail transit data analysis method provided in an embodiment of the present application;

Fig. 2 is a flowchart of another rail transit data analysis method provided in an embodiment of the present application;

Fig. 3 is a schematic structural diagram of a rail transit data analysis apparatus according to an embodiment of the present application;

Fig. 4 is a schematic structural diagram of a rail transit data analysis system according to an embodiment of the present application;

Fig. 5 is a schematic structural diagram of a computer device according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, specific embodiments of the present application will be described in detail with reference to the accompanying drawings. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be further noted that, for the convenience of description, only some but not all of the relevant portions of the present application are shown in the drawings. Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.

Fig. 1 is a flowchart of a rail transit data analysis method according to an embodiment of the present disclosure, where the rail transit data analysis method according to the embodiment of the present disclosure may be executed by a rail transit data analysis apparatus, and the rail transit data analysis apparatus may be implemented in a hardware and/or software manner and integrated in a computer device.

The following description takes as an example the rail transit data analysis apparatus performing the rail transit data analysis method. Referring to Fig. 1, the rail transit data analysis method includes:

S101: Performing data collection on rail transit data, wherein the rail transit data comprises a data source, metadata, a data set and a data standard.

The rail transit data in the embodiments of the application may be acquired from the various professional systems and management systems associated with the ISCS (Integrated Supervisory Control System, also called the urban rail transit integrated monitoring system), such as power supervisory control and data acquisition (SCADA), environment monitoring, electromechanical equipment monitoring, automatic fire alarm, public address, closed-circuit television, wireless communication, signalling, access control, automatic fare collection, passenger information, clock and platform screen door systems.

Data collection comprises data source collection, metadata collection, data set collection and data standard collection. Illustratively, rail transit data is accessed through the interfaces of the various systems to build a data warehouse, and data source collection, metadata collection, data set collection and data standard collection are performed on the data sources, metadata, data sets and data standards in the rail transit data, respectively.

Optionally, after the collection of the rail transit data is completed, a data source database, a metadata database, a data set database and a data standard database may be established respectively.

S102: Performing data analysis and statistical analysis based on the data collection result, wherein the data analysis comprises data flow management, project catalog management and function configuration, and the statistical analysis comprises interactive query, report tools and machine learning.

Illustratively, after the rail transit data is collected, data flow management, project catalog management and function configuration are performed based on the data collection result. That is, the data processing units for the rail transit data are composed into data flows, and the various big data batch processing flows, stream computing flows and hybrid invocation flows are orchestrated, run and monitored through the data flow management function.

Furthermore, data sets, metadata and data flows are brought together through project catalog management, which provides a unified management view for the various data projects. A function configuration function for the metadata database is also provided, which interacts with the metadata database.

Further, interactive query, report tools and machine learning are provided based on the data collection result, that is, an interactive query function, a report tool function and a machine learning function for the rail transit data.

S103: Performing data sharing and data governance based on the data collection result, wherein the data sharing comprises data sharing management and data sharing charging, and the data governance comprises quality analysis, data lineage analysis and metadata analysis.

Illustratively, after data collection is performed on the rail transit data, data sharing management and data sharing charging are performed based on the data collection result. For example, a RESTful interface service is generated, according to a standard format requirement, from a data set customized by a preset exchange strategy, and is published on a specified server so that the data can be shared.

Furthermore, a data sharing consumption function is provided: a consumer can apply for and subscribe to published data resources, and the corresponding data is pushed to the consumer according to the consumer's applications and subscriptions.
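The subscription-and-push behaviour described above can be pictured with a minimal in-memory sketch. The class name `SharingHub`, its methods and the resource names are hypothetical and are not taken from the application; a real deployment would push through the exchange interfaces rather than append to a list.

```python
from collections import defaultdict


class SharingHub:
    """Hypothetical stand-in for the data sharing consumption function."""

    def __init__(self):
        self._subscribers = defaultdict(list)   # resource -> consumers
        self.delivered = defaultdict(list)      # consumer -> pushed batches

    def subscribe(self, consumer, resource):
        # A consumer subscribes to a published data resource (idempotent).
        if consumer not in self._subscribers[resource]:
            self._subscribers[resource].append(consumer)

    def publish(self, resource, records):
        # Push the records to every consumer subscribed to the resource
        # and report how many consumers received them.
        consumers = self._subscribers[resource]
        for consumer in consumers:
            self.delivered[consumer].append((resource, records))
        return len(consumers)
```

Consumers that never subscribed to a resource receive nothing, which mirrors pushing "according to the consumer's applications and subscriptions".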

As described above, data collection is performed on the data sources, metadata, data sets and data standards in the rail transit data, and data analysis, statistical analysis, data sharing and data governance are performed based on the data collection result. Rail transit data is thereby collected, analyzed, shared and governed, which breaks down data islands in rail transit, provides a unified data standard for collecting and organizing rail transit data, integrates cross-discipline data for in-depth application, and enables application innovation on fused multi-discipline heterogeneous data.

On the basis of the foregoing embodiment, fig. 2 is a flowchart of another rail transit data analysis method provided in the embodiment of the present application, which is an embodiment of the rail transit data analysis method. Referring to fig. 2, the rail transit data analysis method includes:

S201: Performing data collection on rail transit data, wherein the rail transit data comprises a data source, metadata, a data set and a data standard.

Specifically, for data source collection, data sources in rail transit data are docked, and the data sources are configured and managed in a resource directory mode, wherein the types of the data sources comprise one or more of JDBC/ODBC database, FTP, HTTP, Socket and ElasticSearch.

In one possible embodiment, operations such as adding and deleting data sources and resource directories, driver loading and parameter configuration are also provided. Optionally, the operation request is accessed through the Browser component, verified through the Gateway and distributed to the AppServer, and a data source APP in the AppServer interacts with the metadata database to carry out the operation on the data source and the resource directory.
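As a rough illustration of managing data sources in resource directory form, the sketch below keeps an in-memory directory keyed by directory name. The type list comes from the text; `DataSource`, `ResourceDirectory` and their fields are assumptions made for this example, and persistence in the metadata database is left out.

```python
from dataclasses import dataclass, field

# Data source types named in the text.
SUPPORTED_TYPES = {"JDBC", "ODBC", "FTP", "HTTP", "Socket", "ElasticSearch"}


@dataclass
class DataSource:
    name: str
    source_type: str
    params: dict = field(default_factory=dict)  # connection parameters


class ResourceDirectory:
    """Hypothetical in-memory resource directory: directory -> data sources."""

    def __init__(self):
        self._entries = {}

    def add(self, directory, source):
        if source.source_type not in SUPPORTED_TYPES:
            raise ValueError(f"unsupported data source type: {source.source_type}")
        self._entries.setdefault(directory, {})[source.name] = source

    def remove(self, directory, name):
        # Removing a missing entry is treated as a no-op.
        self._entries.get(directory, {}).pop(name, None)

    def list(self, directory):
        return sorted(self._entries.get(directory, {}))
```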

The Gateway provided by this embodiment is a data back-end service gateway based on Spring Cloud Gateway, with built-in access control based on Auth and JWT tokens. The AppServer is a component based on Spring Boot that provides the RESTful API implementation and provides access to and control of the metadata related to data access, data governance and data opening.
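The verify-then-distribute role of the Gateway can be sketched without Spring. The example below hand-rolls HS256 JWT signing and verification using only the Python standard library and then routes by path prefix. It is an illustration under assumptions, not the Spring Cloud Gateway configuration of the application: the route prefixes, the shared secret handling and the function names are all hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(claims: dict, secret: bytes) -> str:
    """Build a compact HS256 JWT (header.payload.signature)."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(claims).encode())
    signature = hmac.new(secret, f"{header}.{payload}".encode("ascii"), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url(signature)}"


def verify_token(token: str, secret: bytes, now=None):
    """Return the claims if the signature and expiry check out, else None."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
        header = json.loads(_b64url_decode(header_b64))
        given = _b64url_decode(sig_b64)
    except ValueError:
        return None
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        return None
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, given):
        return None
    claims = json.loads(_b64url_decode(payload_b64))
    exp = claims.get("exp")
    if exp is not None and exp < (time.time() if now is None else now):
        return None
    return claims


# Hypothetical route table: path prefix -> back-end component.
ROUTES = {"/catalog": "AppServer", "/flow": "PlatformServer"}


def dispatch(path: str, token: str, secret: bytes):
    """Verify first, then distribute the request to a back end."""
    if verify_token(token, secret) is None:
        return (401, None)
    for prefix, backend in ROUTES.items():
        if path.startswith(prefix):
            return (200, backend)
    return (404, None)
```

Verification happens before any routing decision, so an unauthenticated request never learns which paths exist.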

Furthermore, for metadata collection, the data structure metadata of structured and/or parsable data in the rail transit data is managed in the form of a resource directory, and metadata is generated for relational databases and for files that need to be parsed, such as CSV files.
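One way to picture metadata generation for a CSV file is to read the header row and infer a coarse type per column from a sample of rows. The sketch below does this with the standard `csv` module; the three type labels and the sampling limit are assumptions for illustration and are not specified in the application.

```python
import csv
import io


def _parses(values, cast):
    # True when every value can be converted by the given constructor.
    try:
        for value in values:
            cast(value)
    except ValueError:
        return False
    return True


def infer_type(values):
    present = [v for v in values if v != ""]
    if not present:
        return "string"
    if _parses(present, int):
        return "integer"
    if _parses(present, float):
        return "float"
    return "string"


def csv_metadata(text, sample_rows=100):
    """Return [{'name': ..., 'type': ...}] for the columns of CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    rows = [row for _, row in zip(range(sample_rows), reader)]
    return [
        {"name": name, "type": infer_type([(row[name] or "") for row in rows])}
        for name in (reader.fieldnames or [])
    ]
```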

In one possible embodiment, operations such as adding to, deleting from and modifying the metadata resource directory are also provided. Optionally, the operation request is accessed through the Browser component, verified through the Gateway and distributed to the AppServer, and the metadata APP in the AppServer interacts with the metadata database to carry out the operation on the metadata resource directory.

Further, for data set collection, structured and/or parsable rail transit data is managed in the form of data sets, and resource catalogs and/or data warehouses are built. The data warehouse mainly comprises a temporary buffer layer, a basic data resource layer, a data precipitation library, a comprehensive association library, a time series library, an unstructured library, a theme data warehouse and the like.

A data set is a high-level logical abstraction of physical data. A data set can uniformly describe various data sources, or data stored in a JDBC/ODBC database, HDFS, Hive, Kafka, FTP, HBase, ElasticSearch and the like, and a data asset directory or a data warehouse can be flexibly constructed through the data set module.

In one possible embodiment, operations such as adding to, deleting from and modifying the data set resource directory are also provided. Optionally, the operation request is accessed through the Browser component, verified through the Gateway and distributed to the AppServer, and a data set APP in the AppServer interacts with the metadata database to carry out the operation on the data set resource directory.

Further, for data standard collection, in order to unify various data standards, rail transit data with different semantics is converted into unified standard data according to a set standard, wherein the set data standard is configured and managed in a resource directory mode.
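A small sketch of what converting records with different semantics into unified standard data might look like: each source system gets a mapping from its own field names to a standard field name and type. The system keys (`afc`, `pis`), field names and target schema are invented for the example; the application does not define a concrete standard.

```python
# Hypothetical data standard: per source system, map each source field to
# (standard field name, type constructor).
DATA_STANDARD = {
    "afc": {"stn_id": ("station_id", str), "cnt": ("passenger_count", int)},
    "pis": {"StationCode": ("station_id", str), "Passengers": ("passenger_count", int)},
}


def to_standard(system, record):
    """Rename and cast the fields of one record according to the standard."""
    rules = DATA_STANDARD[system]
    unified = {}
    for source_field, (standard_field, cast) in rules.items():
        if source_field in record:
            unified[standard_field] = cast(record[source_field])
    return unified
```

Records from both systems end up with the same keys and types, so downstream data sets can treat them uniformly.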

In one possible embodiment, operations such as adding to, deleting from and modifying the data standard resource directory are also provided. Optionally, the operation request is accessed through the Browser component, verified through the Gateway and distributed to the AppServer, and the data standard APP in the AppServer interacts with the metadata database to carry out the operation on the data standard resource directory.

In a possible embodiment, after the data collection is performed on the rail transit data, the rail transit data analysis method provided in this embodiment further includes: and performing advanced resource directory search, file import and file management operation based on the data collection result.

Specifically, the advanced search operation provides advanced queries over the resource directories: the data source, data set, metadata and data standard resource directories are queried using conditions built from filtering, relational operations, time queries and/or logical operations. The query request is accessed through the Browser component, verified through the Gateway and distributed to the metadata database, where the advanced query is executed on the resource directory.
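The combination of relational, time and logical conditions can be sketched as a small recursive condition evaluator over catalog entries. The condition format (`field`/`op`/`value` leaves combined with `and`/`or`/`not`) and the entry fields are assumptions for this example; in the described system the query would be executed against the metadata database.

```python
import operator

# Relational operators supported by this sketch (names are illustrative).
OPS = {
    "eq": operator.eq, "ne": operator.ne,
    "lt": operator.lt, "le": operator.le,
    "gt": operator.gt, "ge": operator.ge,
    "contains": lambda text, part: part in text,
}


def matches(entry, cond):
    """Evaluate a nested condition against one catalog entry."""
    if "and" in cond:
        return all(matches(entry, c) for c in cond["and"])
    if "or" in cond:
        return any(matches(entry, c) for c in cond["or"])
    if "not" in cond:
        return not matches(entry, cond["not"])
    name = cond["field"]
    return name in entry and OPS[cond["op"]](entry[name], cond["value"])


def search(catalog, cond):
    """Return the names of the entries that satisfy the condition."""
    return [entry["name"] for entry in catalog if matches(entry, cond)]
```

Time queries fall out of the relational operators when the field holds `datetime.date` values, since those compare directly.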

Furthermore, for the file import operation, FTP data source files are imported directly into HDFS. This is mainly used for importing unstructured data; the import task only tracks the number of files and does not parse file contents. For example, an import request is accessed through the Browser component to configure an import task, verified through the Gateway and distributed to the AppServer, the AppServer interacts with the metadata database, and the AppServer-Collector component in the AppServer carries out the import task, importing the FTP data source files into HDFS.

Further, for the file management operation, a management request is accessed through the Browser component, verified through the Gateway and distributed to the AppServer, and the AppServer-Collector component in the AppServer carries out the file management, that is, querying, downloading and deleting HDFS files and uploading local files.

S202: Performing data analysis based on the data collection result, wherein the data analysis comprises data flow management, project catalog management and function configuration.

Specifically, in this embodiment, performing data analysis based on the data collection result includes steps S2021 to S2023:

S2021: Dragging and combining the data processing units for the rail transit data into a directed acyclic graph to form a data flow, and orchestrating, running and monitoring batch processing flows, stream computing flows and hybrid invocation flows through a data flow management function.

For orchestration and configuration of a data flow, the orchestration request is accessed through the Browser component, verified through the Gateway and distributed to the PlatformServer, and the PlatformServer interacts with the metadata database.
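A data flow built as a directed acyclic graph implies that each processing unit runs only after its upstream units. The sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to derive such an order and run the units in process. The unit names and the `run_flow` helper are hypothetical, and a real flow would be submitted to a cluster rather than called as local functions.

```python
from graphlib import CycleError, TopologicalSorter


def run_flow(units, upstream):
    """Run processing units in dependency order.

    units:    name -> callable taking the outputs of its upstream units
    upstream: name -> set of upstream unit names (the DAG edges)
    Raises graphlib.CycleError if the graph is not acyclic.
    """
    results = {}
    for name in TopologicalSorter(upstream).static_order():
        # Upstream outputs are passed in sorted-name order for determinism.
        inputs = [results[dep] for dep in sorted(upstream.get(name, ()))]
        results[name] = units[name](*inputs)
    return results
```

A graph with a cycle is rejected before any unit runs, which is the property that makes the "acyclic" requirement enforceable at orchestration time.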

Further, a Pipeline task can be generated by the PlatformServer component and sent to the PipelineServer component. The PipelineServer component identifies the specific type of the data flow: batch computing and stream computing flows are submitted to the cluster for execution through the FlowExecutor component, while Shell, Java and Python script flows are submitted for execution through the MoreeExecutor.
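The type-based dispatch performed by the PipelineServer can be summarized as a lookup from flow type to executor. In the sketch below the executor names are the ones given in the text, while the `PipelineTask` class, the lowercase type labels and the function name are assumptions made for illustration.

```python
from dataclasses import dataclass

CLUSTER_TYPES = {"batch", "stream"}          # submitted to the cluster
SCRIPT_TYPES = {"shell", "java", "python"}   # script-style flows


@dataclass(frozen=True)
class PipelineTask:
    task_id: str
    flow_type: str


def select_executor(task):
    """Return the name of the executor that should receive the task."""
    kind = task.flow_type.lower()
    if kind in CLUSTER_TYPES:
        return "FlowExecutor"
    if kind in SCRIPT_TYPES:
        return "MoreeExecutor"
    raise ValueError(f"unknown flow type: {task.flow_type}")
```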

The data flow can be flexibly used in various scenes needing to call cluster storage and calculation power, such as data import and export, conversion processing, analysis calculation and the like, and can meet the functional requirements of data exchange collection, data storage calculation and data statistical analysis.

S2022: Accessing a directory management request through the Browser component, verifying it through the Gateway and distributing it to the AppServer, where a data source APP in the AppServer interacts with the metadata database to centralize data sets, metadata and data flows.

Catalog management brings data sets, metadata and data flows together from a project management perspective, providing a unified management view for the various data projects in the rail transit data.

S2023: Accessing a function configuration request through the Browser component, verifying it through the Gateway and distributing it to the PlatformServer, where the PlatformServer interacts with the metadata database to implement function configuration.

S203: Performing statistical analysis based on the data collection result, wherein the statistical analysis comprises interactive query, report tools and machine learning.

Specifically, in this embodiment, performing statistical analysis based on the data collection result includes steps S2031 to S2033:

S2031: Accessing an interactive query request through the Browser component, verifying it through the Gateway, and performing an interactive query on the rail transit data according to the query type.

The interactive query request is accessed through the Browser component to provide an interactive query function for the rail transit data. Interactive queries can be run directly in SQL against any data set except Kafka data sets, query results can be exported to a file in CSV or JSON format, query execution can be monitored, and a query can be terminated at any time. For a configuration-type metadata query, the interactive query request is distributed to the AppServer, which interacts with the metadata database through a JPA/Hibernate tool to perform the query; for a physical storage data query, the request is distributed to the AppServer-Collector component, which queries the cluster's physical storage.
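The query-then-export step can be illustrated with an in-memory SQLite database standing in for a data set. SQLite, the table name and the `run_query` helper are assumptions chosen only so the example is self-contained; the application itself queries cluster storage through the AppServer-Collector.

```python
import csv
import io
import json
import sqlite3


def run_query(conn, sql, fmt="csv"):
    """Run a SQL query and return the result serialized as CSV or JSON text."""
    cursor = conn.execute(sql)
    columns = [col[0] for col in cursor.description]
    rows = cursor.fetchall()
    if fmt == "json":
        return json.dumps([dict(zip(columns, row)) for row in rows])
    if fmt == "csv":
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(columns)
        writer.writerows(rows)
        return buffer.getvalue()
    raise ValueError(f"unsupported export format: {fmt}")
```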

S2032: the report request is accessed through the Browser component, the report request is verified through the Gateway and is distributed to the AppServer, and a report tool APP in the AppServer interacts with the metadata base to execute the functions of the report tool on the report configuration parameters and the index metadata.

Specifically, the report tool function for report configuration parameters and index metadata is provided, and various types of data report development, associated drilling-down and theme construction can be performed. Report requests for report configuration parameters and index metadata can be accessed through the Browser component, verification is carried out through the Gateway, the report requests are distributed to the AppServer, and a report tool APP in the AppServer interacts with the metadata base. And for report data query preview, performing cluster physical storage query through an AppServer-Collector component.

S2033: Access the learning request through the Browser component, verify it through the Gateway and distribute it to the PlatformServer, which interacts with the metadata base and executes the machine learning process.

Specifically, a machine learning function for rail transit data is provided. By creating various machine learning projects, it covers the full flow of data preparation, feature engineering, training and evaluation, and prediction deployment. To configure a machine learning process, the learning request is accessed through the Browser component, verified through the Gateway and distributed to the PlatformServer, which interacts with the metadata base to execute the machine learning process. To execute a process task, a Pipeline task is generated by the PlatformServer component and sent to the Pipeline server, and the flow manager component submits it to the cluster for execution.

S204: Perform data sharing based on the data collection result, wherein the data sharing comprises data sharing management and data sharing charging.

The data sharing management provides interface services that are standardized, universal, uniform and flexible to operate. An interface service is configured by a data administrator and can be activated and called only after an exchange user and an exchange strategy have been bound to it through the exchange object management function. The interface service takes three forms: database exchange interface service, file exchange interface service and RestAPI service exchange interface service. The data sharing charging provides a data sharing consumption function, through which a consumer can apply for and/or subscribe to the published data resources.
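The rule that an interface service can only be activated after an exchange user and an exchange strategy are bound can be sketched as a small state check. The class and field names below are hypothetical; only the three interface forms and the binding requirement come from the description.

```python
from dataclasses import dataclass
from typing import Optional

# The three interface service forms named in the description.
INTERFACE_KINDS = {"database", "file", "rest_api"}


@dataclass
class ExchangeObject:
    """Hypothetical record tying an interface service to a user and a strategy."""
    kind: str
    user: Optional[str] = None
    strategy: Optional[str] = None
    active: bool = False

    def __post_init__(self):
        if self.kind not in INTERFACE_KINDS:
            raise ValueError(f"unknown interface kind: {self.kind}")

    def bind(self, user: str, strategy: str) -> None:
        """Bind the exchange user and the exchange strategy."""
        self.user = user
        self.strategy = strategy

    def activate(self) -> None:
        """Activate the service; refuse if the binding is incomplete."""
        if self.user is None or self.strategy is None:
            raise RuntimeError("bind an exchange user and an exchange strategy first")
        self.active = True
```

For example, calling `activate()` on a freshly created `ExchangeObject("file")` raises `RuntimeError`, while calling it after `bind(...)` sets `active` to `True`.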

Specifically, performing data sharing based on the data collection result in this embodiment includes steps S2041 to S2044:

S2041: The database exchange interface pushes the data set customized by the exchange strategy, incrementally and/or in full, to the set data exchange front-end server.

The database exchange interface provided by this embodiment supports various mainstream database management systems (e.g., Oracle, MySQL, SQL Server) and can push the data set customized by the exchange strategy, incrementally and/or in full, to the specified data exchange front-end server.

S2042: The file exchange interface generates an exchange file from the data set customized by the exchange strategy according to the set format requirement, and pushes the exchange file to the set data exchange front-end server.

The file exchange interface supports exchange formats such as delimiter-separated plain text files. It generates exchange files from the data set customized by the exchange strategy according to the format requirement and pushes them to the specified data exchange front-end server.
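The incremental/full selection and the delimiter-separated exchange file can be illustrated with a short sketch. The watermark column `updated_at`, the `|` delimiter and both function names are assumptions made for the example; the description does not say how increments are detected or which separator is used.

```python
import csv
import io


def select_rows(rows, mode, last_watermark=None, key="updated_at"):
    """Pick the rows to push: all rows for 'full', rows newer than the watermark for 'incremental'."""
    if mode == "full":
        return list(rows)
    if mode == "incremental":
        return [r for r in rows if last_watermark is None or r[key] > last_watermark]
    raise ValueError(f"unknown push mode: {mode}")


def build_exchange_file(rows, columns, delimiter="|"):
    """Render rows as delimiter-separated plain text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        writer.writerow([row[c] for c in columns])
    return buf.getvalue()
```

Using the `csv` module rather than manual string joins means values that contain the delimiter are quoted automatically.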

S2043: The RestAPI service exchange interface generates a RESTful interface service from the data set customized by the exchange strategy according to the standard format requirement, and publishes the RESTful interface service on a set server.

The RestAPI service exchange interface can generate a RESTful interface service from the data set customized by the exchange strategy according to the standard format requirement, and publish the service on a specified server. The administrator can publish and approve data sharing services. Shareable data includes DB, HDFS, Kafka, ES, HTTP and S/FTP data, and the data sharing service modes provided by this embodiment include Pull and Push. For example, the Browser component accesses a sharing service request, the Gateway component verifies the request and distributes it to the AppServer, and the data sharing management APP component in the AppServer performs data sharing and service management.

S2044: Perform data sharing in response to data applications and/or data subscriptions for the published data resources.

Specifically, a sharing request for data application and/or data subscription to the published data resource is received, the sharing request is accessed through a Browser component, verification is performed through Gateway, the sharing request is distributed to an AppServer, and data sharing is performed by a data sharing consumption APP component in the AppServer.

S205: Perform data governance based on the data collection result, wherein the data governance comprises quality analysis, blood relationship analysis and metadata analysis.

Specifically, performing data governance based on the data collection result in this embodiment includes steps S2051 to S2053:

S2051: Access the quality analysis request through the Browser component, verify it through the Gateway and distribute it to the AppServer, where the data quality APP interacts with the metadata base to perform quality analysis.

Specifically, for quality analysis, a data quality analysis template can be configured. The template has a number of built-in analysis rules, and the rules can be customized. A data quality analysis task is generated based on the template, with functions such as periodic scheduling, result evaluation statistics and task monitoring.

The configuration of the data quality task can access the quality analysis request through the Browser component, the verification is carried out through the Gateway, the quality analysis request is distributed to the AppServer, and the data quality APP in the AppServer interacts with the metadata base to execute the quality analysis task.

Further, to execute the quality analysis task, a Pipeline task is generated by the PlatformServer component and sent to the Pipeline server component, and the FlowExecutor component submits it to the cluster for execution as a Spark task.

Data quality analysis is an important step of data governance; verifying data quality helps ensure that the data is compliant and accurate.
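A data quality template with built-in and custom rules can be pictured as a named set of row predicates evaluated over a data set. The sketch below is an assumption about how such a template might look; the rule names, the pass-rate metric and the helper functions are hypothetical and are not taken from the description.

```python
from typing import Callable, Dict, List

Rule = Callable[[dict], bool]


def not_null(field: str) -> Rule:
    """Built-in style rule: the field must be present and not None."""
    return lambda row: row.get(field) is not None


def in_range(field: str, low: float, high: float) -> Rule:
    """Built-in style rule: the field must be a value within [low, high]."""
    return lambda row: row.get(field) is not None and low <= row[field] <= high


def run_template(rules: Dict[str, Rule], rows: List[dict]) -> Dict[str, float]:
    """Evaluate every rule on every row and return the pass rate per rule."""
    total = len(rows)
    return {
        name: (sum(1 for row in rows if rule(row)) / total if total else 1.0)
        for name, rule in rules.items()
    }
```

A custom rule is simply another callable added to the `rules` dictionary, which is one way to read "the rules can be customized".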

S2052: Access the blood relationship analysis request through the Browser component, verify it through the Gateway and distribute it to the AppServer, where the blood relationship analysis APP interacts with the metadata base to perform blood relationship analysis.

Specifically, for blood relationship analysis, a blood relationship analysis template can be configured. The template has a number of built-in analysis rules, the rules can be customized, and a blood relationship analysis task is generated based on the template. To configure the task, the blood relationship analysis request is accessed through the Browser component, verified through the Gateway and distributed to the AppServer, where the blood relationship analysis APP interacts with the metadata base to execute the task. The blood relationship analysis data is generated by the PlatformServer and stored in the metadata base.

Blood relationship analysis (data lineage analysis) makes the multistage fusion processing of data traceable and is an important function of data governance. A user can view the blood relationship analysis diagram of a specified data set and use it to trace upstream and downstream associations at different levels, such as data set, field and flow, and thereby clarify where data flows and what it affects.
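Tracing upstream and downstream associations amounts to walking a directed graph of data sets. The following sketch uses a breadth-first walk; the `LineageGraph` class and the data set names in the usage note are hypothetical, and the description does not state how the platform stores or traverses lineage.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Directed graph in which an edge source -> target means target is derived from source."""

    def __init__(self):
        self._down = defaultdict(set)  # node -> nodes derived from it
        self._up = defaultdict(set)    # node -> nodes it is derived from

    def add_edge(self, source, target):
        self._down[source].add(target)
        self._up[target].add(source)

    @staticmethod
    def _walk(start, adjacency):
        """Breadth-first walk that returns every node reachable from start."""
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adjacency.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def upstream(self, node):
        """All data sets that node depends on, directly or indirectly."""
        return self._walk(node, self._up)

    def downstream(self, node):
        """All data sets affected by a change to node."""
        return self._walk(node, self._down)
```

With edges `afc_raw -> afc_clean -> passenger_flow_daily`, `downstream("afc_raw")` returns both derived data sets, which is the "influence" view; `upstream(...)` gives the traceability view.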

S2053: Access the metadata analysis request through the Browser component, verify it through the Gateway and distribute it to the AppServer, where the metadata analysis APP interacts with the metadata base to perform metadata analysis.

Specifically, for metadata analysis, a metadata analysis template can be configured, a plurality of analysis rules are built in the metadata analysis template, the rules can be customized, and a metadata analysis task is generated based on the metadata analysis template. The configuration of the metadata analysis task can access a metadata analysis request through a Browser component, the metadata analysis request is verified through Gateway and is distributed to an AppServer, a metadata analysis APP in the AppServer interacts with a metadata database to execute the metadata analysis task, and metadata analysis data is generated by an AppServer-Collector and is stored in the metadata database.

Metadata analysis is mainly aimed at relational databases. After relational data is migrated to a big data environment, its original association relationships may be lost; the metadata analysis function can reproduce or reconstruct the primary key and foreign key relationships and the correspondences among the metadata.
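One common way to recover lost key relationships is to look for unique columns (primary key candidates) and for columns whose values are all contained in another table's key (foreign key candidates). The sketch below uses that value-inclusion heuristic as an assumption; the description does not say which technique the metadata analysis function applies, and the function names are hypothetical.

```python
def candidate_primary_keys(table):
    """Return the columns whose values are all non-null and unique."""
    if not table:
        return []
    keys = []
    for col in table[0]:
        values = [row[col] for row in table]
        if None not in values and len(set(values)) == len(values):
            keys.append(col)
    return keys


def candidate_foreign_keys(child, parent, parent_key):
    """Return child columns whose non-null values all occur in parent[parent_key]."""
    parent_values = {row[parent_key] for row in parent}
    result = []
    for col in (child[0] if child else {}):
        values = {row[col] for row in child if row[col] is not None}
        if values and values <= parent_values:
            result.append(col)
    return result
```

The result is a list of candidates only: a small sample can make unrelated columns look like keys, so a real tool would combine this with naming conventions or manual confirmation.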

S206: Perform operation and maintenance monitoring based on the data collection result, wherein the operation and maintenance monitoring comprises task monitoring, operation and maintenance management and control, and access monitoring.

Specifically, performing operation and maintenance monitoring based on the data collection result in this embodiment includes steps S2061 to S2063:

S2061: Access the task monitoring request through the Browser component, verify it through the Gateway and distribute it to the AppServer, where the task monitoring APP interacts with the metadata base and the log base to perform task monitoring.

Specifically, for task monitoring, the task monitoring request is accessed through the Browser component, verified through the Gateway and distributed to the AppServer, where the task monitoring APP interacts with the metadata base and the log base to perform task monitoring. The task state information used in task monitoring is provided by the PlatformServer.

Through the task monitoring function for rail transit data, the execution status of various tasks, such as running, waiting, succeeded, failed and warning, can be monitored and managed.

S2062: Access the operation and maintenance management and control request through the Browser component, verify it through the Gateway and distribute it to the AppServer, where the operation and maintenance management and control APP interacts with the metadata base and the log base to perform operation and maintenance management and control.
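The task states listed above lend themselves to a simple per-state summary. In this sketch the `TaskState` enum mirrors the states named in the text, while the task names and the two helper functions are hypothetical.

```python
from collections import Counter
from enum import Enum


class TaskState(Enum):
    RUNNING = "running"
    WAITING = "waiting"
    SUCCESS = "success"
    FAILED = "failed"
    WARNING = "warning"


def summarize(tasks):
    """Count tasks per state; tasks maps a task name to its TaskState."""
    return Counter(tasks.values())


def needs_attention(tasks):
    """Return the names of failed or warning tasks, sorted for stable display."""
    flagged = {TaskState.FAILED, TaskState.WARNING}
    return sorted(name for name, state in tasks.items() if state in flagged)
```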

Specifically, for operation and maintenance control, an operation and maintenance control request is accessed through a Browser component, verification is performed through Gateway, the operation and maintenance control request is distributed to an AppServer, and an operation and maintenance control APP in the AppServer interacts with a metadata base and a log base to perform operation and maintenance control. The state information of each data source and node in the operation and maintenance control is provided by an AppServer-Collector.

Through the operation and maintenance control function of the rail transit data, cluster nodes and data sources can be monitored and managed, such as node offline or online states, data source connection success or failure states and the like.

S2063: Access the access monitoring request through the Browser component, verify it through the Gateway and distribute it to the AppServer, where the access monitoring APP interacts with the metadata base and the log base to perform access monitoring.

Specifically, for access monitoring, an access monitoring request is accessed through a Browser component, verification is performed through Gateway, the access monitoring request is distributed to an AppServer, an access monitoring APP in the AppServer interacts with a metadata base and a log base to perform access monitoring, and interface access information of the access monitoring is provided by the Gateway.

Through the access monitoring function for rail transit data, access requests can be monitored and managed, for example the total number of system requests, the number of successful requests, the access counts of the top 100 interfaces by request volume, and the time consumed by the top 100 interfaces by request volume.

S207: Perform system management based on the data collection result, wherein the system management comprises a unified portal, authority management and log audit.

Specifically, performing system management based on the data collection result in this embodiment includes steps S2071 to S2073:

S2071: Access the user login request sent by the front end through the Browser component, verify and distribute it through the Gateway, store the user information in the metadata base, and return the result to the front end for display according to the user's authority.
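The access statistics can be sketched as a single pass over Gateway access records. The record layout `(interface, succeeded, elapsed_ms)` and the function name are hypothetical, and the ranking assumes that the statistic refers to the top interfaces by request volume.

```python
from collections import defaultdict


def access_summary(records, top_n=100):
    """Aggregate access records given as (interface, succeeded, elapsed_ms) tuples."""
    total = 0
    success = 0
    counts = defaultdict(int)
    elapsed = defaultdict(float)
    for interface, ok, ms in records:
        total += 1
        success += 1 if ok else 0
        counts[interface] += 1
        elapsed[interface] += ms
    # Rank by request count (descending); break ties by interface name.
    top = sorted(counts, key=lambda name: (-counts[name], name))[:top_n]
    return {
        "total_requests": total,
        "successful_requests": success,
        # (interface, request count, mean elapsed milliseconds)
        "top_interfaces": [(name, counts[name], elapsed[name] / counts[name]) for name in top],
    }
```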

Specifically, the user login request is accessed through the Browser component, verified and distributed through the Gateway, the user information is stored in the metadata base, and the result is returned to the front end and displayed according to the user's authority. After logging in, a user can see the functional modules permitted by the user's role, and functional modules for which the user has no authority are hidden. The administrator authority covers data integration, data management, data analysis, data mining and data monitoring, and the data analyst authority covers data analysis and data mining.
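The role-to-module mapping given above can be written down directly. The module and role names follow the text; the dictionary layout and the `visible_modules` helper are hypothetical.

```python
# Functional modules in display order, as listed for the administrator role.
ALL_MODULES = [
    "data integration",
    "data management",
    "data analysis",
    "data mining",
    "data monitoring",
]

ROLE_MODULES = {
    "administrator": set(ALL_MODULES),
    "data analyst": {"data analysis", "data mining"},
}


def visible_modules(role):
    """Return the modules shown to a role; modules without authority are hidden."""
    allowed = ROLE_MODULES.get(role, set())
    return [module for module in ALL_MODULES if module in allowed]
```

An unknown role maps to an empty set, so it sees no modules rather than raising an error.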

For the unified portal, a big data base component and a platform application component are provided. The big data base component is transparent to users, while the platform application component provides human-machine interaction and interface access to external users and systems. The functions of the platform application component cover the whole process of data acquisition, processing, storage, calculation, analysis and management and span a number of internal functional components and modules, so a unified portal is provided to let users log in and use the subsequent functions conveniently.

S2072: Access the authority management request through the Browser component, verify it through the Gateway and distribute it to the metadata base for authority management.

In the embodiment, a user role authority management mode based on multiple tenants is adopted, cluster resources among the tenants are isolated and do not interfere with each other, and users, corresponding roles and authorities can be created under the tenants.

Specifically, the authority management request is accessed through the Browser component, verified through the Gateway and distributed to the metadata base for authority management, which provides create, delete, update and query operations on user, role and authority information.

S2073: Access the log audit request through the Browser component, verify it through the Gateway and distribute it to the AppServer, where the log audit APP interacts with the metadata base and the log base to perform log audit.
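Multi-tenant user and role management with create, delete, update and query operations can be sketched with an in-memory store keyed by tenant. The `TenantDirectory` class is a hypothetical stand-in for the records kept in the metadata base; it only illustrates that identical user names in different tenants do not interfere with each other.

```python
class TenantDirectory:
    """In-memory stand-in for the user and role records kept in the metadata base."""

    def __init__(self):
        self._users = {}  # (tenant, user) -> set of role names

    def create(self, tenant, user, roles=()):
        key = (tenant, user)
        if key in self._users:
            raise KeyError(f"user already exists: {key}")
        self._users[key] = set(roles)

    def query(self, tenant, user):
        """Return a copy of the user's roles within the given tenant."""
        return set(self._users[(tenant, user)])

    def update(self, tenant, user, roles):
        key = (tenant, user)
        if key not in self._users:
            raise KeyError(f"no such user: {key}")
        self._users[key] = set(roles)

    def delete(self, tenant, user):
        del self._users[(tenant, user)]

    def users_of(self, tenant):
        """List the users of one tenant only."""
        return sorted(user for t, user in self._users if t == tenant)
```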

The embodiment records the log of the operation of the user on the system, and provides the inquiry page to facilitate the administrator to audit the operation log. Specifically, a log audit request is accessed through a Browser component, verification is carried out through Gateway, the log audit request is distributed to an AppServer, log audit is carried out through interaction of a log audit APP in the AppServer, a metadata base and a log base, and user operation information is provided by the Gateway.

As described above, data analysis, statistical analysis, data sharing and data governance are performed according to the data collection result, so that rail transit data is collected, analyzed, shared and governed. This effectively breaks down data islands in rail transit and provides a unified data standard for collecting and organizing rail transit data. Functions such as task monitoring, operation and maintenance management and control, access monitoring, unified portal, authority management and log audit are also provided based on the data collection result, so that data services such as data acquisition, management and multidimensional analysis can be offered to the professional systems and management systems related to the urban rail transit integrated monitoring system, such as communication, signaling and power supply. Meanwhile, based on passenger information acquisition and analysis, the system can provide passengers with an accurate and convenient face-recognition gate passage service for the urban rail transit automatic fare collection system, and can provide decision support for passenger control and scheduling based on the fused processing of ticketing and passenger flow data. The system handles offline storage and batch calculation of subway historical data and state time-series data as well as part of the real-time calculation workload, and thus meets the batch and streaming data processing, calculation and storage needs of both offline and real-time applications, with high fault tolerance, low latency and scalability. It achieves effective integration and in-depth application of cross-professional data and application innovation after fusion of multi-professional heterogeneous data, fundamentally improves the supporting capability of the basic database, improves data processing efficiency and the responsiveness of complex applications, and improves the technical means of data integration and management. It thereby provides richer and fresher basic data for in-depth applications and better serves intelligent subway construction.

Fig. 3 is a schematic structural diagram of a rail transit data analysis device according to an embodiment of the present application. Referring to fig. 3, the rail transit data analysis apparatus includes a data collection module 31, a data analysis module 32, and a data management module 33.

The data collection module 31 is configured to perform data collection on rail transit data, where the rail transit data includes a data source, metadata, a data set, and a data standard; the data analysis module 32 is configured to perform data analysis and statistical analysis based on the data collection result, where the data analysis includes data flow management, item catalog management, and function configuration, and the statistical analysis includes interactive query, report tool, and machine learning; and the data management module 33 is configured to perform data sharing and data governance based on the data collection result, where the data sharing includes data sharing management and data sharing charging, and the data governance includes quality analysis, blood relationship analysis, and metadata analysis.

In a possible embodiment, when the data aggregation module 31 performs data aggregation on the rail transit data, the data aggregation module specifically includes:

In a possible embodiment, when the data analysis module 32 performs data analysis based on the data collection result, the data analysis module specifically includes:

In a possible embodiment, when performing the statistical analysis based on the data collection result, the data analysis module 32 specifically includes:

In a possible embodiment, when the data management module 33 performs data sharing based on the data aggregation result, the data sharing method specifically includes:

In a possible embodiment, when performing data governance based on the data aggregation result, the data management module 33 specifically includes:

In a possible embodiment, the device further includes a system operation and maintenance module, configured to perform operation and maintenance monitoring and system management based on the data collection result, where the operation and maintenance monitoring includes task monitoring, operation and maintenance management and access monitoring, and the system management includes unified portal, authority management, and log audit.

In a possible embodiment, when the operation and maintenance module performs operation and maintenance monitoring based on the data collection result, the operation and maintenance monitoring method specifically includes:

In a possible embodiment, when the system operation and maintenance module performs system management based on the data collection result, the method specifically includes:

Fig. 4 is a schematic structural diagram of a rail transit data analysis system according to an embodiment of the present application. The rail transit data analysis system provided by the embodiment is built based on a lambda architecture and is used for realizing the rail transit data analysis method provided by any of the above embodiments.

As shown in fig. 4, the rail transit data analysis system provided in the embodiment of the present application includes a data source layer 41, an exchange aggregation layer 42, a storage calculation layer 43, a statistical analysis layer 44, a data flow layer 45, a data quality layer 46, and a data sharing layer 47, and the above function layers together complete the rail transit data analysis method provided in any of the above embodiments.

The data source layer 41 is configured to collect rail transit data such as device state, alarm fault, event record, energy consumption statistics, fault recording and train time data. The exchange aggregation layer 42 is configured with data exchange and collection functions such as interface adaptation, file transmission, data loading, data conversion and acquisition scheduling. The storage calculation layer 43 is configured with a data warehouse, storage engines (including distributed file, relational database, MPP database and NoSQL storage engines) and calculation engines (including streaming and offline calculation engines). The statistical analysis layer 44 is configured with statistical analysis functions such as data retrieval, multidimensional analysis, data mining, machine learning and self-service reporting. The data sharing layer 47 is configured with data sharing functions such as application approval, service release and push subscription.
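The combination of offline and streaming calculation engines in a Lambda architecture is commonly read as a batch layer, a speed layer and a serving layer that merges the two. The sketch below follows that general pattern as an assumption; the description does not detail how the platform merges batch and real-time results, and all names here are hypothetical.

```python
from collections import Counter


def batch_view(historical_events):
    """Batch layer: recompute per-station entry counts from all archived events."""
    return Counter(event["station"] for event in historical_events)


class SpeedView:
    """Speed layer: incrementally count events that arrived after the last batch run."""

    def __init__(self):
        self.counts = Counter()

    def on_event(self, event):
        self.counts[event["station"]] += 1


def serve(station, batch, speed):
    """Serving layer: merge the batch view with the real-time increment."""
    return batch[station] + speed.counts[station]
```

Because `Counter` returns 0 for missing keys, a station that appears in only one of the two views is still served correctly.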

The embodiment of the application also provides computer equipment which can be integrated with the rail transit data analysis device provided by the embodiment of the application. Fig. 5 is a schematic structural diagram of a computer device according to an embodiment of the present application. Referring to fig. 5, the computer apparatus includes: an input device 53, an output device 54, a memory 52, and one or more processors 51; the memory 52 for storing one or more programs; when the one or more programs are executed by the one or more processors 51, the one or more processors 51 are enabled to implement the rail transit data analysis method provided by the above embodiment. Wherein the input device 53, the output device 54, the memory 52 and the processor 51 may be connected by a bus or other means, as exemplified by the bus connection in fig. 5.

The memory 52 is a storage medium readable by a computing device, and can be used for storing software programs, computer executable programs, and modules, such as program instructions/modules corresponding to the rail transit data analysis method according to any embodiment of the present application (for example, the data collection module 31, the data analysis module 32, and the data management module 33 in the rail transit data analysis apparatus). The memory 52 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the device, and the like. Further, the memory 52 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the memory 52 may further include memory located remotely from the processor 51, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The input device 53 may be used to receive input numeric or character information and generate key signal inputs relating to user settings and function control of the apparatus. The output device 54 may include a display device such as a display screen.

The processor 51 executes various functional applications and data processing of the device by running software programs, instructions and modules stored in the memory 52, that is, the above-mentioned rail transit data analysis method is realized.

The rail transit data analysis device, system and computer equipment provided above can be used to execute the rail transit data analysis method provided by any of the above embodiments, and have the corresponding functions and beneficial effects.

Embodiments of the present application further provide a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform the rail transit data analysis method provided in the above embodiments, where the rail transit data analysis method includes: performing data collection on rail transit data, wherein the rail transit data comprises a data source, metadata, a data set and a data standard; performing data analysis and statistical analysis based on the data collection result, wherein the data analysis comprises data flow management, item catalog management and function configuration, and the statistical analysis comprises interactive query, report tools and machine learning; and performing data sharing and data governance based on the data collection result, wherein the data sharing comprises data sharing management and data sharing charging, and the data governance comprises quality analysis, blood relationship analysis and metadata analysis.

Storage medium: any of various types of memory devices or storage devices. The term "storage medium" is intended to include: installation media such as CD-ROM, floppy disk, or tape devices; computer system memory or random access memory such as DRAM, DDR RAM, SRAM, EDO RAM, Rambus RAM, etc.; non-volatile memory such as flash memory, magnetic media (e.g., hard disk or optical storage); registers or other similar types of memory elements, etc. The storage medium may also include other types of memory or combinations thereof. In addition, the storage medium may be located in a first computer system in which the program is executed, or may be located in a different second computer system connected to the first computer system through a network (such as the internet). The second computer system may provide program instructions to the first computer for execution. The term "storage medium" may include two or more storage media that may reside in different locations, such as in different computer systems that are connected by a network. The storage medium may store program instructions (e.g., embodied as a computer program) that are executable by one or more processors.

Of course, in the storage medium containing computer-executable instructions provided in the embodiments of the present application, the computer-executable instructions are not limited to the rail transit data analysis method described above, and may also perform related operations in the rail transit data analysis method provided in any embodiment of the present application.

The rail transit data analysis device, the rail transit data analysis system, and the storage medium provided in the above embodiments may execute the rail transit data analysis method provided in any embodiment of the present application. For technical details not described in detail in the above embodiments, reference may be made to the rail transit data analysis method provided in any embodiment of the present application.

The foregoing is considered as illustrative of the preferred embodiments of the invention and the technical principles employed. The present application is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present application has been described in more detail with reference to the above embodiments, the present application is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present application, and the scope of the present application is determined by the scope of the claims.

Claims

1. A rail transit data analysis method is characterized by comprising the following steps:

2. The rail transit data analysis method according to claim 1, wherein the data gathering of rail transit data includes:

3. The rail transit data analysis method of claim 1, wherein performing data analysis based on the data compilation results comprises:

4. The rail transit data analysis method of claim 1, wherein performing statistical analysis based on the data compilation results comprises:

5. The rail transit data analysis method of claim 1, wherein the data sharing based on the data aggregation result comprises:

6. The rail transit data analysis method of claim 1, wherein performing data governance based on the data compilation result comprises:

7. The rail transit data analysis method according to claim 1, wherein after the data collection of the rail transit data, the method further comprises:

8. The rail transit data analysis method of claim 7, wherein the operation and maintenance monitoring based on the data collection result comprises:

9. The rail transit data analysis method according to claim 7, wherein performing system management based on the data aggregation result includes:

10. The rail transit data analysis device is characterized by comprising a data collection module, a data analysis module and a data management module, wherein:

11. The rail transit data analysis device according to claim 10, further comprising a system operation and maintenance module, configured to perform operation and maintenance monitoring and system management based on the data collection result, where the operation and maintenance monitoring includes task monitoring, operation and maintenance management and access monitoring, and the system management includes unified portal, authority management and log audit.

12. A rail transit data analysis system, characterized in that the rail transit data analysis system is built based on lambda architecture and is used to implement the rail transit data analysis method according to any one of claims 1 to 9.

13. A computer device, comprising: a memory and one or more processors;

the memory for storing one or more programs;

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the rail transit data analysis method of any of claims 1-9.

14. A storage medium containing computer-executable instructions for performing the rail transit data analysis method of any of claims 1-9 when executed by a computer processor.