CN117093617A - Rail transit data analysis method, system, storage medium and electronic equipment

Info

Publication number
CN117093617A
Authority
CN
China
Prior art keywords
data
service
historical
real
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311068956.0A
Other languages
Chinese (zh)
Inventor
李松昂
于增
孙方
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Rail Transport Roa Network Management Co ltd
Original Assignee
Beijing Rail Transport Roa Network Management Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Rail Transport Roa Network Management Co ltd
Priority to CN202311068956.0A
Publication of CN117093617A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/221 Column-oriented storage; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a rail transit data analysis method, system, storage medium and electronic equipment. The method comprises: collecting historical data of a rail transit management and control system over a first time period and real-time data of the rail transit management and control system over a second time period; storing the historical data in an offline data warehouse and the real-time data in a distributed event stream platform, extracting first service index data from the historical data, and extracting second service index data from the real-time data; periodically synchronizing the first service index data and the second service index data through a columnar database management system to obtain access data; and acquiring a data query instruction through a preset query interface and, based on the access data, displaying the corresponding target access data through the query interface. The application addresses the technical problems in the related art that a traditional data warehouse struggles to support big data processing and that big data technology is deficient in supporting traditional data applications.

Description

Rail transit data analysis method, system, storage medium and electronic equipment
Technical Field
The application relates to the technical field of rail transit, in particular to a rail transit data analysis method, a system, a storage medium and electronic equipment.
Background
As enterprise business scale grows and digital transformation accelerates, enterprises commonly implement data warehouse projects to achieve unified internal data storage and analysis: data from all internal systems is aggregated, cleaned and converted according to data standardization requirements, and finally stored in a unified way for internal data statistics and analysis. The concept of big data has become popular in recent years, and technologies for storing, processing and analyzing big data have developed rapidly. Big data technology nevertheless has limitations. Hadoop, for example, has great advantages for very large files and streaming data processing, but is seriously deficient in supporting low-latency data access, multi-writer data writes, and the processing of large numbers of small files.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiments of the application provide a rail transit data analysis method, system, storage medium and electronic equipment, which at least solve the technical problems in the related art that a traditional data warehouse struggles to support big data processing and that big data technology is deficient in supporting traditional data applications.
According to an aspect of an embodiment of the present application, there is provided a rail transit data analysis method including: collecting historical data of a rail transit management and control system over a first time period and real-time data of the rail transit management and control system over a second time period, wherein the first time period is longer than the second time period; storing the historical data in an offline data warehouse and the real-time data in a distributed event stream platform, extracting first service index data from the historical data, and extracting second service index data from the real-time data; periodically synchronizing the first service index data and the second service index data through a columnar database management system to obtain access data; and acquiring a data query instruction through a preset query interface and, based on the access data, displaying the target access data corresponding to the data query instruction through the query interface.
Optionally, storing the historical data in an offline data warehouse and storing the real-time data in a distributed event stream platform comprises: preprocessing the historical data and the real-time data separately to obtain processed historical data and processed real-time data; and storing the processed historical data in the offline data warehouse and the processed real-time data in the distributed event stream platform.
Optionally, extracting the historical data to obtain the first service index data includes: dividing the historical data stored in the offline data warehouse according to a preset service classification standard to obtain historical service data corresponding to multiple types of rail transit services, wherein the historical service data corresponding to the multiple types of rail transit services comprises at least one of the following: passenger flow data, clearing data, service data, emergency data, ticket data, asset data, security data and basic data; and extracting the historical service data corresponding to the various rail transit services to obtain first service index data corresponding to each type of rail transit service, wherein the first service index data comprises at least one of the following: basic index class data, passenger flow information class data, train operation class data, equipment information class data, clearing information class data, ticket information class data and service information class data.
Optionally, periodically synchronizing the first service index data and the second service index data through the columnar database management system to obtain access data includes: periodically synchronizing the first service index data and the second service index data to the columnar database management system according to a preset synchronization rule to obtain the access data, and generating an access data table for storing the access data, wherein the preset synchronization rule comprises at least one of the following: a first synchronization rule based on the statistical granularity of the service index, a second synchronization rule based on the time dimension of the data statistics, a third synchronization rule based on the space dimension of the data statistics, and a fourth synchronization rule based on the frequency of data access.
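As a rough illustration of how the four kinds of synchronization rule might be captured in configuration, the following Python sketch models one rule per index table and derives a synchronization period from it. The class name `SyncRule`, its field names, the granularity labels and the concrete intervals are all assumptions made for this example; the application does not specify them.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class SyncRule:
    """Hypothetical description of how one index table is synchronized."""
    index_name: str         # name of the service index, e.g. "station_inbound"
    granularity: str        # statistical granularity: "5min", "hour" or "day"
    time_window: timedelta  # time dimension: how much history to keep available
    spatial_level: str      # space dimension: "station", "line" or "network"
    access_frequency: str   # access frequency: "high" or "low"


# Illustrative base periods per granularity (assumed values).
_BASE_PERIOD = {
    "5min": timedelta(minutes=5),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
}


def sync_interval(rule: SyncRule) -> timedelta:
    """Finer-grained and more frequently accessed data is synchronized more often."""
    base = _BASE_PERIOD[rule.granularity]
    return base if rule.access_frequency == "high" else base * 2
```

A scheduler could then call `sync_interval` for each configured rule to decide how often to copy that index into the columnar store.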
Optionally, the method further comprises: the queried historical access data is stored in a memory-based key value database for a third time period, wherein the third time period is greater than the second time period.
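The caching step can be sketched as a small time-to-live cache. This is a minimal in-process stand-in for a memory-based key-value database, not the component the application uses; the class name `TTLCache` and the injectable `clock` parameter are assumptions introduced so the expiry behaviour is easy to check.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Keeps queried results for a fixed retention period (the 'third time period')."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: Dict[str, Tuple[float, Any]] = {}

    def put(self, key: str, value: Any) -> None:
        # Store the value together with the time at which it expires.
        self._items[key] = (self._clock() + self._ttl, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # The retention period has elapsed, so drop the stale entry.
            del self._items[key]
            return None
        return value
```

Repeated queries for the same access data within the retention period can then be answered from memory instead of from the columnar database.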
Optionally, acquiring the data query instruction through a preset query interface and, based on the access data, displaying the target access data corresponding to the data query instruction through the query interface includes: acquiring the data query instruction through the preset query interface, wherein the query interface comprises at least one of the following: a comprehensive analysis service interface, a passenger flow prediction service interface and a passenger profile service interface; and accessing the access data through Java Database Connectivity (JDBC) and displaying the target access data corresponding to the data query instruction through the query interface.
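The query path can be illustrated with a parameterized lookup against an access data table. The application describes access through Java database connection to the columnar database; the sketch below instead uses Python's standard `sqlite3` module with an in-memory database purely as a stand-in, and the table name `daily_station_flow`, its columns and the sample row are hypothetical.

```python
import sqlite3
from typing import Optional


def query_daily_inbound(conn: sqlite3.Connection, station: str, day: str) -> Optional[int]:
    """Return the daily inbound volume for one station, or None if no row exists."""
    cur = conn.execute(
        "SELECT inbound FROM daily_station_flow WHERE station = ? AND stat_date = ?",
        (station, day),
    )
    row = cur.fetchone()
    return row[0] if row else None


# Build a tiny stand-in access data table (assumed schema and sample values).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_station_flow (stat_date TEXT, station TEXT, inbound INTEGER)"
)
conn.execute("INSERT INTO daily_station_flow VALUES ('2023-08-01', 'A', 1200)")
```

A service interface would translate an incoming data query instruction into such a parameterized query and render the returned rows.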
Optionally, the rail transit management and control system comprises at least one of the following: a road network command and dispatch system, an automatic ticket selling and checking clearing system, a ticket management system, an information technology service management system, an asset management system and a security system.
According to another aspect of the embodiment of the present application, there is also provided a rail transit data analysis system including: a data acquisition module for collecting historical data of a rail transit management and control system over a first time period and real-time data of the rail transit management and control system over a second time period, wherein the first time period is longer than the second time period; a data storage management module for storing the historical data in an offline data warehouse and the real-time data in a distributed event stream platform, extracting first service index data from the historical data, and extracting second service index data from the real-time data; a data access module for periodically synchronizing the first service index data and the second service index data through a columnar database management system to obtain access data; and a business application module for acquiring a data query instruction through a preset query interface and, based on the access data, displaying the target access data corresponding to the data query instruction through the query interface.
According to another aspect of the embodiment of the present application, there is also provided a nonvolatile storage medium, where the nonvolatile storage medium includes a stored program, and a device where the nonvolatile storage medium is located executes the above-mentioned rail transit data analysis method by running the program.
According to another aspect of the embodiment of the present application, there is also provided an electronic device including: the system comprises a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the rail transit data analysis method through the computer program.
In the embodiment of the application, historical data of a rail transit management and control system over a first time period and real-time data of the rail transit management and control system over a second time period are collected, wherein the first time period is longer than the second time period; the historical data is stored in an offline data warehouse and the real-time data in a distributed event stream platform, first service index data is extracted from the historical data, and second service index data is extracted from the real-time data; the first service index data and the second service index data are periodically synchronized through a columnar database management system to obtain access data; and a data query instruction is acquired through a preset query interface and, based on the access data, the target access data corresponding to the data query instruction is displayed through the query interface. Rail transit business data is analyzed and mined by combining the offline data warehouse with the columnar database management system, so that the offline computation capability and low cost of the offline data warehouse and the high-performance querying of the columnar database management system are both exploited to manage big data for the rail transit industry. This in turn solves the technical problems in the related art that a traditional data warehouse struggles to support big data processing and that big data technology is deficient in supporting traditional data applications.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a flow chart of an alternative rail transit data analysis method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of alternative real-time passenger flow index data according to an embodiment of the application;
FIG. 3 is a schematic architecture diagram of an alternative rail transit data analysis system in accordance with an embodiment of the present application;
FIG. 4 is a schematic diagram of an alternative access data table according to an embodiment of the application;
FIG. 5 is a schematic diagram of an alternative access data table corresponding to a daily granularity inbound/outbound metric in accordance with an embodiment of the present application;
FIG. 6 is a schematic diagram of an alternative rail transit data analysis system in accordance with an embodiment of the present application;
FIG. 7 is a schematic architecture diagram of another alternative rail transit data analysis system in accordance with an embodiment of the present application;
FIG. 8 is a schematic structural diagram of an alternative electronic device according to an embodiment of the present application.
Detailed Description
In order that those skilled in the art may better understand the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without inventive effort shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and the claims and drawings of the present application are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In addition, the related information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for presentation, analyzed data, etc.) related to the present application are information and data authorized by the user or sufficiently authorized by each party. For example, an interface is provided between the system and the relevant user or institution, before acquiring the relevant information, the system needs to send an acquisition request to the user or institution through the interface, and acquire the relevant information after receiving the consent information fed back by the user or institution.
Example 1
In statistical analysis scenarios involving large data volumes on mobile terminals, patterns in the data are analyzed through multi-dimensional, multi-view query visualization. Such scenarios have the following main characteristics: each user query touches a large amount of data, with detail-data queries covering 50-100 GB; the dimensions and indexes queried by users are highly flexible, since indexes are freely combined on demand and the system cannot preset functions and data in advance; and users request data anytime and anywhere from mobile devices, so query concurrency is high and 3000-5000 requests may be generated per second. A traditional Hadoop data warehouse currently struggles to support big data processing under these conditions, while a commercial MPP data warehouse offers relatively high performance but entails high construction costs and high subsequent operation and maintenance costs.
In order to solve the above problems, embodiments of the present application provide a rail transit data analysis method. It should be noted that the steps illustrated in the flowcharts of the drawings may be performed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order different from that described herein.
FIG. 1 is a schematic flow chart of an alternative rail transit data analysis method according to an embodiment of the present application. As shown in FIG. 1, the method includes at least steps S102-S108, wherein:
Step S102, historical data of the rail transit management and control system over a first time period and real-time data of the rail transit management and control system over a second time period are collected.
In the technical scheme provided in step S102, a data analysis platform obtains the historical data of each rail transit management and control system over a past period of time, also called "offline data", and the real-time data of each rail transit management and control system, also called "near-line data"; in the embodiment of the application the real-time data is mainly real-time transaction details. Because the data analysis platform acquires service data from each rail transit management and control system, these systems can be called data sources. In addition, since the historical data and the real-time data are acquired over different time periods, it is necessary to ensure that the first time period, over which the historical data is acquired, is longer than the second time period, over which the real-time data is acquired; that is, the time span of the historical data is longer than or equal to that of the real-time data.
Optionally, the rail transit management and control system includes at least one of the following: a road network command and dispatch system, an automatic ticket selling and checking clearing system, a ticket management system, an information technology service management system, an asset management system and a security system.
The road network command and dispatch system schedules and controls traffic flow in real time through devices such as urban traffic signal lamps, traffic cameras and vehicle navigation systems, so as to improve traffic efficiency, reduce congestion and improve traffic safety. The automatic ticket selling and checking clearing system comprises automatic ticket vending machines, ticket checking machines and a charging system, and implements automatic ticket vending for public transportation, self-service ticket checking for passengers, and the clearing and settlement of transactions. The ticket management system manages ticket business in the rail transit system, including ticket selling, ticket checking, ticket price calculation and seat reservation; its functions include real-time inquiry of remaining tickets, ticket price adjustment, and ticket statistics and analysis. The information technology service management system manages information technology services in the rail transit system, covering servers, network equipment, data storage equipment and the like; its functions include equipment monitoring, fault diagnosis and maintenance, and software updating and upgrading. The asset management system manages assets in the rail transit system, including vehicles, equipment and facilities; its functions include asset registration and archiving, maintenance planning, and asset scheduling and scrapping. The security system safeguards the rail transit system and comprises a video monitoring system, an alarm system, an access control system and the like; its functions include real-time monitoring, abnormality alarms, and event recording and playback.
Specifically, the real-time data in the embodiment of the application can be real-time card swiping transaction data; the historical data includes system data obtained from within each of the rail transit control systems.
Step S104, historical data is stored through an offline data warehouse, real-time data is stored through a distributed event stream platform, the historical data is extracted to obtain first business index data, and the real-time data is extracted to obtain second business index data.
In the technical solution provided in step S104, the historical data and the real-time data acquired by the data analysis platform from the data source may be stored through an offline data warehouse and a distributed event stream platform, respectively.
Specifically, in the embodiment of the present application, Hive is preferably used as the offline data warehouse. It may be used to store the full set of historical data, and the historical data stored in the Hive offline data warehouse is extracted to obtain the first service index data.
Kafka is used as the real-time data computation component for loading and summarizing real-time transaction details, and the real-time data stored in the Kafka-based distributed event stream platform is extracted to obtain the second service index data. The second service index data is real-time passenger flow index data and includes a real-time inbound volume index, a real-time outbound volume index, a real-time transfer volume index and the like, as shown in FIG. 2.
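The summarization of real-time transaction details into passenger flow indexes can be sketched as a simple counting step. The sketch below operates on an in-memory iterable rather than on Kafka itself, and the event fields `station` and `type`, as well as the function name, are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Event types that correspond to the three real-time indexes named above.
FLOW_TYPES = ("inbound", "outbound", "transfer")


def passenger_flow_indices(events: Iterable[dict]) -> Dict[Tuple[str, str], int]:
    """Count inbound, outbound and transfer events per station."""
    counts: Counter = Counter()
    for event in events:
        kind = event.get("type")
        if kind in FLOW_TYPES:
            counts[(event["station"], kind)] += 1
        # Events of any other type are ignored in this sketch.
    return dict(counts)
```

In a streaming deployment the same aggregation would typically run over a time window of consumed messages, so the indexes reflect recent traffic only.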
As an optional implementation manner, in the solution provided in step S104, storing the historical data in the offline data warehouse and storing the real-time data in the distributed event stream platform may include: preprocessing the historical data and the real-time data separately to obtain processed historical data and processed real-time data; and storing the processed historical data in the offline data warehouse and the processed real-time data in the distributed event stream platform.
In this embodiment, before the data analysis platform stores service data from a data source, the service data first needs to be preprocessed, which includes data cleaning and data conversion. The first step of data cleaning is deviation detection; factors that cause deviations include poorly designed input forms with many optional fields, human data-entry errors, deliberate errors and data decay. The scattered, disordered data is then organized for subsequent analysis and mining. Data conversion transforms the data into a format and structure suitable for analysis and mining, for example by converting unstructured data into structured data. Preprocessing effectively improves the quality and consistency of the data.
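The two preprocessing stages can be illustrated with a short sketch: a conversion function that turns a raw text line into a structured record, and a cleaning function that rejects records with missing fields or unparsable timestamps. The field names, the comma-separated input format and the timestamp format are assumptions for this example only.

```python
from datetime import datetime
from typing import Optional

# Hypothetical fields of one card-swipe transaction record.
REQUIRED_FIELDS = ("card_id", "station", "timestamp")


def parse_line(line: str) -> dict:
    """Data conversion: turn a comma-separated text line into a field dictionary."""
    parts = [part.strip() for part in line.split(",")]
    return dict(zip(REQUIRED_FIELDS, parts))


def clean_record(raw: dict) -> Optional[dict]:
    """Data cleaning: return a normalized record, or None if a deviation is detected."""
    for field in REQUIRED_FIELDS:
        value = raw.get(field)
        if value is None or not str(value).strip():
            return None  # a required field is missing or empty
    try:
        ts = datetime.strptime(str(raw["timestamp"]).strip(), "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None  # the timestamp does not match the expected format
    return {
        "card_id": str(raw["card_id"]).strip(),
        "station": str(raw["station"]).strip().upper(),
        "timestamp": ts,
    }
```

Records that fail cleaning are simply dropped here; a production pipeline might instead route them to a quarantine table for inspection.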
In addition, the embodiment of the application uses the offline data warehouse Hive to store historical data, which has the following advantages: 1. Strong data processing capability. As a Hadoop-based data warehouse tool, Hive can process large-scale data and perform analysis and query operations on complex data. 2. Flexible data formats. Hive supports various data formats, including text files and sequence files, so that users can choose a suitable format for storage and querying according to their needs. 3. Efficient data compression and storage. Hive supports compressed storage of data, which effectively reduces storage space usage and improves query performance. 4. Integration with other tools. Hive can be integrated with other data processing tools such as Spark and Pig, thereby extending its functionality and application scenarios.
Real-time data, in turn, is stored in the distributed event stream platform Kafka. Kafka is a high-throughput distributed publish-subscribe messaging system that can handle all of the user activity stream data of a website.
As an optional implementation manner, in the technical solution provided in step S104, extracting the historical data to obtain the first service index data may include: dividing the historical data stored in the offline data warehouse according to a preset service classification standard to obtain historical service data corresponding to multiple types of rail transit services, wherein the historical service data corresponding to the multiple types of rail transit services comprises at least one of the following: passenger flow data, clearing data, service data, emergency data, ticket data, asset data, security data and basic data; and extracting the historical service data corresponding to the various rail transit services to obtain first service index data corresponding to each type of rail transit service, wherein the first service index data comprises at least one of the following: basic index class data, passenger flow information class data, train operation class data, equipment information class data, clearing information class data, ticket information class data and service information class data.
Specifically, fig. 3 is a schematic architecture diagram of an alternative rail transit data analysis system according to an embodiment of the present application, and as shown in fig. 3, hive is mainly divided into three layers: paste source layer, basal layer, summarization layer, wherein:
the source pasting layer is used for storing historical data of each track traffic control system in the collected data source layer, namely, the source pasting layer is used for storing the historical data;
the base layer is used for storing mild summary data based on historical data, can uniformly process and integrate the historical data stored in the source layer according to the requirements of rail traffic standards, and stores historical data areas with detail granularity, for example, the base layer can store service data which are consistent and standard according to the service requirements of each service department of a certain urban subway. That is, the base layer divides the historical data stored in the offline data warehouse according to a preset service classification standard to obtain historical service data corresponding to multiple types of track traffic services, wherein the real-time historical service data corresponding to the multiple types of track traffic services comprises at least one of the following: passenger flow data, clearing data, service data, emergency data, ticket data, asset data, security data and basic data;
The summarizing layer is used for storing business index data, and is used for refining data access and statistics requirements which are common to an offline data warehouse Hive from the view point of business requirements, so that public data which is applied to requirements and provides shared data access service is constructed, the data flow direction of the public data is that data is extracted from the basic layer, and the data display requirements of upstream applications are met after targeted summarizing processing. That is, the summary layer is configured to extract historical service data corresponding to various track traffic services, so as to obtain first service index data corresponding to each type of track traffic service, where the first service index data includes at least one of the following: basic index class data, passenger flow information class data, train operation class data, equipment information class data, clearing information class data, ticket information class data and service information class data.
In addition, the lightly summarized data in the base layer, which serves as the data source of the summary layer, can also be opened directly to advanced data analysts for in-depth, flexible query and data mining.
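The summarizing step described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that base-layer records are gate events carrying the hypothetical fields `station`, `ts` and `direction`; the application does not specify the record layout, and in practice this aggregation would run inside Hive rather than in Python.

```python
from collections import Counter
from datetime import datetime


def bucket_start(ts: datetime, minutes: int = 15) -> datetime:
    """Truncate a timestamp to the start of its statistical bucket.

    Only valid for bucket sizes that divide 60 (e.g. 15 or 30 minutes).
    """
    return ts.replace(minute=ts.minute - ts.minute % minutes,
                      second=0, microsecond=0)


def summarize_inbound(records, minutes: int = 15) -> dict:
    """Count inbound events per (station, bucket start)."""
    counts = Counter()
    for rec in records:
        if rec["direction"] == "in":
            counts[(rec["station"], bucket_start(rec["ts"], minutes))] += 1
    return dict(counts)


# Hypothetical base-layer records (field names are assumptions).
records = [
    {"station": "A", "ts": datetime(2023, 8, 23, 8, 3), "direction": "in"},
    {"station": "A", "ts": datetime(2023, 8, 23, 8, 14), "direction": "in"},
    {"station": "A", "ts": datetime(2023, 8, 23, 8, 16), "direction": "in"},
    {"station": "B", "ts": datetime(2023, 8, 23, 8, 5), "direction": "out"},
]
inbound = summarize_inbound(records)
```

Here `inbound` holds one 15-minute inbound volume per station and time bucket, which corresponds to a passenger flow information class index in the summary layer.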
It should be noted that, in the embodiment of the present application, when the real-time passenger flow index is used as the second service index data corresponding to the real-time data, the first service index data corresponding to the selected historical data may focus on statistics of the basic index data and the derivative index data related to passenger flow, where the basic index data includes but is not limited to: passenger flow, inbound volume, outbound volume, transfer volume, section capacity, etc.; the derivative index data can be divided into five categories, namely passenger flow basic index data, passenger travel characteristic index data, unbalanced coefficient index data, line definition index data and financial information index data.
Step S106, the first business index data and the second business index data are periodically synchronized through the column-type database management system, and access data are obtained.
In the technical solution provided in step S106, the columnar database management system in the embodiment of the present application is preferably Clickhouse. Clickhouse is a column-oriented database with an MPP architecture for online analytical processing (OLAP) queries; in this solution it serves as the high-performance MPP query layer built on top of the data held in Hive. Compared with Hive, which stores the full historical data, the data stored in Clickhouse is typically data that has recently been accessed at high frequency, or commonly used public data. Therefore, in the embodiment of the application, the first business index data stored in Hive and the second business index data stored in Kafka can be synchronized into Clickhouse at regular intervals according to the update frequency of the data, and at the same time historical data is cleaned up regularly, so that the data volume in Clickhouse remains at a stable order of magnitude.
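The timed synchronization and regular cleanup can be sketched as follows. This is an illustrative in-memory model only: the dictionary `store` stands in for Clickhouse, the two row lists stand in for index data read from Hive and Kafka, and the seven-day retention window is a hypothetical value that the application does not specify.

```python
from datetime import datetime, timedelta


def sync_once(store, hive_rows, kafka_rows, now, retention):
    """Merge both index sources into `store`, then prune expired rows.

    Rows are (index_name, timestamp, value) tuples; a later row with the
    same (index_name, timestamp) overwrites the earlier one.
    Returns the number of rows removed by the cleanup.
    """
    for name, ts, value in [*hive_rows, *kafka_rows]:
        store[(name, ts)] = value
    cutoff = now - retention
    expired = [key for key in store if key[1] < cutoff]
    for key in expired:
        del store[key]
    return len(expired)


store = {}
now = datetime(2023, 8, 23, 12, 0)
hive_rows = [("inbound", datetime(2023, 8, 1, 0, 0), 1200),
             ("inbound", datetime(2023, 8, 22, 0, 0), 1350)]
kafka_rows = [("inbound", datetime(2023, 8, 23, 11, 45), 97)]
removed = sync_once(store, hive_rows, kafka_rows, now, timedelta(days=7))
```

A scheduler would call `sync_once` at an interval chosen from the update frequency of each index, so that the serving store stays bounded in size while Hive keeps the full history.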
As an optional implementation manner, the technical solution provided in step S106 may include: periodically synchronizing the first service index data and the second service index data to the columnar database management system according to a preset synchronization rule to obtain access data, and generating an access data report for storing the access data.
In this embodiment, to make full use of the high-performance query capability of the columnar database management system Clickhouse, the data is stored according to the following synchronization rules: a first synchronization rule based on the statistical granularity of the business index, a second synchronization rule based on the time dimension of the data statistics, a third synchronization rule based on the space dimension of the data statistics, and a fourth synchronization rule based on the access frequency of the data access.
For example, according to the statistical granularity of the index, indices with a granularity of 15 minutes or coarser are stored in the columnar database management system Clickhouse; according to the time of the statistics, indices computed on the hour are stored in Clickhouse; according to the space dimension of the statistics, indices at the station level or above are stored in Clickhouse; and according to the frequency of data access, indices with an average access frequency of 60 times per hour are stored in Clickhouse.
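The four example rules can be expressed as a simple predicate over index metadata. The sketch below is an assumption-laden illustration: the `IndexMeta` fields, the ordering of spatial levels (including the hypothetical `gate` level below `station`), the reading of the access rule as "at least 60 times per hour", and the choice to synchronize when any one rule matches are not stated in the application.

```python
from dataclasses import dataclass

# Assumed ordering of spatial levels, from finest to coarsest.
SPATIAL_LEVELS = {"gate": 0, "station": 1, "line": 2, "network": 3}


@dataclass(frozen=True)
class IndexMeta:
    name: str
    granularity_minutes: int   # statistical granularity of the index
    on_the_hour: bool          # computed at whole-hour time points
    spatial_level: str         # key of SPATIAL_LEVELS
    avg_hourly_access: float   # average queries per hour


def matched_rules(meta: IndexMeta) -> list:
    """Return the names of the synchronization rules the index satisfies."""
    rules = []
    if meta.granularity_minutes >= 15:
        rules.append("granularity")
    if meta.on_the_hour:
        rules.append("time")
    if SPATIAL_LEVELS[meta.spatial_level] >= SPATIAL_LEVELS["station"]:
        rules.append("space")
    if meta.avg_hourly_access >= 60:   # threshold reading is an assumption
        rules.append("frequency")
    return rules


def should_sync(meta: IndexMeta) -> bool:
    """Assumption: an index is synchronized if at least one rule matches."""
    return bool(matched_rules(meta))


station_index = IndexMeta("inbound_15min_station", 15, False, "station", 10)
gate_index = IndexMeta("gate_1min", 1, False, "gate", 5)
hot_index = IndexMeta("hot_gate_5min", 5, False, "gate", 60)
```

Keeping the rules as separate, named checks makes it straightforward to switch from "any rule" to "all rules" if a deployment requires a stricter policy.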
To give the application system efficient query performance, the structure of the access data tables in the columnar database management system Clickhouse is designed according to the following rules: tables are split by statistical granularity, and each table stores, filled in completely, all dimension fields supported by the indices, which reduces the data volume of a single table and supports aggregation queries on any dimension; indices with the same dimensions are combined into one table, which reduces space occupation; queries for commonly used complex reports are pre-computed, which improves query efficiency; and all queries in the columnar database management system Clickhouse are single-table queries, with the use of join restricted.
For example, fig. 4 is a schematic diagram of an alternative access data table structure according to an embodiment of the present application. As shown in fig. 4, the access data may be divided by statistical granularity into five different tables, namely the monthly inbound volume, daily inbound volume, hourly inbound volume, 30-minute inbound volume and 15-minute inbound volume tables. Taking the day-granularity inbound and outbound volume index as an example, the fields of the table can, as illustrated in fig. 5, be divided into three categories, including a space category, a ticket category and a measurement category.
Step S108, acquiring a data query instruction through a preset query interface, and displaying target access data corresponding to the data query instruction through the query interface based on the access data.
In the technical scheme provided in step S108, the data analysis platform may obtain an analyst's data query instruction through a preset query interface, retrieve the target access data corresponding to the data query instruction from the access data stored in the columnar database management system, and display the target access data in the query interface.
As an optional implementation manner, the technical solution provided in step S108 may include: acquiring a data query instruction through a preset query interface, wherein the query interface comprises at least one of the following: a fusion analysis service interface, a passenger flow prediction service interface and a passenger portrait service interface; and accessing the access data through a Java database connection, and displaying target access data corresponding to the data query instruction through the query interface.
In this embodiment, the data analysis platform may obtain an analyst's data query instruction through a preset query interface, where the query interface may be a query interface corresponding to different service applications. The service applications in the embodiment of the present application include, but are not limited to, fusion analysis, passenger portrait, passenger flow prediction and palm road network, so the corresponding query interfaces are a fusion analysis service interface, a passenger portrait service interface, a passenger flow prediction service interface and a palm road network interface, respectively. The Clickhouse cluster is then accessed through a Java database connection (namely JDBC) to respond quickly and acquire the corresponding target access data.
In addition, the queried historical access data is stored in the memory-based key-value database for a third time period, wherein the third time period is greater than the second time period.
That is, if a piece of business index data has already been queried through the data analysis platform, it is automatically cached in a memory-based key-value database, such as Redis, so that the result can be returned quickly the next time the same business index data is queried, without accessing the Clickhouse cluster again.
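The caching behaviour described here follows the common cache-aside pattern. The sketch below models it with an in-process dictionary and a time-to-live instead of a real Redis client, so the class `TTLCache`, the function `query_with_cache` and the ten-second lifetime are illustrative assumptions rather than the implementation of the application.

```python
import time


class TTLCache:
    """Tiny in-memory stand-in for a key-value store with expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        """Return (hit, value); expired entries count as misses."""
        entry = self._entries.get(key)
        if entry is None:
            return False, None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]
            return False, None
        return True, value

    def put(self, key, value):
        self._entries[key] = (self._clock() + self._ttl, value)


def query_with_cache(cache, key, run_query):
    """Serve from the cache when possible, otherwise query and cache."""
    hit, value = cache.get(key)
    if hit:
        return value
    value = run_query(key)  # stands in for a query against Clickhouse
    cache.put(key, value)
    return value


# Demonstration with a controllable clock and a counting fake backend.
fake_now = [0.0]
backend_calls = []


def fake_backend(key):
    backend_calls.append(key)
    return 42


cache = TTLCache(ttl_seconds=10, clock=lambda: fake_now[0])
first = query_with_cache(cache, "daily_inbound", fake_backend)
second = query_with_cache(cache, "daily_inbound", fake_backend)
calls_before_expiry = len(backend_calls)
fake_now[0] = 11.0  # move past the time-to-live
third = query_with_cache(cache, "daily_inbound", fake_backend)
calls_after_expiry = len(backend_calls)
```

The second lookup is answered from the cache, and only after the entry expires does the backend get queried again, which mirrors storing queried results for the third time period.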
Based on the schemes defined in steps S102 to S108 above, it can be seen that, in an embodiment, historical data of the track traffic management and control system in a first time period and real-time data in a second time period are collected, where the first time period is greater than the second time period; the historical data is stored through an offline data warehouse and the real-time data is stored through a distributed event stream platform, the historical data is extracted to obtain first service index data, and the real-time data is extracted to obtain second service index data; the first service index data and the second service index data are periodically synchronized through a columnar database management system to obtain access data; and a data query instruction is acquired through a preset query interface, and target access data corresponding to the data query instruction is displayed through the query interface based on the access data.
Therefore, the technical scheme of the embodiment of the application combines Hive and Clickhouse based on a Hadoop+MPP architecture design. Data in Clickhouse is always stored by column, and operations are dispatched on vectors rather than on single values; this is called "vectorized query execution", which helps reduce the cost of actual data processing and is well suited to big data analysis scenarios on mobile terminals. Meanwhile, based on the technical architecture advantages of Clickhouse, and in order to support free, flexible and efficient data queries, each type of data table is designed to contain all dimensions supported by the service indices, and each data table stores service index data of only one statistical granularity. As a result, the SQL of data queries can be standardized and composed freely from a unified access data template, which meets the interaction requirement of flexible queries on mobile terminals and solves the technical problems in the related art that a traditional data warehouse has difficulty supporting big data processing, while big data technology is deficient in supporting traditional data applications.
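The idea of composing standardized SQL from a unified template over single-granularity tables can be sketched as follows. All table names, column names and the `%(start)s` placeholder style are hypothetical choices made for illustration; the application does not disclose the actual template, and the placeholder syntax depends on the database driver in use.

```python
# Hypothetical mapping from statistical granularity to its single table.
TABLE_BY_GRANULARITY = {
    "month": "inbound_month",
    "day": "inbound_day",
    "hour": "inbound_hour",
    "30min": "inbound_30min",
    "15min": "inbound_15min",
}
# Hypothetical dimension and measurement columns shared by every table.
DIMENSIONS = ("line_id", "station_id", "ticket_type")
METRICS = ("inbound_volume", "outbound_volume")


def build_query(granularity, dimensions, metrics):
    """Compose a single-table aggregation query from whitelisted parts."""
    if granularity not in TABLE_BY_GRANULARITY:
        raise ValueError(f"unknown granularity: {granularity}")
    unknown = [d for d in dimensions if d not in DIMENSIONS]
    unknown += [m for m in metrics if m not in METRICS]
    if unknown:
        raise ValueError(f"unknown columns: {unknown}")
    if not metrics:
        raise ValueError("at least one metric is required")
    columns = list(dimensions) + [f"sum({m}) AS total_{m}" for m in metrics]
    sql = (f"SELECT {', '.join(columns)} "
           f"FROM {TABLE_BY_GRANULARITY[granularity]} "
           "WHERE stat_time >= %(start)s AND stat_time < %(end)s")
    if dimensions:
        sql += " GROUP BY " + ", ".join(dimensions)
    return sql


sql = build_query("day", ["station_id"], ["inbound_volume"])
```

Because every table carries the full set of dimension columns, any subset of dimensions can be grouped on without a join, and whitelisting the identifiers keeps user-selected fields from being interpolated into the SQL unchecked.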
Example 2
According to an embodiment of the present application, there is further provided a rail transit data analysis system for implementing the above rail transit data analysis method, as shown in fig. 6, where the rail transit data analysis system at least includes a data acquisition module 61, a data storage management module 62, a data access module 63, and a service application module 64, where:
the data acquisition module 61 is configured to acquire historical data of the track traffic management and control system in a first time period and real-time data of the track traffic management and control system in a second time period, where the first time period is greater than the second time period;
the data storage management module 62 is configured to store historical data through an offline data warehouse and store real-time data through a distributed event stream platform, extract the historical data to obtain first business index data, and extract the real-time data to obtain second business index data;
a data access module 63, configured to periodically synchronize, by using the columnar database management system, the first service index data and the second service index data to obtain access data;
the service application module 64 is configured to obtain a data query instruction through a preset query interface, and display target access data corresponding to the data query instruction through the query interface based on the access data.
Specifically, fig. 7 is a schematic architecture diagram of another alternative rail transit data analysis system according to an embodiment of the present application. As shown in fig. 7, the data storage management module 62 in the embodiment of the present application is implemented on the offline data warehouse Hive and the distributed event stream platform Kafka; the data access module 63 is implemented by the columnar database management system Clickhouse and the memory-based key-value database Redis; and the query interface in the service application module 64 is a query interface corresponding to application 1, application 2 or application 3.
It should be noted that, each module in the track traffic data analysis system in the embodiment of the present application corresponds to each implementation step of the track traffic data analysis method in embodiment 1 one by one, and since detailed description has been made in embodiment 1, details not shown in part in this embodiment may refer to embodiment 1, and will not be repeated here.
Example 3
According to an embodiment of the present application, there is also provided a nonvolatile storage medium having a program stored therein, wherein, when the program runs, the device in which the nonvolatile storage medium is located is controlled to execute the rail transit data analysis method in embodiment 1.
Optionally, the device where the nonvolatile storage medium is located performs the following steps by running the program:
step S102, collecting historical data of a track traffic management and control system in a first time period and real-time data of the track traffic management and control system in a second time period, wherein the first time period is larger than the second time period;
step S104, storing historical data through an offline data warehouse and storing real-time data through a distributed event stream platform, extracting the historical data to obtain first business index data, and extracting the real-time data to obtain second business index data;
step S106, periodically synchronizing the first service index data and the second service index data through a column-type database management system to obtain access data;
step S108, acquiring a data query instruction through a preset query interface, and displaying target access data corresponding to the data query instruction through the query interface based on the access data.
According to an embodiment of the present application, there is also provided a processor for running a program, wherein the program executes the rail transit data analysis method in embodiment 1.
Optionally, the program execution realizes the following steps:
Step S102, collecting historical data of a track traffic management and control system in a first time period and real-time data of the track traffic management and control system in a second time period, wherein the first time period is larger than the second time period;
step S104, storing historical data through an offline data warehouse and storing real-time data through a distributed event stream platform, extracting the historical data to obtain first business index data, and extracting the real-time data to obtain second business index data;
step S106, periodically synchronizing the first service index data and the second service index data through a column-type database management system to obtain access data;
step S108, acquiring a data query instruction through a preset query interface, and displaying target access data corresponding to the data query instruction through the query interface based on the access data.
According to an embodiment of the present application, there is further provided an electronic device. Fig. 8 is a schematic structural diagram of an alternative electronic device according to an embodiment of the present application. As shown in fig. 8, the electronic device includes one or more processors, and a memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to carry out the rail transit data analysis method in embodiment 1 described above.
Optionally, the processor is configured to implement the following steps by computer program execution:
step S102, collecting historical data of a track traffic management and control system in a first time period and real-time data of the track traffic management and control system in a second time period, wherein the first time period is larger than the second time period;
step S104, storing historical data through an offline data warehouse and storing real-time data through a distributed event stream platform, extracting the historical data to obtain first business index data, and extracting the real-time data to obtain second business index data;
step S106, periodically synchronizing the first service index data and the second service index data through a column-type database management system to obtain access data;
step S108, acquiring a data query instruction through a preset query interface, and displaying target access data corresponding to the data query instruction through the query interface based on the access data.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
In the foregoing embodiments of the present application, the description of each embodiment has its own emphasis; for a portion that is not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technology may be implemented in other manners. The above-described embodiments of the apparatus are merely exemplary, and the division of units may be a logic function division, and there may be another division manner in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some interfaces, units or modules, or may be in electrical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server or a network device, etc.) to perform all or part of the steps of the method of the various embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other various media capable of storing program codes.
The foregoing is merely a preferred embodiment of the present application. It should be noted that those skilled in the art may make modifications and refinements without departing from the principles of the present application, and such modifications and refinements shall also fall within the scope of protection of the present application.

Claims (10)

1. A method of analyzing rail transit data, comprising:
collecting historical data of a track traffic management and control system in a first time period and real-time data of the track traffic management and control system in a second time period, wherein the first time period is larger than the second time period;
storing the historical data through an offline data warehouse and storing the real-time data through a distributed event stream platform, extracting the historical data to obtain first business index data, and extracting the real-time data to obtain second business index data;
periodically synchronizing the first business index data and the second business index data through a column-type database management system to obtain access data;
acquiring a data query instruction through a preset query interface, and displaying target access data corresponding to the data query instruction through the query interface based on the access data.
2. The method of claim 1, wherein storing the historical data via an offline data warehouse and storing the real-time data via a distributed event stream platform comprises:
preprocessing the historical data and the real-time data respectively to obtain the processed historical data and the processed real-time data;
and storing the processed historical data through the offline data warehouse, and storing the processed real-time data through the distributed event stream platform.
3. The method of claim 2, wherein extracting the historical data to obtain first business index data comprises:
dividing the historical data stored in the offline data warehouse according to a preset service classification standard to obtain historical service data corresponding to multiple types of track traffic services, wherein the historical service data corresponding to the multiple types of track traffic services comprises at least one of the following: passenger flow data, clearing data, service data, emergency data, ticket data, asset data, security data and basic data;
extracting historical service data corresponding to various track traffic services to obtain first service index data corresponding to each track traffic service, wherein the first service index data comprises at least one of the following: basic index class data, passenger flow information class data, train operation class data, equipment information class data, clearing information class data, ticket information class data and service information class data.
4. The method of claim 1, wherein periodically synchronizing the first business index data and the second business index data by a columnar database management system to obtain access data comprises:
periodically synchronizing the first service index data and the second service index data to the column database management system according to a preset synchronization rule to obtain the access data, and generating an access data report for storing the access data, wherein the preset synchronization rule comprises at least one of the following: a first synchronization rule based on the statistical granularity of the business index, a second synchronization rule based on the time dimension of the data statistics, a third synchronization rule based on the space dimension of the data statistics, and a fourth synchronization rule based on the access frequency of the data access.
5. The method according to claim 1, wherein the method further comprises: storing the queried historical access data for a third time period in a memory-based key-value database, wherein the third time period is greater than the second time period.
6. The method of claim 1, wherein acquiring a data query instruction through a preset query interface, and displaying target access data corresponding to the data query instruction through the query interface based on the access data, comprises:
acquiring the data query instruction through the preset query interface, wherein the query interface comprises at least one of the following: a fusion analysis service interface, a passenger flow prediction service interface and a passenger portrait service interface;
accessing the access data through Java database connection, and displaying the target access data corresponding to the data query instruction through the query interface.
7. The method according to claim 1, wherein the track traffic management and control system comprises at least one of the following: a road network command and dispatch system, an automatic ticket selling and checking system, a ticket management system, an information technology service management system, an asset management system and a security system.
8. A rail transit data analysis system, comprising:
the data acquisition module is used for acquiring historical data of the track traffic management and control system in a first time period and real-time data of the track traffic management and control system in a second time period, wherein the first time period is greater than the second time period;
the data storage management module is used for storing the historical data through an offline data warehouse and storing the real-time data through a distributed event stream platform, extracting the historical data to obtain first business index data, and extracting the real-time data to obtain second business index data;
The data access module is used for periodically synchronizing the first service index data and the second service index data through a column-type database management system to obtain access data;
the business application module is used for acquiring a data query instruction through a preset query interface and displaying target access data corresponding to the data query instruction through the query interface based on the access data.
9. A non-volatile storage medium, characterized in that the non-volatile storage medium comprises a stored program, wherein a device in which the non-volatile storage medium is located performs the rail transit data analysis method according to any one of claims 1 to 7 by running the program.
10. An electronic device, comprising: a memory and a processor, wherein the memory has stored therein a computer program, the processor being configured to perform the rail transit data analysis method of any one of claims 1-7 by the computer program.
CN202311068956.0A 2023-08-23 2023-08-23 Rail transit data analysis method, system, storage medium and electronic equipment Pending CN117093617A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311068956.0A CN117093617A (en) 2023-08-23 2023-08-23 Rail transit data analysis method, system, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311068956.0A CN117093617A (en) 2023-08-23 2023-08-23 Rail transit data analysis method, system, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN117093617A true CN117093617A (en) 2023-11-21

Family

ID=88776520

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311068956.0A Pending CN117093617A (en) 2023-08-23 2023-08-23 Rail transit data analysis method, system, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN117093617A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117993695A (en) * 2024-04-07 2024-05-07 江苏金恒信息科技股份有限公司 Regulation and control method for realizing integrated calculation of flow batch under industrial data management
CN117993695B (en) * 2024-04-07 2024-06-04 江苏金恒信息科技股份有限公司 Regulation and control method for realizing integrated calculation of flow batch under industrial data management

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination