CN115952173A - Passenger flow data processing method and device, big data platform and storage medium - Google Patents

Info

Publication number
CN115952173A
Authority
CN
China
Prior art keywords
passenger flow
data
target
inbound
afc
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310232929.6A
Other languages
Chinese (zh)
Inventor
王雨
刘军
张�杰
张波
张晚秋
李擎
王舟帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CRSC Research and Design Institute Group Co Ltd
Original Assignee
CRSC Research and Design Institute Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CRSC Research and Design Institute Group Co Ltd filed Critical CRSC Research and Design Institute Group Co Ltd
Priority to CN202310232929.6A priority Critical patent/CN115952173A/en
Publication of CN115952173A publication Critical patent/CN115952173A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a passenger flow data processing method and device, a big data platform and a storage medium. The method comprises: acquiring AFC passenger flow data from an automatic fare collection (AFC) system; extracting target inbound data from the AFC passenger flow data and generating an inbound passenger flow table from the target inbound data; extracting target outbound data from the AFC passenger flow data and generating an outbound passenger flow table from the target outbound data; and generating a passenger flow statistical table based on the inbound passenger flow table and the outbound passenger flow table. With this method, the big data platform quickly extracts the AFC passenger flow data and generates a passenger flow statistical table that can be used by data applications. This addresses the slow, inefficient generation of data tables and the difficulty of mining the data, improves the efficiency of passenger flow statistics and analysis, and supports rail transit big data applications.

Description

Passenger flow data processing method and device, big data platform and storage medium
Technical Field
The invention relates to the technical field of big data, in particular to a passenger flow data processing method and device, a big data platform and a storage medium.
Background
When passengers travel by rail transit using a smart card, the automatic fare collection (AFC) system of urban rail transit is generally involved. When a passenger swipes the smart card, a terminal electronic reader of the AFC system obtains the passenger's detailed information and generates regional rail transit passenger flow data, namely AFC data.
In the related art, the main flow for processing AFC data is simply to count the numbers of inbound and outbound passengers on a computer and then generate the relevant tables. This approach is slow and inefficient, and it can neither process large amounts of data in real time nor mine the data in depth.
Disclosure of Invention
The invention provides a passenger flow data processing method and device, a big data platform and a storage medium, which solve the problem that AFC passenger flow data cannot be processed efficiently in real time.
According to one aspect of the invention, a passenger flow data processing method is provided, which is applied to a big data platform and comprises the following steps:
acquiring AFC passenger flow data from an automatic fare collection (AFC) system;
extracting target inbound data from the AFC passenger flow data, and generating an inbound passenger flow table according to the target inbound data;
extracting target outbound data from the AFC passenger flow data, and generating an outbound passenger flow table according to the target outbound data;
and generating a passenger flow statistical table based on the inbound passenger flow table and the outbound passenger flow table.
According to another aspect of the present invention, there is provided a passenger flow data processing apparatus applied to a big data platform, the apparatus comprising:
the passenger flow data acquisition module is used for acquiring AFC passenger flow data from an automatic fare collection (AFC) system;
the inbound passenger flow table generating module is used for extracting target inbound data from the AFC passenger flow data and generating an inbound passenger flow table according to the target inbound data;
the outbound passenger flow table generating module is used for extracting target outbound data from the AFC passenger flow data and generating an outbound passenger flow table according to the target outbound data;
and the passenger flow statistical table generating module is used for generating a passenger flow statistical table based on the inbound passenger flow table and the outbound passenger flow table.
According to another aspect of the present invention, there is provided a big data platform, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to enable the at least one processor to perform a passenger flow data processing method according to any of the embodiments of the invention.
According to another aspect of the present invention, a computer-readable storage medium is provided, which stores computer instructions for causing a processor to implement a passenger flow data processing method according to any one of the embodiments of the present invention when executed.
According to the technical scheme of the embodiments of the invention, AFC passenger flow data is acquired from an automatic fare collection (AFC) system; target inbound data is extracted from the AFC passenger flow data and an inbound passenger flow table is generated from it; target outbound data is extracted from the AFC passenger flow data and an outbound passenger flow table is generated from it; and data of the corresponding fields is extracted into a passenger flow statistical table based on the inbound passenger flow table and the outbound passenger flow table. In this way, the big data platform quickly extracts the AFC passenger flow data and generates a passenger flow statistical table that can be used by data applications, which solves the problems of slow and inefficient data table generation, improves the efficiency of passenger flow statistics and analysis, and supports rail transit big data applications.
It should be understood that the statements in this section are not intended to identify key or critical features of the embodiments of the present invention, nor are they intended to limit the scope of the invention. Other features of the present invention will become apparent from the following description.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a passenger flow data processing method according to an embodiment of the present invention;
FIG. 2 is a flowchart of OD pairing for generating a passenger flow statistical table, applicable to an embodiment of the present invention;
FIG. 3 is a flowchart of a passenger flow data processing method according to a second embodiment of the present invention;
FIG. 4 is a comparison of AFC passenger flow data before and after format conversion according to the second embodiment of the present invention;
FIG. 5 is a flowchart of AFC passenger flow data cleaning according to the second embodiment of the present invention;
FIG. 6 is a schematic diagram of the cleaned AFC passenger flow data according to an embodiment of the present invention;
FIG. 7 is a diagram showing the target inbound data according to the second embodiment of the present invention;
FIG. 8 is a diagram showing the target outbound data according to the second embodiment of the present invention;
FIG. 9 is a flowchart of AFC passenger flow data processing, applicable to an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of a passenger flow data processing device according to a third embodiment of the present invention;
FIG. 11 is a schematic structural diagram of an electronic device implementing the passenger flow data processing method according to an embodiment of the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solutions of the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Moreover, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Embodiment One
FIG. 1 is a flowchart of a passenger flow data processing method according to an embodiment of the present invention. This embodiment is applicable to processing AFC passenger flow data. The method may be executed by a passenger flow data processing device, which may be implemented in hardware and/or software and may be configured in any big data platform with a communication function. As shown in FIG. 1, the method includes:
S110, acquiring AFC passenger flow data from the automatic fare collection (AFC) system.
The passenger flow data may be data generated by one or more AFC systems when passengers travel by rail transit, and may include information such as the smart card ID, transaction date, transaction time, ticket type, inbound station, outbound station, transaction device number, balance before the transaction, balance after the transaction, and amount spent.
Specifically, the passenger flow data is acquired from the AFC system by establishing an index condition.
Illustratively, AFC data is acquired in batches from rail transit operating enterprises, and the data is split and format-converted to produce data consistent with the example attribute data in Table 1. Table 1 is the structure table of the AFC passenger flow data; its example fields can be checked against the example contents of the AFC passenger flow data.
Table 1: structure table of AFC passenger flow volume data
Figure SMS_1
For example, because the volume of AFC passenger flow data is large, the big data platform that processes, analyzes and mines it may be built on the Hadoop framework and the Spark big data computing framework. By function, the big data platform can be divided into a five-layer architecture: a data source layer, a data storage layer, a resource management layer, a data calculation layer and a data service layer. The data storage layer, the resource management layer and the data calculation layer are realized by building a Hive on Spark parallel computing architecture, and the data source layer is a Shell script written according to the data provider's actual situation and the specific scenario.
S120, extracting target inbound data from the AFC passenger flow data, and generating an inbound passenger flow table according to the target inbound data.
The inbound passenger flow table may be a two-dimensional table for storing the processed target inbound data. The target inbound data may be data generated when passengers enter a station, and may be divided into several types of attribute information according to fields such as the inbound time. For example, passenger flow data containing one or more target inbound stations may be extracted from the AFC passenger flow data as the target inbound data, or passenger flow data whose inbound time falls within a preset time interval may be extracted from the AFC passenger flow data as the target inbound data.
Specifically, the target inbound data is extracted from the AFC passenger flow data and stored as text in CSV or TXT format. The data is split line by line using a separator such as a comma, which preliminarily yields target inbound data with different attribute information, that is, different columns. An inbound passenger flow table is created according to the attribute information, and the target inbound data is stored in it with the fields of the target inbound data matched to the fields of the inbound passenger flow table.
For example, AFC passenger flow data is obtained in bulk from a rail transit operating enterprise. The AFC passenger flow data may be an offline data packet consisting of text files in two formats, CSV and TXT, which need to be tidied first. For a CSV file, for example, the fields of each line are separated by an English comma ",", each line ends with a CRLF line break, and the file encoding is converted to UTF-8. A line of the tidied CSV file may look like: 010125503,20180901,175106,0104,52.45,4918,01044101,2018/9/118. The target inbound data is then extracted and the inbound passenger flow table is obtained.
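The tidying of a CSV file described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the helper names `normalize_line_endings` and `split_fields` are hypothetical, and no meaning is assigned to the individual fields because Table 1 is not reproduced in the text.

```python
from typing import List


def normalize_line_endings(text: str) -> str:
    """Return the text with blank lines dropped and every record ending in CRLF."""
    # str.splitlines() splits on \n, \r\n and \r and drops the terminators.
    records = [line.strip() for line in text.splitlines() if line.strip()]
    return "".join(record + "\r\n" for record in records)


def split_fields(record: str) -> List[str]:
    """Split one record on the English comma used as the field separator."""
    return [field.strip() for field in record.split(",")]


# Sample record quoted in the text above; field meanings are not assumed here.
raw_text = "010125503,20180901,175106,0104,52.45,4918,01044101,2018/9/118\n"
tidied = normalize_line_endings(raw_text)
rows = [split_fields(line) for line in tidied.splitlines()]
utf8_bytes = tidied.encode("utf-8")  # the tidied file would be written as UTF-8
```

Each element of `rows` is one record split into its comma-separated fields, ready to be loaded into a table whose columns match those fields.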
S130, extracting target outbound data from the AFC passenger flow data, and generating an outbound passenger flow table according to the target outbound data.
The outbound passenger flow table may be a two-dimensional table for storing the processed target outbound data. The target outbound data may be data generated when passengers exit a station, and may be divided into different attribute information according to fields such as the outbound time. For example, passenger flow data containing one or more target outbound stations may be extracted from the AFC passenger flow data as the target outbound data, or passenger flow data whose outbound time falls within a preset time interval may be extracted from the AFC passenger flow data as the target outbound data.
Specifically, analogously to step S120, the target outbound data is extracted from the AFC passenger flow data and stored as text in CSV or TXT format. The data is split line by line using a separator such as a comma, which preliminarily yields target outbound data with different attribute information, that is, different columns. The attribute information corresponds to the keyword information of the outbound passenger flow table. The outbound passenger flow table is created, and the target outbound data is stored in it line by line according to the keyword information.
S140, generating a passenger flow statistical table based on the inbound passenger flow table and the outbound passenger flow table.
Specifically, the inbound passenger flow table and the outbound passenger flow table are obtained through steps S120 and S130, a screening condition is specified, the data corresponding to the fields of the passenger flow statistical table is extracted, and the passenger flow statistical table is generated.
Optionally, the passenger flow statistical table includes a time section passenger flow statistical table.
The time section passenger flow statistical table may be a statistical table formed from passenger flow data whose inbound time or outbound time falls within a preset time period. Whether a passenger flow record belongs to the preset time period can be determined from whether its inbound or outbound time falls within that period.
As an optional but non-limiting implementation, generating a passenger flow statistical table based on the inbound passenger flow table and the outbound passenger flow table may include: extracting, from the inbound passenger flow table and the outbound passenger flow table respectively, first target passenger flow data whose inbound time and outbound time are within a preset time interval, and generating the time section passenger flow statistical table according to the first target passenger flow data.
Specifically, passenger flow data whose inbound time is within the preset time interval is extracted from the inbound passenger flow table, and passenger flow data whose outbound time is within the preset time interval is extracted from the outbound passenger flow table. Together these form the first target passenger flow data, which is stored in a pre-established statistical table to generate the time section passenger flow statistical table.
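As a rough Python sketch of this time-section extraction: the record layout (dictionaries with `id`, `station` and `time` keys), the timestamp format, the half-open interval and the sample records are all assumptions made for illustration and do not come from the patent.

```python
from datetime import datetime
from typing import Dict, List

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout


def in_interval(timestamp: str, start: str, end: str) -> bool:
    """True when start <= timestamp < end (half-open interval, an assumption)."""
    moment = datetime.strptime(timestamp, TIME_FORMAT)
    return datetime.strptime(start, TIME_FORMAT) <= moment < datetime.strptime(end, TIME_FORMAT)


def time_section_table(inbound: List[Dict[str, str]], outbound: List[Dict[str, str]],
                       start: str, end: str) -> List[Dict[str, str]]:
    """Collect inbound and outbound records whose time falls in the interval."""
    rows = []
    for direction, table in (("inbound", inbound), ("outbound", outbound)):
        for record in table:
            if in_interval(record["time"], start, end):
                rows.append({"direction": direction, **record})
    return rows


# Hypothetical sample records.
inbound = [
    {"id": "A", "station": "0104", "time": "2018-09-01 07:55:00"},
    {"id": "B", "station": "0105", "time": "2018-09-01 08:10:00"},
]
outbound = [
    {"id": "A", "station": "0110", "time": "2018-09-01 08:20:00"},
    {"id": "B", "station": "0111", "time": "2018-09-01 09:00:00"},
]
section = time_section_table(inbound, outbound,
                             "2018-09-01 08:00:00", "2018-09-01 09:00:00")
```

With these sample records, only the 08:10 entry and the 08:20 exit fall inside the 08:00 to 09:00 window; the 09:00 exit is excluded because the interval is treated as half-open.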
Optionally, the passenger flow statistics table includes an interval section passenger flow statistics table.
The interval section passenger flow statistical table may be a statistical table composed of inbound and outbound passenger flow data within a certain station interval. It is understood that the interval section is a part of a certain station interval. The table is used to count the passenger flow data generated within a preset running interval of the train.
As an optional but non-limiting implementation, generating a passenger flow statistical table based on the inbound passenger flow table and the outbound passenger flow table may include: extracting, from the inbound passenger flow table and the outbound passenger flow table respectively, second target passenger flow data whose inbound and outbound stations are within a preset station interval, and generating the interval section passenger flow statistical table according to the second target passenger flow data.
Specifically, passenger flow data whose station is within the preset station interval is extracted from the inbound passenger flow table, and passenger flow data whose station is within the preset station interval is extracted from the outbound passenger flow table. The two parts together form the second target passenger flow data, which is stored in a pre-established statistical table to generate the interval section passenger flow statistical table.
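One possible reading of this extraction, sketched in Python: records are kept when their station belongs to a preset set of stations, and the kept records are counted per station and direction. The function name `section_flow`, the `station` key and the station numbers are hypothetical, and representing a station interval as a set of station numbers is an assumption.

```python
from collections import Counter
from typing import Dict, List, Set


def section_flow(inbound: List[Dict[str, str]], outbound: List[Dict[str, str]],
                 section_stations: Set[str]) -> Counter:
    """Count records per (station, direction) for stations inside the interval."""
    counts: Counter = Counter()
    for direction, table in (("inbound", inbound), ("outbound", outbound)):
        for record in table:
            if record["station"] in section_stations:
                counts[(record["station"], direction)] += 1
    return counts


# Hypothetical sample records and station interval.
inbound = [{"station": "0104"}, {"station": "0105"}, {"station": "0120"}]
outbound = [{"station": "0105"}, {"station": "0121"}]
flow = section_flow(inbound, outbound, {"0104", "0105"})
```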
Illustratively, Table 2 is the structure table of the interval section passenger flow table, which stores the following fields: data date, section number, inbound station number, outbound station number, start time, end time, and passenger flow.
Table 2: structure table of the interval section passenger flow table
Figure SMS_2
Optionally, the passenger flow statistical table includes an origin-destination (OD) passenger flow statistical table.
The OD passenger flow statistical table may be a two-dimensional table formed by counting the passenger flow data whose inbound station number is a preset inbound station and whose outbound station number is a preset outbound station.
As an optional but non-limiting implementation, generating a passenger flow statistical table based on the inbound passenger flow table and the outbound passenger flow table may include: extracting, from the inbound passenger flow table and the outbound passenger flow table respectively, third target passenger flow data whose inbound station is a target inbound station and whose outbound station is a target outbound station, and generating the OD passenger flow statistical table according to the third target passenger flow data.
Specifically, a mapping relationship between the inbound passenger flow table and the outbound passenger flow table is established according to the following conditions: the smart card IDs in the two tables are consistent; the balances before the transaction in the two tables are consistent; the information receiving time in the inbound passenger flow table is earlier than that in the outbound passenger flow table; and the station information in the two tables is different. The third target passenger flow data is then extracted from the tables linked by this mapping relationship and stored in a statistical table to generate the OD passenger flow statistical table. The third target passenger flow data may be the passenger flow data, extracted from the inbound passenger flow table and the outbound passenger flow table, whose inbound station is the target inbound station and whose outbound station is the target outbound station.
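The four pairing conditions can be sketched in Python as follows. This is only an illustration: the key names reuse the column names of the Hive tables that appear later in this document (`id`, `station`, `recivetime`, `beforebanlance`), the timestamps are assumed to be zero-padded strings that sort chronologically, the sample records are invented, and the policy of pairing each entry with the earliest matching exit is an assumption the text does not state.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def pair_od(inbound: List[Dict[str, str]],
            outbound: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Pair each entry with the earliest unused exit that meets all four conditions."""
    exits_by_card = defaultdict(list)
    for record in sorted(outbound, key=lambda r: r["recivetime"]):
        exits_by_card[record["id"]].append(record)  # condition 1: same card ID

    pairs = []
    for entry in sorted(inbound, key=lambda r: r["recivetime"]):
        candidates = exits_by_card[entry["id"]]
        for index, exit_record in enumerate(candidates):
            if (exit_record["beforebanlance"] == entry["beforebanlance"]  # condition 2
                    and exit_record["recivetime"] > entry["recivetime"]   # condition 3
                    and exit_record["station"] != entry["station"]):      # condition 4
                pairs.append((entry["station"], exit_record["station"]))
                del candidates[index]  # each exit is paired at most once
                break
    return pairs


# Hypothetical sample records.
inbound = [
    {"id": "A", "station": "0104", "recivetime": "2018-09-01 08:00:00", "beforebanlance": "52.45"},
    {"id": "B", "station": "0104", "recivetime": "2018-09-01 08:05:00", "beforebanlance": "10.00"},
    {"id": "C", "station": "0105", "recivetime": "2018-09-01 08:10:00", "beforebanlance": "7.00"},
    {"id": "D", "station": "0104", "recivetime": "2018-09-01 08:12:00", "beforebanlance": "20.00"},
]
outbound = [
    {"id": "A", "station": "0110", "recivetime": "2018-09-01 08:30:00", "beforebanlance": "52.45"},
    {"id": "B", "station": "0104", "recivetime": "2018-09-01 08:40:00", "beforebanlance": "10.00"},
    {"id": "D", "station": "0110", "recivetime": "2018-09-01 08:45:00", "beforebanlance": "20.00"},
]
od_counts = Counter(pair_od(inbound, outbound))
```

In this sample, card B is dropped because it exits at the station where it entered, and card C has no exit record, so `od_counts` holds a passenger flow of 2 for the pair ("0104", "0110").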
For example, the table structure of the OD passenger flow statistical table is shown in Table 3, and its fields may include: data date, inbound station number, outbound station number, and passenger flow volume.
Table 3: structure table of OD passenger flow statistical table
Figure SMS_3
For example, FIG. 2 is a flowchart of OD pairing. As shown in FIG. 2, the OD pairing operation generates the passenger flow statistical table, whose data can be applied to analysis and calculation work such as passenger flow distribution, passenger flow prediction, and train operation strategy maintenance.
The embodiment of the invention provides a passenger flow data processing method in which AFC passenger flow data is acquired from an automatic fare collection (AFC) system; target inbound data is extracted from the AFC passenger flow data and an inbound passenger flow table is generated from it; target outbound data is extracted from the AFC passenger flow data and an outbound passenger flow table is generated from it; and data of the corresponding fields is extracted into a passenger flow statistical table based on the inbound passenger flow table and the outbound passenger flow table. In this way, the big data platform quickly extracts the AFC passenger flow data and generates a passenger flow statistical table that can be used by data applications, which solves the problems of slow and inefficient data table generation, improves the efficiency of passenger flow statistics and analysis, and supports rail transit big data applications.
Embodiment Two
FIG. 3 is a flowchart of a passenger flow data processing method according to a second embodiment of the present invention. On the basis of the foregoing embodiment, this embodiment describes the application flow of the passenger flow data processing method in detail. As shown in FIG. 3, the method includes:
S210, acquiring AFC passenger flow data from the automatic fare collection (AFC) system.
Specifically, the passenger flow data is acquired from the AFC system by establishing an index condition.
S220, converting the format of the AFC passenger flow data according to a preset data format, and storing the format-converted AFC passenger flow data in the Hadoop Distributed File System (HDFS).
HDFS may be a simplified system that supports large-scale file storage and is suitable for data backup.
Specifically, the AFC passenger flow data is converted according to a preset data encoding format, and the AFC passenger flow data in the preset data format is stored in HDFS.
Illustratively, a connection between the big data platform and the AFC system may be established through a secure file transfer protocol; the two sides send the corresponding commands to each other, and the AFC passenger flow data is transmitted to the big data platform. Relying on the strong processing performance of the big data platform, the AFC passenger flow data is converted to UTF-8. FIG. 4 compares the AFC passenger flow data before and after format conversion. As shown in FIG. 4, the conversion eliminates garbled characters, thereby avoiding garbled Chinese text in the AFC passenger flow data, and the AFC passenger flow data in UTF-8 format is stored in HDFS.
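The effect of the encoding conversion can be illustrated with a short Python sketch. The source encoding is not named in the text; GBK is assumed here purely for illustration, and the helper name `to_utf8` is hypothetical.

```python
def to_utf8(raw: bytes, source_encoding: str = "gbk") -> bytes:
    """Decode bytes from the assumed source encoding and re-encode them as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# A Chinese field value as it might arrive in a legacy encoding (assumed GBK).
legacy_bytes = "进站".encode("gbk")

# Reading the legacy bytes as if they were UTF-8 produces garbled text.
garbled = legacy_bytes.decode("utf-8", errors="replace")

# Converting first keeps the text intact when it is later read as UTF-8.
converted = to_utf8(legacy_bytes)
readable = converted.decode("utf-8")
```

The garbled string is what a UTF-8 reader sees when the conversion step is skipped, which is the kind of problem the comparison in FIG. 4 is described as showing.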
S230, extracting target inbound data from the AFC passenger flow data, and generating an inbound passenger flow table according to the target inbound data.
Specifically, target inbound data is extracted from the AFC passenger flow data according to preset conditions, an inbound passenger flow table is created according to the field types of the target inbound data, and the target inbound data is stored in the inbound passenger flow table.
As an optional but non-limiting implementation, extracting target inbound data from the AFC passenger flow data may include:
reading the format-converted AFC passenger flow data from HDFS, and extracting target inbound data from the format-converted AFC passenger flow data.
Specifically, the AFC passenger flow data in UTF-8 format is read from HDFS, and the target inbound data is extracted from it.
As an optional but non-limiting implementation, reading the format-converted AFC passenger flow data from HDFS and extracting target inbound data from it may include steps A1 to A2:
Step A1, reading the format-converted AFC passenger flow data from HDFS, and filtering out the abnormal passenger flow data in it to generate target passenger flow data; the abnormal passenger flow data is passenger flow data that meets preset abnormal conditions.
Specifically, after ensuring that the AFC passenger flow data has been completely converted to UTF-8, each field of the UTF-8 AFC passenger flow data is split according to a field separator and each line of data is split by a line separator. Custom functions for filtering special symbols and for converting the time format are then written to filter out the data that meets the preset abnormal conditions and generate the target passenger flow data.
Exemplarily, FIG. 5 is a flowchart of AFC passenger flow data cleaning. As shown in FIG. 5, it must first be ensured that the AFC passenger flow data has been completely converted to UTF-8. An initial raw-data table oct is then created in Hive, in which the fields of the AFC passenger flow data are separated by a separator such as "," and the lines are separated by \n. The HiveQL command that implements this is applied to the data in HDFS, abnormal preset characters are filtered out of the AFC passenger flow data, the time format is converted, and the necessary fields are extracted to complete the cleaning. The HiveQL command for creating the table oct is as follows:
create table if not exists oct(
id string,
days string,
`time` string,
kind string,
action string,
station string,
beforebanlance string,
money string,
afterbanlance string,
transfer string,
counter string,
passageid string,
recivetime string)
row format delimited fields terminated by ','
lines terminated by '\n' stored as textfile;
Because the built-in Hive functions cannot filter specific characters, Hive user-defined functions (UDFs) need to be written and called in HiveQL to filter special symbols. In a specific application, a UDF clean_data is written to handle special symbols by removing colons, and a UDF clean_date is written to convert the time format. The programs are packaged into a jar, imported into Hive, and called in HiveQL as functions. The data in the initial raw-data table oct in Hive is filtered and extracted into the table month_10 with a select statement. The code that implements the filtering of the preset abnormal conditions is as follows:
create table if not exists month_10
as select
clean_data(oct.id) as id,
clean_data(oct.days) as days,
clean_data(oct.`time`) as `time`,
clean_data(oct.kind) as kind,
clean_data(oct.action) as action,
clean_data(oct.station) as station,
clean_data(oct.beforebanlance) as beforebanlance,
clean_data(oct.money) as money,
clean_data(oct.afterbanlance) as afterbanlance,
clean_data(oct.transfer) as transfer,
clean_data(oct.counter) as counter,
clean_data(oct.passageid) as passageid,
clean_date(oct.recivetime) as recivetime
from oct;
Executing this HiveQL statement generates the table month_10 and completes the data filtering operation. Part of the content of the table month_10 is shown in FIG. 6; month_10 is the actual base data that can be used for OD matching.
A2, extracting target inbound data from the target passenger flow data.
Specifically, the target inbound data is obtained by screening the target passenger flow data and extracting the corresponding fields.
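The behavior of the two custom functions can be sketched in Python as follows. The source does not give their bodies, so this is only an illustration under assumptions: the set of special symbols is reduced to the colon mentioned in the text, and the input and output time formats (`%Y%m%d%H%M%S` and `%Y-%m-%d %H:%M:%S`) are hypothetical defaults.

```python
from datetime import datetime


def clean_data(value):
    """Remove colons and surrounding whitespace from a field value."""
    if value is None:
        return None
    return value.replace(":", "").strip()


def clean_date(value, in_fmt="%Y%m%d%H%M%S", out_fmt="%Y-%m-%d %H:%M:%S"):
    """Convert a timestamp string from in_fmt to out_fmt.

    Both formats are assumptions of this sketch. Unparseable input
    yields None so that the row can be filtered out later.
    """
    if value is None:
        return None
    try:
        parsed = datetime.strptime(value.strip(), in_fmt)
    except ValueError:
        return None
    return parsed.strftime(out_fmt)
```

Returning None for bad input mirrors how a Hive UDF would typically signal a value that cannot be converted, which lets a later filter drop the record.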
Illustratively, target inbound data is screened from the target passenger flow data, and the fields corresponding to the target inbound data are extracted. For Hive execution efficiency, the two operations are written as two separate statements: a WHERE clause first selects the inbound records, and the result is then grouped and counted by inbound station number to obtain the target inbound data. The code implementing these operations is as follows:
create table if not exists month_10_in
as SELECT id, kind, station, recivetime, beforebanlance FROM month_10 WHERE action = "inbound" and (kind = "0 liju card" or kind = "98 one-way ticket");
create table if not exists in_flow as select station as in_station,count(*) as passenger from month_10_in group by station;
Executing the above code generates the target inbound data month_10_in; part of the data of month_10_in is shown in FIG. 7.
Optionally, the preset abnormal condition includes at least one of: missing preset field data, a data format error, garbled Chinese characters, and a preset data logic error.
Specifically, the preset abnormal conditions are as follows. Missing preset field data means the data is missing or incomplete, for example empty or Null values. A data format error means the format of the AFC passenger flow data does not conform to the format of AFC-generated data. Garbled Chinese characters can be resolved by uniformly converting the AFC passenger flow data to UTF-8 during format conversion. A preset data logic error may be, for example, one inbound record in the AFC passenger flow data table corresponding to multiple outbound records.
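The same two-step logic, a filter followed by a grouped count, can be sketched in Python. The function names are hypothetical; the ticket-kind literals are copied from the statements in this section, and rows are assumed to be dicts keyed by the table's column names.

```python
from collections import Counter

# Ticket kinds accepted by the filter, copied from the HiveQL statements.
ALLOWED_KINDS = {"0 liju card", "98 one-way ticket"}


def select_by_action(rows, action):
    """Keep rows with the given action and an accepted ticket kind."""
    return [
        r for r in rows
        if r["action"] == action and r["kind"] in ALLOWED_KINDS
    ]


def count_by_station(rows):
    """Count passengers per station, like GROUP BY station with count(*)."""
    return Counter(r["station"] for r in rows)
```

Calling `select_by_action(rows, "outbound")` gives the corresponding outbound selection, so one helper covers both directions.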
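The four abnormal conditions can be expressed as simple checks. The sketch below is an assumption-laden illustration rather than the patent's filter: the required field list, the time format, and the use of the Unicode replacement character U+FFFD as a sign of garbled text are all choices made here, and both function names are hypothetical.

```python
from datetime import datetime

# Fields treated as mandatory in this sketch (an assumption).
REQUIRED_FIELDS = ("id", "kind", "action", "station", "recivetime")


def is_abnormal(row, time_fmt="%Y-%m-%d %H:%M:%S"):
    """Return True if a single record meets a per-row abnormal condition."""
    # 1. Missing preset field data: absent, empty, or the literal Null.
    for name in REQUIRED_FIELDS:
        value = row.get(name)
        if value is None or value.strip() == "" or value.strip().lower() == "null":
            return True
    # 2. Data format error: the receive time does not parse.
    try:
        datetime.strptime(row["recivetime"], time_fmt)
    except ValueError:
        return True
    # 3. Garbled characters: U+FFFD marks bytes that failed to decode.
    if any("\ufffd" in v for v in row.values() if isinstance(v, str)):
        return True
    return False


def inbounds_with_multiple_outbounds(od_pairs):
    """4. Logic error: inbound keys matched to more than one outbound key."""
    matched = {}
    for in_key, out_key in od_pairs:
        matched.setdefault(in_key, set()).add(out_key)
    return {k for k, outs in matched.items() if len(outs) > 1}
```

The first three checks work on one record at a time, while the logic-error check needs matched inbound and outbound pairs, which is why it is a separate function.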
S240, extracting target outbound data from the AFC passenger flow data, and generating an outbound passenger flow table according to the target outbound data.
Specifically, the target outbound data is obtained by screening the target passenger flow data and extracting the corresponding fields.
Illustratively, target outbound data is screened from the target passenger flow data, and the fields corresponding to the target outbound data are extracted. The two operations can be written as two separate statements: a WHERE clause selects the outbound records, and the result is then grouped and counted by outbound station to obtain the target outbound data. The code implementing these operations is as follows:
create table if not exists month_10_out
as SELECT id, kind, station, recivetime, afterbanlance FROM month_10 WHERE action = "outbound" and (kind = "0 liju card" or kind = "98 one-way ticket");
create table if not exists out_flow as select station as out_station,count(*) as passenger from month_10_out group by station;
S250, generating a passenger flow statistical table based on the inbound passenger flow table and the outbound passenger flow table.
Specifically, the inbound passenger flow table and the outbound passenger flow table are used as data sources, and the passenger flow statistical table can be generated by specifying the screening conditions of the passenger flow statistical table.
For example, the screening conditions of the passenger flow statistical table may be set as follows: the smart card ID is the same in the inbound passenger flow table and the outbound passenger flow table; the balance before the transaction is the same in both tables; the receive time in the inbound passenger flow table is earlier than the receive time in the outbound passenger flow table; and the station in the inbound passenger flow table differs from the station in the outbound passenger flow table. The data of the passenger flow statistical table is extracted from the mapping between the inbound passenger flow table and the outbound passenger flow table to generate the passenger flow statistical table. The statement that completes these steps is:
create table if not exists month_10_od
as SELECT
month_10_in.id as in_id,
month_10_in.station as in_station,
month_10_in.recivetime as in_time,
month_10_out.station as out_station,
month_10_out.recivetime as out_time
FROM month_10_in,month_10_out
WHERE month_10_in.id==month_10_out.id
AND
month_10_in.beforebanlance==month_10_out.beforebanlance
AND month_10_in.recivetime < month_10_out.recivetime
AND month_10_in.station != month_10_out.station;
Executing the above statement generates the passenger flow statistical table month_10_od; part of the data of month_10_od is shown in FIG. 8.
In this embodiment of the present application, FIG. 9 illustrates the process of acquiring AFC passenger flow data and processing it into application data. The AFC passenger flow data is acquired and stored in a distributed file system; data format conversion and data extraction are performed on it to obtain the target inbound data and the target outbound data, from which an inbound passenger flow table and an outbound passenger flow table are established. After data cleaning and data conversion, the resulting data can be used for passenger flow assignment, passenger flow prediction, and the analysis and calculation that support train operation strategies.
The embodiment of the invention provides a passenger flow data processing method, which comprises: acquiring AFC passenger flow data of an automatic fare collection system; converting the AFC passenger flow data into a preset data format and storing it in HDFS; extracting target inbound data from the AFC passenger flow data and generating an inbound passenger flow table according to the target inbound data; extracting target outbound data from the AFC passenger flow data and generating an outbound passenger flow table according to the target outbound data; and extracting data of the corresponding fields based on the inbound passenger flow table and the outbound passenger flow table to generate a passenger flow statistical table. In this technical solution, the AFC passenger flow data is acquired, processed, and stored in HDFS, a Hive on Spark data analysis framework is built to mine the data in depth, and a passenger flow statistical table usable by data applications is extracted and generated. This addresses the slow and inefficient generation of data tables, improves the efficiency of passenger flow statistics and analysis, and keeps the cost of deploying the application low.
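The four matching conditions amount to an equi-join on card ID and pre-transaction balance followed by two inequality filters. A minimal Python sketch of that join is given below; `match_od` is a hypothetical name, both row sets are assumed to carry `beforebanlance`, and timestamps are assumed to be fixed-width strings so that string comparison orders them chronologically.

```python
from collections import defaultdict


def match_od(inbound_rows, outbound_rows):
    """Pair inbound and outbound records using the four screening conditions."""
    # Index outbound rows by the two equality keys (hash join).
    outbound_by_key = defaultdict(list)
    for out in outbound_rows:
        outbound_by_key[(out["id"], out["beforebanlance"])].append(out)

    od_rows = []
    for inb in inbound_rows:
        for out in outbound_by_key.get((inb["id"], inb["beforebanlance"]), []):
            # Inbound must precede outbound, and the stations must differ.
            if inb["recivetime"] < out["recivetime"] and inb["station"] != out["station"]:
                od_rows.append({
                    "in_id": inb["id"],
                    "in_station": inb["station"],
                    "in_time": inb["recivetime"],
                    "out_station": out["station"],
                    "out_time": out["recivetime"],
                })
    return od_rows
```

Indexing one side by the equality keys avoids comparing every inbound record with every outbound record, which is the cost a plain cross join with a WHERE clause would otherwise pay.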
EXAMPLE III
Fig. 10 is a schematic structural diagram of a passenger flow data processing device according to a third embodiment of the present invention. As shown in fig. 10, the apparatus includes:
the passenger flow data acquisition module 310 is used for acquiring AFC passenger flow data of the automatic fare collection system;
the inbound passenger flow table generating module 320 is configured to extract target inbound data from the AFC passenger flow data, and generate an inbound passenger flow table according to the target inbound data;
an outbound passenger flow table generating module 330, configured to extract target outbound data from the AFC passenger flow data, and generate an outbound passenger flow table according to the target outbound data;
a passenger flow statistics table generating module 340, configured to generate a passenger flow statistics table based on the inbound passenger flow table and the outbound passenger flow table.
Further, the passenger flow statistical table comprises a time section passenger flow statistical table;
further, the passenger flow statistics table generating module 340 includes:
a first statistical table generating unit, configured to extract first target passenger flow data whose inbound time is within a preset time interval from the inbound passenger flow table and the outbound passenger flow table respectively, and generate the time section passenger flow statistical table according to the first target passenger flow data.
Further, the passenger flow statistical table comprises an interval section passenger flow statistical table;
further, the passenger flow statistics table generating module 340 includes:
a second statistical table generating unit, configured to extract second target passenger flow data whose inbound and outbound stations are within a preset station interval from the inbound passenger flow table and the outbound passenger flow table respectively, and generate the interval section passenger flow statistical table according to the second target passenger flow data.
Further, the passenger flow statistical table comprises an origin-destination (OD) passenger flow statistical table;
further, the passenger flow statistics table generating module 340 includes:
a third statistical table generating unit, configured to extract third target passenger flow data whose inbound station is a target inbound station and whose outbound station is a target outbound station from the inbound passenger flow table and the outbound passenger flow table respectively, and generate the OD passenger flow statistical table according to the third target passenger flow data.
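The three statistics can be illustrated on already matched trips. The Python sketch below assumes each trip is a dict with `in_station`, `in_time`, and `out_station` keys; the function names, the half-open time interval, and the representation of a station interval as a set of station codes are all assumptions of this example.

```python
from collections import Counter


def time_section_counts(trips, start, end):
    """Per-station count of trips whose inbound time lies in [start, end)."""
    return Counter(
        t["in_station"] for t in trips if start <= t["in_time"] < end
    )


def interval_section_count(trips, stations):
    """Number of trips whose inbound and outbound stations are both in the interval."""
    return sum(
        1 for t in trips
        if t["in_station"] in stations and t["out_station"] in stations
    )


def od_count(trips, target_in, target_out):
    """Number of trips from the target inbound station to the target outbound station."""
    return sum(
        1 for t in trips
        if t["in_station"] == target_in and t["out_station"] == target_out
    )
```

Timestamps are compared as strings here, which is only valid when they share one fixed-width format.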
Further, the apparatus further comprises:
the passenger flow data storage module is used for performing format conversion on the AFC passenger flow data according to a preset data format and storing the format-converted AFC passenger flow data in the Hadoop Distributed File System (HDFS);
further, the inbound passenger flow table generating module 320 includes:
and the target inbound data extraction unit is used for reading the AFC passenger flow data after format conversion from the HDFS system and extracting target inbound data from the AFC passenger flow data after format conversion.
Further, the target inbound data extracting unit is specifically configured to:
reading the AFC passenger flow data after format conversion from the HDFS system, and filtering abnormal passenger flow data in the AFC passenger flow data after format conversion to generate target passenger flow data; the abnormal passenger flow data are passenger flow data meeting preset abnormal conditions;
and extracting target inbound data from the target passenger flow data.
Further, the preset abnormal condition includes at least one of: missing preset field data, a data format error, garbled Chinese characters, and a preset data logic error.
The passenger flow data processing device provided by the embodiment of the invention can execute the passenger flow data processing method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Example four
FIG. 11 illustrates a schematic structural diagram of a big data platform 10 that may be used to implement embodiments of the present invention. The big data platform is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The big data platform may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in FIG. 11, the big data platform 10 includes at least one processor 11 and a memory communicatively connected to the at least one processor 11, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, and the like. The memory stores a computer program executable by the at least one processor, and the processor 11 can perform various suitable actions and processes according to the computer program stored in the ROM 12 or the computer program loaded from the storage unit 18 into the RAM 13. The RAM 13 can also store various programs and data necessary for the operation of the big data platform 10. The processor 11, the ROM 12, and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to the bus 14.
A number of components in the big data platform 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, or the like; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the big data platform 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, or the like. The processor 11 executes the respective methods and processes described above, such as the passenger flow data processing method.
In some embodiments, the passenger flow data processing method may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the big data platform 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the passenger flow data processing method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the passenger flow data processing method by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for implementing the methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be performed. A computer program can execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine or entirely on a remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical host and VPS service are overcome.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present invention may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solution of the present invention can be achieved.
The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A passenger flow data processing method, applied to a big data platform, comprising:
acquiring AFC passenger flow data of an automatic fare collection system;
extracting target inbound data from the AFC passenger flow data, and generating an inbound passenger flow table according to the target inbound data;
extracting target outbound data from the AFC passenger flow data, and generating an outbound passenger flow table according to the target outbound data;
and generating a passenger flow statistical table based on the inbound passenger flow table and the outbound passenger flow table.
2. The method of claim 1, wherein the passenger flow statistical table comprises a time section passenger flow statistical table;
generating a passenger flow statistical table based on the inbound passenger flow table and the outbound passenger flow table comprises:
extracting, from the inbound passenger flow table and the outbound passenger flow table respectively, first target passenger flow data whose inbound time is within a preset time interval, and generating the time section passenger flow statistical table according to the first target passenger flow data.
3. The method of claim 1, wherein the passenger flow statistical table comprises an interval section passenger flow statistical table;
generating a passenger flow statistical table based on the inbound passenger flow table and the outbound passenger flow table comprises:
extracting, from the inbound passenger flow table and the outbound passenger flow table respectively, second target passenger flow data whose inbound and outbound stations are within a preset station interval, and generating the interval section passenger flow statistical table according to the second target passenger flow data.
4. The method of claim 1, wherein the passenger flow statistical table comprises an origin-destination (OD) passenger flow statistical table;
generating a passenger flow statistical table based on the inbound passenger flow table and the outbound passenger flow table comprises:
extracting, from the inbound passenger flow table and the outbound passenger flow table respectively, third target passenger flow data whose inbound station is a target inbound station and whose outbound station is a target outbound station, and generating the OD passenger flow statistical table according to the third target passenger flow data.
5. The method of claim 1, further comprising, prior to extracting target inbound data from the AFC passenger flow data:
performing format conversion on the AFC passenger flow data according to a preset data format, and storing the format-converted AFC passenger flow data in the Hadoop Distributed File System (HDFS);
extracting target inbound data from the AFC passenger flow data comprises:
reading the format-converted AFC passenger flow data from the HDFS system, and extracting target inbound data from the format-converted AFC passenger flow data.
6. The method of claim 5, wherein reading the format-converted AFC passenger flow data from the HDFS system and extracting target inbound data from the format-converted AFC passenger flow data comprises:
reading the format-converted AFC passenger flow data from the HDFS system, and filtering abnormal passenger flow data out of the format-converted AFC passenger flow data to generate target passenger flow data, the abnormal passenger flow data being passenger flow data meeting preset abnormal conditions;
and extracting target inbound data from the target passenger flow data.
7. The method of claim 6, wherein the preset abnormal condition comprises at least one of: missing preset field data, a data format error, garbled Chinese characters, and a preset data logic error.
8. A passenger flow data processing device, applied to a big data platform, comprising:
the passenger flow data acquisition module is used for acquiring AFC passenger flow data of the automatic fare collection system;
the inbound passenger flow table generating module is used for extracting target inbound data from the AFC passenger flow data and generating an inbound passenger flow table according to the target inbound data;
the outbound passenger flow table generating module is used for extracting target outbound data from the AFC passenger flow data and generating an outbound passenger flow table according to the target outbound data;
and the passenger flow statistical table generating module is used for generating a passenger flow statistical table based on the inbound passenger flow table and the outbound passenger flow table.
9. A big data platform, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the passenger flow data processing method of any one of claims 1-7.
10. A computer-readable storage medium, characterized in that it stores computer instructions for causing a processor to carry out, when executed, the method of passenger flow data processing according to any one of claims 1-7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310232929.6A CN115952173A (en) 2023-03-13 2023-03-13 Passenger flow data processing method and device, big data platform and storage medium


Publications (1)

Publication Number Publication Date
CN115952173A true CN115952173A (en) 2023-04-11

Family

ID=87297810

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310232929.6A Pending CN115952173A (en) 2023-03-13 2023-03-13 Passenger flow data processing method and device, big data platform and storage medium

Country Status (1)

Country Link
CN (1) CN115952173A (en)




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230411