CN111680075A - Hadoop + Spark traffic prediction system and method based on combination of offline analysis and online prediction - Google Patents

Info

Publication number
CN111680075A
Authority
CN
China
Prior art keywords
traffic
data
prediction
hadoop
spark
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010298397.2A
Other languages
Chinese (zh)
Inventor
张红
王文婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lanzhou University of Technology
Original Assignee
Lanzhou University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lanzhou University of Technology filed Critical Lanzhou University of Technology
Priority to CN202010298397.2A priority Critical patent/CN111680075A/en
Publication of CN111680075A publication Critical patent/CN111680075A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471Distributed queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465Query processing support for facilitating data mining operations in structured databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/256Integrating or interfacing systems involving database management systems in federated or virtual databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/40Business processes related to the transportation industry

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a Hadoop + Spark traffic prediction system based on a combination of offline analysis and online prediction. The system comprises storage and management of traffic big data on the Hadoop platform, real-time analysis of traffic data and traffic flow prediction on the Spark system, and traffic flow prediction applications. The working mechanism of the Hadoop MapReduce distributed architecture, the principle of the Hadoop Distributed File System (HDFS), and the in-memory computing workflow of Spark are studied, and a comprehensive traffic big data processing platform based on the Hadoop + Spark architecture is established to meet the strong real-time requirement of traffic flow prediction. The platform is optimized for the characteristics of traffic big data so that the data can be analyzed quickly, and a traffic data preprocessing method based on the big data platform is studied to provide high-quality data support for traffic characteristic analysis and traffic flow prediction.

Description

Hadoop + Spark traffic prediction system and method based on combination of offline analysis and online prediction
Technical Field
The invention relates to the field of traffic big data analysis and traffic prediction, in particular to the construction of a Hadoop-based traffic big data platform and a traffic flow prediction system architecture built on that platform. It provides a research platform and technical application development for the analysis and preprocessing of traffic big data and the short-term prediction of urban traffic flow.
Background
Traffic data is varied in type, large in scale, dynamic, wide in spatio-temporal span, and highly random and heterogeneous, which makes it a typical example of big data. Analyzing this data efficiently and quickly, mining useful information from it, and providing a data basis for traffic state analysis and traffic flow prediction are necessary conditions for improving the accuracy and timeliness of urban traffic flow prediction in a big data setting. Building a traffic big data processing platform on data management and analysis technologies such as big data, distributed parallel computing and data mining, and building a traffic flow prediction system on that platform, are prerequisites for studying urban traffic flow prediction methods in this setting. Traffic big data provides rich data sources for traffic flow prediction, and a big data analysis platform based on distributed parallel computing provides strong technical support for deep mining and efficient analysis of that data.
Current state of traffic big data research: with the development of advanced information and communication technology and intelligent information acquisition and sensing technology, a large amount of traffic-related data has accumulated in the traffic field. This data brings major innovation to the further development of intelligent transportation and avoids the shortcomings that traditional single-mode monitoring brings to traffic analysis: a loop coil can only detect a single lane, microwave detection cannot detect low-speed vehicles, video is strongly affected by the environment, and mobile detection is constrained by communication technology. Traffic big data is a basic guarantee for realizing intelligent transportation, and traffic analysis based on multi-source information fusion can better combine the advantages of the various detection modes and improve the accuracy and robustness of traffic information and state detection.
Disclosure of Invention
Technical problem: to address the problems in traditional traffic management and decision making, the invention provides a system and method for comprehensive, multidimensional analysis and prediction of the traffic state. It moves from a linear paradigm centered on management processes to a flat paradigm centered on data, and promotes the fusion analysis of traffic big data. The innovation lies mainly in the following aspects:
1. High real-time efficiency. Traditional data analysis techniques and algorithms are not suited to big data processing and cannot meet the real-time requirements of traffic information services. Big data technology analyzes and processes traffic big data quickly through distributed parallel processing, which greatly improves the efficiency of data query and analysis and provides second-level response. Association rules hidden in massive traffic data can be mined quickly, so that traffic anomalies are found in time, their root causes are located, traffic is guided toward reasonable operation, and the operating efficiency and capacity of the road network are improved.
2. Distributed and comprehensive. Most traditional traffic applications perform single-table mining on single-source data; once cross-table association over multi-source data is involved, the efficiency problem cannot be overcome. Distributed parallel processing of big data is well suited to complex association analysis across partitioned tables. It can fuse multi-source data, analyze problems from multiple angles, and promote serial and parallel association of data, which improves data processing capacity and the capacity for multidimensional in-depth analysis, so that the evolution of traffic flow can be analyzed in depth.
3. Accurate and predictive. Short-term traffic flow prediction based on big data can reduce the probability of false alarms and missed alarms for traffic congestion. By establishing a monitoring and prediction model of the regional traffic state, traffic operation data and road condition data are shared, traffic dynamics are monitored in real time from multiple directions, and changes in the traffic state are predicted accurately. This helps drivers and travelers learn of congestion in advance and avoid congested road sections, which improves road capacity.
Technical solution: to achieve the above purpose, the invention provides a Hadoop + Spark traffic prediction system and method based on a combination of offline analysis and online prediction. Hadoop/MapReduce, which is currently popular and used by many large IT companies, serves as the analysis platform for historical traffic data. Spark, which offers efficient computation and strong fault tolerance, serves as the analysis and predictive modeling tool for real-time traffic flow data. The overall structure is shown in FIG. 1 and mainly comprises four parts: data sources, traffic big data storage, traffic data analysis, and prediction applications.
1) Hadoop traffic big data platform
Hadoop is an open-source distributed computing framework that integrates distributed computing, storage and management. Using a cluster of a large number of ordinary computers, it provides stable and reliable interfaces to applications and forms a scalable, highly reliable and strongly fault-tolerant distributed storage and computing system for big data. Its core components are the distributed file system HDFS and the distributed parallel computing architecture MapReduce. A series of big data tools built on top of them, such as Hadoop YARN, Chukwa, HBase, Hive, Mahout, Pig, Spark and ZooKeeper, are collectively called the Hadoop ecosystem; see FIG. 2.
A Hadoop cluster generally consists of three parts, a client (JobClient), a master node (Master) and slave nodes (Slave), and as a whole presents a master-slave (Master/Slave) architecture. The way the three parts cooperate is shown in FIG. 3. The JobClient submits jobs such as traffic data preprocessing and analysis and copies the resources those jobs need. The Master manages and maintains the distributed storage of all traffic data and monitors the MapReduce tasks of each analysis job. The Slaves perform the actual storage of traffic data and the data processing tasks. The JobTracker receives requests for new jobs such as traffic data analysis or predictive modeling, creates job objects, encapsulates the tasks, states and progress generated while a job runs, and assigns specific tasks to the TaskTrackers. The TaskTracker monitors and manages the jobs running on its node, copies and localizes the JAR files a job needs (including third-party JAR packages), and creates new instances to execute tasks.
HDFS is an open-source implementation of the Google File System (GFS). It provides high-throughput parallel access to and distributed storage of traffic big data, and supports high-performance, strongly fault-tolerant and highly reliable rapid analysis and modeling of that data. Its internal execution flow is shown in FIG. 4. HDFS works in master-slave mode: the NameNode manages the file metadata, the DataNodes store the actual traffic data, and the NameNode and DataNodes communicate through Hadoop's remote procedure call (RPC) mechanism.
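As an illustration only, the following Python sketch mimics the bookkeeping a NameNode performs: it splits a file into fixed-size blocks and records which DataNodes hold each replica. The function name place_blocks, the file and block sizes, and the round-robin placement are assumptions made for this sketch; real HDFS applies its own placement policy, which this document does not describe. The replication factor of 3 and the node names Slave1 to Slave3 follow the embodiment.

```python
def place_blocks(file_size, block_size, datanodes, replication=3):
    """Return {block_index: [datanode, ...]} as simplified NameNode metadata.

    Placement is plain round-robin, a simplification chosen for this sketch.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of DataNodes")
    n_blocks = max(1, -(-file_size // block_size))  # ceiling division
    placement = {}
    for block in range(n_blocks):
        # Each replica of a block goes to a different DataNode.
        placement[block] = [
            datanodes[(block + r) % len(datanodes)] for r in range(replication)
        ]
    return placement


# Hypothetical 300-unit file stored in 128-unit blocks on three DataNodes.
block_map = place_blocks(300, 128, ["Slave1", "Slave2", "Slave3"], replication=3)
for block, nodes in block_map.items():
    print(block, nodes)
```

Because every block is held by three distinct nodes, the loss of any single DataNode leaves two readable copies of each block, which is the fault-tolerance property the paragraph above relies on.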
MapReduce is a parallel programming model for processing large-scale data sets. It can execute parallel computing tasks on a Hadoop cluster consisting of hundreds of ordinary PCs, and its job execution flow is shown in FIG. 5. MapReduce distributes data analysis or modeling tasks to the data nodes, which carry out subtasks such as analysis, mining and computation on traffic big data. It abstracts a parallel computation running on a large cluster into two stages, Map and Reduce. The Map stage decomposes the whole computing task into several sub-tasks; in essence, it maps a set of key-value pairs <key1, value1> into a new set of intermediate key-value pairs <key2, value2>. The Reduce stage receives the output of the Map function, aggregates the values that share the same key across the outputs, and emits the result as key-value pairs <key3, value3>. The Map stage and the Reduce stage may be repeated.
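The two stages can be sketched in plain Python, without a Hadoop cluster. This is a minimal single-process illustration, not the job used in the invention: the record layout (road segment, time, vehicle count), the sample values and the function names map_phase, shuffle and reduce_phase are all assumptions introduced here.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: <key1, value1> = (line offset, text) -> <key2, value2> = (segment, count)."""
    for _offset, line in enumerate(lines):
        segment, _time, count = line.split(",")
        yield segment, int(count)


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between the stages."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: aggregate all values of one key into <key3, value3> = (segment, total)."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical detector records: segment id, time of day, vehicles counted.
lines = [
    "road_A,08:00,12",
    "road_B,08:00,7",
    "road_A,08:05,5",
    "road_B,08:05,3",
    "road_C,08:05,9",
]
totals = reduce_phase(shuffle(map_phase(lines)))
print(totals)  # {'road_A': 17, 'road_B': 10, 'road_C': 9}
```

On a real cluster the map calls run on the nodes that hold the input blocks and the grouping step moves data across the network; the sketch keeps only the data flow.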
2) Spark real-time computing platform
Hadoop/MapReduce is a batch processing framework. It is good at offline analysis of historical traffic big data but is not suited to analysis and prediction on real-time traffic data. Spark is a distributed big data computing framework based on in-memory computing; it delivers data analysis and prediction results faster but consumes more memory. The invention therefore establishes a Hadoop + Spark system architecture that combines offline analysis with online prediction. Through Spark, it provides a one-stop traffic prediction solution based on resilient distributed datasets (RDDs). The Spark architecture supports fast computation on real-time traffic data, interactive (ad hoc) queries over historical traffic patterns, stream computing and so on. Through a consistent application programming interface (API) and a single deployment scheme, all processing components are integrated seamlessly in memory and cooperate to complete the overall task of the system. This avoids excessive network and disk I/O overhead during computation and improves the timeliness of traffic flow prediction in a big data setting.
The Spark real-time traffic flow data analysis and prediction system mainly comprises streaming analysis of real-time traffic flow data, job and task scheduling, memory management, Spark SQL, Spark MLlib and so on, as shown in FIG. 6. The client submits analysis and prediction jobs on real-time traffic data through the Spark driver. The resource management layer provides Spark with efficient traffic data management and data sharing through YARN, which improves the overall resource utilization of the system. Real-time traffic data is received and analyzed by Spark stream computing: the stream is divided into micro-batches that form resilient distributed datasets (RDDs), and in-memory computing improves the timeliness of traffic data analysis. Spark SQL and MLlib let users query historical traffic operation patterns, build comprehensive query fields and build fast prediction models.
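The micro-batch idea can be illustrated without Spark. The sketch below is plain Python rather than the Spark Streaming API: it cuts a stream of timestamped readings into fixed-length batches and aggregates each batch independently. The event layout, the 10-second batch interval, the sample values and the function names micro_batches and flow_per_segment are assumptions chosen for illustration.

```python
from collections import defaultdict


def micro_batches(events, batch_seconds):
    """Group (timestamp_s, segment, count) events into consecutive time windows."""
    batches = defaultdict(list)
    for timestamp, segment, count in events:
        batches[timestamp // batch_seconds].append((segment, count))
    # Return the batches in time order; each one is processed on its own.
    return [batches[index] for index in sorted(batches)]


def flow_per_segment(batch):
    """Aggregate one micro-batch into total vehicles per road segment."""
    totals = defaultdict(int)
    for segment, count in batch:
        totals[segment] += count
    return dict(totals)


# Hypothetical readings: seconds since start, segment id, vehicles counted.
events = [
    (0, "road_A", 4),
    (3, "road_B", 2),
    (5, "road_A", 1),
    (11, "road_A", 6),
    (14, "road_B", 3),
]
results = [flow_per_segment(b) for b in micro_batches(events, batch_seconds=10)]
print(results)  # [{'road_A': 5, 'road_B': 2}, {'road_A': 6, 'road_B': 3}]
```

A shorter batch interval lowers latency at the cost of more scheduling overhead per batch, which is the usual trade-off when tuning a micro-batch system.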
Drawings
FIG. 1 shows the overall Hadoop + Spark analysis and prediction system employed in the present invention;
FIG. 2 shows the Hadoop + Spark ecosystem employed in the present invention;
FIG. 3 shows the Hadoop + Spark cluster architecture employed in the present invention;
FIG. 4 is a flow chart of HDFS traffic data reading and writing employed in the present invention;
FIG. 5 shows the MapReduce job execution flow employed in the present invention;
FIG. 6 shows the Spark real-time traffic flow data analysis and prediction system employed in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are further described below with reference to the accompanying drawings. The following examples only illustrate the technical solutions of the present invention and do not limit its scope of protection.
The platform established by the invention runs in fully distributed mode, and VMware virtualization is used to strengthen its distributed storage and parallel computing capability. The platform construction process is explained in three key stages: an early preparation stage, a Hadoop installation and configuration stage, and a Hadoop + Spark installation and startup stage.
The early preparation stage mainly completes the setup of the running environment, the setup of the cluster nodes and the preparation of the related software. Hadoop is primarily supported on Unix-like systems, whereas most applications currently run on Windows. So as not to affect existing applications, this embodiment builds Unix-like virtual machines in a Windows environment and forms a small cluster from 4 networked ordinary personal computers (Intel(R) Core(TM) i5-3210M CPU @ 2.50 GHz, 4 GB memory, Windows 7 Ultimate 64-bit). Host names and the corresponding IP addresses are set as listed in Table 1, the hosts file of each host is modified, and clock synchronization, the network environment, password-free login and firewall shutdown are configured. The related software and versions are shown in Table 2.
TABLE 1 Cluster node setup
Figure BDA0002453073430000051
TABLE 2 software version and Main Functions
Figure BDA0002453073430000052
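The hosts file modification in the preparation stage amounts to writing one line per node that maps an IP address to a host name. The Python sketch below generates such lines. The host names follow the embodiment, but the IP addresses are placeholders invented for this sketch, and the helper name hosts_entries is hypothetical.

```python
def hosts_entries(nodes):
    """Render one 'IP<TAB>hostname' line per node, in insertion order."""
    return "\n".join(f"{ip}\t{name}" for name, ip in nodes.items())


# Host names from the embodiment; the addresses are assumed placeholders.
CLUSTER = {
    "Master": "192.168.1.100",
    "Slave1": "192.168.1.101",
    "Slave2": "192.168.1.102",
    "Slave3": "192.168.1.103",
}

hosts_text = hosts_entries(CLUSTER)
print(hosts_text)
```

The same four lines would be appended to the hosts file on every node so that each machine can resolve the others by name.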
Hadoop software installation and configuration steps. Correct installation and configuration of each software version is the key to building the platform. Because the platform involves distributed coordination among several hosts, it is technically difficult and demanding, so the key links of the construction process are emphasized below.
1) Virtual machine and JDK installation.
The VMware Workstation 10 virtual machine, the CentOS 7 Linux operating system and the JDK 1.7 development kit are installed in turn on Master, Slave1, Slave2 and Slave3, and the Java environment variables are configured with gedit.
2) Hadoop installation and configuration.
Install Hadoop 2.6.4 and configure the environment variables in hadoop-env.sh and yarn-env.sh.
Configure the XML file that sets the storage location and port of HDFS and the file cache.
Configure the XML file that sets the addresses and ports of the NameNode and DataNodes used for distributed storage of traffic data files, and set the number of file replicas to 3.
Configure yarn-site.xml for unified management of Hadoop resources.
Configure the XML file that sets the management node of the distributed computing architecture MapReduce.
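Hadoop site configuration files share one XML layout: a configuration root element holding property elements, each with a name and a value. The Python sketch below renders that layout for the replica count of 3 used in this embodiment. It assumes the standard Hadoop property name dfs.replication, which normally lives in hdfs-site.xml; neither the property name nor that file name appears in the text above, and the helper name hadoop_config is hypothetical.

```python
import xml.etree.ElementTree as ET


def hadoop_config(properties):
    """Render a {name: value} mapping in Hadoop's <configuration> XML layout."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Assumed property name; the replica count of 3 is taken from the embodiment.
hdfs_site = hadoop_config({"dfs.replication": 3})
print(hdfs_site)
```

Generating the files from one mapping, rather than editing them by hand on each node, is one way to keep the four hosts consistent.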
Start Hadoop with start-dfs.sh and start-yarn.sh, then run the jps command on the Master and on each of the three Slaves. If the Master shows the four processes SecondaryNameNode, Jps, NameNode and ResourceManager, and each Slave shows the three processes Jps, DataNode and NodeManager, the Hadoop cluster has been installed correctly and can be started.
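The process check can be automated. The sketch below parses the text printed by jps (one "pid name" pair per line) and reports which expected daemons are missing for a given role. The expected process names are taken from the paragraph above; the sample output, its process IDs and the function names are invented for illustration.

```python
# Daemons expected per role, as listed in the text (Jps itself is ignored).
EXPECTED = {
    "master": {"NameNode", "SecondaryNameNode", "ResourceManager"},
    "slave": {"DataNode", "NodeManager"},
}


def running_processes(jps_output):
    """Extract process names from jps output lines of the form '<pid> <name>'."""
    names = set()
    for line in jps_output.strip().splitlines():
        parts = line.split()
        if len(parts) == 2:
            names.add(parts[1])
    return names


def missing_daemons(jps_output, role):
    """Return the expected daemons that do not appear in the jps output."""
    return EXPECTED[role] - running_processes(jps_output)


# Hypothetical jps output captured on the Master node.
sample_master = """2817 NameNode
3010 SecondaryNameNode
3162 ResourceManager
3425 Jps"""

print(missing_daemons(sample_master, "master"))  # set()
```

An empty result means the node is healthy for its role; after Spark is installed, the expected sets could be extended with Master and Worker in the same way.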
3) Hadoop + Spark installation and startup
This stage must be carried out after Hadoop has been installed successfully, and the Hadoop platform must start normally. The Hadoop ecosystem comprises a series of big data storage, analysis and transfer tools; here Spark 1.6.2, Scala 2.11.8, HBase 1.2.2, MySQL 5.7.14, Mahout 0.10.0 and Sqoop 1.99.7 are installed and deployed in turn.
First, Scala 2.11.8, the development language for Spark, is installed and the Scala environment variables are configured.
Secondly, Spark 1.6.2 is installed, and the Spark environment variables and the spark-env configuration file are set.
Finally, the other Hadoop ecosystem software is installed and configured. After HBase 1.2.2 and Sqoop 1.99.7 are installed, their environment variables and configuration files must be modified; the implementation details and configuration files are not described here. Every node in the Hadoop cluster needs these operations, so the installation is copied to each Slave node and source /etc/profile is run to make the configuration take effect, which completes the construction and deployment of the whole traffic big data analysis platform. Hadoop and Spark are started (with ./start-all.sh) on the master node (Master) and the slave nodes (Slave1, Slave2 and Slave3), and jps is run on the master and slave nodes. If the Master shows the five processes SecondaryNameNode, Jps, NameNode, ResourceManager and Master, and each Slave shows the four processes Jps, DataNode, NodeManager and Worker, the Hadoop + Spark cluster has been installed successfully and the traffic big data analysis and prediction platform can be started normally.
The invention discloses a traffic prediction platform based on big data analysis: a Hadoop + Spark traffic prediction system and method based on a combination of offline analysis and online prediction. The working mechanism of the Hadoop MapReduce distributed architecture, the principle of the Hadoop Distributed File System (HDFS), and the in-memory computing workflow of Spark are studied, and a comprehensive traffic big data processing platform based on the Hadoop + Spark architecture is established to meet the strong real-time requirement of traffic flow prediction. The platform is optimized for the characteristics of traffic big data so that the data can be analyzed quickly, and a traffic data preprocessing method based on the big data platform is studied to provide high-quality data support for traffic characteristic analysis and traffic flow prediction.
Finally, it should be noted that the above embodiments only illustrate the technical solutions of the present invention and do not limit them. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications to the specific embodiments, or equivalent substitutions for some technical features, may be made without departing from the spirit of the technical solutions of the present invention, and all of them shall fall within the scope of the technical solutions of the present invention.

Claims (3)

1. A Hadoop + Spark traffic prediction system based on a combination of offline analysis and online prediction, characterized by comprising storage and management of traffic big data on the Hadoop platform (1), real-time analysis of traffic data and traffic flow prediction on the Spark system (2), and traffic flow prediction applications (3), wherein the whole system, from bottom-layer traffic data acquisition to high-layer traffic flow prediction applications, comprises four parts: a traffic big data source, the storage and management of traffic big data on the Hadoop platform (1), the real-time analysis of traffic big data and traffic flow prediction on the Spark system (2), and the traffic flow prediction applications (3).
2. The Hadoop + Spark traffic prediction system based on a combination of offline analysis and online prediction according to claim 1, characterized in that a traffic flow prediction architecture combining Hadoop + Spark offline analysis with online stream processing is constructed: the storage and management of traffic big data on the Hadoop platform (1) is adopted; the Hadoop/MapReduce distributed computing framework is used to analyze and process historical traffic data, mine the deep knowledge contained in the data, and find the rules hidden in it, such as residents' daily travel behavior, travel modes and urban dynamic features; the Spark system is then used for real-time analysis of traffic big data and traffic flow prediction (2); and the results are finally applied to traffic flow prediction applications (3) such as traffic guidance, traffic signal control and traffic information services.
3. A Hadoop + Spark traffic prediction method based on a combination of offline analysis and online prediction, used to realize the platform traffic prediction and applications of claim 1, characterized by comprising the following steps:
1) accumulating a large amount of traffic data by means of vehicle tracking and positioning systems based on Radio Frequency Identification (RFID), the Global Positioning System (GPS), traffic surveillance video, social media, mobile phone applications, induction coils, checkpoints, microwave and radar monitoring, and the like;
2) adopting the storage and management of traffic big data on the Hadoop platform (1): unstructured traffic file data is first classified by directory, file attributes are then managed by a metadata management method, and the data is managed uniformly through the HDFS distributed file system; real-time, large-volume, continuous traffic information, such as real-time trajectory data and surveillance video data, is organized and managed with Tachyon in the Spark system; the partially regular traffic pattern information obtained by processing and mining is stored in the relational database MySQL, which gives most application development seamless access; most unstructured traffic data that has undergone compilation, re-analysis, classification, correlation calculation and related conversion is stored in the non-relational database HBase; by organizing and managing traffic big data in these different forms, convenient expansion, deletion and migration and classified storage of traffic data are realized, traffic data access with different requirements is satisfied, and optimized storage and fast query are achieved;
3) analyzing and processing historical traffic data with the Hadoop/MapReduce distributed computing framework, mining the deep knowledge contained in the data, and finding the rules hidden in it, such as residents' daily travel behavior, travel modes and urban dynamic features, and then analyzing and computing traffic big data in real time with the Spark system to realize short-term prediction of traffic flow;
4) using the short-term traffic flow prediction information to realize applications such as traffic guidance systems, traffic signal control systems, real-time road condition forecasting systems, real-time road network planning and road network map updating, traffic supply and demand analysis, traffic anomaly detection, intelligent electronic parking, short-term traffic congestion prediction and travel information services.
CN202010298397.2A 2020-04-16 2020-04-16 Hadoop + Spark traffic prediction system and method based on combination of offline analysis and online prediction Pending CN111680075A (en)

Publications (1)

Publication Number Publication Date
CN111680075A (en) 2020-09-18

Family

ID=72451489

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112070280A (en) * 2020-08-19 2020-12-11 贵州民族大学 Real-time traffic flow parallel prediction method, system, terminal and storage medium
CN112711593A (en) * 2021-01-04 2021-04-27 浪潮云信息技术股份公司 Big data processing method for realizing mixed transaction analysis
CN113177049A (en) * 2021-05-13 2021-07-27 中移智行网络科技有限公司 Data processing method, device and system
CN113378219A (en) * 2021-06-07 2021-09-10 北京许继电气有限公司 Processing method and system of unstructured data
CN113487856A (en) * 2021-06-04 2021-10-08 兰州理工大学 Traffic flow combination prediction model based on graph convolution network and attention mechanism
CN116415206A (en) * 2023-06-06 2023-07-11 中国移动紫金(江苏)创新研究院有限公司 Operator multiple data fusion method, system, electronic equipment and computer storage medium
CN113378219B (en) * 2021-06-07 2024-05-28 北京许继电气有限公司 Unstructured data processing method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150207706A1 (en) * 2014-01-22 2015-07-23 Telefonaktiebolaget L M Ericsson (Publ) Method for scalable distributed network traffic analytics in telco
US20160294773A1 (en) * 2015-04-03 2016-10-06 Infoblox Inc. Behavior analysis based dns tunneling detection and classification framework for network security
CN106128100A (en) * 2016-06-30 2016-11-16 华南理工大学 A kind of short-term traffic flow forecast method based on Spark platform
CN109903554A (en) * 2019-02-21 2019-06-18 长安大学 A kind of road grid traffic operating analysis method based on Spark
CN109993964A (en) * 2017-12-31 2019-07-09 广州明领基因科技有限公司 Intelligent traffic management systems based on Hadoop technology

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JICHIANG TSAI, TIEN-YU CHANG, YU-HSIANG FANG, EN-SHUO CHANG: "A Real-Time Traffic Flow Prediction System for National Freeways Based on the Spark Streaming Technique", 2018 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS-TAIWAN *
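Chen Li (陈丽): "Research and Design of Intelligent Traffic Information Processing Based on Big Data Architecture" (基于大数据架构的智能交通信息处理的研究与设计), Journal of Guangdong Communication Polytechnic (广东交通职业技术学院学报) *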

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112070280A (en) * 2020-08-19 2020-12-11 贵州民族大学 Real-time traffic flow parallel prediction method, system, terminal and storage medium
CN112070280B (en) * 2020-08-19 2023-10-31 贵州民族大学 Real-time traffic flow parallel prediction method, system, terminal and storage medium
CN112711593A (en) * 2021-01-04 2021-04-27 浪潮云信息技术股份公司 Big data processing method for realizing mixed transaction analysis
CN113177049A (en) * 2021-05-13 2021-07-27 中移智行网络科技有限公司 Data processing method, device and system
CN113487856A (en) * 2021-06-04 2021-10-08 兰州理工大学 Traffic flow combination prediction model based on graph convolution network and attention mechanism
CN113378219A (en) * 2021-06-07 2021-09-10 北京许继电气有限公司 Processing method and system of unstructured data
CN113378219B (en) * 2021-06-07 2024-05-28 北京许继电气有限公司 Unstructured data processing method and system
CN116415206A (en) * 2023-06-06 2023-07-11 中国移动紫金(江苏)创新研究院有限公司 Operator multiple data fusion method, system, electronic equipment and computer storage medium
CN116415206B (en) * 2023-06-06 2023-08-22 中国移动紫金(江苏)创新研究院有限公司 Operator multiple data fusion method, system, electronic equipment and computer storage medium

Similar Documents

Publication Publication Date Title
CN111680075A (en) Hadoop + Spark traffic prediction system and method based on combination of offline analysis and online prediction
Li et al. MapReduce parallel programming model: a state-of-the-art survey
Jayalath et al. From the cloud to the atmosphere: Running MapReduce across data centers
Basca et al. Avalanche: Putting the spirit of the web back into semantic web querying
Lu et al. Research on Hadoop cloud computing model and its applications
Zygouras et al. Insights on a scalable and dynamic traffic management system.
CN111523003A (en) Data application method and platform with time sequence dynamic map as core
Gong et al. RT-DBSCAN: real-time parallel clustering of spatio-temporal data using spark-streaming
CN114416855A (en) Visualization platform and method based on electric power big data
Davoudian et al. A workload-adaptive streaming partitioner for distributed graph stores
US20150172369A1 (en) Method and system for iterative pipeline
Bergui et al. A survey on bandwidth-aware geo-distributed frameworks for big-data analytics
Peixoto et al. Scalable and fast top-k most similar trajectories search using mapreduce in-memory
Kumar et al. A review on recent trends in query processing and optimization in big data
Nasir et al. Partial key grouping: Load-balanced partitioning of distributed streams
Wang et al. HTD: heterogeneous throughput-driven task scheduling algorithm in MapReduce
Chen et al. Cut-and-rewind: Extending query engine for continuous stream analytics
MO'TAZ et al. Apache Hadoop performance evaluation with resources monitoring tools, and parameters optimization: IOT emerging demand
Zhang A hadoop processing method for massive sensor network data based on internet of things
Liang et al. Correlation-aware replica prefetching strategy to decrease access latency in edge cloud
Yang et al. Distributed query engine for multiple-query optimization over data stream
Zhao et al. A spatio-temporal parallel processing system for traffic sensory data
Lou et al. Hydrological stream data pipeline framework based on IoTDB
Uprety et al. MapReduce: Big Data Maintained Algorithm
Lee et al. Implementation of Distributed In-Memory Moving Objects Management System

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200918