CN112163060A - System for processing mass GPS data by big data technology - Google Patents

System for processing mass GPS data by big data technology

Info

Publication number
CN112163060A
Authority
CN
China
Prior art keywords
data
layer
gps
processing
big
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010971958.0A
Other languages
Chinese (zh)
Inventor
张春香
张传学
张零辉
吴鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Longyun Intelligent Technology Co ltd
Original Assignee
Anhui Longyun Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Longyun Intelligent Technology Co ltd
Priority to CN202010971958.0A
Publication of CN112163060A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Remote Sensing (AREA)
  • Computational Linguistics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Position Fixing By Use Of Radio Waves (AREA)

Abstract

The invention discloses a system for processing massive GPS data with big data technology. The system comprises a data acquisition layer, a data preprocessing layer, a data storage layer, a data processing layer and a data analysis layer. The data acquisition layer includes a log file information acquisition module, which caches the acquired information in a Kafka distributed cache. The data preprocessing layer uses Storm to receive data from the Kafka distributed cache, performs data cleaning, data processing and data summarization, and then computes real-time statistics on the data. The data storage layer first caches the data processed by the data preprocessing layer in Redis, an unstructured store, and transfers the data to MongoDB in batches at regular intervals. The data processing layer mines the data held in the data storage layer with a linear regression algorithm and a k-means algorithm, and performs offline calculation and real-time calculation. The data analysis layer provides visualization and multidimensional analysis of the organized data. The system can automatically acquire highly available GPS data, analyze and process large-volume, highly concurrent GPS data in real time, is deployed in a distributed manner to prevent data loss, and uses algorithms to mine knowledge and explore potential value.

Description

System for processing mass GPS data by big data technology
Technical Field
The invention relates to the technical field of computer networks, in particular to a system for processing mass GPS data by a big data technology.
Background
At present, Internet of Things systems and geographic information systems already integrate GPS positioning to some extent, but when faced with hundreds of millions of GPS records, traditional data processing is very time-consuming. For example, even simple statistical analysis of one day of GPS data takes a program several hours to run. This cannot meet the requirement of real-time analysis and leaves no room for deeper mining of the data. In addition, the storage of historical data is a problem that urgently needs to be solved.
At present, the most common approach is to compress the log files and upload them to a server for storage. This approach is primitive and unreliable: first, an operator must manually upload the data at a fixed time every day, which is inconvenient; second, if the server that stores the data fails, a large amount of data can be lost, causing irreparable loss.
With the maturing and spread of big data technology, we have found that these problems can be solved by means of big data technology. How to design a system that processes massive GPS data with big data technology has therefore become the problem to be solved.
Disclosure of Invention
The invention aims to provide a system for processing massive GPS data with big data technology, so as to solve the problems described in the Background.
To achieve this aim, the invention provides the following technical solution: a system for processing massive GPS data based on big data technology, comprising a data acquisition layer, a data preprocessing layer, a data storage layer, a data processing layer and a data analysis layer. The data acquisition layer comprises a log file information acquisition module, which caches the acquired information in a Kafka distributed cache. The data preprocessing layer uses Storm to receive data from the Kafka distributed cache, performs data cleaning, data processing and data summarization, and then computes real-time statistics on the data. The data storage layer first caches the data processed by the data preprocessing layer in Redis, an unstructured store, and transfers the data to MongoDB in batches at regular intervals. The data processing layer mines the data stored in the data storage layer using a linear regression algorithm and a k-means algorithm, and performs offline calculation and real-time calculation. The data analysis layer provides visualization and multidimensional analysis of the organized data.
Further, the data acquisition layer adopts a Flume architecture.
Further, the data preprocessing layer uses Storm, a distributed real-time framework for receiving and processing big data.
Furthermore, the data processing layer predicts the stay time of each GPS client using a linear regression algorithm, and performs cluster analysis on GPS clients whose positions are close together using a k-means algorithm.
Furthermore, the data analysis module uses a road-snapping (map-matching) algorithm to fuse the GPS data with the map data.
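The stay-time prediction mentioned above can be illustrated with an ordinary least-squares fit. The text does not say which input features are used, so the sketch below assumes a single hypothetical numeric feature `x` (for example, the hour of arrival) and made-up training values; it is an illustration of linear regression in general, not the patented implementation.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired samples")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict_stay_minutes(model, x):
    """Apply a fitted (slope, intercept) pair to a new feature value."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical history: feature value -> observed stay time in minutes.
feature = [1, 2, 3, 4]
stay_minutes = [12, 17, 22, 27]

model = fit_line(feature, stay_minutes)
predicted = predict_stay_minutes(model, 6)
```

With more than one feature, the same idea extends to multiple linear regression, which is usually solved with a numerical library rather than by hand.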
Compared with the prior art, the invention has the beneficial effects that:
1. Large-volume, highly concurrent GPS data can be analyzed and processed in real time;
2. Highly available GPS data is collected automatically;
3. Data is stored in a distributed manner, which prevents data loss;
4. Algorithms are used to mine the data in depth.
Drawings
FIG. 1 is a schematic diagram of the overall architecture of the system;
FIG. 2 is a flow chart of the log file information acquisition module (Flume) of the data acquisition layer of the system;
FIG. 3 is a schematic flow diagram of the data receiving module (Storm) of the data preprocessing layer of the system;
FIG. 4 is a schematic diagram of the data visualization module of the data analysis layer of the system.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIGS. 1-4, the present invention provides a technical solution: a system for processing massive GPS data with big data technology. The data acquisition layer 1 comprises a log file information acquisition module 6, which caches the acquired information in a Kafka distributed cache 7. The data preprocessing layer 2 uses Storm to receive data from the Kafka distributed cache 7, performs data cleaning 8, data processing 9 and data summarization 10, and computes real-time statistics on the data. The data storage layer 3 first caches the data processed by the data preprocessing layer 2 in the unstructured storage region 10 (Redis) and transfers the data to MongoDB in batches at regular intervals. The data processing layer 4 mines the data stored in the data storage layer 3 using a linear regression algorithm and a k-means algorithm, and performs offline calculation 13 and real-time calculation 14. The data analysis layer 5 provides data visualization 15 and multidimensional analysis 16 of the organized data.
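The storage pattern described here (cache first, then move to the database in batches at regular intervals) can be sketched as a small write-behind buffer. The class below is a hypothetical in-memory stand-in: a real deployment would presumably hold the pending records in a Redis list and flush them with a bulk MongoDB insert, neither of which is shown. The class name, the 60-second interval and the record fields are all assumptions made for illustration.

```python
import time


class WriteBehindBuffer:
    """Collect records in memory and hand them to a sink in timed batches."""

    def __init__(self, flush_batch, interval_s, clock=time.monotonic):
        self._flush_batch = flush_batch  # callable that receives a list of records
        self._interval_s = interval_s    # minimum time between two flushes
        self._clock = clock              # injectable so the demo needs no sleeping
        self._pending = []
        self._last_flush = clock()

    def add(self, record):
        """Cache one record, then flush if the interval has elapsed."""
        self._pending.append(record)
        if self._clock() - self._last_flush >= self._interval_s:
            self.flush()

    def flush(self):
        """Send everything cached so far to the sink as one batch."""
        batch, self._pending = self._pending, []
        self._last_flush = self._clock()
        if batch:
            self._flush_batch(batch)


# Demo with a fake clock and a plain list standing in for the database.
stored = []
now = [0.0]
buffer = WriteBehindBuffer(stored.extend, interval_s=60, clock=lambda: now[0])

buffer.add({"device": "a", "lat": 31.82, "lon": 117.23})
stored_after_first = len(stored)   # still cached, nothing written yet

now[0] = 61.0                      # pretend 61 seconds have passed
buffer.add({"device": "b", "lat": 31.83, "lon": 117.25})
stored_after_second = len(stored)  # both records written in one batch
```

Batching trades a short delay before data becomes durable for far fewer database writes, which is the usual reason for putting a cache in front of the store.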
Furthermore, the data acquisition layer adopts the Flume framework and relies on its distributed, highly reliable and highly available log collection capabilities to automate the acquisition of massive log data.
Furthermore, the data preprocessing layer 2 uses the Storm framework to receive and process large amounts of GPS data in a distributed manner and in real time, which improves the efficiency of receiving and processing that data.
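The cleaning and real-time statistics steps of the preprocessing layer can be sketched in plain Python. This is not Storm code: it only shows the kind of per-record logic that a stream-processing step might run. The record fields (`device`, `lat`, `lon`, `speed_kmh`) and the validity rules are assumptions, since the text does not specify the log format.

```python
import math
from collections import defaultdict


def clean(record):
    """Return a normalized record, or None if the GPS fix is unusable."""
    try:
        lat = float(record["lat"])
        lon = float(record["lon"])
        speed = float(record.get("speed_kmh", 0.0))
    except (KeyError, TypeError, ValueError):
        return None
    if not all(math.isfinite(v) for v in (lat, lon, speed)):
        return None
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0) or speed < 0:
        return None
    return {"device": str(record.get("device", "unknown")),
            "lat": lat, "lon": lon, "speed_kmh": speed}


class RunningStats:
    """Incrementally updated per-device record count and mean speed."""

    def __init__(self):
        self.count = defaultdict(int)
        self.speed_sum = defaultdict(float)

    def update(self, rec):
        self.count[rec["device"]] += 1
        self.speed_sum[rec["device"]] += rec["speed_kmh"]

    def mean_speed(self, device):
        n = self.count.get(device, 0)
        return self.speed_sum[device] / n if n else None


# Hypothetical incoming records; the second has an impossible latitude.
incoming = [
    {"device": "bus-1", "lat": "31.82", "lon": "117.23", "speed_kmh": "30"},
    {"device": "bus-1", "lat": "131.82", "lon": "117.23", "speed_kmh": "35"},
    {"device": "bus-1", "lat": "31.83", "lon": "117.24", "speed_kmh": "50"},
]

stats = RunningStats()
for raw in incoming:
    rec = clean(raw)
    if rec is not None:
        stats.update(rec)
```

Because the statistics are updated one record at a time, they are available immediately after each record arrives, without rescanning the day's data.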
Furthermore, the data processing layer predicts the stay time of each GPS client using a linear regression algorithm, and performs cluster analysis on GPS clients whose positions are close together using a k-means algorithm.
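The clustering of nearby GPS clients can be illustrated with a minimal k-means (Lloyd's algorithm) in plain Python. The sketch assumes positions have already been converted to planar coordinates (for example, metres in a local projection), because Euclidean distance on raw latitude and longitude is distorted. The sample points, the choice of `k` and the seed are hypothetical.

```python
import random


def _assign(points, centers):
    """Group points by the index of their nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(
            range(len(centers)),
            key=lambda i: (p[0] - centers[i][0]) ** 2 + (p[1] - centers[i][1]) ** 2,
        )
        clusters[nearest].append(p)
    return clusters


def kmeans(points, k, max_iterations=50, seed=0):
    """Return (centers, clusters) for 2-D points using Lloyd's algorithm."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = random.Random(seed).sample(points, k)
    for _ in range(max_iterations):
        clusters = _assign(points, centers)
        new_centers = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centers[i]  # keep the old center if a cluster is empty
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # assignments are stable, so the algorithm has converged
        centers = new_centers
    return centers, _assign(points, centers)


# Two hypothetical groups of client positions in planar coordinates.
positions = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(positions, k=2)
hot_spots = sorted(centers)
```

Each resulting center can be read as a candidate hot spot, and the size of its cluster indicates how many clients were nearby.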
Furthermore, the data analysis module uses a road-snapping (map-matching) algorithm to fuse the GPS data with the map data and displays the track segments in different colors, so that dispatchers can clearly see the road traffic conditions in the current time period, which assists in dispatching vehicles and personnel.
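The text does not describe how GPS points are snapped to roads. A minimal sketch of one simple approach, nearest-segment projection, is given below; production map matching often uses more elaborate methods that also consider heading and the sequence of points. Roads are assumed to be polylines in planar coordinates, and the road names and geometry are hypothetical.

```python
import math


def project_onto_segment(p, a, b):
    """Return the point on segment a-b that is closest to p."""
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return (float(ax), float(ay))  # degenerate segment: a and b coincide
    t = ((p[0] - ax) * dx + (p[1] - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))          # clamp so the result stays on the segment
    return (ax + t * dx, ay + t * dy)


def snap_to_road(p, roads):
    """Return (road_name, snapped_point, distance) for the nearest road.

    roads maps a road name to its polyline, given as a list of (x, y) vertices.
    """
    best = None
    for name, vertices in roads.items():
        for a, b in zip(vertices, vertices[1:]):
            q = project_onto_segment(p, a, b)
            d = math.hypot(p[0] - q[0], p[1] - q[1])
            if best is None or d < best[2]:
                best = (name, q, d)
    return best


# Hypothetical road network: one east-west road and one north-south road.
roads = {
    "east_west": [(0, 0), (10, 0)],
    "north_south": [(5, -5), (5, 5)],
}
road, snapped, distance = snap_to_road((2, 1), roads)
```

In this example the raw fix at (2, 1) is 1 unit from the east-west road and 3 units from the north-south road, so it is snapped onto the former.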
The working principle is as follows. First, the GPS positioning terminal devices upload log files to a server. The data acquisition layer collects the log files, mainly with the open-source component Flume, and sends them directly to Kafka for caching; the data is also compressed and written to HDFS for later analysis. Storm receives the data from Kafka and computes statistics on it in real time. The data processed by Storm is first cached in Redis and is stored in MongoDB in batches at regular intervals. A linear regression algorithm predicts the stay time of each GPS client, and the k-means algorithm clusters GPS clients whose positions are close together in order to find hot spot areas. The track data is segmented according to the speed of the GPS client to analyze, for example, how freely traffic flows on a road in a given time period. The GPS data is then loaded onto a map and fused with the map data by the road-snapping algorithm, and the segmented tracks are displayed in different colors, so that a dispatcher can see the road traffic conditions in the current time period at a glance, which assists in dispatching the GPS clients.
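The speed-based segmentation and color coding of tracks can be sketched as follows. The thresholds (20 and 40 km/h), the three colors and the track layout are assumptions chosen for illustration; the text only states that tracks are segmented by speed and drawn in different colors.

```python
from itertools import groupby


def speed_class(speed_kmh, slow=20.0, fast=40.0):
    """Map a speed to a display color using hypothetical thresholds."""
    if speed_kmh < slow:
        return "red"      # congested
    if speed_kmh < fast:
        return "yellow"   # slow-moving
    return "green"        # free-flowing


def segment_by_speed(track):
    """Split a time-ordered track into runs of consecutive same-color points.

    track is a list of (timestamp, x, y, speed_kmh) tuples.
    Returns a list of (color, points) pairs in track order.
    """
    return [
        (color, list(points))
        for color, points in groupby(track, key=lambda p: speed_class(p[3]))
    ]


# Hypothetical track: timestamps in seconds, planar coordinates, speed in km/h.
track = [
    (0, 0.0, 0.0, 5.0),
    (10, 0.1, 0.0, 10.0),
    (20, 0.3, 0.0, 30.0),
    (30, 0.7, 0.0, 50.0),
    (40, 1.2, 0.0, 55.0),
    (50, 1.3, 0.0, 15.0),
]
segments = segment_by_speed(track)
summary = [(color, len(points)) for color, points in segments]
```

Each segment can then be drawn on the map as one polyline in its color, which gives the dispatcher the at-a-glance view described above.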
The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise embodiments disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims (5)

1. A system for processing massive GPS data with big data technology, comprising: a data acquisition layer (1), a data preprocessing layer (2), a data storage layer (3), a data processing layer (4) and a data analysis layer (5); characterized in that: the data acquisition layer (1) comprises a log file information acquisition module (6), and the log file information acquisition module (6) caches acquired information in a Kafka distributed cache (7); the data preprocessing layer (2) uses Storm to receive data from the Kafka distributed cache (7), performs data cleaning (8), data processing (9) and data summarization (10), and performs real-time statistics on the data; the data storage layer (3) first caches the data processed by the data preprocessing layer (2) in an unstructured storage region (10) Redis, and stores the data in MongoDB in batches at regular intervals; the data processing layer (4) mines the data stored in the data storage layer (3) using a linear regression algorithm and a k-means algorithm, and performs offline calculation (13) and real-time calculation (14); and the data analysis layer (5) provides data visualization (15) and multidimensional analysis (16) of the organized data.
2. The system for processing massive GPS data by big data technology according to claim 1, wherein: the data acquisition layer (1) adopts a Flume framework.
3. The system for processing massive GPS data by big data technology according to claim 1, wherein: the data preprocessing layer (2) uses Storm, a distributed real-time framework for receiving and processing big data.
4. The system for processing massive GPS data by big data technology according to claim 1, wherein: the data processing layer (4) predicts the stay time of the GPS client using a linear regression algorithm, and performs cluster analysis on GPS clients whose positions are close together using a k-means algorithm.
5. The system for processing massive GPS data by big data technology according to claim 1, wherein: the data analysis layer (5) uses a road-snapping (map-matching) algorithm to fuse the GPS data with the map data.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010971958.0A CN112163060A (en) 2020-09-16 2020-09-16 System for processing mass GPS data by big data technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010971958.0A CN112163060A (en) 2020-09-16 2020-09-16 System for processing mass GPS data by big data technology

Publications (1)

Publication Number Publication Date
CN112163060A (en) 2021-01-01

Family

ID=73859214

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010971958.0A Pending CN112163060A (en) 2020-09-16 2020-09-16 System for processing mass GPS data by big data technology

Country Status (1)

Country Link
CN (1) CN112163060A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113761358A (en) * 2021-05-11 2021-12-07 中科天玑数据科技股份有限公司 Multi-channel hotspot discovery method and multi-channel hotspot discovery system
CN116545740A (en) * 2023-05-30 2023-08-04 阿锐巴数据科技(上海)有限公司 Threat behavior analysis method and server based on big data
CN116545740B (en) * 2023-05-30 2024-05-14 阿锐巴数据科技(上海)有限公司 Threat behavior analysis method and server based on big data

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104036025A (en) * 2014-06-27 2014-09-10 蓝盾信息安全技术有限公司 Distribution-base mass log collection system
CN106600960A (en) * 2016-12-22 2017-04-26 西南交通大学 Traffic travel origin and destination identification method based on space-time clustering analysis algorithm
US20170169078A1 (en) * 2015-12-14 2017-06-15 Siemens Aktiengesellschaft Log Mining with Big Data
CN109284195A (en) * 2018-08-27 2019-01-29 广东电网有限责任公司信息中心 A kind of real-time representation data calculation method and system
CN109977125A (en) * 2019-04-09 2019-07-05 福建奇点时空数字科技有限公司 A kind of big data safety analysis plateform system based on network security
CN111258979A (en) * 2020-01-16 2020-06-09 山东大学 Cloud protection log system and working method thereof

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
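周素红等 (Zhou Suhong et al.): 《地理学评论》 (Geography Review), 31 December 2019, 商务印书馆 (The Commercial Press) *
张俊友等 (Zhang Junyou et al.): 《智能交通系统及应用》 (Intelligent Transportation Systems and Applications), 31 August 2017, 哈尔滨工业大学出版社 (Harbin Institute of Technology Press) *
董昭 等 (Dong Zhao et al.): "大数据位置类应用实现方式研究" (Research on Implementation Approaches for Big Data Location-Based Applications), 《互联网天地》 *
陈玉华 (Chen Yuhua): 《如何玩转专利大数据》 (How to Make the Most of Patent Big Data), 31 July 2019, 知识产权出版社 (Intellectual Property Publishing House) *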

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210101