CN112163060A - System for processing mass GPS data by big data technology - Google Patents
- Publication number
- CN112163060A (application CN202010971958.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- layer
- gps
- processing
- big
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/29—Geographical information databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24552—Database cache management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Probability & Statistics with Applications (AREA)
- Remote Sensing (AREA)
- Computational Linguistics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Position Fixing By Use Of Radio Waves (AREA)
Abstract
The invention discloses a system for processing massive GPS data with big data technology. The system comprises a data acquisition layer, a data preprocessing layer, a data storage layer, a data processing layer and a data analysis layer. The data acquisition layer comprises a log file information acquisition module, which caches the acquired information in a Kafka distributed cache. The data preprocessing layer uses Storm to receive data from the Kafka distributed cache, performs data cleaning, data processing and data summarization, and then computes real-time statistics on the data. The data storage layer first caches the data processed by the data preprocessing layer in Redis, an unstructured store, and then writes it to MongoDB in batches at regular intervals. The data processing layer mines the data held in the data storage layer with a linear regression algorithm and a k-means algorithm, performing both offline and real-time computation. The data analysis layer provides visualization and multidimensional analysis of the organized data. The system can automatically acquire GPS data with high availability, can analyze and process large, highly concurrent volumes of GPS data in real time, is deployed in a distributed manner to prevent data loss, and uses algorithms to mine knowledge and explore the potential value of the data.
Description
Technical Field
The invention relates to the technical field of computer networks, in particular to a system for processing mass GPS data by a big data technology.
Background
At present, some Internet of Things systems and geographic information systems integrate GPS positioning, but when faced with hundreds of millions of GPS records, traditional data processing is very time-consuming. For example, even a simple statistical analysis of one day of GPS data takes a program several hours to run. This cannot meet the requirement for real-time analysis and leaves no room for deeper mining of the data. In addition, the storage of historical data is a problem that urgently needs to be solved.
At present, the most common approach is to compress the log files and upload them to a server for storage. This approach is primitive and unreliable. First, an operator must upload the data manually at a fixed time every day, which is inconvenient. Second, if the server that stores the data fails, a large amount of data can be lost, causing irreparable damage.
With the maturing and spread of big data technology, these problems can be addressed with big data techniques. How to design a system that uses big data technology to process massive GPS data is therefore the problem to be solved.
Disclosure of Invention
The invention aims to provide a system for processing massive GPS data with big data technology, so as to solve the problems described in the Background.
To achieve this purpose, the invention provides the following technical solution: a system for processing massive GPS data based on big data technology, comprising a data acquisition layer, a data preprocessing layer, a data storage layer, a data processing layer and a data analysis layer. The data acquisition layer comprises a log file information acquisition module, which caches the acquired information in a Kafka distributed cache. The data preprocessing layer uses Storm to receive data from the Kafka distributed cache, performs data cleaning, data processing and data summarization, and then computes real-time statistics on the data. The data storage layer first caches the data processed by the data preprocessing layer in Redis, an unstructured store, and then writes it to MongoDB in batches at regular intervals. The data processing layer mines the data held in the data storage layer with a linear regression algorithm and a k-means algorithm, performing offline computation and real-time computation. The data analysis layer provides visualization and multidimensional analysis of the organized data.
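The cache-then-batch behaviour of the data storage layer can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the class name `BufferedStore`, the `sink` callback and the injectable `clock` are hypothetical, and a plain Python list and callback stand in for Redis and for a MongoDB bulk write so that the example runs without any server.

```python
import time
from typing import Callable, Dict, List


class BufferedStore:
    """Buffer records in a fast cache and flush them to a durable store in batches.

    Hypothetical stand-ins: `self.buffer` plays the role of the Redis cache and
    `sink` plays the role of a bulk insert into MongoDB.
    """

    def __init__(
        self,
        flush_interval_s: float,
        sink: Callable[[List[Dict]], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.flush_interval_s = flush_interval_s
        self.sink = sink
        self.clock = clock
        self.buffer: List[Dict] = []
        self.last_flush = clock()

    def put(self, record: Dict) -> None:
        """Cache one processed record; flush if the interval has elapsed."""
        self.buffer.append(record)
        if self.clock() - self.last_flush >= self.flush_interval_s:
            self.flush()

    def flush(self) -> None:
        """Hand the whole buffer to the durable store as one batch."""
        if self.buffer:
            self.sink(list(self.buffer))
            self.buffer.clear()
        self.last_flush = self.clock()
```

Writing in batches keeps the per-record path cheap (an in-memory append) while bounding how much data sits only in the cache at any time by the flush interval.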
Further, the data acquisition layer adopts a Flume architecture.
Further, the data preprocessing layer uses Storm, a distributed framework for receiving and processing big data in real time.
Furthermore, the data processing layer predicts the dwell time of a GPS client by using a linear regression algorithm, and performs cluster analysis on GPS clients that are close to one another in position by using a k-means algorithm.
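As a minimal sketch of the linear regression step, the following fits an ordinary least-squares line with a single input feature. The patent does not state which features are used to predict dwell time, so the choice of one numeric feature per stop (for example, the client's past average dwell time at that location) is an assumption, and the function names `fit_line` and `predict` are hypothetical.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope: float, intercept: float, x: float) -> float:
    """Predicted dwell time (same unit as the training ys) for feature value x."""
    return slope * x + intercept
```

With several features the same idea extends to multiple linear regression, which would normally be delegated to a numerical library rather than written by hand.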
Furthermore, the data analysis module uses a road-snapping (map-matching) algorithm to fuse the GPS data with the map data.
Compared with the prior art, the invention has the beneficial effects that:
1. Large, highly concurrent volumes of GPS data can be analyzed and processed in real time;
2. GPS data is collected automatically and with high availability;
3. Data is stored in a distributed manner, which prevents data loss;
4. Algorithms are used to mine the data in depth.
Drawings
FIG. 1 is a schematic diagram of the overall architecture of the present system;
FIG. 2 is a flow chart of the log file information acquisition module (Flume) of the data acquisition layer of the system;
FIG. 3 is a schematic flow diagram of the data receiving module (Storm) of the data preprocessing layer of the system;
FIG. 4 is a schematic diagram of the data visualization module of the data analysis layer of the system.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIGS. 1-4, the present invention provides a technical solution: a system for processing massive GPS data with big data technology, comprising a data acquisition layer 1, a data preprocessing layer 2, a data storage layer 3, a data processing layer 4 and a data analysis layer 5. The data acquisition layer 1 comprises a log file information acquisition module 6, which caches the acquired information in a Kafka distributed cache 7. The data preprocessing layer 2 uses Storm to receive data from the Kafka distributed cache 7, performs data cleaning 8, data processing 9 and data summarization 10, and computes real-time statistics on the data. The data storage layer 3 first caches the data processed by the data preprocessing layer 2 in the unstructured storage region 10 (Redis) and then transfers it to MongoDB in batches at regular intervals. The data processing layer 4 mines the data stored by the data storage layer 3 with a linear regression algorithm and a k-means algorithm, performing offline computation 13 and real-time computation 14. The data analysis layer 5 provides data visualization 15 and multidimensional analysis 16 of the organized data.
Furthermore, the data acquisition layer adopts the Flume framework and relies on Flume's distributed, highly reliable and highly available log collection to automate the acquisition of massive log data.
Furthermore, the data preprocessing layer (2) uses the Storm framework to receive and process massive GPS data in real time in a distributed manner, which improves the efficiency of receiving and processing that data.
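The cleaning and real-time statistics performed in this layer can be illustrated with a small, framework-independent sketch. The patent does not specify the log format, so the comma-separated layout `device_id,timestamp,lat,lon,speed` is an assumption, as are the function names `parse_fix` and `count_fixes`. In a real deployment this logic would run inside a Storm topology; here it is plain Python so that it can be run on its own.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def parse_fix(line: str) -> Optional[Dict]:
    """Parse one log line (assumed format: device_id,timestamp,lat,lon,speed).

    Returns None for malformed or out-of-range records (data cleaning).
    """
    parts = line.strip().split(",")
    if len(parts) != 5:
        return None
    device_id, ts, lat, lon, speed = parts
    if not device_id:
        return None
    try:
        fix = {
            "device_id": device_id,
            "ts": int(ts),
            "lat": float(lat),
            "lon": float(lon),
            "speed_kmh": float(speed),
        }
    except ValueError:
        return None
    # Reject coordinates outside the valid range and negative speeds.
    # NaN values also fail these comparisons and are rejected.
    if not (-90.0 <= fix["lat"] <= 90.0 and -180.0 <= fix["lon"] <= 180.0):
        return None
    if not fix["speed_kmh"] >= 0.0:
        return None
    return fix


def count_fixes(lines: Iterable[str]) -> Counter:
    """Running count of valid fixes per device (a simple real-time statistic)."""
    counts: Counter = Counter()
    for line in lines:
        fix = parse_fix(line)
        if fix is not None:
            counts[fix["device_id"]] += 1
    return counts
```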
Furthermore, the data processing layer predicts the dwell time of a GPS client by using a linear regression algorithm, and performs cluster analysis on GPS clients that are close to one another in position by using a k-means algorithm.
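The clustering of nearby GPS clients can be sketched with a small k-means implementation. This is an illustration under stated assumptions rather than the system's actual code: positions are treated as points in a plane (reasonable only over a small area), the initial centroids are simply the first k points so that the result is deterministic, and the function name `kmeans` is hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def kmeans(
    points: Sequence[Point], k: int, iterations: int = 20
) -> Tuple[List[Point], List[int]]:
    """Cluster 2-D points into k groups; returns (centroids, label per point)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids: List[Point] = list(points[:k])  # deterministic initialisation
    labels: List[int] = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    (
                        sum(m[0] for m in members) / len(members),
                        sum(m[1] for m in members) / len(members),
                    )
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels
```

A cluster that contains many clients can then be reported as a hot spot area; how many clients count as "many" is a tuning choice the source does not specify.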
Furthermore, the data analysis module uses a road-snapping (map-matching) algorithm to fuse the GPS data with the map data and displays the track segments in different colors, so that dispatchers can clearly see the road traffic conditions in the current time period and are assisted in dispatching vehicles and personnel.
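The source does not describe the road-snapping algorithm itself. One simple way to realise the idea, shown here purely as an assumed illustration, is to project each GPS point onto the nearest segment of a road polyline. Coordinates are treated as planar, and the names `snap_to_segment` and `snap_to_road` are hypothetical.

```python
from math import inf
from typing import Sequence, Tuple

Point = Tuple[float, float]


def snap_to_segment(p: Point, a: Point, b: Point) -> Point:
    """Closest point to p on the line segment from a to b (planar geometry)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return a  # degenerate segment: both ends coincide
    # Parameter of the orthogonal projection, clamped to stay on the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)


def snap_to_road(p: Point, road: Sequence[Point]) -> Point:
    """Closest point to p on a road given as a polyline of at least two vertices."""
    if len(road) < 2:
        raise ValueError("a road needs at least two vertices")
    best: Point = road[0]
    best_dist_sq = inf
    for a, b in zip(road, road[1:]):
        q = snap_to_segment(p, a, b)
        dist_sq = (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2
        if dist_sq < best_dist_sq:
            best, best_dist_sq = q, dist_sq
    return best
```

Production map matching usually also considers heading and the sequence of previous points; nearest-segment projection is only the simplest variant.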
The working principle is as follows. First, the GPS positioning terminal devices upload log files to a server. The data acquisition layer collects the log files mainly with the open-source component Flume and sends them directly to Kafka for caching; the data is also compressed and written to HDFS for later analysis. Storm receives the data from Kafka and computes real-time statistics on it. The data processed by Storm is first cached in Redis and then stored in MongoDB in batches at regular intervals. The dwell time of a GPS client is predicted with a linear regression algorithm, and GPS clients that are close to one another in position are clustered with the k-means algorithm in order to find hot spot areas. The track data is segmented according to the speed of the GPS client so that conditions such as how freely traffic flows on a road in a given time period can be analyzed. The GPS data is loaded onto a map and fused with the map data by a road-snapping algorithm, and the segmented tracks are displayed in different colors, so that a dispatcher can see the road traffic conditions in the current time period at a glance, which assists in dispatching the GPS clients.
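The speed-based segmentation and colouring of a track can be sketched as follows. The speed thresholds (20 and 40 km/h) and the three colours are assumptions chosen only for illustration, since the source gives no values, and the function names `speed_class` and `segment_track` are hypothetical.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

Fix = Tuple[float, float, float]  # (lat, lon, speed_kmh), in time order
Segment = Tuple[str, List[Tuple[float, float]]]


def speed_class(speed_kmh: float) -> str:
    """Map a speed to a display colour (thresholds are illustrative)."""
    if speed_kmh < 20.0:
        return "red"      # congested
    if speed_kmh < 40.0:
        return "yellow"   # slow
    return "green"        # free-flowing


def segment_track(fixes: Sequence[Fix]) -> List[Segment]:
    """Split a track into runs of consecutive fixes that share a colour."""
    segments: List[Segment] = []
    for color, group in groupby(fixes, key=lambda f: speed_class(f[2])):
        segments.append((color, [(lat, lon) for lat, lon, _ in group]))
    return segments
```

Each returned segment can then be drawn on the map as a polyline in its colour.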
The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise embodiments disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.
Claims (5)
1. A system for processing massive GPS data with big data technology, comprising: a data acquisition layer (1), a data preprocessing layer (2), a data storage layer (3), a data processing layer (4) and a data analysis layer (5); characterized in that: the data acquisition layer (1) comprises a log file information acquisition module (6), and the log file information acquisition module (6) caches the acquired information in a Kafka distributed cache (7); the data preprocessing layer (2) uses Storm to receive data from the Kafka distributed cache (7), performs data cleaning (8), data processing (9) and data summarization (10), and computes real-time statistics on the data; the data storage layer (3) first caches the data processed by the data preprocessing layer (2) in an unstructured storage region (10) Redis, and then stores the data in MongoDB in batches at regular intervals; the data processing layer (4) mines the data stored in the data storage layer (3) by using a linear regression algorithm and a k-means algorithm, and performs offline computation (13) and real-time computation (14); and the data analysis layer (5) provides data visualization (15) and multidimensional analysis (16) of the organized data.
2. The system for processing massive GPS data by big data technology according to claim 1, wherein: the data acquisition layer (1) adopts a Flume framework.
3. The system for processing massive GPS data by big data technology according to claim 1, wherein: the data preprocessing layer (2) uses Storm, a distributed framework for receiving and processing big data in real time.
4. The system for processing massive GPS data by big data technology according to claim 1, wherein: the data processing layer (4) predicts the dwell time of a GPS client by using a linear regression algorithm, and performs cluster analysis on GPS clients that are close to one another in position by using a k-means algorithm.
5. The system for processing massive GPS data by big data technology according to claim 1, wherein: the data analysis module (5) uses a road-snapping (map-matching) algorithm to fuse the GPS data with the map data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010971958.0A CN112163060A (en) | 2020-09-16 | 2020-09-16 | System for processing mass GPS data by big data technology |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010971958.0A CN112163060A (en) | 2020-09-16 | 2020-09-16 | System for processing mass GPS data by big data technology |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112163060A true CN112163060A (en) | 2021-01-01 |
Family
ID=73859214
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010971958.0A Pending CN112163060A (en) | 2020-09-16 | 2020-09-16 | System for processing mass GPS data by big data technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112163060A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113761358A (en) * | 2021-05-11 | 2021-12-07 | 中科天玑数据科技股份有限公司 | Multi-channel hotspot discovery method and multi-channel hotspot discovery system |
CN116545740A (en) * | 2023-05-30 | 2023-08-04 | 阿锐巴数据科技(上海)有限公司 | Threat behavior analysis method and server based on big data |
CN116545740B (en) * | 2023-05-30 | 2024-05-14 | 阿锐巴数据科技(上海)有限公司 | Threat behavior analysis method and server based on big data |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104036025A (en) * | 2014-06-27 | 2014-09-10 | 蓝盾信息安全技术有限公司 | Distribution-base mass log collection system |
CN106600960A (en) * | 2016-12-22 | 2017-04-26 | 西南交通大学 | Traffic travel origin and destination identification method based on space-time clustering analysis algorithm |
US20170169078A1 (en) * | 2015-12-14 | 2017-06-15 | Siemens Aktiengesellschaft | Log Mining with Big Data |
CN109284195A (en) * | 2018-08-27 | 2019-01-29 | 广东电网有限责任公司信息中心 | A kind of real-time representation data calculation method and system |
CN109977125A (en) * | 2019-04-09 | 2019-07-05 | 福建奇点时空数字科技有限公司 | A kind of big data safety analysis plateform system based on network security |
CN111258979A (en) * | 2020-01-16 | 2020-06-09 | 山东大学 | Cloud protection log system and working method thereof |
Non-Patent Citations (4)
Title |
---|
周素红等: "《地理学评论》", 31 December 2019, 商务印书馆 * |
张俊友等: "《智能交通系统及应用》", 31 August 2017, 哈尔滨工业大学出版社 * |
董昭 等: ""大数据位置类应用实现方式研究"", 《互联网天地》 * |
陈玉华: "《如何玩转专利大数据》", 31 July 2019, 知识产权出版社 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110135273B (en) | Contact network video image cloud intelligent monitoring and fault identification method | |
CN109923595B (en) | Urban road traffic abnormity detection method based on floating car data | |
CN107610421A (en) | A kind of geo-hazard early-warning analysis system and method | |
US11544657B2 (en) | Roadway maintenance condition detection and analysis | |
CN110728443A (en) | Motor full life cycle management and control system | |
CN112184625A (en) | Pavement defect identification method and system based on video deep learning | |
CN104778245A (en) | Similar trajectory mining method and device on basis of massive license plate identification data | |
CN103106542A (en) | Data analyzing and processing system | |
WO2022174679A1 (en) | Method and apparatus for predicting voltage inconsistency fault of battery cells, and server | |
CN114428828A (en) | Method and device for digging new road based on driving track and electronic equipment | |
CN111815098A (en) | Traffic information processing method and device based on extreme weather, storage medium and electronic equipment | |
CN112446549A (en) | Urban garbage intelligent supervision platform based on big data | |
CN112163060A (en) | System for processing mass GPS data by big data technology | |
CN112883075A (en) | Landslide universal type ground surface displacement monitoring data missing and abnormal value processing method | |
CN110359919B (en) | Shield tunneling machine construction risk prevention and control method and system | |
CN117194919A (en) | Production data analysis system | |
CN112184624A (en) | Picture detection method and system based on deep learning | |
CN116935642A (en) | Convenient travel management method based on Internet of things | |
CN111027827A (en) | Method and device for analyzing operation risk of bottom-preserving communication network and computer equipment | |
CN111651648A (en) | Intelligent generation method and device for pole tower key component inspection plan | |
CN114646021B (en) | Underground pipe network monitoring method | |
CN110798510A (en) | Intelligent garbage can internet of things monitoring platform | |
KR102358532B1 (en) | Apparatus and method for predicting energy use and generation through self-enhancement learning | |
CN116070152B (en) | Excavator workload identification method and device based on multidimensional operation characteristics | |
CN117935531A (en) | Multi-source traffic data-based foothold analysis method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210101 |