CN116431882B - Bus station uplink and downlink direction judging method based on vector cross product operation - Google Patents

Publication number
CN116431882B
Authority
CN
China
Prior art keywords
bus
line
uplink
station
coordinates
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310692730.1A
Other languages
Chinese (zh)
Other versions
CN116431882A (en
Inventor
王勇
邢策梅
周松
周秀华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Province Surveying & Mapping Engineering Institute
Original Assignee
Jiangsu Province Surveying & Mapping Engineering Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Province Surveying & Mapping Engineering Institute
Priority to CN202310692730.1A
Publication of CN116431882A
Application granted
Publication of CN116431882B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a bus station uplink and downlink direction judging method based on vector cross product operation, which comprises the following steps: acquiring webpage content by using the Requests library; extracting bus line names and the corresponding bus uids from the bus line json data and removing duplicates to obtain the processed bus line names; acquiring the line information of each bus line and the station information of the line, and performing multi-thread coordinate conversion on the station and line inflection point coordinates; judging the uplink and downlink directions of each bus station by using a computational-geometry vector cross product algorithm; calculating the geometric center of each bus stop; and converting the line Excel data table and the site Excel data table into ArcMap line elements and point elements based on ArcPy. The invention realizes the judgment of the uplink and downlink directions of bus stops and generates a navigation map bus stop and bus route database.

Description

Bus station uplink and downlink direction judging method based on vector cross product operation
Technical Field
The invention relates to the technical field of bus station uplink and downlink direction judgment, in particular to a bus station uplink and downlink direction judgment method based on vector cross product operation.
Background
According to the technical outline for the construction of smart city spatiotemporal big data platforms, the spatiotemporal big data set comprises basic spatiotemporal data, public thematic data, real-time Internet of Things sensing data, data captured online from the Internet, and the data engine and multi-node distributed big data management system that drive them. Online data capture from the Internet is an important component of smart city spatiotemporal big data and can supplement missing data.
Previous researchers have studied the acquisition of public transport data. One study acquired the public transport data of a city from a navigation map using dynamic web page technology, extracted the density of bus stops, and analyzed the correlation between stop attributes and service places, but did not address how to distinguish the uplink and downlink directions of stops. Another obtained the bus stop information of a city from a bus inquiry website and removed duplicates. A third obtained POI data (including bus data) through the Web service API of a navigation map developer platform and performed data cleaning, coordinate system conversion, and other operations on it. These studies acquired only bus stop data, without the associated spatial information of bus routes or the uplink and downlink direction data of the stops.
For the problems in the related art, no effective solution has been proposed at present.
Disclosure of Invention
Aiming at the problems in the related art, the invention provides a bus stop uplink and downlink direction judging method based on vector cross product operation, which aims to solve the technical problems in the prior art.
For this purpose, the invention adopts the following specific technical scheme:
a bus station uplink and downlink direction judging method based on vector cross product operation comprises the following steps:
s1, acquiring webpage content by utilizing a Requests library based on a python web crawler technology, and analyzing the webpage content to acquire a line name;
s2, extracting bus line names and corresponding bus uids in bus line json data, and performing duplicate removal processing to obtain the processed bus line names;
s3, taking the bus uid as a parameter, acquiring line information of each bus line and station information of the line, and performing multi-thread coordinate conversion on the coordinates of the station and the inflection points of the line;
s4, judging the uplink and downlink directions of the bus station by adopting a geometric vector cross product calculation algorithm;
s5, calculating a geometric center of the bus stop based on the uplink and downlink directions of the bus stop;
s6, converting the line Excel data table and the site Excel data table into ArcMap line elements and point elements based on ArcPy, and storing the ArcMap line elements and the point elements in a geographic database.
Further, the python web crawler technology is used for acquiring web page contents by utilizing a Requests library, analyzing the web page contents and acquiring line names, and the method comprises the following steps of:
s11, analyzing the webpage content by using a Beautiful Soup analysis library;
s12, identifying the html container by combining the style class and the id attribute setting, and obtaining the json data of the bus line in the webpage content;
s13, converting character strings of the json data of the bus route into a dictionary type by using json.loads (content) sentences, and acquiring route names in a key value pair mode.
Further, the Requests library is used for acquiring webpage content;
the webpage content comprises picture webpage resources, html webpage resources and json webpage resources;
the Beautiful Soup parsing library is used for parsing the front-end page.
Further, the step of extracting the bus route name and the corresponding bus uid in the bus route json data and performing duplicate removal processing to obtain the processed bus route name comprises the following steps:
s21, taking the line name as a search keyword, and extracting the bus line name and the corresponding bus uid from bus line json data;
s22, performing duplicate removal processing on the bus route name and the bus uid to obtain the processed bus route name.
Further, the step of obtaining the line information of each bus line and the station information of the line by taking the bus uid as a parameter and performing multi-thread coordinate conversion on the coordinates of the station and the inflection point of the line comprises the following steps:
s31, taking a bus uid as a parameter, and acquiring line information of each bus line and station information of the line;
s32, storing site information into an Excel table through double circulation of the line information and the site information respectively, and obtaining a line Excel data table and a site Excel data table;
s33, performing multi-thread coordinate data conversion on the coordinates of the inflection points of the sites and the lines.
Further, the line information comprises a line name, a driving direction, a starting and ending station, a fare, an operation time and a line inflection point coordinate;
the site information comprises site uid, name, serial number of the site, name of the site and Baidu Mercator coordinates.
Further, the multi-thread coordinate data conversion of the inflection point coordinates of the station and the line includes the following steps:
s331, creating a queue for storing conversion contents;
s332, converting the Baidu Mercator coordinates into the Baidu longitude/latitude coordinate system by utilizing a navigation map API, converting the Baidu longitude/latitude coordinate system into the Mars coordinate system (GCJ-02), and converting the Mars coordinate system into the national 2000 coordinate system (CGCS2000);
s333, combining the Queue module of Python and the multithread Thread module to perform multithread synchronous coordinate data conversion.
Further, the method for judging the uplink and downlink directions of the bus station by adopting the geometric vector cross product calculation algorithm comprises the following steps of:
s41, acquiring longitude and latitude coordinates of a previous bus stop, a current bus stop and a next bus stop;
s42, calculating the cross product value from the differences of the longitude and latitude coordinates; if the value is larger than 0, marking the next bus stop as the uplink direction, and if the value is smaller than 0, marking the next bus stop as the downlink direction.
Further, the geometric center calculation for the bus stop based on the uplink and downlink directions of the bus stop comprises the following steps:
s51, after the uplink and downlink directions are judged, geometric center calculation is carried out on a plurality of identical sites and different coordinates;
s52, grouping the site names of repeated sites with non-overlapping spatial positions, the uplink and downlink directions of the sites and the distances according to a preset distinguishing threshold;
and S53, calculating the geometric center point coordinates of each group by using an arithmetic average value, and taking the geometric center point coordinates as the coordinates of the station.
Further, the conversion formula for converting the hundred-degree longitude and latitude coordinate system into the Mars coordinate system is as follows:
wherein $z$ represents an intermediate vector, $X_B$ and $Y_B$ represent the longitude and latitude in the navigation map, and $X_H$ and $Y_H$ represent the longitude and latitude in the Mars coordinate system.
The beneficial effects of the invention are as follows:
According to the invention, bus route and station data are obtained through web information scraping, and the data are cleaned, integrated, and converted. Once structured bus data are obtained, the uplink and downlink directions of the bus stations are judged using the computational-geometry vector cross product algorithm, and multiple records of the same station are merged using a geometric-center calculation, so that bus data consistent with reality are finally obtained.
The invention combines the web crawler technology, the multithreading coordinate conversion technology, the vector cross product algorithm and the geometric center calculation method, realizes the judgment of the uplink and downlink directions of the bus stop by the steps of data acquisition, coordinate conversion, uplink and downlink direction judgment, geometric center calculation and format conversion, and generates a navigation map bus stop and bus route database.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for determining the upstream and downstream directions of a bus stop based on a vector cross product operation according to an embodiment of the invention;
FIG. 2 is a flow chart of data acquisition and processing of a bus stop uplink and downlink direction determination method based on vector cross product operation according to an embodiment of the invention;
FIG. 3 is a station uplink and downlink direction determination diagram of a bus station uplink and downlink direction determination method based on vector cross product operation according to an embodiment of the present invention;
FIG. 4 is a diagram of the geometric center effect of a bus stop based on a method for determining the uplink and downlink directions of a bus stop based on a vector cross product operation according to an embodiment of the invention;
FIG. 5 is a diagram of a bus route distribution of a city in a bus stop uplink and downlink direction determination method based on a vector cross product operation according to an embodiment of the present invention;
FIG. 6 is a distribution diagram of the bus stops of a city in a bus stop uplink and downlink direction determination method based on a vector cross product operation according to an embodiment of the present invention.
Description of the embodiments
For the purpose of further illustrating the various embodiments, the present invention provides the accompanying drawings, which are a part of the disclosure of the present invention, and which are mainly used for illustrating the embodiments and for explaining the principles of the operation of the embodiments in conjunction with the description thereof, and with reference to these matters, it will be apparent to those skilled in the art to which the present invention pertains that other possible embodiments and advantages of the present invention may be practiced.
According to the embodiment of the invention, a bus stop uplink and downlink direction judging method based on vector cross product operation is provided.
The invention will be further described with reference to the accompanying drawings and specific embodiments, as shown in fig. 1, a bus stop uplink and downlink direction determination method based on vector cross product operation according to an embodiment of the invention, where the method includes the following steps:
s1, acquiring webpage content by utilizing a Requests library based on a python web crawler technology, and analyzing the webpage content to acquire a line name.
The method for obtaining the line name based on the python web crawler technology comprises the following steps of:
s11, acquiring webpage content by using a Requests library, and analyzing the webpage content by using a Beautiful Soup analysis library;
and S12, identifying the html container by combining the style class and the id attribute setting, and obtaining the json data of the bus line in the webpage content.
Specifically, the html container is a web page container, and is used for displaying web page content in a browser.
S13, converting character strings of the json data of the bus route into a dictionary type by using json.loads (content) sentences, and acquiring route names in a key value pair mode.
Specifically, json.loads(content) is a Python function for converting a json string into a dictionary; the dictionary type is a mapping type in Python, consisting of a set of keys and their corresponding values.
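As a minimal sketch of step S13, the following shows a json string being converted into a dictionary and route names being read by key. The payload and its key names (`content`, `name`, `uid`) are hypothetical and chosen only for illustration; the real response structure is not given in the source.

```python
import json

# Hypothetical payload: the key names "content", "name" and "uid" are
# illustrative assumptions, not taken from the source.
content = (
    '{"content": ['
    '{"name": "Route 1", "uid": "a1"}, '
    '{"name": "Route 2", "uid": "b2"}]}'
)

# json.loads turns the json string into a Python dictionary.
data = json.loads(content)

# Route names are then read through key-value access.
route_names = [item["name"] for item in data["content"]]
print(route_names)
```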
The Requests library is used for acquiring webpage content.
Specifically, the Requests is a Python HTTP client library for obtaining the content of the web page.
The webpage content comprises a picture webpage resource, an html webpage resource and a json webpage resource.
The Beautiful Soup parsing library is used for parsing the front-end page.
Specifically, the third-party parsing library Beautiful Soup is used to parse the webpage content, and useful bus-related information is obtained by identifying html containers in combination with the style class and id attribute settings; for example, soup.find('ul', {'id': 'site_ul'}) returns the ul element in the page whose id is site_ul.
When the acquired webpage content is in json format, the json library is introduced to parse it; the bus data acquired through the Baidu page is in json format, and after the string data is converted into a dictionary through json.loads(content), the names of the bus routes can be obtained as key-value pairs.
Specifically, python is used as a programming language which is simple, easy to read and easy to maintain, and is widely applied to multiple fields such as web crawlers, data analysis, machine learning, data visualization and the like. The invention adopts the third party library Requests and Beautiful Soup of Python to acquire network data.
Requests is a very practical Python HTTP client library, and can conveniently acquire the content of a webpage, including pictures, html and json webpage resources.
Beautiful Soup can parse front-end pages flexibly, conveniently, and efficiently. After the page content is acquired with the Requests library, the required information can be obtained by directly calling functions in Beautiful Soup. For a complex front-end webpage, the Beautiful Soup library not only makes the code more concise and readable, but can also shorten development time and improve development efficiency.
S2, extracting bus line names and corresponding bus uids in the bus line json data, and performing duplicate removal processing to obtain the processed bus line names.
The method comprises the steps of extracting bus line names and corresponding bus uids in bus line json data, performing duplicate removal processing, and obtaining the processed bus line names, wherein the step of extracting the bus line names and the corresponding bus uids in the bus line json data comprises the following steps:
s21, taking the line name as a search keyword, and extracting the bus line name and the corresponding bus uid from bus line json data;
s22, performing duplicate removal processing on the bus route name and the bus uid to obtain the processed bus route name.
Specifically, the public transportation uid represents a unique public transportation route code, and the public transportation route name comprises an uplink and downlink comparison table and a uid comparison table.
Specifically, the line name obtained in step S1 is a bus line name obtained from a bus inquiry website, and step S2 is to obtain a uid (unique code) corresponding to the bus line name by using the navigation map API through the name obtained in step S1, but the obtained data is repeated, so that the obtained data is further subjected to a deduplication process to obtain the processed bus line name and the corresponding uid; the step of acquiring the uid is to acquire other detailed information of the bus route through the unique number.
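The de-duplication in step S22 can be sketched as follows. This is an illustrative assumption about how duplicates might be removed (keeping the first occurrence of each uid); the function name `dedupe_routes` and the sample values are hypothetical.

```python
def dedupe_routes(pairs):
    """Remove repeated (route_name, uid) pairs, keeping first-seen order."""
    seen = set()
    unique = []
    for name, uid in pairs:
        # The uid is described as the unique route code, so it is used
        # as the de-duplication key (an assumption of this sketch).
        if uid not in seen:
            seen.add(uid)
            unique.append((name, uid))
    return unique


# Hypothetical sample data with one repeated record.
raw = [("Route 1", "a1"), ("Route 2", "b2"), ("Route 1", "a1")]
routes = dedupe_routes(raw)
print(routes)
```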
And S3, taking the bus uid as a parameter, acquiring the line information of each bus line and the station information of the line, and performing multi-thread coordinate conversion on the coordinates of the station and the inflection point of the line.
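Taking the bus uid as a parameter, acquiring the line information of each bus line and the station information of the line, and performing multi-thread coordinate conversion on the station and line inflection point coordinates comprises the following steps: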
s31, taking the public transport uid as a parameter, and acquiring the line information of each public transport line and the station information of the line.
The line information comprises a line name, a driving direction, a starting and ending station, a fare, operation time and line inflection point coordinates.
Specifically, the line information of each bus line includes the line name, the driving direction, the starting and ending station, the fare, the operation time and other basic attributes and inflection point coordinates.
The inflection point coordinates are Baidu Mercator coordinates; points are separated by semicolons, and the x and y coordinates of each point are separated by a comma. The obtained data are stored in an Excel table.
The site information comprises site uid, name, serial number of the site, name of the site and Baidu Mercator coordinates.
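A minimal sketch of parsing such an inflection point string into coordinate pairs is shown below. The function name `parse_inflection_points` and the sample values are hypothetical; only the separators (semicolon between points, comma between the two coordinates) come from the description above.

```python
def parse_inflection_points(text):
    """Split 'x1,y1;x2,y2;...' into a list of (x, y) float tuples."""
    points = []
    for pair in text.split(";"):
        pair = pair.strip()
        if not pair:
            # Skip empty pieces, e.g. after a trailing semicolon.
            continue
        x, y = pair.split(",")
        points.append((float(x), float(y)))
    return points


# Hypothetical Mercator-style values, for illustration only.
sample = "13225000.5,3748000.25;13225100,3748100;"
points = parse_inflection_points(sample)
print(points)
```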
S32, storing the site information into an Excel table through double circulation of the line information and the site information respectively, and obtaining a line Excel data table and a site Excel data table.
Specifically, the first cycle in the double cycle is to traverse the bus route, and the second cycle is to traverse the stations in each route.
S33, performing multi-thread coordinate data conversion on the coordinates of the inflection points of the sites and the lines.
The multi-thread coordinate data conversion of the inflection point coordinates of the station and the line comprises the following steps:
s331, creating a queue for storing conversion contents;
s332, converting the Baidu Mercator coordinates into the Baidu longitude/latitude coordinate system by utilizing a navigation map API, converting the Baidu longitude/latitude coordinate system into the Mars coordinate system (GCJ-02), and converting the Mars coordinate system into the national 2000 coordinate system (CGCS2000).
The conversion formula for converting the hundred-degree longitude and latitude coordinate system into the Mars coordinate system is as follows:
wherein $z$ represents an intermediate vector, $X_B$ and $Y_B$ represent the longitude and latitude in the navigation map, and $X_H$ and $Y_H$ represent the longitude and latitude in the Mars coordinate system.
S333, combining the Queue module of Python and the multithread Thread module to perform multithread synchronous coordinate data conversion.
Specifically, the Baidu Mercator coordinates are converted into the Baidu longitude/latitude coordinate system through the navigation map API, then converted into the Mars coordinate system through the conversion formula, and finally converted into the national 2000 coordinate system.
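For illustration, the sketch below shows one commonly used approximate conversion from Baidu longitude/latitude (often called BD-09) to the Mars coordinate system (GCJ-02), which also works through an intermediate magnitude z. It is an assumption that this corresponds to the formula intended here; the constants and the function name `bd09_to_gcj02` come from widely circulated community implementations, not from the source.

```python
import math

# Constant used by common community implementations (an assumption here).
X_PI = math.pi * 3000.0 / 180.0


def bd09_to_gcj02(x_b, y_b):
    """Approximate conversion of Baidu lon/lat (x_b, y_b) to Mars lon/lat."""
    x = x_b - 0.0065
    y = y_b - 0.006
    # z is the intermediate magnitude, theta the intermediate angle.
    z = math.sqrt(x * x + y * y) - 0.00002 * math.sin(y * X_PI)
    theta = math.atan2(y, x) - 0.000003 * math.cos(x * X_PI)
    x_h = z * math.cos(theta)
    y_h = z * math.sin(theta)
    return x_h, y_h


# Hypothetical input point, for illustration only.
x_h, y_h = bd09_to_gcj02(116.404, 39.915)
print(x_h, y_h)
```

The result differs from the input by roughly 0.0065 degrees in longitude and 0.006 degrees in latitude, plus small periodic corrections. The subsequent conversion to the national 2000 coordinate system is not covered by this sketch.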
And a Queue module and a multithread Thread module of Python are introduced to realize the conversion of a coordinate system by multithread synchronization.
The multithreading Thread module is used for fully improving the CPU utilization rate of the system and putting the subtasks with longer time consumption into the background operation.
The Queue module provides a first-in, first-out (FIFO) queue suited to multithreaded programming.
Specifically, a queue for storing conversion contents is created, and then a plurality of coordinate conversion tasks are processed in parallel by using multithreading; the Queue module and the multithread Thread module are combined to be used, so that a plurality of coordinate conversion subtasks can be parallel, and the time for coordinate conversion of large data volume is shortened to a great extent.
Specifically, the bus coordinate information obtained by parsing the Baidu web page is in the Baidu Mercator coordinate system, whereas the coordinate system required by the smart city is the national 2000 coordinate system, so coordinate conversion is required. In the invention, the Baidu Mercator coordinates are first converted into the Baidu longitude/latitude coordinate system through the navigation map API, then converted into the Mars coordinate system by formula, and finally into the national 2000 coordinate system.
Specifically, in practical application, the navigation map is Baidu Maps.
Because a large amount of coordinate data is involved, the invention introduces the Queue and Thread modules of Python to carry out the coordinate system conversion with synchronized multithreading. The Thread module can improve the CPU utilization of the system and move time-consuming subtasks into the background. The Queue module provides a FIFO (first in, first out) queue suited to multithreaded programming. A queue for storing the content to be converted is created, and multiple coordinate conversion tasks are then processed in parallel by multiple threads. Using the Queue and Thread modules together allows several coordinate conversion subtasks to run in parallel, which greatly shortens the time needed to convert large volumes of coordinates.
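The queue-plus-threads pattern described above can be sketched as follows. The `convert` function is a hypothetical placeholder standing in for the real conversion chain, and the names `convert_all` and `num_threads` are assumptions of this sketch rather than identifiers from the source.

```python
import queue
import threading


def convert(point):
    """Hypothetical placeholder; a real worker would run the conversion chain."""
    x, y = point
    return (x + 1.0, y + 1.0)


def convert_all(points, num_threads=4):
    """Convert all points using a FIFO task queue and worker threads."""
    tasks = queue.Queue()
    for index, point in enumerate(points):
        tasks.put((index, point))
    # Each worker writes to its own slot, so input order is preserved.
    results = [None] * len(points)

    def worker():
        while True:
            try:
                index, point = tasks.get_nowait()
            except queue.Empty:
                return  # all tasks were queued up front, so we are done
            results[index] = convert(point)
            tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


converted = convert_all([(0.0, 0.0), (1.0, 2.0), (3.0, 4.0)])
print(converted)
```

In CPython, threads of this kind mainly save time when each task waits on input or output, such as a call to a web API, which is consistent with a conversion step that goes through a navigation map API.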
S4, judging the uplink and downlink directions of the bus station by adopting a geometric vector cross product calculation algorithm.
The method for judging the uplink and downlink directions of the bus station by adopting the geometric vector cross product calculation algorithm comprises the following steps of:
s41, acquiring longitude and latitude coordinates of a previous bus stop, a current bus stop and a next bus stop;
s42, calculating the cross product value from the differences of the longitude and latitude coordinates; if the value is larger than 0, marking the next bus stop as the uplink direction, and if the value is smaller than 0, marking the next bus stop as the downlink direction.
Specifically, computational geometry studies geometric figures. Within it, the vector cross product algorithm has a wide range of applications, including determining the turning direction of a polyline, determining which side of a straight line a point lies on, and determining whether two straight lines intersect. For line segments $P_0P_1$ and $P_1P_2$ with a common endpoint, the turning direction of the polyline can be determined from the sign of $(P_2 - P_0) \times (P_1 - P_0)$.
Specifically, using the vector cross product algorithm from computational geometry, the coordinates of the previous station $(X_1, Y_1)$, the current station $(X_2, Y_2)$, and the next station $(X_3, Y_3)$ are combined to evaluate $(X_1 - X_3) \times (Y_2 - Y_3) - (X_2 - X_3) \times (Y_1 - Y_3)$. If the value is greater than 0, the next bus stop is marked as uplink; otherwise, it is marked as downlink.
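The sign test above can be written as a short function. This is a minimal sketch: the function name `stop_direction` and the label strings are hypothetical, and a value of exactly 0 (collinear stations) is treated as downlink, following the "otherwise" wording above.

```python
def stop_direction(prev_stop, cur_stop, next_stop):
    """Classify the next stop from the sign of the cross product value."""
    x1, y1 = prev_stop
    x2, y2 = cur_stop
    x3, y3 = next_stop
    # (X1 - X3)(Y2 - Y3) - (X2 - X3)(Y1 - Y3), as given in the text.
    value = (x1 - x3) * (y2 - y3) - (x2 - x3) * (y1 - y3)
    return "uplink" if value > 0 else "downlink"


# Hypothetical planar coordinates, for illustration only.
left_turn = stop_direction((0, 0), (1, 0), (1, 1))
right_turn = stop_direction((0, 0), (1, 0), (1, -1))
print(left_turn, right_turn)
```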
S5, calculating the geometric center of the bus stop based on the uplink and downlink directions of the bus stop.
The geometric center calculation for the bus stop based on the uplink and downlink directions of the bus stop comprises the following steps:
s51, calculating geometric centers of different coordinates of the same station based on the uplink and downlink directions of the bus station;
s52, grouping the site names of repeated sites with non-overlapping spatial positions, the uplink and downlink directions of the sites and the distances according to a preset distinguishing threshold;
and S53, calculating the geometric center point coordinates of each group by using an arithmetic average value, and taking the geometric center point coordinates as the coordinates of the station.
Specifically, after the uplink and downlink directions are judged, the geometric center is calculated for records that refer to the same site but carry different coordinates. For repeated sites whose spatial positions do not coincide, the records are grouped and the geometric center point of each group is taken as the coordinates of the site. It is also taken into account that there may be more than 2 sites with the same name at an intersection, covering both uplink and downlink. When grouping, 50 meters is used as the threshold for distinguishing different stations. Records are grouped by three indexes (site name, uplink/downlink label, and distance), and the geometric center point of each group is calculated as an arithmetic average.
Specifically, Pandas is a widely used Python data analysis package built on NumPy that provides powerful and efficient data structures and data processing methods.
Using the DataFrame data structure, the groupby method of Pandas is applied to group the bus station data as required, and the geometric center point coordinates of each group are then obtained and used as the coordinates of the station.
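The grouping and averaging can be sketched as follows. To stay dependency-free, this sketch uses plain Python dictionaries instead of the Pandas groupby method named above, and it assumes a simple greedy clustering rule for the 50 meter threshold; the function names and sample records are hypothetical.

```python
import math
from collections import defaultdict

THRESHOLD_M = 50.0  # distance threshold taken from the text


def distance_m(a, b):
    """Approximate ground distance in metres between two (lon, lat) points."""
    lon1, lat1 = map(math.radians, a)
    lon2, lat2 = map(math.radians, b)
    x = (lon2 - lon1) * math.cos((lat1 + lat2) / 2)
    y = lat2 - lat1
    return 6371000.0 * math.hypot(x, y)


def centroid(points):
    """Arithmetic mean of a list of (lon, lat) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def stop_centers(records):
    """Group (name, direction, lon, lat) records and average each group."""
    clusters = defaultdict(list)  # (name, direction) -> list of point lists
    for name, direction, lon, lat in records:
        groups = clusters[(name, direction)]
        for pts in groups:
            # Join an existing group if within the threshold of its centre.
            if distance_m(centroid(pts), (lon, lat)) <= THRESHOLD_M:
                pts.append((lon, lat))
                break
        else:
            groups.append([(lon, lat)])
    return {key: [centroid(pts) for pts in groups]
            for key, groups in clusters.items()}


# Hypothetical records: two nearby "up" points, one distant, one "down".
records = [
    ("A", "up", 118.0, 32.0),
    ("A", "up", 118.0001, 32.0),
    ("A", "up", 118.01, 32.0),
    ("A", "down", 118.0, 32.0005),
]
centers = stop_centers(records)
print(centers)
```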
S6, converting the line Excel data table and the site Excel data table into ArcMap line elements and point elements based on ArcPy, and storing the ArcMap line elements and the point elements in a geographic database.
Specifically, two Excel data tables of a bus route and a station are converted into ArcMap line elements and point elements based on ArcPy, all attribute information of the ArcMap line elements and the point elements is reserved at the same time, and the attribute information is stored in a gdb geographic database, and the specific steps are as follows:
ArcPy is a Python site package through which geographic data analysis, data conversion, data management, and map automation can be performed in a practical and efficient manner.
ArcMap is a user desktop component and has the functions of powerful map making, spatial analysis, spatial database building and the like.
The gdb layer is created by the CreateFeatureless_management creation element method of ArcPy, an AddField_management creation attribute field is used for converting all inflection point coordinate pairs in a certain line into an array, a line Polyline element is created, and a new cursor is created by inserting a cursor insert Cursor so as to set the Polyline space attribute and the information of each attribute field.
Among them, createfeaturebalance_management is one method of ArcPy for creating a geographic element class.
AddField management is one method of ArcPy to create attribute fields for the geographic layer.
InertCursor is one method of arcPy for newly created geographic elements.
And (5) circularly running until all bus line elements are created.
Based on the above technical route, a specific embodiment is described below:
bus data acquisition and processing are completed through the steps of data acquisition, coordinate conversion, uplink and downlink direction judgment, geometric center calculation and format conversion; the specific process is shown in Fig. 2.
(1) Acquiring line names from a bus inquiry website
Before bus data are acquired, the names of all bus lines in a city are acquired first. The bus information covered by the bus inquiry website is complete and updated in a timely manner; the html content of the web page is obtained with the Requests library using Python web crawler technology.
The web page content is then parsed with the third-party parsing library Beautiful Soup, and useful bus information is obtained by identifying html containers in combination with attributes such as the style class and the id. For example, soup.find('ul', {'id': 'site_ul'}) returns the ul element in the page whose id is "site_ul". When the acquired web page content is in json format, it is parsed with the json library. The bus data acquired from Baidu pages are in json format; the json.loads(content) statement converts the string into a dictionary, and the bus line names are then read as key-value pairs.
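The acquisition and parsing steps above might look like the following minimal sketch. The function names, the JSON keys `content` and `name`, and the sample payload are hypothetical and chosen only for illustration; the actual page structure and JSON layout are not given in the text. Third-party libraries are imported inside the functions so that the JSON part can be run on its own.

```python
import json


def fetch_html(url, timeout=10):
    """Download a page with the Requests library (third-party)."""
    import requests
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def station_list_items(html):
    """Return the text of each <li> inside the <ul id="site_ul"> container."""
    from bs4 import BeautifulSoup  # third-party Beautiful Soup
    soup = BeautifulSoup(html, "html.parser")
    ul = soup.find("ul", {"id": "site_ul"})
    return [li.get_text(strip=True) for li in ul.find_all("li")] if ul else []


def line_names(content, list_key="content", name_key="name"):
    """Convert a JSON string to a dict and read line names by key.
    `list_key` and `name_key` are assumed key names."""
    data = json.loads(content)
    return [item[name_key] for item in data.get(list_key, []) if name_key in item]


# Hypothetical JSON payload standing in for a real response.
sample = '{"content": [{"name": "Route 1 (A-B)", "uid": "u1"}, {"name": "Route 1 (B-A)", "uid": "u2"}]}'
names = line_names(sample)
```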
(2) Acquiring all line names and uids from Baidu
With the acquired line names as search keywords, the bus line json data returned by the navigation map are parsed to extract the related bus line names and their uids (unique ids), and the bus line names are de-duplicated to obtain a lookup table of city bus line names (uplink and downlink included) and uids.
(3) Obtaining line information
With the bus uid as a parameter, the information of each line is acquired, including basic attributes such as line name, driving direction, start and end stations, fare and operating hours, together with the inflection point coordinates. The inflection point coordinates are in the Baidu Mercator coordinate system; points are separated by semicolons, and the horizontal and vertical coordinates of each point are separated by a comma. The acquired data are stored in an Excel table.
(4) Structured site information
All station information of a line can be acquired together with the line information; a double loop over the lines and over the stations of each line stores all station information in an Excel table. The station information includes the station uid, name, sequence number within the line, line name, and xy coordinates (Baidu metric coordinates).
(5) Multithreaded coordinate conversion
The bus data acquired by the method are used to supplement smart city data and therefore need to be converted into the unified national 2000 coordinate system (CGCS2000). The conversion proceeds from Baidu metric coordinates to Baidu longitude and latitude coordinates, then to national Mars coordinates (GCJ-02), and finally to national 2000 coordinates. A large number of coordinate points are involved: 188 bus lines with an average of 800 inflection points each, plus 6733 bus stops, about 160,000 coordinate points in total. The invention uses multithreaded parallel computing and completes the coordinate conversion of all points within 2 minutes.
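A queue-and-threads arrangement for one step of this conversion chain can be sketched as below. Only the Baidu longitude/latitude to Mars (GCJ-02) step is shown, using the widely circulated approximate formula, which is an assumption and not necessarily the exact formula of the invention. The function names and the worker count are hypothetical; the metric-to-longitude/latitude step and the final conversion to national 2000 coordinates are omitted.

```python
import math
import queue
import threading

X_PI = math.pi * 3000.0 / 180.0


def bd09_to_gcj02(lon, lat):
    """Commonly used approximate conversion from Baidu lon/lat to GCJ-02."""
    x = lon - 0.0065
    y = lat - 0.006
    z = math.hypot(x, y) - 0.00002 * math.sin(y * X_PI)
    theta = math.atan2(y, x) - 0.000003 * math.cos(x * X_PI)
    return z * math.cos(theta), z * math.sin(theta)


def convert_all(points, convert=bd09_to_gcj02, workers=4):
    """Convert every (lon, lat) pair using worker threads fed from a queue."""
    tasks = queue.Queue()
    for item in enumerate(points):
        tasks.put(item)
    results = [None] * len(points)  # each slot is written by exactly one worker

    def worker():
        while True:
            try:
                idx, (lon, lat) = tasks.get_nowait()
            except queue.Empty:
                return  # queue drained, thread exits
            results[idx] = convert(lon, lat)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


converted = convert_all([(116.404, 39.915), (118.78, 32.04)])
```

Threads mainly help when each conversion waits on a network call, such as a map API request; for pure arithmetic like `bd09_to_gcj02`, CPython threads give little speed-up, so the sketch illustrates the structure rather than the reported timing.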
(6) Determination of uplink and downlink directions
Because same-named bus stations are repeated with non-coincident spatial positions, and same-named stations include both uplink and downlink stations, the uplink or downlink direction of each station must be determined. The invention determines the direction of a station from the sign combination of the longitude and latitude coordinate differences, based on the principle of the geometric vector cross product, and labels the two directions 0 and 1.
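The sign test underlying a 2D vector cross product can be illustrated with the sketch below. This section does not spell out which vectors are crossed, so the choice here is an assumption made for illustration: the travel vector from the previous stop to the next stop is crossed with a fixed reference vector, and the sign of the result is mapped to the labels 1 and 0. The reference vector `(1.0, -1.0)` and the function names are hypothetical.

```python
def cross2d(u, v):
    """z-component of the cross product of two 2D vectors."""
    return u[0] * v[1] - u[1] * v[0]


def direction_label(prev_stop, next_stop, reference=(1.0, -1.0)):
    """Label a stop 1 or 0 from the sign of cross(reference, travel).

    `travel` is built from the longitude and latitude differences between
    the next and the previous stop. Returns None when the cross product
    is zero and the sign gives no decision.
    """
    travel = (next_stop[0] - prev_stop[0], next_stop[1] - prev_stop[1])
    c = cross2d(reference, travel)
    if c > 0:
        return 1
    if c < 0:
        return 0
    return None


# Two opposite travel directions along the same road get opposite labels.
label_forward = direction_label((118.70, 32.00), (118.80, 32.10))
label_backward = direction_label((118.80, 32.10), (118.70, 32.00))
```

With this reference vector the cross product reduces to the sum of the longitude and latitude differences, so the label follows directly from the sign combination of the two differences, which is the behavior the text describes in general terms.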
(7) Calculating geometric center points
The acquired bus stop data are redundant: what is a single stop in the field appears in the data as several same-named stops (from different lines) whose spatial positions do not coincide and which form a cluster in space. After the uplink and downlink directions are determined, a geometric center therefore needs to be calculated for the same-named stops (with different coordinates).
For repeated stations whose spatial positions do not coincide, the stations are grouped and the geometric center point of each group is taken as the coordinates of the station. Because more than 2 same-named stations (uplink and downlink included) can exist at an intersection, a distance criterion must be added when the groups are defined. The stations are grouped by 3 indicators (station name, uplink/downlink label, and distance), and the geometric center point of each group is calculated as the arithmetic mean.
(8) Data format conversion
To make better use of the bus data, they must be converted into geographic data that ArcMap can recognize. Through Python, ArcPy can perform map-related functions such as geographic data analysis, data conversion, data management, and map automation in a practical and efficient manner. In the invention, the two Excel data tables of bus lines and stations are converted into ArcMap line elements and point elements based on ArcPy, all of their attribute information is retained, and the result is stored in a gdb database.
Taking a bus line as an example, a gdb layer is created with the CreateFeatureclass_management method of ArcPy, attribute fields are created with AddField_management, all inflection point coordinate pairs of a line are converted into an array, a Polyline element is created, and an insert cursor (InsertCursor) is created to set the Polyline spatial attribute and the value of each attribute field. The procedure runs in a loop until all bus line elements have been created.
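The line-creation steps above can be sketched as follows. This is an illustration under stated assumptions rather than the original script: the field name `line_name`, the function names, the use of `arcpy.da.InsertCursor` (the text only says InsertCursor), and the spatial reference code 4490 (CGCS2000 geographic) are assumptions, and the vertex string is assumed to keep the "x,y;x,y" layout described for the inflection points. ArcPy is imported inside the function because it is only available with an ArcGIS installation.

```python
import os


def parse_vertices(text):
    """Turn 'x1,y1;x2,y2;...' into a list of (x, y) float pairs."""
    pairs = []
    for chunk in text.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate a trailing semicolon
        x, y = chunk.split(",")
        pairs.append((float(x), float(y)))
    return pairs


def write_lines(gdb_path, layer_name, rows, wkid=4490):
    """Create a polyline layer in a gdb and insert one element per line.

    `rows` is an iterable of (line_name, vertex_text) tuples.
    """
    import arcpy  # requires ArcGIS

    sr = arcpy.SpatialReference(wkid)
    arcpy.CreateFeatureclass_management(
        gdb_path, layer_name, "POLYLINE", spatial_reference=sr
    )
    layer = os.path.join(gdb_path, layer_name)
    arcpy.AddField_management(layer, "line_name", "TEXT")

    # The insert cursor sets the geometry and the attribute field of each row.
    with arcpy.da.InsertCursor(layer, ["SHAPE@", "line_name"]) as cursor:
        for name, vertex_text in rows:
            points = [arcpy.Point(x, y) for x, y in parse_vertices(vertex_text)]
            cursor.insertRow([arcpy.Polyline(arcpy.Array(points), sr), name])


vertices = parse_vertices("118.70,32.00;118.75,32.05;118.80,32.10;")
```

Further attribute fields (driving direction, fare, operating hours and so on) would be added with additional `AddField_management` calls and extra entries in the cursor field list.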
The results are as follows:
taking a city as an example, 188 bus lines and 6733 bus stops were obtained using web crawler technology. After the directions of the stations were determined with the vector cross product, the station data were overlaid on the line map and checked manually. Only a small number of stations were misjudged, in cases where the line between two points follows a complex path (for example, an S shape), and the accuracy exceeds 99.5%. Taking a primary school station and a hospital station in the east of the city as an example, the uplink and downlink directions of the stations are distinguished by 0 and 1; comparison with the line trend, as shown in Fig. 3, indicates that the invention determines uplink and downlink stations correctly.
As shown in Fig. 4, the geometric center points (triangles) of the two bus stops (dots), the hospital and the primary school, are obtained by grouped mean calculation, so that the stops are represented better in geographic position.
As shown in Figs. 5-6, data for 188 bus lines and 2815 bus stops are obtained through the web crawler technology. Overlaying them on the image map shows that the distribution of all bus lines and stations matches that of the roads.
In summary, by means of the above technical scheme, the invention acquires bus line and station data through network information capturing technology and cleans, integrates and converts the data to obtain structured bus data. It then determines the uplink and downlink directions of the bus stations with the geometric vector cross product algorithm and merges multiple records of the same station using the geometric center calculation, finally obtaining bus data that conform to reality. The invention combines web crawler technology, multithreaded coordinate conversion, the vector cross product algorithm and the geometric center calculation method; through the steps of data acquisition, coordinate conversion, uplink and downlink direction judgment, geometric center calculation and format conversion, it determines the uplink and downlink directions of bus stops and generates a navigation map database of bus stops and bus lines.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims (6)

1. A bus station uplink and downlink direction judging method based on vector cross product operation is characterized by comprising the following steps:
S1, acquiring webpage content by utilizing a Requests library based on a Python web crawler technology, and analyzing the webpage content to acquire a line name;
S2, extracting bus line names and corresponding bus uids in bus line json data, and performing duplicate removal processing to obtain the processed bus line names;
S3, taking the bus uid as a parameter, acquiring line information of each bus line and station information of the line, and performing multi-thread coordinate conversion on the coordinates of the station and the inflection points of the line;
S4, judging the uplink and downlink directions of the bus station by adopting a geometric vector cross product calculation algorithm;
S5, calculating a geometric center of the bus stop based on the uplink and downlink directions of the bus stop;
S6, converting the line Excel data table and the site Excel data table into ArcMap line elements and point elements based on ArcPy, and storing the ArcMap line elements and the point elements into a geographic database;
the method for acquiring the line name based on the python web crawler technology utilizes a Requests library to acquire web page content and analyzes the web page content, and comprises the following steps:
S11, acquiring webpage content by using a Requests library, and analyzing the webpage content by using a Beautiful Soup analysis library;
S12, identifying the html container by combining the style class and the id attribute setting, and obtaining the json data of the bus line in the webpage content;
S13, converting character strings of json data of the bus route into a dictionary type by using the json.loads(content) statement, and acquiring route names in a key value pair mode;
the method for acquiring the line information of each bus line and the station information of the line by taking the bus uid as a parameter and performing multi-thread coordinate conversion on the coordinates of the station and the inflection point of the line comprises the following steps:
S31, taking a bus uid as a parameter, and acquiring line information of each bus line and station information of the line;
S32, storing site information into an Excel table through double circulation of the line information and the site information respectively, and obtaining a line Excel data table and a site Excel data table;
S33, performing multithread coordinate data conversion on the coordinates of the inflection points of the sites and the lines;
the method for judging the uplink and downlink directions of the bus station by adopting the geometric vector cross product calculation algorithm comprises the following steps of:
S41, acquiring longitude and latitude coordinates of a previous bus stop, a current bus stop and a next bus stop;
S42, calculating a difference value of longitude and latitude coordinates, if the difference value is larger than 0, marking the next bus stop as an uplink direction, and if the difference value is smaller than 0, marking the next bus stop as a downlink direction;
the geometric center calculation for the bus stop based on the uplink and downlink directions of the bus stop comprises the following steps:
S51, calculating geometric centers of different coordinates of the same station based on the uplink and downlink directions of the bus station;
S52, grouping the site names of repeated sites with non-overlapping spatial positions, the uplink and downlink directions of the sites and the distances according to a preset distinguishing threshold;
and S53, calculating the geometric center point coordinates of each group by using an arithmetic average value, and taking the geometric center point coordinates as the coordinates of the station.
2. The bus stop uplink and downlink direction judging method based on vector cross product operation according to claim 1, wherein the Requests library is used for acquiring web page contents;
the webpage content comprises picture webpage resources, html webpage resources and json webpage resources;
the Beautiful Soup parsing library is used for parsing the front-end page.
3. The bus stop uplink and downlink direction judging method based on vector cross product operation according to claim 2, wherein the steps of extracting bus line names and corresponding bus uids in bus line json data, performing de-duplication processing, and obtaining the processed bus line names include the following steps:
S21, taking the line name as a search keyword, and extracting the bus line name and the corresponding bus uid from bus line json data;
S22, performing duplicate removal processing on the bus route name and the bus uid to obtain the processed bus route name.
4. The bus stop uplink and downlink direction judging method based on vector cross product operation according to claim 3, wherein the line information comprises a line name, a driving direction, a starting and ending station, a fare, an operation time and line inflection point coordinates;
the site information comprises site uid, name, serial number of the site, name of the site and Baidu metric coordinates.
5. The bus stop uplink and downlink direction judging method based on vector cross product operation according to claim 4, wherein the multi-thread coordinate data conversion of the stop and line inflection point coordinates comprises the following steps:
S331, creating a queue for storing conversion contents;
S332, converting the Baidu metric coordinates into a Baidu longitude and latitude coordinate system by utilizing a navigation map API, converting the Baidu longitude and latitude coordinate system into a Mars coordinate system, and converting the Mars coordinate system into a national 2000 coordinate system;
S333, combining the Queue module of Python and the multithread Thread module to perform multithread synchronous coordinate data conversion.
6. The method for determining the uplink and downlink directions of a bus stop based on vector cross product operation according to claim 5, wherein the conversion formula for converting a Baidu longitude and latitude coordinate system into a Mars coordinate system is as follows:
$X_H = z \cdot \cos z$
$Y_H = z \cdot \sin z$
wherein $z$ represents an intermediate vector, $X_B$ represents the longitude in the navigation map, $Y_B$ represents the latitude in the navigation map, $X_H$ represents the longitude in the Mars coordinate system, and $Y_H$ represents the latitude in the Mars coordinate system.
CN202310692730.1A 2023-06-13 2023-06-13 Bus station uplink and downlink direction judging method based on vector cross product operation Active CN116431882B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310692730.1A CN116431882B (en) 2023-06-13 2023-06-13 Bus station uplink and downlink direction judging method based on vector cross product operation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310692730.1A CN116431882B (en) 2023-06-13 2023-06-13 Bus station uplink and downlink direction judging method based on vector cross product operation

Publications (2)

Publication Number Publication Date
CN116431882A CN116431882A (en) 2023-07-14
CN116431882B true CN116431882B (en) 2023-09-01

Family

ID=87087583

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310692730.1A Active CN116431882B (en) 2023-06-13 2023-06-13 Bus station uplink and downlink direction judging method based on vector cross product operation

Country Status (1)

Country Link
CN (1) CN116431882B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110125587A1 (en) * 2008-06-23 2011-05-26 Double Verify, Inc. Automated Monitoring and Verification of Internet Based Advertising
CN104811357A (en) * 2014-01-26 2015-07-29 广东夏野日用电器有限公司 Internet-of-things system
CN116187605A (en) * 2022-11-24 2023-05-30 洛阳市规划建筑设计研究院有限公司 Bus network line selection method based on GIS technology

Also Published As

Publication number Publication date
CN116431882A (en) 2023-07-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant