CN116129643A - Bus travel characteristic identification method, device, equipment and medium - Google Patents

Publication number
CN116129643A
CN116129643A CN202310122281.7A CN202310122281A CN116129643A CN 116129643 A CN116129643 A CN 116129643A CN 202310122281 A CN202310122281 A CN 202310122281A CN 116129643 A CN116129643 A CN 116129643A
Authority
CN
China
Prior art keywords
base station
track data
bus
time
journey
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310122281.7A
Other languages
Chinese (zh)
Other versions
CN116129643B (en
Inventor
李冠耀
邓兴栋
毕瑜菲
刘洋
韩文超
廖顺意
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Urban Planning Survey and Design Institute
Original Assignee
Guangzhou Urban Planning Survey and Design Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Urban Planning Survey and Design Institute
Priority to CN202310122281.7A
Publication of CN116129643A
Application granted
Publication of CN116129643B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/01 Detecting movement of traffic to be counted or controlled
    • G08G1/0104 Measuring and analyzing of parameters relative to traffic conditions
    • G08G1/0108 Measuring and analyzing of parameters relative to traffic conditions based on the source of data
    • G08G1/012 Measuring and analyzing of parameters relative to traffic conditions based on the source of data from other sources than vehicle or roadside beacons, e.g. mobile networks
    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/123 Traffic control systems for road vehicles indicating the position of vehicles, e.g. scheduled vehicles; Managing passenger vehicles circulating according to a fixed timetable, e.g. buses, trains, trams
    • G08G1/127 Traffic control systems for road vehicles indicating the position of vehicles, e.g. scheduled vehicles; Managing passenger vehicles circulating according to a fixed timetable, e.g. buses, trains, trams to a central station; Indicators in a central station
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/02 Services making use of location information
    • H04W4/029 Location-based management or tracking services
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/20 Services signaling; Auxiliary data signalling, i.e. transmitting data via a non-traffic channel
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Traffic Control Systems (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses a bus travel characteristic identification method, device, equipment and medium. The method comprises the following steps: preprocessing the base station track data of a user to obtain target base station track data; determining a plurality of stay areas and dividing the track accordingly to obtain user journey track data; calculating the similarity between each section of journey track data and each bus line sequence to determine a plurality of candidate bus line sequences for that section; calculating the average speed of each section of journey track data and assigning each average speed to a time interval according to the journey starting time; clustering the average speeds within each time interval to determine a bus speed threshold for that interval; determining the target bus line sequence corresponding to each section of journey track data; and determining the bus travel characteristics according to the coordinate data of the starting base station and the destination base station in the journey track data and the starting time and ending time of the journey. The method can improve the identification accuracy of bus travel characteristics.

Description

Bus travel characteristic identification method, device, equipment and medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a bus trip feature recognition method, a device, a terminal device, and a computer readable storage medium.
Background
The rapid growth of urban populations and urban areas has sharply increased residents' travel demand. An unbalanced urban traffic structure and traffic congestion have aggravated traffic pressure and environmental pollution, seriously affecting people's health and the sustainable development of cities. To optimize the travel structure and relieve traffic pressure, ground buses have gradually become one of the main modes of daily travel for residents under the guidance of green, low-carbon development concepts. As a high-capacity, low-carbon, environmentally friendly and cost-effective travel mode, bus travel accounts for a high proportion of the daily travel of urban residents and is one of the important directions of urban traffic development. Accurately identifying and summarizing bus travel characteristics is of great significance for improving travel service levels, relieving traffic congestion and optimizing the urban travel structure, and has important applications in bus network optimization, dynamic scheduling, and the optimized allocation of urban public resources. Existing bus travel characteristic identification methods generally acquire resident travel data through manual surveys. Such surveys have a long data acquisition period, poor timeliness and small sample sizes, and the collected data is easily distorted by vague recollection or by misunderstanding of professional concepts during the survey. It is therefore difficult in the prior art to identify bus travel characteristics accurately.
Disclosure of Invention
The invention provides a bus travel characteristic identification method, device, equipment and medium to solve the problem that bus travel characteristics are difficult to identify accurately in the prior art. Taking into account the characteristics of mobile phone signaling data and of bus scenes, the invention fuses time information and space information and identifies the bus line sequence by calculating the similarity between each section of journey track data in the user journey track data and each bus line sequence, thereby determining the bus travel characteristics of the user. This can improve the identification accuracy of bus travel characteristics and is suitable for large-scale bus travel characteristic identification scenes.
In order to solve the technical problems, a first aspect of the embodiment of the present invention provides a bus travel feature recognition method, including the following steps:
acquiring base station track data of a user to be identified in a preset identification time period, and preprocessing the base station track data to acquire target base station track data;
determining a plurality of stay areas corresponding to the target base station track data based on a preset base station space threshold and a stay time threshold, and carrying out track division on the target base station track data according to the plurality of stay areas to obtain user journey track data;
Calculating the similarity between each section of journey track data in the user journey track data and each bus line sequence based on a plurality of preset bus line sequences, and determining a plurality of candidate bus line sequences corresponding to each section of journey track data in the user journey track data according to a comparison result of the similarity and a preset similarity threshold;
calculating the average speed corresponding to each section of journey track data according to the coordinate data, journey starting time and journey ending time of a plurality of base stations contained in each section of journey track data, and determining the time interval corresponding to each average speed according to the journey starting time and a plurality of time intervals obtained by dividing one day in advance;
clustering a plurality of average speeds corresponding to each time interval, and determining a speed threshold value of the candidate bus line sequence in each time interval;
determining a target bus line sequence corresponding to each section of journey track data according to the similarity between each section of journey track data and each candidate bus line sequence and the comparison result of the average speed corresponding to each section of journey track data and the speed threshold value in the time interval corresponding to each average speed;
and determining bus travel characteristics of the user to be identified in the preset identification time period according to the target bus line sequence, the starting base station coordinate data, the destination base station coordinate data, the journey starting time and the journey ending time contained in each section of journey track data.
Preferably, the determining a plurality of stay areas corresponding to the target base station track data based on a preset base station space threshold and a stay time threshold specifically includes the following steps:
determining the position information and the residence time of a first candidate residence area according to the coordinate data of a first base station, the base station connection start time and the base station connection end time contained in the target base station track data, and taking the position information and the residence time of the first candidate residence area as the position information and the residence time of the current last candidate residence area;
sequentially calculating the distance between the coordinate data of the ith base station contained in the target base station track data and the position information of the current last candidate stay region according to the preset base station sequence in the target base station track data; wherein i is an integer greater than 1;
When the distance is smaller than the base station space threshold, updating the position information and the stay time of the current last candidate stay area according to the coordinate data of the ith base station and the base station connection ending time;
when the distance is greater than or equal to the base station space threshold, determining the position information and the stay time of a newly added candidate stay area according to the coordinate data of the ith base station, the base station connection starting time and the base station connection ending time, and taking the position information and the stay time of the newly added candidate stay area as the position information and the stay time of the current last candidate stay area;
and removing, according to the stay time of the first candidate stay region and the stay time of each newly added candidate stay region, every candidate stay region whose stay time is smaller than the stay time threshold, the remaining candidate stay regions being determined as the plurality of stay areas.
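By way of non-limiting illustration only, the stay area determination described above can be sketched as follows. The source does not specify how the position information of a candidate stay region is updated, nor which distance measure is used; this sketch assumes a running centroid of the merged base stations and a haversine distance. The names `Candidate`, `haversine_m` and `find_stay_areas` are hypothetical.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


@dataclass
class Candidate:
    lat: float
    lon: float
    start: float  # connection start time of the first merged record (seconds)
    end: float    # connection end time of the last merged record (seconds)
    count: int = 1

    @property
    def stay_time(self):
        return self.end - self.start


def find_stay_areas(records, space_threshold_m, stay_threshold_s):
    """records: list of (lat, lon, start, end) in base station order."""
    if not records:
        return []
    lat, lon, start, end = records[0]
    candidates = [Candidate(lat, lon, start, end)]
    for lat, lon, start, end in records[1:]:
        last = candidates[-1]
        if haversine_m((lat, lon), (last.lat, last.lon)) < space_threshold_m:
            # Assumption: the region position is the running centroid of its members.
            last.lat = (last.lat * last.count + lat) / (last.count + 1)
            last.lon = (last.lon * last.count + lon) / (last.count + 1)
            last.count += 1
            last.end = end
        else:
            candidates.append(Candidate(lat, lon, start, end))
    # Remove candidates whose stay time is below the stay time threshold.
    return [c for c in candidates if c.stay_time >= stay_threshold_s]


records = [
    (23.1000, 113.3000, 0, 600),
    (23.1005, 113.3003, 600, 2400),   # within the space threshold: merged
    (23.2000, 113.4000, 3000, 3100),  # short stop: removed
    (23.3000, 113.5000, 4000, 6000),
]
areas = find_stay_areas(records, space_threshold_m=500, stay_threshold_s=1800)
```

The track can then be split at the boundaries of the returned stay areas so that each resulting section covers the movement between two consecutive stays.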
As a preferred scheme, each bus line sequence comprises position information of a plurality of bus stops arranged according to a preset stop sequence;
the step of calculating the similarity between each section of journey track data in the user journey track data and each bus line sequence based on a plurality of preset bus line sequences specifically comprises the following steps:
Based on a plurality of preset bus line sequences, calculating the similarity between each section of journey track data in the user journey track data and each bus line sequence through the following expression:
[Equation image not reproduced in the source text: Figure BDA0004080440760000031]
wherein $L_k$ represents the kth bus line sequence; $Tra_i$ represents the ith section of journey track data; $S_{i,j}$ and $S_{i,j+1}$ respectively represent the position information of the jth base station and the (j+1)th base station in the ith section of journey track data; $Sim'(S_{i,j}S_{i,j+1}, L_k)$ represents the similarity between the sub-track $(S_{i,j}S_{i,j+1})$ and the kth bus line sequence $L_k$, where $Sim'(S_{i,j}S_{i,j+1}, L_k) = \max(dis(S_{i,j}, L_k), dis(S_{i,j+1}, L_k))$; $dis(S_{i,j}, L_k)$ represents the distance between the jth base station and the kth bus line sequence, and $dis(S_{i,j+1}, L_k)$ represents the distance between the (j+1)th base station and the kth bus line sequence, where $dis(S_{i,j}, L_k) = \min\{dis(S_{i,j}, B_{k,q}),\ q = 1, 2, \dots, m\}$; $B_{k,q}$ represents the position information of the qth bus station in the kth bus line sequence; $w_{j,j+1}$ represents the weight of the sub-track $(S_{i,j}S_{i,j+1})$ and is calculated from the following expression:
[Equation image not reproduced in the source text: Figure BDA0004080440760000041]
wherein the symbols shown in Figures BDA0004080440760000042, BDA0004080440760000043, BDA0004080440760000044 and BDA0004080440760000045 respectively represent the base station connection start time of the (j+1)th base station, of the jth base station, of the last base station, and of the 1st base station in the ith section of journey track data.
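By way of non-limiting illustration only, the quantities defined above can be sketched as follows. The two expressions shown as images are not reproduced in the text, so their form is an assumption here: the weight is taken as the share of the journey duration covered by the sub-track, $(t_{j+1} - t_j)/(t_n - t_1)$, and the overall value is taken as the weighted sum of the sub-track terms. Coordinates are assumed to be planar (projected, in metres), and the names `point_to_line`, `sub_track_term` and `trip_line_score` are hypothetical.

```python
import math


def point_to_line(p, stops):
    """dis(S, L_k): distance from base station p to the nearest stop of the line."""
    return min(math.dist(p, b) for b in stops)


def sub_track_term(p, q, stops):
    """Sim'(S_j S_{j+1}, L_k) = max(dis(S_j, L_k), dis(S_{j+1}, L_k)), as in the text."""
    return max(point_to_line(p, stops), point_to_line(q, stops))


def trip_line_score(trip, stops):
    """trip: list of ((x, y), t) ordered by base station connection start time t.

    Assumed aggregation: sum over sub-tracks of w_{j,j+1} * Sim'(...), with
    w_{j,j+1} = (t_{j+1} - t_j) / (t_n - t_1). The value is in distance units
    under this assumption; how it is compared with the similarity threshold
    is not specified in the text.
    """
    if len(trip) < 2:
        raise ValueError("a trip needs at least two base station records")
    total = trip[-1][1] - trip[0][1]
    if total <= 0:
        raise ValueError("trip duration must be positive")
    score = 0.0
    for (p, tp), (q, tq) in zip(trip, trip[1:]):
        weight = (tq - tp) / total  # assumed time-share weight
        score += weight * sub_track_term(p, q, stops)
    return score


trip = [((0.0, 0.0), 0), ((100.0, 0.0), 60), ((200.0, 0.0), 180)]
near_line = [(0.0, 10.0), (100.0, 10.0), (200.0, 10.0)]
far_line = [(0.0, 50.0), (200.0, 10.0)]
near_score = trip_line_score(trip, near_line)
far_score = trip_line_score(trip, far_line)
```

Under these assumptions a line whose stops lie close to every base station of the journey yields a small value, while a line that misses part of the journey is penalised in proportion to the time spent on the poorly matched sub-tracks.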
Preferably, the clustering of the average speeds corresponding to each time interval determines a speed threshold of the candidate bus line sequence in each time interval, and specifically includes the following steps:
clustering a plurality of average speeds corresponding to each time interval by using a K-means clustering algorithm to divide the plurality of average speeds corresponding to each time interval into four speed categories;
calculating the average speed of the plurality of average speeds corresponding to each speed class in each time interval to obtain the class average speed corresponding to each speed class;
the average speeds of the classes corresponding to the four speed classes in each time interval are arranged in a descending order, and the average speed of the subway, the average speed of the automobile, the average speed of the bus and the average speed of the non-motor vehicle in each time interval are respectively determined;
and determining a speed threshold value of the candidate bus line sequence in each time interval according to the average speed of the bus in each time interval.
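By way of non-limiting illustration only, the clustering steps above can be sketched for the speeds of one time interval as follows. A plain one-dimensional K-means (Lloyd's algorithm) with a deterministic initialisation is used here as a stand-in for any K-means implementation. The text does not state how the speed threshold is derived from the average bus speed, so the `margin` factor is a hypothetical parameter that would need calibration. The names `kmeans_1d`, `speed_classes` and `bus_speed_threshold` are also hypothetical.

```python
def kmeans_1d(values, k=4, max_iters=100):
    """Cluster scalar speeds into k groups and return the k class average speeds."""
    distinct = sorted(set(values))
    if len(distinct) < k:
        raise ValueError("need at least k distinct speed values")
    n = len(distinct)
    # Deterministic initialisation: spread the initial centres over the sorted values.
    centres = [distinct[(2 * c + 1) * n // (2 * k)] for c in range(k)]
    for _ in range(max_iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centres[c]))
            clusters[nearest].append(v)
        # Class average speed; an empty cluster keeps its previous centre.
        updated = [sum(m) / len(m) if m else centres[c]
                   for c, m in enumerate(clusters)]
        if updated == centres:
            break
        centres = updated
    return centres


def speed_classes(speeds):
    """Label the four class averages in descending order, as described in the text."""
    ordered = sorted(kmeans_1d(speeds, k=4), reverse=True)
    labels = ("subway", "car", "bus", "non_motor")
    return dict(zip(labels, ordered))


def bus_speed_threshold(speeds, margin=1.2):
    """Hypothetical rule: threshold = average bus speed * margin."""
    return speed_classes(speeds)["bus"] * margin


# Average speeds (km/h) of the journeys that start in one time interval.
interval_speeds = [4, 5, 6, 14, 15, 16, 29, 30, 31, 44, 45, 46]
classes = speed_classes(interval_speeds)
threshold = bus_speed_threshold(interval_speeds)
```

The same procedure would be repeated independently for every time interval, so that peak and off-peak periods each receive their own threshold.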
As a preferred solution, before calculating the similarity between each section of trip track data in the user trip track data and each bus line sequence based on a plurality of preset bus line sequences, the method further includes the following steps:
Calculating the distance between each base station contained in the user journey track data and each bus station contained in a plurality of bus route sequences; when the distance between any one base station and any one bus station is smaller than a preset distance threshold value, taking the any one bus station as an index station of the any one base station, and taking a bus line sequence containing the any one bus station as an index line sequence of the any one base station;
the similarity between each section of journey track data in the user journey track data and each bus line sequence is calculated based on a plurality of preset bus line sequences through the following expression, and the method specifically comprises the following steps:
judging whether any bus line sequence belongs to a first index line sequence and a second index line sequence corresponding to any section of journey track data according to the first index line sequence corresponding to a first base station and the second index line sequence corresponding to a last base station in each section of journey track data;
when the arbitrary bus line sequence does not belong to the first index line sequence or the second index line sequence corresponding to the arbitrary section of journey trace data, judging that the arbitrary bus line sequence is not a candidate bus line sequence corresponding to the arbitrary section of journey trace data;
when the arbitrary bus line sequence belongs to both the first index line sequence and the second index line sequence corresponding to the arbitrary section of journey track data, calculating the similarity between the arbitrary bus line sequence and the arbitrary section of journey track data through the following expressions:
[Equation images not reproduced in the source text: Figure BDA0004080440760000051 and Figure BDA0004080440760000052]
Preferably, the parameter $dis(S_{i,j}, B_{k,q})$ is calculated as follows:
when the qth bus station $B_{k,q}$ in the kth bus line sequence is a target index station corresponding to the jth base station $S_{i,j}$ in the ith section of journey track data, obtaining the distance between the jth base station $S_{i,j}$ and the qth bus station $B_{k,q}$ according to the distance between the jth base station $S_{i,j}$ and the target index station;
when the qth bus station $B_{k,q}$ in the kth bus line sequence is not a target index station corresponding to the jth base station $S_{i,j}$ in the ith section of journey track data, assigning a preset distance set value to the distance between the jth base station $S_{i,j}$ and the qth bus station $B_{k,q}$.
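By way of non-limiting illustration only, the index construction, the candidate pre-filtering and the distance lookup described above can be sketched as follows. Planar coordinates in metres are assumed, and the names `build_index`, `candidate_lines` and `station_distance` and the data layout are hypothetical.

```python
import math
from collections import defaultdict


def build_index(base_stations, lines, distance_threshold):
    """base_stations: {bs_id: (x, y)}; lines: {line_id: [(x, y), ...]}.

    Returns (index_stops, index_lines):
      index_stops[bs_id] = {(line_id, q): distance} for stops within the threshold
      index_lines[bs_id] = {line_id, ...} for lines containing such a stop
    """
    index_stops = defaultdict(dict)
    index_lines = defaultdict(set)
    for bs_id, p in base_stations.items():
        for line_id, stops in lines.items():
            for q, stop in enumerate(stops):
                d = math.dist(p, stop)
                if d < distance_threshold:
                    index_stops[bs_id][(line_id, q)] = d
                    index_lines[bs_id].add(line_id)
    return index_stops, index_lines


def candidate_lines(trip_bs_ids, index_lines):
    """Keep only lines indexed by both the first and the last base station."""
    first = index_lines.get(trip_bs_ids[0], set())
    last = index_lines.get(trip_bs_ids[-1], set())
    return first & last


def station_distance(bs_id, line_id, q, index_stops, default_distance):
    """dis(S, B_{k,q}): stored distance for index stops, preset value otherwise."""
    return index_stops.get(bs_id, {}).get((line_id, q), default_distance)


base_stations = {"a": (0.0, 0.0), "b": (1000.0, 0.0), "c": (2000.0, 0.0)}
lines = {
    "L1": [(0.0, 50.0), (1000.0, 50.0), (2000.0, 50.0)],
    "L2": [(0.0, 100.0), (900.0, 3000.0)],
}
index_stops, index_lines = build_index(base_stations, lines, distance_threshold=300.0)
survivors = candidate_lines(["a", "b", "c"], index_lines)
```

Because the index is built once per base station, repeated similarity calculations can look distances up instead of recomputing them, and lines that do not pass near both ends of a journey are discarded before any similarity is calculated.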
As a preferred solution, the preprocessing is performed on the base station track data to obtain target base station track data, which specifically includes the following steps:
when, in the base station track data, the position information of the (i-1)th base station is the same as the position information of the (i+1)th base station, the position information of the (i-1)th base station is different from the position information of the ith base station, and the difference between the base station connection time of the (i+1)th base station and the base station connection time of the (i-1)th base station is smaller than a preset time threshold, deleting the position information and the base station connection time of the ith base station to obtain primary noise reduction base station track data;
when the primary noise reduction base station track data satisfies the conditions shown in Figure BDA0004080440760000053 and Figure BDA0004080440760000054 [equation images not reproduced in the source text], determining the position information and the base station connection time of the ith base station to be error data and deleting them; and when the primary noise reduction base station track data satisfies the conditions shown in Figure BDA0004080440760000061 and Figure BDA0004080440760000062 [equation images not reproduced in the source text], determining the position information and the base station connection time of the (i-1)th base station to be error data and deleting them, to obtain secondary noise reduction base station track data;
when the position information of the (i-1)th base station is the same as the position information of the ith base station in the secondary noise reduction base station track data, merging the position information of the (i-1)th base station with the position information of the ith base station, taking the base station connection time of the (i-1)th base station as the base station connection start time and the base station connection time of the ith base station as the base station connection end time, to obtain the target base station track data;
wherein i is an integer greater than 1; $S_{i-1}$ represents the position information of the (i-1)th base station; $S_i$ represents the position information of the ith base station; $S_{i+1}$ represents the position information of the (i+1)th base station; $t_{i-1}$ represents the base station connection time of the (i-1)th base station; $t_i$ represents the base station connection time of the ith base station; $t_{i+1}$ represents the base station connection time of the (i+1)th base station; and $\rho$ represents a preset movement speed threshold.
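By way of non-limiting illustration only, the first and third preprocessing steps above (removal of back-and-forth switching between base stations, and merging of consecutive records at the same base station) can be sketched as follows. The second step is omitted because its conditions appear only as images that are not reproduced in the text. The names `remove_ping_pong` and `merge_consecutive` are hypothetical, and giving an unmerged record equal start and end times is an assumption.

```python
def remove_ping_pong(track, time_threshold):
    """track: list of (location, t) ordered by connection time.

    Deletes record i when S_{i-1} == S_{i+1}, S_{i-1} != S_i and
    t_{i+1} - t_{i-1} < time_threshold.
    """
    cleaned = list(track)
    i = 1
    while i < len(cleaned) - 1:
        (prev_loc, prev_t), (loc, _), (next_loc, next_t) = cleaned[i - 1:i + 2]
        if prev_loc == next_loc and prev_loc != loc and next_t - prev_t < time_threshold:
            del cleaned[i]  # re-check the same index against its new neighbours
        else:
            i += 1
    return cleaned


def merge_consecutive(track):
    """Merge runs of identical locations into (location, start_time, end_time)."""
    merged = []
    for loc, t in track:
        if merged and merged[-1][0] == loc:
            merged[-1] = (loc, merged[-1][1], t)  # extend the connection end time
        else:
            # Assumption: a record that is not merged starts and ends at t.
            merged.append((loc, t, t))
    return merged


raw = [("A", 0), ("B", 20), ("A", 40), ("A", 300), ("C", 900)]
denoised = remove_ping_pong(raw, time_threshold=60)
target = merge_consecutive(denoised)
```

In this example the brief switch to base station "B" is treated as noise, after which the three records at "A" collapse into a single record with a connection start time and a connection end time.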
A second aspect of the embodiment of the present invention provides a bus travel feature recognition device, including:
the preprocessing module is used for acquiring the base station track data of the user to be identified in a preset identification time period, and preprocessing the base station track data to acquire target base station track data;
the track dividing module is used for determining a plurality of stay areas corresponding to the target base station track data based on a preset base station space threshold and a stay time threshold, and carrying out track division on the target base station track data according to the plurality of stay areas to obtain user journey track data;
the candidate bus route sequence acquisition module is used for calculating the similarity between each section of journey track data in the user journey track data and each bus route sequence based on a plurality of preset bus route sequences, and determining a plurality of candidate bus route sequences corresponding to each section of journey track data in the user journey track data according to the comparison result of the similarity and a preset similarity threshold;
The average speed calculation module is used for calculating the average speed corresponding to each section of journey track data according to the coordinate data of a plurality of base stations, the journey starting time and the journey ending time contained in each section of journey track data, and determining the time interval corresponding to each average speed according to the journey starting time and a plurality of time intervals obtained by dividing one day in advance;
the speed threshold determining module is used for clustering a plurality of average speeds corresponding to each time interval and determining a speed threshold of the candidate bus line sequence in each time interval;
the target bus route sequence determining module is used for determining a target bus route sequence corresponding to each section of journey track data according to the similarity between each section of journey track data and each candidate bus route sequence and the comparison result of the average speed corresponding to each section of journey track data and the speed threshold value in the time interval corresponding to each average speed;
and the bus travel characteristic determining module is used for determining the bus travel characteristic of the user to be identified in the preset identification time period according to the target bus line sequence, the starting base station coordinate data, the ending base station coordinate data, the journey starting time and the journey ending time contained in each section of journey track data.
A third aspect of an embodiment of the present invention provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the bus trip feature identification method according to any one of the first aspects when executing the computer program.
A fourth aspect of the embodiment of the present invention provides a computer readable storage medium, where the computer readable storage medium includes a stored computer program, where the computer program when executed controls a device where the computer readable storage medium is located to execute the bus trip feature identification method according to any one of the first aspect.
Compared with the prior art, the method and the device have the advantages that the characteristics of the mobile phone signaling data and bus scenes are considered, the time information and the space information are fused, the bus line sequence identification is carried out by calculating the similarity between each section of journey track data in the user journey track data and each bus line sequence, further the bus trip characteristics of the user are determined, the identification accuracy of the bus trip characteristics can be improved, and the method and the device are suitable for large-scale bus trip characteristic identification scenes.
Drawings
FIG. 1 is a flow chart of a bus travel feature identification method in an embodiment of the invention;
fig. 2 is a schematic structural diagram of a bus travel characteristic recognition device in an embodiment of the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, a first aspect of the embodiment of the present invention provides a bus travel feature recognition method, including steps S1 to S7 as follows:
step S1, acquiring base station track data of a user to be identified in a preset identification time period, and preprocessing the base station track data to acquire target base station track data;
step S2, determining a plurality of stay areas corresponding to the target base station track data based on a preset base station space threshold and a stay time threshold, and carrying out track division on the target base station track data according to the plurality of stay areas to obtain user journey track data;
Step S3, calculating the similarity between each section of journey track data in the user journey track data and each bus line sequence based on a plurality of preset bus line sequences, and determining a plurality of candidate bus line sequences corresponding to each section of journey track data in the user journey track data according to a comparison result of the similarity and a preset similarity threshold;
step S4, calculating the average speed corresponding to each section of journey track data according to the coordinate data, journey starting time and journey ending time of a plurality of base stations contained in each section of journey track data, and determining the time interval corresponding to each average speed according to the journey starting time and a plurality of time intervals obtained by dividing one day in advance;
step S5, clustering a plurality of average speeds corresponding to each time interval, and determining a speed threshold value of the candidate bus line sequence in each time interval;
step S6, determining a target bus line sequence corresponding to each section of journey track data according to the similarity between each section of journey track data and each candidate bus line sequence and the comparison result of the average speed corresponding to each section of journey track data and the speed threshold value in the time interval corresponding to each average speed;
step S7, determining bus travel characteristics of the user to be identified in the preset identification time period according to the target bus line sequence, the starting base station coordinate data, the destination base station coordinate data, the journey starting time and the journey ending time contained in each section of journey track data.
Specifically, mobile phone signaling data is data captured and recorded by an operator's communication base stations when a mobile phone user makes a call, sends a short message, uses the network, or changes position, and generally includes a user ID, a connection time, the connected base station, and the like. $S_i = (lat_i, lon_i)$ represents a base station, wherein $(lat_i, lon_i)$ represents the latitude and longitude coordinates of $S_i$.
As the user moves, the base station to which the mobile phone is connected also changes, so the movement track $Tra$ of the user can be represented by a base station sequence, referred to as the base station track data of the user: $Tra = \{(S_1, t_1), (S_2, t_2), \dots, (S_i, t_i), \dots, (S_n, t_n)\}$, where $S_i$ represents the ith base station to which the user connects and $t_i$ represents its connection time.
Because base stations cover wide areas and the coverage areas of neighbouring base stations overlap, mobile phone signaling data contains considerable noise and redundancy. The base station track data therefore needs to be preprocessed to obtain the target base station track data.
Further, in this embodiment, a plurality of stay areas corresponding to the track data of the target base station, that is, a plurality of stay areas of the user to be identified in the preset identification time period, are determined based on the preset base station space threshold and the stay time threshold, and track division is performed on the track data of the target base station according to the plurality of stay areas, so that each track contains at most one bus trip of the user.
Further, the present embodiment uses $B_i = (lon_i, lat_i)$ to represent the location of a bus stop, where $(lon_i, lat_i)$ represents its longitude and latitude coordinates. A bus line sequence refers to an ordered sequence of bus stops: $L_k = \{B_{k,1}, B_{k,2}, \dots, B_{k,q}, B_{k,q+1}, \dots, B_{k,m}\}$, wherein $B_{k,q}$ refers to the qth bus station in the kth line $L_k$. The similarity between each section of journey track data in the user journey track data and each bus line sequence is calculated, and, according to the comparison result of the similarity and the preset similarity threshold, the bus line sequences whose similarity is greater than the preset similarity threshold are taken as the candidate bus line sequences corresponding to that section of journey track data.
Further, the present embodiment takes into account that travel speed differs greatly between time periods (e.g., peak and off-peak periods). Therefore, for each section of journey track data $Tra_i$ whose candidate routes include the bus line sequence $L_k$, the average speed of the journey is calculated and assigned, according to the journey start time, to one of the time intervals divided in advance. The average speed is calculated as:

$$Speed(Tra_i) = \frac{\sum_{j} \mathrm{dis}(S_{i,j}, S_{i,j+1})}{t_i^{\mathrm{end}} - t_i^{\mathrm{start}}}$$

wherein $Speed(Tra_i)$ represents the average speed corresponding to the $i$-th section of journey track data; $S_{i,j}$ and $S_{i,j+1}$ respectively represent the position information of the $j$-th base station and the $(j+1)$-th base station in the $i$-th section of journey track data; and $t_i^{\mathrm{start}}$ and $t_i^{\mathrm{end}}$ respectively represent the journey start time and the journey end time.
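The average speed calculation and the assignment to a time interval can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the function names, the representation of times as seconds since midnight, and the 2-hour interval width are assumptions, and `dist` stands for any distance function between two coordinates (for example the longitude/latitude distance defined in this description).

```python
def average_speed(stations, start_s, end_s, dist):
    """Average speed of one journey in distance units per hour.

    stations: ordered base station coordinates of the journey.
    start_s, end_s: journey start and end time in seconds (assumed unit).
    dist: distance function between two coordinates.
    """
    # Sum the distances between consecutive base stations.
    length = sum(dist(a, b) for a, b in zip(stations, stations[1:]))
    hours = (end_s - start_s) / 3600.0
    if hours <= 0:
        raise ValueError("journey end time must be later than its start time")
    return length / hours


def time_interval(start_s, interval_s=7200):
    """Index of the pre-divided interval of the day that contains start_s.

    The 2-hour width (7200 s) is a hypothetical choice for illustration.
    """
    return int(start_s % 86400) // interval_s
```

Each journey's speed is then stored under the interval returned by `time_interval`, so that speeds are only compared with other journeys starting in the same part of the day.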
Further, for the same travel route, the moving speeds corresponding to different travel modes are different, so that in this embodiment, the average speeds corresponding to each time interval are clustered, and a speed threshold of the candidate bus line sequence in each time interval is determined, so as to verify whether the user adopts a certain bus line.
For a candidate bus line sequence $L_k$ of $Tra_i$, if the average speed of $Tra_i$ is greater than the speed threshold of $L_k$ in the corresponding time interval, $L_k$ is considered not to be a candidate bus line sequence of $Tra_i$ and is screened out. If, after screening, the number of candidate bus line sequences of $Tra_i$ is still greater than 1, the candidate bus line sequence with the greatest similarity is selected as the target bus line sequence corresponding to the journey track data.
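The screening rule can be sketched as follows, assuming the similarities and speed thresholds have already been computed. The function name and the dictionary layouts are hypothetical and chosen only for illustration.

```python
def select_target_line(avg_speed, interval, candidates, speed_thresholds):
    """Pick the target bus line sequence for one journey.

    candidates: {line_id: similarity} for the journey.
    speed_thresholds: {(line_id, interval): speed threshold}.
    Returns the line id, or None if every candidate is screened out.
    """
    # Drop candidates whose threshold the journey's average speed exceeds.
    kept = {
        line: sim
        for line, sim in candidates.items()
        if avg_speed <= speed_thresholds[(line, interval)]
    }
    if not kept:
        return None
    # Among the remaining candidates, keep the most similar one.
    return max(kept, key=kept.get)
```

Returning `None` when no candidate survives corresponds to a journey that is not attributed to any bus line.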
Further, according to the target bus route sequence, the starting base station coordinate data and the destination base station coordinate data contained in each section of journey track data, respectively calculating two bus stops closest to the starting base station coordinate data and the destination base station coordinate data in the target bus route sequence, respectively serving as a starting stop and a final stop, and respectively taking journey starting time and journey ending time contained in the journey track data as departure time and arrival time, thereby obtaining bus travel characteristics of a user to be identified in a preset identification time period.
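A minimal sketch of this last step is given below. The function names and the returned dictionary layout are assumptions for illustration, and `dist` stands for any distance function between two coordinates.

```python
def nearest_stop_index(point, line_stops, dist):
    """Index of the stop in line_stops that is closest to point."""
    return min(range(len(line_stops)), key=lambda q: dist(point, line_stops[q]))


def travel_feature(journey_stations, start_time, end_time, line_id, line_stops, dist):
    """Bus travel feature of one journey on its target bus line sequence."""
    return {
        "line": line_id,
        # Stop closest to the journey's starting base station.
        "origin_stop": nearest_stop_index(journey_stations[0], line_stops, dist),
        # Stop closest to the journey's destination base station.
        "destination_stop": nearest_stop_index(journey_stations[-1], line_stops, dist),
        "departure_time": start_time,
        "arrival_time": end_time,
    }
```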
Preferably, the determining a plurality of stay areas corresponding to the target base station track data based on a preset base station space threshold and a stay time threshold specifically includes the following steps:
determining the position information and the stay time of a first candidate stay region according to the coordinate data, the base station connection start time and the base station connection end time of the first base station contained in the target base station track data, and taking the position information and the stay time of the first candidate stay region as the position information and the stay time of the current last candidate stay region;
Sequentially calculating the distance between the coordinate data of the ith base station contained in the target base station track data and the position information of the current last candidate stay region according to the preset base station sequence in the target base station track data; wherein i is an integer greater than 1;
when the distance is smaller than the base station space threshold, updating the position information and the stay time of the current last candidate stay area according to the coordinate data of the ith base station and the base station connection ending time;
when the distance is greater than or equal to the base station space threshold, determining the position information and the stay time of a newly added candidate stay area according to the coordinate data of the ith base station, the base station connection starting time and the base station connection ending time, and taking the position information and the stay time of the newly added candidate stay area as the position information and the stay time of the current last candidate stay area;
and screening out the first candidate stay region/the newly added candidate stay region with the stay time smaller than the stay time threshold according to the stay time of the first candidate stay region and the stay time of the newly added candidate stay regions, and determining a plurality of stay regions.
Specifically, the first candidate stay region initially contains only $(S_1, t_1^{s}, t_1^{e})$, namely the coordinate data, the base station connection start time and the base station connection end time of the first base station, and the location of $S_1$ is used as the location of the candidate stay region. Next, the next record $(S_2, t_2^{s}, t_2^{e})$, namely the coordinate data, the base station connection start time and the base station connection end time of the second base station, is extracted from the target base station track data. If $\mathrm{dis}(S_1, S_2) < \beta$ (where $\beta$ is a spatial threshold, e.g., 100 m), the second base station is added to the candidate stay region and the region is updated to $\{(S_1, t_1^{s}, t_1^{e}), (S_2, t_2^{s}, t_2^{e})\}$, with position $\left(\frac{lon_1 + lon_2}{2}, \frac{lat_1 + lat_2}{2}\right)$, time period $[t_1^{s}, t_2^{e}]$ and stay time $t_2^{e} - t_1^{s}$. Otherwise, the second base station forms a second candidate stay region.
Each subsequent base station record $(S_i, t_i^{s}, t_i^{e})$ is then compared, in the preset base station order of the target base station track data, with the current last candidate stay region. Assume that the current last candidate stay region contains $\{(S_p, t_p^{s}, t_p^{e}), \ldots, (S_{i-1}, t_{i-1}^{s}, t_{i-1}^{e})\}$. The position of the current last candidate stay region is expressed as the average of the longitudes and latitudes of all the positions it contains:

$$\left(\overline{lon}, \overline{lat}\right) = \left(\frac{1}{i-p}\sum_{r=p}^{i-1} lon_r,\; \frac{1}{i-p}\sum_{r=p}^{i-1} lat_r\right)$$

If $\mathrm{dis}\left(S_i, (\overline{lon}, \overline{lat})\right) < \beta$, the base station corresponding to $(S_i, t_i^{s}, t_i^{e})$ is added to the current last candidate stay region, the position of the region is updated to the average that includes $S_i$, the time period is updated to $[t_p^{s}, t_i^{e}]$ and the stay time is updated to $t_i^{e} - t_p^{s}$. Otherwise, a new candidate stay region $\{(S_i, t_i^{s}, t_i^{e})\}$ is formed.
Further, the first candidate stay zone/the newly added candidate stay zone with the stay time smaller than the stay time threshold (e.g. 15 minutes) is screened out according to the stay time of the first candidate stay zone and the stay time of the newly added candidate stay zone, and the stay zones are determined.
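The stay region procedure described above can be sketched as follows. This is an illustrative reading under stated assumptions: records are `(lat, lon, t_start, t_end)` tuples with times in seconds, `dist` is any distance function returning kilometres, and the function name, dictionary keys and the recorded `first_idx`/`last_idx` fields are hypothetical. The defaults mirror the example thresholds of 100 m and 15 minutes.

```python
def detect_stay_regions(records, dist, beta=0.1, min_stay_s=900):
    """Group consecutive records into candidate stay regions, then filter.

    records: list of (lat, lon, t_start, t_end) in track order.
    beta: spatial threshold in the unit returned by dist.
    min_stay_s: stay time threshold in seconds.
    """
    regions = []
    for idx, (lat, lon, t_s, t_e) in enumerate(records):
        last = regions[-1] if regions else None
        if last is not None and dist((lat, lon), last["center"]) < beta:
            # Extend the current last candidate region and update its
            # position to the mean of all member coordinates.
            last["points"].append((lat, lon))
            n = len(last["points"])
            last["center"] = (
                sum(p[0] for p in last["points"]) / n,
                sum(p[1] for p in last["points"]) / n,
            )
            last["end"] = t_e
            last["last_idx"] = idx
        else:
            # Start a new candidate region from this record.
            regions.append({
                "points": [(lat, lon)], "center": (lat, lon),
                "start": t_s, "end": t_e,
                "first_idx": idx, "last_idx": idx,
            })
    # Discard candidates whose stay time is below the threshold.
    return [r for r in regions if r["end"] - r["start"] >= min_stay_s]
```

Keeping the record indices of each region makes it straightforward to cut the track into journeys afterwards.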
Finally, the target base station track data is divided into journeys according to the retained stay regions. In this example, for a track containing $n$ stay regions, the 1st stay region is defined as the start point of the first journey, the $n$-th stay region is the end point of the last journey, and the $i$-th stay region ($1 < i < n$) is both the end point of the $(i-1)$-th journey and the start point of the $i$-th journey. In addition to its start point and end point, each journey includes the base stations connected between them. The $i$-th journey of the user is denoted as

$$Tra_i = \{(S_{i,1}, t_{i,1}^{s}, t_{i,1}^{e}), \ldots, (S_{i,j}, t_{i,j}^{s}, t_{i,j}^{e}), \ldots\}$$

wherein $S_{i,j}$ represents the $j$-th base station in the $i$-th section of journey track data; $t_{i,j}^{s}$ represents the base station connection start time of the $j$-th base station in the $i$-th section of journey track data; and $t_{i,j}^{e}$ represents the base station connection end time of the $j$-th base station in the $i$-th section of journey track data.
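The division into journeys can be sketched as below. The sketch assumes each retained stay region is given as a `(first_idx, last_idx)` pair of record indices, and that a journey runs from the last record of its origin stay region to the first record of its destination stay region; both the input layout and this boundary choice are assumptions, since the description only states that a journey contains its start point, its end point and the base stations in between.

```python
def split_journeys(records, stay_spans):
    """Cut a track into journeys between consecutive stay regions.

    records: track records in order.
    stay_spans: ordered (first_idx, last_idx) pairs of retained stay regions.
    Returns n - 1 journeys for n stay regions.
    """
    journeys = []
    for (_, origin_last), (dest_first, _) in zip(stay_spans, stay_spans[1:]):
        # Include both boundary records so each journey has a start and an end.
        journeys.append(records[origin_last:dest_first + 1])
    return journeys
```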
As a preferred scheme, each bus line sequence comprises position information of a plurality of bus stops arranged according to a preset stop sequence;
the step of calculating the similarity between each section of journey track data in the user journey track data and each bus line sequence based on a plurality of preset bus line sequences specifically comprises the following steps:
based on a plurality of preset bus line sequences, calculating the similarity between each section of journey track data in the user journey track data and each bus line sequence through the following expression:
Figure BDA0004080440760000122
wherein $L_k$ represents the $k$-th bus line sequence; $Tra_i$ represents the $i$-th section of journey track data; $S_{i,j}$ and $S_{i,j+1}$ respectively represent the position information of the $j$-th base station and the $(j+1)$-th base station in the $i$-th section of journey track data; $Sim'(S_{i,j}S_{i,j+1}, L_k)$ represents the similarity between the sub-track $(S_{i,j}S_{i,j+1})$ and the $k$-th bus line sequence $L_k$, for which the distance between the sub-track and $L_k$ is taken as $\max(\mathrm{dis}(S_{i,j}, L_k), \mathrm{dis}(S_{i,j+1}, L_k))$; $\mathrm{dis}(S_{i,j}, L_k)$ represents the distance between the $j$-th base station and the $k$-th bus line sequence, and $\mathrm{dis}(S_{i,j+1}, L_k)$ represents the distance between the $(j+1)$-th base station and the $k$-th bus line sequence, with $\mathrm{dis}(S_{i,j}, L_k) = \min\{\mathrm{dis}(S_{i,j}, B_{k,q}),\ q = 1, 2, \ldots, m\}$; $B_{k,q}$ represents the position information of the $q$-th bus station in the $k$-th bus line sequence; and $w_{j,j+1}$ represents the weight of the sub-track $(S_{i,j}S_{i,j+1})$, calculated from the following expression:

$$w_{j,j+1} = \frac{t_{i,j+1}^{s} - t_{i,j}^{s}}{t_{i,n}^{s} - t_{i,1}^{s}}$$

wherein $t_{i,j+1}^{s}$ represents the base station connection start time of the $(j+1)$-th base station in the $i$-th section of journey track data; $t_{i,j}^{s}$ represents the base station connection start time of the $j$-th base station in the $i$-th section of journey track data; $t_{i,n}^{s}$ represents the base station connection start time of the last base station in the $i$-th section of journey track data; and $t_{i,1}^{s}$ represents the base station connection start time of the 1st base station in the $i$-th section of journey track data.
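The building blocks of the similarity calculation that are fully specified in the text, namely the distance from a base station to a line, the distance from a sub-track to a line, and the time-based sub-track weights, can be sketched as follows. The function names are hypothetical and `dist` stands for any distance function between two coordinates; the overall similarity expression itself is not reproduced here.

```python
def line_distance(station, line_stops, dist):
    """dis(S, L_k): distance from a base station to its nearest stop on the line."""
    return min(dist(station, stop) for stop in line_stops)


def subtrack_distance(s_j, s_next, line_stops, dist):
    """Distance of sub-track (S_j S_{j+1}) to the line: the larger endpoint distance."""
    return max(line_distance(s_j, line_stops, dist),
               line_distance(s_next, line_stops, dist))


def subtrack_weights(start_times):
    """w_{j,j+1}: share of the journey's time span covered by each sub-track.

    start_times: connection start times of the journey's base stations, in order.
    """
    total = start_times[-1] - start_times[0]
    if total <= 0:
        raise ValueError("journey must span a positive amount of time")
    return [(b - a) / total for a, b in zip(start_times, start_times[1:])]
```

Because the weights are differences of consecutive start times divided by the overall span, they sum to 1, so sub-tracks on which the user spent more time contribute more.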
It is worth noting that, for two longitude and latitude coordinates $l_1 = (lat_1, lon_1)$ and $l_2 = (lat_2, lon_2)$, the distance between them is calculated as:

$$\mathrm{dis}(l_1, l_2) = 2 \times 6378.13 \times \arcsin\sqrt{\sin^2\frac{a}{2} + \cos(lat_1)\cos(lat_2)\sin^2\frac{b}{2}}$$

wherein $a = lat_1 - lat_2$ and $b = lon_1 - lon_2$ (angles expressed in radians), and 6378.13 is the earth radius in kilometers.
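A minimal sketch of this distance calculation is given below, using the standard haversine form with the stated radius and $a$, $b$ as defined above. The function name and the choice of degrees as the input unit are assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6378.13  # earth radius stated in this description


def haversine_km(l1, l2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, l1)
    lat2, lon2 = map(math.radians, l2)
    a = lat1 - lat2
    b = lon1 - lon2
    h = (math.sin(a / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(b / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))
```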
Preferably, the clustering of the average speeds corresponding to each time interval determines a speed threshold of the candidate bus line sequence in each time interval, and specifically includes the following steps:
clustering a plurality of average speeds corresponding to each time interval by using a K-means clustering algorithm to divide the plurality of average speeds corresponding to each time interval into four speed categories;
calculating the average speed of the plurality of average speeds corresponding to each speed class in each time interval to obtain the class average speed corresponding to each speed class;
The average speeds of the classes corresponding to the four speed classes in each time interval are arranged in a descending order, and the average speed of the subway, the average speed of the automobile, the average speed of the bus and the average speed of the non-motor vehicle in each time interval are respectively determined;
and determining a speed threshold value of the candidate bus line sequence in each time interval according to the average speed of the bus in each time interval.
Specifically, for the same travel route, it is generally considered that the speed of the subway is greater than that of the automobile, which is greater than that of the bus, which in turn is greater than that of the non-motor vehicle. For each time interval of each candidate bus line sequence, the embodiment uses the K-means clustering algorithm to cluster the average speeds corresponding to that time interval into four speed categories and arranges the class average speeds of the four categories in descending order. Under the above assumption, the class average speed ranked third is selected as the speed threshold of the candidate bus line sequence in the corresponding time interval.
It is worth noting that, for the speeds $\{speed_1, speed_2, \ldots, speed_n\}$ of each time interval, the K-means clustering algorithm is described as follows:
(1) Firstly, selecting the 4 quintiles of the speed samples as the initial clustering centers;
(2) For a plurality of average speed samples corresponding to the time interval, calculating distances from the average speed samples to 4 clustering centers, and dividing the average speed samples into classes corresponding to the clustering centers with the smallest distances (the speed distance is the speed difference value);
(3) Averaging the speeds in the class for each new class to obtain a new cluster center;
(4) The operations (2) and (3) are repeated until the average speed of each class is no longer changed.
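Steps (1) to (4) can be sketched as a one-dimensional K-means in plain Python. This is an illustration under assumptions: the initial centers are taken to be the four quintiles (20th, 40th, 60th and 80th percentiles) of the samples, which is one reading of step (1), and the function names, the linear-interpolation quantile and the iteration cap are hypothetical.

```python
def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def kmeans_1d(speeds, k=4, max_iter=100):
    """Cluster speeds into k classes; returns the class average speeds."""
    vals = sorted(speeds)
    # Step (1): initial centers at the k evenly spaced quantiles.
    centers = [quantile(vals, (c + 1) / (k + 1)) for c in range(k)]
    for _ in range(max_iter):
        # Step (2): assign each speed to the nearest center.
        clusters = [[] for _ in range(k)]
        for v in vals:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            clusters[nearest].append(v)
        # Step (3): recompute each center as the mean of its class.
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        # Step (4): stop when the class averages no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def bus_speed_threshold(speeds):
    """Third class average in descending order, taken as the bus speed threshold."""
    return sorted(kmeans_1d(speeds), reverse=True)[2]
```

In practice a library implementation of K-means could be substituted; the plain version is used here only to keep the four steps visible.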
As a preferred solution, before calculating the similarity between each section of trip track data in the user trip track data and each bus line sequence based on a plurality of preset bus line sequences, the method further includes the following steps:
calculating the distance between each base station contained in the user journey track data and each bus station contained in a plurality of bus route sequences; when the distance between any one base station and any one bus station is smaller than a preset distance threshold value, taking the any one bus station as an index station of the any one base station, and taking a bus line sequence containing the any one bus station as an index line sequence of the any one base station;
The similarity between each section of journey track data in the user journey track data and each bus line sequence is calculated based on a plurality of preset bus line sequences through the following expression, and the method specifically comprises the following steps:
judging whether any bus line sequence belongs to a first index line sequence and a second index line sequence corresponding to any section of journey track data according to the first index line sequence corresponding to a first base station and the second index line sequence corresponding to a last base station in each section of journey track data;
when the arbitrary bus line sequence does not belong to the first index line sequence or the second index line sequence corresponding to the arbitrary section of journey trace data, judging that the arbitrary bus line sequence is not a candidate bus line sequence corresponding to the arbitrary section of journey trace data;
when the arbitrary bus line sequence belongs to both the first index line sequence and the second index line sequence corresponding to the arbitrary section of journey track data, calculating the similarity between the arbitrary bus line sequence and the arbitrary section of journey track data through the following expressions:
Figure BDA0004080440760000141
Figure BDA0004080440760000142
Preferably, the parameter $\mathrm{dis}(S_{i,j}, B_{k,q})$ is calculated as follows:
when the $q$-th bus station $B_{k,q}$ in the $k$-th bus line sequence belongs to the target index stations corresponding to the $j$-th base station $S_{i,j}$ in the $i$-th section of journey track data, obtaining the distance between the $j$-th base station $S_{i,j}$ and the $q$-th bus station $B_{k,q}$ from the distance between the $j$-th base station $S_{i,j}$ and that target index station;
when the $q$-th bus station $B_{k,q}$ in the $k$-th bus line sequence does not belong to the target index stations corresponding to the $j$-th base station $S_{i,j}$ in the $i$-th section of journey track data, assigning a preset distance set value to the distance between the $j$-th base station $S_{i,j}$ and the $q$-th bus station $B_{k,q}$.
Specifically, bus route identification involves a large number of distance calculations. To avoid redundant calculations and improve calculation efficiency, this embodiment establishes two indexes: index stations and index line sequences.
For index stations, the distance between each base station contained in the user journey track data and each bus station contained in the bus line sequences is first calculated. When the distance between a base station and a bus station is smaller than a preset distance threshold (1 km in this example), the bus station is taken as an index station of that base station and the distance is recorded:

$$\mathrm{Index}_1(S_i) = \{B_j \mid \mathrm{dis}(S_i, B_j) < 1\,\mathrm{km}\}$$

$$\mathrm{Dis\_Index}(S_i, B_j) = \mathrm{dis}(S_i, B_j)$$

For index line sequences, if bus station $B_j$ is an index station of base station $S_i$, each bus line sequence passing through bus station $B_j$ is an index line sequence of base station $S_i$:

$$\mathrm{Index}_2(S_i) = \{L_k \mid B_j \in \mathrm{Index}_1(S_i),\ B_j \in L_k\}$$

On the basis of $\mathrm{Index}_1(S_i)$ and $\mathrm{Index}_2(S_i)$, the following calculation acceleration operations can be performed:
When the similarity between an arbitrary bus line sequence $L_k$ and an arbitrary section of journey track data needs to be calculated, and the bus line sequence does not belong to the first index line sequence or the second index line sequence corresponding to that section of journey track data, i.e. $L_k \notin \mathrm{Index}_2(S_{i,1})$ or $L_k \notin \mathrm{Index}_2(S_{i,n})$, where $S_{i,1}$ and $S_{i,n}$ are the first and the last base station of the journey, the bus line sequence is judged not to be a candidate bus line sequence corresponding to that section of journey track data.
When the bus line sequence belongs to both the first index line sequence and the second index line sequence corresponding to that section of journey track data, the similarity between the bus line sequence and the section of journey track data is calculated through the following expressions:
Figure BDA0004080440760000153
Figure BDA0004080440760000154
When the parameter $\mathrm{dis}(S_{i,j}, B_{k,q})$ needs to be calculated and the $q$-th bus station $B_{k,q}$ in the $k$-th bus line sequence belongs to the target index stations corresponding to the $j$-th base station $S_{i,j}$ in the $i$-th section of journey track data, namely $B_{k,q} \in \mathrm{Index}_1(S_{i,j})$, the distance between the $j$-th base station $S_{i,j}$ and the $q$-th bus station $B_{k,q}$ is obtained from the recorded distance $\mathrm{Dis\_Index}(S_{i,j}, B_{k,q})$, so that no repeated calculation is needed.
When the $q$-th bus station $B_{k,q}$ in the $k$-th bus line sequence does not belong to the target index stations corresponding to the $j$-th base station $S_{i,j}$ in the $i$-th section of journey track data, i.e. $B_{k,q} \notin \mathrm{Index}_1(S_{i,j})$, the distance between the $j$-th base station $S_{i,j}$ and the $q$-th bus station $B_{k,q}$ is directly assigned a preset distance set value $\tau$ (e.g., 3 km) without calculation: $\mathrm{dis}(S_{i,j}, B_{k,q}) = \tau$.
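The two indexes and the accelerated lookups can be sketched as follows. The data layouts (dictionaries keyed by hypothetical station, stop and line identifiers), the function names and the `dist` parameter are assumptions for illustration; the 1 km radius and the 3 km fallback value follow the examples in the text.

```python
def build_indexes(stations, lines, dist, radius_km=1.0):
    """Build Index_1 (with recorded distances) and Index_2 for each base station.

    stations: {station_id: (lat, lon)}.
    lines: {line_id: [(stop_id, (lat, lon)), ...]} in stop order.
    """
    index_stops = {}  # station_id -> {stop_id: recorded distance}
    index_lines = {}  # station_id -> set of line ids passing an index stop
    for sid, pos in stations.items():
        index_stops[sid] = {}
        index_lines[sid] = set()
        for line_id, stops in lines.items():
            for stop_id, stop_pos in stops:
                d = dist(pos, stop_pos)
                if d < radius_km:
                    index_stops[sid][stop_id] = d
                    index_lines[sid].add(line_id)
    return index_stops, index_lines


def indexed_distance(index_stops, sid, stop_id, tau_km=3.0):
    """Recorded distance if the stop is an index stop, otherwise the set value tau."""
    return index_stops[sid].get(stop_id, tau_km)


def may_be_candidate(index_lines, first_sid, last_sid, line_id):
    """A line is considered only if it is indexed by both journey endpoints."""
    return line_id in index_lines[first_sid] and line_id in index_lines[last_sid]
```

Each base station to stop distance is computed once when the indexes are built, and lines that do not pass near both the first and the last base station of a journey are rejected before any similarity is evaluated.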
As a preferred solution, the preprocessing is performed on the base station track data to obtain target base station track data, which specifically includes the following steps:
when, in the base station track data, the position information of the (i-1)-th base station is the same as the position information of the (i+1)-th base station, the position information of the (i-1)-th base station is different from the position information of the i-th base station, and the difference between the base station connection time of the (i+1)-th base station and the base station connection time of the (i-1)-th base station is smaller than a preset time threshold, deleting the position information and the base station connection time of the i-th base station to obtain primary noise reduction base station track data;
when the primary noise reduction base station track data satisfies
Figure BDA0004080440760000161
and
Figure BDA0004080440760000162
judging the position information and the base station connection time of the i-th base station to be error data and deleting them; and when the primary noise reduction base station track data satisfies
Figure BDA0004080440760000163
and
Figure BDA0004080440760000164
judging the position information and the base station connection time of the (i-1)-th base station to be error data and deleting them, so as to obtain secondary noise reduction base station track data;
when the position information of the (i-1)-th base station is the same as the position information of the i-th base station in the secondary noise reduction base station track data, combining the position information of the (i-1)-th base station with the position information of the i-th base station, taking the base station connection time of the (i-1)-th base station as the base station connection start time of the combined record, and taking the base station connection time of the i-th base station as its base station connection end time, to obtain the target base station track data;
wherein $i$ is an integer greater than 1; $S_{i-1}$ represents the position information of the $(i-1)$-th base station; $S_i$ represents the position information of the $i$-th base station; $S_{i+1}$ represents the position information of the $(i+1)$-th base station; $t_{i-1}$ represents the base station connection time of the $(i-1)$-th base station; $t_i$ represents the base station connection time of the $i$-th base station; $t_{i+1}$ represents the base station connection time of the $(i+1)$-th base station; and $\rho$ represents a preset movement speed threshold.
In particular, in a mobile communication system, if the signal strengths of two base stations change drastically in a certain area, a mobile phone will switch back and forth between the two base stations, resulting in the so-called "ping-pong effect". For the base station track data of the user, if $S_{i-1} = S_{i+1}$, $S_{i-1} \neq S_i$ and $t_{i+1} - t_{i-1} < \theta$, the data $(S_i, t_i)$ is considered to be caused by the ping-pong effect and is eliminated in order to avoid redundant calculation and improve the identification accuracy. Here $\theta$ is a time threshold, which may be set to 5 seconds, for example.
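The ping-pong rule can be sketched as a single pass over the track. The function name and the representation of the track as `(station_id, t)` pairs with times in seconds are assumptions; each record is checked against its neighbours in the original track, which is one possible reading of the rule.

```python
def remove_ping_pong(track, theta_s=5):
    """Drop (S_i, t_i) when S_{i-1} == S_{i+1}, S_{i-1} != S_i and
    t_{i+1} - t_{i-1} < theta_s.

    track: list of (station_id, t) in time order.
    """
    cleaned = []
    for i, (sid, t) in enumerate(track):
        if 0 < i < len(track) - 1:
            prev_sid, prev_t = track[i - 1]
            next_sid, next_t = track[i + 1]
            if prev_sid == next_sid and prev_sid != sid and next_t - prev_t < theta_s:
                continue  # record attributed to the ping-pong effect
        cleaned.append((sid, t))
    return cleaned
```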
For the base station track data of the user, the embodiment eliminates the error data by calculating the moving speed of the user. When the primary noise reduction base station track data satisfies
Figure BDA0004080440760000171
and
Figure BDA0004080440760000172
the position information and the base station connection time of the i-th base station are judged to be error data and deleted; and when the primary noise reduction base station track data satisfies
Figure BDA0004080440760000173
and
Figure BDA0004080440760000179
the position information and the base station connection time of the (i-1)-th base station are judged to be error data and deleted, so as to obtain secondary noise reduction base station track data. Illustratively, $\rho$ is set to 150 km/h.
Further, the embodiment merges repeated data to reduce data redundancy. When the position information of the (i-1)-th base station is the same as the position information of the i-th base station in the secondary noise reduction base station track data, the two records are combined, the base station connection time of the (i-1)-th base station is taken as the base station connection start time of the combined record, and the base station connection time of the i-th base station is taken as its base station connection end time, so as to obtain the target base station track data.
The combined target base station track data is expressed as:

$$Tra' = \{(S_1, t_1^{s}, t_1^{e}), (S_2, t_2^{s}, t_2^{e}), \ldots, (S_i, t_i^{s}, t_i^{e}), \ldots\}$$

wherein $S_i$ represents the $i$-th base station in the combined target base station track data $Tra'$, $t_i^{s}$ is the time at which the connection to it starts, and $t_i^{e}$ is the time of its last connection.
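The merging of repeated records can be sketched as follows, assuming the denoised track is a list of `(station_id, t)` pairs in time order; the function name and tuple layout are hypothetical.

```python
def merge_repeated(track):
    """Merge consecutive records of the same base station.

    track: list of (station_id, t) in time order.
    Returns a list of (station_id, t_start, t_end), where t_start is the
    first and t_end the last connection time of each run.
    """
    merged = []
    for sid, t in track:
        if merged and merged[-1][0] == sid:
            # Same base station as the previous record: extend its end time.
            merged[-1] = (sid, merged[-1][1], t)
        else:
            merged.append((sid, t, t))
    return merged
```

A base station that appears only once keeps identical start and end times, so every record of the merged track has the same three-field shape.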
According to the bus travel characteristic identification method provided by the embodiment of the invention, the characteristics of the mobile phone signaling data and the bus scene are considered, the time information and the space information are fused, and the bus route sequence identification is performed by calculating the similarity between each section of journey track data in the user journey track data and each bus route sequence, so that the bus travel characteristics of the user are determined, the identification accuracy of the bus travel characteristics can be improved, and the bus travel characteristic identification method is suitable for large-scale bus travel characteristic identification scenes.
In addition, the embodiment of the invention does not depend on any training data, and is more beneficial to the rapid and stable deployment of the technical scheme.
The embodiment of the invention can automatically acquire the speed threshold value of each bus line sequence in different time intervals, and is more suitable for real scenes and more reasonable compared with a method using a unified threshold value.
Referring to fig. 2, a second aspect of the embodiment of the present invention provides a bus travel feature recognition device, including:
The preprocessing module 201 is configured to obtain base station track data of a user to be identified within a preset identification time period, and preprocess the base station track data to obtain target base station track data;
the track dividing module 202 is configured to determine a plurality of stay areas corresponding to the target base station track data based on a preset base station space threshold and a stay time threshold, and perform track division on the target base station track data according to the plurality of stay areas to obtain user trip track data;
the candidate bus route sequence obtaining module 203 is configured to calculate a similarity between each segment of trip track data in the user trip track data and each bus route sequence based on a preset number of bus route sequences, and determine a number of candidate bus route sequences corresponding to each segment of trip track data in the user trip track data according to a comparison result between the similarity and a preset similarity threshold;
the average speed calculation module 204 is configured to calculate an average speed corresponding to each piece of trip track data according to coordinate data of a plurality of base stations, trip start time and trip end time included in each piece of trip track data, and determine a time interval corresponding to each average speed according to the trip start time and a plurality of time intervals obtained by dividing one day in advance;
The speed threshold determining module 205 is configured to cluster a plurality of average speeds corresponding to each time interval, and determine a speed threshold of the candidate bus route sequence in each time interval;
the target bus route sequence determining module 206 is configured to determine a target bus route sequence corresponding to each piece of trip track data according to a similarity between each piece of trip track data and each candidate bus route sequence, and a comparison result between an average speed corresponding to each piece of trip track data and a speed threshold value in a time interval corresponding to each average speed;
the bus trip feature determining module 207 is configured to determine a bus trip feature of the user to be identified in the preset identification time period according to the target bus route sequence, the starting base station coordinate data, the ending base station coordinate data, the trip starting time and the trip ending time contained in each piece of trip track data.
Preferably, the track dividing module 202 is configured to determine a plurality of stay areas corresponding to the target base station track data based on a preset base station space threshold and a stay time threshold, and specifically includes:
determining the position information and the stay time of a first candidate stay region according to the coordinate data, the base station connection start time and the base station connection end time of the first base station contained in the target base station track data, and taking the position information and the stay time of the first candidate stay region as the position information and the stay time of the current last candidate stay region;
sequentially calculating the distance between the coordinate data of the ith base station contained in the target base station track data and the position information of the current last candidate stay region according to the preset base station sequence in the target base station track data; wherein i is an integer greater than 1;
when the distance is smaller than the base station space threshold, updating the position information and the stay time of the current last candidate stay area according to the coordinate data of the ith base station and the base station connection ending time;
when the distance is greater than or equal to the base station space threshold, determining the position information and the stay time of a newly added candidate stay area according to the coordinate data of the ith base station, the base station connection starting time and the base station connection ending time, and taking the position information and the stay time of the newly added candidate stay area as the position information and the stay time of the current last candidate stay area;
And screening out the first candidate stay region/the newly added candidate stay region with the stay time smaller than the stay time threshold according to the stay time of the first candidate stay region and the stay time of the newly added candidate stay regions, and determining a plurality of stay regions.
As a preferred scheme, each bus line sequence comprises position information of a plurality of bus stops arranged according to a preset stop sequence;
the candidate bus route sequence obtaining module 203 is configured to calculate, based on a plurality of preset bus route sequences, a similarity between each piece of trip track data in the user trip track data and each bus route sequence, where the method specifically includes:
based on a plurality of preset bus line sequences, calculating the similarity between each section of journey track data in the user journey track data and each bus line sequence through the following expression:
[Equation image BDA0004080440760000191 not reproduced in this text]
wherein $L_k$ represents the $k$-th bus line sequence; $Tra_i$ represents the $i$-th section of journey track data; $S_{i,j}$ and $S_{i,j+1}$ respectively represent the position information of the $j$-th base station and the $(j+1)$-th base station in the $i$-th section of journey track data; $Sim'(S_{i,j}S_{i,j+1}, L_k)$ represents the similarity between the sub-track $(S_{i,j}S_{i,j+1})$ and the $k$-th bus line sequence $L_k$, where $Sim'(S_{i,j}S_{i,j+1}, L_k) = \max(dis(S_{i,j}, L_k), dis(S_{i,j+1}, L_k))$; $dis(S_{i,j}, L_k)$ represents the distance between the $j$-th base station and the $k$-th bus line sequence, and $dis(S_{i,j+1}, L_k)$ represents the distance between the $(j+1)$-th base station and the $k$-th bus line sequence, where $dis(S_{i,j}, L_k) = \min\{dis(S_{i,j}, B_{k,q}),\ q = 1, 2, \ldots, m\}$; $B_{k,q}$ represents the position information of the $q$-th bus station in the $k$-th bus line sequence; $w_{j,j+1}$ represents the weight of the sub-track $(S_{i,j}S_{i,j+1})$ and is calculated from the following expression:
[Equation image BDA0004080440760000201 not reproduced in this text]
wherein the four time symbols shown in equation images BDA0004080440760000202, BDA0004080440760000203, BDA0004080440760000204 and BDA0004080440760000205 represent, respectively, the base station connection start time of the $(j+1)$-th base station, of the $j$-th base station, of the last base station, and of the 1st base station in the $i$-th section of journey track data.
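A possible reading of the similarity calculation is sketched below. The definitions of $Sim'$ (maximum of the two endpoint distances) and $dis(S_{i,j}, L_k)$ (minimum over the stations of the line) come from the text. The overall expression and the weight are only available as images, so two things here are assumptions: that the trajectory-level value is the weighted sum of the sub-track values, and that $w_{j,j+1}$ is the sub-track's share of the total time span between the first and last connection start times. Function names and planar coordinates are likewise hypothetical.

```python
from math import hypot


def dist_to_line(station, stops):
    """dis(S, L_k): distance from a base station to the nearest station of the line."""
    return min(hypot(station[0] - bx, station[1] - by) for bx, by in stops)


def sub_track_similarity(s_a, s_b, stops):
    """Sim'(S_j S_{j+1}, L_k): the larger of the two endpoint distances."""
    return max(dist_to_line(s_a, stops), dist_to_line(s_b, stops))


def trajectory_similarity(stations, start_times, stops):
    """Assumed form: time-weighted sum of the sub-track values."""
    if len(stations) < 2:
        raise ValueError("need at least two base stations")
    total = start_times[-1] - start_times[0]
    if total <= 0:
        raise ValueError("connection start times must span a positive duration")
    score = 0.0
    for j in range(len(stations) - 1):
        # Assumed weight: share of the journey time covered by this sub-track.
        w = (start_times[j + 1] - start_times[j]) / total
        score += w * sub_track_similarity(stations[j], stations[j + 1], stops)
    return score
```

Because $Sim'$ is built from distances, a smaller value under this reading means the journey lies closer to the line; how the value is compared with the preset similarity threshold is not stated in this passage.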
Preferably, the speed threshold determining module 205 is configured to cluster a plurality of average speeds corresponding to each time interval, and determine a speed threshold of the candidate bus line sequence in each time interval, which specifically includes:
clustering a plurality of average speeds corresponding to each time interval by using a K-means clustering algorithm to divide the plurality of average speeds corresponding to each time interval into four speed categories;
calculating the mean of the plurality of average speeds belonging to each speed class in each time interval to obtain the class average speed corresponding to each speed class;
arranging the class average speeds corresponding to the four speed classes in each time interval in descending order, and determining, respectively, the average subway speed, the average automobile speed, the average bus speed and the average non-motor-vehicle speed in each time interval;
and determining a speed threshold value of the candidate bus line sequence in each time interval according to the average speed of the bus in each time interval.
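The clustering step for a single time interval can be sketched as follows. This is a plain one-dimensional Lloyd-style K-means written without external libraries; the deterministic initialisation, the function names and the dictionary keys are illustrative assumptions. The text does not state how the speed threshold is derived from the average bus speed, so the sketch stops at the class averages.

```python
def kmeans_1d(values, k=4, iterations=100):
    """Cluster scalar speeds into k groups; return the cluster means, largest first."""
    distinct = sorted(set(values))
    if len(distinct) < k:
        raise ValueError("need at least k distinct speeds")
    n = len(distinct)
    # Deterministic start (an assumption): spread the centres across the sorted values.
    centres = [distinct[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centres[c]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centre.
        updated = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
        if updated == centres:
            break
        centres = updated
    return sorted(centres, reverse=True)


def label_speed_classes(speeds):
    """Map the four class averages, in descending order, to the four travel modes."""
    subway, car, bus, non_motor = kmeans_1d(speeds, k=4)
    return {"subway": subway, "car": car, "bus": bus, "non_motor": non_motor}
```

Calling `label_speed_classes` once per time interval yields the average bus speed for that interval, from which a caller would derive the interval's speed threshold by whatever rule the embodiment prescribes.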
Preferably, the apparatus further comprises an index construction module, configured to:
calculating the distance between each base station contained in the user journey track data and each bus station contained in the plurality of bus line sequences; when the distance between a base station and a bus station is smaller than a preset distance threshold, taking that bus station as an index station of that base station, and taking each bus line sequence containing that bus station as an index line sequence of that base station;
the candidate bus route sequence obtaining module 203 is configured to calculate, based on a plurality of preset bus line sequences, the similarity between each section of journey track data in the user journey track data and each bus line sequence through the following expression, which specifically includes:
judging, according to the first index line sequence corresponding to the first base station and the second index line sequence corresponding to the last base station in each section of journey track data, whether a given bus line sequence belongs to both the first index line sequence and the second index line sequence corresponding to a given section of journey track data;
when the bus line sequence does not belong to the first index line sequence or does not belong to the second index line sequence corresponding to the section of journey track data, determining that the bus line sequence is not a candidate bus line sequence corresponding to the section of journey track data;
when the bus line sequence belongs to both the first index line sequence and the second index line sequence corresponding to the section of journey track data, calculating the similarity between the bus line sequence and the section of journey track data through the expressions shown in equation images BDA0004080440760000211 and BDA0004080440760000212 (not reproduced in this text).
As a preferred solution, the candidate bus route sequence obtaining module 203 is further configured to:
when the $q$-th bus station $B_{k,q}$ in the $k$-th bus line sequence is a target index station corresponding to the $j$-th base station $S_{i,j}$ in the $i$-th section of journey track data, obtain the distance between the $j$-th base station $S_{i,j}$ and the $q$-th bus station $B_{k,q}$ from the distance between the $j$-th base station $S_{i,j}$ and the target index station;
when the $q$-th bus station $B_{k,q}$ in the $k$-th bus line sequence is not a target index station corresponding to the $j$-th base station $S_{i,j}$ in the $i$-th section of journey track data, assign the distance between the $j$-th base station $S_{i,j}$ and the $q$-th bus station $B_{k,q}$ according to a preset distance set value.
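The index construction and its two uses (pruning lines and looking up distances) can be sketched as follows. The dictionary layout, the function names and the planar coordinates are hypothetical choices for illustration; only the rules themselves (index a bus station when it is closer than the distance threshold, require a line to be indexed by both the first and the last base station, fall back to a preset distance otherwise) are taken from the text.

```python
from math import hypot


def build_index(base_stations, lines, distance_threshold):
    """Map each base station id to {(line_id, stop_no): distance} for nearby stops."""
    index = {}
    for sid, (sx, sy) in base_stations.items():
        near = {}
        for line_id, stops in lines.items():
            for q, (bx, by) in enumerate(stops):
                d = hypot(sx - bx, sy - by)
                if d < distance_threshold:
                    near[(line_id, q)] = d
        index[sid] = near
    return index


def index_lines(index, sid):
    """Index line sequences of a base station: lines owning any of its index stations."""
    return {line_id for line_id, _ in index[sid]}


def is_candidate(index, first_sid, last_sid, line_id):
    """A line is kept only if both the first and the last base station index it."""
    return (line_id in index_lines(index, first_sid)
            and line_id in index_lines(index, last_sid))


def indexed_distance(index, sid, line_id, q, default_distance):
    """dis(S, B_{k,q}): stored distance for index stations, preset value otherwise."""
    return index[sid].get((line_id, q), default_distance)
```

The benefit of this design is that distances are computed once when the index is built, and lines that never come near the start or end of a journey are rejected without evaluating the similarity expression at all.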
Preferably, the preprocessing module 201 is configured to preprocess the base station track data to obtain target base station track data, and specifically includes:
when, in the base station track data, the position information of the $(i-1)$-th base station is the same as the position information of the $(i+1)$-th base station, the position information of the $(i-1)$-th base station is different from the position information of the $i$-th base station, and the difference between the base station connection time of the $(i+1)$-th base station and the base station connection time of the $(i-1)$-th base station is smaller than a preset time threshold, deleting the position information and the base station connection time of the $i$-th base station to obtain primary noise reduction base station track data;
when the conditions shown in equation images BDA0004080440760000221 and BDA0004080440760000222 (not reproduced in this text) both hold in the primary noise reduction base station track data, determining that the position information and the base station connection time of the $i$-th base station are error data and deleting them; and when the conditions shown in equation images BDA0004080440760000223 and BDA0004080440760000224 (not reproduced in this text) both hold in the primary noise reduction base station track data, determining that the position information and the base station connection time of the $(i-1)$-th base station are error data and deleting them, to obtain secondary noise reduction base station track data;
when, in the secondary noise reduction base station track data, the position information of the $(i-1)$-th base station is the same as the position information of the $i$-th base station, merging the position information of the $(i-1)$-th base station with the position information of the $i$-th base station, taking the base station connection time of the $(i-1)$-th base station as the base station connection start time and the base station connection time of the $i$-th base station as the base station connection end time, to obtain the target base station track data;
wherein $i$ is an integer greater than 1; $S_{i-1}$ represents the position information of the $(i-1)$-th base station; $S_i$ represents the position information of the $i$-th base station; $S_{i+1}$ represents the position information of the $(i+1)$-th base station; $t_{i-1}$ represents the base station connection time of the $(i-1)$-th base station; $t_i$ represents the base station connection time of the $i$-th base station; $t_{i+1}$ represents the base station connection time of the $(i+1)$-th base station; $\rho$ represents a preset movement speed threshold.
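Two of the three preprocessing steps are fully specified in the text and can be sketched as follows: removal of a short jump to another base station and back, and merging of consecutive records at the same location. The speed-based step that uses $\rho$ is left out because its conditions are only available as images. Records are assumed to be `(location, connection_time)` tuples, where `location` is any comparable value such as a base station id; the function names are hypothetical.

```python
def remove_ping_pong(records, time_threshold):
    """Drop record i when records i-1 and i+1 share a location that differs from
    record i and the time from i-1 to i+1 is below time_threshold."""
    out = list(records)
    i = 1
    while i < len(out) - 1:
        prev_loc, prev_t = out[i - 1]
        loc, _ = out[i]
        next_loc, next_t = out[i + 1]
        if prev_loc == next_loc and prev_loc != loc and next_t - prev_t < time_threshold:
            del out[i]  # the brief jump is treated as noise
        else:
            i += 1
    return out


def merge_consecutive(records):
    """Merge runs of identical locations into (location, start_time, end_time)."""
    merged = []
    for loc, t in records:
        if merged and merged[-1][0] == loc:
            # Same location as the previous record: extend its end time.
            merged[-1] = (loc, merged[-1][1], t)
        else:
            merged.append((loc, t, t))
    return merged
```

For example, the sequence A, B, A within a short window collapses to A, A after the first step, and the second step then turns the run of A records into a single record carrying a connection start time and a connection end time.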
It should be noted that, the bus travel feature recognition device provided by the embodiment of the present invention can implement all the processes of the bus travel feature recognition method described in any one of the embodiments, and the actions and the implemented technical effects of each module in the device are respectively the same as those of the bus travel feature recognition method described in the embodiment, and are not described herein again.
A third aspect of the embodiment of the present invention provides a terminal device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor implements the bus trip feature identification method according to any one of the embodiments of the first aspect when executing the computer program.
The terminal equipment can be computing equipment such as a desktop computer, a notebook computer, a palm computer, a cloud server and the like. The terminal device may include, but is not limited to, a processor, a memory. The terminal device may also include input and output devices, network access devices, buses, and the like. The processor may be a central processing unit (Central Processing Unit, CPU), other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. The general purpose processor may be a microprocessor or the processor may be any conventional processor or the like, which is a control center of the terminal device, and which connects various parts of the entire terminal device using various interfaces and lines. The memory may be used to store the computer program and/or module, and the processor may implement various functions of the terminal device by running or executing the computer program and/or module stored in the memory and invoking data stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required for at least one function, and the like; the storage data area may store data (such as audio data, phonebook, etc.) created according to the use of the handset, etc. 
In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one disk storage device, a flash memory device, or another volatile solid-state storage device.
A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium comprising a stored computer program, wherein, when the computer program runs, the device in which the computer-readable storage medium is located is controlled to execute the bus trip feature identification method according to any one of the embodiments of the first aspect.
From the above description of the embodiments, it will be clear to those skilled in the art that the present invention may be implemented by means of software plus necessary hardware platforms, but may of course also be implemented entirely in hardware. With such understanding, all or part of the technical solution of the present invention contributing to the background art may be embodied in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform the method described in the embodiments or some parts of the embodiments of the present invention.
While the foregoing describes the preferred embodiments of the present invention, those skilled in the art will appreciate that changes and modifications may be made without departing from the principles of the invention, and such changes and modifications are also intended to fall within the scope of the invention.

Claims (10)

1. A bus travel characteristic identification method, characterized by comprising the following steps:
acquiring base station track data of a user to be identified in a preset identification time period, and preprocessing the base station track data to acquire target base station track data;
determining a plurality of stay areas corresponding to the target base station track data based on a preset base station space threshold and a stay time threshold, and carrying out track division on the target base station track data according to the plurality of stay areas to obtain user journey track data;
calculating the similarity between each section of journey track data in the user journey track data and each bus line sequence based on a plurality of preset bus line sequences, and determining a plurality of candidate bus line sequences corresponding to each section of journey track data in the user journey track data according to a comparison result of the similarity and a preset similarity threshold;
calculating the average speed corresponding to each section of journey track data according to the coordinate data, journey starting time and journey ending time of a plurality of base stations contained in each section of journey track data, and determining the time interval corresponding to each average speed according to the journey starting time and a plurality of time intervals obtained by dividing one day in advance;
Clustering a plurality of average speeds corresponding to each time interval, and determining a speed threshold value of the candidate bus line sequence in each time interval;
determining a target bus line sequence corresponding to each section of journey track data according to the similarity between each section of journey track data and each candidate bus line sequence and the comparison result of the average speed corresponding to each section of journey track data and the speed threshold value in the time interval corresponding to each average speed;
and determining bus trip characteristics of the user to be identified in the preset identification time period according to the target bus line sequence, the starting base station coordinate data, the ending base station coordinate data, the trip starting time and the trip ending time contained in each section of trip track data.
2. The bus trip feature recognition method according to claim 1, wherein the determining a plurality of stay areas corresponding to the target base station track data based on a preset base station space threshold and a stay time threshold specifically comprises the following steps:
determining the position information and the residence time of a first candidate residence area according to the coordinate data of a first base station, the base station connection start time and the base station connection end time contained in the target base station track data, and taking the position information and the residence time of the first candidate residence area as the position information and the residence time of the current last candidate residence area;
Sequentially calculating the distance between the coordinate data of the ith base station contained in the target base station track data and the position information of the current last candidate stay region according to the preset base station sequence in the target base station track data; wherein i is an integer greater than 1;
when the distance is smaller than the base station space threshold, updating the position information and the stay time of the current last candidate stay area according to the coordinate data of the ith base station and the base station connection ending time;
when the distance is greater than or equal to the base station space threshold, determining the position information and the stay time of a newly added candidate stay area according to the coordinate data of the ith base station, the base station connection starting time and the base station connection ending time, and taking the position information and the stay time of the newly added candidate stay area as the position information and the stay time of the current last candidate stay area;
and filtering out, according to the stay time of the first candidate stay region and the stay times of the newly added candidate stay regions, every first or newly added candidate stay region whose stay time is smaller than the stay time threshold, thereby determining the plurality of stay regions.
3. The bus travel feature recognition method according to claim 1, wherein each bus route sequence includes position information of a plurality of bus stops arranged in a preset stop order;
the step of calculating the similarity between each section of journey track data in the user journey track data and each bus line sequence based on a plurality of preset bus line sequences specifically comprises the following steps:
based on a plurality of preset bus line sequences, calculating the similarity between each section of journey track data in the user journey track data and each bus line sequence through the following expression:
[Equation image FDA0004080440720000021 not reproduced in this text]
wherein $L_k$ represents the $k$-th bus line sequence; $Tra_i$ represents the $i$-th section of journey track data; $S_{i,j}$ and $S_{i,j+1}$ respectively represent the position information of the $j$-th base station and the $(j+1)$-th base station in the $i$-th section of journey track data; $Sim'(S_{i,j}S_{i,j+1}, L_k)$ represents the similarity between the sub-track $(S_{i,j}S_{i,j+1})$ and the $k$-th bus line sequence $L_k$, where $Sim'(S_{i,j}S_{i,j+1}, L_k) = \max(dis(S_{i,j}, L_k), dis(S_{i,j+1}, L_k))$; $dis(S_{i,j}, L_k)$ represents the distance between the $j$-th base station and the $k$-th bus line sequence, and $dis(S_{i,j+1}, L_k)$ represents the distance between the $(j+1)$-th base station and the $k$-th bus line sequence, where $dis(S_{i,j}, L_k) = \min\{dis(S_{i,j}, B_{k,q}),\ q = 1, 2, \ldots, m\}$; $B_{k,q}$ represents the position information of the $q$-th bus station in the $k$-th bus line sequence; $w_{j,j+1}$ represents the weight of the sub-track $(S_{i,j}S_{i,j+1})$ and is calculated from the following expression:
[Equation image FDA0004080440720000031 not reproduced in this text]
wherein the four time symbols shown in equation images FDA0004080440720000032, FDA0004080440720000033, FDA0004080440720000034 and FDA0004080440720000035 represent, respectively, the base station connection start time of the $(j+1)$-th base station, of the $j$-th base station, of the last base station, and of the 1st base station in the $i$-th section of journey track data.
4. The bus trip feature recognition method as set forth in claim 1, wherein the clustering the average speeds corresponding to each time interval determines a speed threshold of the candidate bus line sequence in each time interval, and specifically includes the following steps:
clustering a plurality of average speeds corresponding to each time interval by using a K-means clustering algorithm to divide the plurality of average speeds corresponding to each time interval into four speed categories;
calculating the mean of the plurality of average speeds belonging to each speed class in each time interval to obtain the class average speed corresponding to each speed class;
arranging the class average speeds corresponding to the four speed classes in each time interval in descending order, and determining, respectively, the average subway speed, the average automobile speed, the average bus speed and the average non-motor-vehicle speed in each time interval;
and determining a speed threshold value of the candidate bus line sequence in each time interval according to the average speed of the bus in each time interval.
5. A bus travel feature recognition method as claimed in claim 3, wherein the method further comprises the steps of, before calculating the similarity between each piece of trip track data in the user trip track data and each bus line sequence based on a plurality of preset bus line sequences:
calculating the distance between each base station contained in the user journey track data and each bus station contained in the plurality of bus line sequences; when the distance between a base station and a bus station is smaller than a preset distance threshold, taking that bus station as an index station of that base station, and taking each bus line sequence containing that bus station as an index line sequence of that base station;
the step of calculating, based on a plurality of preset bus line sequences, the similarity between each section of journey track data in the user journey track data and each bus line sequence through the following expression specifically comprises the following steps:
judging, according to the first index line sequence corresponding to the first base station and the second index line sequence corresponding to the last base station in each section of journey track data, whether a given bus line sequence belongs to both the first index line sequence and the second index line sequence corresponding to a given section of journey track data;
when the bus line sequence does not belong to the first index line sequence or does not belong to the second index line sequence corresponding to the section of journey track data, determining that the bus line sequence is not a candidate bus line sequence corresponding to the section of journey track data;
when the bus line sequence belongs to both the first index line sequence and the second index line sequence corresponding to the section of journey track data, calculating the similarity between the bus line sequence and the section of journey track data through the expressions shown in equation images FDA0004080440720000041 and FDA0004080440720000042 (not reproduced in this text).
6. The bus travel characteristic recognition method according to claim 5, characterized in that the parameter $dis(S_{i,j}, B_{k,q})$ is calculated as follows:
when the $q$-th bus station $B_{k,q}$ in the $k$-th bus line sequence is a target index station corresponding to the $j$-th base station $S_{i,j}$ in the $i$-th section of journey track data, obtaining the distance between the $j$-th base station $S_{i,j}$ and the $q$-th bus station $B_{k,q}$ from the distance between the $j$-th base station $S_{i,j}$ and the target index station;
when the $q$-th bus station $B_{k,q}$ in the $k$-th bus line sequence is not a target index station corresponding to the $j$-th base station $S_{i,j}$ in the $i$-th section of journey track data, assigning the distance between the $j$-th base station $S_{i,j}$ and the $q$-th bus station $B_{k,q}$ according to a preset distance set value.
7. The bus travel characteristic identification method as set forth in claim 1, wherein the preprocessing of the base station track data to obtain target base station track data specifically includes the steps of:
when, in the base station track data, the position information of the $(i-1)$-th base station is the same as the position information of the $(i+1)$-th base station, the position information of the $(i-1)$-th base station is different from the position information of the $i$-th base station, and the difference between the base station connection time of the $(i+1)$-th base station and the base station connection time of the $(i-1)$-th base station is smaller than a preset time threshold, deleting the position information and the base station connection time of the $i$-th base station to obtain primary noise reduction base station track data;
when the conditions shown in equation images FDA0004080440720000051 and FDA0004080440720000052 (not reproduced in this text) both hold in the primary noise reduction base station track data, determining that the position information and the base station connection time of the $i$-th base station are error data and deleting them; and when the conditions shown in equation images FDA0004080440720000053 and FDA0004080440720000054 (not reproduced in this text) both hold in the primary noise reduction base station track data, determining that the position information and the base station connection time of the $(i-1)$-th base station are error data and deleting them, to obtain secondary noise reduction base station track data;
when, in the secondary noise reduction base station track data, the position information of the $(i-1)$-th base station is the same as the position information of the $i$-th base station, merging the position information of the $(i-1)$-th base station with the position information of the $i$-th base station, taking the base station connection time of the $(i-1)$-th base station as the base station connection start time and the base station connection time of the $i$-th base station as the base station connection end time, to obtain the target base station track data;
wherein $i$ is an integer greater than 1; $S_{i-1}$ represents the position information of the $(i-1)$-th base station; $S_i$ represents the position information of the $i$-th base station; $S_{i+1}$ represents the position information of the $(i+1)$-th base station; $t_{i-1}$ represents the base station connection time of the $(i-1)$-th base station; $t_i$ represents the base station connection time of the $i$-th base station; $t_{i+1}$ represents the base station connection time of the $(i+1)$-th base station; $\rho$ represents a preset movement speed threshold.
8. A bus travel characteristic recognition device, characterized by comprising:
the preprocessing module is used for acquiring the base station track data of the user to be identified in a preset identification time period, and preprocessing the base station track data to acquire target base station track data;
the track dividing module is used for determining a plurality of stay areas corresponding to the target base station track data based on a preset base station space threshold and a stay time threshold, and carrying out track division on the target base station track data according to the plurality of stay areas to obtain user journey track data;
the candidate bus route sequence acquisition module is used for calculating the similarity between each section of journey track data in the user journey track data and each bus route sequence based on a plurality of preset bus route sequences, and determining a plurality of candidate bus route sequences corresponding to each section of journey track data in the user journey track data according to the comparison result of the similarity and a preset similarity threshold;
the average speed calculation module is used for calculating the average speed corresponding to each section of journey track data according to the coordinate data of a plurality of base stations, the journey starting time and the journey ending time contained in each section of journey track data, and determining the time interval corresponding to each average speed according to the journey starting time and a plurality of time intervals obtained by dividing one day in advance;
The speed threshold determining module is used for clustering a plurality of average speeds corresponding to each time interval and determining a speed threshold of the candidate bus line sequence in each time interval;
the target bus route sequence determining module is used for determining a target bus route sequence corresponding to each section of journey track data according to the similarity between each section of journey track data and each candidate bus route sequence and the comparison result of the average speed corresponding to each section of journey track data and the speed threshold value in the time interval corresponding to each average speed;
and the bus travel characteristic determining module is used for determining the bus travel characteristic of the user to be identified in the preset identification time period according to the target bus line sequence, the starting base station coordinate data, the ending base station coordinate data, the journey starting time and the journey ending time contained in each section of journey track data.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the bus trip feature identification method according to any one of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium, characterized in that the computer readable storage medium comprises a stored computer program, wherein the computer program, when run, controls a device in which the computer readable storage medium is located to perform the bus travel feature identification method according to any one of claims 1 to 7.
CN202310122281.7A 2023-02-14 2023-02-14 Bus travel characteristic identification method, device, equipment and medium Active CN116129643B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310122281.7A CN116129643B (en) 2023-02-14 2023-02-14 Bus travel characteristic identification method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN116129643A true CN116129643A (en) 2023-05-16
CN116129643B CN116129643B (en) 2024-07-12

Family

ID=86307983

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310122281.7A Active CN116129643B (en) 2023-02-14 2023-02-14 Bus travel characteristic identification method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN116129643B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117061990A (en) * 2023-07-12 2023-11-14 荣耀终端有限公司 Identification method, terminal equipment and storage medium for transportation means for traveling
CN117061990B (en) * 2023-07-12 2024-10-22 荣耀终端有限公司 Identification method, terminal equipment and storage medium for transportation means for traveling

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111653099A (en) * 2020-06-10 2020-09-11 南京瑞栖智能交通技术产业研究院有限公司 Bus passenger flow OD obtaining method based on mobile phone signaling data
CN112530166A (en) * 2020-12-01 2021-03-19 江苏欣网视讯软件技术有限公司 Method and system for analyzing and identifying bus station for getting on or off bus during traveling based on signaling data and big data
CN112543427A (en) * 2020-12-01 2021-03-23 江苏欣网视讯软件技术有限公司 Method and system for analyzing and identifying urban traffic corridor based on signaling track and big data
CN112601187A (en) * 2020-12-10 2021-04-02 江苏欣网视讯软件技术有限公司 Bus frequent passenger prediction method and system based on mobile phone signaling
WO2021159865A1 (en) * 2020-02-11 2021-08-19 罗普特科技集团股份有限公司 Data calibration-based bus route prediction method and system
WO2021237812A1 (en) * 2020-05-29 2021-12-02 南京瑞栖智能交通技术产业研究院有限公司 Urban travel mode comprehensive identification method based on mobile phone signaling data and including personal attribute correction

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘旭;陈云波;施昆;黄强;: "结合Canopy-K-means算法和出租车轨迹数据的公交车站预测方法", 测绘通报, no. 11, 25 November 2018 (2018-11-25) *

Also Published As

Publication number Publication date
CN116129643B (en) 2024-07-12

Similar Documents

Publication Publication Date Title
CN106875066B (en) Vehicle travel behavior prediction method, device, server and storage medium
CN110008413B (en) Traffic travel problem query method and device
CN108362293B (en) Vehicle track matching method based on key point technology
CN103177575B (en) System and method for dynamically optimizing online dispatching of urban taxies
CN111653099B (en) Bus passenger flow OD obtaining method based on mobile phone signaling data
US20180349792A1 (en) Method and apparatus for building a parking occupancy model
CN111985710A (en) Bus passenger trip station prediction method, storage medium and server
CN108320501A (en) Public bus network recognition method based on user mobile phone signaling
CN107796411A (en) Navigation system and its operating method with preference analysis mechanism
CN110598917B (en) Destination prediction method, system and storage medium based on path track
CN104819726A (en) Navigation data processing method, navigation data processing device and navigation terminal
CN108537352A (en) Data processing method, device and server
CN110399445B (en) Method, device and equipment for processing interest points
CN106303953A (en) People flow statistics system and method
CN114363842B (en) Bus passenger departure station prediction method and device based on mobile phone signaling data
CN112418518A (en) Passenger flow prediction method and device based on time characteristic weight and network topology
CN111222381A (en) User travel mode identification method and device, electronic equipment and storage medium
JP7321400B1 (en) Estimation device and estimation method
CN105387854A (en) Navigation system with content delivery mechanism and method of operation thereof
CN113079463A (en) Tourist attraction tourist travel activity identification method based on mobile phone signaling data
KR20050015306A (en) Method and System for Providing Routing Information with Heterogeneous Public Transportation Vehicles
CN113573238B (en) Method for identifying trip passenger trip chain based on mobile phone signaling
CN111445715B (en) Intelligent city traffic scheduling method and scheduling equipment based on Internet of things communication
CN116129643B (en) Bus travel characteristic identification method, device, equipment and medium
CN111343582A (en) Method and device for preventing mileage cheating

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant