WO2020199524A1 - Method for matching ride-sharing travellers based on network representation learning - Google Patents

Method for matching ride-sharing travellers based on network representation learning

Info

Publication number
WO2020199524A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
sharing
ride
network
driver
Prior art date
Application number
PCT/CN2019/107011
Other languages
French (fr)
Chinese (zh)
Inventor
唐蕾
赵亚玲
刘子航
段宗涛
Original Assignee
长安大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 长安大学
Publication of WO2020199524A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00: Computing arrangements based on specific mathematical models
    • G06N7/01: Probabilistic graphical models, e.g. probabilistic networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/06: Buying, selling or leasing transactions
    • G06Q30/0601: Electronic shopping [e-shopping]
    • G06Q30/0631: Item recommendations
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/40: Business processes related to the transportation industry

Definitions

  • The present invention belongs to the field of group recommendation and specifically relates to a method for matching online ride-hailing shared travelers based on network representation learning.
  • The purpose of the present invention is to provide a method for matching online ride-hailing shared travelers based on network representation learning.
  • The invention adopts meta-path theory from heterogeneous information networks, converts the matching of drivers and passengers into a node similarity measurement problem in a heterogeneous ride-sharing network, and establishes a ride-sharing behavior feature model based on the "driver-passenger" structure. According to the ride-sharing situation, the model is divided into end-point ride sharing and ride sharing along the way.
  • Starting from the departure times and the boarding and alighting locations of drivers and passengers, the model uses the skip-gram model from machine learning to analyze the relationships between features and infer the likelihood of ride sharing between a driver and a passenger, supporting high-quality ride-sharing services.
  • A method for matching online ride-hailing shared travelers based on network representation learning, including the following steps:
  • Step 1: Ride-sharing classification
  • Given the driver's original route, ride sharing is divided into two categories according to how the passenger's starting and ending points relate to that route. The first type is end-point ride sharing, where the passenger's starting and ending points are on the driver's original route;
  • the other type is ride sharing along the way, where neither the passenger's starting point nor ending point is on the driver's original route. The passenger walks from the starting point to the boarding point, shares the ride, and then walks from the alighting point to the destination; the shared route is part of the passenger's trajectory;
  • Step 2: Build a heterogeneous ride-sharing network
  • The request information of drivers and passengers is expressed as a heterogeneous ride-sharing network in which passengers and drivers are connected through location and time information. The node types in the network are user, location, time period, and activity;
  • Step 3: Use a network representation learning model to perform representation learning on the heterogeneous ride-sharing network and obtain low-dimensional vector representations of the user nodes;
  • Step 4: Compute the cosine similarity between driver and passenger nodes from the low-dimensional vector representations of the user nodes, sort the similarity values in descending order, and return the k passengers most similar to the driver as the passengers who can share the ride.
  • A further improvement of the present invention is that, in step 1, the request information of the driver and the passenger includes the driver's starting and ending points, departure time, driver trajectory, boarding and alighting positions, the passenger's starting and ending points, and boarding and alighting times.
  • A further improvement of the present invention is that, in step 2, the node types in the heterogeneous ride-sharing network include users, locations, time periods, and activities.
  • A further improvement of the present invention is that, in step 3, representation learning specifically includes the following two steps:
  • 1) Generating the node sequence set: a meta-path guides the walk over nodes in the heterogeneous ride-sharing network, generating a set of fixed-length node sequences;
  • A further improvement of the present invention is that, in step 1), for end-point ride sharing, a meta-path with the structure ULTLU is constructed; for ride sharing along the way, a meta-path with the structure ULU is constructed under the constraint of the same time period.
  • In step 3, the specific process of representation learning is as follows:
  • $N_a(v_u)$ is the set of type-$a$ neighbor nodes of $v_u$, where $a$ ranges over the set of node types, and $p(v_{j-c},\dots,v_{j+c} \mid v_u;\theta)$ is the conditional probability of the context nodes given $v_u$, with logarithm $\log p(v_{j-c},\dots,v_{j+c} \mid v_u;\theta)$;
  • the softmax function is used to define the conditional probability $p(v_k \mid v_u;\theta)$ of a context node $v_k$ given node $v_u$;
  • The representation vector is generated from the conditional probability of the context node $v_k$ given node $v_u$, and negative sampling is then used to optimize it, giving the low-dimensional vector representation of each user node in the heterogeneous ride-sharing network.
  • A further improvement of the present invention is that the walk probability of the random walk is defined as follows:
  • where $a$ denotes the node type; the other symbols denote a type-$a$ node on the path and a walk along the predefined meta-path.
  • Formula (2) indicates that the selected meta-paths are all symmetric meta-paths.
  • A further improvement of the present invention is that the specific process of using negative sampling to optimize the representation vector is as follows:
  • where the indicator function indicates whether the sampled node $v'_u$ is the context neighbor node $v_k$.
  • The present invention has the following beneficial effects:
  • The present invention uses the location and time information of passengers and drivers to construct a heterogeneous ride-sharing network and distinguishes two ride-sharing types. For both types, symmetric meta-paths are selected, and different constraints are applied when generating the meta-path sequence sets for the different ride-sharing types.
  • Skip-gram with negative sampling is applied to the sequence set to generate the representation vectors, and the cosine similarity between user representation vectors is then used for ride-sharing recommendation.
  • The ride-sharing recommendation method proposed by the present invention is more reliable than traditional methods that recommend by distance alone, is semantically intuitive, and can accurately find potential ride-sharing users and provide them with faster and more convenient service.
  • Fig. 1 is a topological structure diagram of the heterogeneous ride-sharing network constructed by the present invention.
  • The method of the present invention for matching online ride-hailing shared travelers based on network representation learning includes the following steps:
  • Step 1: Ride-sharing classification
  • Given the driver's original route, ride sharing is divided into two categories according to how the passenger's starting and ending points relate to that route. The first type is end-point ride sharing, where the passenger's starting and ending points are on the driver's original route;
  • the other type is ride sharing along the way, where neither the passenger's starting point nor ending point is on the driver's original route. The passenger walks from the starting point to the boarding point, shares the ride, and then walks from the alighting point to the destination; the shared route is part of the passenger's trajectory;
  • The driver and passenger request information includes the driver's starting and ending points, departure time, driver trajectory, boarding and alighting positions, the passenger's starting and ending points, and boarding and alighting times.
  • Step 2: Build a heterogeneous ride-sharing network
  • The request information of drivers and passengers is expressed as a heterogeneous ride-sharing network in which passengers and drivers are connected through location and time information. The node types in the network are user, location, time period, and activity;
  • Step 3: Use a network representation learning model to perform representation learning on the heterogeneous ride-sharing network and obtain low-dimensional vector representations of the user nodes;
  • In step 3, representation learning specifically includes the following two steps:
  • ULTLU meta-path: for end-point ride sharing, the ULTLU meta-path is used, meaning that passengers and drivers who arrive at the same place at the same time can share the ride.
  • The first U in ULTLU is a user node representing the driver, and the second U is a user node representing the passenger; L represents the boarding or alighting location, and T represents the time period at the corresponding location.
  • ULU meta-path: for ride sharing along the way, this meta-path means that the driver and the passenger, within the same time period, take the best meeting point as the boarding location L and can then share the ride.
  • In step 3, the specific process of representation learning is as follows:
  • where $a$ denotes the node type; the other symbols denote a type-$a$ node on the path and a walk along the predefined meta-path.
  • Formula (2) indicates that the selected meta-paths are all symmetric meta-paths.
  • The method selects the node set $v_{j-c},\dots,v_{j+c}$ as neighbor nodes, where $c$ is half the skip-gram window size. Therefore, given a user node $v_u$, the goal of the skip-gram model is to maximize the conditional probability of the context formed by its heterogeneous neighbor nodes:
  • $N_a(v_u)$ is the set of type-$a$ neighbor nodes of $v_u$, where $a$ ranges over the set of node types, and $p(v_{j-c},\dots,v_{j+c} \mid v_u;\theta)$ is the conditional probability of the context nodes given $v_u$, with logarithm $\log p(v_{j-c},\dots,v_{j+c} \mid v_u;\theta)$;
  • the softmax function is used to define the conditional probability $p(v_k \mid v_u;\theta)$ of a context node $v_k$ given node $v_u$.
  • The representation vector is generated from the conditional probability of the context node $v_k$ given node $v_u$, and negative sampling is then used to optimize it, giving the low-dimensional vector representation of each user node in the heterogeneous ride-sharing network.
  • The indicator function indicates whether the sampled node $v'_u$ is the context neighbor node $v_k$.
  • Step 4: Compute the cosine similarity between driver and passenger nodes from the low-dimensional vector representations of the user nodes, sort the similarity values in descending order, and return the k passengers most similar to the driver as the passengers who can share the ride.
  • The value of k is determined by the maximum number of passengers the driver can carry.
  • Step 1: Data classification and extraction
  • The experimental data of the present invention comes from the local area data of Chengdu provided by the Didi Gaia Data Open Program, including driver GPS trajectory data and passenger order data.
  • Drivers and passengers are numbered, and each passenger's origin and destination and each driver's trajectory with the corresponding times are extracted. The first point of a driver's trajectory is taken as the driver's origin, and the last point as the driver's destination.
  • The present invention divides ride sharing into end-point ride sharing and ride sharing along the way.
  • In end-point ride sharing, the passenger's starting and ending points are on the driver's original route. In ride sharing along the way, neither point is on the driver's original route: the passenger walks from the starting point to the boarding point, shares the ride, and then walks from the alighting point to the destination.
  • The shared route is only part of the passenger's trajectory.
  • Let d denote a driver and p a passenger.
  • Each driver and each passenger has an origin O and a destination D.
  • x is the passenger's boarding position and y is the passenger's alighting position.
  • $TT(O, D)$ is the travel time required from the origin O of passenger p or driver d to the destination D;
  • $DTime_d$ is the time at which driver d departs from a given location;
  • $DTime_p$ is the time at which passenger p departs from a given location.
  • Driver d can take p as a shared passenger for end-point ride sharing when the following condition holds:
  • where $TT(O_p, D_p)$ is the travel time required from passenger p's origin $O_p$ to destination $D_p$,
  • and $TT(O_d, D_d)$ is the travel time required from driver d's origin $O_d$ to destination $D_d$.
  • Ride sharing along the way is a typical way of travel: the passenger walks from the origin to the best meeting point, boards at the departure time, shares the ride for a period of time, gets off at the point on the driver's trajectory closest to the passenger's destination, and then walks to the destination.
  • $MT_p$ denotes the maximum walking time of passenger p from the origin to the boarding point and from the alighting point to the final destination. For ride sharing along the way, the driver can therefore accept the passenger if the following conditions are met:
  • where $TT(x_p, y_p)$ is the travel time required from passenger p's boarding position $x_p$ to alighting position $y_p$.
  • The present invention uses the Dijkstra algorithm to obtain the best meeting point between the driver and the passenger and uses this point as the passenger's boarding position.
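The meeting-point step can be illustrated with a minimal sketch. It assumes a small road graph stored as a dictionary of walking times, and it assumes that the "best" meeting point is the node on the driver's trajectory with the smallest walking time from the passenger's origin, subject to the walking limit $MT_p$. The selection rule, the function names `dijkstra` and `best_meeting_point`, and the toy graph are all illustrative assumptions; the text only states that the Dijkstra algorithm is used.

```python
import heapq

def dijkstra(graph, source):
    """Return the shortest travel time from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for nxt, weight in graph.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist

def best_meeting_point(graph, passenger_origin, driver_trajectory, max_walk):
    """Hypothetical rule: the trajectory node closest to the passenger's
    origin, provided the walking time does not exceed max_walk (MT_p)."""
    walk = dijkstra(graph, passenger_origin)
    candidates = [(walk[n], n) for n in driver_trajectory
                  if n in walk and walk[n] <= max_walk]
    return min(candidates)[1] if candidates else None

# Toy road network: edge weights are walking times in minutes (made up).
road = {
    "Op": {"a": 2.0, "b": 5.0},
    "a": {"Op": 2.0, "b": 1.0, "c": 4.0},
    "b": {"Op": 5.0, "a": 1.0, "c": 1.0},
    "c": {"a": 4.0, "b": 1.0},
}
meeting = best_meeting_point(road, "Op", ["b", "c"], max_walk=4.0)
print(meeting)
```

With a binary heap the search runs in roughly O((V + E) log V) time, which is usually acceptable for the local road network around a single request.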
  • Step 2: Preprocess the location and time data extracted in Step 1 to construct the heterogeneous ride-sharing information network:
  • A heterogeneous information network consists of different but related nodes connected by edges. "Different" means that the nodes of the network have different types, and "related" means that two nodes have a specific type of interaction or relationship.
  • The heterogeneous ride-sharing network constructed by the present invention is shown in Fig. 1.
  • For both ride-sharing types, the present invention constructs the same network schema, since both types match drivers and passengers according to their time and location constraints.
  • The node types in this network schema are location (L), time (T), activity (A), and user (U), where a user is a passenger or a driver.
  • User type nodes (U) include drivers and passengers. Passengers are numbered sequentially from 1 with the prefix p, and drivers are numbered sequentially from 1 with the prefix d.
  • For time type nodes (T), the present invention discretizes and coarsens the time: the 24 hours of a day are divided into 48 time periods, each half an hour long, starting from 00:00:00. The periods are numbered 1 to 48, and the number of each period serves as a time type node (T).
  • For location type nodes (L) in end-point ride sharing, the passenger's O and D are used as the location type nodes, because the passenger's O and D lie on the driver's trajectory and the passenger's boarding and alighting points are therefore the passenger's O and D.
  • In ride sharing along the way, the best meeting point between the driver and the passenger is used as the location type node, because both the passenger and the driver must reach the best meeting point for the ride to be shared.
  • For activity type nodes (A), the present invention obtains the type of each location, such as real estate or educational institution, through conversion with the Baidu API.
  • Link types in the heterogeneous network include the occurrence of an activity at a location, the path between locations, and the range of time periods.
  • For each location $l \in L$ there is a set of links to users and activities and a set of departure times belonging to the link type. A link can also carry information about the route connecting two locations and about the time interval in which some passengers reach the meeting point for the ride.
  • From the network, a meta-path such as ULU can be constructed to show the relationship between nodes of different types.
  • A U-L link indicates that a user starts from a location or intends to reach a destination, expressing a staying relationship;
  • an L-T link indicates that a departure from or an arrival at a place occurs during a certain time period;
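The network construction described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes each request is a tuple of user id, location id, time of day, and activity label, and it stores the network as an undirected adjacency map whose nodes are `(type, id)` pairs. The names `time_slot` and `build_network` and the sample requests are hypothetical.

```python
from collections import defaultdict
from datetime import time

def time_slot(t):
    """Map a time of day to its half-hour period, numbered 1 to 48."""
    return (t.hour * 60 + t.minute) // 30 + 1

def build_network(requests):
    """Build an undirected typed graph with U-L, L-T and L-A links."""
    adj = defaultdict(set)

    def link(u, v):
        adj[u].add(v)
        adj[v].add(u)

    for user, location, when, activity in requests:
        user_node = ("U", user)
        loc_node = ("L", location)
        link(user_node, loc_node)                # user departs from / arrives at location
        link(loc_node, ("T", time_slot(when)))   # the event falls in this time period
        link(loc_node, ("A", activity))          # activity type of the location
    return adj

# Made-up requests: (user id, location id, time of day, activity label).
requests = [
    ("d1", "loc_7", time(8, 10), "educational institution"),
    ("p1", "loc_7", time(8, 25), "educational institution"),
    ("p2", "loc_9", time(17, 40), "real estate"),
]
network = build_network(requests)
print(sorted(network[("L", "loc_7")], key=str))
```

In this toy input the driver d1 and the passenger p1 share both the location node and the time node for period 17, which is the kind of connection a ULTLU meta-path is meant to pick up.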
  • Step 3: According to the ride-sharing type, select the corresponding meta-path, use the network representation learning model to learn representations of the heterogeneous ride-sharing network, and obtain low-dimensional vector representations of the user nodes;
  • The purpose of constructing a heterogeneous ride-sharing network is to establish connections between drivers and passengers.
  • A symmetric meta-path is used here to express this relationship.
  • The meta-path is defined as follows:
  • A meta-path is a path defined on the network schema, written in the form $V_1 \rightarrow V_2 \rightarrow \cdots \rightarrow V_m$, that represents a composite relation between two given node types.
  • A meta-path is usually used in symmetric form, that is, its first node type $V_1$ and its last node type $V_m$ are the same.
  • For end-point ride sharing, the ULTLU meta-path is used, meaning that passengers and drivers who arrive at the same place at the same time can share a ride. The two U nodes are the driver and the passenger, L is the boarding or alighting location, and T is the time period of boarding or alighting at that location. For ride sharing along the way, the origins and destinations of the driver and the passenger differ, and the passenger must spend extra time walking to the boarding point before the ride and to the destination after alighting, so this type of ride sharing is more complicated. It is based on the ULU meta-path, in which L is the best meeting point: a driver and a passenger who can reach the same meeting point at the same time can share a ride.
  • To find the meeting point, the trajectory network containing the corresponding driver and passenger trajectories is first taken as input; the shortest path between the two ODs on the road network is obtained through the algorithm, and the best meeting point is taken from it.
  • Network representation learning represents the nodes of a network as low-dimensional, dense, real-valued vectors.
  • The present invention takes as input the node sequence set generated from the symmetric meta-paths, encodes the nodes with one-hot encoding as initial vectors, and then converts them into low-dimensional vectors.
  • Network representation learning is divided into two stages. In the first stage, the meta-path guides a random walk over the nodes of the heterogeneous ride-sharing network and generates a set of fixed-length node sequences.
  • Under the guidance of the ULTLU meta-path, the next-hop node is of type L, with the transition probability given by formula (1), and the node after an L-type node is of type T. Each node sequence generated by the present invention has length 5.
  • Each node in the formula carries a node type $a$; the other symbols denote a type-$a$ node on the path and a walk along the predefined meta-path.
  • Formula (2) indicates that the meta-path used is a symmetric meta-path.
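The first stage can be sketched as a meta-path-guided random walk. The sketch assumes that, at each step, the walker chooses uniformly among the neighbors whose type matches the next letter of the meta-path, as is common for meta-path-guided walks. Formula (1) itself is not reproduced in this text, so the uniform choice is an assumption, and the function name `metapath_walk` and the toy adjacency map are hypothetical.

```python
import random

def metapath_walk(adj, start, metapath, rng):
    """Walk from `start`, visiting one node per letter of `metapath`.

    Nodes are (type, id) pairs; the node type at step i must equal
    metapath[i]. Returns None if the walk reaches a dead end.
    """
    if start[0] != metapath[0]:
        raise ValueError("start node type must match the meta-path head")
    walk = [start]
    for next_type in metapath[1:]:
        # Keep only neighbors of the required type; sort for reproducibility.
        candidates = sorted(n for n in adj.get(walk[-1], ()) if n[0] == next_type)
        if not candidates:
            return None
        walk.append(rng.choice(candidates))  # uniform choice (assumption)
    return walk

# Toy network: one driver and one passenger at the same place and period.
adj = {
    ("U", "d1"): {("L", "loc_7")},
    ("U", "p1"): {("L", "loc_7")},
    ("L", "loc_7"): {("U", "d1"), ("U", "p1"), ("T", 17)},
    ("T", 17): {("L", "loc_7")},
}
rng = random.Random(0)
walks = [metapath_walk(adj, ("U", "d1"), "ULTLU", rng) for _ in range(10)]
print(walks[0])
```

Each returned walk has length 5 for ULTLU, matching the sequence length stated above; passing "ULU" instead would give sequences of length 3.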
  • In the second stage, the node sequence set of length 5 is input into the skip-gram model for training, giving the vector representations of the driver and passenger nodes.
  • The skip-gram model constructs a feature vector for each user type (U) node, and negative sampling is used within the skip-gram model to optimize the representation vectors.
  • The method selects the node set $v_{j-c},\dots,v_{j+c}$ of type $t$ as neighbor nodes, where $c$ is half the window size set in skip-gram. Therefore, given a user node $v_u$, the goal of the skip-gram model is to maximize the conditional probability of the context formed by its heterogeneous neighbor nodes:
  • $N_a(v_u)$ is the set of type-$a$ neighbor nodes of $v_u$, where $a$ ranges over the set of node types.
  • $p(v_{j-c},\dots,v_{j+c} \mid v_u;\theta)$ can be further decomposed into terms of the form $p(v_k \mid v_u;\theta)$, one for each context node $v_k$.
  • The negative sampling method is also used to optimize the representation vector. It reduces the influence of irrelevant nodes on the target vector and makes vectors of different categories easier to distinguish.
  • The likelihood function of this method is as follows:
  • where the indicator function indicates whether the sampled node $v'_u$ is the context neighbor node $v_k$.
  • The generated vectors have 128 dimensions; the window size c used by skip-gram is set to 2, and the number of negative samples is 5.
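For orientation, the heterogeneous skip-gram objective with negative sampling is commonly written as below in the meta-path embedding literature. This is the standard textbook form expressed with the symbols used above, not a reproduction of the patent's own formulas, which may differ in notation. The node set $V$, the type set $\mathcal{T}$, the noise distribution $P(v')$, and the sigmoid $\sigma$ are assumed symbols; $X_u$ is the vector of node $v_u$, and $M$ is the number of negative samples (5 in the setting above).

```latex
% Objective: maximize the log-probability of heterogeneous context nodes.
\arg\max_{\theta} \sum_{v_u \in V} \; \sum_{a \in \mathcal{T}} \; \sum_{v_k \in N_a(v_u)}
    \log p(v_k \mid v_u; \theta)

% Softmax definition of the conditional probability.
p(v_k \mid v_u; \theta) =
    \frac{\exp(X_k \cdot X_u)}{\sum_{v \in V} \exp(X_v \cdot X_u)}

% Negative-sampling approximation of \log p(v_k \mid v_u; \theta).
\log \sigma(X_k \cdot X_u)
    + \sum_{m=1}^{M} \mathbb{E}_{v' \sim P(v')}
      \left[ \log \sigma(-X_{v'} \cdot X_u) \right],
\qquad \sigma(x) = \frac{1}{1 + e^{-x}}
```

The negative-sampling form replaces the sum over all nodes in the softmax denominator with $M$ sampled nodes, which is what makes training practical on large networks.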
  • Step 4: From the low-dimensional vector representations of the user nodes, the similarity between the driver and each passenger is computed with cosine similarity, and the results are sorted in descending order. The k passengers most similar to the driver are returned as the passengers who can share the ride.
  • The value of k is determined by the maximum number of passengers the driver can carry.
  • $X_i$ is the vector of node $v_i$, and $X_j$ is the vector of node $v_j$.
  • When $\|X_i\| = \|X_j\| = 1$, ranking by cosine similarity is equivalent to ranking by Euclidean distance, so after normalization an approximate nearest-neighbor search can efficiently locate the k nodes most similar to a given node $v_i$ (passenger).
  • Given the previously learned vector of a user (the driver), the cosine between the vectors $X_i$ and $X_j$ is computed to obtain the similarity between the driver and a potential passenger.
  • The k passengers with the greatest similarity are identified as ride-sharing participants; the candidates can then be ranked and the ride-sharing type observed.
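The ranking in Step 4 can be sketched in a few lines. The sketch assumes the learned vectors are available as plain lists of floats keyed by user id. The function names `cosine` and `top_k_passengers` and the three-dimensional vectors are illustrative only; the method described above uses 128-dimensional vectors.

```python
import math

def cosine(x, y):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

def top_k_passengers(driver_vec, passenger_vecs, k):
    """Return the ids of the k passengers most similar to the driver."""
    scored = [(cosine(driver_vec, vec), pid) for pid, vec in passenger_vecs.items()]
    scored.sort(key=lambda item: item[0], reverse=True)  # largest similarity first
    return [pid for _, pid in scored[:k]]

# Made-up embeddings for one driver and three passengers.
driver = [1.0, 0.0, 1.0]
passengers = {
    "p1": [0.9, 0.1, 1.1],
    "p2": [-1.0, 0.5, 0.0],
    "p3": [1.0, 1.0, 0.0],
}
matches = top_k_passengers(driver, passengers, k=2)
print(matches)
```

Here k would be set to the number of free seats. For large passenger pools, normalizing the vectors first and using an approximate nearest-neighbor index, as the text suggests, avoids scoring every passenger.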
  • The present invention uses the trajectory data set of Didi users to construct a heterogeneous ride-sharing network model and classifies the different types of ride sharing.
  • Two types of meta-paths are defined, meta-path sequence sets are generated in the ride-sharing network, and skip-gram with negative sampling is used to generate the users' representation vectors.
  • The cosine similarity algorithm is used to compute the similarity between users.
  • The k most similar passengers are predicted to be those who can share the ride.
  • The ride-sharing recommendation method proposed by the present invention is more reliable than traditional methods that recommend by distance alone, is semantically intuitive, and can accurately find potential ride-sharing users and provide them with faster and more convenient service.

Landscapes

  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Algebra (AREA)
  • Artificial Intelligence (AREA)
  • Computational Mathematics (AREA)
  • Development Economics (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computing Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Primary Health Care (AREA)
  • Tourism & Hospitality (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Traffic Control Systems (AREA)

Abstract

A method for matching ride-sharing travellers based on network representation learning: on the basis of the relationship between the start point and end point of a passenger and the original route of a driver, dividing ride sharing into two types: the first type is end point ride-sharing and the other type is en route ride-sharing; the passenger needs to walk from the start point to the boarding point to implement ride-sharing, and then walk from the drop off point to the target destination, the ride-sharing route trajectory being part of the passenger trajectory; constructing a heterogeneous ride-sharing network, and using a network representation learning model to perform representation learning on the heterogeneous ride-sharing network to obtain a low-dimensional vector representation of user nodes; calculating the cosine similarity between driver and passenger nodes, sorting the calculated cosine similarity values from high to low, returning the top k passengers having the highest similarity value to the driver as passengers that can implement ride sharing, and implementing ride sharing. The present ride-sharing matching method is more reliable than traditional methods that only use distance recommendations, has intuitive semantic comprehensiveness, and can accurately discover potential ride-sharing users, providing same with a faster and more convenient service.

Description

A Method for Matching Online Ride-Hailing Shared Travelers Based on Network Representation Learning
Technical field
The present invention belongs to the field of group recommendation and specifically relates to a method for matching online ride-hailing shared travelers based on network representation learning.
Background
With the growth of online car-hailing platforms and apps, ride sharing has gradually been recognized and accepted by the public. With the development of related travel technologies, such as research on travel route matching, travel group discovery, route planning, and travel behavior analysis, ride sharing has also become a convenient and feasible mode of travel. Research on ride-sharing matching can provide users with a better travel experience and higher travel efficiency.
Network representation learning and the use and influence of ride sharing have drawn considerable attention to research on ride-sharing matching methods. In this research, the main problems are how to assign passengers to drivers accurately and how to optimize ride-sharing matching among different travel groups. Traditional matching methods rely only on the geographic distance between passenger and driver and do not consider the relationships between the passenger or driver and other features, such as the relationships between the traveler and the destination or the time. A heterogeneous information network offers a more effective way to analyze ride-sharing matching. By using a heterogeneous network built from user, time, and location information to learn latent semantics from ride-sharing information, and by extracting features from user trajectories and sentiment, more suitable ride-sharing matches can be provided to users.
Summary of the invention
To address the user matching problem in ride sharing, the purpose of the present invention is to provide a method for matching online ride-hailing shared travelers based on network representation learning. The invention adopts meta-path theory from heterogeneous information networks, converts the matching of drivers and passengers into a node similarity measurement problem in a heterogeneous ride-sharing network, and establishes a ride-sharing behavior feature model based on the "driver-passenger" structure. According to the ride-sharing situation, the model is divided into end-point ride sharing and ride sharing along the way. Starting from the departure times and the boarding and alighting locations of drivers and passengers, the model uses the skip-gram model from machine learning to analyze the relationships between features and infer the likelihood of ride sharing between a driver and a passenger, supporting high-quality ride-sharing services.
To achieve the above purpose, the present invention adopts the following technical solution:
A method for matching online ride-hailing shared travelers based on network representation learning, including the following steps:
Step 1: Ride-sharing classification
Given the driver's original route, ride sharing is divided into two categories according to how the passenger's starting and ending points relate to that route. The first type is end-point ride sharing, where the passenger's starting and ending points are on the driver's original route. The other type is ride sharing along the way, where neither the passenger's starting point nor ending point is on the driver's original route: the passenger walks from the starting point to the boarding point, shares the ride, and then walks from the alighting point to the destination, and the shared route is part of the passenger's trajectory;
Step 2: Build a heterogeneous ride-sharing network
The request information of drivers and passengers is expressed as a heterogeneous ride-sharing network in which passengers and drivers are connected through location and time information. The node types in the network are user, location, time period, and activity;
Step 3: Use a network representation learning model to perform representation learning on the heterogeneous ride-sharing network and obtain low-dimensional vector representations of the user nodes;
Step 4: Compute the cosine similarity between driver and passenger nodes from the low-dimensional vector representations of the user nodes, sort the similarity values in descending order, and return the k passengers most similar to the driver as the passengers who can share the ride.
In a further improvement of the present invention, in Step 1, the request information of drivers and passengers includes the driver's origin and destination, the departure time, the driver's trajectory, the pick-up and drop-off locations, the passenger's origin and destination, and the pick-up and drop-off times.
In a further improvement of the present invention, in Step 2, the node types in the heterogeneous ride-sharing network include users, locations, time periods, and activities.
In a further improvement of the present invention, in Step 3, representation learning comprises the following two steps:
1) Generate the node sequence set: a meta-path guides walks over the nodes of the heterogeneous ride-sharing network, generating a set of fixed-length node sequences;
2) Feed the generated set of fixed-length node sequences into the skip-gram model for training, obtaining vector representations of the driver and passenger nodes.
In a further improvement of the present invention, in step 1), a meta-path with structure ULTLU is constructed for endpoint ride-sharing; for en-route ride-sharing, a meta-path with structure ULU is constructed under the constraint that the time period is the same.
In a further improvement of the present invention, the specific process of representation learning in Step 3 is as follows:
First, a specific meta-path
Figure PCTCN2019107011-appb-000001
is given, and the meta-path
Figure PCTCN2019107011-appb-000002
guides random walks over the nodes of the heterogeneous ride-sharing network, generating a set of fixed-length node sequences. Second, for any user node $v_u$ in the set of fixed-length node sequences, if the node's position index in its sequence is $j$, the method selects the nodes $v_{j-c}, \ldots, v_{j+c}$ as its neighbor nodes, where $c$ is half the skip-gram window size. Therefore, given a user node $v_u$, the objective of the skip-gram model is to maximize the conditional probability of its context of heterogeneous neighbor nodes:
Figure PCTCN2019107011-appb-000003
where $N_a(v_u)$ is the set of neighbor nodes of node $v_u$,
Figure PCTCN2019107011-appb-000004
is the set of node types, and $p(v_{j-c}, \ldots, v_{j+c} \mid v_u; \theta)$ is the conditional probability of the context given the center node;
Assuming the nodes are mutually independent, the log conditional probability of the context given the center node, $\log p(v_{j-c}, \ldots, v_{j+c} \mid v_u; \theta)$, further decomposes into
Figure PCTCN2019107011-appb-000005
where $p(v_k \mid v_u; \theta)$ uses the softmax function to define the conditional probability of a context node $v_k$ given the node $v_u$:
Figure PCTCN2019107011-appb-000006
where
Figure PCTCN2019107011-appb-000007
denotes the representation vector of node $v_u$;
Representation vectors are generated from the conditional probability of the context node $v_k$ given the node $v_u$, and negative sampling is then used to optimize them, yielding the low-dimensional vector representation of each user node in the heterogeneous ride-sharing network.
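As a reading aid, the skip-gram objective, its factorization under the independence assumption, and the softmax can be sketched as follows. This assumes the formulas take the usual heterogeneous skip-gram form; the node set $V$, the type set $T_V$, and the embedding symbol $X_v$ are assumed notation that the text does not confirm.

```latex
% Objective: maximize the context probability over all nodes and node types
\arg\max_{\theta} \sum_{v_u \in V} \sum_{a \in T_V} \sum_{v_k \in N_a(v_u)} \log p(v_k \mid v_u; \theta)

% Factorization over the window, assuming independent context nodes
\log p(v_{j-c}, \ldots, v_{j+c} \mid v_u; \theta)
  = \sum_{\substack{j-c \le k \le j+c \\ k \ne j}} \log p(v_k \mid v_u; \theta)

% Softmax over all nodes
p(v_k \mid v_u; \theta) = \frac{\exp\left(X_{v_k} \cdot X_{v_u}\right)}{\sum_{v \in V} \exp\left(X_{v} \cdot X_{v_u}\right)}
```

The denominator sums over every node in the network, which is why a cheaper approximation such as negative sampling is needed in practice.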
In a further improvement of the present invention, the walk probability of the random walk
Figure PCTCN2019107011-appb-000008
is as follows:
Figure PCTCN2019107011-appb-000009
Figure PCTCN2019107011-appb-000010
In formula (1), $a$ denotes the node type,
Figure PCTCN2019107011-appb-000011
is a node of type $a$ on the path,
Figure PCTCN2019107011-appb-000012
gives, for the predefined meta-path
Figure PCTCN2019107011-appb-000013
the number of neighbor nodes of node
Figure PCTCN2019107011-appb-000014
along that path; $\varepsilon$ denotes the set of links in the network; $v_{i+1}$,
Figure PCTCN2019107011-appb-000015
indicates that the two nodes can form a link in the network; and $f_v(v_{i+1}) = a+1$ means that node $v_{i+1}$ is a node of type $a+1$;
Formula (2) indicates that the selected meta-paths are all symmetric meta-paths.
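For orientation, one plausible form of formula (1) is sketched below. It assumes a uniform choice among the neighbors whose type is the next one prescribed by the meta-path, as in standard meta-path-guided random walks; the symbols $\mathcal{P}$ for the meta-path and $N_{a+1}(\cdot)$ for the type-$(a+1)$ neighborhood are assumed notation.

```latex
p\left(v_{i+1} \mid v_i^{a}, \mathcal{P}\right) =
\begin{cases}
\dfrac{1}{\left| N_{a+1}\left(v_i^{a}\right) \right|}, & (v_{i+1}, v_i^{a}) \in \varepsilon,\ f_v(v_{i+1}) = a+1 \\[6pt]
0, & (v_{i+1}, v_i^{a}) \in \varepsilon,\ f_v(v_{i+1}) \neq a+1 \\[6pt]
0, & (v_{i+1}, v_i^{a}) \notin \varepsilon
\end{cases}
```

Under this reading, the walk moves only to a linked node of the type the meta-path expects next, and all such nodes are equally likely.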
In a further improvement of the present invention, the specific process of optimizing the representation vectors by negative sampling is as follows:
Figure PCTCN2019107011-appb-000016
where
Figure PCTCN2019107011-appb-000017
Figure PCTCN2019107011-appb-000018
is the set of random negative node samples for $v_u$; the negative-sample node set
Figure PCTCN2019107011-appb-000019
is drawn according to the noise distribution $p(v'_u)$,
Figure PCTCN2019107011-appb-000020
Stochastic gradient descent is then used to maximize the log-likelihood function
Figure PCTCN2019107011-appb-000021
updating the vector representations of the nodes in formula (5) as shown in formulas (6) and (7), which yields the low-dimensional vector representation of each user node in the heterogeneous ride-sharing network;
Figure PCTCN2019107011-appb-000022
Figure PCTCN2019107011-appb-000023
where the function
Figure PCTCN2019107011-appb-000024
indicates whether $v'_u$ is the context neighbor node $v_k$.
Compared with the prior art, the present invention has the following beneficial effects:
Unlike ordinary ride-sharing recommendation mechanisms, the present invention uses the location and time information of passengers and drivers to construct a heterogeneous ride-sharing network and distinguishes two ride-sharing types. Symmetric meta-paths are selected for the two types, and different restrictions are placed on the generation of the meta-path sequence set for each type. Representation learning is performed on the sequence set with negative-sampling skip-gram to generate representation vectors, and cosine similarity between the user representation vectors is finally used to make ride-sharing recommendations. The proposed recommendation method is more reliable than traditional methods that recommend by distance alone, its semantics are intuitive to interpret, and it can accurately identify potential ride-sharing users and provide them with faster and more convenient service.
Description of the drawings
Fig. 1 is a topological structure diagram of the heterogeneous ride-sharing network constructed by the present invention.
Detailed description
The ride-sharing matching method proposed by the present invention is described in detail below with reference to the accompanying drawing.
The method of the present invention for matching ride-sharing travellers based on network representation learning comprises the following steps:
Step 1: Ride-sharing classification
With the driver's original route fixed, ride sharing is divided into two types according to how the passenger's origin and destination relate to that route. The first type is endpoint ride-sharing, in which the passenger's origin and destination lie on the driver's original route. The second type is en-route ride-sharing, in which neither the passenger's origin nor the passenger's destination lies on the driver's original route: the passenger walks from the origin to a pick-up point, shares the ride, and then walks from the drop-off point to the destination, so the shared trajectory is only part of the passenger's trajectory;
The request information of drivers and passengers includes the driver's origin and destination, the departure time, the driver's trajectory, the pick-up and drop-off locations, the passenger's origin and destination, and the pick-up and drop-off times.
Step 2: Construct the heterogeneous ride-sharing network
The request information of drivers and passengers is represented as a heterogeneous ride-sharing network in which passengers and drivers are connected through location and time information. The node types in this network include users, locations, time periods, and activities;
Step 3: Use a network representation learning model to perform representation learning on the heterogeneous ride-sharing network, obtaining low-dimensional vector representations of the user nodes;
In Step 3, representation learning comprises the following two steps:
1) Generate the node sequence set: a meta-path guides walks over the nodes of the heterogeneous ride-sharing network, generating a set of fixed-length node sequences. For endpoint ride-sharing, a meta-path with structure ULTLU is constructed; for en-route ride-sharing, a meta-path with structure ULU is constructed under the constraint that the time period is the same.
2) Feed the generated set of fixed-length node sequences into the skip-gram model for training, obtaining vector representations of the driver and passenger nodes.
Different symmetric meta-paths are used for representation learning depending on the ride-sharing type. For endpoint ride-sharing, the ULTLU meta-path is used, meaning that a passenger and a driver who arrive at the same place at the same time can share a ride. The first U in ULTLU is a user node, here the driver; the second U is a user node, here the passenger; L is the pick-up or drop-off location; and T is the time period containing the pick-up or drop-off time at that location. For en-route ride-sharing, the driver and passenger have different origins and destinations, so the passenger must spend extra time walking to the pick-up point before the ride and to the destination after it. Based on the ULU meta-path, and subject to the condition that the driver and passenger fall in the same time period, the best pick-up location is obtained as the location L (L being the pick-up or drop-off location). This meta-path means that the driver and passenger have a best meeting point in that time period, which serves as location L, and can therefore share a ride.
In Step 3, the specific process of representation learning is as follows:
First, a specific meta-path
Figure PCTCN2019107011-appb-000025
is given, and the meta-path
Figure PCTCN2019107011-appb-000026
guides random walks over the nodes of the heterogeneous ride-sharing network, generating a set of fixed-length node sequences. The walk probability of the random walk
Figure PCTCN2019107011-appb-000027
is as follows:
Figure PCTCN2019107011-appb-000028
Figure PCTCN2019107011-appb-000029
In formula (1), $a$ denotes the node type,
Figure PCTCN2019107011-appb-000030
is a node of type $a$ on the path,
Figure PCTCN2019107011-appb-000031
gives, for the predefined meta-path
Figure PCTCN2019107011-appb-000032
the number of neighbor nodes of node
Figure PCTCN2019107011-appb-000033
along that path; $\varepsilon$ denotes the set of links in the network; $v_{i+1}$,
Figure PCTCN2019107011-appb-000034
indicates that the two nodes can form a link in the network; and $f_v(v_{i+1}) = a+1$ means that node $v_{i+1}$ is a node of type $a+1$.
Formula (2) indicates that the selected meta-paths are all symmetric meta-paths.
Second, for any user node $v_u$ in the set of fixed-length node sequences, if the node's position index in its sequence is $j$, the method selects the nodes $v_{j-c}, \ldots, v_{j+c}$ as its neighbor nodes, where $c$ is half the skip-gram window size. Therefore, given a user node $v_u$, the objective of the skip-gram model is to maximize the conditional probability of its context of heterogeneous neighbor nodes:
Figure PCTCN2019107011-appb-000035
where $N_a(v_u)$ is the set of neighbor nodes of node $v_u$,
Figure PCTCN2019107011-appb-000036
is the set of node types, and $p(v_{j-c}, \ldots, v_{j+c} \mid v_u; \theta)$ is the conditional probability of the context given the center node.
Assuming the nodes are mutually independent, the log conditional probability of the context given the center node, $\log p(v_{j-c}, \ldots, v_{j+c} \mid v_u; \theta)$, further decomposes into
Figure PCTCN2019107011-appb-000037
where $p(v_k \mid v_u; \theta)$ uses the softmax function to define the conditional probability of a context node $v_k$ given the node $v_u$:
Figure PCTCN2019107011-appb-000038
where
Figure PCTCN2019107011-appb-000039
denotes the representation vector of node $v_u$.
Representation vectors are generated from the conditional probability of the context node $v_k$ given the node $v_u$, and negative sampling is then used to optimize them, yielding the low-dimensional vector representation of each user node in the heterogeneous ride-sharing network.
The specific process of optimizing the representation vectors by negative sampling is as follows:
Figure PCTCN2019107011-appb-000040
where
Figure PCTCN2019107011-appb-000041
Figure PCTCN2019107011-appb-000042
is the set of random negative node samples for $v_u$; the negative-sample node set
Figure PCTCN2019107011-appb-000043
is drawn according to the noise distribution $p(v'_u)$,
Figure PCTCN2019107011-appb-000044
Stochastic gradient descent (SGD) is then used to maximize the log-likelihood function
Figure PCTCN2019107011-appb-000045
updating the vector representations of the nodes in formula (5) as shown in formulas (6) and (7), which yields the low-dimensional vector representation of each user node in the heterogeneous ride-sharing network.
Figure PCTCN2019107011-appb-000046
Figure PCTCN2019107011-appb-000047
where the function
Figure PCTCN2019107011-appb-000048
indicates whether $v'_u$ is the context neighbor node $v_k$.
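The negative-sampling update can be illustrated with a minimal sketch. It assumes the usual skip-gram-with-negative-sampling objective, log σ(c·x) + Σ log σ(−n·x), and takes the negative samples as given rather than drawing them from a noise distribution. The function names, the learning rate, and the toy vectors are all hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def objective(emb, ctx, center, context, negatives):
    """Log-likelihood of one (center, context) pair with its negative samples."""
    x = emb[center]
    value = math.log(sigmoid(dot(x, ctx[context])))
    for n in negatives:
        value += math.log(sigmoid(-dot(x, ctx[n])))
    return value

def sgns_step(emb, ctx, center, context, negatives, lr=0.025):
    """One gradient-ascent step; label 1 marks the true context node, 0 a negative."""
    x = emb[center]
    grad_x = [0.0] * len(x)
    for node, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        c = ctx[node]
        g = lr * (label - sigmoid(dot(x, c)))
        for i in range(len(x)):
            grad_x[i] += g * c[i]   # accumulate the update for the center vector
            c[i] += g * x[i]        # update the context/negative vector in place
    for i in range(len(x)):
        x[i] += grad_x[i]

# Hypothetical toy vectors: driver d1 as center, passenger p1 as true context.
emb = {"d1": [0.1, -0.2, 0.3, 0.05]}
ctx = {"p1": [0.2, 0.1, -0.1, 0.3],
       "p2": [-0.3, 0.2, 0.1, 0.1],
       "p3": [0.1, 0.1, 0.2, -0.2]}

before = objective(emb, ctx, "d1", "p1", ["p2", "p3"])
for _ in range(10):
    sgns_step(emb, ctx, "d1", "p1", ["p2", "p3"])
after = objective(emb, ctx, "d1", "p1", ["p2", "p3"])
```

The `label` variable plays the role of the indicator function in the text: it is 1 when the sampled node is the true context neighbor and 0 otherwise.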
Step 4: From the low-dimensional vector representations of the user nodes, compute the cosine similarity between the driver node and each passenger node, sort the resulting values in descending order, and return the k passengers with the highest similarity to the driver as the passengers who can share the ride. The value of k is determined by the maximum number of passengers the driver can carry.
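The ranking in Step 4 can be sketched in a few lines. This is a minimal illustration that assumes the learned embeddings are available as plain lists of floats; the function names and the example vectors are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k_passengers(driver_vec, passenger_vecs, k):
    """Return the k (passenger_id, similarity) pairs most similar to the driver."""
    scored = [(pid, cosine(driver_vec, vec)) for pid, vec in passenger_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)  # descending similarity
    return scored[:k]

# Hypothetical 2-dimensional embeddings.
driver = [1.0, 0.0]
passengers = {"p1": [1.0, 0.1], "p2": [0.0, 1.0], "p3": [1.0, 1.0]}
matches = top_k_passengers(driver, passengers, k=2)
```

Here `k` would be set to the number of free seats, following the rule that k is bounded by the driver's passenger capacity.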
Example 1
Step 1: Data classification and extraction;
The experimental data of the present invention come from the Chengdu local-area data provided by the Didi GAIA open data program, comprising drivers' GPS trajectory data and passengers' order data. In the experiment, drivers and passengers are numbered, and each passenger's origin and destination and each driver's trajectory with its corresponding times are extracted; the first point of a driver's trajectory is taken as the driver's origin and the final point as the driver's destination. According to the relationship between the passenger's starting point and the driver's trajectory, the invention divides ride sharing into two types, endpoint ride-sharing and en-route ride-sharing.
Specifically, in endpoint ride-sharing the passenger's origin and destination lie on the driver's original route. In en-route ride-sharing neither the passenger's origin nor the passenger's destination lies on the driver's original route: the passenger walks from the origin to a pick-up point, shares the ride, and then walks from the drop-off point to the destination, so the shared trajectory is only part of the passenger's trajectory.
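The two-way classification can be sketched as follows. This assumes locations have already been matched to discrete node identifiers on the road network. The function name is hypothetical, and so are the "unspecified" result for cases the text does not cover and the requirement that the pick-up precede the drop-off along the route.

```python
def classify_ride_share(route, origin, destination):
    """Classify a passenger request against a driver's fixed route.

    route: ordered list of node ids on the driver's original route.
    Returns "endpoint", "en-route", or "unspecified" for cases the text
    does not describe (for example, only one end on the route).
    """
    origin_on_route = origin in route
    destination_on_route = destination in route
    if origin_on_route and destination_on_route:
        # Assumption: the pick-up must come before the drop-off along the route.
        if route.index(origin) < route.index(destination):
            return "endpoint"
        return "unspecified"
    if not origin_on_route and not destination_on_route:
        return "en-route"
    return "unspecified"

# Hypothetical driver route.
driver_route = ["n1", "n2", "n3", "n4"]
kind = classify_ride_share(driver_route, "n2", "n4")
```

A real system would test proximity to the trajectory within some tolerance rather than exact membership, since raw GPS points rarely coincide.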
In addition, both types of ride sharing must satisfy the following conditions, which are analyzed with the following notation. Let $d$ denote a driver and $p$ a passenger; each driver and passenger has an origin $O$ and a destination $D$. For a shared ride, $x$ denotes the passenger's pick-up location and $y$ the passenger's drop-off location. $TT(O, D)$ denotes the travel time required from origin $O$ to destination $D$ for passenger $p$ or driver $d$; $DTime_d$ denotes the time at which driver $d$ departs from a given location, and $DTime_p$ the time at which passenger $p$ departs from a given location. For the endpoint type, driver $d$ can take $p$ as a ride-sharing passenger only if the following conditions hold:
$\max_{p} TT(O_p, D_p) \le TT(O_d, D_d)$ (8)
Figure PCTCN2019107011-appb-000049
where $TT(O_p, D_p)$ denotes the travel time required from the origin $O$ to the destination $D$ of passenger $p$, and $TT(O_d, D_d)$ denotes the travel time required from the origin $O$ to the destination $D$ of driver $d$;
En-route ride-sharing is a typical mode of travel: the passenger walks from the origin to the best meeting point, departs at the departure time, shares the ride for a period, gets off at the point on the driver's trajectory closest to the passenger's destination, and then walks to the destination. Here $MT_p$ denotes the maximum walking time of passenger $p$ from the origin to the pick-up point and from the drop-off point to the final destination. For en-route ride-sharing, the driver can therefore share the ride with the passenger if the following holds:
Figure PCTCN2019107011-appb-000050
where $TT(x_p, y_p)$ denotes the travel time required for passenger $p$ from the pick-up location $x_p$ to the drop-off location $y_p$.
To select the best meeting point, the invention uses Dijkstra's algorithm to obtain the best meeting point between the driver and the passenger and takes that point as the passenger's pick-up location.
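A minimal sketch of meeting-point selection with Dijkstra's algorithm follows. The text does not state the exact selection criterion, so this sketch assumes the meeting point is the node on the driver's trajectory with the smallest shortest-path cost from the passenger's origin. The graph, the node names, and both function names are hypothetical.

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path cost from source to every reachable node.

    graph: dict mapping node -> list of (neighbor, non-negative weight).
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for nxt, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(nxt, float("inf")):
                dist[nxt] = candidate
                heapq.heappush(heap, (candidate, nxt))
    return dist

def best_meeting_point(graph, passenger_origin, driver_trajectory):
    """Assumed criterion: the trajectory node cheapest to reach from the passenger."""
    dist = dijkstra(graph, passenger_origin)
    reachable = [n for n in driver_trajectory if n in dist]
    if not reachable:
        return None, float("inf")
    best = min(reachable, key=lambda n: dist[n])
    return best, dist[best]

# Hypothetical undirected road network; weights are walking minutes.
edges = [("home", "a", 4), ("home", "b", 2), ("b", "c", 3), ("a", "c", 6), ("c", "e", 1)]
graph = {}
for u, v, w in edges:
    graph.setdefault(u, []).append((v, w))
    graph.setdefault(v, []).append((u, w))

point, cost = best_meeting_point(graph, "home", ["a", "c", "e"])
```

The returned cost could then be compared against the passenger's maximum walking time $MT_p$ before accepting the match.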
Step 2: Preprocess the location and time data extracted in Step 1 and construct the heterogeneous ride-sharing information network:
Definition 1. A heterogeneous information network (HIN) is defined as a network with multiple types of nodes and/or multiple types of links. It can be written $H = (v, \varepsilon)$, where $v$ is a set of nodes and $\varepsilon$ is a set of links. Links may be weighted or unweighted, directed or undirected. A node-type mapping function
Figure PCTCN2019107011-appb-000051
maps nodes to predefined types, and a link-type mapping function
Figure PCTCN2019107011-appb-000052
maps links to predefined link types.
A heterogeneous information network consists of different but related nodes connected by the edges of the network. "Different" here means that the vertices of the network have different types; "related" means that two nodes have a specific type of interaction or relationship.
The heterogeneous ride-sharing network constructed by the invention is shown in Fig. 1. The same network schema is constructed for both types of ride sharing, since both match drivers and passengers by their time and location constraints. The node types in this schema are location (L), time (T), activity (A), and user (U), a user being a passenger or a driver. User-type nodes (U) comprise drivers and passengers: passengers are numbered sequentially from 1 with the prefix p, and drivers are numbered sequentially from 1 with the prefix d. Time is discretized and coarsened: the 24 hours of a day are divided into 48 half-hour periods starting at 00:00:00, the periods are numbered 1 to 48, and the number of each period serves as a time-type node (T). For location-type nodes (L), the passenger's pick-up and drop-off locations are used. In endpoint ride-sharing, the passenger's O and D serve as location-type nodes, because the passenger's O and D lie on the driver's trajectory and the pick-up and drop-off locations are exactly the passenger's O and D. In en-route ride-sharing, the best meeting point of the driver and passenger serves as the location-type node, because both must reach the best meeting point for the ride share to take place. For activity-type nodes (A), the invention obtains the type of each location, such as real estate or educational institution, by conversion through the Baidu API. The link types in the heterogeneous ride-sharing network
Figure PCTCN2019107011-appb-000053
include an activity occurring at a location, paths between locations, and the range of a time period. For each location $l \in L$, there is a set of links to users, activities, and departure times belonging to the link type
Figure PCTCN2019107011-appb-000054
The network can also contain route information connecting two locations, as well as the time intervals some passengers need to walk to the meeting point to board. Meta-paths such as ULU can be constructed from this network to express the relationships among nodes of different types. For example, a U-L link indicates that a user departs from a location or intends to reach a destination, expressing a stay relationship; an L-T link indicates that a departure from or arrival at a place occurs within a given time period;
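The half-hour discretization and the U-L and L-T links can be sketched as below. This is a simplified illustration: activity (A) nodes are omitted, nodes are encoded as hypothetical `(type, id)` tuples, and the request tuple layout is an assumption rather than the data format used in the experiment.

```python
from collections import defaultdict

def time_slot(hhmmss):
    """Map 'HH:MM:SS' to a half-hour slot number in 1..48 (00:00:00 falls in slot 1)."""
    hours, minutes, _seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 2 + (1 if minutes >= 30 else 0) + 1

def build_network(requests):
    """Build an undirected adjacency map over typed nodes.

    requests: iterable of (user_id, location_id, 'HH:MM:SS').
    Nodes look like ('U', 'd1'), ('L', 'loc7'), ('T', 17).
    """
    adjacency = defaultdict(set)
    for user_id, location_id, clock in requests:
        user = ("U", user_id)
        location = ("L", location_id)
        slot = ("T", time_slot(clock))
        adjacency[user].add(location)      # U-L link: user departs from or heads to a location
        adjacency[location].add(user)
        adjacency[location].add(slot)      # L-T link: activity at the location in this slot
        adjacency[slot].add(location)
    return adjacency

# Hypothetical requests: driver d1 and passenger p1 at the same place and half hour.
network = build_network([("d1", "loc7", "08:10:00"), ("p1", "loc7", "08:25:30")])
```

Because both requests fall in slot 17 at `loc7`, the driver and passenger become connected through shared L and T nodes, which is what a ULTLU walk later exploits.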
Step 3: Select the meta-path corresponding to the ride-sharing type and use a network representation learning model to perform representation learning on the heterogeneous ride-sharing network, obtaining low-dimensional vector representations of the user nodes;
The purpose of constructing the heterogeneous ride-sharing network is to establish connections between drivers and passengers. To reflect that the two have the same purpose or requirements, symmetric meta-paths are used to express this relationship. A meta-path is defined as follows:
Definition 2 (meta-path). A meta-path is a path defined on the network schema
Figure PCTCN2019107011-appb-000055
and is expressed in the form
Figure PCTCN2019107011-appb-000056
Figure PCTCN2019107011-appb-000057
which represents a composite relation between two given node types. Meta-paths are usually used in symmetric form, that is, the first node $V_1$ and the last node $V_m$ are of the same type.
对于端点共乘,使用ULTLU元路径,意味着在相同的时间到达相同地点的乘客与司机可以共乘。其中的两个U分别作为司机和乘客,L表示乘车地点或下车地点,T表示对应地 点上下车时所处的时间段。对于沿途共乘,由于司机和乘客的出发地和目的地不同,因此乘客乘车前和下车后,仍需要花费额外的时间步行到乘车点和目的地,这类共乘相比端点共乘更具有复杂性。基于ULU元路径,该元路径中的L代表最佳相遇点。表示能够在同一时段到达同一相遇点的司机和乘客可以达到共乘。对于获取最佳相遇点的迪杰斯特拉算法,首先将包含对应司机和乘客轨迹的轨迹网络作为输入,通过该算法获取到二者OD在路网上的最短路径,将其作为最佳相遇点。For end-point ride sharing, the ULTLU meta-path is used, which means that passengers and drivers who arrive at the same place at the same time can ride together. Among them, the two U are the driver and the passenger respectively, L represents the boarding location or the getting off location, and T represents the time period when getting on and off the vehicle at the corresponding location. For shared rides along the way, since the departure place and destination of the driver and the passenger are different, the passengers still need to spend extra time walking to the boarding point and destination before and after getting off the bus. Multiplication is more complicated. Based on the ULU meta-path, the L in the meta-path represents the best meeting point. It means that the driver and passengers who can reach the same meeting point at the same time can reach a shared ride. For Dijkstra's algorithm to obtain the best meeting point, the trajectory network containing the corresponding driver and passenger trajectories is first used as input, and the shortest path between the two ODs on the road network is obtained through the algorithm, and it is taken as the best meeting point .
网络表示学习能够将网络中的节点表示成低维稠密的实值的向量形式,本发明通过输入基于对称元路径的节点序列集,将节点以one-hot编码方式编码为初始向量后再进行低维向量的转换。Network representation learning can represent the nodes in the network as low-dimensional dense real-valued vector forms. The present invention inputs the node sequence set based on the symmetric element path, encodes the nodes in a one-hot encoding method as the initial vector, and then performs low Conversion of dimensional vectors.
In the present invention, network representation learning has two stages. In the first stage, a meta-path guides random walks of nodes in the heterogeneous ride-sharing network to generate a set of fixed-length node sequences.
Take the ULTLU meta-path used for end-point ride sharing as an example. If the current node is of user (U) type, the next-hop node under the ULTLU meta-path is of type L, with the transition probability given by formula (1); the node following an L-type node is of type T. Each node sequence generated by the present invention has length 5.
The transition probability of the random walk
Figure PCTCN2019107011-appb-000058
is as follows:
Figure PCTCN2019107011-appb-000059
Figure PCTCN2019107011-appb-000060
In formula (1), a denotes the node type,
Figure PCTCN2019107011-appb-000061
is a node of type a on the path,
Figure PCTCN2019107011-appb-000062
denotes the number of neighbor nodes of node
Figure PCTCN2019107011-appb-000064
along the predefined meta-path
Figure PCTCN2019107011-appb-000063
ε denotes the set of links in the network, v_{i+1},
Figure PCTCN2019107011-appb-000065
indicates that the two nodes form a link in the network, and f_v(v_{i+1}) = a+1 means that node v_{i+1} is of type a+1. Therefore, under the guidance of the meta-path
Figure PCTCN2019107011-appb-000066
the random walk can continue only when the next node v_{i+1} is of type a+1. Formula (2) indicates that the meta-path used,
Figure PCTCN2019107011-appb-000067
is a symmetric meta-path.
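The meta-path-guided walk described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: it assumes the next node is drawn uniformly from the neighbors whose type matches the next type in the meta-path, which is how the text describes formula (1). The toy graph and the names `metapath_walk`, `adj`, and `node_type` are hypothetical.

```python
import random

def metapath_walk(adj, node_type, start, metapath, rng=random):
    """Follow the node types in `metapath` from `start`; return the node
    sequence, or None if no neighbor of the required type exists."""
    if node_type[start] != metapath[0]:
        raise ValueError("start node type must match the first meta-path type")
    walk = [start]
    for next_type in metapath[1:]:
        # Only neighbors of the next required type are eligible.
        candidates = [n for n in adj[walk[-1]] if node_type[n] == next_type]
        if not candidates:
            return None
        # Uniform choice: each eligible neighbor has probability 1/len(candidates).
        walk.append(rng.choice(candidates))
    return walk

# Hypothetical network: two users linked to one location, which is linked to one time slot.
adj = {
    "driver": ["loc_1"],
    "passenger": ["loc_1"],
    "loc_1": ["driver", "passenger", "slot_8am"],
    "slot_8am": ["loc_1"],
}
node_type = {"driver": "U", "passenger": "U", "loc_1": "L", "slot_8am": "T"}

walk = metapath_walk(adj, node_type, "driver", "ULTLU", rng=random.Random(0))
print(walk)
```

Each walk has length 5 and type pattern U, L, T, L, U, matching the sequence length stated in the text; the final user node may be either the driver or the passenger.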
In the second stage, the set of node sequences of length 5 is fed into the skip-gram model for training to obtain vector representations of the driver and passenger nodes.
Here the skip-gram model is used to build a feature vector for each node of user type U, with negative sampling used to optimize the representation vectors. For any user node v_u in the sequence set, suppose its position in a given sequence is j; the method selects the nodes v_{j-c}, ..., v_{j+c} of type t as its neighbor nodes, where c is half the window size set in skip-gram. Given a user node v_u, the goal of the skip-gram model is therefore to maximize the conditional probability of its context of heterogeneous neighbor nodes:
Figure PCTCN2019107011-appb-000068
N_a(v_u) is the set of neighbor nodes of node v_u, and
Figure PCTCN2019107011-appb-000069
is the set of node types. Assuming the nodes are mutually independent, log p(v_{j-c}, ..., v_{j+c} | v_u; θ) can be further decomposed into
Figure PCTCN2019107011-appb-000070
where p(v_k | v_u; θ) is the conditional probability of context node v_k given node v_u, defined by the softmax function:
Figure PCTCN2019107011-appb-000071
where
Figure PCTCN2019107011-appb-000072
denotes the representation vector of node v_u.
Besides generating vectors from the relations among context nodes, negative sampling is used to optimize the representation vectors. It removes the influence of irrelevant nodes on the target vector, so that vectors of different classes are more clearly separated. The likelihood function of this method is as follows:
Figure PCTCN2019107011-appb-000073
where
Figure PCTCN2019107011-appb-000074
Figure PCTCN2019107011-appb-000075
is the set of random negative node samples for v_u, drawn from the nodes other than v_{j-c}, ..., v_{j+c}. The negative sample set
Figure PCTCN2019107011-appb-000076
is sampled according to the noise distribution p(v′_u); that is, the
Figure PCTCN2019107011-appb-000077
term means that the random negative nodes are sampled in expectation according to the probability density function p(v′_u). Stochastic gradient descent (SGD) is then used to maximize the log-likelihood function:
Figure PCTCN2019107011-appb-000078
Figure PCTCN2019107011-appb-000079
where the function
Figure PCTCN2019107011-appb-000080
indicates whether v′_u is the context neighbor node v_k. In the present invention, the generated vectors have 128 dimensions, the skip-gram window size c is set to 2, and the number of negative samples is 5.
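The negative-sampling update can be sketched in plain Python. The formula images are not reproduced in this text, so the sketch assumes the common skip-gram negative-sampling objective, log σ(x_k · x_u) plus the sum of log σ(−x_m · x_u) over the negative samples m, with a single embedding table shared by center and context nodes. That form is an assumption and may differ in detail from formulas (5) to (7). The names `sgd_step` and `objective`, the learning rate, and the toy nodes are hypothetical; the demo uses 8 dimensions and 2 negatives instead of the 128 and 5 given in the text.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def objective(emb, center, context, negatives):
    """Assumed objective: log s(x_ctx.x_c) + sum over negatives of log s(-x_neg.x_c)."""
    x_c = emb[center]
    value = math.log(sigmoid(dot(emb[context], x_c)))
    for n in negatives:
        value += math.log(sigmoid(-dot(emb[n], x_c)))
    return value

def sgd_step(emb, center, context, negatives, lr=0.025):
    """One gradient-ascent step on the objective above (vectors updated in place).
    Assumes context and negatives are distinct nodes, all different from center."""
    x_c = emb[center]
    grad_center = [0.0] * len(x_c)
    # Label 1 marks the true context node, label 0 marks a negative sample.
    for node, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        x_n = emb[node]
        g = lr * (label - sigmoid(dot(x_n, x_c)))
        for d in range(len(x_c)):
            grad_center[d] += g * x_n[d]  # uses x_n[d] before it is updated
            x_n[d] += g * x_c[d]
    for d in range(len(x_c)):
        x_c[d] += grad_center[d]

rng = random.Random(0)
dim = 8  # the text uses 128 dimensions; 8 keeps the demo small
nodes = ["driver", "loc_1", "slot_8am", "passenger", "loc_2", "slot_9am"]
emb = {n: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for n in nodes}

negatives = ["loc_2", "slot_9am"]  # the text uses 5 negative samples
before = objective(emb, "driver", "loc_1", negatives)
for _ in range(50):
    sgd_step(emb, "driver", "loc_1", negatives)
after = objective(emb, "driver", "loc_1", negatives)
print(before, after)
```

Repeating the step raises the objective for the chosen (center, context) pair, which pulls the center vector toward its context node and away from the sampled negatives.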
Step 4: Using the low-dimensional vector representations of the user nodes, compute the similarity between the driver and each passenger with cosine similarity and sort the results in descending order. The k passengers with the highest similarity to the driver are taken as the passengers who can share the ride, and the ride share is formed. The value of k is determined by the maximum number of passengers the driver can carry.
The similarity between a driver and a passenger is computed with cosine similarity as follows:
For any pair of user nodes v_i, v_j, the cosine similarity Sim(v_i, v_j) of their representation vectors X_i, X_j is defined as follows:
Figure PCTCN2019107011-appb-000081
where X_i is the vector of node v_i and X_j is the vector of node v_j. When ‖X_i‖ = ‖X_j‖ = 1, ranking by cosine similarity is equivalent to ranking by Euclidean distance, since ‖X_i − X_j‖² = 2 − 2·Sim(v_i, v_j); after normalization, approximate nearest-neighbor search can therefore be used to efficiently locate the top k nodes (passengers) most similar to a given node v_i. Thus, given the previously learned user (driver) vector, potential passengers for a given driver are found by computing the cosine similarity between the representation vectors X_i and X_j. The k passengers with the greatest similarity are identified as ride-sharing participants; the candidates can then be ranked and the ride-sharing type observed.
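The matching step can be sketched in Python using the standard definition of cosine similarity, the dot product of two vectors divided by the product of their norms. The exhaustive scan below is a simplification for illustration; the approximate nearest-neighbor search mentioned in the text is not implemented. The vectors and the names `cosine_similarity` and `top_k_passengers` are hypothetical.

```python
import math

def cosine_similarity(x, y):
    """Dot product divided by the product of the Euclidean norms."""
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    if norm == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return sum(a * b for a, b in zip(x, y)) / norm

def top_k_passengers(driver_vec, passenger_vecs, k):
    """Rank passengers by similarity to the driver, highest first, and keep k."""
    scored = [(cosine_similarity(driver_vec, v), pid)
              for pid, v in passenger_vecs.items()]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [pid for _, pid in scored[:k]]

# Hypothetical 2-dimensional embeddings; the text uses 128 dimensions.
driver = [1.0, 0.0]
passengers = {
    "p1": [0.9, 0.1],
    "p2": [0.0, 1.0],
    "p3": [-1.0, 0.0],
    "p4": [0.5, 0.5],
}
matches = top_k_passengers(driver, passengers, k=2)
print(matches)  # ['p1', 'p4']
```

With k set to the number of free seats, the returned list is the set of passengers proposed for the shared ride.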
The present invention uses a trajectory data set of Didi users to construct a heterogeneous ride-sharing network model and classifies the different types of ride sharing. It defines two types of meta-paths, generates sets of meta-path node sequences in the ride-sharing network, trains skip-gram with negative sampling on them to produce user representation vectors, and finally uses cosine similarity to compute the similarity between users, predicting the passengers who can share a ride from the top-k similarities. The proposed ride-sharing recommendation method is more reliable than traditional methods that recommend by distance alone, is semantically intuitive, and can accurately discover potential ride-sharing users and provide them with faster, more convenient service.

Claims (8)

  1. A method for matching ride-sharing travellers based on network representation learning, characterized by comprising the following steps:
    Step 1: Ride-sharing classification
    With the driver's original route determined, ride sharing is divided into two types according to the relationship between the passenger's origin and destination and the driver's original route: the first type is end-point ride sharing, in which the passenger's origin and destination lie on the driver's original route; the other type is en-route ride sharing, in which neither the passenger's origin nor destination lies on the driver's original route, so that the passenger walks from the origin to the pick-up point, shares the ride, and then walks from the drop-off point to the destination, the shared route being part of the passenger's trajectory;
    Step 2: Constructing the heterogeneous ride-sharing network
    The request information of drivers and passengers is represented as a heterogeneous ride-sharing network in which passengers and drivers are connected through location and time information;
    Step 3: Performing representation learning on the heterogeneous ride-sharing network with a network representation learning model to obtain low-dimensional vector representations of the user nodes;
    Step 4: Computing the cosine similarity between the driver node and the passenger nodes from the low-dimensional vector representations of the user nodes, sorting the computed cosine similarity values in descending order, and returning the k passengers with the highest similarity to the driver as the passengers who can share the ride, thereby forming the ride share.
  2. The method for matching ride-sharing travellers based on network representation learning according to claim 1, characterized in that, in step 1, the request information of the driver and the passenger includes the driver's origin and destination, departure time, driver trajectory, pick-up and drop-off locations, the passenger's origin and destination, and the pick-up and drop-off times.
  3. The method for matching ride-sharing travellers based on network representation learning according to claim 1, characterized in that, in step 2, the node types in the heterogeneous ride-sharing network include user, location, time period, and activity.
  4. The method for matching ride-sharing travellers based on network representation learning according to claim 1, characterized in that, in step 3, the representation learning process comprises the following two steps:
    1) generating the node sequence set: a meta-path guides walks of nodes in the heterogeneous ride-sharing network to generate a set of fixed-length node sequences;
    2) feeding the generated set of fixed-length node sequences into the skip-gram model for training to obtain vector representations of the driver and passenger nodes.
  5. The method for matching ride-sharing travellers based on network representation learning according to claim 4, characterized in that, in step 1), for end-point ride sharing, a meta-path with structure ULTLU is constructed; for en-route ride sharing, a meta-path with structure ULU is constructed under the constraint that the time periods are the same.
  6. The method for matching ride-sharing travellers based on network representation learning according to claim 4, characterized in that, in step 3, the specific process of representation learning is as follows:
    first, given a specific meta-path
    Figure PCTCN2019107011-appb-100001
    the meta-path
    Figure PCTCN2019107011-appb-100002
    guides random walks of nodes in the heterogeneous ride-sharing network to generate a set of fixed-length node sequences; second, for any user node v_u in the set of fixed-length node sequences, suppose the position index of the node in the sequence is j; the method then selects the nodes v_{j-c}, …, v_{j+c} as neighbor nodes, where c is half the window size in skip-gram; thus, given a user node v_u, the goal of the skip-gram model is to maximize the conditional probability of the context of heterogeneous neighbor nodes:
    Figure PCTCN2019107011-appb-100003
    where N_a(v_u) is the set of neighbor nodes of node v_u,
    Figure PCTCN2019107011-appb-100004
    is the set of node types, and p(v_{j-c}, …, v_{j+c} | v_u; θ) is the conditional probability of the context given the center node;
    assuming the nodes are mutually independent, the log conditional probability of the context given the center node, log p(v_{j-c}, …, v_{j+c} | v_u; θ), is further decomposed into
    Figure PCTCN2019107011-appb-100005
    where p(v_k | v_u; θ) is the conditional probability of context node v_k given node v_u, defined by the softmax function:
    Figure PCTCN2019107011-appb-100006
    where
    Figure PCTCN2019107011-appb-100007
    denotes the representation vector of node v_u;
    representation vectors are generated from the conditional probability of context node v_k given node v_u, and negative sampling is then used to optimize the representation vectors, yielding the low-dimensional vector representation of each user node in the heterogeneous ride-sharing network.
  7. The method for matching ride-sharing travellers based on network representation learning according to claim 6, characterized in that the transition probability of the random walk
    Figure PCTCN2019107011-appb-100008
    is as follows:
    Figure PCTCN2019107011-appb-100009
    Figure PCTCN2019107011-appb-100010
    in formula (1), a denotes the node type,
    Figure PCTCN2019107011-appb-100011
    is a node of type a on the path,
    Figure PCTCN2019107011-appb-100012
    denotes the number of neighbor nodes of node
    Figure PCTCN2019107011-appb-100014
    along the predefined meta-path
    Figure PCTCN2019107011-appb-100013
    ε denotes the set of links in the network, v_{i+1},
    Figure PCTCN2019107011-appb-100015
    indicates that the two nodes form a link in the network, and f_v(v_{i+1}) = a+1 means that node v_{i+1} is a node of type a+1;
    formula (2) indicates that the selected meta-paths are all symmetric meta-paths.
  8. The method for matching ride-sharing travellers based on network representation learning according to claim 6, characterized in that the specific process of using negative sampling to optimize the representation vectors is as follows:
    Figure PCTCN2019107011-appb-100016
    where
    Figure PCTCN2019107011-appb-100017
    is the set of random negative node samples for v_u, and the negative sample set
    Figure PCTCN2019107011-appb-100018
    is sampled according to the noise distribution p(v′_u),
    Figure PCTCN2019107011-appb-100019
    stochastic gradient descent is then used to maximize the log-likelihood function
    Figure PCTCN2019107011-appb-100020
    so as to update the vector representations of the nodes in formula (5), as shown in formulas (6) and (7), yielding the low-dimensional vector representation of each user node in the heterogeneous ride-sharing network;
    Figure PCTCN2019107011-appb-100021
    Figure PCTCN2019107011-appb-100022
    where the function
    Figure PCTCN2019107011-appb-100023
    indicates whether v′_u is the context neighbor node v_k.
PCT/CN2019/107011 2019-04-02 2019-09-20 Method for matching ride-sharing travellers based on network representation learning WO2020199524A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910262393.6A CN110009455B (en) 2019-04-02 2019-04-02 Network contract sharing trip personnel matching method based on network representation learning
CN201910262393.6 2019-04-02

Publications (1)

Publication Number Publication Date
WO2020199524A1 true WO2020199524A1 (en) 2020-10-08

Family

ID=67169532

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/107011 WO2020199524A1 (en) 2019-04-02 2019-09-20 Method for matching ride-sharing travellers based on network representation learning

Country Status (2)

Country Link
CN (1) CN110009455B (en)
WO (1) WO2020199524A1 (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112311608A (en) * 2020-11-25 2021-02-02 中国人民解放军66136部队 Multilayer heterogeneous network space node characterization method
CN112580945A (en) * 2020-12-08 2021-03-30 吉首大学 Dispatching method based on multiple correlation coefficients and vehicle dispatching optimization system
CN112667877A (en) * 2020-12-25 2021-04-16 陕西师范大学 Scenic spot recommendation method and equipment based on tourist knowledge map
CN113239266A (en) * 2021-04-07 2021-08-10 中国人民解放军战略支援部队信息工程大学 Personalized recommendation method and system based on local matrix decomposition
CN113326884A (en) * 2021-06-11 2021-08-31 之江实验室 Efficient learning method and device for large-scale abnormal graph node representation
CN113626654A (en) * 2021-07-16 2021-11-09 苏州大学 Batch shortest path query method based on representation learning
CN113642625A (en) * 2021-08-06 2021-11-12 北京交通大学 Method and system for deducing individual trip purpose of urban rail transit passenger
CN113642796A (en) * 2021-08-18 2021-11-12 北京航空航天大学 Dynamic sharing electric automatic driving vehicle path planning method based on historical data
CN113902200A (en) * 2021-10-14 2022-01-07 中国平安财产保险股份有限公司 Path matching method, device and equipment and computer readable storage medium
CN113919529A (en) * 2021-09-28 2022-01-11 东南大学 Environmental impact evaluation method for online taxi appointment travel
CN113947245A (en) * 2021-10-20 2022-01-18 辽宁工程技术大学 Multi-passenger multi-driver sharing matching method and system based on order accumulation
CN114124729A (en) * 2021-11-23 2022-03-01 重庆邮电大学 Dynamic heterogeneous network representation method based on meta-path
CN114461934A (en) * 2021-12-31 2022-05-10 北京工业大学 Multi-modal travel mode fusion recommendation method based on dynamic traffic network
CN114547408A (en) * 2022-01-18 2022-05-27 北京工业大学 Similar student searching method based on fine-grained student space-time behavior heterogeneous network representation
CN114595480A (en) * 2022-03-04 2022-06-07 中国科学技术大学 Real-time passenger and driver matching method with personalized location privacy protection
CN115146956A (en) * 2022-06-30 2022-10-04 东南大学 Internet appointment vehicle sharing trip man-vehicle matching method
CN115495678A (en) * 2022-11-21 2022-12-20 中南大学 Co-multiplication matching method, system and equipment based on sparse cellular signaling data
CN116108679A (en) * 2023-02-15 2023-05-12 北京工业大学 Method and system for allocating traffic flow of simultaneous traveling
CN117252318A (en) * 2023-09-26 2023-12-19 武汉理工大学 Intelligent networking automobile group machine collaborative carpooling scheduling method and system

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110009455B (en) * 2019-04-02 2022-02-15 长安大学 Network contract sharing trip personnel matching method based on network representation learning
CN113159357B (en) * 2020-01-07 2023-11-24 北京嘀嘀无限科技发展有限公司 Data processing method, device, electronic equipment and computer readable storage medium
CN112964266B (en) * 2021-02-04 2022-08-19 西北大学 Network contract service single-path-splicing planning method and storage medium
CN113361916A (en) * 2021-06-04 2021-09-07 付鑫 Multi-mode sharing travel fusion scheduling optimization system considering single-cut scene
CN113345590B (en) * 2021-06-29 2022-12-16 安徽大学 User mental health monitoring method and system based on heterogeneous graph
CN113656746B (en) * 2021-07-21 2022-06-17 东南大学 Travel mode chain selection method considering group heterogeneity under dynamic structure
CN113888138B (en) * 2021-10-27 2024-05-14 重庆邮电大学 Project management method based on blockchain and network representation learning recommendation
CN117119387B (en) * 2023-10-25 2024-01-23 北京市智慧交通发展中心(北京市机动车调控管理事务中心) Method and device for constructing user travel chain based on mobile phone signaling data

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009007965A2 (en) * 2007-07-09 2009-01-15 Technion Research & Development Foundation Ltd Routing methods for multiple geographical entities
CN105631531A (en) * 2015-11-26 2016-06-01 东莞酷派软件技术有限公司 Driving friend recommendation method, driving friend recommendation device and server
CN109544900A (en) * 2018-11-21 2019-03-29 长安大学 A kind of route matching method that the privacy multiplying trip altogether towards passenger and driver retains
CN110009455A (en) * 2019-04-02 2019-07-12 长安大学 It is a kind of based on network representation study net about share out administrative staff's matching process

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106503022B (en) * 2015-09-08 2020-12-01 北京邮电大学 Method and device for pushing recommendation information
CN106447387A (en) * 2016-08-31 2017-02-22 上海交通大学 Air ticket personalized recommendation method based on shared account passenger prediction
CN108256590B (en) * 2018-02-23 2019-04-02 长安大学 A kind of similar traveler recognition methods based on compound first path

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHEN, JUN: "Travel Behavior Analysis and Similarity Measurement Based on Heterogeneous Information Network", CHINA MASTER'S THESES FULL-TEXT DATABASE, ENGINEERING SCIENCE & TECHNOLOGY II, 15 January 2019 (2019-01-15), ISSN: 1674-0246, DOI: 20191220140750Y *

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112311608A (en) * 2020-11-25 2021-02-02 中国人民解放军66136部队 Multilayer heterogeneous network space node characterization method
CN112580945A (en) * 2020-12-08 2021-03-30 吉首大学 Dispatching method based on multiple correlation coefficients and vehicle dispatching optimization system
CN112667877A (en) * 2020-12-25 2021-04-16 陕西师范大学 Scenic spot recommendation method and equipment based on tourist knowledge map
CN113239266A (en) * 2021-04-07 2021-08-10 中国人民解放军战略支援部队信息工程大学 Personalized recommendation method and system based on local matrix decomposition
CN113326884A (en) * 2021-06-11 2021-08-31 之江实验室 Efficient learning method and device for large-scale abnormal graph node representation
CN113326884B (en) * 2021-06-11 2023-06-16 之江实验室 Efficient learning method and device for large-scale heterograph node representation
CN113626654A (en) * 2021-07-16 2021-11-09 苏州大学 Batch shortest path query method based on representation learning
CN113626654B (en) * 2021-07-16 2023-09-15 苏州大学 Batch shortest path query method based on representation learning
CN113642625A (en) * 2021-08-06 2021-11-12 北京交通大学 Method and system for deducing individual trip purpose of urban rail transit passenger
CN113642625B (en) * 2021-08-06 2024-02-02 北京交通大学 Method and system for deducing individual travel purposes of urban rail transit passengers
CN113642796A (en) * 2021-08-18 2021-11-12 北京航空航天大学 Dynamic sharing electric automatic driving vehicle path planning method based on historical data
CN113642796B (en) * 2021-08-18 2024-06-04 北京航空航天大学 Dynamic sharing electric automatic driving vehicle path planning method based on historical data
CN113919529A (en) * 2021-09-28 2022-01-11 东南大学 Environmental impact evaluation method for online taxi appointment travel
CN113902200A (en) * 2021-10-14 2022-01-07 中国平安财产保险股份有限公司 Path matching method, device and equipment and computer readable storage medium
CN113947245A (en) * 2021-10-20 2022-01-18 辽宁工程技术大学 Multi-passenger multi-driver sharing matching method and system based on order accumulation
CN114124729A (en) * 2021-11-23 2022-03-01 重庆邮电大学 Dynamic heterogeneous network representation method based on meta-path
CN114461934A (en) * 2021-12-31 2022-05-10 北京工业大学 Multi-modal travel mode fusion recommendation method based on dynamic traffic network
CN114547408B (en) * 2022-01-18 2024-04-02 北京工业大学 Similar student searching method based on fine-grained student space-time behavior heterogeneous network characterization
CN114547408A (en) * 2022-01-18 2022-05-27 北京工业大学 Similar student searching method based on fine-grained student space-time behavior heterogeneous network representation
CN114595480A (en) * 2022-03-04 2022-06-07 中国科学技术大学 Real-time passenger and driver matching method with personalized location privacy protection
CN114595480B (en) * 2022-03-04 2024-04-02 中国科学技术大学 Real-time passenger and driver matching method with personalized location privacy protection
CN115146956A (en) * 2022-06-30 2022-10-04 东南大学 Internet appointment vehicle sharing trip man-vehicle matching method
CN115495678B (en) * 2022-11-21 2023-04-07 中南大学 Co-multiplication matching method, system and equipment based on sparse cellular signaling data
CN115495678A (en) * 2022-11-21 2022-12-20 中南大学 Co-multiplication matching method, system and equipment based on sparse cellular signaling data
CN116108679A (en) * 2023-02-15 2023-05-12 北京工业大学 Method and system for allocating traffic flow of simultaneous traveling
CN117252318A (en) * 2023-09-26 2023-12-19 武汉理工大学 Intelligent networking automobile group machine collaborative carpooling scheduling method and system
CN117252318B (en) * 2023-09-26 2024-04-09 武汉理工大学 Intelligent networking automobile group machine collaborative carpooling scheduling method and system

Also Published As

Publication number Publication date
CN110009455A (en) 2019-07-12
CN110009455B (en) 2022-02-15

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19923453

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19923453

Country of ref document: EP

Kind code of ref document: A1