US20210398431A1 - System and method for ride order dispatching - Google Patents

System and method for ride order dispatching

Info

Publication number
US20210398431A1
Authority
US
United States
Prior art keywords
location
current
vehicle
available
orders
Prior art date
Legal status
Abandoned
Application number
US17/460,608
Inventor
Zhiwei Qin
Fei Feng
Current Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Original Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Didi Infinity Technology and Development Co Ltd
Priority to US 17/460,608
Assigned to BEIJING DIDI INFINITY TECHNOLOGY AND DEVELOPMENT CO., LTD. (assignor: DIDI (HK) SCIENCE AND TECHNOLOGY LIMITED)
Assigned to DIDI (HK) SCIENCE AND TECHNOLOGY LIMITED (assignor: DIDI RESEARCH AMERICA, LLC)
Assigned to DIDI RESEARCH AMERICA, LLC (assignors: FENG, Fei; QIN, Zhiwei)
Publication of US20210398431A1

Classifications

    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/20 Monitoring the location of vehicles belonging to a group, e.g. fleet of vehicles, countable or determined number of vehicles
    • G08G1/202 Dispatching vehicles on the basis of a location, e.g. taxi dispatching
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/26 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
    • G01C21/34 Route searching; Route guidance
    • G01C21/3407 Route searching; Route guidance specially adapted for specific applications
    • G01C21/3438 Rendez-vous, i.e. searching a destination where several users can meet, and the routes to this destination for these users; Ride sharing, i.e. searching a route such that at least two users can share a vehicle for at least part of the route
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N7/005
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/02 Reservations, e.g. for tickets, services or events
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q10/047 Optimisation of routes or paths, e.g. travelling salesman problem
    • G06Q50/30
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/40 Business processes related to the transportation industry

Definitions

  • This disclosure generally relates to methods and devices for ride order dispatching.
  • a vehicle dispatch platform can automatically allocate transportation requests to corresponding vehicles for providing transportation services and reward the drivers.
  • Various embodiments of the present disclosure include systems, methods, and non-transitory computer readable media for ride order dispatching.
  • a computer-implemented method for ride order dispatching comprises: obtaining a current location of a current vehicle from a computing device associated with the current vehicle, obtaining a current list of available orders nearby based on the current location, feeding the current location, the current list of available orders nearby, and a current time to a trained Markov Decision Process (MDP) model to obtain action information, the action information being repositioning the current vehicle to another current location or completing a current ride order by the current vehicle, and transmitting the generated action information to the computing device to cause the current vehicle to reposition to the another current location, stay at the current location, or accept the current ride order by proceeding to a pick-up location of the current ride order.
  • the MDP model is trained based on a plurality of historical or simulated vehicle trips under a policy of maximizing a cumulative reward for a training vehicle completing the historical or simulated vehicle trips.
  • the MDP model discretizes a region into repeating zones and a time period into time slots.
  • Each state of the MDP model comprises: a time represented by a time slot index, a location represented by a repeating zone index, and a list of available orders nearby represented by repeating zone indices of destinations of the available orders nearby.
  • Each action of the MDP model comprises: completing one of the available orders nearby from the list, repositioning to another location, or staying at the location.
  • If the training vehicle completes one of the available orders nearby from the list, the training vehicle gets a fare for completing the one available order nearby as the reward, and the state transitions to a next state comprising: a next time corresponding to completion of the one available order nearby, a next location corresponding to a destination of the one available order nearby, and a next list of available orders nearby corresponding to the next location. If the training vehicle repositions to the another location, the training vehicle gets no reward, and the state transitions to a next state comprising: a next time corresponding to reaching the another location, the another location, and a next list of available orders nearby corresponding to the another location. If the training vehicle stays at the location, the training vehicle gets no reward, and the state transitions to a next state comprising: the time, the location, and the list of available orders nearby.
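  • A minimal Python sketch of the transition and reward logic just described is shown below; the State fields and the helper functions fare_of, cruise_time, and orders_in are hypothetical placeholders, not the disclosure's implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    time_slot: int        # time, as a time slot index
    zone: int             # location, as a repeating zone index
    orders_nearby: tuple  # repeating zone indices of destinations of available orders nearby

def step(state, action, fare_of, cruise_time, orders_in):
    """One transition of the training vehicle.

    action is ("complete", order_index), ("reposition", target_zone), or ("stay",).
    fare_of(t, origin, dest), cruise_time(t, origin, dest), and orders_in(t, zone)
    are hypothetical helpers standing in for fare, travel-time, and order lookups.
    """
    if action[0] == "complete":
        dest = state.orders_nearby[action[1]]
        reward = fare_of(state.time_slot, state.zone, dest)       # the fare is the reward
        t_next = state.time_slot + cruise_time(state.time_slot, state.zone, dest)
        return State(t_next, dest, tuple(orders_in(t_next, dest))), reward
    if action[0] == "reposition":
        target = action[1]
        t_next = state.time_slot + cruise_time(state.time_slot, state.zone, target)
        return State(t_next, target, tuple(orders_in(t_next, target))), 0.0  # no reward
    return state, 0.0                                              # stay: same state, no reward
```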
  • the location and pick-up locations of the available orders nearby are in a same repeating zone.
  • the current location and a pick-up location of the current ride order are within a same repeating zone.
  • all orders with pick-up locations in a repeating zone corresponding to the location are divided averagely and randomly among all vehicles in the repeating zone corresponding to the location to obtain the list of available orders nearby for the training vehicle.
  • all current orders with pick-up locations in a repeating zone corresponding to the current location are divided averagely and randomly among all current vehicles in the repeating zone corresponding to the current location to obtain the current list of available orders nearby for the current vehicle.
  • the list of available orders nearby for the training vehicle is a ceiling function of a division of the all orders with pick-up locations in the repeating zone corresponding to the location by the all vehicles in the repeating zone corresponding to the location.
  • the current list of available orders nearby for the current vehicle is a ceiling function of a division of the all current orders with pick-up locations in the repeating zone corresponding to the current location by the all current vehicles in the repeating zone corresponding to the current location.
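  • As a minimal numeric illustration of this ceiling-function split (the function name below is hypothetical):

```python
import math

def orders_per_vehicle(num_orders_in_zone: int, num_vehicles_in_zone: int) -> int:
    # e.g., 14 orders split among 3 vehicles -> ceil(14 / 3) = 5 available orders nearby
    return math.ceil(num_orders_in_zone / num_vehicles_in_zone)

assert orders_per_vehicle(14, 3) == 5
```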
  • a computer-implemented method for ride order dispatching comprises: obtaining a current location of a current vehicle from a computing device associated with the current vehicle, obtaining a current number of available orders nearby in a current list based on the current location, feeding the current location, the current number of available orders nearby, and a current time to a solved Markov Decision Process (MDP) model to obtain action information, the action information being repositioning the current vehicle to another current location or completing a current ride order by the current vehicle, and transmitting the generated action information to the computing device to cause the current vehicle to reposition to the another current location, stay at the current location, or accept the current ride order by proceeding to a pick-up location of the current ride order.
  • the MDP model is solved based on a plurality of historical or simulated vehicle trips under a policy of maximizing a cumulative reward for a hypothetical vehicle completing the historical or simulated vehicle trips.
  • the MDP model discretizes a region into repeating zones and a time period into time slots.
  • Each state of the MDP model comprises: a time represented by a time slot index, a location represented by a repeating zone index, and a number of available orders nearby in a list, the available orders nearby represented by repeating zone indices of destinations of the available orders nearby.
  • Each action of the MDP model comprises: completing one of the available orders nearby from the list, repositioning to another location, or staying at the location.
  • If the hypothetical vehicle completes one of the available orders nearby from the list, the hypothetical vehicle gets a fare for completing the one available order nearby as the reward, and the state transitions to a next state comprising: a next time corresponding to completion of the one available order nearby, a next location corresponding to a destination of the one available order nearby, and a next number of available orders nearby in a next list corresponding to the next location.
  • If the hypothetical vehicle repositions to the another location, the hypothetical vehicle gets no reward, and the state transitions to a next state comprising: a next time corresponding to reaching the another location, the another location, and a next number of available orders nearby in a next list corresponding to the another location.
  • If the hypothetical vehicle stays at the location, the hypothetical vehicle gets no reward, and the state transitions to a next state comprising: the time, the location, and the number of available orders nearby in the list.
  • the location and pick-up locations of the available orders nearby are in a same repeating zone.
  • the current location and a pick-up location of the current ride order are within a same repeating zone.
  • all orders with pick-up locations in a repeating zone corresponding to the location are divided averagely and randomly among all vehicles in the repeating zone corresponding to the location to obtain the number of available orders nearby for the hypothetical vehicle.
  • all current orders with pick-up locations in a repeating zone corresponding to the current location are divided averagely and randomly among all current vehicles in the repeating zone corresponding to the current location to obtain the current number of available orders nearby for the current vehicle.
  • the number of available orders nearby for the hypothetical vehicle is a ceiling function of a division of the all orders with pick-up locations in the repeating zone corresponding to the location by the all vehicles in the repeating zone corresponding to the location.
  • the current number of available orders nearby for the current vehicle is a ceiling function of a division of the all current orders with pick-up locations in the repeating zone corresponding to the current location by the all current vehicles in the repeating zone corresponding to the current location.
  • solving the MDP model comprises solving the MDP model based on applying acceleration and variance reduction algorithms to tabular implementation.
  • solving the MDP model based on the plurality of historical or simulated vehicle trips comprises: obtaining data for each of the historical vehicle trips, the data comprising: a historical pick-up time, a historical pick-up location, a historical drop-off time, and a historical drop-off location, training a random forest classifier with the historical pick-up time, the historical pick-up location, and the historical drop-off location as training data and with the historical drop-off time minus the historical pick-up time as the label to build a cruise time estimator, the cruise time estimator estimating a time to reach a destination based on the time, the location, and the destination of the one available order nearby, or based on the time, the location, and the another location, and applying the cruise time estimator in each state transition to determine the next time corresponding to completion of the one available order nearby or to determine the next time corresponding to reaching the another location.
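  • A hedged sketch of such a cruise time estimator using scikit-learn is shown below; the trip tuple layout and function names are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def build_cruise_time_estimator(trips):
    """trips: iterable of (pickup_slot, pickup_zone, dropoff_slot, dropoff_zone).

    Features are (pick-up time, pick-up location, drop-off location); the label is the
    drop-off time minus the pick-up time, in time slots, as described in the disclosure.
    """
    X = np.array([[t0, z0, z1] for (t0, z0, t1, z1) in trips])
    y = np.array([t1 - t0 for (t0, z0, t1, z1) in trips])
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X, y)

    def estimate(time_slot, origin_zone, dest_zone):
        # Predicted number of time slots needed to reach the destination.
        return int(clf.predict([[time_slot, origin_zone, dest_zone]])[0])

    return estimate
```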
  • a system for ride order dispatching may comprise a processor and a non-transitory computer-readable storage medium storing instructions that, when executed by the processor, cause the system to perform a method for ride order dispatching, which may be any method described herein.
  • FIG. 1 illustrates an exemplary environment for ride order dispatching, in accordance with various embodiments.
  • FIG. 2 illustrates an exemplary system for ride order dispatching, in accordance with various embodiments.
  • FIG. 3A illustrates an exemplary action search for ride order dispatching for model 1, in accordance with various embodiments.
  • FIG. 3B illustrates an exemplary action search for ride order dispatching for model 2, in accordance with various embodiments.
  • FIG. 4A illustrates a flowchart of an exemplary method for ride order dispatching, in accordance with various embodiments.
  • FIG. 4B illustrates a flowchart of another exemplary method for ride order dispatching, in accordance with various embodiments.
  • FIG. 5 illustrates a block diagram of an exemplary computer system in which any of the embodiments described herein may be implemented.
  • Vehicle platforms may be provided for transportation services.
  • vehicle platform may also be referred to as a vehicle hailing or vehicle dispatching platform, accessible through devices such as mobile phones installed with a platform application.
  • users can transmit transportation requests (e.g., a pick-up location, a destination, a current location of the user) to the vehicle platform.
  • the vehicle platform may relay the requests to vehicle drivers based on various factors (e.g., proximity to the location of the requestor or the pick-up location).
  • the vehicle drivers can choose from the requests, and each can pick one to accept, fulfill the request, and be rewarded accordingly.
  • the vehicle driver may search for more requests or receive more requests from a push-based dispatching platform, and the results may vary depending on the demand for the vehicle service. For example, the results may return many requests if the vehicle is at a bar area on a weekend night, or may return no request if the vehicle is in a far-flung area on a weekday evening.
  • the disclosed models may consider competition and learn a smart strategy that can instruct the target driver how to cruise so as to maximize his/her long-term revenue.
  • This strategy can be referred to as an instruction system.
  • the first model (model 1) reflects the reality in a very detailed and accurate way, and the strategy trained from it is also closer to real life and can be deployed directly. However, the model's size may be large, which makes applying a tabular implementation impossible. Therefore, model 1 may be solved by an algorithm involving a neural network to find a strategy.
  • the second model (model 2) is a simplified version of the first model and has a much smaller size. As a trade-off, the second model is less accurate than the first model but makes tabular implementation possible.
  • the second model may be solved with a novel stochastic formulation. The advantage of such a formulation is that it allows optimization algorithms to be applied to the model. With acceleration and variance reduction, a solution to the second model can be found with better quality in a shorter time.
  • the disclosed systems and methods utilize the two disclosed models to obtain an optimal strategy that surpasses human decisions (decisions made by drivers of whether to reposition or take an order) and other models in terms of reward maximization. Therefore, the disclosed systems and methods improve the computer functionality by (1) providing such strategy to drivers in real time, allowing the drivers to maximize their earnings without reliance on personal experiences, and (2) allowing the vehicle platform to automatically dispatch vehicles, enhancing the functionality and user experience of the software platform.
  • the disclosed system may deploy the first model in all types of areas, and may deploy the second model in rural areas where the model size is reduced with less data of the states.
  • FIG. 1 illustrates an exemplary environment 100 for dispatching ride orders, in accordance with various embodiments.
  • the exemplary environment 100 can comprise at least one computing system 102 that includes one or more processors 104 and memory 106 .
  • the memory 106 may be non-transitory and computer-readable.
  • the memory 106 may store instructions that, when executed by the one or more processors 104 , cause the one or more processors 104 to perform various operations described herein.
  • the system 102 may be implemented on or as devices such as mobile phone, tablet, server, computer, wearable device, etc.
  • the system 102 above may be installed with software (e.g., a platform program) and/or hardware (e.g., wires, wireless connections) to access other devices of the environment 100.
  • the environment 100 may include one or more data stores (e.g., a data store 108 ) and one or more computing devices (e.g., a computing device 109 ) that are accessible to the system 102 .
  • the system 102 may be configured to obtain data (e.g., first training data and second training data such as location, time, and fees for historical vehicle transportation trips) from the data store 108 (e.g., a database or dataset of historical transportation trips) and/or the computing device 109 (e.g., a computer, a server, a mobile phone used by a driver or passenger that captures transportation trip information such as time, location, and fees).
  • the system 102 may use the obtained data to train the algorithm for ride order dispatching.
  • the location may comprise GPS (Global Positioning System) coordinates of a vehicle.
  • the environment 100 may further include one or more computing devices (e.g., computing devices 110 and 111 ) coupled to the system 102 .
  • the computing devices 110 and 111 may comprise cellphone, tablet, computer, wearable device, etc.
  • the computing devices 110 and 111 may transmit or receive data to or from the system 102 .
  • the system 102 may implement an online information or service platform.
  • the service may be associated with vehicles (e.g., cars, bikes, boats, airplanes, etc.), and the platform may be referred to as a vehicle (service hailing or ride order dispatching) platform.
  • the platform may accept requests for transportation, identify vehicles to fulfill the requests, arrange for pick-ups, and process transactions.
  • a user may use the computing device 110 (e.g., a mobile phone installed with a software application associated with the platform) to request transportation from the platform.
  • the system 102 may receive the request and relay it to various vehicle drivers (e.g., by posting the request to mobile phones carried by the drivers).
  • a vehicle driver may use the computing device 111 (e.g., another mobile phone installed with the application associated with the platform) to accept the posted transportation request and obtain pick-up location information.
  • Fees (e.g., transportation fees) may be transacted through the platform.
  • Some platform data may be stored in the memory 106 or retrievable from the data store 108 and/or the computing devices 109 , 110 , and 111 . For example, for each trip, the location of the origin and destination (e.g., transmitted by the computing device 111 ), the fee, and the time can be obtained by the system 102 .
  • the system 102 and the one or more of the computing devices may be integrated in a single device or system.
  • the system 102 and the one or more computing devices may operate as separate devices.
  • the data store(s) may be anywhere accessible to the system 102 , for example, in the memory 106 , in the computing device 109 , in another device (e.g., network storage device) coupled to the system 102 , or another storage location (e.g., cloud-based storage system, network file system, etc.), etc.
  • the system 102 and the computing device 109 are shown as single components in this figure, it is appreciated that the system 102 and the computing device 109 can be implemented as single devices or multiple devices coupled together.
  • the system 102 may be implemented as a single system or multiple systems coupled to each other.
  • the system 102 , computing device 109 , data store 108 , and computing device 110 and 111 may be able to communicate with one another through one or more wired or wireless networks (e.g., the Internet) through which data can be communicated.
  • FIG. 2 illustrates an exemplary system 200 for dispatching ride orders, in accordance with various embodiments.
  • the system 102 may obtain data 202 (e.g., historical or simulated vehicle trips) from the data store 108 and/or the computing device 109 .
  • the obtained data 202 may be stored in the memory 106 .
  • the system 102 may train an algorithm with the obtained data 202 to learn a model (e.g., model 1) for dispatching ride orders, or solve a model (e.g., model 2) for dispatching ride orders.
  • the system 102 may obtain a current location of a current vehicle.
  • the computing device 111 may transmit a query 204 to the system 102 , the query comprising a Global Positioning System (GPS) location of the current vehicle.
  • the computing device 111 may be associated with a driver of a service vehicle including, for example, taxi, service-hailing vehicle, etc.
  • the system 102 may perform various steps with the information contained in the query, apply the model, and send data 207 to the computing device 111 or one or more other devices.
  • the system 102 may obtain a current list of available orders nearby based on the current location, and apply any disclosed model with inputs such as the current location, the current time, and the available orders nearby.
  • the data 207 may comprise an instruction or recommendation for an action, such as re-positioning to another location, accepting a new ride order, etc.
  • the driver may accept the instruction to accept an order or reposition, or refuse and remain at the same position.
  • the ride order dispatching problem can be approached by reinforcement learning (RL).
  • RL is a type of machine learning which emphasizes learning through interacting with the environment.
  • an RL agent repeatedly observes the current state of the environment, takes an action according to a certain policy, gets an immediate reward, and transits to a next state. Through trial and error, the agent aims to learn the best policy under which the expected cumulative reward can be maximized.
  • the RL for ride order dispatching may be modeled as a Markov decision process (MDP).
  • an infinite-horizon discounted Markov Decision Problem (DMDP) may be used.
  • A DMDP instance may be specified by a finite state space S, a finite action space A, a collection of transition probabilities P = {p_ij(a) | i, j ∈ S, a ∈ A}, which is unknown, a collection of state-transitional rewards R = {r_ij(a) | i, j ∈ S, a ∈ A}, and a discount factor γ ∈ (0, 1).
  • the objective of the DMDP is to maximize the expected cumulative discounted reward regardless of the initial state:
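  • Written in the notation above, the standard infinite-horizon discounted objective takes the form below (a reconstruction consistent with the surrounding definitions):

$$\max_{\pi}\ \mathbb{E}\!\left[\sum_{k=0}^{\infty} \gamma^{k}\, r_{i_k i_{k+1}}(a_k)\right], \qquad a_k \sim \pi(\cdot \mid i_k),\ \ i_{k+1} \sim p_{i_k\,\cdot}(a_k).$$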
  • the agent is the system.
  • the system observes the state of the target driver, then provides an instruction for him/her to cruise.
  • If the instruction is an order request, the driver will finish the order and end up at the drop-off location.
  • If the instruction is a re-positioning suggestion (without an order), the driver can accept or decline it.
  • the MDP model may be defined as follows.
  • the activity area of the target driver may be discretized into a plurality of repeating zones (e.g., a group of hex-cells as shown in FIG. 3A ) and twenty-four hours into a list of time slots, e.g., ten minutes per slot.
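  • A minimal sketch of this discretization, assuming ten-minute slots and a hypothetical square-grid cell index standing in for the hex-cell scheme:

```python
SLOT_MINUTES = 10  # e.g., ten minutes per slot -> 144 slots per day

def time_slot(hour: int, minute: int) -> int:
    return (hour * 60 + minute) // SLOT_MINUTES

def zone_index(lat: float, lng: float, cell_deg: float = 0.01) -> int:
    # Hypothetical square-grid stand-in for the hex-cell index used in the disclosure.
    row = int(lat / cell_deg)
    col = int(lng / cell_deg)
    return row * 100000 + col

print(time_slot(18, 25))   # -> 110
print(zone_index(30.6586, 104.0647))
```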
  • each state may contain the following information: the current time slot t (a time slot index), the driver's current location h (a hex-cell index), and the set of available orders nearby S_aon (represented by the hex-cell indices of the orders' destinations).
  • The next state and the reward depend on the scenario.
  • the next state s′ and the reward r are then:
  • is an r.v. uniformly distributed in [0, 1] and S′ aon is the set of available orders nearby for the time t+T(t, h, h′) and the hex cell h′.
  • the MDP of model 1 can be trained online with real-time data (e.g., historical trip data) or through a simulator (e.g., simulated trip data) with the inputs being a state and an action and the outputs being a next state and a reward.
  • a neural network such as a DQN (deep Q-network) can be applied, and model 1 can be solved by data training and machine learning.
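  • A compact PyTorch sketch of a DQN-style temporal-difference update that could fit model 1 from (state, action, reward, next state) transitions; the network size, state encoding, and hyperparameters are assumptions, and a separate target network is omitted for brevity.

```python
import torch
import torch.nn as nn

GAMMA = 0.9
N_ZONES, MAX_ORDERS = 200, 10          # hypothetical sizes
STATE_DIM = 2 + MAX_ORDERS             # (time slot, zone, padded destination zones)
N_ACTIONS = MAX_ORDERS + N_ZONES + 1   # complete order i, reposition to zone j, or stay

q_net = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(),
                      nn.Linear(128, 128), nn.ReLU(),
                      nn.Linear(128, N_ACTIONS))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def td_update(batch):
    """batch: list of (state_vec, action_idx, reward, next_state_vec) transitions."""
    s = torch.tensor([b[0] for b in batch], dtype=torch.float32)
    a = torch.tensor([b[1] for b in batch], dtype=torch.int64).unsqueeze(1)
    r = torch.tensor([b[2] for b in batch], dtype=torch.float32)
    s2 = torch.tensor([b[3] for b in batch], dtype=torch.float32)
    q = q_net(s).gather(1, a).squeeze(1)                  # Q(s, a)
    with torch.no_grad():
        target = r + GAMMA * q_net(s2).max(dim=1).values  # one-step TD target
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```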
  • For model 2, a simplified model can be obtained and applied when the amount of state data is not large.
  • For model 2, the states and actions of the second model are described below.
  • In model 2, the basic settings are the same as in model 1, except that each state contains the following information: the time slot t, the driver's current location h (e.g., a hex-cell index), and the number of available orders nearby n_aon (e.g., an integer).
  • Each action is still to transit to a next cell h′ (as shown in FIG. 3B ).
  • the arrow shown in FIG. 3B represents a transition from one state to another by an action.
  • the arrow's origin is the target driver with state (t, h, n).
  • the action is to transit to the hex-cell h′, for example, by taking an order or by repositioning.
  • There are three scenarios: the action is to finish an order; the action is a re-positioning suggestion and the driver accepts it; or the action is a re-positioning suggestion and the driver rejects it.
  • T_drive(t, h, h′): the time of a cruise from h to h′ starting at t.
  • f(t, h, h′): the fare of a cruise from h at t to h′.
  • P_yesvt: the probability of the driver accepting a vacant transition instruction.
  • P_des(h′ |
  • n(t, h): the number of available orders nearby given time and location.
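  • A small simulation sketch of a model 2 transition using the quantities tabulated above; the helpers T_drive, f, and n and the acceptance probability p_yes are assumed to be supplied (e.g., estimated from historical data).

```python
import random

def simulate_transition(t, h, h_prime, is_order, T_drive, f, n, p_yes):
    """One model-2 transition from state (t, h, n(t, h)) under the action 'transit to h_prime'.

    is_order: True if the instruction is an order ending at h_prime; False if it is a
    re-positioning suggestion, accepted with probability p_yes, otherwise the driver stays.
    """
    if is_order:
        t2 = t + T_drive(t, h, h_prime)
        return (t2, h_prime, n(t2, h_prime)), f(t, h, h_prime)   # fare is the reward
    if random.random() < p_yes:                                   # driver accepts re-positioning
        t2 = t + T_drive(t, h, h_prime)
        return (t2, h_prime, n(t2, h_prime)), 0.0
    return (t, h, n(t, h)), 0.0                                   # driver rejects; no reward
```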
  • Model 2 can also be solved using a neural network.
  • Alternatively, model 2 can be solved by applying acceleration and variance reduction algorithms to a tabular implementation, the details of which are described below.
  • A stochastic formulation can be implemented first. Given a DMDP instance (S, A, P, γ) and a policy π, the value vector v^π ∈ ℝ^|S| collects the expected cumulative discounted reward from each initial state under π.
  • The optimal value vector v* is defined componentwise as $v^{*}_{i} = \max_{\pi} v^{\pi}_{i}$.
  • a maximizer is known as the optimal policy ⁇ *. It is well-known that a vector v* is the optimal value vector if and only if it satisfies the Bellman equation:
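  • In the notation above, the Bellman equation takes the standard optimality form (a reconstruction consistent with the surrounding definitions):

$$v^{*}_{i} = \max_{a \in A} \sum_{j \in S} p_{ij}(a)\left(r_{ij}(a) + \gamma\, v^{*}_{j}\right), \qquad \text{for all } i \in S.$$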
  • Every feasible point can recover a stationary randomized policy and any stationary randomized policy can form a feasible point. Indeed, there exists a bijection between the feasible set and the stationary randomized policy space.
  • Lemma 2: (4) is equivalent to (3) by letting ξ(i,a,j) follow the distribution below:

$$\xi(i,a,j) = \begin{bmatrix} 0 & 0 & A_{(i,a,j)}^{T} & \tfrac{q_i}{\hat{v}(i)}\, e_i \\ A_{(i,a,j)} & E_{a,i} & 0 & r_{ij}(a)\,\lambda(i,a)\, e_{a,i} \\ \tfrac{q_i}{\hat{v}(i)}\, e_i^{T} & 0 & -\,r_{ij}(a)\,\lambda(i,a)\, e_{a,i}^{T} & 0 \end{bmatrix} \quad \text{with probability } \lambda(i,a)\, p_{ij}(a).$$
  • A_{(i,a,j)} is a tall matrix with the same structure as P̃, consisting of blocks of equal size; e_i and e_{a,i} are indicator vectors whose only nonzero entry, equal to 1, is at the i-th component (of block a in the case of e_{a,i}); and v̂(i) = Σ_a λ(i,a).
  • λ(i,a) is a distribution that can be imposed artificially, so engaging the value of λ(i,a) in ξ is not a problem.
  • the only unknown and to-be-learned factor is p_ij(a).
  • four types of alternative sampling-based algorithms can be used to solve problem (4).
  • In the optimization area, an algorithm is usually developed for a class of problems, or, more abstractly, a very general mathematical format.
  • In order to use an algorithm, one should first write the to-be-solved problem in the correct format.
  • To apply acceleration and variance reduction algorithms, the original form (4) needs to be revised slightly.
  • $v' = \operatorname{argmin}_{v' \in V}\ \hat{g}^{T}(v' - v) + \frac{1}{2\eta}\lVert v' - v\rVert_2^2$
  • $w' = \operatorname{argmin}_{w' \in W}\ \hat{g}^{T}(w' - w) + \frac{1}{2\eta}\lVert w' - w\rVert_2^2$
  • $u' = \operatorname{argmin}_{u' \in U}\ \hat{g}^{T}(u' - u) + \frac{1}{2\eta}\lVert u' - u\rVert_2^2$, where η denotes a step size and ĝ a stochastic gradient estimate.
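  • Each of the three updates above is a Euclidean proximal step, equivalent to a projected gradient step; a minimal sketch, with the projection onto the feasible set assumed to be given:

```python
import numpy as np

def prox_step(x, grad, eta, project):
    """argmin_{x' in X} g^T (x' - x) + (1 / (2*eta)) * ||x' - x||_2^2  ==  proj_X(x - eta * g)."""
    return project(x - eta * grad)

# Example with a box feasible set [0, 1]^d as the (hypothetical) projection:
project_box = lambda z: np.clip(z, 0.0, 1.0)
x_new = prox_step(np.array([0.2, 0.9]), np.array([1.0, -2.0]), 0.1, project_box)
print(x_new)   # -> [0.1, 1.0]
```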
  • the first algorithm used for solving problem (4) is Accelerated Stochastic Composition Gradient Descent (ASCGD).
  • the first algorithm ASC-RL (Accelerate Stochastic Composition algorithm for Reinforcement Learning) in Algorithm 1 is based on ASCGD.
  • the second algorithm used for solving problem (4) is Stochastic Accelerated GradiEnt (SAGE). This algorithm is targeted to problems like:
  • problem (4) does not belong to this group because the expectation is not outside.
  • problem (4) can be transformed into this form by introducing two i.i.d. random variables ξ_1, ξ_2 which follow the same distribution as in Lemma 2. Then the new problem is:
  • the developed second algorithm ASGD-RL is summarized in Algorithm 2 to solve (5) based on an accelerated projected stochastic gradient descent algorithm called SAGE.
  • the third algorithm used for solving problem (4) is the Katyusha-Xw algorithm.
  • Katyusha solves problems whose objective function is the sum of a group of functions with the sum being convex:
  • the fourth algorithm used for solving problem (4) is Prox-SVRG. This algorithm considers problems as:
  • a Sample Oracle (e.g., a database) may be constructed from a dataset.
  • each instance contains: order ID, driver ID, pick-up time, pick-up latitude, pick-up longitude, drop-off time, drop-off latitude, drop-off longitude. All time and locations may be simplified to discretized time indices and hex-cell indices. Then, the following information can be obtained by respective methods:
  • FIG. 4A illustrates a flowchart of an exemplary method 400 , according to various embodiments of the present disclosure.
  • the method 400 may be implemented in various environments including, for example, the environment 100 of FIG. 1 .
  • the exemplary method 400 may be implemented by one or more components of the system 102 (e.g., the processor 104 , the memory 106 ).
  • the exemplary method 400 may be implemented by multiple systems similar to the system 102 .
  • the operations of method 400 presented below are intended to be illustrative. Depending on the implementation, the exemplary method 400 may include additional, fewer, or alternative steps performed in various orders or in parallel.
  • For the model in this figure, reference may be made to model 1 and the related descriptions above. Model 1 may be solved by a neural network (e.g., a DQN) with machine learning techniques.
  • Block 402 comprises obtaining a current location of a current vehicle from a computing device associated with the current vehicle.
  • Block 403 comprises obtaining a current list of available orders nearby based on the current location.
  • Block 404 comprises feeding the current location, the current list of available orders nearby, and a current time to a trained Markov Decision Process (MDP) model to obtain action information, the action information being repositioning the current vehicle to another current location or completing a current ride order by the current vehicle.
  • Block 405 comprises transmitting the generated action information to the computing device to cause the current vehicle to reposition to the another current location, stay at the current location, or accept the current ride order by proceeding to a pick-up location of the current ride order.
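  • A schematic sketch of blocks 402-405 as a single dispatch call is shown below; the model interface, message fields, and transport are hypothetical.

```python
def dispatch(request, trained_mdp_model, list_orders_nearby, now, send):
    """request: message from the driver's computing device containing its current location."""
    current_location = request["location"]                                 # block 402
    orders_nearby = list_orders_nearby(current_location, now)              # block 403
    action = trained_mdp_model.act(current_location, orders_nearby, now)   # block 404
    send(request["device_id"], action)                                     # block 405: reposition, stay, or pick up
    return action
```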
  • the MDP model is trained based on a plurality of historical or simulated vehicle trips under a policy of maximizing a cumulative reward for a training vehicle completing the historical or simulated vehicle trips.
  • the MDP model discretizes a region into repeating zones and a time period into time slots.
  • Each state of the MDP model comprises: a time represented by a time slot index, a location represented by a repeating zone index, and a list of available orders nearby represented by repeating zone indices of destinations of the available orders nearby.
  • Each action of the MDP model comprises: completing one of the available orders nearby from the list, repositioning to another location, or staying at the location.
  • If the training vehicle completes one of the available orders nearby from the list, the training vehicle gets a fare for completing the one available order nearby as the reward, and the state transitions to a next state comprising: a next time corresponding to completion of the one available order nearby, a next location corresponding to a destination of the one available order nearby, and a next list of available orders nearby corresponding to the next location. If the training vehicle repositions to the another location, the training vehicle gets no reward, and the state transitions to a next state comprising: a next time corresponding to reaching the another location, the another location, and a next list of available orders nearby corresponding to the another location. If the training vehicle stays at the location, the training vehicle gets no reward, and the state transitions to a next state comprising: the time, the location, and the list of available orders nearby.
  • the location and pick-up locations of the available orders nearby are in a same repeating zone.
  • the current location and a pick-up location of the current ride order are within a same repeating zone.
  • all orders with pick-up locations in a repeating zone corresponding to the location are divided averagely and randomly among all vehicles in the repeating zone corresponding to the location to obtain the list of available orders nearby for the training vehicle.
  • all current orders with pick-up locations in a repeating zone corresponding to the current location are divided averagely and randomly among all current vehicles in the repeating zone corresponding to the current location to obtain the current list of available orders nearby for the current vehicle.
  • the list of available orders nearby for the training vehicle is a ceiling function of a division of the all orders with pick-up locations in the repeating zone corresponding to the location by the all vehicles in the repeating zone corresponding to the location.
  • the list may comprise five available orders nearby.
  • the current list of available orders nearby for the current vehicle is a ceiling function of a division of the all current orders with pick-up locations in the repeating zone corresponding to the current location by the all current vehicles in the repeating zone corresponding to the current location.
  • the list may comprise five available orders nearby.
  • FIG. 4B illustrates a flowchart of an exemplary method 410 , according to various embodiments of the present disclosure.
  • the method 410 may be implemented in various environments including, for example, the environment 100 of FIG. 1 .
  • the exemplary method 410 may be implemented by one or more components of the system 102 (e.g., the processor 104 , the memory 106 ).
  • the exemplary method 410 may be implemented by multiple systems similar to the system 102 .
  • the operations of method 410 presented below are intended to be illustrative. Depending on the implementation, the exemplary method 410 may include additional, fewer, or alternative steps performed in various orders or in parallel.
  • For the model in this figure, reference may be made to model 2 and the related descriptions above.
  • Model 2 may be solved by a neural network with machine learning techniques or by applying the acceleration and variance reduction algorithms described earlier. That is, solving the MDP model below may comprise solving the MDP model based on applying acceleration and variance reduction algorithms to a tabular implementation.
  • Block 412 comprises obtaining a current location of a current vehicle from a computing device associated with the current vehicle.
  • Block 413 comprises obtaining a current number of available orders nearby in a current list based on the current location.
  • Block 414 comprises feeding the current location, the current number of available orders nearby, and a current time to a solved Markov Decision Process (MDP) model to obtain action information, the action information being repositioning the current vehicle to another current location or completing a current ride order by the current vehicle.
  • Block 415 comprises transmitting the generated action information to the computing device to cause the current vehicle to reposition to the another current location, stay at the current location, or accept the current ride order by proceeding to a pick-up location of the current ride order.
  • the MDP model is solved based on a plurality of historical or simulated vehicle trips under a policy of maximizing a cumulative reward for a hypothetical vehicle completing the historical or simulated vehicle trips.
  • the MDP model discretizes a region into repeating zones and a time period into time slots.
  • Each state of the MDP model comprises: a time represented by a time slot index, a location represented by a repeating zone index, and a number of available orders nearby in a list, the available orders nearby represented by repeating zone indices of destinations of the available orders nearby.
  • Each action of the MDP model comprises: completing one of the available orders nearby from the list, repositioning to another location, or staying at the location.
  • If the hypothetical vehicle completes one of the available orders nearby from the list, the hypothetical vehicle gets a fare for completing the one available order nearby as the reward, and the state transitions to a next state comprising: a next time corresponding to completion of the one available order nearby, a next location corresponding to a destination of the one available order nearby, and a next number of available orders nearby in a next list corresponding to the next location.
  • If the hypothetical vehicle repositions to the another location, the hypothetical vehicle gets no reward, and the state transitions to a next state comprising: a next time corresponding to reaching the another location, the another location, and a next number of available orders nearby in a next list corresponding to the another location.
  • If the hypothetical vehicle stays at the location, the hypothetical vehicle gets no reward, and the state transitions to a next state comprising: the time, the location, and the number of available orders nearby in the list.
  • the location and pick-up locations of the available orders nearby are in a same repeating zone.
  • the current location and a pick-up location of the current ride order are within a same repeating zone.
  • all orders with pick-up locations in a repeating zone corresponding to the location are divided averagely and randomly among all vehicles in the repeating zone corresponding to the location to obtain the number of available orders nearby for the hypothetical vehicle.
  • all current orders with pick-up locations in a repeating zone corresponding to the current location are divided averagely and randomly among all current vehicles in the repeating zone corresponding to the current location to obtain the current number of available orders nearby for the current vehicle.
  • the number of available orders nearby for the hypothetical vehicle is a ceiling function of a division of the all orders with pick-up locations in the repeating zone corresponding to the location by the all vehicles in the repeating zone corresponding to the location.
  • the number of available orders nearby may be five.
  • the current number of available orders nearby for the current vehicle is a ceiling function of a division of the all current orders with pick-up locations in the repeating zone corresponding to the current location by the all current vehicles in the repeating zone corresponding to the current location.
  • the number of available orders nearby may be five.
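  • For intuition, a plain tabular value-iteration baseline over the discretized model 2 state space is sketched below; this is not the accelerated, variance-reduced stochastic solver described above, and the transition and reward tables are assumed inputs.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, iters=500, tol=1e-6):
    """P: (n_states, n_actions, n_states) transition probabilities.
       R: (n_states, n_actions) expected immediate rewards (fares).
       Returns the value vector and a greedy policy over the discretized model-2 states."""
    n_states, n_actions, _ = P.shape
    v = np.zeros(n_states)
    for _ in range(iters):
        q = R + gamma * (P @ v)          # q[s, a] = R[s, a] + gamma * sum_j P[s, a, j] * v[j]
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            v = v_new
            break
        v = v_new
    policy = q.argmax(axis=1)
    return v, policy
```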
  • solving the MDP model based on the plurality of historical or simulated vehicle trips comprises: obtaining data for each of the historical vehicle trips, the data comprising: a historical pick-up time, a historical pick-up location, a historical drop-off time, and a historical drop-off location, training a random forest classifier with the historical pick-up time, the historical pick-up location, and the historical drop-off location as training data and with the historical drop-off time minus the historical pick-up time as the label to build a cruise time estimator, the cruise time estimator estimating a time to reach a destination based on the time, the location, and the destination of the one available order nearby, or based on the time, the location, and the another location, and applying the cruise time estimator in each state transition to determine the next time corresponding to completion of the one available order nearby or to determine the next time corresponding to reaching the another location.
  • FIG. 5 is a block diagram that illustrates a computer system 500 upon which any of the embodiments described herein may be implemented.
  • the system 500 may correspond to the system 102 or 103 described above.
  • the computer system 500 includes a bus 502 or other communication mechanism for communicating information, and one or more hardware processors 504 coupled with bus 502 for processing information.
  • Hardware processor(s) 504 may be, for example, one or more general purpose microprocessors.
  • the processor(s) 504 may correspond to the processor 104 described above.
  • the computer system 500 also includes a main memory 506 , such as a random access memory (RAM), cache and/or other dynamic storage devices, coupled to bus 502 for storing information and instructions to be executed by processor 504 .
  • Main memory 506 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 504 .
  • Such instructions when stored in storage media accessible to processor 504 , render computer system 500 into a special-purpose machine that is customized to perform the operations specified in the instructions.
  • the computer system 500 further includes a read only memory (ROM) 508 or other static storage device coupled to bus 502 for storing static information and instructions for processor 504 .
  • a storage device 510 such as a magnetic disk, optical disk, or USB thumb drive (Flash drive), etc., is provided and coupled to bus 502 for storing information and instructions.
  • the main memory 506 , the ROM 508 , and/or the storage 510 may correspond to the memory 106 described above.
  • the computer system 500 may implement the techniques described herein using customized hard-wired logic, one or more ASICs or FPGAs, firmware and/or program logic which in combination with the computer system causes or programs computer system 500 to be a special-purpose machine. According to one embodiment, the operations, methods, and processes described herein are performed by computer system 500 in response to processor(s) 504 executing one or more sequences of one or more instructions contained in main memory 506 . Such instructions may be read into main memory 506 from another storage medium, such as storage device 510 . Execution of the sequences of instructions contained in main memory 506 causes processor(s) 504 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.
  • the main memory 506 , the ROM 508 , and/or the storage 510 may include non-transitory storage media.
  • non-transitory media refers to media that store data and/or instructions that cause a machine to operate in a specific fashion; such media exclude transitory signals.
  • Such non-transitory media may comprise non-volatile media and/or volatile media.
  • Non-volatile media includes, for example, optical or magnetic disks, such as storage device 510 .
  • Volatile media includes dynamic memory, such as main memory 506 .
  • non-transitory media include, for example, a floppy disk, a flexible disk, hard disk, solid state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, any other memory chip or cartridge, and networked versions of the same.
  • the computer system 500 also includes a network interface 518 coupled to bus 502 .
  • Network interface 518 provides a two-way data communication coupling to one or more network links that are connected to one or more local networks.
  • network interface 518 may be an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line.
  • network interface 518 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN (or a WAN component to communicate with a WAN).
  • Wireless links may also be implemented.
  • network interface 518 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
  • the computer system 500 can send messages and receive data, including program code, through the network(s), network link and network interface 518 .
  • a server might transmit a requested code for an application program through the Internet, the ISP, the local network and the network interface 518 .
  • the received code may be executed by processor 504 as it is received, and/or stored in storage device 510 , or other non-volatile storage for later execution.
  • the various operations of exemplary methods described herein may be performed, at least partially, by an algorithm.
  • the algorithm may be comprised in program codes or instructions stored in a memory (e.g., a non-transitory computer-readable storage medium described above).
  • Such algorithm may comprise a machine learning algorithm.
  • a machine learning algorithm may not explicitly program computers to perform a function, but can learn from training data to make a prediction model that performs the function.
  • processors may be temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, such processors may constitute processor-implemented engines that operate to perform one or more operations or functions described herein.
  • the methods described herein may be at least partially processor-implemented, with a particular processor or processors being an example of hardware.
  • the operations of a method may be performed by one or more processors or processor-implemented engines.
  • the one or more processors may also operate to support performance of the relevant operations in a “cloud computing” environment or as a “software as a service” (SaaS).
  • at least some of the operations may be performed by a group of computers (as examples of machines including processors), with these operations being accessible via a network (e.g., the Internet) and via one or more appropriate interfaces (e.g., an Application Program Interface (API)).
  • processors or processor-implemented engines may be located in a single geographic location (e.g., within a home environment, an office environment, or a server farm). In other exemplary embodiments, the processors or processor-implemented engines may be distributed across a number of geographic locations.
  • the term “or” may be construed in either an inclusive or exclusive sense. Moreover, plural instances may be provided for resources, operations, or structures described herein as a single instance. Additionally, boundaries between various resources, operations, engines, and data stores are somewhat arbitrary, and particular operations are illustrated in a context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within a scope of various embodiments of the present disclosure. In general, structures and functionality presented as separate resources in the exemplary configurations may be implemented as a combined structure or resource. Similarly, structures and functionality presented as a single resource may be implemented as separate resources. These and other variations, modifications, additions, and improvements fall within a scope of embodiments of the present disclosure as represented by the appended claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
  • Conditional language such as, among others, “can,” “could,” “might,” or “may,” unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain embodiments include, while other embodiments do not include, certain features, elements and/or steps. Thus, such conditional language is not generally intended to imply that features, elements and/or steps are in any way required for one or more embodiments or that one or more embodiments necessarily include logic for deciding, with or without user input or prompting, whether these features, elements and/or steps are included or are to be performed in any particular embodiment.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Tourism & Hospitality (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Operations Research (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Quality & Reliability (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Game Theory and Decision Science (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Algebra (AREA)
  • Probability & Statistics with Applications (AREA)
  • Primary Health Care (AREA)
  • Automation & Control Theory (AREA)

Abstract

A method for ride order dispatching comprises: obtaining a current location of a current vehicle from a computing device associated with the current vehicle; obtaining a current list of available orders nearby based on the current location; feeding the current location, the current list of available orders nearby, and a current time to a trained Markov Decision Process (MDP) model to obtain action information, the action information being repositioning the current vehicle to another current location or completing a current ride order by the current vehicle; and transmitting the generated action information to the computing device to cause the current vehicle to reposition to the another current location, stay at the current location, or accept the current ride order by proceeding to a pick-up location of the current ride order.

Description

    TECHNICAL FIELD
  • This disclosure generally relates to methods and devices for ride order dispatching.
  • BACKGROUND
  • A vehicle dispatch platform can automatically allocate transportation requests to corresponding vehicles for providing transportation services and reward the drivers. However, it has been challenging to determine a ride order dispatching method that maximizes the gain for each vehicle driver.
  • SUMMARY
  • Various embodiments of the present disclosure include systems, methods, and non-transitory computer readable media for ride order dispatching.
  • According to one aspect, a computer-implemented method for ride order dispatching comprises: obtaining a current location of a current vehicle from a computing device associated with the current vehicle, obtaining a current list of available orders nearby based on the current location, feeding the current location, the current list of available orders nearby, and a current time to a trained Markov Decision Process (MDP) model to obtain action information, the action information being repositioning the current vehicle to another current location or completing a current ride order by the current vehicle, and transmitting the generated action information to the computing device to cause the current vehicle to reposition to the another current location, stay at the current location, or accept the current ride order by proceeding to a pick-up location of the current ride order.
  • The MDP model is trained based on a plurality of historical or simulated vehicle trips under a policy of maximizing a cumulative reward for a training vehicle completing the historical or simulated vehicle trips. The MDP model discretizes a region into repeating zones and a time period into time slots. Each state of the MDP model comprises: a time represented by a time slot index, a location represented by a repeating zone index, and a list of available orders nearby represented by repeating zone indices of destinations of the available orders nearby. Each action of the MDP model comprises: completing one of the available orders nearby from the list, repositioning to another location, or staying at the location. If the training vehicle completes one of the available orders nearby from the list, the training vehicle gets a fare for completing the one available order nearby as the reward, and the state transitions to a next state comprising: a next time corresponding to completion of the one available order nearby, a next location corresponding to a destination of the one available order nearby, and a next list of available orders nearby corresponding to the next location. If the training vehicle repositions to the another location, the training vehicle gets no reward, and the state transitions to a next state comprising: a next time corresponding to reaching the another location, the another location, and a next list of available orders nearby corresponding to the another location. If the training vehicle stays at the location, the training vehicle gets no reward, and the state transitions to a next state comprising: the time, the location, and the list of available orders nearby.
  • In some embodiments, for training the MDP model, the location and pick-up locations of the available orders nearby are in a same repeating zone. For application of the trained MDP model, the current location and a pick-up location of the current ride order are within a same repeating zone.
  • In some embodiments, for training the MDP model, all orders with pick-up locations in a repeating zone corresponding to the location are divided averagely and randomly among all vehicles in the repeating zone corresponding to the location to obtain the list of available orders nearby for the training vehicle. For application of the trained MDP model, all current orders with pick-up locations in a repeating zone corresponding to the current location are divided averagely and randomly among all current vehicles in the repeating zone corresponding to the current location to obtain the current list of available orders nearby for the current vehicle.
  • In some embodiments, for training the MDP model, the list of available orders nearby for the training vehicle is a ceiling function of a division of the all orders with pick-up locations in the repeating zone corresponding to the location by the all vehicles in the repeating zone corresponding to the location. For application of the trained MDP model, the current list of available orders nearby for the current vehicle is a ceiling function of a division of the all current orders with pick-up locations in the repeating zone corresponding to the current location by the all current vehicles in the repeating zone corresponding to the current location.
  • According to another aspect, a computer-implemented method for ride order dispatching comprises: obtaining a current location of a current vehicle from a computing device associated with the current vehicle, obtaining a current number of available orders nearby in a current list based on the current location, feeding the current location, the current number of available orders nearby, and a current time to a solved Markov Decision Process (MDP) model to obtain action information, the action information being repositioning the current vehicle to another current location or completing a current ride order by the current vehicle, and transmitting the generated action information to the computing device to cause the current vehicle to reposition to the another current location, stay at the current location, or accept the current ride order by proceeding to a pick-up location of the current ride order.
  • The MDP model is solved based on a plurality of historical or simulated vehicle trips under a policy of maximizing a cumulative reward for a hypothetical vehicle completing the historical or simulated vehicle trips. The MDP model discretizes a region into repeating zones and a time period into time slots. Each state of the MDP model comprises: a time represented by a time slot index, a location represented by a repeating zone index, and a number of available orders nearby in a list, the available orders nearby represented by repeating zone indices of destinations of the available orders nearby. Each action of the MDP model comprises: completing one of the available orders nearby from the list, repositioning to another location, or staying at the location. If the hypothetical vehicle completes one of the available orders nearby from the list, the hypothetical vehicle gets a fare for completing the one available order nearby as the reward, and the state transitions to a next state comprising: a next time corresponding to completion of the one available order nearby, a next location corresponding to a destination of the one available order nearby, and a next number of available orders nearby in a next list corresponding to the next location. If the hypothetical vehicle repositions to the another location, the hypothetical vehicle gets no reward, and the state transitions to a next state comprising: a next time corresponding to reaching the another location, the another location, and a next number of available orders nearby in a next list corresponding to the another location. If the hypothetical vehicle stays at the location, the hypothetical vehicle gets no reward, and the state transitions to a next state comprising: the time, the location, and the number of available orders nearby in the list.
  • In some embodiments, for solving the MDP model, the location and pick-up locations of the available orders nearby are in a same repeating zone. For application of the solved MDP model, the current location and a pick-up location of the current ride order are within a same repeating zone.
  • In some embodiments, for solving the MDP model, all orders with pick-up locations in a repeating zone corresponding to the location are divided averagely and randomly among all vehicles in the repeating zone corresponding to the location to obtain the number of available orders nearby for the hypothetical vehicle. For application of the solved MDP model, all current orders with pick-up locations in a repeating zone corresponding to the current location are divided averagely and randomly among all current vehicles in the repeating zone corresponding to the current location to obtain the current number of available orders nearby for the current vehicle.
  • In some embodiments, for solving the MDP model, the number of available orders nearby for the hypothetical vehicle is a ceiling function of a division of the all orders with pick-up locations in the repeating zone corresponding to the location by the all vehicles in the repeating zone corresponding to the location. For application of the solved MDP model, the current number of available orders nearby for the current vehicle is a ceiling function of a division of the all current orders with pick-up locations in the repeating zone corresponding to the current location by the all current vehicles in the repeating zone corresponding to the current location.
  • In some embodiments, solving the MDP model comprises solving the MDP model based on applying acceleration and variance reduction algorithms to a tabular implementation. In some embodiments, solving the MDP model based on the plurality of historical or simulated vehicle trips comprises: obtaining data for each of the historical vehicle trips, the data comprising: a historical pick-up time, a historical pick-up location, a historical drop-off time, and a historical drop-off location; training a random forest classifier with the historical pick-up time, the historical pick-up location, and the historical drop-off location as training data and with the historical drop-off time minus the historical pick-up time as the label to build a cruise time estimator, the cruise time estimator estimating a time to reach a destination based on the time, the location, and the destination of the one available order nearby, or based on the time, the location, and the another location; and applying the cruise time estimator in each state transition to determine the next time corresponding to completion of the one available order nearby or to determine the next time corresponding to reaching the another location.
  • According to another aspect, a system for ride order dispatching may comprise a processor and a non-transitory computer-readable storage medium storing instructions that, when executed by the processor, cause the system to perform a method for ride order dispatching, which may be any method described herein.
  • These and other features of the systems, methods, and non-transitory computer readable media disclosed herein, as well as the methods of operation and functions of the related elements of structure and the combination of parts and economies of manufacture, will become more apparent upon consideration of the following description and the appended claims with reference to the accompanying drawings, all of which form a part of this specification, wherein like reference numerals designate corresponding parts in the various figures. It is to be expressly understood, however, that the drawings are for purposes of illustration and description only and are not intended as a definition of the limits of the invention.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • Certain features of various embodiments of the present technology are set forth with particularity in the appended claims. A better understanding of the features and advantages of the technology will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings of which:
  • FIG. 1 illustrates an exemplary environment for ride order dispatching, in accordance with various embodiments.
  • FIG. 2 illustrates an exemplary system for ride order dispatching, in accordance with various embodiments.
  • FIG. 3A illustrates an exemplary action search for ride order dispatching for model 1, in accordance with various embodiments.
  • FIG. 3B illustrates an exemplary action search for ride order dispatching for model 2, in accordance with various embodiments.
  • FIG. 4A illustrates a flowchart of an exemplary method for ride order dispatching, in accordance with various embodiments.
  • FIG. 4B illustrates a flowchart of another exemplary method for ride order dispatching, in accordance with various embodiments.
  • FIG. 5 illustrates a block diagram of an exemplary computer system in which any of the embodiments described herein may be implemented.
  • DETAILED DESCRIPTION
  • Vehicle platforms may be provided for transportation services. Such a vehicle platform may also be referred to as a vehicle hailing or vehicle dispatching platform, accessible through devices such as mobile phones installed with a platform application. Via the application, users (ride requestors) can transmit transportation requests (e.g., a pick-up location, a destination, a current location of the user) to the vehicle platform. The vehicle platform may relay the requests to vehicle drivers based on various factors (e.g., proximity to the location of the requestor or the pick-up location). The vehicle drivers can choose from the requests, and each can pick one to accept, fulfill the request, and be rewarded accordingly. After each trip, the vehicle driver may search for more requests or receive more requests from a push-based dispatching platform, and the results may vary depending on the demand for the vehicle service. For example, the results may return many requests if the vehicle is at a bar area on a weekend night, or may return no requests if the vehicle is in a far-flung area on a weekday evening.
  • To maximize the gain for the vehicle drivers (e.g., during a day), it is important for the vehicle platform to help them make the smartest decisions, for example, suggesting that the driver re-position or accept a trip when displaying the results. Such a problem can be naturally formulated as a Markov Decision Process (MDP). MDP models can be roughly classified as single-driver or multi-driver. In single-driver models, there is a target driver, and the objective is to maximize his/her long-term revenue; in multi-driver models, there can be more than one driver, and the objective is to maximize the sum of their revenues. This disclosure focuses on the former. Existing single-driver models ignore the existence of competitors (other drivers) and always assume that a driver can get the best order. This assumption is far from reality, so such models usually cannot be deployed directly.
  • To at least mitigate the deficiencies of current models, two novel MDP models are disclosed. The disclosed models may consider competition and learn a smart strategy that can instruct the target driver how to cruise so as to maximize his/her long-term revenue. This strategy can be referred to as an instruction system. The first model (model 1) reflects reality in a very detailed and accurate way, and the strategy trained from it is also closer to real life and can be deployed directly. However, the model's size may be large, which makes applying a tabular implementation impossible. Therefore, model 1 may be solved by an algorithm involving a neural network to find a strategy. The second model (model 2) is a simplified version of the first model and has a much smaller size. As a trade-off, the second model is less accurate than the first model but makes a tabular implementation possible. The second model may be solved with a novel stochastic formulation. The advantage of such a formulation is that it allows optimization algorithms to be applied to the model. With acceleration and variance reduction, a solution to the second model can be found with better quality in a shorter time.
  • The disclosed systems and methods utilize the two disclosed models to obtain an optimal strategy that surpasses human decisions (decisions made by drivers on whether to reposition or take an order) and other models in terms of reward maximization. Therefore, the disclosed systems and methods improve the computer functionality by (1) providing such a strategy to drivers in real time, allowing the drivers to maximize their earnings without reliance on personal experience, and (2) allowing the vehicle platform to automatically dispatch vehicles, enhancing the functionality and user experience of the software platform. The disclosed system may deploy the first model in all types of areas, and may deploy the second model in rural areas, where the smaller amount of state data reduces the model size.
  • FIG. 1 illustrates an exemplary environment 100 for dispatching ride order, in accordance with various embodiments. As shown in FIG. 1, the exemplary environment 100 can comprise at least one computing system 102 that includes one or more processors 104 and memory 106. The memory 106 may be non-transitory and computer-readable. The memory 106 may store instructions that, when executed by the one or more processors 104, cause the one or more processors 104 to perform various operations described herein. The system 102 may be implemented on or as devices such as a mobile phone, tablet, server, computer, wearable device, etc. The system 102 may be installed with software (e.g., a platform program) and/or hardware (e.g., wires, wireless connections) to access other devices of the environment 100.
  • The environment 100 may include one or more data stores (e.g., a data store 108) and one or more computing devices (e.g., a computing device 109) that are accessible to the system 102. In some embodiments, the system 102 may be configured to obtain data (e.g., first training data and second training data such as location, time, and fees for historical vehicle transportation trips) from the data store 108 (e.g., a database or dataset of historical transportation trips) and/or the computing device 109 (e.g., a computer, a server, a mobile phone used by a driver or passenger that captures transportation trip information such as time, location, and fees). The system 102 may use the obtained data to train the algorithm for ride order dispatching. The location may comprise GPS (Global Positioning System) coordinates of a vehicle.
  • The environment 100 may further include one or more computing devices (e.g., computing devices 110 and 111) coupled to the system 102. The computing devices 110 and 111 may comprise cellphones, tablets, computers, wearable devices, etc. The computing devices 110 and 111 may transmit or receive data to or from the system 102.
  • In some embodiments, the system 102 may implement an online information or service platform. The service may be associated with vehicles (e.g., cars, bikes, boats, airplanes, etc.), and the platform may be referred to as a vehicle (service hailing or ride order dispatching) platform. The platform may accept requests for transportation, identify vehicles to fulfill the requests, arrange for pick-ups, and process transactions. For example, a user may use the computing device 110 (e.g., a mobile phone installed with a software application associated with the platform) to request transportation from the platform. The system 102 may receive the request and relay it to various vehicle drivers (e.g., by posting the request to mobile phones carried by the drivers). A vehicle driver may use the computing device 111 (e.g., another mobile phone installed with the application associated with the platform) to accept the posted transportation request and obtain pick-up location information. Fees (e.g., transportation fees) can be transacted among the system 102 and the computing devices 110 and 111. Some platform data may be stored in the memory 106 or retrievable from the data store 108 and/or the computing devices 109, 110, and 111. For example, for each trip, the location of the origin and destination (e.g., transmitted by the computing device 111), the fee, and the time can be obtained by the system 102.
  • In some embodiments, the system 102 and the one or more of the computing devices (e.g., the computing device 109) may be integrated in a single device or system. Alternatively, the system 102 and the one or more computing devices may operate as separate devices. The data store(s) may be anywhere accessible to the system 102, for example, in the memory 106, in the computing device 109, in another device (e.g., network storage device) coupled to the system 102, or another storage location (e.g., cloud-based storage system, network file system, etc.), etc. Although the system 102 and the computing device 109 are shown as single components in this figure, it is appreciated that the system 102 and the computing device 109 can be implemented as single devices or multiple devices coupled together. The system 102 may be implemented as a single system or multiple systems coupled to each other. The system 102, the computing device 109, the data store 108, and the computing devices 110 and 111 may be able to communicate with one another through one or more wired or wireless networks (e.g., the Internet) through which data can be communicated. Various aspects of the environment 100 are described below in reference to FIG. 2 to FIG. 5.
  • FIG. 2 illustrates an exemplary system 200 for dispatching ride order, in accordance with various embodiments. The operations shown in FIG. 2 and presented below are intended to be illustrative. In various embodiments, the system 102 may obtain data 202 (e.g., historical or simulated vehicle trips) from the data store 108 and/or the computing device 109. The obtained data 202 may be stored in the memory 106. The system 102 may train an algorithm with the obtained data 202 to learn a model (e.g., model 1) for dispatching ride order, or solve a model (e.g., model 2) for dispatching ride order. For deployment, the system 102 may obtain a current location of a current vehicle. For example, the computing device 111 may transmit a query 204 to the system 102, the query comprising a Global Positioning System (GPS) location of the current vehicle. The computing device 111 may be associated with a driver of a service vehicle, for example, a taxi, a service-hailing vehicle, etc. Accordingly, the system 102 may perform various steps with the information comprised in the query, apply the model, and send data 207 to the computing device 111 or one or more other devices. For example, the system 102 may obtain a current list of available orders nearby based on the current location, and apply any disclosed model with inputs such as the current location, the current time, and the available orders nearby. The data 207 may comprise an instruction or recommendation for an action, such as re-positioning to another location, accepting a new ride order, etc. The driver may accept the instruction to accept an order or reposition, or refuse and remain at the same position.
  • In some embodiments, the ride order dispatching problem can be approached by reinforcement learning (RL). RL is a type of machine learning which emphasizes learning through interaction with the environment. In RL, an RL agent repeatedly observes the current state of the environment, takes an action according to a certain policy, gets an immediate reward, and transits to a next state. Through trial and error, the agent aims to learn an optimal policy under which the expected cumulative reward is maximized.
  • In some embodiments, the RL for ride order dispatching may be modeled as a Markov decision process (MDP). In this disclosure, an infinite-horizon discounted Markov Decision Process (DMDP) may be used. An instance of the DMDP contains a finite space of states $\mathcal{S}$; a finite space of actions $\mathcal{A}$; a collection of state-to-state transition probabilities $\mathcal{P} := \{p_{ij}(a) \mid i, j \in \mathcal{S},\ a \in \mathcal{A}\}$, which is unknown; a collection of state-transitional rewards $R := \{r_{ij}(a) \mid i, j \in \mathcal{S},\ a \in \mathcal{A}\}$, where $0 < r_{ij}(a) < 1$, $\forall i, j \in \mathcal{S}$, $a \in \mathcal{A}$; and the discount factor $\gamma \in (0, 1)$. A stationary and randomized policy is then defined as a map $\pi: \mathcal{S} \to P(\mathcal{A})$, where $P(\mathcal{A})$ is a probability distribution vector over $\mathcal{A}$. If $\pi$ is deterministic, then $\pi(s)$ has only one nonzero entry, equal to 1. Given $\pi$, let $\pi_i(a) := \pi(i)[a]$, and let $P^{\pi}$ denote the transition probability matrix of the DMDP under the policy $\pi$, where $P^{\pi}_{ij} = \sum_{a \in \mathcal{A}} p_{ij}(a)\,\pi_i(a)$, $\forall i, j \in \mathcal{S}$. The objective of the DMDP is to maximize the expected cumulative reward regardless of the initial state $s_0$:
  • $$\underset{\pi}{\text{maximize}} \quad \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_{s_t s_{t+1}}(a_t) \,\middle|\, s_0\right]$$
  • Here, $a_t$ is determined by $\pi$, and then $s_{t+1}$ follows the probability distribution $P(s_{t+1} = j) = p_{s_t j}(a_t)$.
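  • For illustration only, the following minimal Python sketch (not part of the disclosure) evaluates the discounted cumulative reward for one sampled trajectory of per-transition rewards; the discount factor and example rewards are arbitrary assumptions.

```python
# Illustrative sketch: discounted cumulative reward for one sampled trajectory.
def discounted_return(rewards, gamma=0.9):
    """Sum of gamma^t * r_t over a finite sampled trajectory."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# Example: a fare of 0.5, a repositioning step (reward 0), then a fare of 0.8.
print(discounted_return([0.5, 0.0, 0.8], gamma=0.9))  # 0.5 + 0 + 0.81 * 0.8 = 1.148
```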
  • In some embodiments with respect to the first model (model 1), the agent is the system. The system observes the state of the target driver, then provides an instruction for him/her to cruise. The instruction can be an order request, then the driver will finish the order and end up at the drop-off location. Alternatively, the instruction can be a re-positioning suggestion (without an order), then the driver can accept or decline such instruction. The MDP model may be defined as follows.
  • First, the states and actions of the first model are described. The activity area of the target driver may be discretized into a plurality of repeating zones (e.g., a group of hex-cells as shown in FIG. 3A), and twenty-four hours may be discretized into a list of time slots, e.g., ten minutes per slot. In this model, it may be assumed that all orders need to be taken as soon as possible, and that the hypothetical driver only takes orders having pick-up locations in the same hex-cell as the driver. The former assumption ignores the situation when the order is requested to depart in an hour or so, and the latter assumption is a natural outcome of the former. However, with the existence of competitors, a driver cannot always take any order he wants. This problem has not been addressed in existing technologies and models. To address this problem, all new orders in the cell where the driver is located may be partitioned averagely and randomly, and the driver only gets to choose the optimal one from his/her partition. "Averagely" means that no driver gets more orders to choose from than other drivers, and "randomly" means that an order can be partitioned into any driver's share. The driver's partition may be referred to as the available orders nearby. In one example, suppose the target driver is at the cell h, at the time slot t, with a set of new orders $\mathcal{S}_{no}$, where $|\mathcal{S}_{no}| = n$, and m competitors in the same cell; then the target driver's available orders nearby is defined as:
  • $$\mathcal{S}_{aon} := \left\{ \left\lceil \tfrac{n}{m} \right\rceil \text{ randomly picked orders in } \mathcal{S}_{no} \right\}.$$
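  • For illustration only, a minimal sketch of how such a partition could be formed; the helper name and the handling of an empty cell are assumptions rather than part of the disclosure.

```python
import math
import random

# Illustrative sketch: form the target driver's "available orders nearby" by
# randomly picking ceil(n/m) of the n new orders in the driver's hex-cell,
# where m is the number of competitors in that cell.
def available_orders_nearby(new_orders, num_competitors, rng=random):
    n = len(new_orders)
    if n == 0 or num_competitors == 0:
        return list(new_orders)  # assumption: with no competition, all orders are available
    k = math.ceil(n / num_competitors)
    return rng.sample(new_orders, min(k, n))

# Example: 7 new orders and 3 competitors -> the partition has ceil(7/3) = 3 orders.
orders = ["order_%d" % i for i in range(7)]
print(available_orders_nearby(orders, 3))
```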
  • In this model, each state may contain the following information:
      • 1. t: current time (e.g., time slot index)
      • 2. h: the driver's current location (e.g., hex-cell index)
      • 3. $\mathcal{S}_{aon}$: the available orders nearby (e.g., a list of hex-cell indices representing the destinations of the orders).
        Each action is to transit to a next hex-cell of a different location h′ (as shown in FIG. 3A). The arrow shown in FIG. 3A represents a transition from one state to another by an action. The arrow's origin is the target driver with state (t, h, $\mathcal{S}_{aon}$). The action is to transit to the hex-cell h′, for example, by taking an order or by repositioning.
  • Next, the state transition is described. With the current state being s := (t, h, $\mathcal{S}_{aon}$) and the current action being a := h′, several scenarios may happen:
      • h′ ∈ $\mathcal{S}_{aon}$, e.g., the instruction is to finish an order; then the driver will follow the instruction.
      • h′ ∉ $\mathcal{S}_{aon}$, e.g., the instruction is a re-positioning suggestion, and the driver accepts it.
      • h′ ∉ $\mathcal{S}_{aon}$, e.g., the instruction is a re-positioning suggestion, and the driver rejects it.
  • The next state and the reward depend on the scenarios. The next state s′ and the reward r are then:
  • $$s' = \begin{cases} \big(t + T(t, h, h'),\; h',\; \mathcal{S}'_{aon}\big), & h' \in \mathcal{S}_{aon} \text{ or } \xi \le P_{repo}, \\ (t,\; h,\; \mathcal{S}_{aon}), & h' \notin \mathcal{S}_{aon} \text{ and } \xi > P_{repo}, \end{cases} \qquad r = \begin{cases} f(t, h, h'), & h' \in \mathcal{S}_{aon}, \\ 0, & h' \notin \mathcal{S}_{aon}, \end{cases}$$
  • where $\xi$ is a random variable uniformly distributed in [0, 1] and $\mathcal{S}'_{aon}$ is the set of available orders nearby for the time $t + T(t, h, h')$ and the hex-cell h′. Various other notations are described in Table 1 below.
  • TABLE 1
    Notations for Model 1
    Variable       Meaning
    P_repo         The probability of the driver accepting a re-positioning instruction.
    f(t, h, h′)    The fare for completing an order by cruising from h at time t to h′.
    T(t, h, h′)    The time of the cruise from h to h′ starting at t.
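  • For illustration only, a minimal sketch of one model 1 state transition following the rule above; fare, cruise_time, and next_available_orders are hypothetical placeholders for f, T, and the next set of available orders nearby.

```python
import random

# Illustrative sketch of one model 1 transition. state = (t, h, s_aon); action = h_prime.
def model1_transition(state, h_prime, p_repo, fare, cruise_time, next_available_orders):
    t, h, s_aon = state
    if h_prime in s_aon:
        # Finish an order: reward is the fare, end up at the order's destination.
        t_next = t + cruise_time(t, h, h_prime)
        return (t_next, h_prime, next_available_orders(t_next, h_prime)), fare(t, h, h_prime)
    if random.random() <= p_repo:
        # Re-positioning suggestion accepted: no reward, move to h_prime.
        t_next = t + cruise_time(t, h, h_prime)
        return (t_next, h_prime, next_available_orders(t_next, h_prime)), 0.0
    # Re-positioning suggestion rejected: no reward, state unchanged.
    return (t, h, s_aon), 0.0
```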
  • In some embodiments, the MDP of model 1 can be trained online with real data (e.g., historical trip data) or through a simulator (e.g., with simulated trip data), with the inputs being a state and an action and the outputs being a next state and a reward. For example, a neural network such as a DQN (deep Q-network) can be applied, and model 1 can be solved by data training and machine learning.
  • In some embodiments with respect to the second model (model 2), a simplified model can be obtained and applied when the amount of state data is not large. First, the states and actions of the second model are described. In model 2, the basic settings are the same as model 1, except that each state contains the following information:
  • 1. t: current time (e.g., time slot index)
  • 2. h: the driver's current location (e.g., hex-cell index)
  • 3. n := $|\mathcal{S}_{aon}|$ (e.g., an integer).
  • Each action is still to transit to a next cell h′ (as shown in FIG. 3B). The arrow shown in FIG. 3B represents a transition from one state to another by an action. The arrow's origin is the target driver with state (t, h, n). The action is to transit to the hex-cell h′, for example, by taking an order or by repositioning.
  • With the current state being s:=(t, h, n) and the current action being a:=h′, several scenarios may happen:
  • The action is to finish an order;
  • The action is a re-positioning suggestion, and the driver accepts it;
  • The action is a re-positioning suggestion, and the driver rejects it.
  • The next state and the reward depend on the scenarios. Some related probabilities are listed in Table 2.
  • TABLE 2
    Notations of Parameters
    Variable             Meaning
    T_drive(t, h, h′)    The time of the cruise from h to h′ starting at t.
    f(t, h, h′)          The fare of a cruise from h at t to h′.
    P_yesvt              The probability of the driver accepting a vacant transition instruction.
    P_des(h′ | t, h)     The probability of an order going to h′ given that it departs at (t, h).
    P_od(h′ | t, h, n)   The probability that there exists an order going to h′ given that there are n orders departing at (t, h).
    n(t, h)              The number of available orders nearby given the time and location.
  • Note that, as shown in Table 2:
  • $$p_{des}(h' \mid t, h) = p_{od}(h' \mid t, h, 1),$$
  • $$p_{od}(h' \mid t, h, n) = 1 - \big(1 - p_{des}(h' \mid t, h)\big)^{n}.$$
  • Then the next state s′ and the reward r will be:
  • $$s' = \begin{cases} \big(t + T_{drive}(t, h, h'),\; h',\; n'\big), & \text{with probability } q, \\ (t,\; h,\; n), & \text{with probability } 1 - q, \end{cases} \qquad r = \begin{cases} f(t, h, h'), & \text{with probability } P_{od}(h' \mid t, h, n), \\ 0, & \text{with probability } 1 - P_{od}(h' \mid t, h, n), \end{cases}$$ where $q := P_{od}(h' \mid t, h, n) + \big(1 - P_{od}(h' \mid t, h, n)\big) \times P_{yesvt}$.
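  • For illustration only, a minimal sketch of one model 2 state transition that samples the rule above; p_des, t_drive, fare, next_n, and p_yesvt are hypothetical placeholders for the quantities in Table 2, and sampling sequentially in this way moves the driver with probability P_od + (1 − P_od)·P_yesvt, i.e., q.

```python
import random

# Illustrative sketch of one model 2 transition. state = (t, h, n); action = h_prime.
def model2_transition(state, h_prime, p_des, p_yesvt, t_drive, fare, next_n):
    t, h, n = state
    # P_od(h'|t, h, n) = 1 - (1 - P_des(h'|t, h))^n
    p_od = 1.0 - (1.0 - p_des(h_prime, t, h)) ** n
    if random.random() <= p_od:
        # An order to h_prime exists: earn the fare and move there.
        t_next = t + t_drive(t, h, h_prime)
        return (t_next, h_prime, next_n(t_next, h_prime)), fare(t, h, h_prime)
    if random.random() <= p_yesvt:
        # No such order, but the vacant-transition suggestion is accepted.
        t_next = t + t_drive(t, h, h_prime)
        return (t_next, h_prime, next_n(t_next, h_prime)), 0.0
    # Suggestion rejected: no reward, state unchanged.
    return (t, h, n), 0.0
```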
  • Model 2 can also be solved using a neural network. Alternatively, model 2 can be solved by applying acceleration and variance reduction algorithms to a tabular implementation, the details of which are described below.
  • In some embodiments, to solve model 2 without a neural network, a stochastic formulation can first be implemented. Given a DMDP instance ($\mathcal{S}$, $\mathcal{A}$, P, γ) and a policy π, the value vector $v^{\pi} \in \mathbb{R}^{|\mathcal{S}|}$ is defined as:
  • $$v_i^{\pi} = \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_{s_t s_{t+1}}(a_t) \,\middle|\, s_0 = i\right], \quad \forall i \in \mathcal{S}.$$
  • The optimal value vector v* is defined as:
  • $$v_i^{*} = \max_{\pi} v_i^{\pi}.$$
  • And a maximizer is known as the optimal policy π*. It is well known that a vector v* is the optimal value vector if and only if it satisfies the Bellman equation:
  • $$v_i^{*} = \max_{a \in \mathcal{A}} \left\{ \sum_{j \in \mathcal{S}} p_{ij}(a)\, r_{ij}(a) + \gamma \sum_{j \in \mathcal{S}} p_{ij}(a)\, v_j^{*} \right\}, \quad \forall i \in \mathcal{S}, \qquad (1)$$
  • where j represents the next state.
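  • For reference only, when p and r are fully known and small, the Bellman equation (1) can be solved by tabular value iteration; the sketch below is such a baseline, not the sampling-based approach of this disclosure.

```python
import numpy as np

# Illustrative sketch: tabular value iteration for the Bellman equation (1),
# assuming known arrays p[i, a, j] and r[i, a, j].
def value_iteration(p, r, gamma=0.9, tol=1e-8, max_iters=10_000):
    n_states = p.shape[0]
    v = np.zeros(n_states)
    for _ in range(max_iters):
        # q[i, a] = sum_j p_ij(a) * r_ij(a) + gamma * sum_j p_ij(a) * v_j
        q = np.einsum("iaj,iaj->ia", p, r) + gamma * np.einsum("iaj,j->ia", p, v)
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            break
        v = v_new
    return v_new, q.argmax(axis=1)

# Example: 2 states, 2 actions, deterministic transitions.
p = np.zeros((2, 2, 2)); p[0, 0, 0] = p[0, 1, 1] = p[1, 0, 0] = p[1, 1, 1] = 1.0
r = np.zeros((2, 2, 2)); r[0, 1, 1] = 0.5; r[1, 1, 1] = 0.8
v_star, policy = value_iteration(p, r)
```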
    Lemma 1. The optimal value vector v* is the minimizer of the following linear program:
  • $$\underset{v}{\text{minimize}} \quad q^{T} v \qquad \text{subject to} \quad (\tilde{I} - \gamma \tilde{P})\, v \ge r, \qquad (2)$$
  • where q is an arbitrary positive distribution, $\tilde{I} \in \mathbb{R}^{|\mathcal{S}||\mathcal{A}| \times |\mathcal{S}|} := [I_{|\mathcal{S}| \times |\mathcal{S}|}; I_{|\mathcal{S}| \times |\mathcal{S}|}; \ldots; I_{|\mathcal{S}| \times |\mathcal{S}|}]$, $\tilde{P} \in \mathbb{R}^{|\mathcal{S}||\mathcal{A}| \times |\mathcal{S}|} := [P_{a_1}; P_{a_2}; \ldots; P_{a_n}]$, $r \in \mathbb{R}^{|\mathcal{S}||\mathcal{A}|} := [r_{a_1}; r_{a_2}; \ldots; r_{a_n}]$, and $r_{a,i} = \sum_{j \in \mathcal{S}} p_{ij}(a)\, r_{ij}(a)$.
    Proof 1. Equation (1) implies that v* is in the feasible set. Any feasible point v satisfies v ≥ v*. Since q is positive, v* is the minimizer.
  • The dual problem of (2) is
  • $$\underset{\mu}{\text{maximize}} \quad r^{T} \mu \qquad \text{subject to} \quad (\tilde{I} - \gamma \tilde{P})^{T} \mu = q, \quad \mu \ge 0,$$
  • where $\mu = [\mu_{a_1}; \mu_{a_2}; \ldots; \mu_{a_n}]$. Every feasible point can recover a stationary randomized policy, and any stationary randomized policy can form a feasible point. Indeed, there exists a bijection between the feasible set and the stationary randomized policy space.
  • If the reward is uniformly bounded, there exists an optimal solution μ* to the dual LP, which can formulate an optimal policy π* in the sense that the value vector under π* equals v*. If μ* is unique, then π* is deterministic; otherwise π* is a randomized policy. Therefore, instead of value iteration or policy iteration, the MDP can be solved in a linear programming approach. In the reinforcement learning context, P and r are unknown and can be very large. Solving the primal LP or the dual LP separately is not ideal since it involves solving a problem with a large set of linear constraints, which cannot be relaxed without changing the original problem. Therefore, a feasible-point problem may be formulated as:
  • $$\underset{}{\text{minimize}} \quad 0 \qquad \text{subject to} \quad \begin{bmatrix} 0 & 0 & (\gamma \tilde{P} - \tilde{I})^{T} & q \\ \gamma \tilde{P} - \tilde{I} & I & 0 & r \\ q^{T} & 0 & -r^{T} & 0 \end{bmatrix} \begin{bmatrix} v \\ w \\ \mu \\ \tau \end{bmatrix} = 0, \qquad w \ge 0, \quad \mu \ge 0, \quad \tau = 1.$$
  • This formulation follows the KKT conditions of the LP. It is denoted that x = [v; w; μ; τ], and a new constraint $x \in \mathcal{C} := \mathcal{V} \times \mathcal{W} \times \mathcal{U} \times \{1\}$ is imposed, where:
  • $$v \in \mathcal{V} := \{v \mid 0 \le v,\ \|v\|_{\infty} \le 1/(1 - \gamma)\},$$
  • $$w \in \mathcal{W} := \{w \mid 0 \le w \le 1\},$$
  • $$\mu \in \mathcal{U} := \{\mu \mid 0 \le \mu,\ e^{T}\mu = 1\}.$$
  • An optimal solution (v*, μ*) is in the constraint set since q is a probability distribution and $r_{a,i}$ is in [0, 1]. The above problem can be relaxed as a convex optimization over a bounded convex constraint set without loss of accuracy:
  • $$\underset{x}{\text{minimize}} \quad \tfrac{1}{2}\|M x\|^{2} \qquad \text{subject to} \quad x \in \mathcal{C}, \qquad (3)$$
  • where $x := [v; w; \mu; \tau]$, $\mathcal{C} := \mathcal{V} \times \mathcal{W} \times \mathcal{U} \times \{1\}$, and M is a full-rank matrix in $\mathbb{R}^{M \times N}$ (with $M = |\mathcal{S}||\mathcal{A}| + |\mathcal{S}| + 1$ and $N = 2|\mathcal{S}||\mathcal{A}| + |\mathcal{S}| + 1$). Since x can never be 0, the solution is always nontrivial. Notice that although M is unknown (due to $\tilde{P}$), it can be represented as the expectation of a random variable ξ (see Lemma 2 for details). Then (3) can be further expressed as a stochastic composition optimization problem:
  • $$\underset{x \in \mathcal{C}}{\text{minimize}} \quad f(x) := \tfrac{1}{2}\|\mathbb{E}(\xi)\, x\|^{2}. \qquad (4)$$
  • Lemma 2. Problem (4) is equivalent to (3) by letting ξ(i, a, j) follow the distribution below:
  • $$\xi(i, a, j) = \begin{bmatrix} 0 & 0 & A(i, a, j)^{T} & \frac{q_i}{\nu(i)} e_i \\ A(i, a, j) & E_{a,i} & 0 & \frac{r_{ij}(a)}{\eta(i, a)} e_{a,i} \\ \frac{q_i}{\nu(i)} e_i^{T} & 0 & -\frac{r_{ij}(a)}{\eta(i, a)} e_{a,i}^{T} & 0 \end{bmatrix} \quad \text{with probability } \eta(i, a)\, p_{ij}(a).$$
  • Here A(i, a, j) is a tall matrix with the same structure as $\tilde{P}$: $|\mathcal{A}|$ blocks (each block of size $|\mathcal{S}| \times |\mathcal{S}|$) arranged vertically, one block per action. There are only two nonzero entries in A(i, a, j): the (i, i)-th entry in the block for action a, which equals $-\tfrac{1}{\eta(i, a)}$, and the (i, j)-th entry in the block for action a, which equals $\tfrac{\gamma}{\eta(i, a)}$. $E_{a,i} \in \mathbb{R}^{|\mathcal{S}||\mathcal{A}| \times |\mathcal{S}||\mathcal{A}|}$ has only one nonzero entry: the (i, i)-th entry in the diagonal block for action a, which equals $\tfrac{1}{\eta(i, a)}$. $e_i$ is a vector in $\mathbb{R}^{|\mathcal{S}|}$ with only one nonzero entry, at the i-th component, whose value is 1, and $e_{a,i}$ is a vector in $\mathbb{R}^{|\mathcal{S}||\mathcal{A}|}$ with only one nonzero entry, at the i-th component of block a, whose value is 1. Finally, $\nu(i) = \sum_{a} \eta(i, a)$.
  • $\eta(i, a)\, p_{ij}(a)$ is the probability of the tuple (i, a, j) being selected, with $\sum_{i, a, j} \eta(i, a)\, p_{ij}(a) = 1$. $\eta(i, a)$ is a distribution that can be imposed artificially, so engaging the value of $\eta(i, a)$ in ξ is not a problem. The only unknown, to-be-learned factor is $p_{ij}(a)$.
  • In some embodiments, four types of alternative sampling-based algorithms can be used to solve problem (4). In the optimization area, an algorithm is usually developed for a class of problems, that is, a very general mathematical form. In order to use an algorithm, one should first write the to-be-solved problem in the correct format. Here, to apply acceleration and variance reduction algorithms, the original form (4) needs to be revised slightly.
  • In some embodiments, there is access to a Sampling Oracle (SO) (described later) which produces samples of ξ as in Lemma 2. Also, two operators are defined as follows:
  • Definition 1. Given a point x := [v; w; μ; τ] ∈ $\mathcal{C}$ and a vector g := [$g_v$; $g_w$; $g_\mu$; $g_\tau$] ∈ $\mathbb{R}^{N}$, the operator $\mathrm{PGKL}_{\mathcal{C}, \eta}: \mathcal{C} \times \mathbb{R}^{N} \to \mathcal{C}$ is defined as $\mathrm{PGKL}_{\mathcal{C}, \eta}(x, g) := [v'; w'; \mu'; 1]$, where
  • $$v' = \underset{\bar{v} \in \mathcal{V}}{\arg\min}\ g_v^{T}(\bar{v} - v) + \tfrac{1}{2\eta}\|\bar{v} - v\|_2^2, \qquad w' = \underset{\bar{w} \in \mathcal{W}}{\arg\min}\ g_w^{T}(\bar{w} - w) + \tfrac{1}{2\eta}\|\bar{w} - w\|_2^2, \qquad \mu' = \underset{\bar{\mu} \in \mathcal{U}}{\arg\min}\ g_\mu^{T}(\bar{\mu} - \mu) + \tfrac{1}{2\eta}\, \mathcal{D}_{KL}(\bar{\mu}; \mu).$$
  • Definition 2. Given a point x := [v; w; μ; τ] ∈ $\mathcal{C}$ and a vector g := [$g_v$; $g_w$; $g_\mu$; $g_\tau$] ∈ $\mathbb{R}^{N}$, the operator $\mathrm{PGSP}_{\mathcal{C}, \eta}: \mathcal{C} \times \mathbb{R}^{N} \to \mathcal{C}$ is defined as $\mathrm{PGSP}_{\mathcal{C}, \eta}(x, g) := [v'; w'; \mu'; 1]$, where
  • $$v' = \underset{\bar{v} \in \mathcal{V}}{\arg\min}\ g_v^{T}(\bar{v} - v) + \tfrac{1}{2\eta}\|\bar{v} - v\|_2^2, \qquad w' = \underset{\bar{w} \in \mathcal{W}}{\arg\min}\ g_w^{T}(\bar{w} - w) + \tfrac{1}{2\eta}\|\bar{w} - w\|_2^2, \qquad \mu' = \underset{\bar{\mu} \in \mathcal{U}}{\arg\min}\ g_\mu^{T}(\bar{\mu} - \mu) + \tfrac{1}{2\eta}\|\bar{\mu} - \mu\|_2^2.$$
  • In some embodiments, the first algorithm used for solving problem (4) is Accelerated Stochastic Composition Gradient Descent (ASCGD). ASCGD targets problems of the form:
  • $$\underset{x \in \mathcal{C}}{\text{minimize}} \quad \mathbb{E}\, f\big(\mathbb{E}\, g(x)\big).$$
  • Problem (4) follows this pattern. But compared with the general form, (4) has two characteristics: the inner function $\mathbb{E}[\xi]x$ is linear, and the outer function $\|\cdot\|_2^2$ is deterministic (with no expectation).
  • The first algorithm, ASC-RL (Accelerated Stochastic Composition algorithm for Reinforcement Learning), summarized in Algorithm 1, is based on ASCGD.
  • Algorithm 1 ASC-RL
    1: Initialize x_0 = z_0 = y_0 = 0, sample ξ_0 from the SO;
    2: for k = 0 to K − 1 do
    3:   x_{k+1} = PGKL_{C,η_k}(x_k, ξ_k^T y_k);
    4:   z_{k+1} = (1 − 1/β_k) x_k + (1/β_k) x_{k+1};
    5:   sample ξ_{k+1} from the SO;
    6:   y_{k+1} = (1 − β_k) y_k + β_k ξ_{k+1} z_{k+1};
    7: end for
    8: return x̂ := (2/K) Σ_{k=K/2}^{K} x_k.
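  • For illustration only, a rough Python sketch of the iteration in Algorithm 1; sample_xi() stands in for the Sampling Oracle of Lemma 2, and the variable splits, the initialization of μ on the simplex, and the step-size schedules are assumptions rather than values taken from the disclosure.

```python
import numpy as np

# Illustrative projected-gradient step over C = V x W x U x {1} (PGKL-style):
# clipped Euclidean steps on V and W, and a KL (mirror) step on the simplex U.
def pgkl_step(x, g, eta, n_v, n_w, n_mu, gamma=0.9):
    v, w, mu = x[:n_v], x[n_v:n_v + n_w], x[n_v + n_w:n_v + n_w + n_mu]
    gv, gw, gmu = g[:n_v], g[n_v:n_v + n_w], g[n_v + n_w:n_v + n_w + n_mu]
    v_new = np.clip(v - eta * gv, 0.0, 1.0 / (1.0 - gamma))
    w_new = np.clip(w - eta * gw, 0.0, 1.0)
    mu_new = mu * np.exp(-eta * gmu)
    mu_new /= mu_new.sum()
    return np.concatenate([v_new, w_new, mu_new, [1.0]])

# Illustrative main loop following Algorithm 1 (ASC-RL).
def asc_rl(sample_xi, n_v, n_w, n_mu, K=1000):
    dim = n_v + n_w + n_mu + 1
    x = np.zeros(dim)
    x[n_v + n_w:n_v + n_w + n_mu] = 1.0 / n_mu  # start mu on the simplex (assumption)
    x[-1] = 1.0
    xi = sample_xi()              # unbiased sample of the matrix M
    y = np.zeros(xi.shape[0])     # running estimate of E[xi] x
    iterates = []
    for k in range(K):
        eta_k = 1.0 / np.sqrt(k + 1)   # placeholder step sizes
        beta_k = 1.0 / np.sqrt(k + 1)
        x_next = pgkl_step(x, xi.T @ y, eta_k, n_v, n_w, n_mu)
        z = (1.0 - 1.0 / beta_k) * x + (1.0 / beta_k) * x_next  # extrapolation
        xi = sample_xi()
        y = (1.0 - beta_k) * y + beta_k * (xi @ z)
        x = x_next
        iterates.append(x)
    return np.mean(iterates[K // 2:], axis=0)  # average of the last half of iterates
```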
  • In some embodiments, the second algorithm used for solving problem (4) is the Stochastic Accelerated GradiEnt (SAGE) method. This algorithm targets problems of the form:
  • $$\underset{x \in \mathcal{C}}{\text{minimize}} \quad \mathbb{E}\, f(x).$$
  • The original problem (4) does not belong to this group because the expectation is not on the outside. However, problem (4) can be transformed into this form by introducing two i.i.d. random variables ξ¹, ξ² which follow the same distribution as in Lemma 2. Then the new problem is:
  • $$\underset{x \in \mathcal{C}}{\text{minimize}} \quad f(x) := \mathbb{E}\left[\tfrac{1}{2}\, x^{T} (\xi^{1})^{T} \xi^{2} x\right]. \qquad (5)$$
  • The second algorithm, ASGD-RL, is summarized in Algorithm 2; it solves (5) based on an accelerated projected stochastic gradient descent algorithm called SAGE.
  • Algorithm 2 ASGD-RL
    1: Initialize x_0 = z_0 = y_0 ∈ C;
    2: for k = 0 to K − 1 do
    3:   sample ξ_k^1, ξ_k^2 from the SO;
    4:   x_{k+1} = (1 − α_k) y_k + α_k z_k;
    5:   y_{k+1} = PGSP_{C,β_k}(x_{k+1}, ((ξ_k^1)^T ξ_k^2 + (ξ_k^2)^T ξ_k^1)/2 · x_{k+1});
    6:   z_{k+1} = z_k − x_{k+1} + y_{k+1};
    7: end for
    8: return y_K.
  • In some embodiments, the third algorithm used for solving problem (4) is based on Katyusha-X. Katyusha solves problems whose objective function is the sum of a group of functions, with the sum being convex:
  • $$\underset{x \in \mathcal{C}}{\text{minimize}} \quad \frac{1}{n} \sum_{i=1}^{n} f_i(x).$$
  • In order to use this algorithm, the real expectation is approximated by the average of a group of samples; thus, problem (4) is rewritten as:
  • $$\underset{x \in \mathcal{C}}{\text{minimize}} \quad f(x) := \tfrac{1}{2}\, x^{T} \Big(\tfrac{1}{n}\textstyle\sum_{i=1}^{n} \xi_i\Big)^{T} \Big(\tfrac{1}{n}\textstyle\sum_{i=1}^{n} \xi_i\Big)\, x = \frac{1}{n^2} \sum_{i=1, j=1}^{n} \tfrac{1}{2}\, x^{T} \xi_i^{T} \xi_j\, x.$$
  • The third algorithm is shown in Algorithm 3.
  • Algorithm 3 SAA-RL-I
     1: Initialize x_0 = y_0 = y_{−1}, n samples {ξ_1, ξ_2, …, ξ_n}, batch size b;
     2: for k = 0 to K − 1 do
     3:   x_{k+1} = ((3k + 1) y_k + (k + 1) x_k − (2k − 2) y_{k−1}) / (2k + 4);
     4:   g = ∇f(x_{k+1});
     5:   w_0 = x_{k+1};
     6:   for t = 0 to T := n/(2b) do
     7:     let S_t be b i.i.d. uniform random index pairs (i, j) from ([n], [n]);
     8:     ∇̃_t = g + (1/b) Σ_{(i,j)∈S_t} (((ξ_i^T ξ_j + ξ_j^T ξ_i)/2) (w_t − x_{k+1}));
     9:     w_{t+1} = PGSP_{C,η}(w_t, ∇̃_t);
    10:   end for
    11:   y_k ← y_{k+1};
    12:   y_{k+1} ← w_T;
    13: end for
    14: return y_K.
  • In some embodiments, the fourth algorithm used for solving problem (4) is Prox-SVRG. This algorithm considers problems of the form:
  • $$\underset{x \in \mathcal{C}}{\text{minimize}} \quad \frac{1}{n} \sum_{i=1}^{n} f_i(x),$$
  • which is very similar to the form used for Algorithm 3, except that each f_i is not required to be convex. To apply this algorithm, problem (4) can be rewritten as:
  • $$\underset{x \in \mathcal{C}}{\text{minimize}} \quad f(x) := \frac{1}{n} \sum_{i=1}^{n} \tfrac{1}{2}\, x^{T} (\xi_i^{1})^{T} \xi_i^{2}\, x.$$
  • The fourth algorithm SAA-RL-II is presented in Algorithm 4.
  • Algorithm 4 SAA-RL-II
     1: Initialize x_0 = w_0, n pairs of samples {(ξ_1^1, ξ_1^2), (ξ_2^1, ξ_2^2), …, (ξ_n^1, ξ_n^2)}, batch size b;
     2: for k = 0 to K − 1 do
     3:   g ← ∇f(x_k);
     4:   for t = 0 to T do
     5:     uniformly randomly pick I_t ⊂ {1, …, n} (with replacement) such that |I_t| = b;
     6:     ∇̃_t ← g + (1/b) Σ_{i∈I_t} ((((ξ_i^1)^T ξ_i^2 + (ξ_i^2)^T ξ_i^1)/2) (w_t − x_k));
     7:     w_{t+1} = PGSP_{C,η}(w_t, ∇̃_t);
     8:   end for
     9:   x_{k+1} ← w_T;
    10: end for
    11: return x_K.
  • As such, there are multiple types of algorithms for solving the DMDP after transformation to an appropriate formulation. Different algorithms have different advantages and disadvantages, which are often determined by the structure of the specific problem and the data. Given a DMDP, a flexible formulation makes it possible to choose the best algorithm for the specific case.
  • In some embodiments, a Sampling Oracle (e.g., a database) may be constructed from a dataset. First of all, the SO is supposed to take a state-action pair (s, a) as input and produce a state-reward pair (s′, r) as output, where s′ represents a next state starting from s with action a and r := r_{ss′}(a). By the state transition rule above for model 2, all the information listed in Table 2 is needed.
  • In some embodiments, in the dataset, each instance contains: order ID, driver ID, pick-up time, pick-up latitude, pick-up longitude, drop-off time, drop-off latitude, drop-off longitude. All times and locations may be simplified to discretized time indices and hex-cell indices. Then, the following information can be obtained by the respective methods:
      • T_drive(t, h, h′): use a random forest classifier to build an estimator of the cruise time. The features are the pick-up time and the Euclidean distance (or l1-norm) between the latitude-longitude coordinates of the centers of h and h′. The label is the drop-off time minus the pick-up time.
      • f(t, h, h′): proportional to T_drive(t, h, h′) with minor noise.
      • P_yesvt: 0.4
  • $$P_{des}(h' \mid t, h) = \frac{\#\{\text{historical orders starting at } (t, h) \text{ and going to } h'\}}{\#\{\text{historical orders starting at } (t, h)\}}, \qquad n(t, h) = \max\left(0,\ \min\left(5,\ \frac{\#\{\text{historical orders starting at } (t, h)\} + \text{noise}}{\#\{\text{available drivers at } (t, h)\}}\right)\right),$$
  • where
  • $$\#\{\text{available drivers at } (t, h)\} = \#\{\text{drivers starting orders at } (t, h)\} + \#\{\text{drivers finishing orders at } (t, h)\} - \#\{\text{drivers finishing and starting orders at } (t, h)\} + \text{noise}.$$
  • After obtaining this information, the next state and the reward can be obtained following the rules described above for the state transition of model 2.
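  • For illustration only, a sketch of the cruise-time estimator and the n(t, h) estimate described above; the field names, the use of scikit-learn, and the label measured in time-slot units are assumptions about the dataset layout rather than requirements of the disclosure.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Illustrative sketch: T_drive(t, h, h') estimator trained on historical trips.
def build_cruise_time_estimator(trips):
    """trips: iterable of dicts with pick-up/drop-off time slots and cell centers."""
    features, labels = [], []
    for trip in trips:
        dist = np.linalg.norm(np.asarray(trip["pickup_center"]) -
                              np.asarray(trip["dropoff_center"]))
        features.append([trip["pickup_slot"], dist])
        labels.append(trip["dropoff_slot"] - trip["pickup_slot"])  # cruise time in slots
    # A deployed estimator would then call model.predict([[pickup_slot, distance]]).
    return RandomForestClassifier(n_estimators=100).fit(features, labels)

# Illustrative sketch: n(t, h) = max(0, min(5, (#orders + noise) / #drivers)).
def estimate_n(num_orders, num_drivers, noise=0.0, cap=5):
    if num_drivers <= 0:
        return 0
    return int(max(0, min(cap, (num_orders + noise) / num_drivers)))
```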
  • FIG. 4A illustrates a flowchart of an exemplary method 400, according to various embodiments of the present disclosure. The method 400 may be implemented in various environments including, for example, the environment 100 of FIG. 1. The exemplary method 400 may be implemented by one or more components of the system 102 (e.g., the processor 104, the memory 106). The exemplary method 400 may be implemented by multiple systems similar to the system 102. The operations of method 400 presented below are intended to be illustrative. Depending on the implementation, the exemplary method 400 may include additional, fewer, or alternative steps performed in various orders or in parallel. The model in this figure may be referred to as model 1, described above. Model 1 may be solved by a neural network (e.g., a DQN) with machine learning techniques.
  • Block 402 comprises obtaining a current location of a current vehicle from a computing device associated with the current vehicle. Block 403 comprises obtaining a current list of available orders nearby based on the current location. Block 404 comprises feeding the current location, the current list of available orders nearby, and a current time to a trained Markov Decision Process (MDP) model to obtain action information, the action information being repositioning the current vehicle to another current location or completing a current ride order by the current vehicle. Block 405 comprises transmitting the generated action information to the computing device to cause the current vehicle to reposition to the another current location, stay at the current location, or accept the current ride order by proceeding to a pick-up location of the current ride order. The MDP model is trained based on a plurality of historical or simulated vehicle trips under a policy of maximizing a cumulative reward for a training vehicle completing the historical or simulated vehicle trips. The MDP model discretizes a region into repeating zones and a time period into time slots. Each state of the MDP model comprises: a time represented by a time slot index, a location represented by a repeating zone index, and a list of available orders nearby represented by repeating zone indices of destinations of the available orders nearby. Each action of the MDP model comprises: completing one of the available orders nearby from the list, repositioning to another location, or staying at the location. If the training vehicle completes one of the available orders nearby from the list, the training vehicle gets a fare for completing the one available order nearby as the reward, and the state transitions to a next state comprising: a next time corresponding to completion of the one available order nearby, a next location corresponding to a destination of the one available order nearby, and a next list of available orders nearby corresponding to the next location. If the training vehicle repositions to the another location, the training vehicle gets no reward, and the state transitions to a next state comprising: a next time corresponding to reaching the another location, the another location, and a next list of available orders nearby corresponding to the another location. If the training vehicle stays at the location, the training vehicle gets no reward, and the state transitions to a next state comprising: the time, the location, and the list of available orders nearby.
  • In some embodiments, for training the MDP model, the location and pick-up locations of the available orders nearby are in a same repeating zone. For application of the trained MDP model, the current location and a pick-up location of the current ride order are within a same repeating zone.
  • In some embodiments, for training the MDP model, all orders with pick-up locations in a repeating zone corresponding to the location are divided averagely and randomly among all vehicles in the repeating zone corresponding to the location to obtain the list of available orders nearby for the training vehicle. For application of the trained MDP model, all current orders with pick-up locations in a repeating zone corresponding to the current location are divided averagely and randomly among all current vehicles in the repeating zone corresponding to the current location to obtain the current list of available orders nearby for the current vehicle.
  • In some embodiments, for training the MDP model, the list of available orders nearby for the training vehicle is a ceiling function of a division of the all orders with pick-up locations in the repeating zone corresponding to the location by the all vehicles in the repeating zone corresponding to the location. For example, the list may comprise five available orders nearby. For application of the trained MDP model, the current list of available orders nearby for the current vehicle is a ceiling function of a division of the all current orders with pick-up locations in the repeating zone corresponding to the current location by the all current vehicles in the repeating zone corresponding to the current location. For example, the list may comprise five available orders nearby.
  • FIG. 4B illustrates a flowchart of an exemplary method 410, according to various embodiments of the present disclosure. The method 410 may be implemented in various environments including, for example, the environment 100 of FIG. 1. The exemplary method 410 may be implemented by one or more components of the system 102 (e.g., the processor 104, the memory 106). The exemplary method 410 may be implemented by multiple systems similar to the system 102. The operations of method 410 presented below are intended to be illustrative. Depending on the implementation, the exemplary method 410 may include additional, fewer, or alternative steps performed in various orders or in parallel. The model in this figure may be referred to as model 2, described above. Model 2 may be solved by a neural network with machine learning techniques or by applying the acceleration and variance reduction algorithms described earlier. That is, solving the MDP model below may comprise solving the MDP model based on applying acceleration and variance reduction algorithms to a tabular implementation.
  • Block 412 comprises obtaining a current location of a current vehicle from a computing device associated with the current vehicle. Block 413 comprises obtaining a current number of available orders nearby in a current list based on the current location. Block 414 comprises feeding the current location, the current number of available orders nearby, and a current time to a solved Markov Decision Process (MDP) model to obtain action information, the action information being repositioning the current vehicle to another current location or completing a current ride order by the current vehicle. Block 415 comprises transmitting the generated action information to the computing device to cause the current vehicle to reposition to the another current location, stay at the current location, or accept the current ride order by proceeding to a pick-up location of the current ride order. The MDP model is solved based on a plurality of historical or simulated vehicle trips under a policy of maximizing a cumulative reward for a hypothetical vehicle completing the historical or simulated vehicle trips. The MDP model discretizes a region into repeating zones and a time period into time slots. Each state of the MDP model comprises: a time represented by a time slot index, a location represented by a repeating zone index, and a number of available orders nearby in a list, the available orders nearby represented by repeating zone indices of destinations of the available orders nearby. Each action of the MDP model comprises: completing one of the available orders nearby from the list, repositioning to another location, or staying at the location. If the hypothetical vehicle completes one of the available orders nearby from the list, the hypothetical vehicle gets a fare for completing the one available order nearby as the reward, and the state transitions to a next state comprising: a next time corresponding to completion of the one available order nearby, a next location corresponding to a destination of the one available order nearby, and a next number of available orders nearby in a next list corresponding to the next location. If the hypothetical vehicle repositions to the another location, the hypothetical vehicle gets no reward, and the state transitions to a next state comprising: a next time corresponding to reaching the another location, the another location, and a next number of available orders nearby in a next list corresponding to the another location. If the hypothetical vehicle stays at the location, the hypothetical vehicle gets no reward, and the state transitions to a next state comprising: the time, the location, and the number of available orders nearby in the list.
  • In some embodiments, for solving the MDP model, the location and pick-up locations of the available orders nearby are in a same repeating zone. For application of the solved MDP model, the current location and a pick-up location of the current ride order are within a same repeating zone.
  • In some embodiments, for solving the MDP model, all orders with pick-up locations in a repeating zone corresponding to the location are divided averagely and randomly among all vehicles in the repeating zone corresponding to the location to obtain the number of available orders nearby for the hypothetical vehicle. For application of the solved MDP model, all current orders with pick-up locations in a repeating zone corresponding to the current location are divided averagely and randomly among all current vehicles in the repeating zone corresponding to the current location to obtain the current number of available orders nearby for the current vehicle.
  • In some embodiments, for solving the MDP model, the number of available orders nearby for the hypothetical vehicle is a ceiling function of a division of the all orders with pick-up locations in the repeating zone corresponding to the location by the all vehicles in the repeating zone corresponding to the location. For example, the number of available orders nearby may be five. For application of the solved MDP model, the current number of available orders nearby for the current vehicle is a ceiling function of a division of the all current orders with pick-up locations in the repeating zone corresponding to the current location by the all current vehicles in the repeating zone corresponding to the current location. For example, the current number of available orders nearby may be five.
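  • A short sketch of the per-vehicle order count described above, assuming Python; the function name and arguments are illustrative placeholders. With, say, 14 open orders shared by 3 vehicles in the same repeating zone, each vehicle would see ceil(14 / 3) = 5 available orders nearby.

```python
import math

def orders_per_vehicle(orders_in_zone: int, vehicles_in_zone: int) -> int:
    """Ceiling of open orders divided by vehicles in the same repeating zone."""
    if vehicles_in_zone <= 0:
        return 0  # assumption: no vehicles in the zone, so no per-vehicle count
    return math.ceil(orders_in_zone / vehicles_in_zone)

# Example: 14 orders shared by 3 vehicles -> 5 available orders nearby per vehicle.
assert orders_per_vehicle(14, 3) == 5
```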
  • In some embodiments, solving the MDP model based on the plurality of historical or simulated vehicle trips comprises: obtaining data for each of the historical vehicle trips, the data comprising a historical pick-up time, a historical pick-up location, a historical drop-off time, and a historical drop-off location; training a random forest classifier with the historical pick-up time, the historical pick-up location, and the historical drop-off location as training data and with the historical drop-off time minus the historical pick-up time as the label to build a cruise time estimator, the cruise time estimator estimating a time to reach a destination based on the time, the location, and the destination of the one available order nearby, or based on the time, the location, and the another location; and applying the cruise time estimator in each state transition to determine the next time corresponding to completion of the one available order nearby or to determine the next time corresponding to reaching the another location.
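  • A possible sketch of the cruise time estimator, assuming Python with scikit-learn and a hypothetical trip-record layout; because times are discretized into slots, the duration label is an integer slot count, which a random forest classifier can fit directly. The field names (pickup_slot, pickup_zone, dropoff_slot, dropoff_zone) are placeholders, not the patent's data schema.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def build_cruise_time_estimator(trips):
    """Fit a cruise-time estimator from historical trip records.

    Each trip is assumed to be a dict with integer fields
    'pickup_slot', 'pickup_zone', 'dropoff_slot', 'dropoff_zone'.
    Features: pick-up time, pick-up location, drop-off location.
    Label: drop-off time minus pick-up time (trip duration in slots).
    """
    X = np.array([[t["pickup_slot"], t["pickup_zone"], t["dropoff_zone"]] for t in trips])
    y = np.array([t["dropoff_slot"] - t["pickup_slot"] for t in trips])
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X, y)

    def estimate(time_slot: int, origin_zone: int, dest_zone: int) -> int:
        """Estimated number of slots to travel from origin_zone to dest_zone."""
        return int(clf.predict([[time_slot, origin_zone, dest_zone]])[0])

    return estimate
```

In each state transition, such an estimator would supply the next time slot after completing an order or after repositioning to another location.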
  • The techniques described herein are implemented by one or more special-purpose computing devices. The special-purpose computing devices may be desktop computer systems, server computer systems, portable computer systems, handheld devices, networking devices, or any other device or combination of devices that incorporate hard-wired and/or program logic to implement the techniques. Computing device(s) are generally controlled and coordinated by operating system software. Conventional operating systems control and schedule computer processes for execution, perform memory management, provide file system, networking, and I/O services, and provide user interface functionality, such as a graphical user interface (“GUI”), among other things.
  • FIG. 5 is a block diagram that illustrates a computer system 500 upon which any of the embodiments described herein may be implemented. The system 500 may correspond to the system 102 or 103 described above. The computer system 500 includes a bus 502 or other communication mechanism for communicating information, and one or more hardware processors 504 coupled with bus 502 for processing information. Hardware processor(s) 504 may be, for example, one or more general purpose microprocessors. The processor(s) 504 may correspond to the processor 104 described above.
  • The computer system 500 also includes a main memory 506, such as a random access memory (RAM), cache and/or other dynamic storage devices, coupled to bus 502 for storing information and instructions to be executed by processor 504. Main memory 506 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 504. Such instructions, when stored in storage media accessible to processor 504, render computer system 500 into a special-purpose machine that is customized to perform the operations specified in the instructions. The computer system 500 further includes a read only memory (ROM) 508 or other static storage device coupled to bus 502 for storing static information and instructions for processor 504. A storage device 510, such as a magnetic disk, optical disk, or USB thumb drive (Flash drive), etc., is provided and coupled to bus 502 for storing information and instructions. The main memory 506, the ROM 508, and/or the storage 510 may correspond to the memory 106 described above.
  • The computer system 500 may implement the techniques described herein using customized hard-wired logic, one or more ASICs or FPGAs, firmware and/or program logic which in combination with the computer system causes or programs computer system 500 to be a special-purpose machine. According to one embodiment, the operations, methods, and processes described herein are performed by computer system 500 in response to processor(s) 504 executing one or more sequences of one or more instructions contained in main memory 506. Such instructions may be read into main memory 506 from another storage medium, such as storage device 510. Execution of the sequences of instructions contained in main memory 506 causes processor(s) 504 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.
  • The main memory 506, the ROM 508, and/or the storage 510 may include non-transitory storage media. The term “non-transitory media,” and similar terms, as used herein refers to media that store data and/or instructions that cause a machine to operate in a specific fashion; such media exclude transitory signals. Such non-transitory media may comprise non-volatile media and/or volatile media. Non-volatile media includes, for example, optical or magnetic disks, such as storage device 510. Volatile media includes dynamic memory, such as main memory 506. Common forms of non-transitory media include, for example, a floppy disk, a flexible disk, a hard disk, a solid state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, any other memory chip or cartridge, and networked versions of the same.
  • The computer system 500 also includes a network interface 518 coupled to bus 502. Network interface 518 provides a two-way data communication coupling to one or more network links that are connected to one or more local networks. For example, network interface 518 may be an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, network interface 518 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN (or a WAN component to communicate with a WAN). Wireless links may also be implemented. In any such implementation, network interface 518 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
  • The computer system 500 can send messages and receive data, including program code, through the network(s), network link and network interface 518. In the Internet example, a server might transmit a requested code for an application program through the Internet, the ISP, the local network and the network interface 518.
  • The received code may be executed by processor 504 as it is received, and/or stored in storage device 510 or other non-volatile storage for later execution.
  • Each of the processes, methods, and algorithms described in the preceding sections may be embodied in, and fully or partially automated by, code modules executed by one or more computer systems or computer processors comprising computer hardware. The processes and algorithms may be implemented partially or wholly in application-specific circuitry.
  • The various features and processes described above may be used independently of one another, or may be combined in various ways. All possible combinations and sub-combinations are intended to fall within the scope of this disclosure. In addition, certain method or process blocks may be omitted in some implementations. The methods and processes described herein are also not limited to any particular sequence, and the blocks or states relating thereto can be performed in other sequences that are appropriate. For example, described blocks or states may be performed in an order other than that specifically disclosed, or multiple blocks or states may be combined in a single block or state. The exemplary blocks or states may be performed in serial, in parallel, or in some other manner. Blocks or states may be added to or removed from the disclosed exemplary embodiments. The exemplary systems and components described herein may be configured differently than described. For example, elements may be added to, removed from, or rearranged compared to the disclosed exemplary embodiments.
  • The various operations of exemplary methods described herein may be performed, at least partially, by an algorithm. The algorithm may be comprised in program codes or instructions stored in a memory (e.g., a non-transitory computer-readable storage medium described above). Such algorithm may comprise a machine learning algorithm. In some embodiments, a machine learning algorithm may not explicitly program computers to perform a function, but can learn from training data to make a prediction model that performs the function.
  • The various operations of exemplary methods described herein may be performed, at least partially, by one or more processors that are temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, such processors may constitute processor-implemented engines that operate to perform one or more operations or functions described herein.
  • Similarly, the methods described herein may be at least partially processor-implemented, with a particular processor or processors being an example of hardware. For example, at least some of the operations of a method may be performed by one or more processors or processor-implemented engines. Moreover, the one or more processors may also operate to support performance of the relevant operations in a “cloud computing” environment or as a “software as a service” (SaaS). For example, at least some of the operations may be performed by a group of computers (as examples of machines including processors), with these operations being accessible via a network (e.g., the Internet) and via one or more appropriate interfaces (e.g., an Application Program Interface (API)).
  • The performance of certain of the operations may be distributed among the processors, not only residing within a single machine, but deployed across a number of machines. In some exemplary embodiments, the processors or processor-implemented engines may be located in a single geographic location (e.g., within a home environment, an office environment, or a server farm). In other exemplary embodiments, the processors or processor-implemented engines may be distributed across a number of geographic locations.
  • Throughout this specification, plural instances may implement components, operations, or structures described as a single instance. Although individual operations of one or more methods are illustrated and described as separate operations, one or more of the individual operations may be performed concurrently, and nothing requires that the operations be performed in the order illustrated. Structures and functionality presented as separate components in exemplary configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements fall within the scope of the subject matter herein.
  • Although an overview of the subject matter has been described with reference to specific exemplary embodiments, various modifications and changes may be made to these embodiments without departing from the broader scope of embodiments of the present disclosure. Such embodiments of the subject matter may be referred to herein, individually or collectively, by the term “invention” merely for convenience and without intending to voluntarily limit the scope of this application to any single disclosure or concept if more than one is, in fact, disclosed.
  • The embodiments illustrated herein are described in sufficient detail to enable those skilled in the art to practice the teachings disclosed. Other embodiments may be used and derived therefrom, such that structural and logical substitutions and changes may be made without departing from the scope of this disclosure. The Detailed Description, therefore, is not to be taken in a limiting sense, and the scope of various embodiments is defined only by the appended claims, along with the full range of equivalents to which such claims are entitled.
  • Any process descriptions, elements, or blocks in the flow diagrams described herein and/or depicted in the attached figures should be understood as potentially representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps in the process. Alternate implementations are included within the scope of the embodiments described herein in which elements or functions may be deleted, executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those skilled in the art.
  • As used herein, the term “or” may be construed in either an inclusive or exclusive sense. Moreover, plural instances may be provided for resources, operations, or structures described herein as a single instance. Additionally, boundaries between various resources, operations, engines, and data stores are somewhat arbitrary, and particular operations are illustrated in a context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within a scope of various embodiments of the present disclosure. In general, structures and functionality presented as separate resources in the exemplary configurations may be implemented as a combined structure or resource. Similarly, structures and functionality presented as a single resource may be implemented as separate resources. These and other variations, modifications, additions, and improvements fall within a scope of embodiments of the present disclosure as represented by the appended claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
  • Conditional language, such as, among others, “can,” “could,” “might,” or “may,” unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain embodiments include, while other embodiments do not include, certain features, elements and/or steps. Thus, such conditional language is not generally intended to imply that features, elements and/or steps are in any way required for one or more embodiments or that one or more embodiments necessarily include logic for deciding, with or without user input or prompting, whether these features, elements and/or steps are included or are to be performed in any particular embodiment.

Claims (21)

1.-20. (canceled)
21. A computer-implemented method for ride order dispatching, comprising:
obtaining a current location of a current vehicle from a computing device associated with the current vehicle;
obtaining a current list of available orders based on the current location;
feeding the current location, the current list of available orders or a number of available orders in the current list, and a current time to a solved Markov Decision Process (MDP) model to obtain action information, the action information comprising: staying at the current location, repositioning the current vehicle to another current location, or completing a current ride order by the current vehicle; and
transmitting the action information to the computing device associated with the current vehicle, wherein the current vehicle is configured to stay or relocate based on the action information, wherein:
the MDP model is solved based on a plurality of historical or simulated vehicle trips under a policy of maximizing a cumulative reward for a hypothetical vehicle completing the historical or simulated vehicle trips.
22. The method of claim 21, further comprising solving the MDP model,
wherein the solving comprises discretizing an activity area of the hypothetical vehicle into repeating zones and a time period into time slots.
23. The method of claim 22, wherein:
each state of the MDP model comprises:
a time represented by a time slot index,
a location represented by a repeating zone index, and
a list of available orders represented by repeating zone indices of destinations of the available orders, and
each action of the MDP model comprises: completing one of the available orders from the list, repositioning to another location, or staying at the location.
24. The method of claim 23, wherein the solving further comprises:
if the hypothetical vehicle completes one of the available orders from the list, assigning the hypothetical vehicle a reward for completing the one available order, and updating the state of the MDP model with: a next time slot corresponding to completion of the one available order, a next location corresponding to a destination of the one available order, and a next list of available orders corresponding to the next location,
if the hypothetical vehicle repositions to the another location, skipping assigning the hypothetical vehicle the reward, and updating the state of the MDP model with: a next time corresponding to reaching the another location, the another location, and a next list of available orders corresponding to the another location, and
if the hypothetical vehicle stays at the location, skipping assigning the hypothetical vehicle the reward, and updating the state of the MDP model with: the time slot, the location, and the list of available orders.
25. The method of claim 22, wherein:
each state of the MDP model comprises:
a time represented by a time slot index,
a location represented by a repeating zone index, and
an integer representing the number of available orders, and
each action of the MDP model comprises: completing one of the available orders from the list, repositioning to another location, or staying at the location.
26. The method of claim 25, wherein the solving further comprises:
if the hypothetical vehicle completes one of the available orders from the list, assigning the hypothetical vehicle a reward for completing the one available order, and updating the state of the MDP model with: a next time slot corresponding to a completion of the one available order, a next location corresponding to a destination of the one available order, and a next integer representing a quantity of available orders corresponding to the next location,
if the hypothetical vehicle repositions to the another location, skipping assigning the hypothetical vehicle the reward, and updating the state of the MDP model with: a next time corresponding to reaching the another location, the another location, and another integer representing a quantity of available orders corresponding to the another location, and
if the hypothetical vehicle stays at the location, skipping assigning the hypothetical vehicle the reward, and updating the state of the MDP model with: the time slot, the location, and the integer.
27. The method of claim 26, wherein:
the MDP model is solved by solving a stochastic formulation;
for solving the stochastic formulation, the location and pick-up locations of the available orders are in a same repeating zone; and
for application of the solved MDP model, a current location of a current vehicle and a pick-up location of the current ride order are within a same repeating zone.
28. The method of claim 27, wherein:
for solving the stochastic formulation, all orders with pick-up locations in a repeating zone corresponding to the location are divided averagely and randomly among all vehicles in the repeating zone corresponding to the location to obtain the number of available orders for the hypothetical vehicle; and
for application of the solved MDP model, all current orders with pick-up locations in a repeating zone corresponding to the current location are divided averagely and randomly among all current vehicles in the repeating zone corresponding to the current location to obtain the current number of available orders for the current vehicle.
29. The method of claim 27, wherein:
for solving the stochastic formulation, the number of available orders for the hypothetical vehicle is a ceiling function of a division of the all orders with pick-up locations in the repeating zone corresponding to the location by the all vehicles in the repeating zone corresponding to the location; and
for application of the solved MDP model, the current number of available orders for the current vehicle is a ceiling function of a division of the all current orders with pick-up locations in the repeating zone corresponding to the current location by the all current vehicles in the repeating zone corresponding to the current location.
30. The method of claim 26, wherein the MDP model is solved by applying acceleration and variance reduction algorithms to a tabular implementation.
31. The method of claim 26, further comprising solving the MDP model based on the plurality of historical or simulated vehicle trips, wherein the solving comprises:
obtaining data for each of the historical vehicle trips, the data comprising: a historical pick-up time, a historical pick-up location, a historical drop-off time, and a historical drop-off location;
training a random forest classifier with the historical pick-up time, the historical pick-up location, and the historical drop-off location as training data and with the historical drop-off time minus the historical pick-up time as label to build a cruise time estimator, wherein: the cruise time estimator estimates a time to reach a destination based on the time, the location, and the destination of the one available order, or based on the time, the location, and the another location; and
applying the cruise time estimator in each state transition to determine the next time corresponding to a completion of the one available order nearby or to determine the next time corresponding to reaching the another location.
32. The method of claim 24, wherein:
the MDP model is solved by training an untrained MDP model;
for training the MDP model, all orders with pick-up locations in a repeating zone corresponding to the location are divided averagely and randomly among all vehicles in the repeating zone corresponding to the location to obtain the list of available orders for the hypothetical vehicle; and
for application of the trained MDP model, all current orders with pick-up locations in a repeating zone corresponding to the current location are divided averagely and randomly among all current vehicles in the repeating zone corresponding to the current location to obtain the current list of available orders for the current vehicle.
33. The method of claim 32, wherein:
for training the MDP model, the list of available orders for the hypothetical vehicle is a ceiling function of a division of the all orders with pick-up locations in the repeating zone corresponding to the location by the all vehicles in the repeating zone corresponding to the location; and
for application of the solved MDP model, the current list of available orders for the current vehicle is a ceiling function of a division of the all current orders with pick-up locations in the repeating zone corresponding to the current location by the all current vehicles in the repeating zone corresponding to the current location.
34. A system comprising a processor and a non-transitory computer-readable storage medium storing instructions that, when executed by the processor, cause the system to perform operations comprising:
obtaining a current location of a current vehicle from a computing device associated with the current vehicle;
obtaining a current list of available orders based on the current location;
feeding the current location, the current list of available orders or a number of available orders in the current list, and a current time to a solved Markov Decision Process (MDP) model to obtain action information, the action information comprising: staying at the current location, repositioning the current vehicle to another current location, or completing a current ride order by the current vehicle; and
transmitting the action information to the computing device associated with the current vehicle, wherein the current vehicle is configured to stay or relocate based on the action information, wherein:
the MDP model is solved based on a plurality of historical or simulated vehicle trips under a policy of maximizing a cumulative reward for a hypothetical vehicle completing the historical or simulated vehicle trips.
35. The system of claim 34, wherein the operations further comprise:
solving the MDP model, wherein the solving comprises discretizing an activity area of the hypothetical vehicle into repeating zones and a time period into time slots.
36. The system of claim 35, wherein:
each state of the MDP model comprises:
a time represented by a time slot index,
a location represented by a repeating zone index, and
a list of available orders represented by repeating zone indices of destinations of the available orders, and
each action of the MDP model comprises: completing one of the available orders from the list, repositioning to another location, or staying at the location.
37. The system of claim 36, wherein the solving further comprises:
if the hypothetical vehicle completes one of the available orders from the list, assigning the hypothetical vehicle a reward for completing the one available order, and updating the state of the MDP model with: a next time slot corresponding to a completion of the one available order, a next location corresponding to a destination of the one available order, and a next list of available orders corresponding to the next location,
if the hypothetical vehicle repositions to the another location, skipping assigning the hypothetical vehicle the reward, and updating the state of the MDP model with: a next time corresponding to reaching the another location, the another location, and a next list of available orders corresponding to the another location, and
if the hypothetical vehicle stays at the location, skipping assigning the hypothetical vehicle the reward, and updating the state of the MDP model with: the time slot, the location, and the list of available orders.
38. The system of claim 35, wherein:
each state of the MDP model comprises:
a time represented by a time slot index,
a location represented by a repeating zone index, and
an integer representing the number of available orders, and
each action of the MDP model comprises: completing one of the available orders from the list, repositioning to another location, or staying at the location.
39. The system of claim 38, wherein the solving further comprises:
if the hypothetical vehicle completes one of the available orders from the list, assigning the hypothetical vehicle a reward for completing the one available order, and updating the state of the MDP model with: a next time slot corresponding to a completion of the one available order, a next location corresponding to a destination of the one available order, and a next integer representing a quantity of available orders corresponding to the next location,
if the hypothetical vehicle repositions to the another location, skipping assigning the hypothetical vehicle the reward, and updating the state of the MDP model with: a next time corresponding to reaching the another location, the another location, and another integer representing a quantity of available orders corresponding to the another location, and
if the hypothetical vehicle stays at the location, skipping assigning the hypothetical vehicle the reward, and updating the state of the MDP model with: the time slot, the location, and the integer.
40. Non-transitory computer-readable storage medium storing instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising:
obtaining a current location of a current vehicle from a computing device associated with the current vehicle;
obtaining a current list of available orders based on the current location;
feeding the current location, the current list of available orders or a number of available orders in the current list, and a current time to a solved Markov Decision Process (MDP) model to obtain action information, the action information comprising: staying at the current location, repositioning the current vehicle to another current location, or completing a current ride order by the current vehicle; and
transmitting the action information to the computing device associated with the current vehicle, wherein the current vehicle is configured to stay or relocate based on the action information, wherein:
the MDP model is solved based on a plurality of historical or simulated vehicle trips under a policy of maximizing a cumulative reward for a hypothetical vehicle completing the historical or simulated vehicle trips.
US17/460,608 2018-12-13 2021-08-30 System and method for ride order dispatching Abandoned US20210398431A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/460,608 US20210398431A1 (en) 2018-12-13 2021-08-30 System and method for ride order dispatching

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US16/219,753 US11138888B2 (en) 2018-12-13 2018-12-13 System and method for ride order dispatching
US17/460,608 US20210398431A1 (en) 2018-12-13 2021-08-30 System and method for ride order dispatching

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US16/219,753 Continuation US11138888B2 (en) 2018-12-13 2018-12-13 System and method for ride order dispatching

Publications (1)

Publication Number Publication Date
US20210398431A1 true US20210398431A1 (en) 2021-12-23

Family

ID=71071611

Family Applications (2)

Application Number Title Priority Date Filing Date
US16/219,753 Active 2039-07-11 US11138888B2 (en) 2018-12-13 2018-12-13 System and method for ride order dispatching
US17/460,608 Abandoned US20210398431A1 (en) 2018-12-13 2021-08-30 System and method for ride order dispatching

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US16/219,753 Active 2039-07-11 US11138888B2 (en) 2018-12-13 2018-12-13 System and method for ride order dispatching

Country Status (3)

Country Link
US (2) US11138888B2 (en)
CN (1) CN113287124A (en)
WO (1) WO2020122966A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220036261A1 (en) * 2020-07-24 2022-02-03 Tata Consultancy Services Limited Method and system for dynamically predicting vehicle arrival time using a temporal difference learning technique

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112465602B (en) * 2020-12-11 2023-12-15 深圳依时货拉拉科技有限公司 Order pushing method, order pushing device, computer equipment and computer readable storage medium
US20220196413A1 (en) * 2020-12-17 2022-06-23 Beijing Didi Infinity Technology And Development Co., Ltd. Systems and methods for simulating transportation order bubbling behavior
US20220284533A1 (en) * 2021-02-26 2022-09-08 Beijing Didi Infinity Technology And Development Co., Ltd. Systems and methods for repositioning vehicles in a ride-hailing platform
US20220366437A1 (en) * 2021-04-27 2022-11-17 Beijing Didi Infinity Technology And Development Co., Ltd. Method and system for deep reinforcement learning and application at ride-hailing platform
CN113282787B (en) * 2021-05-24 2022-01-04 暨南大学 Personalized short video recommendation method and system based on reinforcement learning
CN114493071A (en) * 2021-07-16 2022-05-13 首约科技(北京)有限公司 Network appointment vehicle transport capacity scheduling method
CN114119159B (en) * 2021-11-29 2024-05-28 武汉理工大学 Real-time order matching and idle vehicle scheduling method and system for network vehicle
CN114970944B (en) * 2022-03-29 2024-06-18 武汉大学 Order matching and vehicle repositioning method based on multi-agent reinforcement learning
CN117252307B (en) * 2023-11-14 2024-04-09 北京阿帕科蓝科技有限公司 Traffic prediction method, traffic prediction device, computer equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8626565B2 (en) * 2008-06-30 2014-01-07 Autonomous Solutions, Inc. Vehicle dispatching method and system
US20180046961A1 (en) * 2016-08-09 2018-02-15 Conduent Business Services, Llc Method and system for dispatching of vehicles in a public transportation network
US20190347371A1 * 2018-05-09 2019-11-14 Volvo Car Corporation Method and system for orchestrating multi-party services using semi-cooperative nash equilibrium based on artificial intelligence, neural network models, reinforcement learning and finite-state automata
US20200249047A1 (en) * 2017-10-25 2020-08-06 Ford Global Technologies, Llc Proactive vehicle positioning determinations
US20210172751A1 (en) * 2018-04-18 2021-06-10 Ford Global Technologies, Llc Dynamic promotions based on vehicle positioning and route determinations

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018010083A1 (en) * 2016-07-12 2018-01-18 华为技术有限公司 Vehicle external communication method, device and terminal
US10922566B2 (en) * 2017-05-09 2021-02-16 Affectiva, Inc. Cognitive state evaluation for vehicle navigation
CN108594804B (en) 2018-03-12 2021-06-18 苏州大学 Automatic driving control method for distribution trolley based on deep Q network

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8626565B2 (en) * 2008-06-30 2014-01-07 Autonomous Solutions, Inc. Vehicle dispatching method and system
US20180046961A1 (en) * 2016-08-09 2018-02-15 Conduent Business Services, Llc Method and system for dispatching of vehicles in a public transportation network
US20200249047A1 (en) * 2017-10-25 2020-08-06 Ford Global Technologies, Llc Proactive vehicle positioning determinations
US20210172751A1 (en) * 2018-04-18 2021-06-10 Ford Global Technologies, Llc Dynamic promotions based on vehicle positioning and route determinations
US20190347371A1 * 2018-05-09 2019-11-14 Volvo Car Corporation Method and system for orchestrating multi-party services using semi-cooperative nash equilibrium based on artificial intelligence, neural network models, reinforcement learning and finite-state automata

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220036261A1 (en) * 2020-07-24 2022-02-03 Tata Consultancy Services Limited Method and system for dynamically predicting vehicle arrival time using a temporal difference learning technique
US11556869B2 (en) * 2020-07-24 2023-01-17 Tata Consultancy Services Limited Method and system for dynamically predicting vehicle arrival time using a temporal difference learning technique

Also Published As

Publication number Publication date
US20200193834A1 (en) 2020-06-18
CN113287124A (en) 2021-08-20
WO2020122966A1 (en) 2020-06-18
US11138888B2 (en) 2021-10-05

Similar Documents

Publication Publication Date Title
US20210398431A1 (en) System and method for ride order dispatching
US11393341B2 (en) Joint order dispatching and fleet management for online ride-sharing platforms
US11514543B2 (en) System and method for ride order dispatching
US11094028B2 (en) System and method for determining passenger-seeking ride-sourcing vehicle navigation
US8498953B2 (en) Method for allocating trip sharing
CN110400128B (en) Spatial crowdsourcing task allocation method based on worker preference perception
US20200364627A1 (en) System and method for ride order dispatching
US10522036B2 (en) Method for robust control of a machine learning system and robust control system
CN110390415A (en) A kind of method and system carrying out trip mode recommendation based on user's trip big data
Grahn et al. Improving the performance of first- and last-mile mobility services through transit coordination, real-time demand prediction, advanced reservations, and trip prioritization
US20160364454A1 (en) Computing system with contextual search mechanism and method of operation thereof
US11790289B2 (en) Systems and methods for managing dynamic transportation networks using simulated future scenarios
US20220327650A1 (en) Transportation bubbling at a ride-hailing platform and machine learning
US20220044569A1 (en) Dispatching provider devices utilizing multi-outcome transportation-value metrics and dynamic provider device modes
US20220270488A1 (en) Systems and methods for order dispatching and vehicle repositioning
WO2020244081A1 (en) Constrained spatiotemporal contextual bandits for real-time ride-hailing recommendation
US10989546B2 (en) Method and device for providing vehicle navigation simulation environment
US20220284533A1 (en) Systems and methods for repositioning vehicles in a ride-hailing platform
CN113450557B (en) Method and device for updating prediction model for passenger flow of vehicle
US20220214179A1 (en) Hierarchical Coarse-Coded Spatiotemporal Embedding For Value Function Evaluation In Online Order Dispatching
US20220253765A1 (en) Regularized Spatiotemporal Dispatching Value Estimation

Legal Events

Date Code Title Description
AS Assignment

Owner name: BEIJING DIDI INFINITY TECHNOLOGY AND DEVELOPMENT CO., LTD., CHINA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:DIDI (HK) SCIENCE AND TECHNOLOGY LIMITED;REEL/FRAME:057352/0349

Effective date: 20200708

Owner name: DIDI (HK) SCIENCE AND TECHNOLOGY LIMITED, HONG KONG

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:DIDI RESEARCH AMERICA, LLC;REEL/FRAME:057325/0723

Effective date: 20200429

Owner name: DIDI RESEARCH AMERICA, LLC, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:QIN, ZHIWEI;FENG, FEI;REEL/FRAME:057325/0676

Effective date: 20181213

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION