CN114048978A - Supply and demand scheduling strategy fusion application based on machine learning model - Google Patents

Supply and demand scheduling strategy fusion application based on machine learning model

Info

Publication number
CN114048978A
Authority
CN
China
Prior art keywords
data
order
machine learning
learning model
algorithm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111266699.2A
Other languages
Chinese (zh)
Inventor
薛鹏
于红建
余进
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Shansong Technology Co ltd
Original Assignee
Beijing Shansong Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Shansong Technology Co ltd
Priority to CN202111266699.2A
Publication of CN114048978A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06315 Needs-based resource requirements planning or analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06312 Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/08 Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q10/083 Shipping

Abstract

The invention relates to a supply and demand scheduling strategy fusion application based on a machine learning model. The method integrates a logistic regression model to predict the probability that a given unit of transport capacity (a rider) accepts a given order and, within the business flow, computes the best pairing of orders and transport capacity in three steps: capacity recall, filtering, and ranking, thereby optimizing the platform's order-dispatching efficiency. During acceptance-rate prediction, the data first undergo basic preprocessing such as normalization and null-value handling according to a standard procedure; a trained model then predicts, in real time, the acceptance rate between the current order and the transport capacity. This approach substantially improves the platform's order-taking efficiency and acceptance rate and ensures that order-to-capacity matching on the platform is optimized across both space and time.
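As an illustration of the recall, filtering, and ranking flow described in the abstract, the following minimal Python sketch scores riders with a logistic (sigmoid) function and orders them by predicted acceptance probability. The `Rider` fields, the feature weights, the recall radius, and the load limit are hypothetical and are not taken from the invention; they only show the shape of the computation.

```python
import math
from dataclasses import dataclass


@dataclass
class Rider:
    rider_id: str
    distance_km: float   # hypothetical feature: distance from rider to pickup point
    active_orders: int   # hypothetical feature: orders the rider is already carrying


def accept_probability(rider, weights, bias):
    """Logistic regression score: sigmoid of a weighted sum of the rider's features."""
    z = (bias
         + weights["distance_km"] * rider.distance_km
         + weights["active_orders"] * rider.active_orders)
    return 1.0 / (1.0 + math.exp(-z))


def dispatch(riders, weights, bias, recall_radius_km=3.0, max_active_orders=3):
    """Return (probability, rider_id) pairs, best candidate first."""
    # Recall: keep riders close enough to the pickup point.
    recalled = [r for r in riders if r.distance_km <= recall_radius_km]
    # Filter: drop riders that already carry the maximum number of orders.
    eligible = [r for r in recalled if r.active_orders < max_active_orders]
    # Rank: sort by predicted acceptance probability, highest first.
    scored = [(accept_probability(r, weights, bias), r.rider_id) for r in eligible]
    return sorted(scored, reverse=True)
```

In practice the weights and bias would come from a trained model rather than being set by hand, and the recall and filter rules would reflect the platform's own business constraints.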

Description

Supply and demand scheduling strategy fusion application based on machine learning model
Technical Field
The invention relates to a supply and demand scheduling strategy fusion application based on a machine learning model, and belongs to the technical field of order-dispatching optimization and intelligent scheduling research.
Background
Instant delivery is a rapid delivery service with a delivery time of less than 1 hour and an average delivery time of about 30 minutes. This speed merges traditional online e-commerce transactions and offline logistics delivery (two businesses that were traditionally clearly separated) into a unified whole, forming a three-way relationship among the user, the rider, and the platform. As the distributed system architecture for instant logistics has evolved layer by layer, it has run into technical obstacles and challenges: the large scale of orders and riders, and the very large-scale computation required for supply and demand matching. On holidays or in severe weather, orders pile up and peak traffic reaches dozens of times the usual level. Logistics fulfillment relies on centralized scheduling that links online and offline operations, which is embodied in the order dispatching system: one or a batch of the most efficient solutions is computed from a series of factors, and orders are dispatched directly. A further challenge for the delivery system is balancing the required identification accuracy against cost. The accuracy requirement is high because identification directly affects the pricing, scheduling, and liability-judgment systems, and inaccurate underlying data causes serious problems.
The key to efficient matching is on-demand allocation: identifying the user's exact needs and matching them to the best fit among many resources. To match efficiently, the platform accumulates from daily orders a large amount of information about drivers and users, including their routes, behavioral habits, and special needs, in addition to knowledge of traffic conditions throughout the city. This makes it possible to predict demand ahead of time and then ensure that supply matches the expected demand, so that idle resources are activated in an optimal way.
What a scheduling platform really needs to solve is how to improve matching efficiency. In the early stage, a platform may rely on subsidies and on-the-ground promotion to seize the market; in the later stage, improving matching efficiency matters most, because only by matching suitable travel resources can customer demand be satisfied to the greatest extent. Similarly, in the intelligent scheduling of Ant Financial's customer service, the problem is how to match user needs most accurately while ensuring that the corresponding resources are available; solving it allows user expectations to be met to the greatest extent.
Geographic information is updated in real time (a request is initiated within 5 seconds) to describe the overall state of resources, and as soon as a user submits an order request, the order is pushed according to resource conditions. Based on statistics over historical data combined with real-time order data, the distribution of order-dense areas across the whole city is produced, giving riders a valuable reference for where to wait for orders, raising their probability of receiving orders, and reducing their idle travel time. Based on the supply and demand prediction results, all available transport capacity across the city is mobilized in an orderly way and at scale to achieve optimal allocation of resources. The order-acceptance probability model is learned from the historical data of riders and users; it improves the match between riders and users and, by exploiting the scale effect of transport capacity, globally optimizes overall transport efficiency and the passenger travel experience in real time. Fault tolerance is extremely low: the system must not go down, orders must not be lost, and the availability requirement is extremely high. The data must be real-time and accurate, and the system is very sensitive to delays and anomalies.
Disclosure of Invention
The technical problem to be solved by the invention is how to realize a supply and demand scheduling strategy fusion application based on a machine learning model.
To achieve the above object, the invention provides a method for the supply and demand scheduling strategy fusion application based on a machine learning model, comprising the following steps:
A: determining the statistical caliber (the definition and scope of each statistic) of the business features, and collecting the relevant data;
B: determining a feature selection scheme and selecting the features that perform well;
C: establishing a feature engineering pipeline that converts the data into a form the algorithm can understand and process;
D: comparing the algorithms' performance on offline data;
E: evaluating the algorithms' performance in an online grayscale release, and selecting the best-performing algorithm for formal launch.
Preferably, the scheme for the supply and demand scheduling strategy fusion application based on the machine learning model is generated through the following specific steps:
1. The features are divided into five major categories: user, transport capacity, order, city, and weather. Each category can be further subdivided according to its nature and attributes; the final number of related features exceeds 100. The existing features are sorted out in light of the business, and the statistical period and statistical standard of each feature are determined.
2. Related data are prepared based on the features determined in the previous step and their statistical caliber, and the sample data are randomly sampled to select 100000 records.
3. The usefulness of the features is evaluated with several methods (such as the Pearson coefficient, the chi-square test, and decision tree algorithms); features highly correlated with the target result are selected, and redundant features with high mutual similarity are removed.
4. To reduce the effect of missing data on model accuracy, missing values are handled with methods such as mode filling, mean filling, median filling, KNN-based filling, fixed-value filling, context filling, and direct removal. The filling method is chosen according to the business scenario and the requirements of the algorithm.
5. Outliers are removed using empirical rules, the box-plot outlier method, and the 3σ principle.
6. Discrete data are processed using label encoding, one-hot encoding, or embedding methods.
7. Continuous data are binned.
8. The data are standardized/normalized.
9. The samples are randomly divided into a training set, a validation set, and a test set in a ratio of 6:2:2.
10. The sample data are labeled using business data; the label indicates whether the order was accepted by the transport capacity after it was dispatched.
11. A list of candidate algorithms is determined, mainly including ridge regression, Lasso, LR, FM, SVM, Bayesian classifiers, AdaBoost, LightGBM, etc.
12. Metrics for comparing the algorithms are determined, the algorithms are ranked by these metrics, and the top five are selected for online testing.
13. The online performance of the algorithms is observed for 2 weeks; the observation metric is the overall online order acceptance rate, and the model is selected according to the 2-week average acceptance rate.
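Steps 12 and 13 can be sketched in Python as follows: candidate algorithms are ranked by an offline metric, the top five go to online testing, and the final model is the one with the highest mean daily acceptance rate over the observation window. The function names and the choice of a single scalar offline metric are assumptions made for illustration; the source does not specify how several metrics are combined.

```python
from statistics import mean


def top_k_by_metric(offline_scores, k=5):
    """Rank algorithms by an offline metric (higher is better) and keep the top k."""
    ranked = sorted(offline_scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def pick_by_online_rate(daily_rates):
    """Pick the algorithm whose daily online acceptance rates have the highest mean.

    daily_rates maps an algorithm name to its list of daily acceptance rates,
    for example 14 values for a 2-week observation period.
    """
    return max(daily_rates, key=lambda name: mean(daily_rates[name]))
```

A typical call would pass a dictionary such as `{"LR": 0.81, "FM": 0.79}` to `top_k_by_metric`, then collect 14 daily rates per surviving algorithm and pass them to `pick_by_online_rate`.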
Drawings
FIG. 1: schematic flow diagram of the principle of the supply and demand scheduling strategy fusion application of the invention.
FIG. 2: schematic diagram of order distribution in an example of the supply and demand scheduling strategy fusion application of the invention.
Detailed Description
The following detailed description of the present invention will be made with reference to the accompanying drawings and examples, which are provided for illustration of the present invention and are not intended to limit the scope of the present invention.
As shown in FIG. 1, the method comprises: A: determining the statistical caliber of the business features and collecting the relevant data; B: determining a feature selection scheme and selecting the features that perform well; C: establishing a feature engineering pipeline that converts the data into a form the algorithm can understand and process; D: comparing the algorithms' performance on offline data; E: evaluating the algorithms' performance in an online grayscale release and selecting the best-performing algorithm for formal launch. The supply and demand scheduling strategy fusion application is explained in detail below, taking the LR model as an example.
First, the step of determining the business features and their statistical caliber is described. All features that may affect the order acceptance rate are listed based on business experience and brainstorming, and the statistical period and statistical formula of each feature are determined. The label is whether each historical order was accepted by the relevant rider. All feature data and label data are computed in Hive, and 100000 records are randomly sampled as input data.
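The labeling and sampling step can be illustrated with a small in-memory Python sketch. The source performs this computation in Hive; the record layout (an `accepted` flag per dispatch record) and the function names here are hypothetical.

```python
import random


def label_order(dispatch_record):
    """Label is 1 if the dispatched order was accepted by the rider, otherwise 0."""
    return 1 if dispatch_record["accepted"] else 0


def sample_rows(rows, n, seed=0):
    """Draw n rows uniformly at random without replacement.

    If fewer than n rows exist, all rows are returned.
    """
    if len(rows) <= n:
        return list(rows)
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(rows, n)
```

With the figures given in the description, `sample_rows(all_rows, 100000)` would produce the input data set.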
Second, the feature selection scheme and the criteria for selecting well-performing features are described. Currently popular feature selection methods include the Pearson coefficient, the chi-square test, and decision tree algorithms; they differ in applicable scenarios, implementation cost, and final effect. After weighing these factors, the Pearson coefficient was chosen as the feature selection method. To eliminate multicollinearity among features, highly similar features are deleted: the Pearson coefficient threshold for high similarity is 0.5, so features whose mutual coefficient exceeds 0.5 are considered highly correlated, and from each group of highly correlated features the one with the highest correlation coefficient with the target label is kept as a training feature.
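The Pearson-based screening can be sketched as below. This is a greedy approximation, not necessarily the exact grouping procedure of the invention: features are visited in descending order of absolute correlation with the label, and a feature is kept only if its absolute correlation with every feature already kept does not exceed the 0.5 threshold. Comparing absolute values is an assumption; the source speaks only of coefficients "larger than 0.5".

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, target, threshold=0.5):
    """Greedy selection: strongest features first, skipping near-duplicates.

    columns maps a feature name to its list of values; target is the label list.
    """
    ranked = sorted(columns,
                    key=lambda name: abs(pearson(columns[name], target)),
                    reverse=True)
    kept = []
    for name in ranked:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept
```

Because the strongest feature of any correlated group is visited first, the weaker members of that group are the ones discarded, which matches the stated intent of keeping the feature most correlated with the target label.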
Third, the feature engineering pipeline is described. The most basic task in feature engineering is handling outliers and null values. Outliers are generally handled aggressively: data that meet the outlier criteria are deleted directly. Methods for judging whether data are outliers include the box-plot outlier method and the 3σ outlier method. The 3σ method is preferred for data that fit a normal distribution; otherwise the box-plot method is used. For null values, the degree to which they affect the training result and the proportion of missing data must be assessed; where the impact is high and the proportion of missing data is high, the feature is deleted. Otherwise the data are filled with an appropriate method. The available filling methods include mode filling, mean filling, median filling, KNN-based filling, interpolation, fixed-value filling, and context filling.
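The outlier and null-value handling can be sketched with the Python standard library. The sketch assumes that the formula shown only as a figure in the source is the 3σ rule (mean plus or minus three standard deviations), and it uses the common 1.5 × IQR fences for the box-plot method; both are assumptions. Only the mean, median, mode, and fixed-value fills are shown; KNN-based filling, interpolation, and context filling are omitted.

```python
from statistics import mean, median, mode, pstdev, quantiles


def three_sigma_bounds(values):
    """Bounds mean ± 3 standard deviations, suited to roughly normal data."""
    mu, sigma = mean(values), pstdev(values)
    return mu - 3 * sigma, mu + 3 * sigma


def boxplot_bounds(values, k=1.5):
    """Box-plot fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(values, bounds):
    """Keep only the values that fall inside the inclusive bounds."""
    low, high = bounds
    return [v for v in values if low <= v <= high]


def fill_missing(values, strategy="median", constant=0.0):
    """Replace None entries using the chosen strategy."""
    present = [v for v in values if v is not None]
    if strategy == "mean":
        fill = mean(present)
    elif strategy == "median":
        fill = median(present)
    elif strategy == "mode":
        fill = mode(present)
    elif strategy == "constant":
        fill = constant
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]
```

Note that on small samples a single extreme value inflates the standard deviation and may escape the 3σ bounds, which is one practical reason to fall back on the box-plot fences when the data are not close to normal.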
To ensure that the model can interpret the input data and to preserve the accuracy and efficiency of model computation, discrete data must be encoded; candidate methods are label encoding, one-hot encoding, and embedding. Continuous data must also be binned; candidate methods include equal-width binning, equal-frequency binning, and WOE encoding.
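The simpler of these encoding and binning options can be sketched in plain Python. Embedding and WOE encoding are not shown, and the function names are illustrative only.

```python
def label_encode(values):
    """Map each distinct category to an integer code (codes follow sorted order)."""
    mapping = {v: i for i, v in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def one_hot(values):
    """Return one 0/1 vector per value, with one position per distinct category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values], categories


def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins intervals of equal width."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    if width == 0:
        return [0 for _ in values]
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - low) / width), n_bins - 1) for v in values]


def equal_frequency_bins(values, n_bins):
    """Assign bins by rank so that each bin holds about the same number of values.

    Tied values may end up in different bins because only the rank is used.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, index in enumerate(order):
        bins[index] = rank * n_bins // len(values)
    return bins
```

Equal-width bins are sensitive to extreme values, while equal-frequency bins keep the bin populations balanced; which one suits a feature depends on its distribution.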
Because the LR model is sensitive to differences in feature scale, every feature is scaled to eliminate the effect of differing scales on the model's efficiency and accuracy. The scaling methods include standardization and min-max normalization.
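Both scaling methods are short enough to state directly. Standardization is taken here to mean the usual z-score, (x - mean) / standard deviation, which is an assumption about the source's intent; min-max normalization maps values to the range [0, 1].

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score scaling: subtract the mean and divide by the standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Min-max scaling: map the smallest value to 0 and the largest to 1."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]
```

In a training pipeline the mean, standard deviation, minimum, and maximum would be computed on the training set only and then reused for the validation and test sets.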
The data are divided into a training set, a validation set, and a test set in a ratio of 6:2:2.
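A random three-way split can be sketched as follows, assuming the intended ratio is 6:2:2 for training, validation, and test (the third term of the ratio is inferred from the three named sets). Integer arithmetic is used for the cut points to avoid floating-point rounding.

```python
import random


def split_622(rows, seed=0):
    """Shuffle the rows and split them 60% / 20% / 20%."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a reproducible split
    n_train = len(shuffled) * 6 // 10
    n_val = len(shuffled) * 2 // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test_rows = shuffled[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test_rows
```

For 100000 sampled records this yields 60000, 20000, and 20000 rows respectively.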
The training data are fed into the model for training; the evaluation metrics include accuracy, AUC, recall, and precision.
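For illustration, the sketch below trains a logistic regression model by batch gradient descent and computes accuracy, precision, recall, and AUC. It is a minimal pure-Python version under stated assumptions: the learning rate, epoch count, and 0.5 decision threshold are arbitrary, treating precision as one of the intended metrics is an assumption, and a production system would more likely rely on an established library.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(X, w, b):
    """Predicted acceptance probability for each feature row."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) for row in X]


def train_lr(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        errors = [p - t for p, t in zip(predict_proba(X, w, b), y)]
        for j in range(d):
            w[j] -= lr * sum(e * row[j] for e, row in zip(errors, X)) / n
        b -= lr * sum(errors) / n
    return w, b


def classification_metrics(y_true, scores, threshold=0.5):
    """Accuracy, precision, and recall at the given decision threshold."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, t in zip(pred, y_true) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, y_true) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, y_true) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, y_true) if p == 0 and t == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC is quadratic in the number of samples and is meant only to make the definition explicit; the metrics would normally be computed on the validation and test sets rather than on the training data.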
In conclusion, the supply and demand scheduling strategy fusion application based on the machine learning model aims to improve the match between orders and riders, raise the platform's own order-dispatching efficiency, and guarantee a high-quality experience for both users and riders.

Claims (6)

1. A supply and demand scheduling strategy fusion application based on a machine learning model, characterized by comprising the following steps:
A: determining the statistical caliber of the business features, and collecting the relevant data;
B: determining a feature selection scheme and selecting the features that perform well;
C: establishing a feature engineering pipeline that converts the data into a form the algorithm can understand and process;
D: comparing the algorithms' performance on offline data;
E: evaluating the algorithms' performance in an online grayscale release, and selecting the best-performing algorithm for formal launch.
2. The supply and demand scheduling strategy fusion application based on a machine learning model according to claim 1, wherein step A specifically comprises:
A1: based on past business experience, dividing the features into five major categories: user, transport capacity, order, city, and weather; further subdividing each category according to its nature and attributes, with the final number of related features exceeding 100; and sorting out the existing features in light of the business and determining the statistical period and statistical standard of each feature.
A2: preparing related data based on the features determined in A1 and their statistical caliber, and labeling the sample data using business data, the label indicating whether the order was accepted by the transport capacity after it was dispatched; and at the same time randomly sampling the sample data to select 10000 records.
3. The supply and demand scheduling strategy fusion application based on a machine learning model according to claim 1, wherein step B specifically comprises: to avoid the curse of dimensionality, the computational complexity of machine learning must be reduced without degrading the training result, which makes feature selection particularly important. During feature selection, several methods (such as the Pearson coefficient, the chi-square test, and decision tree algorithms) can be used to evaluate the usefulness of the features, so that features highly correlated with the target result are selected and redundant features with high mutual similarity are removed.
4. The supply and demand scheduling strategy fusion application based on a machine learning model according to claim 1, wherein step C specifically comprises:
C1: to reduce the effect of missing data on model accuracy, handling missing values with methods such as mode filling, mean filling, median filling, KNN-based filling, fixed-value filling, context filling, and direct removal, the filling method being chosen according to the business scenario and the requirements of the algorithm.
C2: removing outliers using empirical rules, the box-plot outlier method, and the 3σ principle.
C3: processing discrete data using label encoding, one-hot encoding, or embedding methods.
C4: binning continuous data.
C5: standardizing/normalizing the data.
C6: randomly dividing the samples into a training set, a validation set, and a test set in a ratio of 6:2:2.
5. The supply and demand scheduling strategy fusion application based on a machine learning model according to claim 1, wherein step D specifically comprises:
D1: determining a list of candidate algorithms, mainly including ridge regression, Lasso, LR, FM, SVM, Bayesian classifiers, AdaBoost, LightGBM, etc.
D2: determining metrics for comparing the algorithms, ranking the algorithms by these metrics, and selecting the top five for online testing.
6. The supply and demand scheduling strategy fusion application based on a machine learning model according to claim 1, wherein step E specifically comprises: the observation period for the online algorithms is 2 weeks, the observation metric is the overall online order acceptance rate, and the model is selected according to the 2-week average online acceptance rate.
CN202111266699.2A 2021-10-27 2021-10-27 Supply and demand scheduling strategy fusion application based on machine learning model Pending CN114048978A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111266699.2A CN114048978A (en) 2021-10-27 2021-10-27 Supply and demand scheduling strategy fusion application based on machine learning model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111266699.2A CN114048978A (en) 2021-10-27 2021-10-27 Supply and demand scheduling strategy fusion application based on machine learning model

Publications (1)

Publication Number Publication Date
CN114048978A true CN114048978A (en) 2022-02-15

Family

ID=80206324

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111266699.2A Pending CN114048978A (en) 2021-10-27 2021-10-27 Supply and demand scheduling strategy fusion application based on machine learning model

Country Status (1)

Country Link
CN (1) CN114048978A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116663858A (en) * 2023-07-25 2023-08-29 武汉新威奇科技有限公司 Screw press resource scheduling method and system based on demand matching

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116663858A (en) * 2023-07-25 2023-08-29 武汉新威奇科技有限公司 Screw press resource scheduling method and system based on demand matching
CN116663858B (en) * 2023-07-25 2023-10-24 武汉新威奇科技有限公司 Screw press resource scheduling method and system based on demand matching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination