CN114048978A - Supply and demand scheduling strategy fusion application based on machine learning model - Google Patents
Supply and demand scheduling strategy fusion application based on machine learning model
- Publication number
- CN114048978A (application CN202111266699.2A)
- Authority
- CN
- China
- Prior art keywords
- data
- order
- machine learning
- learning model
- algorithm
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
- G06Q10/06315—Needs-based resource requirements planning or analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
- G06Q10/06312—Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/08—Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
- G06Q10/083—Shipping
Abstract
The invention relates to a supply and demand scheduling strategy fusion application based on a machine learning model. The method fuses a logistic regression model to predict the probability that a given transport capacity (rider) accepts a given order and, within the business flow, finds the best pairing of orders and transport capacity in three steps: capacity recall, filtering and ranking, thereby optimizing the platform's order-dispatch efficiency. In the acceptance-rate prediction process, the data first undergo basic preprocessing such as normalization and null-value handling according to a standard flow; a trained model then predicts, in real time, the acceptance rate between the current order and each transport capacity. This greatly improves the platform's order-acceptance efficiency and acceptance rate, and ensures that the platform's order-dispatch matching is optimized in the space-time dimension.
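The three steps named above (recall, filtering, ranking) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Rider` fields, the recall radius, the load limit and the logistic-regression weights are all hypothetical values chosen for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Rider:
    rider_id: str
    distance_km: float   # hypothetical feature: distance to the pickup point
    active_orders: int   # hypothetical feature: orders currently being carried


def accept_probability(rider, weights, bias):
    """Logistic-regression score: sigmoid of a weighted sum of features."""
    z = (bias
         + weights["distance_km"] * rider.distance_km
         + weights["active_orders"] * rider.active_orders)
    return 1.0 / (1.0 + math.exp(-z))


def dispatch(riders, weights, bias, recall_radius_km=3.0, max_active=3, top_k=2):
    # Recall: keep only riders close enough to the order.
    recalled = [r for r in riders if r.distance_km <= recall_radius_km]
    # Filter: drop riders that already carry the maximum load.
    eligible = [r for r in recalled if r.active_orders < max_active]
    # Rank: order the remaining riders by predicted acceptance probability.
    ranked = sorted(eligible,
                    key=lambda r: accept_probability(r, weights, bias),
                    reverse=True)
    return [r.rider_id for r in ranked[:top_k]]


# Illustrative data only; the weights are made up, not trained.
riders = [Rider("r1", 0.5, 1), Rider("r2", 2.0, 0),
          Rider("r3", 5.0, 0), Rider("r4", 1.0, 3)]
weights = {"distance_km": -0.8, "active_orders": -0.5}
best = dispatch(riders, weights, bias=1.0)
```

In this toy run `r3` falls outside the recall radius and `r4` is filtered out for being fully loaded, so only `r1` and `r2` are ranked.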
Description
Technical Field
The invention relates to a supply and demand scheduling strategy fusion application based on a machine learning model, and belongs to the technical field of order-dispatch optimization and intelligent scheduling.
Background
Instant delivery is a rapid delivery service in which delivery takes less than 1 hour, with an average of about 30 minutes. This speed merges online e-commerce transactions and offline logistics delivery, traditionally two clearly separate businesses, into a unified whole, and creates a three-way relationship among users, riders and the platform. As the distributed system architecture for instant logistics has evolved layer by layer, it has run into two technical challenges: the large scale of orders and riders, and the very large-scale computation required for supply and demand matching. On holidays or in severe weather, orders pile up and peak traffic reaches dozens of times the usual level. Logistics fulfillment is centrally scheduled, connecting online and offline, and this is embodied in the order dispatch system: one or a batch of most-efficient solutions is computed from a series of factors, and orders are dispatched directly. The delivery system must also balance the required recognition accuracy against cost. The accuracy requirement is high because recognition results directly affect the pricing, scheduling and liability-judgment systems, so inaccurate underlying data causes serious problems.
The key to efficient matching is on-demand allocation: identifying the user's exact needs and matching them to the best fit among many resources. To match efficiently, the platform accumulates a large amount of information about drivers and users from daily orders, including their routes, behavioral habits and special needs, in addition to knowledge of traffic conditions across the city. This makes it possible to predict demand ahead of time and then ensure that supply matches the expected demand, so that idle resources are put to use in the best way.
What a scheduling platform really needs to solve is how to improve matching efficiency. In its early stage a platform may rely on subsidies and on-the-ground promotion to capture the market, but in the later stage improving matching efficiency matters most: only by matching suitable travel resources can customer demand be satisfied to the greatest extent. Similarly, in the intelligent scheduling of Ant Financial's customer service, the problems to solve are how to match user needs most accurately and how to ensure that the corresponding resources are available, so that user expectations are met to the greatest extent.
Geographic information is updated in real time (a request is initiated within 5 seconds) to describe the state of all resources, and an order is pushed according to resource conditions as soon as the user submits it. Based on statistics over historical data combined with real-time order data, the distribution of order-dense areas across the city is provided, giving riders a useful reference for where to wait for orders, raising the probability of receiving an order and reducing riders' idle time. Based on the supply and demand prediction results, all available transport capacity in the city is mobilized in an orderly way on a large scale to achieve optimal allocation of resources. The order acceptance probability model is learned from the historical data of riders and users; it improves the match between riders and users and, by exploiting the scale of the transport capacity, optimizes overall transport efficiency and the passenger travel experience globally and in real time. Fault tolerance is extremely low: the system must not go down and must not lose data, so the availability requirement is extremely high. The data must be timely and accurate, and the system is very sensitive to delays and anomalies.
Disclosure of Invention
The technical problem to be solved by the invention is how to implement a supply and demand scheduling strategy fusion application based on a machine learning model.
To achieve the above object, the invention provides a supply and demand scheduling strategy fusion application method based on a machine learning model, comprising the following steps:
A: determining the statistical caliber of the business features and collecting the related data;
B: determining a feature selection scheme and screening out the features that perform well;
C: establishing a feature engineering pipeline that converts the data into a form the algorithm can consume;
D: comparing the algorithms' performance on offline data;
E: evaluating the algorithms' online gray-release performance and selecting the best-performing algorithm for formal launch.
Preferably, generating the supply and demand scheduling strategy fusion application scheme based on the machine learning model specifically comprises the following steps:
1. The features are divided into five major types: user, transport capacity, order, city and weather. Each type can be further subdivided according to its nature and attributes; in the end more than 100 related features are determined. The existing features are organized according to the nature of the business, and the statistical period and statistical standard of each feature are determined.
2. Prepare the related data based on the features determined in the previous step and the corresponding statistical caliber, then randomly sample the data to obtain 100000 records.
3. Evaluate the usefulness of the features using several methods (such as the Pearson coefficient, the chi-square test and decision tree algorithms), select the features that are highly correlated with the target result, and remove redundant features that are highly similar to one another.
4. To reduce the effect of missing data on model accuracy, handle missing values with methods such as mode filling, mean filling, median filling, KNN filling, fixed-value filling, context filling and direct removal. The filling method is chosen according to the business scenario and the algorithm's requirements.
5. Remove outliers using experience-based rules, the box-plot outlier method and the 3σ principle.
6. Process discrete data using label encoding, one-hot encoding or embedding methods.
7. Bin the continuous data.
8. Standardize/normalize the data.
9. Randomly split the samples into a training set, a validation set and a test set in a 6:2:2 ratio.
10. Label the sample data using business data; the label is whether the order was accepted by the transport capacity after being dispatched.
11. Determine the list of candidate algorithms, mainly ridge regression, Lasso, LR, FM, SVM, Bayes classifier, AdaBoost, LightGBM, etc.
12. Determine the metrics for judging the algorithms, rank the algorithms by these metrics, and select the top five for online testing.
13. Observe the online performance of the algorithms for 2 weeks. The observation metric is the overall online order acceptance rate, and the model is selected by its average order acceptance rate over the 2 weeks.
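The last two steps (shortlisting the top five algorithms offline, then choosing one by its 2-week average online order acceptance rate) can be sketched as below. The offline metric (AUC is assumed here), the algorithm keys and every number are placeholders invented for illustration; they are not results from the patent.

```python
from statistics import mean


def top_n_by_metric(offline_scores, n=5):
    """Rank candidate algorithms by an offline metric (higher is better), keep the top n."""
    ranked = sorted(offline_scores.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:n]]


def pick_by_online_rate(daily_rates, candidates):
    """Pick the candidate with the highest mean daily order acceptance rate."""
    return max(candidates, key=lambda name: mean(daily_rates[name]))


# Placeholder offline scores, assumed to be AUC values.
offline_auc = {"ridge": 0.71, "lasso": 0.70, "lr": 0.78, "fm": 0.77,
               "svm": 0.74, "bayes": 0.69, "adaboost": 0.76, "lightgbm": 0.79}
shortlist = top_n_by_metric(offline_auc)

# Placeholder online acceptance rates: one value per day for 14 days.
daily_rates = {"lightgbm": [0.62] * 14, "lr": [0.65] * 14, "fm": [0.63] * 14,
               "adaboost": [0.60] * 14, "svm": [0.58] * 14}
winner = pick_by_online_rate(daily_rates, shortlist)
```

Keeping the offline ranking and the online choice as separate functions mirrors the text: the offline metric only decides which models enter the gray-release test, while the final choice rests on the observed acceptance rate.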
Drawings
FIG. 1: Schematic flow chart of the principle of the supply and demand scheduling strategy fusion application of the invention.
FIG. 2: Schematic diagram of order dispatch in an example of the supply and demand scheduling strategy fusion application of the invention.
Detailed Description
The following detailed description of the present invention will be made with reference to the accompanying drawings and examples, which are provided for illustration of the present invention and are not intended to limit the scope of the present invention.
As shown in FIG. 1, the method comprises: A: determining the statistical caliber of the business features and collecting the related data; B: determining a feature selection scheme and screening out the features that perform well; C: establishing a feature engineering pipeline that converts the data into a form the algorithm can consume; D: comparing the algorithms' performance on offline data; E: evaluating the algorithms' online gray-release performance and selecting the best-performing algorithm for formal launch. The supply and demand scheduling strategy fusion application is explained in detail below using an LR model as the example.
First, the step of determining the business features and statistical caliber. All features that may affect the order acceptance rate are listed based on business experience and brainstorming, and the statistical period and formula for each feature are determined. The label is whether the order in the historical order data was accepted by the relevant rider. All feature data and label data are computed in Hive, and 100000 records are randomly sampled as input data.
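The labeling and sampling described here can be sketched in a few lines. The record fields (`order_id`, `rider_id`, `accepted`) are hypothetical names, and the in-memory list stands in for a table that the text says is computed in Hive.

```python
import random


def build_samples(dispatch_records, sample_size, seed=0):
    """Label each dispatch (1 = rider accepted, 0 = not), then sample uniformly at random."""
    labeled = [
        {"order_id": r["order_id"],
         "rider_id": r["rider_id"],
         "label": 1 if r["accepted"] else 0}
        for r in dispatch_records
    ]
    rng = random.Random(seed)            # fixed seed keeps the sample reproducible
    k = min(sample_size, len(labeled))   # never ask for more rows than exist
    return rng.sample(labeled, k)


# Synthetic dispatch history, for illustration only.
records = [{"order_id": i, "rider_id": i % 7, "accepted": i % 3 == 0}
           for i in range(1000)]
samples = build_samples(records, sample_size=100)
```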
Second, the feature selection scheme and the criteria for screening out well-performing features. Currently popular feature selection methods include the Pearson coefficient, the chi-square test and decision tree algorithms; they differ in applicable scenarios, implementation cost and final effect. After weighing these factors, the Pearson coefficient was chosen as the screening method. To eliminate multicollinearity among features, highly similar features are removed: the Pearson coefficient threshold for high similarity is 0.5, and features whose mutual coefficient exceeds 0.5 are considered highly correlated. From each group of highly correlated features, the feature with the highest correlation coefficient with the target label is kept as a training feature.
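One way to realize this screening rule is a greedy pass, sketched below. It is an approximation of the grouping described in the text, not necessarily the procedure the inventors used: features are visited in order of decreasing absolute correlation with the label, and a feature is kept only if its absolute correlation with every feature already kept does not exceed 0.5. The feature names and values are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def select_features(features, label, threshold=0.5):
    """Greedy screening: most label-relevant first, skip features redundant with kept ones."""
    by_relevance = sorted(features,
                          key=lambda name: abs(pearson(features[name], label)),
                          reverse=True)
    kept = []
    for name in by_relevance:
        if all(abs(pearson(features[name], features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Illustrative data: distance_m duplicates distance_km in another unit.
label = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "distance_km": [1, 2, 3, 4, 5, 6, 7, 8],
    "distance_m": [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000],
    "rain": [0, 1, 1, 0, 0, 1, 1, 0],
}
kept = select_features(features, label)
```

Only one of the two distance features survives, because their mutual correlation exceeds the threshold.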
Third, the feature engineering pipeline. The most basic tasks in feature engineering are handling outliers and null values. Outliers are usually handled bluntly: data that meet the outlier criterion are deleted directly. Methods for judging whether data are outliers include the box-plot outlier method and the 3σ outlier method. For data that fit a normal distribution, the 3σ method is preferred; otherwise the box-plot method is used. For null values, their influence on the training result and the proportion of missing data must be assessed; when the influence is high and the proportion of missing data is high, the feature is deleted. Otherwise the data are filled using a suitable method: mode filling, mean filling, median filling, KNN filling, interpolation, fixed-value filling or context filling.
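The two outlier rules and a simple fill can be sketched with the standard library. The 1.5 × IQR fence for the box-plot rule is the conventional choice and is an assumption here, as are the sample values; the text does not state its parameters.

```python
from statistics import mean, median, pstdev, quantiles


def three_sigma_bounds(values):
    """Bounds mean ± 3 standard deviations; intended for roughly normal data."""
    mu, sigma = mean(values), pstdev(values)
    return mu - 3 * sigma, mu + 3 * sigma


def boxplot_bounds(values, k=1.5):
    """Box-plot fences: Q1 - k*IQR and Q3 + k*IQR (k=1.5 is the usual convention)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(values, bounds):
    lo, hi = bounds
    return [v for v in values if lo <= v <= hi]


def fill_missing(values, strategy="median"):
    """Replace None with the median (default) or mean of the observed values."""
    present = [v for v in values if v is not None]
    fill = median(present) if strategy == "median" else mean(present)
    return [fill if v is None else v for v in values]


# Hypothetical delivery times in minutes, with one extreme record.
delivery_minutes = [28, 31, 30, 29, 33, 27, 32, 30, 240]
kept_box = drop_outliers(delivery_minutes, boxplot_bounds(delivery_minutes))
kept_sigma = drop_outliers(delivery_minutes, three_sigma_bounds(delivery_minutes))
filled = fill_missing([10, None, 30])
```

On this small, skewed sample the box-plot rule drops the 240-minute record while the 3σ bounds still contain it, which is consistent with the advice to reserve the 3σ rule for approximately normal data.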
To ensure that the model can consume the input data and that model computation is accurate and efficient, discrete data must be processed; reference methods are label encoding, one-hot encoding and embedding. Continuous data must be binned; reference methods include equal-width binning, equal-frequency binning and WOE encoding.
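A minimal sketch of four of the named transformations (label encoding, one-hot encoding, equal-width binning and equal-frequency binning) follows. Embedding and WOE encoding are left out for brevity, and the sample values are invented.

```python
def label_encode(values):
    """Map each category to an integer, in sorted order of category name."""
    mapping = {v: i for i, v in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """One 0/1 column per category, columns in sorted order of category name."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values], categories


def equal_width_bins(values, n_bins):
    """Split the value range into n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0   # guard against a constant column
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def equal_frequency_bins(values, n_bins):
    """Assign bins so each holds roughly the same number of samples."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


weather = ["rain", "sunny", "rain", "snow"]
codes, mapping = label_encode(weather)
onehot, categories = one_hot_encode(weather)

amounts = [0, 2, 4, 6, 8, 100]
width_bins = equal_width_bins(amounts, 4)
freq_bins = equal_frequency_bins(amounts, 3)
```

The `amounts` example shows why the choice of binning matters: with one large value, equal-width binning puts almost everything in the first bin, while equal-frequency binning spreads the samples evenly.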
Because the LR model is sensitive to differences in feature scale, each feature must be normalized to eliminate the effect of differing scales on the model's efficiency and accuracy. Normalization methods include standardization and min-max normalization.
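The two scaling methods can be written directly from their definitions: standardization maps x to (x − mean) / standard deviation, and min-max normalization maps x to (x − min) / (max − min). The sketch below assumes the population standard deviation and returns zeros for a constant column; both are choices made for this example.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


z = standardize([1, 2, 3])
scaled = min_max([2, 4, 6])
```

In practice the statistics (mean, standard deviation, minimum, maximum) would normally be computed on the training set and reused for the validation and test sets.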
The data are divided into a training set, a validation set and a test set in a 6:2:2 ratio.
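A random three-way split can be sketched as follows. The 6:2:2 ratio used as the default is an assumption about the intended proportions for the three sets, and the seed is arbitrary.

```python
import random


def split_samples(samples, ratios=(6, 2, 2), seed=0):
    """Shuffle, then cut into training, validation and test sets by the given ratios."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    total = sum(ratios)
    n = len(shuffled)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]   # the remainder goes to the test set
    return train, val, test


train, val, test = split_samples(range(100))
```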
The training data are fed into the model for training; the evaluation metrics include accuracy, AUC, recall and precision.
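Training and evaluation can be illustrated with a small logistic regression fitted by batch gradient descent. This is a sketch, not the production model: the single feature (a centered pickup distance), the learning rate, the epoch count and the 0.5 decision threshold are all assumptions made for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_lr(X, y, lr=0.5, epochs=200):
    """Fit logistic regression weights by batch gradient descent on the log loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict_proba(X, w, b):
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def classification_metrics(y_true, scores, threshold=0.5):
    """Return (accuracy, precision, recall) at the given decision threshold."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and t == 1 for p, t in zip(pred, y_true))
    fp = sum(p == 1 and t == 0 for p, t in zip(pred, y_true))
    fn = sum(p == 0 and t == 1 for p, t in zip(pred, y_true))
    tn = sum(p == 0 and t == 0 for p, t in zip(pred, y_true))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: near riders (negative centered distance) accepted, far riders did not.
X = [[-0.4], [-0.3], [-0.2], [-0.1], [0.1], [0.2], [0.3], [0.4]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
w, b = train_lr(X, y)
scores = predict_proba(X, w, b)
metrics = classification_metrics(y, scores)
area = auc(y, scores)
```

The toy data are perfectly separable, so all four metrics are trivially perfect here; real order data would not behave this way, and the metrics would be computed on the validation and test sets rather than on the training data.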
In conclusion, the supply and demand scheduling strategy fusion application based on a machine learning model aims to improve the match between orders and riders, improve the platform's order-dispatch efficiency, and ensure a high-quality experience for both users and riders.
Claims (6)
1. A supply and demand scheduling strategy fusion application based on a machine learning model, characterized by comprising the following steps:
A: determining the statistical caliber of the business features and collecting the related data;
B: determining a feature selection scheme and screening out the features that perform well;
C: establishing a feature engineering pipeline that converts the data into a form the algorithm can consume;
D: comparing the algorithms' performance on offline data;
E: evaluating the algorithms' online gray-release performance and selecting the best-performing algorithm for formal launch.
2. The machine learning model-based supply and demand scheduling strategy fusion application of claim 1, wherein step A specifically comprises the following steps:
A1: dividing the features, according to past business experience, into five major types: user, transport capacity, order, city and weather; further subdividing each type according to its nature and attributes, finally determining more than 100 related features; organizing the existing features according to the nature of the business and determining the statistical period and statistical standard of each feature.
A2: preparing the related data based on the features determined in A1 and the corresponding statistical caliber, and labeling the sample data using business data, the label being whether the order was accepted by the transport capacity after being dispatched; at the same time, randomly sampling the sample data to obtain 10000 records.
3. The machine learning model-based supply and demand scheduling strategy fusion application of claim 1, wherein step B specifically comprises: to avoid the curse of dimensionality, the computational complexity of machine learning must be reduced while preserving the training result, which makes feature screening particularly important. During feature screening, several methods (such as the Pearson coefficient, the chi-square test and decision tree algorithms) may be used to evaluate the usefulness of the features, so as to select the features highly correlated with the target result and eliminate redundant features that are highly similar to one another.
4. The machine learning model-based supply and demand scheduling strategy fusion application of claim 1, wherein step C specifically comprises the following steps:
C1: to reduce the effect of missing data on model accuracy, handling missing values with methods such as mode filling, mean filling, median filling, KNN filling, fixed-value filling, context filling and direct removal, the filling method being chosen according to the business scenario and the algorithm's requirements.
C2: method for using experience, abnormal value of box chart andand (5) removing abnormal values by a principle method.
C3: processing discrete data using label encoding, one-hot encoding or embedding methods.
C4: binning the continuous data.
C5: standardizing/normalizing the data.
C6: randomly splitting the samples into a training set, a validation set and a test set in a 6:2:2 ratio.
5. The machine learning model-based supply and demand scheduling strategy fusion application of claim 1, wherein step D specifically comprises the following steps:
D1: determining the list of candidate algorithms, mainly ridge regression, Lasso, LR, FM, SVM, Bayes classifier, AdaBoost, LightGBM, etc.
D2: determining the metrics for judging the algorithms, ranking the algorithms by these metrics, and selecting the top five for online testing.
6. The machine learning model-based supply and demand scheduling strategy fusion application of claim 1, wherein step E specifically comprises: the observation period for the online algorithms is 2 weeks, the observation metric is the overall online order acceptance rate, and the model is selected by its average online order acceptance rate over the 2 weeks.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111266699.2A CN114048978A (en) | 2021-10-27 | 2021-10-27 | Supply and demand scheduling strategy fusion application based on machine learning model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111266699.2A CN114048978A (en) | 2021-10-27 | 2021-10-27 | Supply and demand scheduling strategy fusion application based on machine learning model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114048978A (en) | 2022-02-15 |
Family
ID=80206324
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111266699.2A Pending CN114048978A (en) | 2021-10-27 | 2021-10-27 | Supply and demand scheduling strategy fusion application based on machine learning model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114048978A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116663858A (en) * | 2023-07-25 | 2023-08-29 | 武汉新威奇科技有限公司 | Screw press resource scheduling method and system based on demand matching |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116663858A (en) * | 2023-07-25 | 2023-08-29 | 武汉新威奇科技有限公司 | Screw press resource scheduling method and system based on demand matching |
CN116663858B (en) * | 2023-07-25 | 2023-10-24 | 武汉新威奇科技有限公司 | Screw press resource scheduling method and system based on demand matching |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||