CN109902859B

CN109902859B - Queuing peak period estimation method based on big data and machine learning algorithm

Info

Publication number: CN109902859B
Application number: CN201910076184.2A
Authority: CN
Inventors: 张乐情; 徐博识; 郑国春; 谢新法
Original assignee: Delicious Nowait Shanghai Information Technology Co ltd
Current assignee: Meizhiwei Shanghai Information Technology Co ltd
Priority date: 2019-01-26
Filing date: 2019-01-26
Publication date: 2023-03-24
Anticipated expiration: 2039-01-26
Also published as: CN109902859A

Abstract

A queuing peak period estimation method based on big data and a machine learning algorithm, relating to the technical field of queuing peak period estimation. The method comprises the following steps: effectively cleaning the data based on the actual dining sequence between adjacent numbers; training a model, by means of a machine learning algorithm, on the relationship between the waiting time and the number-taking time, the number-taking date and similar parameters, to obtain the specific relationship between the waiting time and those parameters; estimating the queuing waiting time that corresponds to the number-taking time period and date parameters for which the C-end user wants a prediction; and scanning the time points within normal business hours across the whole day to obtain a curve relating each number-taking time within a day's normal business hours to the estimated queuing waiting time. With this technical scheme, the invention has the following beneficial effects: the accuracy of the estimated time is improved, the user's wasted time is reduced, and the dining efficiency of the whole catering industry is improved.

Description

Queuing peak period estimation method based on big data and machine learning algorithm

Technical Field

The invention relates to the technical field of queuing peak period estimation, in particular to a queuing peak period estimation method based on big data and a machine learning algorithm.

Background

Queuing for a table has become the norm at popular restaurants in major commercial districts. To keep the queue orderly, diners are generally seated in the order in which they took their numbers, and, to give C-end users a better queuing experience, the number slip usually states how many tables are still waiting ahead. With the popularization of smartphones and the development of the mobile internet, a new mode of taking a number online and dining offline has gradually become mainstream: a user can take a number online without arriving at the restaurant and can plan the trip according to the current number of waiting tables. The patented technology of CCP118040081, "Method and device for estimating the dining waiting time", can additionally calculate the required waiting time and further improve the queuing experience of the C-end user. However, these applications all presuppose that a number has already been taken; when a number cannot yet be taken, no meaningful reference waiting time can be offered to the C-end user. For example, it is not possible to know today how long the queue will be if a number is taken at 11 pm tomorrow, nor how far in advance a number must be taken in order to dine at exactly 12 pm tomorrow (although reservations are sometimes available, they mainly cover high-end services such as large tables and private rooms; small and medium tables do not support reservations during the peak dining period). In recent years, the rapid development of big data and machine learning algorithms has transformed more and more fields. In the field of dining queues, the wide use of the mobile internet has accumulated a large amount of waiting time data online, which makes it feasible to estimate the queuing time for a future period.
Without changing the existing restaurant management mode, C-end users want to know how long they can expect to wait if they take a number at a certain future time, and how far in advance they need to take a number in order to dine at a certain future time. To meet these needs, the invention provides a queuing peak period estimation method based on big data and a machine learning algorithm. The prior art, by contrast, simply and roughly divides the whole day into periods of fixed length and takes a simple average within each period, with the following results: 1. information about fluctuation within the same period is lost; 2. it is difficult to locate accurately the turning points in the curve relating waiting time to number-taking time; 3. adjacent periods are treated as mutually independent, so the correlation between them is lost; 4. features such as the month and the season are not used effectively. As a result, the estimated waiting time has a large error and low stability.

Disclosure of Invention

The invention aims to overcome the defects and shortcomings of the prior art by providing a queuing peak period estimation method based on big data and a machine learning algorithm, so that the accuracy of the estimated time is improved, the user can plan the trip more reasonably, the user's wasted time is reduced, and the dining efficiency of the whole catering industry is improved.

To achieve this purpose, the invention adopts the following technical scheme. The queuing peak period estimation method based on big data and a machine learning algorithm comprises the following steps: 1. effectively cleaning the data based on the actual dining sequence between adjacent numbers; 2. based on the cleaned historical data and by means of a machine learning algorithm, training a model on the relationship between the waiting time and the number-taking time, the number-taking date (day of the week, month, whether it is a holiday) and similar parameters, obtaining the specific relationship between the waiting time and those parameters, and storing it as an estimation model; 3. based on the estimation model, estimating the queuing waiting time that corresponds to the number-taking time period and date parameters for which the C-end user wants a prediction; 4. scanning the time points within normal business hours across the whole day to obtain a curve relating each number-taking time within a day's normal business hours to the estimated queuing waiting time; 5. based on the curve obtained in step 4, predicting how long the wait is expected to be if a number is taken at a certain future time, and how far in advance a number must be taken in order to dine at a certain future time.

The effective cleaning of the data in the first step specifically comprises the following methods: 1. training the estimation model on the waiting time T1 rather than on the originally recorded waiting time (T1 + T2); 2. if an online historical record does not mark the time corresponding to T1 and only records the total time T1 + T2, discarding that record so that it does not take part in training the estimation model.

The estimation model in the third step is built according to the following principles: 1. determining the feature variables and the target variable used to train the model; 2. fitting the relationship between (X1, X2, X3, X4) and Y with a random forest (Random Forest) regression model; 3. testing the performance of the estimation model on the test set data.

The feature variables in the first principle are four time features of the user's number taking: the time of day, the day of the week, the month, and whether the day is a holiday.

The fitting specifically comprises the following processes: (a) randomly drawing a certain amount of training data from the training data set with replacement, and randomly drawing a feature combination (such as (X1, X2, X4)) from the feature set (X1, X2, X3, X4), which together form a sub-training set; (b) building a decision tree (Decision Tree) regression model Ti on the sub-training set drawn in (a); (c) repeating processes (a) and (b) N times to generate N decision tree regression models T1:N; (d) for the data to be predicted, inputting all of its features (X1, X2, X3, X4) into T1:N, averaging the outputs of the N decision tree regression models, and outputting the average as the prediction result.
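The fitting process above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function names (`build_tree`, `fit_forest`, `predict_forest`), the tree depth limit, the split criterion (sum of squared errors) and the toy data are all hypothetical. Following the text, the feature combination is drawn once per tree.

```python
import random
from statistics import mean


def build_tree(rows, targets, features, max_depth=3):
    """Fit a small regression tree that may only split on the given feature indices."""
    if max_depth == 0 or len(set(targets)) == 1:
        return mean(targets)  # leaf: predict the mean waiting time
    best = None
    for f in features:
        # Candidate thresholds: every distinct value except the smallest,
        # so both sides of the split are guaranteed to be non-empty.
        for threshold in sorted({r[f] for r in rows})[1:]:
            left = [y for r, y in zip(rows, targets) if r[f] < threshold]
            right = [y for r, y in zip(rows, targets) if r[f] >= threshold]
            m_left, m_right = mean(left), mean(right)
            sse = (sum((y - m_left) ** 2 for y in left)
                   + sum((y - m_right) ** 2 for y in right))
            if best is None or sse < best[0]:
                best = (sse, f, threshold)
    if best is None:  # every sampled feature is constant in this subset
        return mean(targets)
    _, f, threshold = best
    lo = [(r, y) for r, y in zip(rows, targets) if r[f] < threshold]
    hi = [(r, y) for r, y in zip(rows, targets) if r[f] >= threshold]
    return (f, threshold,
            build_tree([r for r, _ in lo], [y for _, y in lo], features, max_depth - 1),
            build_tree([r for r, _ in hi], [y for _, y in hi], features, max_depth - 1))


def predict_tree(tree, row):
    """Walk down the tree until a leaf (a plain number) is reached."""
    while isinstance(tree, tuple):
        f, threshold, left, right = tree
        tree = left if row[f] < threshold else right
    return tree


def fit_forest(rows, targets, n_trees, n_sub_features, seed=0):
    """Processes (a) to (c): bootstrap rows, sample features, build one tree, repeat N times."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]            # (a) sampling with replacement
        feats = rng.sample(range(len(rows[0])), n_sub_features)   # (a) random feature combination
        forest.append(build_tree([rows[i] for i in idx],
                                 [targets[i] for i in idx], feats))  # (b) one tree Ti
    return forest


def predict_forest(forest, row):
    """Process (d): average the outputs of all N trees."""
    return mean([predict_tree(t, row) for t in forest])


# Toy data (invented): [hour of day, day of week, month, holiday flag] -> waiting minutes.
rows = [[11, 1, 1, 0], [12, 1, 1, 0], [13, 1, 1, 0],
        [11, 6, 1, 1], [12, 6, 1, 1], [13, 6, 1, 1]]
targets = [20, 40, 25, 35, 70, 45]
forest = fit_forest(rows, targets, n_trees=25, n_sub_features=3, seed=0)
estimate = predict_forest(forest, [12, 6, 1, 1])
```

Because every leaf is the mean of a subset of the training targets, the averaged estimate always lies between the smallest and largest waiting time seen in training.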

With this technical scheme, the invention has the following beneficial effects: the accuracy of the estimated time is improved, so that the user can plan the trip more reasonably, the user's wasted time is reduced, and the dining efficiency of the whole catering industry is improved.

Detailed Description

The technical scheme adopted in this embodiment is as follows. The queuing peak period estimation method based on big data and a machine learning algorithm comprises the following steps: 1. effectively cleaning the data based on the actual dining sequence between adjacent numbers (see the filed patent CCP118040081, "Method and device for estimating the dining waiting time"); 2. based on the cleaned historical data and by means of a machine learning algorithm, training a model on the relationship between the waiting time and the number-taking time, the number-taking date (day of the week, month, whether it is a holiday) and similar parameters, obtaining the specific relationship between the waiting time and those parameters, and storing it as an estimation model; 3. based on the estimation model, estimating the queuing waiting time that corresponds to the number-taking time period and date parameters for which the C-end user wants a prediction; 4. scanning the time points within normal business hours across the whole day to obtain a curve relating each number-taking time within a day's normal business hours to the estimated queuing waiting time; 5. based on the curve obtained in step 4, predicting how long the wait is expected to be if a number is taken at a certain future time, and how far in advance a number must be taken in order to dine at a certain future time.
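Steps 4 and 5 can be illustrated with a short sketch. It assumes a trained estimation model is available as a callable; here a hypothetical `toy_model` stands in for it, with an invented peak of 60 minutes at 12:00. The scan step of 10 minutes, the function names and the rule "latest number-taking time whose estimated seating time is not later than the desired dining time" are illustrative assumptions, not details given in the patent.

```python
from typing import Callable, List, Optional, Tuple

Curve = List[Tuple[int, float]]  # (number-taking minute of day, estimated wait in minutes)


def scan_curve(predict_wait: Callable[[int], float],
               open_minute: int, close_minute: int, step: int = 10) -> Curve:
    """Step 4: evaluate the model at regular time points across business hours."""
    return [(t, predict_wait(t)) for t in range(open_minute, close_minute + 1, step)]


def latest_take_time(curve: Curve, dine_minute: int) -> Optional[int]:
    """Step 5: latest scanned time at which taking a number still seats the user in time."""
    feasible = [t for t, wait in curve if t + wait <= dine_minute]
    return max(feasible) if feasible else None


def toy_model(minute_of_day: int) -> float:
    """Hypothetical stand-in for the trained model: 60-minute peak at 12:00."""
    return max(0.0, 60.0 - 0.5 * abs(minute_of_day - 720))


curve = scan_curve(toy_model, open_minute=660, close_minute=780)  # 11:00 to 13:00
peak = max(curve, key=lambda point: point[1])    # queuing peak on the curve
take_at = latest_take_time(curve, dine_minute=750)  # want to dine at 12:30
```

With this toy curve the peak falls at minute 720 (12:00), and a user who wants to dine at 12:30 would need to take a number by minute 700 (11:40), i.e. 50 minutes in advance.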

The effective cleaning of the data in the first step specifically comprises the following steps: 1. training the estimation model on the waiting time T1 rather than on the originally recorded waiting time (T1 + T2); 2. if an online historical record does not mark the time corresponding to T1 and only records the total time T1 + T2, discarding that record so that it does not take part in training the estimation model. The reason is that a user U who is not at the restaurant may miss the call. For example, user U takes number An; after a waiting time T1 the restaurant calls number An, but user U is not at the restaurant at that moment, so the number is passed over. User U does not cancel, and is seated only after a further time T2 has elapsed. The online data record therefore shows an actual waiting time of T1 + T2 for user U, but the T2 portion is caused by user U personally and is not part of the objective time required by the queue itself; the time the queue actually required is T1. The recorded waiting time T1 + T2 would therefore strongly interfere with the training of the estimation model and needs to be cleaned.
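The cleaning rule can be sketched as follows. The record layout is an assumption made for illustration: the field names `take_minute`, `called_minute` and `seated_minute` are hypothetical, and a missing call time is represented by `None`.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class QueueRecord:
    take_minute: float              # when the number was taken (minute of day)
    called_minute: Optional[float]  # when the number was called; None if not recorded
    seated_minute: float            # when the user was actually seated


def clean_records(records: List[QueueRecord]) -> List[Tuple[float, float]]:
    """Return (number-taking time, T1) pairs, dropping records where T1 is unknown."""
    cleaned = []
    for record in records:
        if record.called_minute is None:
            # Only T1 + T2 is known for this record, so it is discarded.
            continue
        t1 = record.called_minute - record.take_minute  # objective queuing time
        cleaned.append((record.take_minute, t1))
    return cleaned


records = [
    QueueRecord(690, 720, 720),   # seated when called: T2 = 0
    QueueRecord(700, 735, 760),   # missed the call: T1 = 35, T2 = 25
    QueueRecord(710, None, 770),  # call time not recorded: discarded
]
training_pairs = clean_records(records)
```

In the second record the logged total wait is 60 minutes, but only the 35 minutes up to the call are kept as the training target.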

The estimation model in the third step is built according to the following principles: 1. determining the feature variables and the target variable used to train the model. The training is carried out for one table type of one specific restaurant, for example the small tables at the Zhongshan Park branch of Xibei. If all queues (small, medium, large tables, etc.) of several restaurants are to be estimated, each table type of each restaurant must be trained separately. Among the features, the day of the week and the month are encoded with one-hot encoding. 2. Fitting the relationship between (X1, X2, X3, X4) and Y with a random forest (Random Forest) regression model. 3. Testing the performance of the estimation model on the test set data. For the test set data {Y : (X1, X2, X3, X4)}, the feature variables (X1, X2, X3, X4) are fed into the random forest regression model T1:N trained in item 2 to obtain the corresponding estimated queuing time YHAT. The error between YHAT and Y, sum((Y - YHAT)^2), serves as the performance index of the estimation model H: the smaller the error, the better the performance of model H. The error between YHAT and Y is minimized by repeatedly tuning model parameters such as N, the maximum decision tree depth, and the minimum number of sampled features.
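As a rough sketch of the feature construction and the performance index described above: the helper names `encode_features` and `sum_squared_error`, the use of minutes since midnight for the time of day, and the layout of the feature vector are assumptions for illustration; the source only states that the day of the week and the month are one-hot encoded and that the error is the sum of squared differences.

```python
from datetime import datetime
from typing import List, Sequence


def encode_features(taken_at: datetime, is_holiday: bool) -> List[int]:
    """Build one feature row: time of day, one-hot weekday, one-hot month, holiday flag."""
    minute_of_day = taken_at.hour * 60 + taken_at.minute
    weekday = [1 if i == taken_at.weekday() else 0 for i in range(7)]   # Monday is index 0
    month = [1 if i == taken_at.month - 1 else 0 for i in range(12)]    # January is index 0
    return [minute_of_day] + weekday + month + [1 if is_holiday else 0]


def sum_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Performance index: sum((Y - YHAT)^2) over the test set."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))


row = encode_features(datetime(2019, 1, 26, 11, 30), is_holiday=False)
error = sum_squared_error([30, 45], [28, 50])  # (30-28)^2 + (45-50)^2
```

Candidate settings of N, tree depth and feature count would then be compared by computing this error on the same test set and keeping the setting with the smallest value.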

A random forest builds multiple decision trees and merges them by the bagging technique to obtain more accurate and more stable predictions. One major advantage of random forests is that they can be used for both classification and regression problems, which together make up most of the problems that current machine learning systems have to handle.

One-hot encoding, also known as one-bit-effective encoding, uses an N-bit state register to encode N states; each state has its own register bit, and only one bit is active at any time. For example, to encode six states, the natural binary code is 000, 001, 010, 011, 100, 101, whereas the one-hot code is 000001, 000010, 000100, 001000, 010000, 100000.
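The six-state example can be reproduced with a few lines of Python. The helper names `one_hot` and `natural_binary` are hypothetical; the active bit is counted from the right so that state 0 maps to 000001, matching the listing above.

```python
def one_hot(index: int, num_states: int) -> str:
    """Return the one-hot code of state `index` (0-based) among `num_states` states."""
    if not 0 <= index < num_states:
        raise ValueError("index out of range")
    bits = ["0"] * num_states
    bits[num_states - 1 - index] = "1"  # exactly one active bit, counted from the right
    return "".join(bits)


def natural_binary(index: int, num_states: int) -> str:
    """Return the natural binary code of `index`, padded to the width needed for all states."""
    width = max(1, (num_states - 1).bit_length())
    return format(index, f"0{width}b")


binary_codes = [natural_binary(i, 6) for i in range(6)]
one_hot_codes = [one_hot(i, 6) for i in range(6)]
```

The same idea applies to the day of the week (7 bits) and the month (12 bits) when they are used as model features.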

A decision tree is a decision analysis method that, given the known probabilities of various scenarios, constructs a tree to find the probability that the expected net present value is greater than or equal to zero, and thereby evaluates project risk and judges project feasibility; it is a graphical method that applies probability analysis intuitively. It is called a decision tree because the decision branches, when drawn, resemble the branches of a tree. In machine learning, a decision tree is a predictive model that represents a mapping between object attributes and object values.

With this technical scheme, the invention has the following beneficial effects: the accuracy of the estimated time is improved, so that the user can plan the trip more reasonably, the user's wasted time is reduced, and the dining efficiency of the whole catering industry is improved.

The above description is only for the purpose of illustrating the technical solutions of the present invention and not for the purpose of limiting the same, and other modifications or equivalent substitutions made by those skilled in the art to the technical solutions of the present invention should be covered within the scope of the claims of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims

1. A queuing peak period estimation method based on big data and a machine learning algorithm, characterized in that the method comprises the following steps: 1. effectively cleaning the data based on the actual dining sequence between adjacent numbers; 2. based on the cleaned historical data and by means of a machine learning algorithm, training a model on the relationship between the waiting time and the number-taking date, obtaining the specific relationship between the waiting time and the number-taking date, and storing it as an estimation model; 3. based on the estimation model, estimating the queuing waiting time that corresponds to the number-taking time period and date parameters for which the C-end user wants a prediction; 4. scanning the time points within normal business hours across the whole day to obtain a curve relating each number-taking time within a day's normal business hours to the estimated queuing waiting time; 5. based on the curve obtained in step 4, predicting how long the wait is expected to be if a number is taken at a certain future time, and how far in advance a number must be taken in order to dine at a certain future time.

2. The queuing peak period estimation method based on big data and a machine learning algorithm according to claim 1, characterized in that the effective cleaning of the data in the first step specifically comprises the following steps: 1. training the estimation model on the waiting time T1 rather than on the originally recorded waiting time T1 + T2; 2. if an online historical record does not mark the time corresponding to T1 and only records the total time T1 + T2, discarding that record so that it does not take part in training the estimation model.

3. The queuing peak period estimation method based on big data and a machine learning algorithm according to claim 1, characterized in that the estimation model in the third step is built according to the following principles: 1. determining the feature variables and the target variable used to train the model; 2. fitting the relationship between the feature set and Y with a random forest (Random Forest) regression model; 3. testing the performance of the estimation model on the test set data, wherein Y is the queuing time in the test set data.

4. The queuing peak period estimation method based on big data and a machine learning algorithm according to claim 3, characterized in that the feature variables in the first principle are four time features of the user's number taking: the time of day, the day of the week, the month, and whether the day is a holiday.

5. The queuing peak period estimation method based on big data and a machine learning algorithm according to claim 3, characterized in that the fitting specifically comprises the following processes: (a) randomly drawing a certain amount of training data from the training data set with replacement, and randomly drawing a feature combination from the feature set, which together form a sub-training set; (b) building a decision tree (Decision Tree) regression model Ti on the sub-training set drawn in (a); (c) repeating processes (a) and (b) N times to generate N decision tree regression models T1:N; (d) for the data to be predicted, inputting all of its features into T1:N, averaging the outputs of the N decision tree regression models, and outputting the average as the prediction result.