CN109325808A - Commodity demand forecasting and logistics warehouse-allocation planning method based on the Spark big data platform - Google Patents

Commodity demand forecasting and logistics warehouse-allocation planning method based on the Spark big data platform

Info

Publication number
CN109325808A
Authority
CN
China
Prior art keywords
commodity
feature
window
data
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201811133491.1A
Other languages
Chinese (zh)
Inventor
Shu Haidong
Hu Feng
Lei Dajiang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shu Haidong
Original Assignee
Chongqing Zhiwanjia Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Zhiwanjia Technology Co Ltd filed Critical Chongqing Zhiwanjia Technology Co Ltd
Priority to CN201811133491.1A
Publication of CN109325808A
Legal status: Withdrawn (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 30/00 Commerce
    • G06Q 30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q 30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q 30/0202 Market predictions or forecasting for commercial activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 Administration; Management
    • G06Q 10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q 10/063 Operations research, analysis or management
    • G06Q 10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q 10/06313 Resource planning in a project environment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 Administration; Management
    • G06Q 10/08 Logistics, e.g. warehousing, loading or distribution; Inventory or stock management

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Marketing (AREA)
  • Finance (AREA)
  • Data Mining & Analysis (AREA)
  • General Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Game Theory and Decision Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Educational Administration (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention provides a commodity demand forecasting and logistics warehouse-allocation planning method based on the Spark big data platform, comprising the following steps: Q1, data preprocessing; Q2, feature construction; Q3, feature selection; Q4, model selection; Q5, fusion of the model prediction with a rule-based prediction, with fusion coefficients 0.75*model + 0.25*rule. By forecasting the demand for smart-home commodities on the Spark big data platform and planning warehouse allocation accordingly, the invention can effectively help smart-home merchants greatly reduce operating costs, shorten delivery times and improve the user experience, and it better fits practical business scenarios in which data volume grows rapidly.

Description

Commodity demand forecasting and logistics warehouse-allocation planning method based on the Spark big data platform
Technical field
The present invention relates to the technical field of big data analytics, in particular to e-commerce for smart-home products, and more particularly to a commodity demand forecasting and logistics warehouse-allocation planning method based on the Spark big data platform, used to meet the demand-forecasting and warehouse-allocation planning needs of e-commerce merchants.
Background technique
With the rapid development of science and technology, the Internet has brought all kinds of convenient services and e-commerce has become increasingly sophisticated. The smart home is an embodiment of instrumentation under the influence of the Internet: through Internet-of-Things technology it connects the various devices in a home (such as audio and video equipment, lighting systems, curtain control, air-conditioning control, security systems, digital theater systems, audio-visual servers, media cabinets and networked home appliances) and provides functions and means such as home appliance control, lighting control, remote control by telephone, indoor and outdoor remote control, burglar alarms, environmental monitoring, HVAC control, infrared forwarding and programmable timer control. Compared with an ordinary household, a smart home not only retains traditional residential functions but also integrates building, network communication, information appliances and equipment automation, provides comprehensive information exchange, and even saves money on various energy costs.
In the smart-home e-commerce market, delivery time and price are the two key factors that users consider. In general, there are hard outer limits on how much delivery time can be shortened and prices reduced, so a solution must be sought at a deeper level. A warehouse-allocation stocking service means that the smart-home merchant, according to a sales forecast, stocks goods in advance at regional warehouses so that orders can be shipped and distributed from the nearest location; without building its own warehouses, a smart-home merchant can easily use a first-tier e-commerce logistics system that cost tens of billions to build and achieve very fast delivery. With the boost from various holidays and e-commerce platforms, online promotions of all kinds, such as flash sales and special offers, have become the norm. If a smart-home merchant ships from a single traditional warehouse, it is difficult to avoid problems such as large cross-provincial shipping volume, high logistics cost, long delivery times and customer complaints; therefore brands with higher standardization and deeper inventory should consider stocking regional warehouses in advance.
During major promotions, what consumers care about most is when their parcels will arrive. The most effective approach is to use big data and algorithms so that goods are placed directly in the warehouse closest to the consumer; a supply chain driven by big data can help merchants greatly reduce operating costs, improve the user experience, and raise the efficiency of the whole smart-home e-commerce industry. High-quality demand forecasting for smart-home commodities is the foundation and core function of supply chain management, and achieving it is a further step toward an intelligent supply-chain platform. Therefore, how to make demand forecasts more accurate, place goods directly in the warehouse closest to the consumer, and at the same time greatly optimize management cost is an essential problem that urgently needs to be solved.
Spark is a fast, efficient, memory-based distributed big data processing framework that provides a distributed parallel computing model supporting DAG execution graphs. It can use the data sources and file systems supported by Hadoop, including HDFS, HBase, Hive, Cassandra and others. It can be deployed either on a standalone server or on a distributed resource-management framework such as Mesos or YARN, and it provides APIs for three programming languages: Scala, Java and Python. Using the APIs provided by Spark, developers can create Spark-based applications through standard API interfaces.
An RDD (Resilient Distributed Dataset) is an abstract data type and the representation of data in Spark; it is the most central module and class in Spark and the essence of its design. It can be regarded as a large collection with a fault-tolerance mechanism; Spark provides a persist mechanism for caching it in memory, which is convenient for iterative computation and repeated use. An RDD is a partitioned record set, and different partitions can be distributed across different physical machines, which supports parallel computation well. Another characteristic of an RDD is that it is elastic: during job execution, when a machine's memory overflows, the RDD can spill to and interact with the disk; although this reduces efficiency, it guarantees that the job runs normally. Two kinds of operations can be performed on an RDD: transformations and actions.
Transformation: an existing RDD is converted into a new RDD through a series of function operations; that is, the return value is still an RDD, and RDDs can be transformed repeatedly. Because an RDD is stored in a distributed manner, the whole transformation process is also carried out in parallel. Common transformation higher-order functions include map, flatMap, reduceByKey and so on.
Action: the return value is not an RDD. It may be an ordinary Scala collection, a single value, or nothing; the result is either returned to the driver program or written to a file system. Examples include reduce, saveAsTextFile and collect.
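As an illustration of the two kinds of RDD operations described above, the following is a minimal Scala sketch; the input file sales.txt, the output directory counts_out and the word-count computation are illustrative assumptions rather than part of the patented method.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("RddDemo").setMaster("local[*]"))

    // Transformations: each returns a new RDD, is evaluated lazily and runs in parallel.
    val lines  = sc.textFile("sales.txt")                       // hypothetical input file
    val pairs  = lines.flatMap(_.split(" ")).map(w => (w, 1L))  // flatMap, map
    val counts = pairs.reduceByKey(_ + _)                       // reduceByKey

    // Actions: return a non-RDD value to the driver or write the RDD to storage.
    println(counts.collect().mkString(", "))                    // collect returns an Array to the driver
    counts.saveAsTextFile("counts_out")                         // saveAsTextFile writes to the file system

    sc.stop()
  }
}
```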
When a Spark application encounters an action operation, the SparkContext generates a Job and splits each Job into different stages (each Job is split into several groups of Tasks; each group of Tasks is called a Stage, also called a TaskSet; a Task is the unit of work sent to an Executor). Each Spark application obtains its own dedicated Executors; an Executor corresponds to a JVM process that is responsible for running Tasks, and Tasks from different applications run in different JVM processes. The Executor process stays resident for the lifetime of the application and runs Tasks with multiple threads. Each node can run one or more Executors; each Executor consists of a number of cores and memory, and each core of an Executor can execute only one Task at a time; the result of each Task is one partition of the target RDD. The parallelism with which Tasks execute equals the number of Executors multiplied by the number of cores per Executor.
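As a spark-shell style illustration of these resource settings, the sketch below uses hypothetical numbers (4 Executors with 2 cores each, i.e. a task parallelism of 4 * 2 = 8); the patent does not specify concrete values.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative resource settings: 4 Executors with 2 cores each give a
// task parallelism of 4 * 2 = 8 concurrently running Tasks.
val conf = new SparkConf()
  .setAppName("DemandForecast")
  .set("spark.executor.instances", "4")   // number of Executors (e.g. on YARN)
  .set("spark.executor.cores", "2")       // cores per Executor
  .set("spark.executor.memory", "4g")     // memory per Executor

val sc = new SparkContext(conf)
```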
Summary of the invention
In view of the problems described in the background above, it is an object of the present invention to provide a commodity demand forecasting and logistics warehouse-allocation planning method based on the Spark big data platform, which solves the problems of low efficiency and high cost in existing smart-home demand forecasting and logistics warehouse-allocation planning, and can effectively help smart-home merchants greatly reduce operating costs, shorten delivery times and improve the user experience.
In order to achieve the above object, the present invention provides the following technical solution:
A commodity demand forecasting and logistics warehouse-allocation planning method based on the Spark big data platform, characterized by comprising the following steps:
Q1, data preprocessing: obtain the relevant data files from the database, including commodity-granularity features, user-behavior features of the related commodities, commodity and warehouse-region granularity features, and related information such as the under-stocking and over-stocking costs of each warehouse region; then fill with 0 the records of commodities in the database that have no sales record because they were recently listed or delisted, so as to guarantee data continuity;
That is: first create a SparkContext object, then use its textFile(URL) function to create a distributed data set RDD. The data in the RDD include the commodity-granularity features of the smart-home goods (ID, category, brand, date, price), the related user-behavior features (number of views, number of add-to-cart events, number of purchases, traffic), and the commodity and warehouse-region granularity features such as the under-stocking and over-stocking costs of each warehouse region; the created distributed data set can be operated on in parallel. Second, call the mapPartitions operator on the samples of the form <feature 1, feature 2, ..., feature m> to fill with 0 the fields of commodities that have no sales record because they were recently listed or delisted, so as to guarantee data continuity; call the zipWithIndex operator to attach a label to each sample, converting the created RDD to the form <label, commodity ID, warehouse code, feature 1, feature 2, ..., feature m>; finally call the filter operator to split the whole data set into a test set TestRDD and a training set TrainRDD according to the transaction date, and call the persist operator to cache the resulting TrainRDD in memory;
Q2, feature construction: construct features with a sliding-window technique. Call the mapPartitions operator on TrainRDD, write the corresponding statistical functions, and build features from the information of the samples in each partition (window) over different time periods. Take the N days after each specific time point as one window and the total sales of each commodity and warehouse in that window as the label, sliding over M windows; take the N days before the specific time point as a window for feature construction: compute the sums (sum) and averages (avg) of the various category feature values over the N days before the window; compute statistics of the commodity's transaction counts over the most recent N days, including maximum, minimum and standard deviation; and compute statistics of its category id's transaction counts over the most recent N days, including maximum, minimum, standard deviation, rank and proportion. With the total sales of the following N days as the label, sliding over M windows, this series of data transformations converts the created TrainRDD to the form <label, commodity ID, warehouse code, feature 1, feature 2, ..., feature m, feature m+1, ..., feature n, label>;
Q3, feature selection: use xgboost to select the top-k features by importance, compute similarities and remove redundant features. Take the feature values constructed in Q2 and train an xgboost model on them to obtain the importance ranking of the features; select the top-k most important features, compute similarities among them, and discard the unimportant features;
Q4, model selection: train multiple regression models. First train multiple regression models on TrainRDD using the LR, SVR, RF, GBRT and XGBOOST algorithms of the Spark MLlib machine-learning library and third-party distributed learning algorithms, and call the union operator to merge the prediction results of all models into one RDD, defined as model_RDD. Second, call the groupBy operator to aggregate by commodity ID. Finally, call the map operator: if the under-stocking cost of a (commodity ID, warehouse code) pair is greater than its over-stocking cost, we would rather over-predict, so take the maximum of the single-model predictions multiplied by 1.1; otherwise take the minimum of the single-model predictions multiplied by 0.9. This series of data transformations yields the model-learning result 'model' in the form <commodity ID, warehouse code, target inventory of the warehouse region for the future time period>;
Q5, fusion of the model prediction with the rule-based prediction, with fusion coefficients 0.75*model + 0.25*rule, where the rule-based learner 'rule' is defined as follows: denote the sales of the N days before the prediction window as day1, day2, ..., dayN; for each commodity, if its under-stocking cost is greater than its over-stocking cost, predict N*max(day1, day2, ..., dayN), otherwise predict N*min(day1, day2, ..., dayN). Finally, TestRDD is converted to the form <commodity ID, warehouse code, target inventory of the warehouse region for the future time period>, defined as rule_RDD, and the target inventory of each commodity in each warehouse region for the coming period is obtained.
By forecasting smart-home commodity demand and planning warehouse allocation on the Spark big data platform, the present invention can effectively help smart-home merchants greatly reduce operating costs, shorten delivery times and improve the user experience, and it better fits practical business scenarios in which data volume grows rapidly.
Detailed description of the invention
Fig. 1 is a flow diagram of the present invention.
Fig. 2 is a diagram of the RDD transformations in the feature-engineering stage of the present invention.
Specific embodiment
The technical solution of the present invention is described clearly and completely below with reference to the drawings and embodiments of the present invention. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
With reference to Figs. 1 and 2, the present invention takes the e-commerce and logistics warehousing of smart-home products as an embodiment to illustrate the commodity demand forecasting and logistics warehouse-allocation planning method based on the Spark big data platform, which comprises the following steps:
Q1, data preprocessing phase:
Obtain the relevant data from the smart-home product database and integrate the multi-table data for later use; then perform preprocessing, filling with 0 the records of commodities that have no sales record because they were recently listed or delisted, so as to guarantee data continuity. Next, choose whether to normalize the data as needed. Finally, split the data according to the length of the period to be predicted into a training set, a validation set and a test set.
First create a SparkContext object, then use its textFile(URL) function to create a distributed data set RDD. The data in the RDD include the commodity-granularity features of the smart-home goods (ID, category, brand, date, price), the related user-behavior features (number of views, number of add-to-cart events, number of purchases, traffic), and the commodity and warehouse-region granularity features such as the under-stocking and over-stocking costs of each warehouse region; the created distributed data set can be operated on in parallel. Second, call the mapPartitions operator on the samples of the form <feature 1, feature 2, ..., feature m> to fill with 0 the fields of commodities that have no sales record because they were recently listed or delisted, so as to guarantee data continuity; call the zipWithIndex operator to attach a label to each sample, converting the created RDD to the form <label, commodity ID, warehouse code, feature 1, feature 2, ..., feature m>; finally call the filter operator to split the whole data set into a test set TestRDD and a training set TrainRDD according to the transaction date, and call the persist operator to cache the resulting TrainRDD in memory.
The data structures are shown in Tables 1 and 2 below:
Table 1: commodity-granularity features
Table 2: under-stocking and over-stocking costs of commodities per warehouse region
Field | Type | Meaning | Example
item_id | bigint | Commodity ID | 333442
store_code | String | Warehouse code | 1
money_a | String | Under-/over-stocking cost of the commodity | 10.44
money_b | String | Under-/over-stocking cost of the commodity | 20.88
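A minimal spark-shell style Scala sketch of the Q1 pipeline described above follows; the HDFS path, the CSV field layout and the cut-off date 2015-12-14 are illustrative assumptions, and the real data contain many more feature columns.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("Q1Preprocess"))

// Create the distributed data set from the raw export (path is hypothetical).
val raw = sc.textFile("hdfs:///smart_home/sales.csv")

// mapPartitions: parse each line into (itemId, storeCode, date, features...),
// filling empty sales fields with 0 so that the series stays continuous.
val parsed = raw.mapPartitions(_.map { line =>
  val f = line.split(",")
  val itemId    = f(0)
  val storeCode = f(1)
  val date      = f(2)
  val features  = f.drop(3).map(v => if (v.isEmpty) 0.0 else v.toDouble) // fill-0
  (itemId, storeCode, date, features)
})

// zipWithIndex: attach a running index (label) to every sample.
val labelled = parsed.zipWithIndex.map { case ((item, store, date, feats), idx) =>
  (idx, item, store, date, feats)
}

// filter: split into training and test sets by transaction date, then cache TrainRDD.
val trainRDD = labelled.filter(_._4 < "2015-12-14").persist()
val testRDD  = labelled.filter(_._4 >= "2015-12-14")
```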
Q2, feature construction:
Construct features with a sliding-window technique. Take the N days after each specific time point as one window and the total sales of each commodity and warehouse in that window as the label, sliding over M windows; take the N days before the specific time point as a window and perform feature construction:
Compute the sums (sum) and averages (avg) of the various category feature values over the 1/2/3/5/7/9/.../N days before the window; compute statistics of the commodity's transaction counts over the most recent N days, including maximum, minimum and standard deviation; compute statistics of its category id's transaction counts over the most recent N days, including maximum, minimum, standard deviation, rank and proportion; and add polynomial cross features.
The period from 10 July 2015 to 13 December 2015 is selected for training, sliding over 11 windows with a window length of two weeks (14 days) for feature extraction. The features include the sums and averages of the various category feature values over the 1/2/3/5/7/9/.../14 days before the window; statistics of the commodity's transaction counts over the most recent 14 days, including maximum, minimum and standard deviation; statistics of its category id's transaction counts over the most recent 14 days, including maximum, minimum, standard deviation, rank and proportion; and polynomial cross features. For each window, the total number of units sold in the 14 days after the window's last day is used as the label. The sliding windows are explained in Table 3.
Table 3: explanation of the sliding-window dates
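The sliding-window construction can be sketched as follows; the case classes DailySale and WindowSample, the helper name buildWindows and the single set of statistics (sum, average, maximum) are illustrative, and the actual method computes the much richer set of per-window statistics listed above.

```scala
import java.time.LocalDate
import org.apache.spark.rdd.RDD

case class DailySale(itemId: Long, storeCode: String, date: LocalDate, qty: Double)
case class WindowSample(itemId: Long, storeCode: String, windowEnd: LocalDate,
                        sum14: Double, avg14: Double, max14: Double, label: Double)

/** Slide `numWindows` windows of 14 days starting at `start`: the 14 days before each
  * window end give the features, the 14 days after it give the label (total sales). */
def buildWindows(sales: RDD[DailySale], start: LocalDate, numWindows: Int): RDD[WindowSample] =
  sales.groupBy(s => (s.itemId, s.storeCode)).flatMap { case ((item, store), recs) =>
    val byDate = recs.map(r => r.date -> r.qty).toMap.withDefaultValue(0.0) // fill-0 for missing days
    (0 until numWindows).map { w =>
      val end   = start.plusDays(14L * w)                                // last day of the feature window
      val past  = (1 to 14).map(d => byDate(end.minusDays(d.toLong)))    // previous 14 days
      val label = (1 to 14).map(d => byDate(end.plusDays(d.toLong))).sum // next 14 days' total sales
      WindowSample(item, store, end, past.sum, past.sum / 14, past.max, label)
    }
  }
```

With start = 2015-07-10 and numWindows = 11 this would roughly correspond to the eleven 14-day windows of the embodiment, up to the exact window boundaries.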
Q3, feature selection:
Convert the data into typed form on Spark, use xgboost to select the top-k features by importance, compute similarities and remove redundant features. The concrete operations are as follows: call the distributed version of xgboost to compute the importance of the n input features of TrainRDD; then call the sortBy and filter operators to select the top-k features by importance, at which point TrainRDD has the form <label, commodity ID, warehouse code, feature x1, feature x2, ..., feature xk, label>; finally call the mapPartitions operator to compute the Pearson correlation coefficients between features, reject redundant features according to the similarity between them, and call the persist operator to cache the resulting TrainRDD in memory. For example: suppose 400 features have been constructed; an xgboost model is trained on them and outputs the importance coefficient of each feature, and here we select the top 40, i.e. the 40 features ranked highest by importance. These 40 features may still contain redundancy, so the similarity between features is computed; common similarity measures include the Pearson correlation coefficient and cosine similarity. If, say, the similarity between feature 1 and feature 10 among these 40 features is as high as 0.999, then either feature 1 or feature 10 can be removed, keeping only one of them; which one to remove also depends on its relationship with the other features.
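A sketch of this selection step follows, assuming the per-feature importance scores have already been obtained from a trained distributed xgboost booster; the names selectFeatures and samples and the 0.95 threshold are illustrative.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.stat.Statistics

/** Keep the k most important features, then greedily drop any feature whose Pearson
  * correlation with an already-kept feature exceeds `threshold` (redundancy removal).
  * `samples` should be persisted beforehand, since each corr call launches a Spark job. */
def selectFeatures(samples: RDD[Array[Double]],      // one array of feature values per sample
                   importance: Map[Int, Double],     // feature index -> xgboost importance score
                   k: Int, threshold: Double = 0.95): Seq[Int] = {
  val topK = importance.toSeq.sortBy(-_._2).map(_._1).take(k)
  val kept = scala.collection.mutable.ArrayBuffer.empty[Int]
  for (f <- topK) {
    val col = samples.map(_(f))
    val redundant = kept.exists { g =>
      math.abs(Statistics.corr(col, samples.map(_(g)), "pearson")) > threshold
    }
    if (!redundant) kept += f   // keep only one of each highly correlated pair
  }
  kept.toList
}
```

In the example above, selectFeatures would be called with k = 40 on the 400 constructed features.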
Q4, model selection:
Train multiple regression models on TrainRDD by calling in turn the LR, SVR, RF, GBRT and XGBOOST algorithms of the Spark MLlib machine-learning library and third-party distributed learning algorithms, and call the union operator to merge the prediction results of all models into one RDD, defined as model_RDD. Then call the groupBy operator to aggregate by commodity ID. Finally, call the map operator: if the under-stocking cost of a (commodity ID, warehouse code) pair is greater than its over-stocking cost, we would rather over-predict, so take the maximum of the single-model predictions multiplied by 1.1; otherwise take the minimum of the single-model predictions multiplied by 0.9. This series of data transformations yields the model-learning result in the form <commodity ID, warehouse code, target inventory of the warehouse region for the future time period>.
Table 4: example prediction values of each model
Commodity | Warehouse | LR | SVR | RF | GBRT | XGBOOST
b | 0002 | 30 | 45 | 54 | 100 | 10
c | 0003 | 40 | 60 | 70 | 20 | 10
If the over-stocking cost of smart-home commodity b is 10 yuan and its under-stocking cost is 100 yuan, then the prediction for this commodity in warehouse 0002 is 100*1.1 = 110;
If the over-stocking cost of smart-home commodity c is 80 yuan and its under-stocking cost is 40 yuan, then the prediction for this commodity in warehouse 0003 is 10*0.9 = 9.
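A sketch of this Q4 merge step follows; the Prediction case class and the driver-side costs map (which would typically be a broadcast variable) are illustrative stand-ins for the real data structures.

```scala
import org.apache.spark.rdd.RDD

// One prediction of one model for one (commodity, warehouse) pair.
case class Prediction(itemId: Long, storeCode: String, model: String, value: Double)

/** Union the per-model outputs, group by (commodity, warehouse) and apply the cost rule:
  * if the under-stocking cost exceeds the over-stocking cost take max * 1.1,
  * otherwise take min * 0.9. `costs` maps each pair to (underStockCost, overStockCost). */
def mergeModels(modelOutputs: Seq[RDD[Prediction]],
                costs: Map[(Long, String), (Double, Double)]): RDD[((Long, String), Double)] = {
  val modelRDD = modelOutputs.reduce(_ union _)                 // union of all model results (model_RDD)
  modelRDD.groupBy(p => (p.itemId, p.storeCode)).map { case (key, preds) =>
    val values        = preds.map(_.value)
    val (under, over) = costs(key)
    val target = if (under > over) values.max * 1.1 else values.min * 0.9
    key -> target                                               // target inventory for the pair
  }
}
```

For commodity b in warehouse 0002 in Table 4 the model values are (30, 45, 54, 100, 10); with an under-stocking cost of 100 and an over-stocking cost of 10 the merged value is 100*1.1 = 110, matching the example above.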
Q5, fusion of the model prediction with the rule-based prediction:
Denote the sales of the N days before the prediction window as day1, day2, ..., dayN. For each commodity, if its under-stocking cost is greater than its over-stocking cost, predict N*max(day1, day2, ..., dayN); otherwise predict N*min(day1, day2, ..., dayN).
For example: denote the sales of the two weeks before the prediction window as sale1 and sale2. For each (commodity, warehouse) pair, if the under-stocking cost is greater than the over-stocking cost, the prediction is 2*max(sale1, sale2); otherwise it is 2*min(sale1, sale2).
For the fused prediction, the model prediction is merged with the rule-based prediction with fusion coefficients 0.75*model + 0.25*rule. The individual models are first fused as shown in Fig. 2, giving result M1, which is then fused with the rule to give result M2. Using model M2, the historical data of different smart-home products, such as those shown in Tables 1 and 2, can be used on the Spark big data platform to predict the stock they will need in each warehouse, as illustrated by Table 4. Compared with Table 4, which shows the outputs of single models, M2 is the fusion of multiple models and the rule, and its results can be better.
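The rule prediction and the 0.75/0.25 fusion can be sketched as follows; rulePredict and fuse are illustrative helper names, and the RDDs are keyed by (commodity ID, warehouse code).

```scala
import org.apache.spark.rdd.RDD

/** Rule prediction for one (commodity, warehouse) pair: N * max of the last N daily sales
  * if the under-stocking cost exceeds the over-stocking cost, otherwise N * min. */
def rulePredict(dailySales: Seq[Double], underCost: Double, overCost: Double): Double =
  if (underCost > overCost) dailySales.size * dailySales.max
  else dailySales.size * dailySales.min

/** Fuse the model result with the rule result: 0.75 * model + 0.25 * rule. */
def fuse(modelRDD: RDD[((Long, String), Double)],
         ruleRDD:  RDD[((Long, String), Double)]): RDD[((Long, String), Double)] =
  modelRDD.join(ruleRDD).mapValues { case (m, r) => 0.75 * m + 0.25 * r }
```

In the two-week example above, rulePredict(Seq(sale1, sale2), underCost, overCost) yields 2*max(sale1, sale2) when the under-stocking cost is larger, as described.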
The above is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution that can readily be conceived by those familiar with the art within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (1)

1. A commodity demand forecasting and logistics warehouse-allocation planning method based on the Spark big data platform, characterized by comprising the following steps:
Q1, data preprocessing: obtain the relevant data files from the database, including commodity-granularity features, user-behavior features of the related commodities, commodity and warehouse-region granularity features, and related information such as the under-stocking and over-stocking costs of each warehouse region; then fill with 0 the records of commodities in the database that have no sales record because they were recently listed or delisted, so as to guarantee data continuity;
That is: first create a SparkContext object, then use its textFile(URL) function to create a distributed data set RDD, the data in the RDD including the commodity-granularity features of the smart-home goods (ID, category, brand, date, price), the related user-behavior features (number of views, number of add-to-cart events, number of purchases, traffic), and the commodity and warehouse-region granularity features such as the under-stocking and over-stocking costs of each warehouse region, the created distributed data set being operable in parallel; second, call the mapPartitions operator on the samples of the form <feature 1, feature 2, ..., feature m> to fill with 0 the fields of commodities that have no sales record because they were recently listed or delisted, so as to guarantee data continuity; call the zipWithIndex operator to attach a label to each sample, converting the created RDD to the form <label, commodity ID, warehouse code, feature 1, feature 2, ..., feature m>; finally call the filter operator to split the whole data set into a test set TestRDD and a training set TrainRDD according to the transaction date, and call the persist operator to cache the resulting TrainRDD in memory;
Q2, feature construction: construct features with a sliding-window technique; call the mapPartitions operator on TrainRDD, write the corresponding statistical functions, and build features from the information of the samples in each partition (window) over different time periods; take the N days after each specific time point as one window and the total sales of each commodity and warehouse in that window as the label, sliding over M windows; take the N days before the specific time point as a window for feature construction: compute the sums (sum) and averages (avg) of the various category feature values over the N days before the window, compute statistics of the commodity's transaction counts over the most recent N days, including maximum, minimum and standard deviation, and compute statistics of its category id's transaction counts over the most recent N days, including maximum, minimum, standard deviation, rank and proportion; with the total sales of the following N days as the label, sliding over M windows, this series of data transformations converts the created TrainRDD to the form <label, commodity ID, warehouse code, feature 1, feature 2, ..., feature m, feature m+1, ..., feature n, label>;
Q3, feature selection: use xgboost to select the top-k features by importance, compute similarities and remove redundant features; take the feature values constructed in Q2 and train an xgboost model on them to obtain the importance ranking of the features; select the top-k most important features, compute similarities among them, and discard the unimportant features;
Q4, model selection: train multiple regression models; first train multiple regression models on TrainRDD using the LR, SVR, RF, GBRT and XGBOOST algorithms of the Spark MLlib machine-learning library and third-party distributed learning algorithms, and call the union operator to merge the prediction results of all models into one RDD, defined as model_RDD; second, call the groupBy operator to aggregate by commodity ID; finally, call the map operator: if the under-stocking cost of a (commodity ID, warehouse code) pair is greater than its over-stocking cost, over-prediction is preferred, so take the maximum of the single-model predictions multiplied by 1.1, otherwise take the minimum of the single-model predictions multiplied by 0.9; this series of data transformations yields the model-learning result 'model' in the form <commodity ID, warehouse code, target inventory of the warehouse region for the future time period>;
Q5, fusion of the model prediction with the rule-based prediction, with fusion coefficients 0.75*model + 0.25*rule, where the rule-based learner 'rule' is defined as follows: denote the sales of the N days before the prediction window as day1, day2, ..., dayN; for each commodity, if its under-stocking cost is greater than its over-stocking cost, predict N*max(day1, day2, ..., dayN), otherwise predict N*min(day1, day2, ..., dayN); finally, TestRDD is converted to the form <commodity ID, warehouse code, target inventory of the warehouse region for the future time period>, defined as rule_RDD, and the target inventory of each commodity in each warehouse region for the coming period is obtained.
CN201811133491.1A 2018-09-27 2018-09-27 Commodity demand forecasting and logistics warehouse-allocation planning method based on the Spark big data platform Withdrawn CN109325808A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811133491.1A CN109325808A (en) 2018-09-27 2018-09-27 Commodity demand forecasting and logistics warehouse-allocation planning method based on the Spark big data platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811133491.1A CN109325808A (en) 2018-09-27 2018-09-27 Commodity demand forecasting and logistics warehouse-allocation planning method based on the Spark big data platform

Publications (1)

Publication Number Publication Date
CN109325808A true CN109325808A (en) 2019-02-12

Family

ID=65266412

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811133491.1A Withdrawn CN109325808A (en) 2018-09-27 2018-09-27 Demand for commodity prediction based on Spark big data platform divides storehouse planing method with logistics

Country Status (1)

Country Link
CN (1) CN109325808A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107122928A (en) * 2016-02-24 2017-09-01 阿里巴巴集团控股有限公司 A kind of supply chain Resource Requirement Planning collocation method and device
CN106599935A (en) * 2016-12-29 2017-04-26 重庆邮电大学 Three-decision unbalanced data oversampling method based on Spark big data platform
CN108399457A (en) * 2018-02-02 2018-08-14 西安电子科技大学 There are the Boosting improved methods converted based on multistep label under inclined data in integrated study
CN109582706A (en) * 2018-11-14 2019-04-05 重庆邮电大学 The neighborhood density imbalance data mixing method of sampling based on Spark big data platform

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIMEIYANG: "Cainiao - Demand Forecasting and Warehouse Allocation Planning Solution", 《HTTPS://GITHUB.COM/LIMEIYANG/CAINIAO》 *
Cainiao Network: "Cainiao - Demand Forecasting and Warehouse Allocation Planning: Competition Problem and Data", 《HTTPS://TIANCHI.ALIYUN.COM/COMPETITION/ENTRANCE/231530/INFORMATION?FROM=OLDURL》 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111768139A (en) * 2019-06-27 2020-10-13 北京沃东天骏信息技术有限公司 Stock processing method, apparatus, device and storage medium
CN110688623A (en) * 2019-09-29 2020-01-14 深圳乐信软件技术有限公司 Training optimization method, device, equipment and storage medium of high-order LR model
CN110688623B (en) * 2019-09-29 2023-12-26 深圳乐信软件技术有限公司 Training optimization method, device, equipment and storage medium for high-order LR model
CN110956272A (en) * 2019-11-01 2020-04-03 第四范式(北京)技术有限公司 Method and system for realizing data processing
CN110956272B (en) * 2019-11-01 2023-08-08 第四范式(北京)技术有限公司 Method and system for realizing data processing
CN111190110A (en) * 2020-01-13 2020-05-22 南京邮电大学 Lithium ion battery SOC online estimation method comprehensively considering internal and external influence factors
CN112100182A (en) * 2020-09-27 2020-12-18 中国建设银行股份有限公司 Data warehousing processing method and device and server
CN112308665A (en) * 2020-10-26 2021-02-02 福建菩泰网络科技有限公司 Goods distribution method and system for online shopping mall
CN112597213A (en) * 2020-12-24 2021-04-02 第四范式(北京)技术有限公司 Batch request processing method and device for feature calculation, electronic equipment and storage medium
CN112597213B (en) * 2020-12-24 2023-11-10 第四范式(北京)技术有限公司 Batch request processing method and device for feature calculation, electronic equipment and storage medium
CN113642958A (en) * 2021-08-05 2021-11-12 大唐互联科技(武汉)有限公司 Warehouse replenishment method, device, equipment and storage medium based on big data
CN113642958B (en) * 2021-08-05 2024-06-04 大唐互联科技(武汉)有限公司 Warehouse replenishment method, device, equipment and storage medium based on big data

Similar Documents

Publication Publication Date Title
CN109325808A (en) Commodity demand forecasting and logistics warehouse-allocation planning method based on the Spark big data platform
Hofmann et al. Big data analytics and demand forecasting in supply chains: a conceptual analysis
Kim et al. Optimal inventory control in a multi-period newsvendor problem with non-stationary demand
Vahdani et al. A hybrid multi-stage predictive model for supply chain network collapse recovery analysis: a practical framework for effective supply chain network continuity management
CA3235875A1 (en) Method and system for generation of at least one output analytic for a promotion
CN101783004A (en) Fast intelligent commodity recommendation system
JP2018533807A (en) System and method for providing a multi-channel inventory allocation approach to retailers
US10528903B2 (en) Computerized promotion and markdown price scheduling
CN109961198B (en) Associated information generation method and device
CN110109901B (en) Method and device for screening target object
US20160034952A1 (en) Control apparatus and accelerating method
CN109214587A Commodity demand forecasting and logistics warehouse-allocation planning method based on three-way decisions
Harsoor et al. Forecast of sales of Walmart store using big data applications
CN108777701A (en) A kind of method and device of determining receiver
CN109558992A (en) Based on sale peak value prediction technique, device, equipment and the storage medium from the machine of dealer
CN112036631B (en) Purchasing quantity determining method, purchasing quantity determining device, purchasing quantity determining equipment and storage medium
Behera et al. Grid search optimization (GSO) based future sales prediction for big mart
CN112365283A (en) Coupon issuing method, device, terminal equipment and storage medium
CA3131040A1 (en) Method and system for optimizing an objective having discrete constraints
CN113763035A (en) Advertisement delivery effect prediction method and device, computer equipment and storage medium
CN109190027A (en) Multi-source recommended method, terminal, server, computer equipment, readable medium
CN108629467B (en) Sample information processing method and system
US20210312259A1 (en) Systems and methods for automatic product usage model training and prediction
CN111353794A (en) Data processing method, supply chain scheduling method and device
Polder et al. Complementarities between Information Technologies and Innovation Modes in the Adoption and Outcome Stage: A MicroEconometric Analysis for the Netherlands. CAED conference

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20190415

Address after: 400000 Chongqing Nanan District Photoelectric Road 26 10 7-1

Applicant after: Shu Haidong

Address before: 400010 9 buildings 13-8, 168 Caiyuan Road, Yuzhong District, Chongqing

Applicant before: Chongqing Zhiwanjia Technology Co., Ltd.

TA01 Transfer of patent application right
WW01 Invention patent application withdrawn after publication

Application publication date: 20190212

WW01 Invention patent application withdrawn after publication