CN109325808A - Commodity demand prediction and logistics warehouse-allocation planning method based on the Spark big data platform - Google Patents
Commodity demand prediction and logistics warehouse-allocation planning method based on the Spark big data platform - Download PDF / Info
- Publication number
- CN109325808A (application CN201811133491.1A / CN201811133491A)
- Authority
- CN
- China
- Prior art keywords
- commodity
- feature
- window
- data
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
- G06Q30/0202—Market predictions or forecasting for commercial activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
- G06Q10/06313—Resource planning in a project environment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/08—Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
Abstract
The present invention provides a commodity demand prediction and logistics warehouse-allocation planning method based on the Spark big data platform, comprising the following steps: Q1, data preprocessing; Q2, feature construction; Q3, feature selection; Q4, model selection; Q5, fusion of the model prediction result with the rule-based prediction result, with fusion coefficients 0.75·model + 0.25·rule. By predicting smart-home commodity demand and planning warehouse allocation on the Spark big data platform, the present invention can effectively help smart-home merchants greatly reduce operating costs, shorten delivery times, and improve the user experience, and is better suited to practical commercial scenarios in which data volume grows rapidly.
Description
Technical field
The present invention relates to the technical field of big data analysis applications, in particular to e-commerce; more particularly, in smart-home product e-commerce, it relates to a commodity demand prediction and logistics warehouse-allocation planning method based on the Spark big data platform, used to meet the demand-prediction needs of e-commerce commodities.
Background technique
With the rapid development of science and technology, the Internet has brought a variety of convenient services, and e-commerce has become increasingly complex. The smart home is an embodiment of instrumentation under the influence of the Internet: through Internet-of-Things technology, it connects the various devices in a home (such as audio and video equipment, lighting systems, curtain controls, air-conditioning controls, security systems, digital theater systems, audio-visual servers, and networked home appliances) and provides functions and means such as home appliance control, lighting control, remote control by telephone, indoor and outdoor remote control, burglar alarms, environmental monitoring, HVAC control, infrared forwarding, and programmable timer control. Compared with an ordinary household, a smart home not only retains traditional residential functions but also integrates building automation, network communication, information appliances, and equipment automation, providing comprehensive information exchange and even savings on various energy costs.
In the smart-home e-commerce market, delivery time and price are the two key factors users consider. In general, improvements in delivery time and reductions in price always run into hard outer limits, so a solution must be sought at a deeper level. A warehouse-allocation stocking service means that, based on sales forecasts, a smart-home merchant stocks goods in warehouses in advance, so that shipment and regional distribution are handled from the nearest warehouse; without building its own warehouses, the merchant can easily use the costly logistics system of a first-tier e-commerce platform and achieve extremely fast delivery. With the stimulus of various holidays and e-commerce platforms, online promotions in forms such as hot items and flash sales will become the norm. If a smart-home merchant ships from a traditional single warehouse, problems such as large cross-province shipment volumes, high logistics costs, long delivery times, and customer complaints are difficult to avoid. Therefore, brands with higher standardization and deeper inventory should consider stocking allocated warehouses in advance.
During big promotions, what consumers care about most is how quickly express deliveries arrive. The most effective approach is to use big data and algorithms to place goods directly in the warehouse nearest the consumer; a big-data-driven supply chain can help merchants substantially reduce operating costs and improve the user experience, playing an important role in the efficiency of the entire smart-home e-commerce industry. High-quality smart-home commodity demand prediction is the foundation and core function of supply chain management, and realizing high-quality demand prediction is a further step toward an intelligent supply-chain platform. Therefore, how to achieve more accurate demand forecasting, so that goods are placed directly in the warehouse nearest the consumer while management costs are greatly optimized, is an essential problem in urgent need of a solution.
Spark is a fast, efficient, memory-based distributed big data processing framework that supports DAG-based distributed parallel computation. It can use the data sources and file systems supported by Hadoop to store data, including HDFS, HBase, Hive, Cassandra, and others. It can be deployed either on an individual server or on a distributed resource management framework such as Mesos or YARN, and it provides APIs in three programming languages: Scala, Java, and Python. Using the APIs that Spark provides, developers can create Spark-based applications through standard API interfaces.
An RDD (Resilient Distributed Dataset) is an abstract data type and the representation of data in Spark; it is the most central module and class in Spark and the essence of its design. It can be regarded as a large collection with a fault-tolerance mechanism; Spark provides a persist mechanism to cache it in memory, which facilitates iterative computation and repeated reuse. An RDD is a partitioned record: its partitions can be distributed across different physical machines, which better supports parallel computation. An RDD also has the property of elasticity: during job execution, when a machine's memory overflows, the RDD can spill to disk; although efficiency is reduced, the normal operation of the job is guaranteed. Two kinds of operations can be performed on an RDD: transformations and actions.
Transformation: an existing RDD is converted into a new RDD through a series of function operations, i.e., the return value is still an RDD, and RDDs can be transformed repeatedly. Since an RDD is stored in a distributed manner, the entire transformation process is also carried out in parallel. Common transformation higher-order functions include map, flatMap, reduceByKey, and so on.
Action: the return value is not an RDD. It may be an ordinary Scala collection, a single value, or empty, and it is ultimately either returned to the Driver program or written to a file system. Examples include functions such as reduce, saveAsTextFile, and collect.
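Since a full Spark installation may not be at hand, the lazy-transformation versus eager-action distinction described above can be illustrated with a plain-Python analogue; the `MiniRDD` class below is a toy stand-in of our own invention, not Spark's actual API.

```python
# Illustrative pure-Python analogue (not actual Spark) of lazy
# transformations vs. eager actions. All names are hypothetical.
class MiniRDD:
    def __init__(self, data_fn):
        self._data_fn = data_fn  # deferred: nothing is computed yet

    @classmethod
    def parallelize(cls, seq):
        return cls(lambda: iter(seq))

    # Transformations: return a new MiniRDD, evaluated lazily.
    def map(self, f):
        return MiniRDD(lambda: (f(x) for x in self._data_fn()))

    def filter(self, pred):
        return MiniRDD(lambda: (x for x in self._data_fn() if pred(x)))

    # Actions: force evaluation and return a plain value.
    def collect(self):
        return list(self._data_fn())

    def reduce(self, f):
        from functools import reduce as _reduce
        return _reduce(f, self._data_fn())

rdd = MiniRDD.parallelize(range(1, 6))
doubled = rdd.map(lambda x: x * 2).filter(lambda x: x > 4)  # still lazy
print(doubled.collect())                   # [6, 8, 10]
print(doubled.reduce(lambda a, b: a + b))  # 24
```

Nothing runs until `collect` or `reduce` is called, mirroring how Spark builds a DAG of transformations and only executes it when an action is encountered.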
When a Spark application encounters an action operation, the SparkContext generates a Job and divides each Job into different stages (each Job is split into many groups of Tasks; each group of tasks is called a Stage, also known as a TaskSet; a Task is the unit of work sent to some Executor). Each Spark application obtains its own exclusive Executors; an Executor corresponds to one JVM process and is responsible for running Tasks, so Tasks from different applications run in different JVM processes. An Executor process is resident throughout the lifetime of the application and runs Tasks with multiple threads. Each node can run one or more Executors; each Executor consists of several cores and memory, and each core of each Executor can execute only one Task at a time. The result of each Task is one partition of the target RDD. The concurrency with which Tasks execute equals the number of Executors multiplied by the number of cores per Executor.
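The concurrency formula at the end of the paragraph above is a simple product; a trivial worked example with assumed cluster figures:

```python
# Task concurrency = (number of executors) x (cores per executor).
# The figures below are made-up example values, not from the patent.
executors = 4
cores_per_executor = 3
concurrency = executors * cores_per_executor
print(concurrency)  # 12 tasks can execute simultaneously
```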
Summary of the invention
In view of the problems described in the background above, it is an object of the present invention to provide a commodity demand prediction and logistics warehouse-allocation planning method based on the Spark big data platform, which solves the low efficiency and high cost of existing smart-home commodity demand prediction and warehouse-allocation planning, effectively helps smart-home merchants greatly reduce operating costs, shortens delivery times, and improves the user experience.
In order to achieve the above object, the invention provides the following technical scheme:
A commodity demand prediction and logistics warehouse-allocation planning method based on the Spark big data platform, characterized by comprising the following steps:
Q1, data preprocessing: obtain the relevant data files from the database, including commodity-granularity features, user behavior features of related commodities, commodity and warehouse-region granularity features, and related information such as the understock and overstock costs of each warehouse region; then, for commodities in the database that have no sales records because of recent listing or delisting, fill the missing values with 0 to guarantee data continuity;
That is: create a SparkContext object, then use its textFile(URL) function to create a distributed dataset RDD. The data in the RDD include smart-home commodity-granularity features (ID, category, brand, date, price); user behavior features of related commodities (number of views, number of add-to-cart events, number of purchases, traffic); and commodity and warehouse-region granularity features such as the understock and overstock costs of each warehouse region. The distributed dataset thus created can be operated on in parallel. Next, call the mapPartitions operator on the samples of the form <feature 1, feature 2, ..., feature m>, using 0 to fill the relevant fields of commodities that have no sales records due to recent listing or delisting, so as to guarantee data continuity. Call the zipWithIndex operator to attach an index label to each sample, converting the created RDD to the form <label, commodity ID, warehouse code, feature 1, feature 2, ..., feature m>. Finally, call the filter operator to split the entire dataset by commodity transaction date into a test set TestRDD and a training set TrainRDD, and call the persist operator to persist the resulting TrainRDD in memory;
Q2, feature construction: use a sliding-window technique to construct features. Call the mapPartitions operator on TrainRDD and write the corresponding statistical functions, building the corresponding features from the relevant information of the samples on each partition (window) over different time periods. The N days after each chosen time point form a window, and the total sales of each commodity and warehouse inventory in that window serve as the label; the N days before the time point form a window used for feature construction, sliding over M windows in total. For the N days before each window, compute the sum and the average avg of the various categorical features; compute statistics of each commodity's transaction counts over the most recent N days, including maximum, minimum, and standard deviation; compute the same statistics for its category id over the most recent N days, including maximum, minimum, standard deviation, rank, and share. The total sales of the following N days serve as the label, sliding over M windows. After a series of data transformations, the created TrainRDD is converted to the form <label, commodity ID, warehouse code, feature 1, feature 2, ..., feature m, feature m+1, ..., feature n, label>;
Q3, feature selection: use xgboost to select the top-k features by importance ranking, calculate similarity, and remove redundant features. Take the feature values constructed in Q2, train an xgboost model on them to obtain the importance ranking of the features, choose the top-k important features, compute their pairwise similarity, and eliminate the unimportant or redundant features;
Q4, model selection: train multiple regression models. First, train multiple regression models on TrainRDD using algorithms from the Spark MLlib machine-learning library (LR, SVR, RF, GBRT) and the third-party distributed learning algorithm XGBoost, and call the union operator to merge the prediction results of all models, defining the result as model_RDD. Second, call the groupBy operator to aggregate by commodity ID. Finally, call the map operator: if, for some (commodity ID, warehouse code), the understock cost is greater than the overstock cost, it is preferable to over-predict, so take the maximum among the single-model predictions multiplied by 1.1; otherwise take the minimum among the single-model predictions multiplied by 0.9. After a series of data transformations, this yields a model learning result of the form <commodity ID, warehouse code, target inventory of the warehouse region for the future time period>;
Q5, fusion of the model prediction result with the rule-based prediction result, with fusion coefficients 0.75·model + 0.25·rule, where the rule learning is defined as: denote the sales of the N days before the prediction window as day1, day2, ..., dayN; for each commodity, if the understock cost is greater than the overstock cost, predict N·max(day1, day2, ..., dayN), otherwise predict N·min(day1, day2, ..., dayN). Finally, TestRDD is converted to the form <commodity ID, warehouse code, base inventory of the warehouse region for the future time period>, defined as rule_RDD, ultimately yielding the base inventory of each commodity in each warehouse region for a certain future period.
By predicting smart-home commodity demand and planning warehouse allocation on the Spark big data platform, the present invention can effectively help smart-home merchants greatly reduce operating costs, shorten delivery times, and improve the user experience, and is better suited to practical commercial scenarios in which data volume grows rapidly.
Detailed description of the invention
Fig. 1 is a flow diagram of the present invention.
Fig. 2 is a diagram of the RDD changes in the feature engineering stage of the present invention.
Specific embodiment
The technical solution of the present invention is described clearly and completely below in conjunction with the drawings and embodiments of the present invention. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
As shown in Figs. 1 and 2, the present invention takes the e-commerce and logistics warehousing of smart-home products as an embodiment to illustrate the commodity demand prediction and logistics warehouse-allocation planning method based on the Spark big data platform, which comprises the following steps:
Q1, data preprocessing stage:
Obtain the relevant data from the smart-home product databases and integrate the multi-table data for convenient subsequent use; then carry out preprocessing: for commodities whose records show no sales because of recent listing or delisting, fill the missing values with 0 to guarantee data continuity. Next, choose whether to normalize the data as needed. Finally, split the data as a whole, according to the length of time to be predicted, into a training set, a validation set, and a test set.
Create a SparkContext object, then use its textFile(URL) function to create a distributed dataset RDD. The data in the RDD include smart-home commodity-granularity features (ID, category, brand, date, price); user behavior features of related commodities (number of views, number of add-to-cart events, number of purchases, traffic); and commodity and warehouse-region granularity features such as the understock and overstock costs of each warehouse region. The distributed dataset thus created can be operated on in parallel. Next, call the mapPartitions operator on the samples of the form <feature 1, feature 2, ..., feature m>, using 0 to fill the relevant fields of commodities that have no sales records due to recent listing or delisting, so as to guarantee data continuity. Call the zipWithIndex operator to attach an index label to each sample, converting the created RDD to the form <label, commodity ID, warehouse code, feature 1, feature 2, ..., feature m>. Finally, call the filter operator to split the entire dataset by commodity transaction date into a test set TestRDD and a training set TrainRDD, and call the persist operator to persist the resulting TrainRDD in memory.
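As a rough illustration of the preprocessing steps above (zero-filling, sample indexing, date-based splitting), the following pure-Python sketch uses hypothetical field names and a hypothetical split date; the actual method performs these steps with Spark operators (mapPartitions, zipWithIndex, filter) on an RDD:

```python
from datetime import date

# Hypothetical raw records; a None sales value stands for a commodity
# with no sales record due to recent listing or delisting.
records = [
    {"item_id": 1, "store_code": "0001", "dt": date(2015, 7, 1), "sales": 5},
    {"item_id": 2, "store_code": "0001", "dt": date(2015, 7, 1), "sales": None},
    {"item_id": 1, "store_code": "0001", "dt": date(2015, 12, 1), "sales": 3},
]

# Fill missing sales with 0 (mapPartitions analogue).
filled = [{**r, "sales": r["sales"] if r["sales"] is not None else 0}
          for r in records]

# Attach an index label to each sample (zipWithIndex analogue).
labeled = [(i, r) for i, r in enumerate(filled)]

# Split by transaction date into train and test sets (filter analogue).
split_date = date(2015, 11, 1)  # assumed cut-off, for illustration only
train = [(i, r) for i, r in labeled if r["dt"] < split_date]
test = [(i, r) for i, r in labeled if r["dt"] >= split_date]

print(len(train), len(test))  # 2 1
print(train[1][1]["sales"])   # 0  (zero-filled)
```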
The data structures are shown in Table 1 and Table 2 below:
Table 1: commodity-granularity features
Table 2: understock and overstock costs of commodities by warehouse region
Field | Type | Meaning | Example |
item_id | bigint | Commodity ID | 333442 |
store_code | String | Warehouse code | 1 |
money_a | String | Commodity understock cost | 10.44 |
money_b | String | Commodity overstock cost | 20.88 |
Q2, feature construction:
Use a sliding-window technique to construct features. The N days after each chosen time point form a window, and the total sales of each commodity and warehouse inventory in that window serve as the label; the N days before the time point form a window used for feature construction, sliding over M windows in total:
For the 1/2/3/5/7/9/.../N days before each window, compute the sum and the average avg of the various categorical features; compute statistics of each commodity's transaction counts over the most recent N days, including maximum, minimum, and standard deviation; compute the same statistics for its category id over the most recent N days, including maximum, minimum, standard deviation, rank, and share, as well as polynomial cross-feature values.
The period from July 13, 2015 to December 10, 2015 is selected for training, sliding over 11 windows with a window length of two weeks (14 days) for feature extraction. The features include the sums and averages avg of the various categorical features over the preceding 1/2/3/5/7/9/.../14 days; statistics of each commodity's transaction counts over the most recent 14 days, including maximum, minimum, and standard deviation; the same statistics for its category id over the most recent 14 days, including maximum, minimum, standard deviation, rank, and share; and polynomial cross-feature values. On the last day of each window, the total number of units sold in the following 14 days is summed and used as the label. The sliding windows are explained in Table 3.
Table 3: sliding-window date explanation
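The sliding-window construction described above can be sketched in a few lines of pure Python; the window length N=3 and the daily sales series below are made up for illustration (the embodiment uses 14-day windows on real sales data):

```python
# Sliding-window feature construction sketch: for a daily sales series,
# the N days before each cut-off yield the features and the N days
# after it yield the label (total sales of the following window).
def window_features(sales, cutoff, n):
    before = sales[cutoff - n:cutoff]  # feature window
    after = sales[cutoff:cutoff + n]   # label window
    feats = {
        "sum": sum(before),
        "avg": sum(before) / n,
        "max": max(before),
        "min": min(before),
    }
    label = sum(after)  # total sales of the following N days
    return feats, label

daily_sales = [4, 2, 6, 1, 3, 5, 2, 8]
feats, label = window_features(daily_sales, cutoff=3, n=3)
print(feats)  # {'sum': 12, 'avg': 4.0, 'max': 6, 'min': 2}
print(label)  # 1 + 3 + 5 = 9
```

Sliding the cut-off forward day by day (or window by window) produces the M training windows described above.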
Q3, feature selection:
Based on Spark, convert the data to numeric form, use xgboost to select the top-k features by importance, calculate similarity, and remove redundant features. The concrete operations are as follows: call the distributed version of xgboost to compute the importance of the n input features of TrainRDD. Then call the sortBy and filter operators to choose the top-k features by importance; at this point TrainRDD is converted to the form <label, commodity ID, warehouse code, feature x1, feature x2, ..., feature xk, label>. Finally, call the mapPartitions operator to calculate the Pearson correlation coefficients between features, reject redundant features according to the similarity between them, and call the persist operator to persist the resulting TrainRDD in memory. For example: suppose 400 candidate features are constructed and an xgboost model is trained on them; the model outputs an importance coefficient for each feature, and we select the top 40, i.e., the 40 features whose importance ranks highest. However, these 40 features may contain redundancy, so the similarity between features is computed; common similarity measures include the Pearson correlation coefficient and cosine similarity. If, say, feature 1 and feature 10 among these 40 features have a similarity as high as 0.999, then either feature 1 or feature 10 can be removed, keeping only one of them; which one to remove also depends on its relationship with the other features.
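The redundancy-removal step above (Pearson correlation between top-k features) can be sketched as follows; the feature values are invented for illustration, and the 0.95 threshold is an assumed choice, not one given in the patent:

```python
import math

# Compute the Pearson correlation between two candidate features and
# drop one feature of a pair whose correlation exceeds a threshold.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

feature_1 = [1.0, 2.0, 3.0, 4.0]
feature_10 = [2.1, 4.0, 6.1, 8.0]   # nearly 2x feature_1: redundant
feature_7 = [5.0, 1.0, 4.0, 2.0]

r = pearson(feature_1, feature_10)
print(round(r, 3))  # very close to 1.0, so the pair is redundant
kept = (["feature_1", "feature_7"] if r > 0.95
        else ["feature_1", "feature_10", "feature_7"])
print(kept)         # ['feature_1', 'feature_7']
```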
Q4, model selection:
Train multiple regression models on TrainRDD by successively calling algorithms from the Spark MLlib machine-learning library such as LR, SVR, RF, and GBRT, together with the third-party distributed learning algorithm XGBoost, and call the union operator to merge the prediction results of all models, defining the result as model_RDD. Then call the groupBy operator to aggregate by commodity ID. Finally, call the map operator: if, for some (commodity ID, warehouse code), the understock cost is greater than the overstock cost, it is preferable to over-predict, so take the maximum among the single-model predictions multiplied by 1.1; otherwise take the minimum among the single-model predictions multiplied by 0.9. After a series of data transformations, this yields a model learning result of the form <commodity ID, warehouse code, base inventory of the warehouse region for the future time period>.
Table 4: example prediction values of each model
Commodity | Warehouse | LR | SVR | RF | GBRT | XGBOOST |
b | 0002 | 30 | 45 | 54 | 100 | 10 |
c | 0003 | 40 | 60 | 70 | 20 | 10 |
If the overstock cost of smart-home commodity b is 10 yuan and the understock cost is 100 yuan, then the predicted value of this commodity in warehouse 0002 is 100 × 1.1 = 110;
If the overstock cost of smart-home commodity c is 80 yuan and the understock cost is 40 yuan, then the predicted value of this commodity in warehouse 0003 is 10 × 0.9 = 9;
Q5, fusion of the model prediction result with the rule-based prediction result:
Denote the sales of the N days before the prediction window as day1, day2, ..., dayN. For each commodity, if the understock cost is greater than the overstock cost, predict N·max(day1, day2, ..., dayN); otherwise predict N·min(day1, day2, ..., dayN).
For example: denote the sales of the two weeks before the prediction window as sale1 and sale2; for each (commodity, warehouse), if the understock cost is greater than the overstock cost, predict 2·max(sale1, sale2), otherwise predict 2·min(sale1, sale2).
The fused prediction combines the model prediction result with the rule prediction result using fusion coefficients 0.75·model + 0.25·rule. As shown in Fig. 2, the models are first fused with one another, giving a fusion result M1, which is then fused with the rule, giving a result M2. Using model M2 and historical data such as that in Tables 1 and 2, the Spark big data platform can predict, for different smart-home products, the future stock quantities they require in each warehouse, as in Table 4. Compared with Table 4, which shows single-model outputs, M2 is the fusion of multiple models and the rule, and its effect can be better.
The above is merely a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement readily conceivable by those familiar with the art within the technical scope disclosed by the present invention shall be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (1)
1. the demand for commodity prediction based on Spark big data platform divides storehouse planing method with logistics, it is characterised in that: including as follows
Step:
Q1. Data preprocessing: obtain the associated data files from the database, including commodity-granularity features, user-behavior features of the relevant commodities, commodity and warehouse-region granularity features, and related information such as the under-replenishment and over-replenishment costs of each warehouse region. Zero-fill the records of commodities that have no sales record in the database because they were recently listed or delisted, so as to guarantee data continuity. That is: create a SparkContext object, then use its textFile(URL) function to create a distributed data set (RDD). The data in the RDD comprise smart-home commodity-granularity features, including ID, category, brand, date and price; user-behavior features of the relevant commodities, including browse count, add-to-cart count, purchase count and traffic; and commodity and warehouse-region granularity features such as cost and the under- and over-replenishment penalties of the warehouse region. The created distributed data set can be operated on in parallel. Next, call the mapPartitions operator to zero-fill, for commodities with no sales record due to recent listing or delisting, the relevant fields of samples of the form <feature 1, feature 2, …, feature m>, guaranteeing data continuity. Call the zipWithIndex operator to attach a label to each sample, converting the created RDD to the form <label, commodity ID, warehouse code, feature 1, feature 2, …, feature m>. Finally, call the filter operator to split the entire data set by commodity transaction date into a test set TestRDD and a training set TrainRDD, and call the persist operator to cache the resulting TrainRDD in memory.
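A minimal single-machine sketch of this pipeline, with hypothetical field names; on the actual platform each step would be the corresponding RDD operator (mapPartitions for the zero-fill, zipWithIndex for the labelling, filter for the date split, persist for the in-memory cache):

```python
from datetime import date

def zero_fill(sample, n_features):
    # mapPartitions step: pad missing feature fields with 0 so every
    # sample has the full <feature 1, ..., feature m> shape.
    return sample + [0] * (n_features - len(sample))

def preprocess(rows, split_date, n_features):
    labelled = []
    for idx, (commodity_id, wh_code, tx_date, features) in enumerate(rows):
        # zipWithIndex step: attach a running label to each sample.
        labelled.append((idx, commodity_id, wh_code, tx_date,
                         zero_fill(features, n_features)))
    # filter step: split into training and test sets by transaction date.
    train = [r for r in labelled if r[3] < split_date]
    test = [r for r in labelled if r[3] >= split_date]
    return train, test

rows = [("sku1", "wh01", date(2018, 6, 1), [1.0, 2.0]),
        ("sku2", "wh01", date(2018, 9, 1), [3.0])]
train, test = preprocess(rows, date(2018, 8, 1), n_features=3)
print(len(train), len(test))  # -> 1 1
```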
Q2. Feature construction: call the mapPartitions operator on TrainRDD and construct features with the sliding-window technique, writing the corresponding statistical functions that build features from the sample information of each partition (window) over different time periods. Take the N days after each specific time point as a window; within that window, the total sales volume of each commodity at each warehouse serves as the label value, and M windows are slid. Taking the N days before the specific time point as a window, carry out feature construction: count the sums and averages (avg) of the various category feature values in the N days before the window; count the feature values of each commodity's transaction counts over the most recent N days, including maximum, minimum and standard deviation; count the feature values of its category id over the most recent N days of transaction counts, including maximum, minimum, standard deviation, rank and proportion; and take the total sales volume of the following N days as the label. Slide M windows. Through a series of data transformations the created TrainRDD is converted to the form <label, commodity ID, warehouse code, feature 1, feature 2, …, feature m, feature m+1, …, feature n, label>.
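The window construction can be illustrated on a plain list of daily sales (a simplified sketch: only a few of the statistics named above are computed, and the category-level and rank features are omitted):

```python
from statistics import mean, pstdev

def window_features(daily_sales, n):
    """Build one training sample from a 2N-day slice:
    features from the first N days, label from the next N days."""
    past, future = daily_sales[:n], daily_sales[n:2 * n]
    features = {
        "sum": sum(past), "avg": mean(past),
        "max": max(past), "min": min(past), "std": pstdev(past),
    }
    label = sum(future)  # total sales of the following N days
    return features, label

def slide(daily_sales, n, m):
    # Slide M windows, one day at a time, over the sales series.
    return [window_features(daily_sales[i:], n) for i in range(m)]

samples = slide([5, 7, 6, 8, 9, 4, 3, 6], n=3, m=2)
print(samples[0][1])  # label of the first window: 8 + 9 + 4 = 21
```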
Q3. Feature selection: use xgboost to select the top-k features and compute similarities to remove redundant features. Take the feature values constructed in Q2, train an xgboost model on them to obtain the importance ranking of the features, choose the top-k important features, compute the similarities among them, and weed out the unimportant (redundant) features.
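A sketch of the selection step in plain Python. The importance scores would come from the trained xgboost model; here they are hard-coded hypothetical values, and similarity is measured with the Pearson correlation of the feature columns:

```python
def pearson(x, y):
    # Pearson correlation of two equally long numeric sequences.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def select_features(importance, columns, k, sim_threshold=0.95):
    # Rank by importance, keep the top-k, then drop any feature that is
    # near-duplicated (|correlation| above threshold) by a kept feature.
    topk = sorted(importance, key=importance.get, reverse=True)[:k]
    kept = []
    for feat in topk:
        if not any(abs(pearson(columns[feat], columns[p])) > sim_threshold
                   for p in kept):
            kept.append(feat)
    return kept

# Hypothetical importances and columns: "f2" is just 2x "f1".
imp = {"f1": 0.5, "f2": 0.3, "f3": 0.2}
cols = {"f1": [1, 2, 3, 4], "f2": [2, 4, 6, 8], "f3": [4, 1, 3, 2]}
print(select_features(imp, cols, k=3))  # -> ['f1', 'f3']
```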
Q4. Model selection: train multiple regression models. First, train multiple regression models on TrainRDD using the LR, SVR, RF and GBRT algorithms of the Spark MLlib machine-learning library and the third-party distributed learning algorithm XGBoost, and merge the prediction results of the models with the union operator into model_RDD. Next, call the groupBy operator to aggregate by commodity ID. Finally, call the map operator: for each (commodity ID, warehouse code), if the under-replenishment cost is greater than the over-replenishment cost, we prefer to predict more, so take the maximum of the single-model predictions multiplied by 1.1; otherwise take the minimum of the single-model predictions multiplied by 0.9. Through a series of data transformations this yields the model learning result in the form <commodity ID, warehouse code, target stock of the warehouse region in the future time period>.
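The per-(commodity, warehouse) combination step can be sketched in plain Python with hypothetical forecast values (the 1.1/0.9 factors are those stated above; on the platform this would run inside the map operator after the groupBy):

```python
def merge_predictions(preds, under_cost, over_cost):
    # Combine the single-model forecasts for one (commodity ID,
    # warehouse code): lean high when under-replenishing is the
    # costlier mistake, lean low when over-replenishing is.
    if under_cost > over_cost:
        return max(preds) * 1.1
    return min(preds) * 0.9

# Hypothetical forecasts from the LR, SVR, RF, GBRT and XGBoost models.
preds = [100.0, 110.0, 95.0, 105.0, 120.0]
print(merge_predictions(preds, under_cost=2.0, over_cost=1.0))  # max * 1.1
print(merge_predictions(preds, under_cost=1.0, over_cost=2.0))  # min * 0.9
```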
Q5. Fusion: the model prediction result is fused with the rule prediction result with fusion coefficients 0.75model + 0.25rule, where the rule learning is defined as follows: the sales volumes of the N days before the prediction window are denoted day1, day2, …, dayN respectively; for each commodity, if the under-replenishment cost is greater than the over-replenishment cost, the prediction is N*max(day1, day2, …, dayN), otherwise N*min(day1, day2, …, dayN). Finally, TestRDD is converted to the form <commodity ID, warehouse code, target stock of the warehouse region in the future time period>, defined as rule_RDD, which yields the base stock of each commodity in each warehouse region for the coming period.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811133491.1A CN109325808A (en) | 2018-09-27 | 2018-09-27 | Commodity-demand prediction and logistics warehouse-partition planning method based on the Spark big data platform |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109325808A true CN109325808A (en) | 2019-02-12 |
Family
ID=65266412
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811133491.1A Withdrawn CN109325808A (en) | 2018-09-27 | 2018-09-27 | Commodity-demand prediction and logistics warehouse-partition planning method based on the Spark big data platform |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109325808A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107122928A (en) * | 2016-02-24 | 2017-09-01 | 阿里巴巴集团控股有限公司 | A kind of supply chain Resource Requirement Planning collocation method and device |
CN106599935A (en) * | 2016-12-29 | 2017-04-26 | 重庆邮电大学 | Three-decision unbalanced data oversampling method based on Spark big data platform |
CN108399457A (en) * | 2018-02-02 | 2018-08-14 | 西安电子科技大学 | There are the Boosting improved methods converted based on multistep label under inclined data in integrated study |
CN109582706A (en) * | 2018-11-14 | 2019-04-05 | 重庆邮电大学 | The neighborhood density imbalance data mixing method of sampling based on Spark big data platform |
Non-Patent Citations (2)
Title |
---|
LIMEIYANG: "Cainiao: Demand Forecasting and Warehouse Partition Planning Solution", 《HTTPS://GITHUB.COM/LIMEIYANG/CAINIAO》 * |
CAINIAO NETWORK: "Cainiao: Demand Forecasting and Warehouse Partition Planning — Problem Statement and Data", 《HTTPS://TIANCHI.ALIYUN.COM/COMPETITION/ENTRANCE/231530/INFORMATION?FROM=OLDURL》 * |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111768139A (en) * | 2019-06-27 | 2020-10-13 | 北京沃东天骏信息技术有限公司 | Stock processing method, apparatus, device and storage medium |
CN110688623A (en) * | 2019-09-29 | 2020-01-14 | 深圳乐信软件技术有限公司 | Training optimization method, device, equipment and storage medium of high-order LR model |
CN110688623B (en) * | 2019-09-29 | 2023-12-26 | 深圳乐信软件技术有限公司 | Training optimization method, device, equipment and storage medium for high-order LR model |
CN110956272A (en) * | 2019-11-01 | 2020-04-03 | 第四范式(北京)技术有限公司 | Method and system for realizing data processing |
CN110956272B (en) * | 2019-11-01 | 2023-08-08 | 第四范式(北京)技术有限公司 | Method and system for realizing data processing |
CN111190110A (en) * | 2020-01-13 | 2020-05-22 | 南京邮电大学 | Lithium ion battery SOC online estimation method comprehensively considering internal and external influence factors |
CN112100182A (en) * | 2020-09-27 | 2020-12-18 | 中国建设银行股份有限公司 | Data warehousing processing method and device and server |
CN112308665A (en) * | 2020-10-26 | 2021-02-02 | 福建菩泰网络科技有限公司 | Goods distribution method and system for online shopping mall |
CN112597213A (en) * | 2020-12-24 | 2021-04-02 | 第四范式(北京)技术有限公司 | Batch request processing method and device for feature calculation, electronic equipment and storage medium |
CN112597213B (en) * | 2020-12-24 | 2023-11-10 | 第四范式(北京)技术有限公司 | Batch request processing method and device for feature calculation, electronic equipment and storage medium |
CN113642958A (en) * | 2021-08-05 | 2021-11-12 | 大唐互联科技(武汉)有限公司 | Warehouse replenishment method, device, equipment and storage medium based on big data |
CN113642958B (en) * | 2021-08-05 | 2024-06-04 | 大唐互联科技(武汉)有限公司 | Warehouse replenishment method, device, equipment and storage medium based on big data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109325808A (en) | Commodity-demand prediction and logistics warehouse-partition planning method based on the Spark big data platform | |
Hofmann et al. | Big data analytics and demand forecasting in supply chains: a conceptual analysis | |
Kim et al. | Optimal inventory control in a multi-period newsvendor problem with non-stationary demand | |
Vahdani et al. | A hybrid multi-stage predictive model for supply chain network collapse recovery analysis: a practical framework for effective supply chain network continuity management | |
CA3235875A1 (en) | Method and system for generation of at least one output analytic for a promotion | |
CN101783004A (en) | Fast intelligent commodity recommendation system | |
JP2018533807A (en) | System and method for providing a multi-channel inventory allocation approach to retailers | |
US10528903B2 (en) | Computerized promotion and markdown price scheduling | |
CN109961198B (en) | Associated information generation method and device | |
CN110109901B (en) | Method and device for screening target object | |
US20160034952A1 (en) | Control apparatus and accelerating method | |
CN109214587A (en) | A commodity-demand prediction and logistics warehouse-partition planning method based on three-way decisions | |
Harsoor et al. | Forecast of sales of Walmart store using big data applications | |
CN108777701A (en) | A kind of method and device of determining receiver | |
CN109558992A (en) | Sales-peak prediction method, device, equipment and storage medium based on vending machines | |
CN112036631B (en) | Purchasing quantity determining method, purchasing quantity determining device, purchasing quantity determining equipment and storage medium | |
Behera et al. | Grid search optimization (GSO) based future sales prediction for big mart | |
CN112365283A (en) | Coupon issuing method, device, terminal equipment and storage medium | |
CA3131040A1 (en) | Method and system for optimizing an objective having discrete constraints | |
CN113763035A (en) | Advertisement delivery effect prediction method and device, computer equipment and storage medium | |
CN109190027A (en) | Multi-source recommended method, terminal, server, computer equipment, readable medium | |
CN108629467B (en) | Sample information processing method and system | |
US20210312259A1 (en) | Systems and methods for automatic product usage model training and prediction | |
CN111353794A (en) | Data processing method, supply chain scheduling method and device | |
Polder et al. | Complementarities between Information Technologies and Innovation Modes in the Adoption and Outcome Stage: A MicroEconometric Analysis for the Netherlands. CAED conference |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right |
Effective date of registration: 2019-04-15

Address after: 7-1, Building 10, No. 26 Photoelectric Road, Nanan District, Chongqing 400000

Applicant after: Shu Haidong

Address before: 13-8, Building 9, No. 168 Caiyuan Road, Yuzhong District, Chongqing 400010

Applicant before: Chongqing Zhiwanjia Technology Co., Ltd. |
|
TA01 | Transfer of patent application right | ||
WW01 | Invention patent application withdrawn after publication |
Application publication date: 20190212 |
|
WW01 | Invention patent application withdrawn after publication |