CN107688580A - The method, apparatus and system of commodity classification based on Distributed Data Warehouse - Google Patents
- Publication number: CN107688580A
- Application number: CN201610637689.8A
- Authority: CN (China)
- Prior art keywords: data, commodity, layer, classification, graded index
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G06F16/283: Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
- G06F16/24554: Unary operations; Data partitioning operations
- G06F16/24556: Aggregation; Duplicate elimination
- G06F16/254: Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
- G06F16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
- G06Q30/0201: Market modelling; Market analysis; Collecting market data
- G06Q30/0601: Electronic shopping [e-shopping]
Abstract
The present invention provides a method, apparatus and system for commodity grading (classifying commodities into grade bands) based on a distributed data warehouse. By using distributed computing, it makes large-scale data processing feasible, substantially improves processing performance, and achieves better linear scalability. The method includes: extracting basic data into a data warehouse, the basic data including the basic information of commodities and the data in each business system that relate to commodity grading indexes; associating the basic data by business primary key, adding dimension information according to the grading requirement, and integrating the result to obtain the grading index data of commodities under a specific dimension; and grading the commodities under the specific dimension according to the grading index data.
Description
Technical field
The present invention relates to the fields of computer technology and software, and in particular to a method, apparatus and system for commodity grading based on a distributed data warehouse.
Background technology
While studying the distribution of wealth and income in 19th-century England, the Italian economist Pareto found that 20% of the population held 80% of society's wealth; that is, wealth is distributed unevenly across the population. He later found that many similar imbalances exist in everyday life. The Pareto principle has therefore become shorthand for this kind of unequal relationship, whether or not the proportions are exactly 80% and 20%. Conventionally, the Pareto principle is concerned with the top 20% rather than the bottom 80%. Applied to e-commerce, operators usually need to follow grading rules and grade commodities by click volume, order volume, outbound volume, sales amount and so on, over a time period (such as 7, 15 or 30 days) and under a specific dimension (such as first-level category, second-level category, third-level category, second-level category + brand, third-level category + brand, purchasing and sales department, or salesperson).
In the existing approach to commodity grading, take the cumulative 7-day click volume of the commodities under the first-level category dimension (denoted sumclick below) as an example. The commodities are first sorted by click volume from high to low, and the click volumes are then accumulated in that order, with 20%*sumclick*N (N=1,2,3,4,5) serving as thresholds. Once the running total exceeds a threshold, the following commodities fall into the next class, and the grades A through E are assigned in order to each SKU.
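The existing approach described above can be sketched as follows. This is only an illustrative reading of the paragraph: the function name, the input shape (a dict from SKU to click volume), and in particular the boundary rule (a SKU is graded by the running total accumulated before it is added) are assumptions, since the text does not specify how a SKU that straddles a threshold is treated.

```python
def grade_by_cumulative_share(clicks, labels="ABCDE"):
    """Grade SKUs by cumulative click volume against equal-width thresholds.

    clicks: dict mapping SKU -> integer click volume (hypothetical input shape).
    With five labels the thresholds are 20% * sumclick * N for N = 1..5.
    """
    sumclick = sum(clicks.values())
    # Sort SKUs by click volume, highest first.
    ranked = sorted(clicks.items(), key=lambda kv: kv[1], reverse=True)
    grades = {}
    running = 0
    for sku, value in ranked:
        # Assumption: count the thresholds already reached by the running
        # total *before* this SKU is added, using exact integer arithmetic.
        passed = running * len(labels) // sumclick if sumclick else 0
        grades[sku] = labels[min(passed, len(labels) - 1)]
        running += value
    return grades


if __name__ == "__main__":
    print(grade_by_cumulative_share({"s1": 50, "s2": 20, "s3": 15, "s4": 10, "s5": 5}))
```

Note that a single SKU with a very large click volume can cause intermediate grades to be skipped under this scheme.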
In terms of technical implementation, existing commodity grading methods either use multithreading or perform the computation in a database, for example by calling database stored procedures. Multithreading and stored-procedure-like techniques share the same defects: poor scalability; performance that depends on the hardware of a single machine and is therefore hard to improve substantially; and no support for distributed computing. As the number of commodity types keeps growing, the computational load grows with it, and it usually takes several days to compute the latest commodity band grading.
Summary of the invention
In view of this, the present invention provides a method, apparatus and system for commodity grading based on a distributed data warehouse, which can use distributed computing to make large-scale data processing feasible, substantially improve processing performance, and achieve better linear scalability.
To achieve the above object, according to one aspect of the present invention, a method for commodity grading based on a distributed data warehouse is provided.
The method for commodity grading based on a distributed data warehouse according to the present invention includes: extracting basic data into a data warehouse, the basic data including the basic information of commodities and the data in each business system that relate to commodity grading indexes; associating the basic data by business primary key, adding dimension information according to the grading requirement, and integrating the result to obtain the grading index data of commodities under a specific dimension; and grading the commodities under the specific dimension according to the grading index data.
Optionally, in the method, the data warehouse includes a buffer data layer, a basic data layer, a general data layer, a dimension data layer, an aggregation data layer and a temporary storage layer, wherein: the basic data is extracted into the buffer data layer of the data warehouse and, after data cleansing, saved to the basic data layer; the basic data is associated by business primary key and saved to the general data layer; dimension information data is extracted into the dimension data layer, and, according to the grading requirement, the data of the general data layer or the basic data layer is combined with the dimension information data of the dimension data layer to obtain the grading index data of commodities under a specific dimension; and the grading results are stored in the aggregation data layer, with temporary data produced during computation stored in the temporary storage layer.
Optionally, the method further includes: before grading the commodities under the specific dimension according to the grading index data, using a time trigger mechanism to determine whether to perform commodity grading.
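A time trigger of this kind can be sketched as a simple date check. The embodiment gives "every Sunday or the 1st of each month" as example trigger conditions; the function name and the choice to hard-code those two conditions are assumptions made for illustration only.

```python
from datetime import date


def should_run_grading(stat_date: date) -> bool:
    """Return True when grading should be triggered for the statistics date.

    Assumption: the trigger fires on Sundays and on the 1st of each month,
    the two example conditions mentioned in the embodiment.
    """
    is_sunday = stat_date.weekday() == 6  # weekday(): Monday is 0, Sunday is 6
    is_first_of_month = stat_date.day == 1
    return is_sunday or is_first_of_month


if __name__ == "__main__":
    for d in (date(2016, 1, 1), date(2016, 1, 3), date(2016, 1, 5)):
        print(d.isoformat(), should_run_grading(d))
```

In a scheduled job, the grading pipeline would simply exit early when this check returns False.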
Optionally, grading the commodities under the specific dimension according to the grading index data includes: Step 1: determining a baseline value from the total index amount of the commodities, determining several critical values from the baseline value, and sorting the commodities by index value in descending order; Step 2: if there are commodities whose index value is greater than or equal to the baseline value, assigning grades to those commodities one by one from the highest grade downward, and if there are none, going directly to Step 3; Step 3: accumulating the index values of the remaining commodities, comparing the running total with the critical values, and placing the commodities that fall between two critical values into the corresponding grade, the corresponding grades being the grades left over after the assignment in Step 2.
Optionally, determining the baseline value from the total index amount of the commodities and determining several critical values from the baseline value includes: if the total number of grades is N and the lowest grade is reserved for commodities whose index value is zero, the baseline value is 1/(N-1) of the total index amount, and multiplying the baseline value by 1 to N-1 gives N-1 critical values.
Optionally, the method further includes: before Step 2, if there are commodities whose index value is zero, placing them directly in the lowest grade.
Optionally, the method further includes: in Step 3, if the cumulative result of the index values of the remaining commodities is less than the baseline value, assigning grades to the remaining commodities one by one from the highest grade downward, in descending order of index value, wherein, if the number of remaining commodities exceeds the total number of grades, once the second-lowest grade has been assigned all commodities still remaining are placed in the lowest grade.
To achieve the above object, according to another aspect of the present invention, an apparatus for commodity grading based on a distributed data warehouse is provided.
The apparatus for commodity grading based on a distributed data warehouse according to the present invention includes: an extraction module, configured to extract basic data into a data warehouse, the basic data including the basic information of commodities and the data in each business system that relate to commodity grading indexes; an integration module, configured to associate the basic data by business primary key, add dimension information according to the grading requirement, and integrate the result to obtain the grading index data of commodities under a specific dimension; and a grading module, configured to grade the commodities under the specific dimension according to the grading index data.
Optionally, the data warehouse includes a buffer data layer, a basic data layer, a general data layer, a dimension data layer, an aggregation data layer and a temporary storage layer, wherein: the extraction module is further configured to extract the basic data into the buffer data layer of the data warehouse and, after data cleansing, save it to the basic data layer; the integration module is further configured to associate the basic data by business primary key and save it to the general data layer, to extract dimension information data into the dimension data layer, and, according to the grading requirement, to combine the data of the general data layer or the basic data layer with the dimension information data of the dimension data layer and integrate the result to obtain the grading index data of commodities under a specific dimension; and the grading module is further configured to store the grading results in the aggregation data layer, with temporary data produced during computation stored in the temporary storage layer.
Optionally, the grading module is further configured to: before grading the commodities under the specific dimension according to the grading index data, use a time trigger mechanism to determine whether to perform commodity grading.
Optionally, the grading module is further configured to perform: Step 1: determining a baseline value from the total index amount of the commodities, determining several critical values from the baseline value, and sorting the commodities by index value in descending order; Step 2: if there are commodities whose index value is greater than or equal to the baseline value, assigning grades to those commodities one by one from the highest grade downward, and if there are none, going directly to Step 3; Step 3: accumulating the index values of the remaining commodities, comparing the running total with the critical values, and placing the commodities that fall between two critical values into the corresponding grade, the corresponding grades being the grades left over after the assignment in Step 2.
Optionally, the grading module is further configured such that: if the total number of grades is N and the lowest grade is reserved for commodities whose index value is zero, the baseline value is 1/(N-1) of the total index amount, and multiplying the baseline value by 1 to N-1 gives N-1 critical values.
Optionally, the grading module is further configured to: before Step 2, if there are commodities whose index value is zero, place them directly in the lowest grade.
Optionally, the grading module is further configured to: in Step 3, if the cumulative result of the index values of the remaining commodities is less than the baseline value, assign grades to the remaining commodities one by one from the highest grade downward, in descending order of index value, wherein, if the number of remaining commodities exceeds the total number of grades, once the second-lowest grade has been assigned all commodities still remaining are placed in the lowest grade.
To achieve the above object, according to yet another aspect of the present invention, a system for commodity grading based on a distributed data warehouse is provided.
The system for commodity grading based on a distributed data warehouse according to the present invention includes a memory and a processor, wherein the memory stores instructions and the processor is configured to perform the following steps according to the instructions: extracting basic data into a data warehouse, the basic data including the basic information of commodities and the data in each business system that relate to commodity grading indexes; associating the basic data by business primary key, adding dimension information according to the grading requirement, and integrating the result to obtain the grading index data of commodities under a specific dimension; and grading the commodities under the specific dimension according to the grading index data.
According to the technical solution of the present invention, data acquisition and integration follow the idea of standardized data warehouse modeling and use a layered data model and data warehouse tools, which improves the stability and scalability of data processing. Using a distributed data warehouse makes it possible to grade commodities under different conditions, examine the operating status of key commodities from multiple angles, improve the performance of commodity grading, and meet the daily work needs of purchasing and sales, warehousing, finance and so on. It also unifies the commodity grading criteria across the enterprise's different reporting requirements, reduces the waste of enterprise development resources, and eases maintenance of the commodity grading system. In addition, proposing reasonable and comprehensive grading computation rules improves the accuracy of the grading results.
Brief description of the drawings
The accompanying drawings are provided for a better understanding of the present invention and do not constitute an improper limitation of it. In the drawings:
Fig. 1 is a schematic diagram of the main steps of the method for commodity grading based on a distributed data warehouse according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the overall framework of the distributed data warehouse used by the method for commodity grading according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the data model of the method for commodity grading based on a distributed data warehouse according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of associating FDM layer tables by business primary key in the method for commodity grading based on a distributed data warehouse according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of the main flow of the method for commodity grading based on a distributed data warehouse according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of the main modules of the apparatus for commodity grading based on a distributed data warehouse according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of the main parts of the system for commodity grading based on a distributed data warehouse according to an embodiment of the present invention.
Embodiment
Exemplary embodiments of the present invention are explained below with reference to the accompanying drawings. They include various details of the embodiments to aid understanding, and these details should be regarded as merely exemplary. Those of ordinary skill in the art should therefore recognize that various changes and modifications can be made to the embodiments described here without departing from the scope and spirit of the present invention. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted from the following description.
The embodiment of the present invention performs commodity grading on the basis of a standardized distributed data warehouse model built for B2C e-commerce business. Unlike the traditional procedural implementation, it can grade commodities under different conditions in a distributed manner and examine the operating status of key commodities from multiple angles. It can also improve the accuracy and performance of commodity grading and ensure that the daily work needs of purchasing and sales, warehousing, finance and so on are met. Furthermore, it can unify the commodity grading criteria across the enterprise's different reporting requirements, reduce the waste of enterprise development resources, and ease maintenance of the commodity grading system.
For example, according to the grading rules, the click volume, order volume, outbound volume and sales amount of commodities are each graded into several grades (such as A, B, C, D, E, F) over a time period (such as 7, 15 or 30 days) and under a specific dimension (such as first-level category, second-level category, third-level category, second-level category + brand, third-level category + brand, purchasing and sales department, or salesperson).
Fig. 1 is a schematic diagram of the main steps of the method for commodity grading based on a distributed data warehouse according to an embodiment of the present invention. Fig. 2 is a schematic diagram of the overall framework of the distributed data warehouse used by that method.
As shown in Fig. 1 and Fig. 2, the method for commodity grading based on a distributed data warehouse according to the embodiment of the present invention mainly includes the following steps:
Step S11: Extract basic data into a data warehouse, the basic data including the basic information of commodities and the data in each business system that relate to commodity grading indexes.
Step S12: Associate the basic data by business primary key, add dimension information according to the grading requirement, and integrate the result to obtain the grading index data of commodities under a specific dimension.
Step S13: Grade the commodities under the specific dimension according to the grading index data.
In the embodiment of the present invention, grading the commodities under the specific dimension according to the grading index data can include the following steps. Step 1: determine a baseline value from the total index amount of the commodities, determine several critical values from the baseline value, and sort the commodities by index value in descending order. Step 2: if there are commodities whose index value is greater than or equal to the baseline value, assign grades to those commodities one by one from the highest grade downward; if there are none, go directly to Step 3. Step 3: accumulate the index values of the remaining commodities, compare the running total with the critical values, and place the commodities that fall between two critical values into the corresponding grade, the corresponding grades being the grades left over after the assignment in Step 2.
As shown in Fig. 3, in the embodiment of the present invention the data warehouse includes a buffer data layer, a basic data layer, a general data layer, a dimension data layer, an aggregation data layer and a temporary storage layer, wherein: the basic data is extracted into the buffer data layer of the data warehouse and, after data cleansing, saved to the basic data layer; the basic data is associated by business primary key and saved to the general data layer; dimension information data is extracted into the dimension data layer, and, according to the grading requirement, the data of the general data layer is combined with the dimension information data of the dimension data layer to obtain the grading index data of commodities under a specific dimension; and the grading computation and the storage of grading results take place in the aggregation data layer, with temporary data produced during computation stored in the temporary storage layer.
Here, determining the baseline value from the total index amount of the commodities and determining several critical values from the baseline value includes: if the total number of grades is N and the lowest grade is reserved for commodities whose index value is zero, the baseline value is 1/(N-1) of the total index amount, and multiplying the baseline value by 1 to N-1 gives N-1 critical values. In the embodiment of the present invention, N can be, but is not limited to, 6.
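The baseline and critical values follow directly from the rule above: baseline = total / (N-1), and the k-th critical value is k * baseline for k = 1..N-1. A minimal sketch, assuming integer index totals; the function name is hypothetical, and `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction


def baseline_and_critical_values(total, n_grades=6):
    """Return (baseline, critical_values) for a given total index amount.

    baseline is total / (N - 1); the critical values are baseline * k
    for k = 1 .. N - 1, where N is the total number of grades.
    """
    if n_grades < 2:
        raise ValueError("at least two grades are required")
    baseline = Fraction(total, n_grades - 1)
    critical_values = [baseline * k for k in range(1, n_grades)]
    return baseline, critical_values


if __name__ == "__main__":
    base, crit = baseline_and_critical_values(1000, n_grades=6)
    print(base, [int(c) for c in crit])
```

With N = 6 and a total of 1000, the baseline is 200 (20% of the total) and the five critical values are 200, 400, 600, 800 and 1000.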
In addition, in Step 3 of the method, if the cumulative result of the index values of the remaining commodities is less than the baseline value, the remaining commodities are assigned grades one by one from the highest grade downward, in descending order of index value; if the number of remaining commodities exceeds the total number of grades, then once the second-lowest grade has been assigned, all commodities still remaining are placed in the lowest grade.
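The grading rules described so far can be put together in one sketch. It is an interpretation under stated assumptions, not the patented implementation: index values are non-negative integers; the running total in Step 3 is taken over the remaining commodities only; a commodity's grade is picked by counting the critical values its running total has strictly exceeded; and in the fallback case "from the highest grade downward" is read as starting from the highest grade not yet used in Step 2. All names are hypothetical.

```python
from fractions import Fraction


def grade_commodities(values, grades="ABCDEF"):
    """values: dict commodity_id -> non-negative integer index value."""
    lowest = grades[-1]
    # Zero-valued commodities go straight to the lowest grade.
    result = {cid: lowest for cid, v in values.items() if v == 0}

    # Step 1: baseline, critical values, descending sort.
    ranked = sorted(((c, v) for c, v in values.items() if v > 0),
                    key=lambda cv: cv[1], reverse=True)
    total = sum(v for _, v in ranked)
    if total == 0:
        return result
    n = len(grades)
    baseline = Fraction(total, n - 1)
    critical = [baseline * k for k in range(1, n)]

    # Step 2: commodities at or above the baseline take the top grades in order.
    big = [(c, v) for c, v in ranked if v >= baseline]
    for i, (cid, _) in enumerate(big):
        result[cid] = grades[i]
    rest = ranked[len(big):]
    remaining_grades = grades[len(big):-1]  # lowest grade stays reserved

    # Fallback: the rest add up to less than the baseline, so hand out one
    # remaining grade per commodity, then the lowest grade (assumed reading).
    if sum(v for _, v in rest) < baseline:
        for i, (cid, _) in enumerate(rest):
            result[cid] = remaining_grades[i] if i < len(remaining_grades) else lowest
        return result

    # Step 3: accumulate and compare the running total with the critical values.
    running = 0
    for cid, v in rest:
        running += v
        exceeded = sum(1 for c in critical if running > c)
        result[cid] = remaining_grades[min(exceeded, len(remaining_grades) - 1)]
    return result


if __name__ == "__main__":
    print(grade_commodities({"p1": 500, "p2": 200, "p3": 150, "p4": 100, "p5": 50, "p6": 0}))
```

In the usage example the total is 1000 and the baseline 200, so p1 and p2 take A and B in Step 2, the remaining three are spread over C and D by their running totals (150, 250, 300), and the zero-valued p6 lands in F.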
The method for commodity grading based on a distributed data warehouse according to the embodiment of the present invention is described in detail below with reference to the accompanying drawings.
The embodiment of the present invention is implemented on an e-commerce distributed data warehouse. On the Hadoop platform, using data warehouse tools such as Hive, the data model can be divided into levels such as the buffer data layer (hereinafter "BDM layer"), the basic data layer (hereinafter "FDM layer"), the general data layer (hereinafter "GDM layer"), the dimension data layer (hereinafter "DIM layer"), the aggregation data layer (hereinafter "ADM layer") and the temporary storage layer (hereinafter "TMP layer").
Specifically:
1. BDM layer
New and updated data is extracted periodically from each business system. For example, according to business needs, the day's data can be extracted at a fixed time every night, illegal characters removed and date formats converted, and the data then loaded directly into the BDM layer. The BDM layer table structures are consistent with the business system table structures. The tables involved in commodity grading are mainly the order table, the order detail table, the click log table and the outbound log table, which correspond respectively to the purchasing and sales system, the traffic system, the financial system and so on in the overall framework of Fig. 2.
2. FDM layer
BDM data is processed into this layer in zipper-table fashion. The FDM layer data is compared incrementally with the BDM layer's historical data: if a record has been updated, the corresponding FDM layer record is updated; if a record is new, it is inserted into the FDM layer in order.
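The update-or-insert behaviour described for the FDM layer can be sketched in a few lines. This follows only what the paragraph states (update the matching record, otherwise insert in order); it does not model any validity-date bookkeeping that a production zipper table may carry. The function name, the row shape (dicts) and the `id` key are assumptions.

```python
def merge_increment(fdm_rows, bdm_increment, key="id"):
    """Apply an incremental batch from BDM to the current FDM rows.

    Rows are dicts; `key` names the primary-key field (hypothetical).
    An incoming row replaces the FDM row with the same key, otherwise
    it is appended in arrival order.
    """
    merged = {row[key]: row for row in fdm_rows}  # dicts keep insertion order
    for row in bdm_increment:
        merged[row[key]] = row  # update in place, or insert at the end
    return list(merged.values())


if __name__ == "__main__":
    fdm = [{"id": 1, "status": "created"}, {"id": 2, "status": "created"}]
    inc = [{"id": 2, "status": "shipped"}, {"id": 3, "status": "created"}]
    print(merge_increment(fdm, inc))
```

In a Hive-based warehouse the same effect would normally be expressed as a join between the existing table and the increment rather than row-by-row code.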
3. GDM layer
FDM layer tables are associated by business primary key, and multiple FDM layer tables are combined to obtain the GDM layer, with statistics kept at the finest granularity, the SKU. In the embodiment of the present invention, the order sales detail GDM table is used as an example; see Fig. 4 for the specific association process. In the order sales table, the business primary key can be the order number + commodity number; in the commodity information table, the business primary key can be the commodity number. By matching business primary keys, a combined table, namely the sales detail GDM table, is obtained.
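The association by business primary key amounts to a join on the commodity number. A minimal sketch, assuming rows are dicts and using hypothetical field names (`order_no`, `sku_id`, `category1`); order lines without a matching commodity record are dropped here, which is an assumption the text does not confirm.

```python
def build_sales_detail(order_lines, commodity_info):
    """Join order lines (key: order_no + sku_id) with commodity info (key: sku_id)."""
    info_by_sku = {row["sku_id"]: row for row in commodity_info}
    detail = []
    for line in order_lines:
        info = info_by_sku.get(line["sku_id"])
        if info is None:
            continue  # assumption: behave like an inner join
        detail.append({**line, **info})  # one wide row per order line
    return detail


if __name__ == "__main__":
    orders = [
        {"order_no": "O1", "sku_id": 10001, "sales_amount": 300},
        {"order_no": "O1", "sku_id": 10002, "sales_amount": 120},
    ]
    info = [{"sku_id": 10001, "category1": 1}, {"sku_id": 10002, "category1": 2}]
    for row in build_sales_detail(orders, info):
        print(row)
```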
4. ADM layer design
By dimension, data from the basic layer (FDM) or the general layer (GDM) is processed with specific logic into aggregation layer (ADM) data; for example, for the order sales detail table above, the sales amount of commodities is summed by the specific dimensions of sales date and first-level category. If the corresponding data exists only in the FDM layer and has no corresponding GDM layer table, it is processed directly from the FDM layer; otherwise it is processed from the GDM layer.
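The aggregation example in this paragraph (summing sales amount by sales date and first-level category) can be sketched as a group-by. The field names `sale_date`, `category1` and `sales_amount` are hypothetical, as is the dict-based row format.

```python
from collections import defaultdict


def sum_sales_by_date_and_category(detail_rows):
    """Sum sales_amount per (sale_date, first-level category) pair."""
    totals = defaultdict(int)
    for row in detail_rows:
        totals[(row["sale_date"], row["category1"])] += row["sales_amount"]
    return dict(totals)


if __name__ == "__main__":
    rows = [
        {"sale_date": "2016-01-01", "category1": 1, "sales_amount": 300},
        {"sale_date": "2016-01-01", "category1": 1, "sales_amount": 200},
        {"sale_date": "2016-01-01", "category1": 2, "sales_amount": 50},
    ]
    print(sum_sales_by_date_and_category(rows))
```

Other dimensions (second-level category + brand, salesperson and so on) would only change the grouping key.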
The ADM layer mainly stores the computed grading results. The following example uses a 7-day period and bands computed by first-level category:
Statistics date | Statistics period code | Commodity number | First-level category | Second-level category | Third-level category | Order volume | Order volume band | Sales amount | Sales amount band | Outbound volume | Outbound volume band | Click volume | Click volume band |
2016-01-01 | 7 | 10001 | 1 | 11 | 110 | 5000 | A | 10000 | B | 3000 | C | 20000 | C |
…… | …… | …… | …… | …… | …… | …… | …… | …… | …… | …… | …… | …… | …… |
5. DIM layer design
Dimension-related data (such as time and category) is extracted and loaded into this layer, so that dimension information can be obtained from the DIM layer when commodity grading is computed.
6. TMP layer design
This layer processes and stores temporary data. It can cache the interim result tables (temp tables) produced during commodity grading.
As mentioned above, in the embodiment of the present invention the commodity band grading under a specific dimension can be further divided into the click volume band, the order volume band, the outbound volume band and the sales amount band. The statistics periods are 7, 15 and 30 days, and the band values are computed separately by first-level category, second-level category, third-level category, purchasing and sales department, salesperson, second-level category + brand, third-level category + brand and so on.
Below, the main flow of the commodity grading algorithm of the embodiment of the present invention is illustrated, taking as an example a 7-day period and the click volume band of commodities computed by category. The specific algorithm is as follows:
1: Determine whether to trigger the commodity grading computation. In the embodiment of the present invention, time can be, but is not limited to being, used as the trigger condition; for example, commodity grading is triggered every Sunday or on the 1st of each month. Accordingly, the statistics date is checked periodically to see whether it is a Sunday or the 1st of a month; if so, go to 2, otherwise end the computation;
2: Using the data model described above, for example the tables obtained in the GDM layer, compute the basic index (such as click volume) of each commodity within the period (7 days) and store it in temp1; if the GDM layer does not hold the data, obtain it from the FDM layer tables. The temp1 table should contain the commodity ID, the period identifier (7 days) and the click volume;
3:Temp1 tables are associated with commodity dimension table, and according to statistical demand, the dimension of the commodity of this statistics is obtained from DIM layers
Information (i.e. first-level class code) is spent, is stored in temp2 tables, what temp2 was preserved is the graded index number of temp1+ dimensional informations
According to;
4:Different dimensions a reference value as corresponding to first-level class code is calculated using temp2, is stored in temp3 tables, system
Several critical values under each first-level class code are counted out, are as a result stored in temp6;Such as:Commodity be classified as A, B, C, D,
E, six grades of F, critical value are 5, then calculate accumulative click value of the commodity at 7 days that first-level class is 1 with temp2 meters
(being represented below with sumclick), then its a reference value is sumclick*20%;First-level class is 15 critical value values, respectively
For:Sumclick*20%*1, sumclick*20%*2, sumclick*20%*3, sumclick*20%*4, sumclick*
20%*5;
5:By the desired value of each commodity such as click volume according to sorting from high to low.Using temp2, click volume is equal to 0 business
It is F that product information, which directly assigns click volume band values, is as a result inserted in temp_band;
6:In addition, because that may there is particular commodity (hereinafter referred to as " single product ") click volume to add up other commodity click volume just
Click volume a reference value can be reached, so, it is first determined whether in the presence of single product click volume and more than or equal to click volume a reference value, if
In the presence of, then according to the size of the click volume for the single product for being more than click volume a reference value, from it is highest down, assign respectively A, B, C, D,
E grades, as a result it is stored in temp5, via temp5, result is stored in temp_band;If being not present, step 7 is directly performed;
7:The commodity click volume band in addition to single product click volume is more than click volume a reference value is calculated, the commodity that add up successively are clicked on
Amount, judges whether accumulation result is more than or equal to click volume a reference value, if it is, result set is stored in temp7, count more than etc.
In the accumulative and result set deposit temp8 that 5 critical values are minimum;Critical value can be read from temp6;
Compared successively with the aggregate-value in temp7 with critical value, if in the range of two adjacent critical value compositions,
Just it is grouped into corresponding rank.Result set inserts temp_band;
8:If single product click volume is all less than a reference value, and when accumulation result does not reach a reference value yet, then according to commodity
The size of click volume, according to sorting from high to low, A, B, C, D, E grade are then assigned respectively;Beyond sequence 5, F etc. is classified as
Level.Specific algorithm flow is shown in Fig. 5.
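The grading flow of steps 4 to 8 can be read as the following minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation. The function name `grade_items` and the in-memory dict standing in for the temp tables are hypothetical. Two points that the text leaves open are also decided here by assumption: an item is placed in the band of the first critical value that its running total does not exceed, and the items left after step 6 continue with the grades that step 6 did not use. Click volumes are assumed to be non-negative integers.

```python
GRADES = ["A", "B", "C", "D", "E", "F"]  # six grades, F is the lowest


def grade_items(clicks, grades=GRADES):
    """Grade the items of one category for one period.

    clicks maps a (hypothetical) item ID to its integer click volume.
    Returns a dict mapping item ID to grade.
    """
    n = len(grades)
    lowest = grades[-1]
    total = sum(clicks.values())

    # Step 5: items with a zero indicator value go straight to the lowest grade.
    result = {item: lowest for item, value in clicks.items() if value == 0}
    if total == 0:
        return result

    # Step 4: reference value = total / (n - 1); compared below in integer
    # form (value * (n - 1) >= total) to avoid floating-point rounding.
    ranked = sorted((i for i in clicks if clicks[i] > 0),
                    key=lambda i: clicks[i], reverse=True)

    # Step 6: "single items" that reach the reference value on their own
    # take the top grades in descending order of click volume.
    singles = [i for i in ranked if clicks[i] * (n - 1) >= total]
    rest = [i for i in ranked if clicks[i] * (n - 1) < total]
    for grade, item in zip(grades, singles):
        result[item] = grade

    # Assumption: the remaining items use the grades not consumed by step 6.
    remaining = grades[len(singles):-1]
    rest_total = sum(clicks[i] for i in rest)

    if rest_total * (n - 1) < total:
        # Step 8: the remaining items never reach the reference value, so
        # grade them by rank; anything beyond the available grades is lowest.
        for rank, item in enumerate(rest):
            result[item] = remaining[rank] if rank < len(remaining) else lowest
        return result

    # Step 7: accumulate click volumes and place each item in the band whose
    # upper critical value (k * reference value) first covers the running total.
    running = 0
    for item in rest:
        running += clicks[item]
        band = (running * (n - 1) - 1) // total
        result[item] = remaining[min(band, len(remaining) - 1)]
    return result


example = {"p1": 5000, "p2": 1500, "p3": 1200, "p4": 1000,
           "p5": 800, "p6": 500, "p7": 0}
print(grade_items(example))
```

In this sample the total is 10000 and the reference value is 2000, so `p1` alone reaches the reference value and takes grade A, `p7` has no clicks and takes grade F, and the other items are banded by their running total. A different reading of the band boundaries would shift items that sit exactly on a critical value.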
By the end of step 8, the band calculation for commodity grading under one specific dimension is complete (that is, grading by commodity click volume under each first-level category over a 7-day period). The flow for grading under the other dimensions is similar to steps 1-8; only the basic indicator and the statistical dimension need to be replaced.
Among steps 1-8, steps 6 and 8 are the grading schemes for special cases. They can be understood with the following example. Suppose there are commodities in first-level categories 1, 2 and 3; within the 7-day period, the commodity number and click-volume indicator value of each commodity are known, and the cumulative click volume of the commodities under each category is 10000. Then, by the reference-value calculation described above, the reference value for all three categories is 10000*20%=2000, and the critical values are 10000*20%*1=2000, 10000*20%*2=4000, 10000*20%*3=6000, 10000*20%*4=8000 and 10000*20%*5=10000. The main idea of the algorithm in step 7 is:
The grading result by click volume for the commodities in first-level category 1 is:
The grading result by click volume for the commodities in first-level category 2 is:
The grading result by click volume for the commodities in first-level category 3 is:
As can be seen from the method of commodity grading based on a distributed data warehouse according to the embodiment of the present invention, data collection and integration are performed with a layered data model and data warehouse tools, following the idea of standardized data warehouse modeling, which improves the stability and scalability of data processing. Using a distributed data warehouse makes it possible to grade commodities under different conditions, to examine the operating status of key commodities from multiple angles, to improve the performance of commodity grading, and to meet the routine work needs of purchasing and sales, warehousing, finance and so on. It also unifies the commodity grading criteria across the enterprise's different business formats, reduces the waste of internal development resources, and eases maintenance of the commodity grading system. In addition, proposing reasonable and comprehensive grading calculation rules improves the accuracy of the grading results.
Fig. 6 is a schematic diagram of the main modules of the device for commodity grading based on a distributed data warehouse according to an embodiment of the present invention.
As shown in Fig. 6, a device 60 for commodity grading based on a distributed data warehouse according to an embodiment of the present invention comprises an extraction module 601, an integration module 602 and a grading module 603. The extraction module 601 is configured to extract basic data into the data warehouse, the basic data including basic information of commodities and data related to commodity grading indicators in each business system. The integration module 602 is configured to associate the basic data by business primary key and to add dimension information according to the grading requirement, so as to obtain, by integration, the grading indicator data of commodities under a specific dimension. The grading module 603 is configured to grade the commodities under the specific dimension according to the grading indicator data.
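As a rough illustration of what the integration step produces, the sketch below sums a per-day indicator over the statistical period, keyed by commodity ID, and then attaches a dimension value from a dimension lookup. Plain Python dicts and tuples stand in for the warehouse tables; the names `build_grading_index`, `daily_rows` and `item_category` and the sample values are hypothetical and do not come from the patent.

```python
from collections import defaultdict
from datetime import date, timedelta


def build_grading_index(daily_rows, item_category, end_date, days=7):
    """Return {category: {item_id: clicks summed over the period}}.

    daily_rows: iterable of (day, item_id, clicks) tuples, a stand-in for
    the per-day indicator table. item_category: {item_id: category code},
    a stand-in for the dimension table.
    """
    start = end_date - timedelta(days=days - 1)

    # Aggregate the basic indicator per commodity within the period.
    per_item = defaultdict(int)
    for day, item_id, clicks in daily_rows:
        if start <= day <= end_date:
            per_item[item_id] += clicks

    # Associate by the business key (item ID) and add the dimension value.
    index = defaultdict(dict)
    for item_id, clicks in per_item.items():
        category = item_category.get(item_id)
        if category is not None:  # skip items missing from the dimension table
            index[category][item_id] = clicks
    return {category: dict(items) for category, items in index.items()}


rows = [
    (date(2016, 1, 1), 10001, 40),
    (date(2016, 1, 3), 10001, 70),
    (date(2016, 1, 3), 10002, 5),
    (date(2015, 12, 20), 10001, 999),  # outside the 7-day window
]
print(build_grading_index(rows, {10001: 1, 10002: 1}, date(2016, 1, 3)))
```

Grouping the output by category keeps the later grading step simple, since reference and critical values are computed per category. In a real warehouse this would more likely be a join and a grouped aggregation in the warehouse's query language.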
In the device 60, the data warehouse may further comprise a buffer data layer, a basic data layer, a general data layer, a dimension data layer, an aggregated data layer and a temporary storage layer. The extraction module 601 may further be configured to extract the basic data into the buffer data layer of the data warehouse and, after data cleansing, save it to the basic data layer. The integration module 602 may further be configured to associate the basic data by business primary key and save the result to the general data layer; to extract dimension information data into the dimension data layer; and, according to the grading requirement, to combine the data of the general data layer or the basic data layer with the dimension information data of the dimension data layer, so as to obtain, by integration, the grading indicator data of commodities under a specific dimension. The grading module 603 may further be configured to store the grading results in the aggregated data layer, with the temporary data produced during calculation stored in the temporary storage layer.
In addition, the grading module 603 may further be configured to determine, using a time trigger mechanism, whether to perform commodity grading before the commodities under the specific dimension are graded according to the grading indicator data.
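A time trigger of this kind can be sketched in a few lines of Python. The Sunday and 1st-of-month conditions are the examples given in the description; the function name `should_trigger_grading` is hypothetical, and a real deployment would more likely rely on the scheduler of the warehouse platform.

```python
from datetime import date


def should_trigger_grading(stat_date):
    """Return True when stat_date is a Sunday or the 1st of a month."""
    is_sunday = stat_date.weekday() == 6  # Monday is 0, Sunday is 6
    is_first_of_month = stat_date.day == 1
    return is_sunday or is_first_of_month


print(should_trigger_grading(date(2016, 1, 3)))  # a Sunday
print(should_trigger_grading(date(2016, 1, 5)))  # a Tuesday
```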
The grading module 603 may grade the commodities under the specific dimension according to the grading indicator data through the following steps. Step 1: determine a reference value from the indicator total of the commodities, determine several critical values from the reference value, and sort the commodities by indicator value from largest to smallest. Step 2: if there are commodities whose indicator value is greater than or equal to the reference value, assign grades to those commodities in turn from the highest grade down; if there are none, proceed directly to step 3. Step 3: accumulate the indicator values of the remaining commodities and compare the accumulated result with the critical values; commodities falling between two critical values are placed in the corresponding grade, the corresponding grade being taken from the grades remaining after step 2.
The grading module 603 may further be configured as follows: if the total number of grades is N and the lowest grade is reserved for commodities whose indicator value is zero, then 1/(N-1) of the indicator total is taken as the reference value, and multiplying the reference value by 1 to N-1 gives N-1 critical values. In addition, before step 2, any commodity whose indicator value is zero is placed directly in the lowest grade.
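The rule above reduces to a short calculation, sketched here in Python. The function name `reference_and_critical_values` is hypothetical. With N = 6 grades the reference value is one fifth of the total, which matches the 20% figure used in the description's click-volume example.

```python
def reference_and_critical_values(total, num_grades=6):
    """Return (reference value, list of critical values) for one category."""
    if num_grades < 2:
        raise ValueError("at least two grades are needed")
    reference = total / (num_grades - 1)
    critical = [reference * k for k in range(1, num_grades)]
    return reference, critical


print(reference_and_critical_values(10000))
```

For an indicator total of 10000 this gives a reference value of 2000 and critical values of 2000, 4000, 6000, 8000 and 10000.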
The grading module 603 may further be configured as follows: in step 3, if the accumulated result of the indicator values of the remaining commodities is less than the reference value, the remaining commodities are assigned grades in turn from the highest grade down, in order from largest to smallest; if the number of remaining commodities exceeds the total number of grades, then once the second-lowest grade has been assigned, all commodities still remaining are placed in the lowest grade.
Fig. 7 is a schematic diagram of the main parts of the system for commodity grading based on a distributed data warehouse according to an embodiment of the present invention.
As shown in Fig. 7, a system 70 for commodity grading based on a distributed data warehouse according to an embodiment of the present invention comprises a memory 701 and a processor 702, wherein the memory 701 stores instructions and the processor 702 is configured to perform the following steps according to the instructions: extracting basic data into a data warehouse, the basic data including basic information of commodities and data related to commodity grading indicators in each business system; associating the basic data by business primary key and adding dimension information according to the grading requirement, so as to obtain, by integration, the grading indicator data of commodities under a specific dimension; and grading the commodities under the specific dimension according to the grading indicator data.
In the embodiment of the present invention, when grading the commodities under the specific dimension according to the grading indicator data, the processor 702 may further be configured to perform the following steps. Step 1: determine a reference value from the indicator total of the commodities, determine several critical values from the reference value, and sort the commodities by indicator value from largest to smallest. Step 2: if there are commodities whose indicator value is greater than or equal to the reference value, assign grades to those commodities in turn from the highest grade down; if there are none, proceed directly to step 3. Step 3: accumulate the indicator values of the remaining commodities and compare the accumulated result with the critical values; commodities falling between two critical values are placed in the corresponding grade, the corresponding grade here being taken from the grades remaining after step 2.
As can be seen from the above, data collection and integration are performed with a layered data model and data warehouse tools, following the idea of standardized data warehouse modeling, which improves the stability and scalability of data processing. Using a distributed data warehouse makes it possible to grade commodities under different conditions, to examine the operating status of key commodities from multiple angles, to improve the performance of commodity grading, and to meet the routine work needs of purchasing and sales, warehousing, finance and so on. It also unifies the commodity grading criteria across the enterprise's different business formats, reduces the waste of internal development resources, and eases maintenance of the commodity grading system. In addition, proposing reasonable and comprehensive grading calculation rules improves the accuracy of the grading results.
The specific embodiments above do not limit the scope of protection of the present invention. Those skilled in the art should understand that, depending on design requirements and other factors, various modifications, combinations, sub-combinations and substitutions may occur. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (15)
- 1. A method of commodity grading based on a distributed data warehouse, characterized by comprising: extracting basic data into a data warehouse, the basic data including basic information of commodities and data related to commodity grading indicators in each business system; associating the basic data by business primary key and adding dimension information according to the grading requirement, so as to obtain, by integration, the grading indicator data of commodities under a specific dimension; and grading the commodities under the specific dimension according to the grading indicator data.
- 2. The method according to claim 1, characterized in that the data warehouse comprises a buffer data layer, a basic data layer, a general data layer, a dimension data layer, an aggregated data layer and a temporary storage layer, wherein: the basic data is extracted into the buffer data layer of the data warehouse and, after data cleansing, saved to the basic data layer; the basic data is associated by business primary key and saved to the general data layer; dimension information data is extracted into the dimension data layer, and the data of the general data layer or the basic data layer is combined with the dimension information data of the dimension data layer according to the grading requirement, so as to obtain the grading indicator data of commodities under a specific dimension; and the grading results are stored in the aggregated data layer, with the temporary data produced during calculation stored in the temporary storage layer.
- 3. The method according to claim 1, characterized in that the method further comprises: before the commodities under the specific dimension are graded according to the grading indicator data, determining, using a time trigger mechanism, whether to perform commodity grading.
- 4. The method according to claim 1, characterized in that grading the commodities under the specific dimension according to the grading indicator data comprises: step 1: determining a reference value from the indicator total of the commodities, determining several critical values from the reference value, and sorting the commodities by indicator value from largest to smallest; step 2: if there are commodities whose indicator value is greater than or equal to the reference value, assigning grades to those commodities in turn from the highest grade down, and if there are none, proceeding directly to step 3; step 3: accumulating the indicator values of the remaining commodities and comparing the accumulated result with the critical values, commodities falling between two critical values being placed in the corresponding grade, the corresponding grade being taken from the grades remaining after step 2.
- 5. The method according to claim 4, characterized in that determining a reference value from the indicator total of the commodities and determining several critical values from the reference value comprises: if the total number of grades is N and the lowest grade is for commodities whose indicator value is zero, taking 1/(N-1) of the indicator total as the reference value, and multiplying the reference value by 1 to N-1 to obtain N-1 critical values.
- 6. The method according to claim 4, characterized in that the method further comprises: before step 2, placing any commodity whose indicator value is zero directly in the lowest grade.
- 7. The method according to claim 4, characterized in that the method further comprises: in step 3, if the accumulated result of the indicator values of the remaining commodities is less than the reference value, assigning grades to the remaining commodities in turn from the highest grade down, in order from largest to smallest, wherein, if the number of remaining commodities exceeds the total number of grades, then once the second-lowest grade has been assigned, all commodities still remaining are placed in the lowest grade.
- 8. A device for commodity grading based on a distributed data warehouse, characterized by comprising: an extraction module, configured to extract basic data into a data warehouse, the basic data including basic information of commodities and data related to commodity grading indicators in each business system; an integration module, configured to associate the basic data by business primary key and add dimension information according to the grading requirement, so as to obtain, by integration, the grading indicator data of commodities under a specific dimension; and a grading module, configured to grade the commodities under the specific dimension according to the grading indicator data.
- 9. The device according to claim 8, characterized in that the data warehouse comprises a buffer data layer, a basic data layer, a general data layer, a dimension data layer, an aggregated data layer and a temporary storage layer, wherein: the extraction module is further configured to extract the basic data into the buffer data layer of the data warehouse and, after data cleansing, save it to the basic data layer; the integration module is further configured to associate the basic data by business primary key and save the result to the general data layer, to extract dimension information data into the dimension data layer, and to combine the data of the general data layer or the basic data layer with the dimension information data of the dimension data layer according to the grading requirement, so as to obtain, by integration, the grading indicator data of commodities under a specific dimension; and the grading module is further configured to store the grading results in the aggregated data layer, with the temporary data produced during calculation stored in the temporary storage layer.
- 10. The device according to claim 8, characterized in that the grading module is further configured to: before the commodities under the specific dimension are graded according to the grading indicator data, determine, using a time trigger mechanism, whether to perform commodity grading.
- 11. The device according to claim 8, characterized in that the grading module is further configured to perform: step 1: determining a reference value from the indicator total of the commodities, determining several critical values from the reference value, and sorting the commodities by indicator value from largest to smallest; step 2: if there are commodities whose indicator value is greater than or equal to the reference value, assigning grades to those commodities in turn from the highest grade down, and if there are none, proceeding directly to step 3; step 3: accumulating the indicator values of the remaining commodities and comparing the accumulated result with the critical values, commodities falling between two critical values being placed in the corresponding grade, the corresponding grade being taken from the grades remaining after step 2.
- 12. The device according to claim 11, characterized in that the grading module is further configured to: if the total number of grades is N and the lowest grade is for commodities whose indicator value is zero, take 1/(N-1) of the indicator total as the reference value, and multiply the reference value by 1 to N-1 to obtain N-1 critical values.
- 13. The device according to claim 11, characterized in that the grading module is further configured to: before step 2, place any commodity whose indicator value is zero directly in the lowest grade.
- 14. The device according to claim 11, characterized in that the grading module is further configured to: in step 3, if the accumulated result of the indicator values of the remaining commodities is less than the reference value, assign grades to the remaining commodities in turn from the highest grade down, in order from largest to smallest, wherein, if the number of remaining commodities exceeds the total number of grades, then once the second-lowest grade has been assigned, all commodities still remaining are placed in the lowest grade.
- 15. A system for commodity grading based on a distributed data warehouse, characterized by comprising a memory and a processor, wherein: the memory stores instructions; and the processor is configured to perform the following steps according to the instructions: extracting basic data into a data warehouse, the basic data including basic information of commodities and data related to commodity grading indicators in each business system; associating the basic data by business primary key and adding dimension information according to the grading requirement, so as to obtain, by integration, the grading indicator data of commodities under a specific dimension; and grading the commodities under the specific dimension according to the grading indicator data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610637689.8A CN107688580B (en) | 2016-08-05 | 2016-08-05 | The method, apparatus and system of commodity classification based on Distributed Data Warehouse |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107688580A (en) | 2018-02-13 |
CN107688580B (en) | 2019-03-01 |
Family
ID=61151195
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610637689.8A Active CN107688580B (en) | 2016-08-05 | 2016-08-05 | The method, apparatus and system of commodity classification based on Distributed Data Warehouse |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107688580B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101751614A (en) * | 2008-11-27 | 2010-06-23 | 财团法人工业技术研究院 | Method for forecasting customer flow grade and air conditioner temperature control method applying same |
WO2010101540A1 (en) * | 2009-03-02 | 2010-09-10 | Panchenko Borys Evgenijovich | Method for the fully modifiable framework distribution of data in a data warehouse taking account of the preliminary etymological separation of said data |
CN104866559A (en) * | 2015-05-18 | 2015-08-26 | 北京京东尚科信息技术有限公司 | Method, apparatus and system for collecting data from data warehouse |
CN105718565A (en) * | 2016-01-20 | 2016-06-29 | 北京京东尚科信息技术有限公司 | Data warehouse model construction method and construction apparatus |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108564443A (en) * | 2018-04-13 | 2018-09-21 | 广东星外星文化传播有限公司 | Commodity ranking method and device |
CN109360026A (en) * | 2018-10-26 | 2019-02-19 | 上海新增鼎数据科技有限公司 | A kind of co-ordination of supply and marketing determination method, device, system, computer equipment and storage medium |
CN111125088A (en) * | 2018-10-31 | 2020-05-08 | 北京国双科技有限公司 | Multi-level data processing method and device |
CN111125088B (en) * | 2018-10-31 | 2023-08-25 | 北京国双科技有限公司 | Multi-level data processing method and device |
CN110209668A (en) * | 2019-04-29 | 2019-09-06 | 苏宁云计算有限公司 | Dimension table correlating method, device, equipment and readable storage medium storing program for executing based on stream calculation |
CN110941601A (en) * | 2019-11-12 | 2020-03-31 | 北京三快在线科技有限公司 | Method and device for determining standard caliber of index, electronic equipment and storage medium |
CN110941601B (en) * | 2019-11-12 | 2023-05-30 | 北京三快在线科技有限公司 | Method and device for determining standard caliber of index, electronic equipment and storage medium |
CN111897963A (en) * | 2020-08-06 | 2020-11-06 | 沈鑫 | Commodity classification method based on text information and machine learning |
CN112015737A (en) * | 2020-08-24 | 2020-12-01 | 华智众创(北京)投资管理有限责任公司 | Patent data processing method and device, computing equipment and storage medium |
CN112015737B (en) * | 2020-08-24 | 2021-03-30 | 华智众创(北京)投资管理有限责任公司 | Patent data processing method and device, computing equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN107688580B (en) | 2019-03-01 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||