CN102385719A - Regression prediction method and device - Google Patents

Regression prediction method and device

Info

Publication number
CN102385719A
Authority
CN
China
Prior art keywords
data point
data
predicted
dimension
value
Prior art date
Legal status
Pending
Application number
CN2011103392241A
Other languages
Chinese (zh)
Inventor
李锐
张帅
王斌
李鹏
张冠元
鲁凯
Current Assignee
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN2011103392241A priority Critical patent/CN102385719A/en
Publication of CN102385719A publication Critical patent/CN102385719A/en


Abstract

The invention provides a regression prediction method in which not only the similarity between the independent variables X but also the similarity between the dependent variables Y of the raw data is taken into consideration, and the development of the output value y is modelled from the historical perspective of the near neighbours. Compared with conventional models that do not consider how the data develop, only one preprocessing stage is added over the data set, so the information of the data points can be enriched without extra resources; the information of the raw data points X is enriched, and the prediction effect is finally improved. Furthermore, the regression prediction method can be realised on a MapReduce framework, so the parallelism of the framework can be exploited to increase the execution speed.

Description

Regression prediction method and device
Technical field
The invention belongs to the field of statistical regression analysis and prediction, and relates in particular to a regression prediction method and device for statistical machine learning.
Background art
Regression analysis (Regression Analysis) is a statistical method for analysing data; it is mainly used to find out whether a particular relationship exists between data. Regression analysis builds a model of the relationship between the dependent variables Y (also called response variables) and the independent variables X (also called predictors or independent variables). In statistical machine learning, regression prediction methods are mainly used to forecast and analyse data. X is generally multi-dimensional and Y is generally numeric, which is called multiple regression; according to the regression equation it can further be divided into linear regression, non-linear regression and so on. The most basic linear regression formula is Y = βX + β0.
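For illustration only, the following minimal sketch fits this basic linear formula by ordinary least squares in Python with NumPy; the data values and variable names are invented for the example and are not taken from the patent:

```python
import numpy as np

# Invented sample data: each row of X is a data point, y holds the numeric outputs.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([3.1, 2.9, 7.2, 6.8])

# Append a constant column so the intercept beta_0 is estimated together with beta.
Xa = np.hstack([X, np.ones((X.shape[0], 1))])

# Ordinary least squares: minimise ||Xa @ beta - y||^2.
beta, *_ = np.linalg.lstsq(Xa, y, rcond=None)

x_new = np.array([2.5, 2.5, 1.0])  # a new point, with the constant column appended
print(beta, x_new @ beta)          # coefficients and the predicted Y = beta*X + beta_0
```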
Existing regression prediction methods have the following two problems. First, because of missing data or the absence of feature selection, the original data points themselves sometimes do not contain enough information to predict the output by regression (this problem is referred to below as missing features). Second, the data on some dimensions of a data point X may not be numeric and may not follow the changing pattern and range of numeric values, such as angles of a periodic type, sex of a Boolean type, or colours of an enumerated type; this affects the effectiveness and accuracy of regression prediction to a certain extent (this problem is referred to below as heterogeneous features). To overcome these problems, existing methods all rely on experience to perform simple format conversions of the features, which is neither standardised nor extensible: when the data set changes even slightly, the format conversion method has to be changed. The problems of missing features and heterogeneous features therefore cannot be solved well.
In addition, with the development of cloud computing technology, platforms for massively parallel data processing have appeared, for example MapReduce and Hadoop. Researchers have studied how to implement regression prediction methods on these platforms, hoping to exploit the parallelism of these cloud computing platforms to improve the performance of regression prediction. For example, locally weighted linear regression (LWLR) based on MapReduce dynamically finds, for each newly input data point to be predicted, a number of neighbours in the original data set and performs local linear regression on the neighbours' data to obtain the prediction function; that is, neighbour search and regression prediction must be done for every data point to be predicted. First, the neighbours of the data point to be predicted are found according to the similarity (which can also be called distance) of the independent variables; then a curve is fitted to the neighbours to obtain the prediction function; finally the output value of the point to be predicted is predicted with the prediction function.
The benefit of LWLR is that it is easy to parallelise and that, since it predicts from the neighbours' data, it considers the relations between the independent variables, which can improve the prediction accuracy to a certain extent. However, because it skips the matrix inversion stage, it cannot consider the relations between the dependent variables Y of the original data points X, nor the relation between the original data points X and the output y_new of the data point x_new to be predicted. In other words, accurate neighbours of the data point to be predicted are not easy to find, and whether the neighbours are accurate has a decisive influence on the quality of the prediction result. Moreover, this method does not solve the problems of missing features and heterogeneous features either.
Summary of the invention
The object of the invention is therefore to overcome the above defects of the prior art and to provide a feature extension method for regression prediction, which uses the predicted values (y) corresponding to the original data (X) to enrich the information of the data points and thus improve the regression prediction.
The object of the invention is achieved through the following technical solutions:
In one aspect, the invention provides a feature extension method YET (Y-axis ExTension) for regression prediction, said method comprising:
selecting, among the original data points, the neighbours of the data point to be predicted, said neighbours being a series of original data points whose values on one or several dimensions are equal or similar to those of the data point to be predicted;
using these neighbours and their corresponding dependent variable values to extend the dimensions of the original data points and of the data point to be predicted.
In another aspect, a feature extension method based on MapReduce is provided, said method comprising:
step 1) selecting, among the original data points, the neighbours of the data point to be predicted, said neighbours being a series of original data points whose values on one or several dimensions are equal or similar to those of the data point to be predicted;
step 2) splitting each original data point into D2-D1+1 parts, where D2 is the dimension of the original data point after extension and D1 is its dimension before extension, each part being a (key, value) pair in which the key is the identifier of the data point that should receive this part and the value contains the index of the dimension to be extended in the data point receiving this part and the dependent variable value of the original data point sending this part;
step 3) each original data point extracting, from the received data, the dimension indices and dependent variable values contained in the values and extending its own dimensions accordingly.
In another aspect, a regression prediction method is provided, said method comprising:
step a) extending the dimensions of each original data point X with the above feature extension method to obtain extended data points;
step b) performing regression prediction for the data point to be predicted based on the extended data points.
In another aspect, a regression prediction method based on MapReduce is provided, the method comprising:
step 41) extending the dimensions of each original data point X with the above feature extension method to obtain extended data points;
step 42) based on the extended data points, computing the similarity to the data point to be predicted and distributing (key, value) pairs, in which the key is the identifier of the data point to be predicted and the value is the identifier of an extended data point together with its similarity to the data point to be predicted;
step 43) based on the computed similarities, selecting the K extended data points most similar to the data point to be predicted, and performing regression prediction for the data point to be predicted with locally weighted linear regression.
In the above regression prediction method, said step 42) may use the KL distance, the cosine distance or the Euclidean distance to compute the similarity for different extended dimensions.
In another aspect, a regression prediction device based on MapReduce is provided, said device comprising:
a device for extending the dimensions of each original data point X with the above feature extension method to obtain extended data points;
a device for computing, based on the extended data points, the similarity to the data point to be predicted and distributing (key, value) pairs, in which the key is the identifier of the data point to be predicted and the value is the identifier of an extended data point together with its similarity to the data point to be predicted;
a device for selecting, based on the computed similarities, the K extended data points most similar to the data point to be predicted, and performing regression prediction for the data point to be predicted with locally weighted linear regression.
In another aspect, a supervised machine learning method is provided, said method comprising:
1) performing feature extraction and dimension reduction on the training data to form data points X (x1, x2, ...) with labels y;
2) extending the data points X with the above feature extension method;
3) selecting the model form for predicting y from the extended data points, determining the model parameter types and the number of parameters, and training on the training set;
4) applying the trained model and parameters to regression prediction or classification, finally obtaining the regression prediction result or the classification result.
In the above machine learning method, the model form for predicting y from X in step 3) may be a regression prediction model, and said step 4) may use the above regression prediction method to predict and obtain the prediction result.
The above machine learning method can be used for weather forecasting, disease prediction, prediction of users' purchasing behaviour, music recommendation, friend recommendation in social networks, book recommendation, game outcome prediction, information retrieval, spam classification, news importance prediction, and the like.
Compared with the prior art, the invention has the following advantages:
Not only is the similarity between the independent variables X considered, but also the similarity between the dependent variables Y of the original data, and the development pattern of the output value y is considered from the perspective of the neighbours and the neighbours' history.
Compared with previous models that do not consider the development pattern of the data, the invention adds only one preprocessing stage over the data set and enriches the information of the data points without extra resources. In terms of execution speed, the time complexity added by this preprocessing is the N/M required for scanning the data, where N is the number of data points and M is the number of Mappers of MapReduce. In terms of effect, the information of the original data points X is enriched, and the prediction accuracy is ultimately improved.
Description of drawings
The embodiments of the invention are further described below with reference to the accompanying drawings, in which:
Fig. 1 is a schematic flow chart of the regression prediction method according to an embodiment of the invention;
Fig. 2 is a schematic structural diagram of the regression prediction device according to an embodiment of the invention;
Fig. 3 is a comparison of the prediction effect of conventional linear regression with that of the regression prediction according to an embodiment of the invention.
Embodiment
To make the objects, technical solutions and advantages of the invention clearer, the invention is explained further below through specific embodiments in conjunction with the accompanying drawings. It should be understood that the specific embodiments described here are intended only to illustrate the invention, not to limit it.
To better understand the invention, some background knowledge is introduced first.
MapReduce (Jeffrey Dean, Sanjay Ghemawat. MapReduce: a flexible data processing tool [J]. Communications of the ACM, January 2010, v.53, n.1) is a parallel framework (cloud computing framework) for large-scale data proposed by Google in recent years. It is also a programming model and specification for large-scale data processing that provides good low-level encapsulation and makes writing parallel programs convenient. MapReduce adopts a divide-and-conquer approach; in its basic form it has two processing stages, map (mapping) and reduce (reduction). A large-scale data processing task is split into many subtasks, which are distributed to several machines and completed in parallel as a batch job. The map stage converts the original input (generally key/value pairs) into intermediate results, and the reduce stage merges, sorts and outputs the intermediate results produced earlier. The framework takes care of much of the difficult work, such as data partitioning, scheduling, co-location of data and code, inter-process synchronisation and communication, fault tolerance and crash handling, and load balancing, and makes these functions transparent to the developer. The developer therefore only needs to implement interfaces such as map and reduce, without paying attention to the underlying system, to easily develop parallel programs on a distributed cluster.
The traditional regression prediction method can be implemented with MapReduce. However, the traditional regression prediction method needs matrix inversion or gradient descent for its solution; if block-parallel computation is used, each block of data needs global information before the matrix inversion can be completed, and the same holds for gradient descent. The MapReduce framework, on the other hand, has the drawbacks that global information is not easy to share and random disk access is inefficient. Such traditional regression prediction software therefore cannot exploit the parallelism of the MapReduce framework well to improve performance.
Locally weighted linear regression (LWLR) based on MapReduce predicts from the neighbours' data and skips the matrix inversion stage, so it can exploit the parallelism of MapReduce. But, as mentioned above, it has the problem that accurate neighbours of a new data point are not easy to find, and it does not solve the problems of missing features or heterogeneous features. The basic steps of LWLR are as follows. First the data format is regularised, identifying the independent variables X (generally multi-dimensional, hence the capital X; each dimension of X can also be called an attribute or column) and the dependent variable y (generally a one-dimensional predicted value, hence the lowercase y); each data record generally has the form (x1(i), x2(i), ..., xj(i), ..., xn(i), y(i)), where the subscript j ∈ [1, n] indexes the columns (attributes) and the superscript i ∈ [1, m] is the number of the original data point, so the original data are expressed as a large m×(n+1) matrix. Then, after a new data point X_new = (x_new,1, x_new,2, x_new,3, ..., x_new,n) is received, the Euclidean distance between X_new and each original data point X is computed as the similarity, the K most similar original data points are chosen, a regression model h(θ) is trained from these top-K points, and finally the dependent variable y is predicted with the trained regression model h(θ). (C. Chu, S. Kim, Y. A. Lin, et al. Map-reduce for machine learning on multicore [C] // NIPS 19, 2007.)
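To make these steps concrete, the following is a minimal sketch of LWLR for a single query point, written in Python with NumPy; the Gaussian weighting with bandwidth tau and the helper name lwlr_predict are illustrative assumptions, not the reference implementation of the cited work:

```python
import numpy as np

def lwlr_predict(X, y, x_new, k=5, tau=1.0):
    """Locally weighted linear regression for one query point.
    X: (m, n) original data points, y: (m,) outputs, x_new: (n,) query point."""
    # 1) Similarity as the Euclidean distance to every original point.
    dist = np.linalg.norm(X - x_new, axis=1)
    # 2) Keep the K most similar original data points.
    idx = np.argsort(dist)[:k]
    Xk, yk, dk = X[idx], y[idx], dist[idx]
    # 3) Gaussian weights: closer neighbours influence the local fit more.
    w = np.exp(-(dk ** 2) / (2.0 * tau ** 2))
    # 4) Weighted least squares on the neighbours, with an intercept column.
    Xa = np.hstack([Xk, np.ones((Xk.shape[0], 1))])
    W = np.diag(w)
    theta = np.linalg.pinv(Xa.T @ W @ Xa) @ (Xa.T @ W @ yk)
    # 5) Prediction h(theta) for the query point.
    return float(np.append(x_new, 1.0) @ theta)
```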
According to one embodiment of the invention, a feature extension method for regression prediction is provided. This feature extension method considers not only the relations between the original data points (the independent variables X) but also the relations between the dependent variables Y of the original data points. By recombining and extending the attributes of X with the dependent variable values y of the 'neighbours' of the original data points, it enriches the features of the original data points and of the test data points.
For convenience of explanation, the independent variables are denoted X (X1, X2, X3, ...), the extended independent variables are denoted X+, and the dependent variables corresponding to the independent variables are denoted Y (y1, y2, y3, ...). The data point to be predicted is denoted X_new, and the prediction output is y_new.
More specifically, the method comprises the following steps:
Step 1: select the 'neighbours' of the data point to be predicted among the original data points.
The dependent variables y corresponding to these 'neighbours' are used in step 2 below to extend new features. In this embodiment, 'neighbours' are defined as a series of original data points whose values on one or several dimensions are equal or similar to those of the data point X_new to be predicted.
For each dimension Xi of the data point X_new to be predicted, domain knowledge and experience, possibly combined with existing pattern mining methods such as Apriori, GSP and PrefixSpan, are used to find the required original data points (Xi1, Xi2, Xi3, ...) as 'neighbours'; these neighbours can serve as background off-line knowledge.
An example using domain knowledge: suppose the price y of a product is predicted from its attributes X, and one column of X, 'place of origin', contains country names; experience suggests that larger regions are better features, i.e. 'made in Europe', 'made in Asia' and so on have a greater influence on the result. All original data points whose value in this column is a European country can therefore be regarded as 'neighbours', and their y values can be used as the extension.
Another example, using pattern mining: a simple approach is to analyse, from the regression equation obtained in earlier training, which features are useful (for example, features with larger coefficients) and which features should be useful but do not play their expected role (for example, floor area x intuitively affects the price y, but its coefficient is small). Such under-used features need to be extended.
Yet another example: when judging from user preferences X whether a user likes a certain product y, statistics may show that 'likes online shopping' and 'often stays up late' are strongly associated. Original data points whose values on these two columns are identical or similar can then be taken as 'neighbours', and their y values can be used as the extension, which also compensates for insufficient neighbours. As known off-line knowledge, similar information can then be used to extend new data sets and other similar data sets.
Step 2: use these neighbours and their corresponding dependent variables y to extend the dimensions of the original data points (the independent variables X) and of the test data points. One or more dimensions can be added; the independent variables obtained after extending X are denoted X+. The number of added dimensions can be determined according to the actual requirements, the size of the data set and the affordable algorithmic complexity; which dimensions to add can be determined from domain knowledge, experience, pattern mining, user preferences, user requirements, and so on.
The two steps above are explained in more detail below with a concrete example.
For example, the sales volume of a product of a certain company is to be predicted, and some original data already exist; sample data are shown in Table 1. The original data are the data before October 2011, and the data point to be predicted is 108002. The dimensions of the independent variable X are: supply of raw material A, supply of raw material B, month, investment, product type and product colour, six columns (six attributes or dimensions) in total; the output value Y is the sales volume of the product.
Table 1
(Table 1 is reproduced as an image in the original publication.)
First, the extension can proceed from the time angle, according to some domain knowledge or experience. Experience suggests, first, that the sales volume of the product does not differ greatly within half a year, so the sales of adjacent months are correlated; and second, that the product has busy and slack seasons, so the sales of the same month in different years are also somewhat correlated. For data point 106864, on the 'month' dimension, the previous month's point 106863, the point 106862 of the month before that, and the data point of the same month of the previous year are all its 'neighbours', and the y values of these neighbours are extended onto the corresponding dimensions. The extension on 'month' therefore adds to each X: last month's sales (X7), the sales of the month before last (X8), and the year-on-year sales of the same month last year (X9). This can be called the 'history of the historical data' on the time dimension: the 'neighbours' of the data point to be predicted, X_new (108002), include the data points 106864, 106865, etc., and the historical sales y = 334 of the 'neighbour' 106863 of its neighbour (historical data) 106864 are extended into the 7th dimension X7 of 106864. In other words, the prediction of 108002 uses the historical sales y of its historical data 106864 (the sales 334 of 106863).
From the product-design angle, the sales of models in the same series are correlated, so the extension can also be triggered from the model angle by extending on 'model', i.e. adding to X the sales of products of the same series (X10).
Pattern mining methods such as Apriori, GSP and PrefixSpan can also be used to find that the ratio of the supply of A (X1) to the supply of B (X2) is related to a certain interval of the sales y, for example X1/X2 = 0.7 together with y ∈ [298, 335] occurring frequently. Original data points with similar A/B ratios can therefore be used as 'neighbours'; for instance, the A/B ratios of data points 106861 and 106862 in the table are both about 0.5, so 106861 and 106862 are each other's 'neighbours'.
Next, according to the selected neighbours and their y values, the dimensions of each original data point X are extended; the extended independent variables and their corresponding dependent variables are denoted (X+, y).
In one embodiment, MapReduce is used to implement the extension step:
Step 21, the Map stage: each original data point is split into D2-D1+1 parts, where D2 is the dimension of X+ after extension and D1 is the dimension of X before extension. Each part is a (key, value) pair, in which the key identifies the original data point that should receive this part (the id of the receiving data point can be used as the key), and the value consists of the index of the column (dimension) to be written (for example dimension 7, 8 or 9 above) and the y value of the original data point that sends this part.
Step 22, the Reduce stage: collect the received information, arrange the data by column index (dimension) and y value, and output them.
The extension steps are again explained with the sample data shown in Table 1:
For each original data point x, (key, value) pairs are distributed according to the dimensions to be added; for example, if k dimensions are added, k pairs are distributed for each original data point x. The extension dimensions for 'month' are handled first: for each data point x the target key is generated and the pair is distributed (step 21). Taking data point 106863 as an example, the pair sent is (key, value) = (106864, (7, 334)), where the key is the key of the data point of the following month and the value (7, 334) indicates that y = 334 is to be merged into column 7 of the extended data. Because the month of 106863 is 2010.12, it is the 'history' of the following month 2011.1, corresponding to 'last month's sales', i.e. column 7. Likewise, for the month after next it would be 'the sales of the month before last', i.e. column 8, so the pair sent to the data point of the month after next carries the value (8, 334).
Taking 106862 as another example: according to the analysis above, the y of 106862 should be extended into column X8 of 106864. The id of the receiving sample point is used as the key, and each value consists of the column (dimension) index and the corresponding y value, so for this part the output of 106862 is key = 106864, value = (8, 325). The details of distribution and collection are handled by the MapReduce framework; this method only needs to emit the data described above.
The new extended data points are then generated (step 22): the data with the same key are received and merged into a new record according to the columns. For example, the data point with key = 106864 receives the three values (7, 334), (8, 325) and (9, ?), so column 7 of the extended data is 334, column 8 is 325, and so on, where the question mark stands for the sales y of 2010.1. After this is done, the extended data gain three dimensions (X7 = 334, X8 = 325, X9 = ?).
For the extension of the 'model' column (extending to X10), a similar method is used: data points with the same model are treated as 'neighbours', the Map and Reduce stages are run again, and the extension of column X10 is completed.
After completion, the original data point 106864 in this example becomes, after extension, (33, 38, 2011.1, 120, AF002, red-blue, 334, 325, ?, 371), extended from the original 6 dimensions to 10 dimensions.
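The following is a minimal Python sketch of the Map and Reduce functions of steps 21 and 22 for the 'month' extension; the rule that the point one month later has id+1 and the point two months later has id+2, and the exact column values used for 106864, are simplifying assumptions for illustration only:

```python
# Original columns of the receiving point (taken loosely from Table 1) and the
# y values of the two sending points used in the text (106862: 325, 106863: 334).
x_columns = {106864: [33, 38, "2011.1", 120, "AF002", "red-blue"]}
sender_y = {106862: 325, 106863: 334}

def expand_map(point_id, y):
    """Step 21 (Map): emit the (key, value) parts of one original point.
    Assumed rule: consecutive ids are consecutive months, so the point one month
    later (id+1) receives column 7 (last month's sales) and the point two months
    later (id+2) receives column 8 (sales of the month before last)."""
    yield point_id + 1, (7, y)
    yield point_id + 2, (8, y)

def expand_reduce(point_id, parts, d2=10):
    """Step 22 (Reduce): merge the collected parts into one extended point."""
    extended = list(x_columns[point_id])        # original columns 1..D1
    extended += [None] * (d2 - len(extended))   # room for the extended columns
    for col, y in parts:                        # fill columns 7, 8, ...
        extended[col - 1] = y
    return point_id, extended

# Tiny driver imitating the shuffle that the MapReduce framework would perform.
shuffled = {}
for pid, y in sender_y.items():
    for key, value in expand_map(pid, y):
        shuffled.setdefault(key, []).append(value)

print(expand_reduce(106864, shuffled[106864]))
# -> (106864, [33, 38, '2011.1', 120, 'AF002', 'red-blue', 334, 325, None, None])
```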
Using the above extension method for regression prediction can solve the problems of missing features and heterogeneous features of traditional regression prediction. For the heterogeneous feature problem, for example, traditional methods generally convert all non-numeric features into numbers, such as converting Wednesday and Friday into 3 and 5; the converted data then differ from the original data: 5 is greater than 3, but Friday is not simply 'greater than' Wednesday, and the output (y) of Friday is not simply greater or smaller than the output (y) of Wednesday. Other methods convert such features into enumerations for comparison, so that the similarity of Saturday with Saturday is 1 and its similarity with every other day is 0; this also loses some information, since Saturday and Sunday may well have some similarity. Yet other methods assign a similarity between Saturday and Sunday empirically, but without sufficient theoretical support. The feature extension method of the embodiment of the invention, by contrast, explicitly uses y as the feature extension: for example, the sales y of Fridays can be used as an extended feature, so that the newly added dimensions of X are obtained from y data and are therefore all numeric features. Because the predicted value y has no heterogeneity problem, the extended features can be used effectively as new features for regression prediction. This greatly improves the quality of the features and ultimately the prediction. For instance, the month entries '2010.11' and '2011.1' in Table 1 are data of a different type, and converting them into numeric features is a tricky problem; using the y values of their 'neighbours' as features solves this problem neatly.
It should be pointed out that the feature extension method described above can also be applied to many supervised learning methods, because features play an important role in many supervised and semi-supervised (the original data have y) machine learning methods, such as classification. It is generally believed that the choice of classification algorithm affects the result, but that when the parameters are optimal this influence is not the largest; the quality of the features has a larger influence on the result. Therefore, if the feature extension method of the embodiment of the invention is also used to enrich the features in classification, the effect can likewise be improved in the end.
According to yet another embodiment of the invention, a regression prediction method based on the above feature extension method is also provided. This regression prediction method first selects the 'neighbours' of the data point to be predicted among the original data points; then, according to the selected neighbours and their y values, extends the dimensions of each original data point X to obtain X+; and then performs regression prediction for the data point X_new to be predicted based on X+ to obtain the predicted value y_new.
As can be seen above, the feature extension method according to the embodiment of the invention can be implemented on the MapReduce framework; therefore, in one embodiment, MapReduce can be used to implement the above regression prediction method, in which LWLR is used, based on X+, to perform regression prediction for the data point X_new and obtain the predicted value y_new.
However, as indicated above, the LWLR method has problems in the neighbour computation; in LWLR the neighbours are generally computed from the Euclidean distance between data points. Therefore, in yet another embodiment, after the features of the original data have been extended with the above feature extension method, a more flexible distance computation scheme is used to find the neighbours x+ of the new data point x_new, and finally the prediction function is trained with these neighbours to predict y for the new data point.
It should be pointed out that X and its extension can be serialised structures, trees or even graphs, but in the data they can only be expressed as different dimensions, and the structural information is recorded separately. In the embodiment above, for example, X7, X8 and X9 represent the historical development trend of the monthly sales. For 106864, on the 'month' dimension, the previous month 106863, the month before last 106862 and the data point of the same month of the previous year are all its 'neighbours', and their corresponding y values are extended into X7, X8 and X9; what is used as the extension is in fact the ordered monthly sales y, i.e. serialised data with an order, so when the similarity is computed the KL distance (Kullback-Leibler divergence, also called relative entropy) can be chosen as the distance over X7, X8 and X9; see Table 2 for details.
For example: X7, X8 and X9 have an ordinal relation. If only the Euclidean distance were computed over these three dimensions, the following problem would arise: the Euclidean distance treats every dimension alike and uses the sum of squared residuals, so for a point A(1, 2, 3) the points B(2, 3, 4) and C(2, 1, 2) are at the same distance; but in fact A and B both rise gradually (for monthly sales, they increase month by month) while C swings up and down, and the Euclidean distance cannot capture this difference. The KL distance computes the relative entropy and reaches the correct conclusion that B is closer to A.
Table 2
(Table 2 is reproduced as an image in the original publication.)
It should be pointed out that, in the course of computing the similarity, different distance computation methods can be used for different dimensions. For example, the Euclidean distance D can be used directly for the original dimensions X1 to X6, while the extended dimensions X7, X8 and X9, which have an ordinal relation, use the KL distance D_KL according to Table 2. The final similarity can be obtained with various fusion methods, for example the weighted sum of the two (λ1·D + λ2·D_KL), the minimum of the two, min(D, D_KL), or the maximum of the two, max(D, D_KL). Which scheme to choose depends on the concrete situation; the results obtained with the various fusion methods can be compared on the data set, and the concrete parameters can also be obtained by training.
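As an illustration of such a mixed measure, a sketch follows; normalising the ordered sales into distributions before taking the KL divergence, and the split a[:6] / a[6:9], are assumptions made so that the example is self-contained and well defined:

```python
import numpy as np

def kl_distance(p, q, eps=1e-9):
    """KL divergence (relative entropy) between two positive sequences,
    normalised to distributions so that the formula is well defined."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def mixed_distance(a, b, lam1=0.5, lam2=0.5):
    """Euclidean distance on the (here assumed numeric) original columns a[:6],
    KL distance on the ordered extension X7..X9 in a[6:9],
    fused as the weighted sum lam1*D + lam2*D_KL."""
    d = np.linalg.norm(np.asarray(a[:6], dtype=float) - np.asarray(b[:6], dtype=float))
    d_kl = kl_distance(a[6:9], b[6:9])
    return lam1 * d + lam2 * d_kl

# The example from the text: B follows A's rising trend while C fluctuates,
# so the KL part correctly ranks B closer to A than C.
A, B, C = [1, 2, 3], [2, 3, 4], [2, 1, 2]
print(kl_distance(A, B), kl_distance(A, C))  # the first value is the smaller one
```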
Compared with the original data before extension: first, the feature dimensionality has been extended, the 6-dimensional features in the table having been extended to 10-dimensional features; second, the new features are numeric and can carry structured relations, for example X7, X8 and X9 have a sequence relation, so the sequence can be used to predict the sales y; finally, the neighbours found are not only points close in X but also points close in y, close not only in a single value but also in the sequence and even in the structure. On the basis of these three points, because the feature extension enriches the information of the original data points, the neighbours of the test data point can be found more accurately and more richly, which improves the quality of the final regression function and the prediction accuracy.
In one embodiment, the regression prediction method can also be implemented on MapReduce; in particular, it can be divided into the following stages (a code sketch of these stages follows the list):
Step 31, the Map (mapping) stage: compute the similarity of each X+ to the point X_new to be predicted. The similarity can be computed flexibly, either with the distance computation methods given above or with other existing methods as needed. The output key is the id of the point to be predicted, and the value is X+ together with the similarity. Note that the order of the extended dimensions is generally fixed: X7, X8 and X9 all represent the corresponding sales of different historical months (possibly increasing month by month) and are therefore sequential, so distance formulas that ignore the sequence (such as the cosine distance) are unsuitable here.
Step 32, the Shuffle and Sort stages: the Shuffle stage is executed automatically by the MapReduce framework, and the Sort stage can use existing techniques to sort by similarity (for example heap sort) and find, for each point X_new to be predicted, the K most similar extended data points X+ and their y values.
Step 33, the Reduce (reduction) stage: collect the information produced before (the K most similar data points can also be filtered once more); because the number of data points at this stage is small, various regression prediction methods can be called to predict y_new and output it.
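A minimal sketch of these three stages as a map/reduce pair is given below; the plain-Python organisation, the reuse of a generic similarity function, and the unweighted local least-squares fit in the reducer are illustrative choices, not the only possible realisation:

```python
import heapq
import numpy as np

def predict_map(extended_records, new_points, similarity):
    """Step 31 (Map): for each extended point X+ and each point to be predicted,
    emit (id of the point to be predicted, (similarity, x_plus, y))."""
    for new_id, x_new in new_points:
        for _pid, x_plus, y in extended_records:
            yield new_id, (similarity(x_plus, x_new), x_plus, y)

def predict_reduce(candidates, x_new, k=5):
    """Steps 32 and 33 (Sort + Reduce): keep the K most similar extended points
    and fit a local linear model on them to predict y_new."""
    top_k = heapq.nlargest(k, candidates, key=lambda c: c[0])
    Xk = np.array([c[1] for c in top_k], dtype=float)
    yk = np.array([c[2] for c in top_k], dtype=float)
    Xa = np.hstack([Xk, np.ones((Xk.shape[0], 1))])  # add an intercept column
    theta, *_ = np.linalg.lstsq(Xa, yk, rcond=None)  # local regression on the top K
    return float(np.append(np.asarray(x_new, dtype=float), 1.0) @ theta)
```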
The example of Table 1 is again used to discuss the steps of performing regression prediction for the test data:
1) The previously generated data x+ are used to compute the similarity to the data point x_new (108002) to be predicted. Several computation methods are possible and can be looked up in Table 2: for a single extended y (from one angle, i.e. one iteration of the preprocessing that added only one dimension) the simple Euclidean distance suffices; for several y values, such as the month-angle extension described above, which added the 3 dimensions X7, X8 and X9 with a sequence relation between them, the similarity should be computed jointly, so Table 2 indicates the KL distance.
In this example, for 106864, the order of the features (7, 334), (8, 325), (9, ?) is fixed; apart from the traditional cosine distance and Euclidean distance, other distances can also be used to compute the similarity. For example, the negative KL distance can be used as the similarity to find data points with a similar development trend. The computed similarity is then added to the value.
2) In the local combine of MapReduce, according to the similarity S between (108002) and each data point computed above, the top K most similar points in each data block are found and output to the next step. The output format is, as before, key: 108002, value: (106864, S);
3) The most similar data points collected in the previous step are gathered; for the point X_new 108002, the previously extended data 106861, 106862, 106863, 106864, 106865, etc. are used to perform ordinary linear regression or non-linear regression, etc., and the product sales y_new of the pending data point 108002 is predicted. Because there can also be many data points X_new to be predicted, this step is still executed in parallel in the MapReduce framework.
Fig. 1 shows a schematic flow chart of the regression prediction method according to an embodiment of the invention. The method mainly comprises the following steps, of which steps 3-5 form the off-line processing phase and steps 6-8 the on-line processing phase:
1) read in the raw data and the data to be predicted (there can be several);
2) if the data have already been processed off-line, skip to step 6;
3) use domain knowledge and experience or techniques such as pattern mining to obtain the features that need to be extended, and write the configuration file;
4) Map stage: distribute the information of each original data point according to step 3, the key being the receiver id and the value being the extended dimension and the extension value;
5) Reduce stage: collect all information by key, merge it and save it as new data points x+;
6) Map stage: on the extended data, compute the similarity between the data x_new to be predicted and the extended original data x+, using the more flexible distance formulas;
7) Reduce stage: select the K most similar data points Xi+, i ∈ [1, K];
8) use the K data points Xi+, i ∈ [1, K], to predict the required y by regression.
Fig. 2 shows a schematic block diagram of the regression prediction device according to an embodiment of the invention. The device mainly comprises the following modules:
Data analysis module: analyses the original data and performs pattern mining to obtain the X dimensions that need to be extended;
Data preprocessing module: according to the output of the data analysis module, extends the features of the original data with the feature extension method introduced above;
Regression prediction module: uses the extended data and the new distance computation methods mentioned above to obtain more effective neighbours, and finally outputs the predicted value by regression.
Fig. 3 shows a comparison of the prediction effect with and without the feature extension method proposed by the invention.
The left figure shows the effect of regression prediction without the feature extension method proposed by the invention; the points in the figure are the chosen 'neighbour' points (in the embodiment described here, the abscissa can be taken as the month and the ordinate as the sales), and the dashed line is the linear prediction function.
The right figure shows the effect of regression prediction after applying the feature extension proposed by the invention (in this embodiment, the abscissa can be taken as the extended X7, X8, X9 and y, representing last month, the month before last, the same month last year and this month, and the ordinate as the sales value); the dashed line is the historical sales trend of the point X_new to be predicted. The y value corresponding to the point in the upper right corner is the predicted value.
As can be seen from Fig. 3, because the feature extension method provided by the invention is adopted, the development trend of the dependent variable y can be grasped as a whole, so a more pronounced effect is obtained. This effect also clearly shows the relation between the final predicted value y_new and the extended data points x+, and can provide a proper explanation for the prediction of the invention: the new data point is closer in its attributes to the original data points shown in the figure, and the development trends of the dependent variable also converge, so the data after extension can make a better prediction for the point to be predicted.
In summary, in the method of the embodiment of the invention, useful pattern information and historical information are obtained by mining the relations and patterns between the dependent variables Y of the data points, so that the internal patterns of the original data are screened. The mined patterns are mainly aimed at non-numeric features, in order to solve the heterogeneous feature problem. Generally speaking, time and structured dimensions (dimensions whose values contain inclusion or association relations, such as the place of origin) are the focus of the pattern mining. The number of extended dimensions is generally judged from the requirements and the size of the data set, and should be chosen under the premise of acceptable algorithmic complexity.
Moreover, for the data point x_new to be predicted, similar Y are mined and X is extended to obtain X+ and Y; this is implemented with a MapReduce algorithm, so that data mining and preprocessing are completed quickly and in parallel under a unified MapReduce framework. Compared with previous models that do not consider the development pattern of the data, the invention adds only one preprocessing stage over the data set and enriches the information of the data points without extra resources. In terms of execution speed, the time complexity added by this preprocessing is the N/M required for scanning the data, where N is the number of data points and M is the number of Mappers of MapReduce. In terms of effect, the information of the original data points X is enriched, and the prediction accuracy is ultimately improved.
In addition, using X+, Y and X_new to predict y_new turns what used to be the prediction of a single value into a prediction over data points that carry structural information (that is, the sequence in the embodiment above), with a more suitable similarity measure (such as the KL distance used in the embodiment above). The prediction effect on the data is therefore greatly improved, and the reason for the predicted value, together with the corresponding data, can also be given.
According to one embodiment of the invention, a supervised machine learning method is also provided:
1) perform feature extraction and dimension reduction on the training data to form X (x1, x2, ...) with labels Y;
2) extend X with the feature extension method introduced in the embodiments above;
3) select the model form for predicting y from x, and determine the model parameter types and the number of parameters;
4) on the training set, train the parameters that give the best effect with various learning methods;
5) apply the trained model and parameters to the concrete problem, finally achieving the purpose of machine learning and prediction, for example giving the regression prediction equation and the prediction result, or giving the classifier and obtaining the classification result.
In practical applications, the features of the training data are often insufficient but have a great influence on the final result; using the feature extension method of the invention can overcome this problem.
As another example, the method of the invention can also be used for weather forecasting.
Weather forecasting can use rule-based methods or statistical machine learning methods. The main steps of the latter are, for example: 1) collect weather data, with temperature, humidity, longitude, latitude, date, season, etc. as X and whether it rains as Y; 2) select a classification model, such as logistic regression; the number of parameters is determined accordingly and equals the dimension of X; 3) use the known weather data of every past day to train the best parameters; 4) input the data of tomorrow, the day after tomorrow, next week, etc. into the trained model and parameters to obtain the rain/no-rain result.
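For illustration only, a minimal sketch of steps 2) to 4) with an off-the-shelf logistic regression is shown below; scikit-learn is assumed to be available, and the toy numbers and the three-column encoding of X are invented placeholders, not real weather data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented toy data: columns = [temperature, humidity, month]; label = rained (1) or not (0).
X = np.array([[25, 80, 6], [30, 40, 7], [18, 90, 4], [28, 70, 8], [15, 95, 3]])
y = np.array([1, 0, 1, 0, 1])

model = LogisticRegression()           # one coefficient per dimension of X, plus an intercept
model.fit(X, y)                        # step 3): train on the known past days

X_tomorrow = np.array([[22, 85, 6]])   # step 4): feed tomorrow's data
print(model.predict(X_tomorrow), model.predict_proba(X_tomorrow))
```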
In practical applications, the features in 1) are often non-numeric, such as the season or the date; adopting the feature extension method of the invention solves this kind of problem satisfactorily and finally achieves a better prediction effect.
As another example, the method of the invention can also be used for disease prediction: 1) obtain training data (disease prediction data) from existing cases, with X: age, sex, working environment, medical history, weight, heartbeat, platelet count, red blood cell count, white blood cell count, CT, symptoms, etc., and Y: whether a certain disease is present; 2) select the classification model and method; 3) train the parameters with machine learning; 4) input the information of the person being examined and compute the probability of illness with the model and parameters.
In practice, most of the case data are non-numeric features, and although white and red blood cell counts are numbers, larger is not simply better or worse. Using the method of the invention can therefore achieve a better prediction result and provide a better reference for the doctor's diagnosis.
As another example, the method of the invention can also be used to predict users' purchasing behaviour.
Predicting users' purchasing behaviour is very important for product distribution and advertising; finding users with a strong desire to purchase can effectively reduce the investment in distribution and advertising. The implementation steps are, for example: 1) collect user data, using the information forms filled in when users buy the product as training data, for example including age, time spent online, sex, family income, address, whether a product with similar functions has been bought, purchase channel, occupation, etc.; 2) select the model, the computation method and the required parameters; 3) use machine learning methods to learn the best parameters; 4) treat each user who has not yet bought as the current user, feed the corresponding information into the previous model and parameters, and predict for each user the probability of buying the product.
In practice, the features in 1) are mostly heterogeneous and non-numeric. Using the method of the invention can therefore better find the users similar to the current user, achieve a better prediction effect and target the advertising investment, finally saving product advertising costs.
As another example, the invention can also be used for music recommendation: the number of times a user listens to a song and the user's rating are used as Y, and the various other user features and melody features as X, to judge the probability that a user likes some new song, so that music the user prefers can be recommended. The invention can likewise be used here to enrich and quantify the features. Similar applications include friend recommendation in social networks and book recommendation.
Of course, the invention can also be used in other fields such as game outcome prediction, information retrieval, spam classification and news importance prediction.
Although the invention has been described through preferred embodiments, the invention is not limited to the embodiments described here, and also covers various changes and variations made without departing from the scope of the invention.

Claims (9)

1. A feature extension method for regression prediction, said method comprising:
selecting, among the original data points, the neighbours of the data point to be predicted, said neighbours being a series of original data points whose values on one or several dimensions are equal or similar to those of the data point to be predicted;
using these neighbours and their corresponding dependent variable values to extend the dimensions of the original data points and of the data point to be predicted.
2. A feature extension method based on MapReduce, said method comprising:
step 1) selecting, among the original data points, the neighbours of the data point to be predicted, said neighbours being a series of original data points whose values on one or several dimensions are equal or similar to those of the data point to be predicted;
step 2) splitting each original data point into D2-D1+1 parts, where D2 is the dimension of the original data point after extension and D1 is its dimension before extension, each part being a (key, value) pair in which the key is the identifier of the data point that should receive this part and the value contains the index of the dimension to be extended in the data point receiving this part and the dependent variable value of the original data point sending this part;
step 3) each original data point extracting, from the received data, the dimension indices and dependent variable values contained in the values and extending its own dimensions accordingly.
3. A regression prediction method, said method comprising:
step a) extending the dimensions of each original data point X with the method according to claim 1 or claim 2 to obtain extended data points;
step b) performing regression prediction for the data point to be predicted based on the extended data points.
4. A regression prediction method based on MapReduce, the method comprising:
step 41) extending the dimensions of each original data point X with the method according to claim 2 to obtain extended data points;
step 42) based on the extended data points, computing the similarity to the data point to be predicted and distributing (key, value) pairs, in which the key is the identifier of the data point to be predicted and the value is the identifier of an extended data point together with its similarity to the data point to be predicted;
step 43) based on the computed similarities, selecting the K extended data points most similar to the data point to be predicted, and performing regression prediction for the data point to be predicted with locally weighted linear regression.
5. The regression prediction method according to claim 4, wherein said step 42) uses the KL distance, the cosine distance or the Euclidean distance to compute the similarity for different extended dimensions.
6. A regression prediction device based on MapReduce, said device comprising:
a device for extending the dimensions of each original data point X with the method according to claim 2 to obtain extended data points;
a device for computing, based on the extended data points, the similarity to the data point to be predicted and distributing (key, value) pairs, in which the key is the identifier of the data point to be predicted and the value is the identifier of an extended data point together with its similarity to the data point to be predicted;
a device for selecting, based on the computed similarities, the K extended data points most similar to the data point to be predicted, and performing regression prediction for the data point to be predicted with locally weighted linear regression.
7. A supervised machine learning method, said method comprising:
1) performing feature extraction and dimension reduction on the training data to form data points X (x1, x2, ...) with labels y;
2) extending the data points X with the feature extension method according to claim 1 or claim 2;
3) selecting the model form for predicting y from the extended data points, determining the model parameter types and the number of parameters, and training on the training set;
4) applying the trained model and parameters to regression prediction or classification, finally obtaining the regression prediction result or the classification result.
8. The machine learning method according to claim 7, wherein the model form in step 3) is a regression prediction model, and said step 4) uses the regression prediction method according to one of claims 3, 4 and 5 to predict and obtain the prediction result.
9. The machine learning method according to claim 7 or 8, said method being used for weather forecasting, disease prediction, prediction of users' purchasing behaviour, music recommendation, friend recommendation in social networks, book recommendation, game outcome prediction, information retrieval, spam classification, news importance prediction, and the like.
CN2011103392241A 2011-11-01 2011-11-01 Regression prediction method and device Pending CN102385719A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011103392241A CN102385719A (en) 2011-11-01 2011-11-01 Regression prediction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2011103392241A CN102385719A (en) 2011-11-01 2011-11-01 Regression prediction method and device

Publications (1)

Publication Number Publication Date
CN102385719A true CN102385719A (en) 2012-03-21

Family

ID=45825116

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011103392241A Pending CN102385719A (en) 2011-11-01 2011-11-01 Regression prediction method and device

Country Status (1)

Country Link
CN (1) CN102385719A (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014108768A1 (en) * 2013-01-11 2014-07-17 International Business Machines Corporation Computing regression models
CN104794537A (en) * 2015-04-17 2015-07-22 中国农业科学院柑桔研究所 Method for building prediction models for unaspis yanonensis kuwana emergence periods of mandarins
US9385934B2 (en) 2014-04-08 2016-07-05 International Business Machines Corporation Dynamic network monitoring
CN106294490A (en) * 2015-06-08 2017-01-04 富士通株式会社 The feature Enhancement Method of data sample and device and classifier training method and apparatus
JP2017102710A (en) * 2015-12-02 2017-06-08 日本電信電話株式会社 Data analysis device, data analysis method, and data analysis processing program
CN106940731A (en) * 2017-03-30 2017-07-11 福建师范大学 A kind of data based on non-temporal Attribute Association generation method true to nature
CN107998661A (en) * 2017-12-26 2018-05-08 苏州大学 A kind of aid decision-making method, device and the storage medium of online battle game
CN108052953A (en) * 2017-10-31 2018-05-18 华北电力大学(保定) The relevant sample extended method of feature based
CN108074628A (en) * 2016-11-15 2018-05-25 中国移动通信有限公司研究院 A kind of further consultation patient Forecasting Methodology and device
US10043194B2 (en) 2014-04-04 2018-08-07 International Business Machines Corporation Network demand forecasting
CN108932648A (en) * 2017-07-24 2018-12-04 上海宏原信息科技有限公司 A kind of method and apparatus for predicting its model of item property data and training
WO2019056502A1 (en) * 2017-09-25 2019-03-28 平安科技(深圳)有限公司 Variety game result prediction method and apparatus, and storage medium
CN109684302A (en) * 2018-12-04 2019-04-26 平安科技(深圳)有限公司 Data predication method, device, equipment and computer readable storage medium
US10361924B2 (en) 2014-04-04 2019-07-23 International Business Machines Corporation Forecasting computer resources demand
CN110110209A (en) * 2018-01-22 2019-08-09 青岛科技大学 A kind of intersection recommended method and system based on local weighted linear regression model (LRM)
US10439891B2 (en) 2014-04-08 2019-10-08 International Business Machines Corporation Hyperparameter and network topology selection in network demand forecasting
CN111382890A (en) * 2018-12-27 2020-07-07 珠海格力电器股份有限公司 Household appliance installation quantity prediction method, system and storage medium
US10713574B2 (en) 2014-04-10 2020-07-14 International Business Machines Corporation Cognitive distributed network
CN116777508A (en) * 2023-06-25 2023-09-19 急尼优医药科技(上海)有限公司 Medical supply analysis management system and method based on big data

Cited By (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104937544A (en) * 2013-01-11 2015-09-23 国际商业机器公司 Computing regression models
US9152921B2 (en) 2013-01-11 2015-10-06 International Business Machines Corporation Computing regression models
US9159028B2 (en) 2013-01-11 2015-10-13 International Business Machines Corporation Computing regression models
WO2014108768A1 (en) * 2013-01-11 2014-07-17 International Business Machines Corporation Computing regression models
CN104937544B (en) * 2013-01-11 2017-06-13 国际商业机器公司 Method, computer-readable medium and computer system for calculating task result
US10043194B2 (en) 2014-04-04 2018-08-07 International Business Machines Corporation Network demand forecasting
US11082301B2 (en) 2014-04-04 2021-08-03 International Business Machines Corporation Forecasting computer resources demand
US10650396B2 (en) 2014-04-04 2020-05-12 International Business Machines Corporation Network demand forecasting
US10361924B2 (en) 2014-04-04 2019-07-23 International Business Machines Corporation Forecasting computer resources demand
US10250481B2 (en) 2014-04-08 2019-04-02 International Business Machines Corporation Dynamic network monitoring
US10257071B2 (en) 2014-04-08 2019-04-09 International Business Machines Corporation Dynamic network monitoring
US9722907B2 (en) 2014-04-08 2017-08-01 International Business Machines Corporation Dynamic network monitoring
US11848826B2 (en) 2014-04-08 2023-12-19 Kyndryl, Inc. Hyperparameter and network topology selection in network demand forecasting
US9385934B2 (en) 2014-04-08 2016-07-05 International Business Machines Corporation Dynamic network monitoring
US10439891B2 (en) 2014-04-08 2019-10-08 International Business Machines Corporation Hyperparameter and network topology selection in network demand forecasting
US9705779B2 (en) 2014-04-08 2017-07-11 International Business Machines Corporation Dynamic network monitoring
US10693759B2 (en) 2014-04-08 2020-06-23 International Business Machines Corporation Dynamic network monitoring
US10771371B2 (en) 2014-04-08 2020-09-08 International Business Machines Corporation Dynamic network monitoring
US10713574B2 (en) 2014-04-10 2020-07-14 International Business Machines Corporation Cognitive distributed network
CN104794537A (en) * 2015-04-17 2015-07-22 中国农业科学院柑桔研究所 Method for building prediction models for unaspis yanonensis kuwana emergence periods of mandarins
CN106294490A (en) * 2015-06-08 2017-01-04 富士通株式会社 The feature Enhancement Method of data sample and device and classifier training method and apparatus
CN106294490B (en) * 2015-06-08 2019-12-24 富士通株式会社 Feature enhancement method and device for data sample and classifier training method and device
JP2017102710A (en) * 2015-12-02 2017-06-08 日本電信電話株式会社 Data analysis device, data analysis method, and data analysis processing program
CN108074628A (en) * 2016-11-15 2018-05-25 中国移动通信有限公司研究院 A kind of further consultation patient Forecasting Methodology and device
CN106940731A (en) * 2017-03-30 2017-07-11 福建师范大学 A kind of data based on non-temporal Attribute Association generation method true to nature
CN108932648A (en) * 2017-07-24 2018-12-04 上海宏原信息科技有限公司 A kind of method and apparatus for predicting its model of item property data and training
WO2019056502A1 (en) * 2017-09-25 2019-03-28 平安科技(深圳)有限公司 Variety game result prediction method and apparatus, and storage medium
CN108052953A (en) * 2017-10-31 2018-05-18 华北电力大学(保定) The relevant sample extended method of feature based
CN107998661A (en) * 2017-12-26 2018-05-08 苏州大学 A kind of aid decision-making method, device and the storage medium of online battle game
CN110110209A (en) * 2018-01-22 2019-08-09 青岛科技大学 A kind of intersection recommended method and system based on local weighted linear regression model (LRM)
CN109684302A (en) * 2018-12-04 2019-04-26 平安科技(深圳)有限公司 Data predication method, device, equipment and computer readable storage medium
CN109684302B (en) * 2018-12-04 2023-08-15 平安科技(深圳)有限公司 Data prediction method, device, equipment and computer readable storage medium
CN111382890A (en) * 2018-12-27 2020-07-07 珠海格力电器股份有限公司 Household appliance installation quantity prediction method, system and storage medium
CN111382890B (en) * 2018-12-27 2022-04-12 珠海格力电器股份有限公司 Household appliance installation quantity prediction method, system and storage medium
CN116777508A (en) * 2023-06-25 2023-09-19 急尼优医药科技(上海)有限公司 Medical supply analysis management system and method based on big data
CN116777508B (en) * 2023-06-25 2024-03-12 急尼优医药科技(上海)有限公司 Medical supply analysis management system and method based on big data

Similar Documents

Publication Publication Date Title
CN102385719A (en) Regression prediction method and device
Velt et al. Entrepreneurial ecosystem research: Bibliometric mapping of the domain
Ismail et al. A hybrid model of self-organizing maps (SOM) and least square support vector machine (LSSVM) for time-series forecasting
Hong et al. A job recommender system based on user clustering.
Shilong Machine learning model for sales forecasting by using XGBoost
CN107862173A (en) A kind of lead compound virtual screening method and device
Shang et al. Moving from mass customization to social manufacturing: A footwear industry case study
CN105893609A (en) Mobile APP recommendation method based on weighted mixing
CN104750780B (en) A kind of Hadoop configuration parameter optimization methods based on statistical analysis
CN106407349A (en) Product recommendation method and device
Ramli et al. Real-time fuzzy regression analysis: A convex hull approach
Chen et al. Development and application of big data platform for garlic industry chain
CN105046323B (en) Regularization-based RBF network multi-label classification method
Broekel Measuring technological complexity-Current approaches and a new measure of structural complexity
Ozcan et al. Human resources mining for examination of R&D progress and requirements
Ming-Te et al. Using data mining technique to perform the performance assessment of lean service
CN114647465A (en) Single program splitting method and system for multi-channel attention-chart neural network clustering
Petrozziello et al. Distributed neural networks for missing big data imputation
Satinet et al. A supervised machine learning classification framework for clothing products’ sustainability
Zhang et al. Common community structure in time-varying networks
Xu et al. E-Commerce Online Shopping Platform Recommendation Model Based on Integrated Personalized Recommendation
Canetta* et al. Applying two-stage SOM-based clustering approaches to industrial data analysis
Jiang Prediction and management of regional economic scale based on machine learning model
US7272583B2 (en) Using supervised classifiers with unsupervised data
Shorfuzzaman Leveraging cloud based big data analytics in knowledge management for enhanced decision making in organizations

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20120321