CN106296305A - Real-time recommendation system and method for e-commerce websites in a big data environment - Google Patents
Real-time recommendation system and method for e-commerce websites in a big data environment
- Publication number
- CN106296305A CN201610710881.5A
- Authority
- CN
- China
- Prior art keywords
- user
- matrix
- real
- implicit
- formula
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Strategic Management (AREA)
- Development Economics (AREA)
- Databases & Information Systems (AREA)
- Finance (AREA)
- Theoretical Computer Science (AREA)
- Accounting & Taxation (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Entrepreneurship & Innovation (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Game Theory and Decision Science (AREA)
- Economics (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention discloses a real-time recommendation method for e-commerce websites in a big data environment. The method comprises: training an offline recommendation model with implicit user behavior log information from the e-commerce website; collecting implicit user behavior log information online and rapidly processing the collected mass of log information with distributed storage and distributed stream processing technology; and merging the trained offline recommendation model with the latest implicit user behavior information processed by the distributed stream to provide the user with an up-to-date product recommendation list. The invention can analyze current user behavior in a high-traffic environment and give real-time recommendation feedback, improving user satisfaction with recommendations and the transaction conversion rate of the e-commerce website.
Description
Technical field
The present invention relates to the field of network technology, and specifically to a real-time recommendation system and method for e-commerce websites in a big data environment.
Background art
Taobao, the largest e-commerce platform in China, receives 60 million visiting users per day, and the number of products online each day already exceeds 800 million. Faced with this rapidly growing data scale, users run into the "information overload" problem: without auxiliary technologies such as search engines, recommender systems, or information classification, finding information of interest among massive network resources is extremely difficult, so the effective utilization of information actually declines. Search engines and personalized recommender systems are two means of addressing information overload. Search engines such as Google, Baidu, and Bing return query results according to the keywords the user enters. Because a search engine returns results according to the behavior patterns of all users, it cannot provide a personalized service for each user, and content the user may be interested in can be buried under a mass of search results. Personalized recommendation makes up for this deficiency: it evaluates, on the user's behalf, all the products the user has not yet seen and, by analyzing the user's preferences and historical behavior, actively recommends items that match those preferences.
In the big data era, recommender systems face massive training scales, and a recommender system running in a conventional single-machine environment cannot meet the demand. Recommender systems that use a distributed computing platform as the model training platform have therefore emerged. Early distributed recommender systems were built on Hadoop, which supports larger training scales at lower cost. However, because Hadoop's MapReduce framework must write intermediate results to disk for the next stage to read, it is inefficient for recommendation tasks that need many iterations. With the arrival of Spark, a newer MapReduce-like computing framework that computes in memory, processing speed for such iterative recommendation tasks is clearly better than with MapReduce.
Since the Web 2.0 era began, demand for real-time recommendation has kept growing. Conventional Hadoop-based recommender systems analyze data periodically, update the model, and then use the new model for personalized recommendation. Training efficiency is low and, because there is no sound mechanism for responding to the current user's behavior, recommendation satisfaction and transaction conversion rates are low. Building a system on new distributed stream parallel processing technology that can analyze user behavior in real time and give real-time recommendation feedback is therefore of considerable research significance.
Summary of the invention
The present invention provides a real-time recommendation system and method for e-commerce websites in a big data environment that can analyze current user behavior in a data stream environment and give real-time recommendation feedback, improving user satisfaction with recommendations and the transaction conversion rate of the e-commerce website.
To achieve the above object, the present invention provides a real-time recommendation method for e-commerce websites in a big data environment, characterized in that the method comprises:
training an offline recommendation model with implicit user behavior log information from the e-commerce website;
collecting implicit user behavior log information online, and rapidly processing the collected mass of implicit user behavior log information with distributed storage technology and distributed stream processing technology;
merging the trained offline recommendation model with the latest implicit user behavior information processed by the distributed stream, to provide the user with an up-to-date product recommendation list.
Training the offline recommendation model with implicit user behavior log information from the e-commerce website comprises:
S101, collecting implicit user behavior log information from the e-commerce website;
S102, establishing the weight of each user's rating for each product and constructing the two-dimensional user-product rating matrix, obtaining the implicit user behavior feedback matrix;
S103, decomposing the two-dimensional user-product rating matrix of the implicit user behavior feedback matrix into several submatrices;
S104, changing the processing of the optimal submatrices from serial to parallel, and training the offline recommendation model.
Collecting implicit user behavior information online, and rapidly processing the collected mass of information with distributed storage technology and distributed stream processing technology, comprises:
S105, collecting the implicit user behavior log information in the online real-time data stream;
S106, filtering and processing the online implicit user behavior log information to obtain the corresponding product ID and user ID.
Merging the trained offline recommendation model with the latest implicit user behavior information processed by the distributed stream, to provide the user with an up-to-date product recommendation list, comprises:
S107, generating the user's real-time product recommendation list online from the product ID and user ID, and evaluating the real-time recommendation results against implicit user behavior;
S108, judging whether the prediction accuracy is greater than or equal to threshold Q; if so, jumping to S109; if not, jumping to S104;
S109, generating the real-time recommendation list and feeding it back to the user.
In S103, the decomposition of the implicit user behavior feedback matrix comprises:
Let the two-dimensional user-product rating matrix be $R$, where each element $r_{ui}$ of $R$ represents the rating of user $u$ for item $i$. By the principle of singular value decomposition, $R$ can be decomposed into a product of matrices, as in formula (1):
$$R = U^{T}M \tag{1}$$
In formula (1), the user matrix $U^{T}$ represents the latent semantic attribute parameters of the users, and the item matrix $M$ represents the latent semantic attribute parameters of the items.
The method of solving for the user matrix $U^{T}$ and the item matrix $M$ comprises:
constructing a loss function $f(U, M)$, as in formula (2);
In formula (2), $I$ represents the set of user-item ratings, $r_{ij}$ is an element of matrix $R$, $u_i$ is an element of the required matrix $U^{T}$, $I_i$ represents the set of all items of user $i$, $m_j$ is an element of the required matrix $M$, and $I_j$ represents the set of all users of item $j$;
using alternating least squares to minimize the loss function $f(U, M)$, and obtaining the optimal user matrix $U$ and the optimal item matrix $M$ through repeated iteration.
Minimizing the loss function $f(U, M)$ with alternating least squares comprises:
a) Holding $M$ fixed, compute and update each feature $u_{ki}$ of each user, where $k$ denotes any one feature of the user. Take the partial derivative of formula (2) with respect to $u_{ki}$, set it to 0, and solve:
In formula (7), $E$ is the $n_f \times n_f$ identity matrix, $M_{I_i}$ denotes the submatrix of $M$ formed by selecting the columns $j \in I_i$, and $R(i, I_i)$ denotes the row vector formed from row $i$ of $R$ by selecting the columns $j \in I_i$;
b) Holding $U$ fixed, compute and update each feature $m_{kj}$ of each item, where $k$ denotes any one feature of the item. Take the partial derivative with respect to $m_{kj}$, set it to 0, and apply the method of formula (3) to $m_j$ to obtain formula (8):
In formula (8), $U_{I_j}$ denotes the submatrix of $U$ formed by selecting the columns $i \in I_j$, and $R(I_j, j)$ denotes the vector formed from column $j$ of $R$ by selecting the rows $i \in I_j$;
c) Obtain the user matrix $U$ and the item matrix $M$ from formulas (7) and (8), compute the root-mean-square error (RMSE) of the matrix model, and through repeated iteration obtain the optimal user matrix $U$ and the optimal item matrix $M$, that is, the optimal model of $U$ and $M$.
The root-mean-square error is defined as in formula (9):
In formula (9), $u$ and $i$ are a user and an item in the test data set, $\hat{r}_{ui}$ is the predicted rating obtained by the recommender system, and $r_{ui}$ is the true rating of user $u$ for item $i$.
In S108, the prediction accuracy is judged using the root-mean-square error, defined as in formula (10):
where $k$ is a proportionality coefficient.
A real-time recommendation system for e-commerce websites in a big data environment, characterized in that the system comprises:
an e-commerce website implicit user behavior collection module, which collects implicit user behavior log information from the e-commerce website;
an implicit user behavior feedback matrix construction module, which, according to implicit user behavior, establishes the weight of each user's rating for each product and constructs the two-dimensional user-product rating matrix, obtaining the implicit user behavior feedback matrix;
an implicit user behavior feedback matrix decomposition module, which decomposes the two-dimensional user-product rating matrix of the implicit user behavior feedback matrix into several submatrices;
a parallelization module, which changes the processing of the optimal matrix model from serial to parallel and trains the offline recommendation model;
an online implicit user behavior collection module, which collects the implicit user behavior log information in the online real-time data stream;
an online implicit user behavior processing module, which filters and processes the online implicit user behavior log information to obtain the corresponding product ID and user ID;
a real-time recommendation result evaluation module, which generates the user's real-time product recommendation list online from the product ID and user ID, and evaluates the real-time recommendation results against implicit user behavior;
a prediction accuracy judgment module, which judges whether the prediction accuracy is greater than or equal to threshold Q;
a real-time recommendation list generation module, which generates the real-time recommendation list according to the prediction accuracy judgment and feeds it back to the user.
Compared with the prior art, the real-time recommendation system and method of the present invention have the following advantages. The invention designs and builds distributed log collection and distributed log transmission modules based on LogStash and Kafka. Software embedded in the application gateway captures the access logs produced by all external users calling the system interfaces and delivers them to the Kafka cluster for unified transmission. This solves the problem of collecting and transmitting logs across systems and, when the system cluster is large, reduces the labor cost of collecting and transmitting the various business logs;
The invention uses Spark Streaming real-time stream processing to uniformly process the user behavior logs transmitted by the Kafka cluster: it filters the data in real time, classifies the various types of user behavior, and writes them to Hive, unifying the data source of the recommender system, which suits big data application scenarios well.
The invention provides a real-time recommendation system that uses Spark SQL to read the user behavior logs uniformly from Hive and applies penalty and normalization processing, then trains on the data source with a Spark-parallelized matrix factorization model and writes the results to the Redis cache system to optimize website performance. At the same time, it uses Spark Streaming real-time stream technology to process real-time user access logs for real-time recommendation and updates Redis, merging the real-time and offline results. This has a positive effect on the user experience of the e-commerce website and on its ability to increase transaction volume.
Brief description of the drawings
Fig. 1 is a flow chart of the real-time recommendation method for e-commerce websites in a big data environment according to the present invention;
Fig. 2 is a schematic diagram of the recommendation engine;
Fig. 3 is a flow chart of the Spark-parallelized offline recommendation model computation;
Fig. 4 is an architecture diagram of the Spark-based real-time recommendation system.
Detailed description of the invention
Specific embodiments of the present invention are further described below with reference to the accompanying drawings.
The present invention provides a real-time recommendation method for e-commerce websites in a big data environment, comprising the following steps:
Step 1, train the offline recommendation model with implicit user behavior log information from the e-commerce website.
Offline recommendation model training comprises: building the implicit user behavior matrix from the large volume of implicit user behavior information collected from the e-commerce website, and training the optimal offline recommendation model by decomposing that matrix.
Step 2, collect implicit user behavior log information online, and rapidly process the collected mass of implicit user behavior log information with distributed storage technology and distributed stream processing technology.
Collecting and processing online implicit user behavior log information means collecting the logs online at the front-end gateway of the e-commerce website and filtering them, providing the data source for online real-time recommendation.
Step 3, merge the trained offline recommendation model with the latest implicit user behavior information processed by the distributed stream to provide the user with an up-to-date product recommendation list, achieving real-time online recommendation.
As shown in Fig. 1, an embodiment of the real-time recommendation method for e-commerce websites in a big data environment comprises the following steps:
S101, collect implicit user behavior log information from the e-commerce website.
An e-commerce website has a massive number of users, both registered and unregistered. An e-commerce website in a big data environment: (1) has users numbering in the hundreds of thousands or more, both registered and unregistered; (2) offers more than ten thousand kinds of products for users to browse or buy online, which may be existing products or newly added ones.
The number of users of an e-commerce website is closely related to the recommendation quality of its recommender system and directly affects the website's economic and social benefits.
Implicit user behavior includes: browsing a product, adding a product to the shopping cart, removing a product from the shopping cart, paying for an order, cancelling an order, and adding a product to favorites. Assigning weights to these behaviors by category yields each user's rating for each product on the e-commerce website.
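The weighting step can be sketched as follows. This is a minimal illustration only: the behavior names, the numeric weights in `BEHAVIOR_WEIGHTS`, and the function `build_scores` are hypothetical, since the text lists the behaviors but does not state any weight values.

```python
from collections import defaultdict

# Hypothetical weights per implicit behavior; the source names the behaviors
# but gives no numeric weights, so these values are illustrative only.
BEHAVIOR_WEIGHTS = {
    "browse": 1.0,
    "add_to_cart": 3.0,
    "remove_from_cart": -2.0,
    "pay_order": 5.0,
    "cancel_order": -3.0,
    "favorite": 2.0,
}


def build_scores(events):
    """Aggregate (user_id, product_id, behavior) events into implicit scores."""
    scores = defaultdict(float)
    for user_id, product_id, behavior in events:
        weight = BEHAVIOR_WEIGHTS.get(behavior)
        if weight is None:
            continue  # skip behaviors that have no assigned weight
        scores[(user_id, product_id)] += weight
    return dict(scores)


events = [
    ("u1", "p1", "browse"),
    ("u1", "p1", "add_to_cart"),
    ("u1", "p2", "browse"),
    ("u2", "p1", "pay_order"),
    ("u2", "p2", "unknown"),
]
scores = build_scores(events)
```

Each `(user, product)` key of `scores` corresponds to one non-empty cell of the user-product rating matrix; pairs with no recorded behavior stay absent, which is what makes the matrix sparse.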
S102, establish the weight of each user's rating for each product and construct the two-dimensional user-product rating matrix, obtaining the implicit user behavior feedback matrix.
Because each user's interests and purchasing power are limited, the products a user acts on make up a very small percentage of all products, so the user-product rating weight matrix is very sparse. Moreover, as the e-commerce website grows, the implicit user behavior feedback matrix keeps getting larger.
S103, decompose the two-dimensional user-product rating matrix of the implicit user behavior feedback matrix into several submatrices, reducing the complexity of computing offline recommendations.
In S103, the decomposition of the implicit user behavior feedback matrix comprises:
Let the two-dimensional user-product rating matrix be $R$, where each element $r_{ui}$ of $R$ represents the rating of user $u$ for item $i$. By the principle of singular value decomposition, $R$ can be decomposed into a product of matrices, as in formula (1):
$$R = U^{T}M \tag{1}$$
In formula (1), the user matrix $U^{T}$ represents the latent semantic attribute parameters of the users (hidden user characteristic parameters), and the item matrix $M$ represents the latent semantic attribute parameters of the items (hidden item characteristic parameters).
In formula (1), the method of solving for the user matrix $U^{T}$ and the item matrix $M$ comprises:
1) Construct a loss function $f(U, M)$, as in formula (2);
In formula (2), $I$ represents the set of user-item ratings, $r_{ij}$ is an element of matrix $R$, $u_i$ is an element (row) of the required matrix $U^{T}$, $I_i$ represents the set of all items of user $i$, $m_j$ is an element (column) of the required matrix $M$, and $I_j$ represents the set of all users of item $j$. The term to the right of the plus sign in the loss function prevents overfitting during the solution process.
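The expression for formula (2) does not appear in this text. A regularized squared-error loss that is consistent with the symbols defined above, given here as an assumption and not as the patent's exact expression, would be:

$$f(U, M) = \sum_{(i,j) \in I} \left(r_{ij} - u_i^{T} m_j\right)^2 + \lambda \left( \sum_i \lVert u_i \rVert^2 + \sum_j \lVert m_j \rVert^2 \right)$$

Here $\lambda$ is an assumed symbol for the regularization weight. The first sum is the fitting error over observed ratings and the second term, to the right of the plus sign, penalizes large factors to limit overfitting. Some ALS variants additionally scale each norm by the number of ratings of that user or item.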
2) Use alternating least squares (ALS) to minimize the loss function $f(U, M)$.
That is, input: the user rating matrix $R$; output: the matrices $U$ and $M$. Initially, a matrix $U$ representing the users and a matrix $M$ representing the items are generated; repeated iteration then yields the optimal user matrix $U$ and the optimal item matrix $M$.
The least squares method underlying ALS is a mathematical optimization technique. It finds the best functional fit to the data by minimizing the sum of squared errors. With least squares, unknown values can be estimated easily such that the sum of squared errors between the estimates and the actual data is minimal.
Specifically, minimizing the loss function $f(U, M)$ with alternating least squares comprises:
a) Holding $M$ fixed, compute and update each feature $u_{ki}$ of each user, where $k$ denotes any one feature of the user. Take the partial derivative of formula (2) with respect to $u_{ki}$, set it to 0, and solve:
In formula (7), $E$ is the $n_f \times n_f$ identity matrix, $M_{I_i}$ denotes the submatrix of $M$ formed by selecting the columns $j \in I_i$, and $R(i, I_i)$ denotes the row vector formed from row $i$ of $R$ by selecting the columns $j \in I_i$.
b) Holding $U$ fixed, compute and update each feature $m_{kj}$ of each item, where $k$ denotes any one feature of the item. Take the partial derivative with respect to $m_{kj}$, set it to 0, and apply the method of formula (3) to $m_j$ to obtain formula (8):
In formula (8), $U_{I_j}$ denotes the submatrix of $U$ formed by selecting the columns $i \in I_j$, and $R(I_j, j)$ denotes the vector formed from column $j$ of $R$ by selecting the rows $i \in I_j$.
c) Obtain the user matrix $U$ and the item matrix $M$ from formulas (7) and (8), compute the root-mean-square error (RMSE) of the matrix model, and through repeated iteration obtain the optimal user matrix $U$ and the optimal item matrix $M$, that is, the optimal model of $U$ and $M$. Because the matrix factorization model based on implicit feedback greatly reduces the dimensionality of the user-product rating matrix, computational efficiency improves.
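Steps a) to c) can be sketched in NumPy as below. This is a serial, single-machine illustration under stated assumptions, not the patent's implementation: formulas (7) and (8) are not reproduced in this text, so the updates use the standard regularized normal equations $u_i = (M_{I_i} M_{I_i}^{T} + \lambda E)^{-1} M_{I_i} R(i, I_i)^{T}$ and the symmetric form for $m_j$. The names `als`, `rmse`, `lam`, `n_f`, and the toy matrix are hypothetical, and zero entries are treated as unobserved.

```python
import numpy as np


def als(R, mask, n_f=2, lam=0.1, n_iters=30, seed=0):
    """Factor R (users x items) as U.T @ M using only entries where mask is True."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_f, n_users))  # column i is u_i
    M = rng.normal(scale=0.1, size=(n_f, n_items))  # column j is m_j
    E = np.eye(n_f)
    for _ in range(n_iters):
        # Step a): hold M fixed and solve for each user vector u_i.
        for i in range(n_users):
            I_i = np.flatnonzero(mask[i])        # items rated by user i
            if I_i.size == 0:
                continue
            M_i = M[:, I_i]                      # submatrix M_{I_i}
            U[:, i] = np.linalg.solve(M_i @ M_i.T + lam * E, M_i @ R[i, I_i])
        # Step b): hold U fixed and solve for each item vector m_j.
        for j in range(n_items):
            I_j = np.flatnonzero(mask[:, j])     # users who rated item j
            if I_j.size == 0:
                continue
            U_j = U[:, I_j]                      # submatrix U_{I_j}
            M[:, j] = np.linalg.solve(U_j @ U_j.T + lam * E, U_j @ R[I_j, j])
    return U, M


def rmse(R, mask, U, M):
    """Root-mean-square error of U.T @ M over the observed entries."""
    diff = (R - U.T @ M)[mask]
    return float(np.sqrt(np.mean(diff ** 2)))


# Toy rating matrix (hypothetical data); zeros mark unobserved cells.
R = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 1.0],
              [1.0, 1.0, 5.0]])
mask = R > 0
U, M = als(R, mask)
train_rmse = rmse(R, mask, U, M)
```

Each user update depends only on $M$ and that user's own ratings, and each item update only on $U$ and that item's ratings, which is why the two inner loops are natural candidates for parallel execution.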
Here, the root-mean-square error is defined as in formula (9):
In formula (9), $u$ and $i$ are a user and an item in the test data set, $\hat{r}_{ui}$ is the predicted rating obtained by the recommender system, and $r_{ui}$ is the true rating of user $u$ for item $i$.
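The expression for formula (9) does not appear in this text. The usual definition of root-mean-square error, written with the symbols above and an assumed symbol $T$ for the set of (user, item) pairs in the test data, is:

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{(u,i) \in T} \left(r_{ui} - \hat{r}_{ui}\right)^2}{|T|}}$$

A smaller RMSE means the predicted ratings are closer to the true ratings on the test set.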
S104, change data processing from serial to parallel and train the offline recommendation model.
Parallelized offline recommendation model training comprises: changing the processing of the optimal user matrix $U^{T}$ and the optimal item matrix $M$ from serial to parallel, which speeds up their training and improves data processing efficiency in a big data environment.
S105, collect the implicit user behavior log information in the online real-time data stream.
Here, the online real-time data stream refers to the various data streams that arrive at or leave the front-end servers while the e-commerce website is operating normally.
The implicit user behavior log records the various implicit behaviors of users toward items on the e-commerce website.
S106, filter and process the online implicit user behavior log information: remove the various kinds of noise from the implicit user behavior logs collected online in real time, and extract the corresponding product ID and user ID from the logs for use by the real-time recommendation system.
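The filtering in S106 might look like the sketch below. The log format is an assumption: the text does not specify one, so this example supposes one JSON object per line with hypothetical field names `user_id`, `product_id`, and `action`, and a hypothetical set of kept actions.

```python
import json

# Hypothetical set of behaviors to keep; other event types are treated as noise.
KEPT_ACTIONS = {"browse", "add_to_cart", "remove_from_cart",
                "pay_order", "cancel_order", "favorite"}


def extract_events(lines):
    """Yield (user_id, product_id, action) from raw JSON log lines."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # drop malformed lines
        if not isinstance(record, dict):
            continue  # drop JSON values that are not objects
        user_id = record.get("user_id")
        product_id = record.get("product_id")
        action = record.get("action")
        if user_id and product_id and action in KEPT_ACTIONS:
            yield user_id, product_id, action


raw_lines = [
    '{"user_id": "u1", "product_id": "p9", "action": "browse"}',
    '{"user_id": "u1", "action": "heartbeat"}',
    'not json at all',
    '{"user_id": "u2", "product_id": "p3", "action": "pay_order"}',
]
events = list(extract_events(raw_lines))
```

Only records carrying both IDs and a recognized behavior survive, so downstream steps never see gateway noise such as health checks or truncated lines.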
S107, from the product ID and user ID obtained in S106, generate the user's real-time product recommendation list online, and evaluate the real-time recommendation results against implicit user behavior.
Here, real-time recommendation means generating the user's product recommendation list from the implicit user behavior log data that the e-commerce website collects online in real time, fused with the offline recommendation model.
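The text does not detail how the real-time logs are fused with the offline model. One possible approach, shown purely as an assumption, is a "fold-in": keep the offline item matrix $M$ fixed, solve a small regularized least-squares problem for the user vector from the user's most recent implicit scores, and rank the remaining items. The function `realtime_recommend`, the parameter `lam`, and the toy factors are hypothetical.

```python
import numpy as np


def realtime_recommend(M, recent_scores, top_n=2, lam=0.1):
    """Rank items for one user from recent implicit scores and offline item factors.

    M: (n_f, n_items) item matrix from the offline model.
    recent_scores: dict of item index -> implicit score from the live stream.
    """
    idx = np.array(sorted(recent_scores))
    r = np.array([recent_scores[j] for j in idx], dtype=float)
    M_i = M[:, idx]
    # Fold-in: solve the regularized least-squares problem for the user vector.
    u = np.linalg.solve(M_i @ M_i.T + lam * np.eye(M.shape[0]), M_i @ r)
    predicted = u @ M
    predicted[idx] = -np.inf  # do not recommend items the user just acted on
    return [int(j) for j in np.argsort(-predicted)[:top_n]]


# Toy item factors: items 0 and 1 share one latent feature, items 2 and 3 another.
M = np.array([[1.0, 0.9, 0.0, 0.1],
              [0.0, 0.1, 1.0, 0.9]])
recs = realtime_recommend(M, {0: 5.0})
```

In this toy case a strong signal on item 0 ranks item 1 first, because the two share a latent feature. No retraining of $M$ is needed per request, which keeps the online path cheap.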
S108, judge whether the prediction accuracy is greater than or equal to threshold Q; if so, jump to S109; if not, jump to S104. The preset threshold Q is the minimum prediction accuracy the e-commerce website will accept.
The prediction accuracy is judged using the root-mean-square error, defined as in formula (10):
where $k$ is a proportionality coefficient.
S109, generate the real-time recommendation list and feed it back to the user.
The invention also discloses a real-time recommendation system for e-commerce websites in a big data environment, the system comprising:
an e-commerce website implicit user behavior collection module, which collects implicit user behavior log information from the e-commerce website;
an implicit user behavior feedback matrix construction module, which, according to implicit user behavior, establishes the weight of each user's rating for each product and constructs the two-dimensional user-product rating matrix, obtaining the implicit user behavior feedback matrix;
an implicit user behavior feedback matrix decomposition module, which decomposes the two-dimensional user-product rating matrix of the implicit user behavior feedback matrix into several submatrices;
a parallelization module, which changes the processing of the optimal matrix model from serial to parallel and trains the offline recommendation model;
an online implicit user behavior collection module, which collects the implicit user behavior log information in the online real-time data stream;
an online implicit user behavior processing module, which filters and processes the online implicit user behavior log information to obtain the corresponding product ID and user ID;
a real-time recommendation result evaluation module, which generates the user's real-time product recommendation list online from the product ID and user ID, and evaluates the real-time recommendation results against implicit user behavior;
a prediction accuracy judgment module, which judges whether the prediction accuracy is greater than or equal to threshold Q;
a real-time recommendation list generation module, which generates the real-time recommendation list according to the prediction accuracy judgment and feeds it back to the user.
Fig. 2 is a schematic diagram of the recommendation engine. The recommendation engine acts as a black box: the upper layer is the collected data sources and the lower layer delivers results to each user. A recommender system receives many kinds of data sources. Taking an e-commerce website as an example, it can receive basic item information, including product name, price, and specifications; basic user information, such as gender, age, and hobbies; and user preferences gathered by technical means, from which user profiles are built. User behavior preferences fall into two broad classes: explicit user behavior and implicit user behavior. Explicit behavior comes from the user's direct ratings of and comments on products. It shows the user's preferences more directly but carries an extra operating cost, so less data is collected than for implicit behavior. Implicit behavior comes from operations users perform spontaneously while expressing their preferences: visiting a product detail page, adding a product to the shopping cart, removing a product from the cart, paying for an order, following a product brand, and so on. By distinguishing positive and negative implicit behaviors, the user's preferences can be inferred well.
As shown in Fig. 3, the Spark-parallelized offline recommendation model computation comprises the following steps:
S301, obtain the user rating data as an RDD (Resilient Distributed Dataset). An RDD is an abstraction of distributed memory: a fault-tolerant, parallel data structure that splits a large data set into small blocks stored on one or more nodes of the cluster, and lets the user explicitly control where the data is stored and how it is partitioned.
S302, generate the RDDs of the user matrix U and the item matrix M.
S303, fix U and solve for M, corresponding to formula (8) above; update the original M and broadcast M in Spark, because in a parallel environment the data of M is distributed across different nodes and the other nodes need to know that M has been updated.
S304, fix M and solve for U, corresponding to formula (7) above; update the original U and broadcast U in Spark, because in a parallel environment the data of U is distributed across different nodes and the other nodes need to know that U has been updated.
S305, for the set number of iterations, observe how the RMSE changes and select the best training parameters.
The Spark-based offline recommendation model is built on the ALS (Alternating Least Squares) matrix factorization model. In the operation of an e-commerce website, the recommender system needs to be adjusted according to the website's sales strategy.
As shown in Figure 4, it is the framework of a kind of real-time recommendation system based on Spark, is divided into three levels: processed offline
Layer, service layer and process layer in real time.
In the service layer, a distributed log collection agent is installed on the application gateway server to collect the logs of access to each business system. Because an e-commerce website produces a huge volume of logs, reliable messaging middleware is needed as the link between model training and data source collection. The system therefore builds its message distribution middleware on a Kafka cluster, which provides unified publication of the log data. Because the log data contains both the logs of each business system and the users' clickstream logs, it must pass through unified data cleaning before entering the offline or real-time recommendation stage. The logs are processed in real time with Spark Streaming, the streaming technology of the Spark platform. Spark Streaming slices the stream by time and processes the data received within each fixed time interval as one batch, which achieves the effect of real-time processing while keeping very high throughput.
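The time-slicing idea can be illustrated without a cluster. The sketch below is plain Python and not the Spark Streaming API: it groups already-cleaned log events into fixed-length intervals and hands each interval over as one batch. The event layout `(timestamp_seconds, user_id, product_id)`, the function name `micro_batches`, and the 10-second interval are assumptions made for illustration.

```python
from collections import defaultdict

def micro_batches(events, interval_seconds):
    """Group (timestamp, user_id, product_id) events into fixed-length time slices."""
    batches = defaultdict(list)
    for timestamp, user_id, product_id in events:
        # Start of the slice this event falls into, e.g. 19.9 s -> slice 10.
        slice_start = int(timestamp // interval_seconds) * interval_seconds
        batches[slice_start].append((user_id, product_id))
    # Return the slices in time order, as a stream processor would emit them.
    return [(start, batches[start]) for start in sorted(batches)]

events = [
    (0.5, "u1", "p1"),
    (3.2, "u2", "p1"),
    (10.1, "u1", "p2"),
    (19.9, "u3", "p3"),
    (20.0, "u1", "p3"),
]
for start, batch in micro_batches(events, interval_seconds=10):
    print(start, batch)
```

A shorter interval lowers latency at the cost of more, smaller batches; a longer interval does the opposite.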
In the offline processing layer, after the data source for real-time recommendation has been collected, the user behaviors in the data source are classified by weight to obtain each user's basic rating for a given product, and these ratings are fed into recommendation model training. The traditional approach trains the offline recommendation model on the Hadoop platform, but Hadoop has three problems. First, its level of abstraction is low, so a job requires a great deal of redundant code. Second, it provides only two operations, Map and Reduce, so its expressiveness is limited. Third, intermediate results are stored in the HDFS file system, so iterative tasks run slowly because the intermediate data must pass through the hard disk. The Spark platform used by the present invention uses RDDs as its abstraction, so the logic is more concise to implement than on Hadoop, and it provides many transformations and actions, which makes it highly expressive. In addition, unlike Hadoop, Spark can cache intermediate results in memory, which improves computational efficiency for recommendation tasks that need many iterations.
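The weight classification of user behavior described at the start of this paragraph can be sketched as follows. The behavior types and weight values in `BEHAVIOR_WEIGHTS` and the function name `basic_scores` are hypothetical, since the source does not list the actual behaviors or weights. The sketch keeps, for each user and product, the weight of the strongest behavior observed.

```python
from collections import defaultdict

# Hypothetical weights: stronger purchase intent gets a larger weight.
BEHAVIOR_WEIGHTS = {"view": 1.0, "favorite": 2.0, "add_to_cart": 3.0, "order": 5.0}

def basic_scores(behavior_log):
    """Turn (user_id, product_id, behavior) records into user-product scores."""
    scores = defaultdict(float)
    for user_id, product_id, behavior in behavior_log:
        # Behaviors missing from the table contribute a weight of 0.
        weight = BEHAVIOR_WEIGHTS.get(behavior, 0.0)
        key = (user_id, product_id)
        scores[key] = max(scores[key], weight)
    return dict(scores)

log = [
    ("u1", "p1", "view"),
    ("u1", "p1", "add_to_cart"),
    ("u2", "p1", "view"),
    ("u2", "p2", "order"),
    ("u2", "p2", "share"),
]
for (user_id, product_id), score in basic_scores(log).items():
    print(user_id, product_id, score)
```

The resulting (user, product, score) triples have the shape that a matrix factorization trainer expects as input. Summing weights with a cap instead of taking the maximum would be an equally plausible choice.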
In the real-time processing layer, the user's behavioral preferences are extracted from the user's implicit behavior during the current day, or during each time period of the current day. This implicit behavior data is used to update, in real time, the product recommendation list that the offline recommendation model provides to the user, so that the user receives real-time recommendations.
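One way to picture this update, purely as an assumption because the source does not spell out the merging rule, is to keep the offline model's predicted scores for a user, add a boost for products the user interacted with in the current time window, and sort again. The names `merge_recommendations`, `offline_scores`, `recent_scores`, `boost` and `top_n` are hypothetical.

```python
def merge_recommendations(offline_scores, recent_scores, boost=0.5, top_n=3):
    """Re-rank offline predictions using scores from the current time window.

    offline_scores: {product_id: predicted score from the offline model}
    recent_scores:  {product_id: implicit-behavior score observed today}
    """
    merged = dict(offline_scores)
    for product_id, score in recent_scores.items():
        merged[product_id] = merged.get(product_id, 0.0) + boost * score
    # Sort by score (descending), then by product ID so ties are deterministic.
    ranked = sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))
    return [product_id for product_id, _ in ranked[:top_n]]

offline = {"p1": 4.2, "p2": 3.9, "p3": 3.1, "p4": 2.0}
today = {"p3": 3.0, "p5": 5.0}
print(merge_recommendations(offline, today))
```

In this example the product `p3`, which the user touched today, moves ahead of `p1` and `p2` even though its offline score was lower.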
The method is illustrated below with an embodiment.
An e-commerce website built a Spark-based e-commerce real-time recommendation system on 3 cloud servers hosted on Alibaba Cloud, handling up to 16 million user visits per day. From November 2015 to December 2015 the website used a Hadoop offline recommendation system, and from January 2016 to April 2016 it used the Spark-based real-time recommendation system. Its shopping cart conversion rate is shown in Table 1 and its order conversion rate in Table 2. Table 1 shows that the shopping cart conversion rate improved about 3.5 times, and the order conversion rate improved about 2.5 times.
Table 1. Shopping cart conversion rate
Table 2. Order conversion rate
Although the present invention has been described in detail through the preferred embodiments above, it should be understood that the description above is not to be regarded as limiting the invention. Various modifications and substitutions will be apparent to those skilled in the art after reading the foregoing. The scope of protection of the present invention should therefore be defined by the appended claims.
Claims (10)
1. A real-time recommendation method for an e-commerce website in a big data environment, characterized in that the method comprises:
training an offline recommendation model with the implicit behavior log information of the e-commerce website's users;
collecting user implicit behavior log information online, and rapidly processing the collected massive user implicit behavior log information with distributed storage technology and distributed stream processing technology;
merging the trained offline recommendation model with the latest user implicit behavior information processed by the distributed stream, so as to provide the user with an up-to-date product recommendation list.
2. The real-time recommendation method for an e-commerce website in a big data environment as claimed in claim 1, characterized in that said training an offline recommendation model with the implicit behavior log information of the e-commerce website's users comprises:
S101: collecting the implicit behavior log information of the e-commerce website's users;
S102: establishing the weights with which users rate each product, constructing a two-dimensional user-product rating matrix, and obtaining the user implicit behavior feedback matrix;
S103: decomposing the two-dimensional user-product rating matrix of the user implicit behavior feedback matrix into several submatrices;
S104: converting the computation of the optimal submatrices from serial processing to parallel processing, and training the offline recommendation model.
3. The real-time recommendation method for an e-commerce website in a big data environment as claimed in claim 2, characterized in that said collecting user implicit behavior log information online, and rapidly processing the collected massive user implicit behavior log information with distributed storage technology and distributed stream processing technology, comprises:
S105: collecting the user implicit behavior log information in the online real-time data stream;
S106: filtering and processing the online user implicit behavior log information to obtain the corresponding product ID and user ID.
4. The real-time recommendation method for an e-commerce website in a big data environment, characterized in that said merging the trained offline recommendation model with the latest user implicit behavior information processed by the distributed stream, so as to provide the user with an up-to-date product recommendation list, comprises:
S107: generating the user's real-time product recommendation list online from the product ID and user ID, and evaluating the real-time recommendation results against the user's implicit behavior;
S108: judging whether the prediction accuracy is greater than or equal to a threshold Q; if so, jumping to S109; otherwise, jumping to S104;
S109: generating the real-time recommendation list and feeding it back to the user.
5. The real-time recommendation method for an e-commerce website in a big data environment as claimed in claim 2, characterized in that in said S103 the decomposition of the user implicit behavior feedback matrix comprises:
Let the two-dimensional user-product rating matrix be R, where each element $r_{ui}$ of R represents the rating of user u for item i. By the principle of singular value decomposition, R can be decomposed into a product of several matrices, as in formula (1):
$$R = U^{T} M \qquad (1)$$
In formula (1), the user matrix $U^{T}$ represents the latent semantic attribute parameters of the users, and the item matrix M represents the latent semantic attribute parameters of the items.
6. The real-time recommendation method for an e-commerce website in a big data environment as claimed in claim 5, characterized in that the method for solving said user matrix $U^{T}$ and item matrix M comprises:
constructing a loss function f(U, M), as in formula (2);
in formula (2), I denotes the set of user-item ratings, $r_{ij}$ is an element of matrix R, $u_i$ is an element of the sought matrix $U^{T}$ and is accompanied by a term representing the set of all items of that user, and $m_j$ is an element of the sought matrix M and is accompanied by a term representing the set of all users of that item;
minimizing the loss function f(U, M) with alternating least squares, and obtaining the optimal user matrix U and the optimal item matrix M through repeated iteration.
7. The real-time recommendation method for an e-commerce website in a big data environment as claimed in claim 6, characterized in that minimizing said loss function f(U, M) with alternating least squares comprises:
a) with M held fixed, computing and updating each feature $u_{ki}$ of each user, where k denotes any one feature of the user; taking the partial derivative of formula (2) with respect to $u_{ki}$, setting it equal to 0, and solving gives formula (7);
in formula (7), E is the $n_f \times n_f$ identity matrix, the submatrix of M is the one formed by the selected columns $j \in I_i$, and $R(i, I_i)$ is the row vector formed by the selected columns $j \in I_i$ of the i-th row of R;
b) with U held fixed, computing and updating each feature $m_{kj}$ of each item, where k denotes any one feature of the item; taking the partial derivative with respect to $m_{kj}$, setting it equal to 0, and applying the method of formula (3) to $m_j$ gives formula (8);
in formula (8), the submatrix of U is the one formed by the selected columns $i \in I_j$, and $R(I_j, j)$ is the column vector formed by the selected rows $i \in I_j$ of the j-th column of R;
c) obtaining the user matrix U and the item matrix M from formulas (7) and (8), computing the root-mean-square error (RMSE) of the matrix model, and obtaining through repeated iteration the optimal user matrix U and the optimal item matrix M, that is, the optimal model of the user matrix U and the item matrix M.
8. The real-time recommendation method for an e-commerce website in a big data environment as claimed in claim 7, characterized in that said root-mean-square error is defined as in formula (9):
In formula (9), u and i are a user and an item in the test data set, the predicted rating is the one obtained by the recommendation system, and $r_{ui}$ is the true rating of user u for item i.
9. The real-time recommendation method for an e-commerce website in a big data environment, characterized in that in said S108 the prediction accuracy is judged using the root-mean-square error, defined as shown in formula (10):
where k is a proportionality coefficient.
10. A real-time recommendation system for an e-commerce website in a big data environment, characterized in that the system comprises:
an e-commerce website user implicit behavior collection module, which collects the implicit behavior log information of the e-commerce website's users;
a user implicit behavior feedback matrix building module, which establishes, according to user implicit behavior, the weights with which users rate each product, constructs the two-dimensional user-product rating matrix, and obtains the user implicit behavior feedback matrix;
a user implicit behavior feedback matrix decomposition module, which decomposes the two-dimensional user-product rating matrix of the user implicit behavior feedback matrix into several submatrices;
a parallelization module, which converts the optimal matrix model from serial processing to parallel processing and trains the offline recommendation model;
an online user implicit behavior collection module, which collects the user implicit behavior log information of the online real-time data stream;
an online user implicit behavior processing module, which filters and processes the online user implicit behavior log information to obtain the corresponding product ID and user ID;
a real-time recommendation result evaluation module, which generates the user's real-time product recommendation list online from the product ID and user ID, and evaluates the real-time recommendation results against the user's implicit behavior;
a prediction accuracy judgment module, which judges whether the prediction accuracy is greater than or equal to a threshold Q;
a real-time recommendation list generation module, which generates the real-time recommendation list according to the prediction accuracy judgment result and feeds it back to the user.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610710881.5A CN106296305A (en) | 2016-08-23 | 2016-08-23 | Electric business website real-time recommendation System and method under big data environment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106296305A true CN106296305A (en) | 2017-01-04 |
Family
ID=57615816
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610710881.5A Pending CN106296305A (en) | 2016-08-23 | 2016-08-23 | Electric business website real-time recommendation System and method under big data environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106296305A (en) |
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107171843A (en) * | 2017-05-23 | 2017-09-15 | 上海海事大学 | A kind of system of selection of preferable cloud service provider and system |
CN107277159A (en) * | 2017-07-10 | 2017-10-20 | 东南大学 | A kind of super-intensive network small station caching method based on machine learning |
CN107341206A (en) * | 2017-06-23 | 2017-11-10 | 南京甄视智能科技有限公司 | Accurately user's portrait system and method is built based on multiple data sources |
CN107590213A (en) * | 2017-08-29 | 2018-01-16 | 重庆邮电大学 | Mixing commending system based on mobile phone big data |
CN107895026A (en) * | 2017-11-17 | 2018-04-10 | 联奕科技有限公司 | A kind of implementation method of campus user portrait |
CN108596720A (en) * | 2018-04-23 | 2018-09-28 | 广东奥园奥买家电子商务有限公司 | A method of commercial product recommending is carried out according to the behavioral data of user |
WO2018176937A1 (en) * | 2017-03-27 | 2018-10-04 | 华南理工大学 | Quantitative scoring method for implicit feedback of user |
WO2018205853A1 (en) * | 2017-05-10 | 2018-11-15 | 腾讯科技(深圳)有限公司 | Distributed computing system and method and storage medium |
CN108875776A (en) * | 2018-05-02 | 2018-11-23 | 北京三快在线科技有限公司 | Model training method and device, business recommended method and apparatus, electronic equipment |
CN108876508A (en) * | 2018-05-03 | 2018-11-23 | 上海海事大学 | A kind of electric business collaborative filtering recommending method |
CN109087162A (en) * | 2018-07-05 | 2018-12-25 | 杭州朗和科技有限公司 | Data processing method, system, medium and calculating equipment |
CN109102127A (en) * | 2018-08-31 | 2018-12-28 | 杭州贝购科技有限公司 | Method of Commodity Recommendation and device |
CN109146606A (en) * | 2018-07-09 | 2019-01-04 | 广州品唯软件有限公司 | A kind of brand recommended method, electronic equipment, storage medium and system |
CN109408711A (en) * | 2018-09-29 | 2019-03-01 | 北京三快在线科技有限公司 | Data filtering method, device, electronic equipment and storage medium |
CN109635186A (en) * | 2018-11-16 | 2019-04-16 | 华南理工大学 | A kind of real-time recommendation method based on Lambda framework |
CN109690571A (en) * | 2017-04-20 | 2019-04-26 | 北京嘀嘀无限科技发展有限公司 | Group echo system and method based on study |
CN109783465A (en) * | 2018-12-25 | 2019-05-21 | 同济大学 | Magnanimity threedimensional model integrated platform under a kind of cloud computing framework |
CN109816495A (en) * | 2019-02-13 | 2019-05-28 | 北京达佳互联信息技术有限公司 | Merchandise news method for pushing, system and server and storage medium |
CN109903103A (en) * | 2017-12-07 | 2019-06-18 | 华为技术有限公司 | A kind of method and apparatus for recommending article |
CN110175287A (en) * | 2019-05-22 | 2019-08-27 | 湖南大学 | A kind of matrix decomposition implicit feedback recommended method and system based on Flink |
CN110674964A (en) * | 2019-05-14 | 2020-01-10 | 南京邮电大学 | Search prediction system and method based on agricultural traceability information |
CN110674408A (en) * | 2019-09-30 | 2020-01-10 | 北京三快在线科技有限公司 | Service platform, and real-time generation method and device of training sample |
CN110754075A (en) * | 2017-10-13 | 2020-02-04 | 美的集团股份有限公司 | Method and system for providing personalized live information exchange |
CN111681085A (en) * | 2020-06-10 | 2020-09-18 | 创新奇智(成都)科技有限公司 | Commodity pushing method and device, server and readable storage medium |
CN113112333A (en) * | 2021-04-27 | 2021-07-13 | 湖南云畅网络科技有限公司 | Data stream processing method and system |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103345698A (en) * | 2013-07-09 | 2013-10-09 | 焦点科技股份有限公司 | Personalized recommendation method based on cloud processing mode and applied in e-business environment |
CN104133837A (en) * | 2014-06-24 | 2014-11-05 | 上海交通大学 | Internet information putting channel optimizing system based on distributed computing |
CN105183841A (en) * | 2015-09-06 | 2015-12-23 | 南京游族信息技术有限公司 | Recommendation method in combination with frequent item set and deep learning under big data environment |
CN105488216A (en) * | 2015-12-17 | 2016-04-13 | 上海中彦信息科技有限公司 | Recommendation system and method based on implicit feedback collaborative filtering algorithm |
Cited By (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018176937A1 (en) * | 2017-03-27 | 2018-10-04 | 华南理工大学 | Quantitative scoring method for implicit feedback of user |
CN109690571B (en) * | 2017-04-20 | 2020-09-18 | 北京嘀嘀无限科技发展有限公司 | Learning-based group tagging system and method |
CN109690571A (en) * | 2017-04-20 | 2019-04-26 | 北京嘀嘀无限科技发展有限公司 | Group echo system and method based on study |
WO2018205853A1 (en) * | 2017-05-10 | 2018-11-15 | 腾讯科技(深圳)有限公司 | Distributed computing system and method and storage medium |
CN107171843A (en) * | 2017-05-23 | 2017-09-15 | 上海海事大学 | A kind of system of selection of preferable cloud service provider and system |
CN107171843B (en) * | 2017-05-23 | 2019-07-09 | 上海海事大学 | A kind of selection method and system of ideal cloud service provider |
CN107341206B (en) * | 2017-06-23 | 2019-11-29 | 南京甄视智能科技有限公司 | The method for constructing accurately user's portrait system based on multiple data sources |
CN107341206A (en) * | 2017-06-23 | 2017-11-10 | 南京甄视智能科技有限公司 | Accurately user's portrait system and method is built based on multiple data sources |
CN107277159B (en) * | 2017-07-10 | 2020-05-08 | 东南大学 | Ultra-dense network small station caching method based on machine learning |
CN107277159A (en) * | 2017-07-10 | 2017-10-20 | 东南大学 | A kind of super-intensive network small station caching method based on machine learning |
CN107590213A (en) * | 2017-08-29 | 2018-01-16 | 重庆邮电大学 | Mixing commending system based on mobile phone big data |
CN110754075A (en) * | 2017-10-13 | 2020-02-04 | 美的集团股份有限公司 | Method and system for providing personalized live information exchange |
US10789638B2 (en) | 2017-10-13 | 2020-09-29 | Midea Group Co., Ltd. | Method and system for providing personalized on-location information exchange |
CN107895026A (en) * | 2017-11-17 | 2018-04-10 | 联奕科技有限公司 | A kind of implementation method of campus user portrait |
CN109903103A (en) * | 2017-12-07 | 2019-06-18 | 华为技术有限公司 | A kind of method and apparatus for recommending article |
CN108596720A (en) * | 2018-04-23 | 2018-09-28 | 广东奥园奥买家电子商务有限公司 | A method of commercial product recommending is carried out according to the behavioral data of user |
CN108875776B (en) * | 2018-05-02 | 2021-08-20 | 北京三快在线科技有限公司 | Model training method and device, service recommendation method and device, and electronic device |
CN108875776A (en) * | 2018-05-02 | 2018-11-23 | 北京三快在线科技有限公司 | Model training method and device, business recommended method and apparatus, electronic equipment |
CN108876508A (en) * | 2018-05-03 | 2018-11-23 | 上海海事大学 | A kind of electric business collaborative filtering recommending method |
CN109087162A (en) * | 2018-07-05 | 2018-12-25 | 杭州朗和科技有限公司 | Data processing method, system, medium and calculating equipment |
CN109146606B (en) * | 2018-07-09 | 2022-02-22 | 广州品唯软件有限公司 | Brand recommendation method, electronic equipment, storage medium and system |
CN109146606A (en) * | 2018-07-09 | 2019-01-04 | 广州品唯软件有限公司 | A kind of brand recommended method, electronic equipment, storage medium and system |
CN109102127B (en) * | 2018-08-31 | 2021-10-26 | 杭州贝购科技有限公司 | Commodity recommendation method and device |
CN109102127A (en) * | 2018-08-31 | 2018-12-28 | 杭州贝购科技有限公司 | Method of Commodity Recommendation and device |
CN109408711A (en) * | 2018-09-29 | 2019-03-01 | 北京三快在线科技有限公司 | Data filtering method, device, electronic equipment and storage medium |
CN109635186A (en) * | 2018-11-16 | 2019-04-16 | 华南理工大学 | A kind of real-time recommendation method based on Lambda framework |
CN109783465B (en) * | 2018-12-25 | 2023-09-08 | 吉林动画学院 | Mass three-dimensional model integration system under cloud computing framework |
CN109783465A (en) * | 2018-12-25 | 2019-05-21 | 同济大学 | Magnanimity threedimensional model integrated platform under a kind of cloud computing framework |
CN109816495A (en) * | 2019-02-13 | 2019-05-28 | 北京达佳互联信息技术有限公司 | Merchandise news method for pushing, system and server and storage medium |
CN109816495B (en) * | 2019-02-13 | 2020-11-24 | 北京达佳互联信息技术有限公司 | Commodity information pushing method, system, server and storage medium |
CN110674964A (en) * | 2019-05-14 | 2020-01-10 | 南京邮电大学 | Search prediction system and method based on agricultural traceability information |
CN110175287A (en) * | 2019-05-22 | 2019-08-27 | 湖南大学 | A kind of matrix decomposition implicit feedback recommended method and system based on Flink |
CN110674408A (en) * | 2019-09-30 | 2020-01-10 | 北京三快在线科技有限公司 | Service platform, and real-time generation method and device of training sample |
CN110674408B (en) * | 2019-09-30 | 2021-06-04 | 北京三快在线科技有限公司 | Service platform, and real-time generation method and device of training sample |
CN111681085A (en) * | 2020-06-10 | 2020-09-18 | 创新奇智(成都)科技有限公司 | Commodity pushing method and device, server and readable storage medium |
CN113112333A (en) * | 2021-04-27 | 2021-07-13 | 湖南云畅网络科技有限公司 | Data stream processing method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106296305A (en) | Electric business website real-time recommendation System and method under big data environment | |
Ma et al. | Recommender systems with social regularization | |
Yang et al. | Friend or frenemy? Predicting signed ties in social networks | |
CN105488216A (en) | Recommendation system and method based on implicit feedback collaborative filtering algorithm | |
CN101226557B (en) | Method for processing efficient relating subject model data | |
CN102254028A (en) | Personalized commodity recommending method and system which integrate attributes and structural similarity | |
CN106600302A (en) | Hadoop-based commodity recommendation system | |
CN104268292A (en) | Label word library update method of portrait system | |
CN104090886A (en) | Method and device for constructing real-time portrayal of user | |
CN103514239A (en) | Recommendation method and system integrating user behaviors and object content | |
CN103345698A (en) | Personalized recommendation method based on cloud processing mode and applied in e-business environment | |
CN105868334A (en) | Personalized film recommendation method and system based on feature augmentation | |
Huang et al. | A modal interval based method for dynamic decision model considering uncertain quality of used products in remanufacturing | |
CN114647465B (en) | Single program splitting method and system for multi-channel attention map neural network clustering | |
CN104424247A (en) | Product information filtering recommendation method and device | |
KR20220063350A (en) | Method and apparatus for providing marketing information | |
CN112612942A (en) | Social big data-based fund recommendation system and method | |
Li | Accurate digital marketing communication based on intelligent data analysis | |
Grolman et al. | Utilizing transfer learning for in-domain collaborative filtering | |
Zhang | Optimization of the marketing management system based on cloud computing and big data | |
Giri et al. | Exploitation of social network data for forecasting garment sales | |
CN106021391B (en) | Product review information real-time collecting method based on Storm | |
Minjing et al. | Recognizing intentions of E-commerce consumers based on ant colony optimization simulation | |
Karimi-Majd et al. | Extracting new ideas from the behavior of social network users | |
Antulov-Fantulin et al. | Ecml-pkdd 2011 discovery challenge overview |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20170104 |