CN106296305A - Real-time recommendation system and method for e-commerce websites in a big data environment - Google Patents

Real-time recommendation system and method for e-commerce websites in a big data environment

Info

Publication number
CN106296305A
Authority
CN
China
Prior art keywords
user
matrix
real
concealed
formula
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610710881.5A
Other languages
Chinese (zh)
Inventor
岑凯伦
韩志德
毕坤
王军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Maritime University
Original Assignee
Shanghai Maritime University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Maritime University
Priority to CN201610710881.5A
Publication of CN106296305A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Databases & Information Systems (AREA)
  • Finance (AREA)
  • Theoretical Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a real-time recommendation method for e-commerce websites in a big data environment. The method comprises: training an offline recommendation model on the implicit-behavior logs of the website's users; collecting users' implicit-behavior logs online and rapidly processing the resulting mass of log data with distributed storage and distributed stream processing; and merging the trained offline recommendation model with the latest implicit-behavior information produced by the stream processing to give each user an up-to-date product recommendation list. The invention can analyze current user behavior under heavy data traffic and return recommendations in real time, improving user satisfaction with recommendations and the transaction conversion rate of the e-commerce website.

Description

Real-time recommendation system and method for e-commerce websites in a big data environment
Technical field
The invention relates to the field of network technology, and in particular to a real-time recommendation system and method for e-commerce websites in a big data environment.
Background
Taobao, the largest e-commerce platform in China, receives 60 million visiting users a day and lists more than 800 million products online each day. Faced with rapidly growing data, users run into the "information overload" problem: without search engines, recommender systems, information classification, or similar aids, finding the information one cares about among massive Internet resources is extremely difficult, and the effective utilization of information actually falls. Search engines and personalized recommender systems are two means of addressing information overload. Search engines such as Google, Baidu, and Bing return results based on the keywords a user enters. Because they rank results according to the behavior patterns of all users, they cannot tailor the service to each individual, and content a user might care about can be buried in a mass of search results. Personalized recommendation makes up for this shortcoming: it evaluates, on the user's behalf, the products the user has not yet seen, and proactively recommends items that match the user's preferences by analyzing the user's tastes and historical behavior.
In the big data era, recommender systems face massive training sets, and systems built for a traditional single-machine environment can no longer meet the demand. Recommender systems built on distributed computing platforms have therefore emerged. Early distributed recommender systems were built on Hadoop, which supports larger training sets at lower cost. However, Hadoop's MapReduce framework writes intermediate results to disk for the next stage to read, so it is inefficient on recommendation tasks that need many iterations. Spark, a newer MapReduce-like computing framework, computes in memory and is markedly faster than MapReduce on such iterative recommendation tasks.
Since the Web 2.0 era began, demand for real-time recommendation has kept growing. Traditional recommender systems built on Hadoop analyze data periodically, update the model, and then use the new model for personalized recommendation. Training is inefficient, and because there is no complete mechanism for responding to users' current behavior, recommendation satisfaction and transaction conversion rates are low. A system built on newer distributed parallel stream processing that can analyze user behavior and return recommendations in real time is therefore of considerable research value.
Summary of the invention
The invention provides a real-time recommendation system and method for e-commerce websites in a big data environment that can analyze current user behavior in a data-stream environment and return recommendations in real time, improving user satisfaction with recommendations and the transaction conversion rate of the website.
To achieve this, the invention provides a real-time recommendation method for e-commerce websites in a big data environment, characterized in that the method comprises:
training an offline recommendation model on the implicit-behavior logs of the e-commerce website's users;
collecting users' implicit-behavior logs online, and rapidly processing the collected mass of logs with distributed storage and distributed stream processing;
merging the trained offline recommendation model with the latest implicit-behavior information produced by the distributed stream processing, to give the user an up-to-date product recommendation list.
Training the offline recommendation model on the implicit-behavior logs comprises:
S101: collect the implicit-behavior logs of the e-commerce website's users;
S102: assign a weight to each user's scoring of each product and build the two-dimensional user-product rating matrix, giving the user implicit-behavior feedback matrix;
S103: decompose the user-product rating matrix of the implicit-behavior feedback matrix into several sub-matrices;
S104: switch the solving of the optimal sub-matrices from serial to parallel processing and train the offline recommendation model.
Collecting users' implicit-behavior information online and rapidly processing the collected mass of information with distributed storage and distributed stream processing comprises:
S105: collect the implicit-behavior logs in the online real-time data stream;
S106: filter and process the online implicit-behavior logs to obtain the corresponding product IDs and user IDs.
Merging the trained offline recommendation model with the latest implicit-behavior information produced by the distributed stream processing, to give the user an up-to-date product recommendation list, comprises:
S107: generate the user's real-time product recommendation list online from the product IDs and user IDs, and evaluate the real-time recommendation results against users' implicit behavior;
S108: judge whether the prediction accuracy is greater than or equal to the threshold Q; if so, go to S109; otherwise, go to S104;
S109: generate the real-time recommendation list and return it to the user.
In S103, decomposing the user implicit-behavior feedback matrix comprises:
Let the two-dimensional user-product rating matrix be $R$, where each element $r_{ui}$ of $R$ is the score of user $u$ for item $i$. By the principle of singular value decomposition, $R$ can be decomposed into a product of matrices, as in formula (1):
$$R = U^{T} M \tag{1}$$
In formula (1), the user matrix $U^{T}$ holds the latent semantic attribute parameters of the users, and the item matrix $M$ holds the latent semantic attribute parameters of the items.
The user matrix $U^{T}$ and the item matrix $M$ are solved as follows:
Construct the loss function $f(U, M)$, as in formula (2);
$$f(U, M) = \sum_{(i,j) \in I} \left( r_{ij} - u_i^{T} m_j \right)^2 + \lambda \left( \sum_i n_{u_i} \lVert u_i \rVert^2 + \sum_j n_{m_j} \lVert m_j \rVert^2 \right) \tag{2}$$
In formula (2), $I$ is the set of user-item ratings, $r_{ij}$ is an element of the matrix $R$, $u_i^{T}$ is an element of the matrix $U^{T}$ being solved for and describes user $i$ across all items, and $m_j$ is an element of the matrix $M$ being solved for and describes item $j$ across all of its users;
Minimize the loss function $f(U, M)$ with alternating least squares; repeated iteration yields the optimal user matrix $U$ and the optimal item matrix $M$.
Minimizing the loss function $f(U, M)$ with alternating least squares comprises:
a) Hold $M$ fixed, then compute and update each feature $u_{ki}$ of each user, where $k$ indexes any one feature of the user. Take the partial derivative of formula (2) with respect to $u_{ki}$, set it to 0, and solve:
$$\frac{1}{2} \frac{\partial f}{\partial u_{ki}} = 0, \quad \forall i, k \tag{3}$$
$$\Rightarrow \sum_{j \in I_i} \left( u_i^{T} m_j - r_{ij} \right) m_{kj} + \lambda n_{u_i} u_{ki} = 0, \quad \forall i, k \tag{4}$$
$$\Rightarrow \sum_{j \in I_i} m_{kj} m_j^{T} u_i + \lambda n_{u_i} u_{ki} = \sum_{j \in I_i} m_{kj} r_{ij}, \quad \forall i, k \tag{5}$$
$$\Rightarrow \left( M_{I_i} M_{I_i}^{T} + \lambda n_{u_i} E \right) u_i = M_{I_i} R^{T}(i, I_i), \quad \forall i \tag{6}$$
$$\Rightarrow u_i = A_i^{-1} V_i, \quad \forall i \tag{7}$$
In formula (7), $A_i = M_{I_i} M_{I_i}^{T} + \lambda n_{u_i} E$ and $V_i = M_{I_i} R^{T}(i, I_i)$, where $E$ is the $n_f \times n_f$ identity matrix, $M_{I_i}$ is the sub-matrix of $M$ formed by the columns $j \in I_i$, and $R(i, I_i)$ is the row vector formed from the $i$-th row of $R$ by taking the columns $j \in I_i$;
b) Hold $U$ fixed, then compute and update each feature $m_{kj}$ of each item, where $k$ indexes any one feature of the item. Taking the partial derivative with respect to $m_{kj}$, setting it to 0, and applying the method of formula (3) to $m_j$ gives formula (8):
$$\Rightarrow m_j = A_j^{-1} V_j, \quad \forall j \tag{8}$$
In formula (8), $A_j = U_{I_j} U_{I_j}^{T} + \lambda n_{m_j} E$ and $V_j = U_{I_j} R(I_j, j)$, where $U_{I_j}$ is the sub-matrix of $U$ formed by the columns $i \in I_j$, and $R(I_j, j)$ is the column vector formed from the $j$-th column of $R$ by taking the rows $i \in I_j$;
c) With the user matrix $U$ and the item matrix $M$ obtained from formulas (7) and (8), compute the root-mean-square error (RMSE) of the matrix model. Repeated iteration yields the optimal user matrix $U$ and the optimal item matrix $M$, that is, the optimal model of $U$ and $M$.
The root-mean-square error is defined as in formula (9):
$$\mathrm{RMSE} = \sqrt{\sum_{(u,i) \in T} \left( r_{ui} - \hat{r}_{ui} \right)^2} \tag{9}$$
In formula (9), $u$ and $i$ are a user and an item in the test data set $T$, $\hat{r}_{ui}$ is the score predicted by the recommender system, and $r_{ui}$ is the true score of user $u$ for item $i$.
In S108, the prediction accuracy is judged using the root-mean-square error and is defined by formula (10), in which $k$ is a proportionality coefficient.
A real-time recommendation system for e-commerce websites in a big data environment, characterized in that the system comprises:
an e-commerce website user implicit-behavior collection module, which collects the implicit-behavior logs of the website's users;
a user implicit-behavior feedback matrix building module, which, based on users' implicit behavior, assigns weights to each user's scoring of each product and builds the two-dimensional user-product rating matrix, giving the user implicit-behavior feedback matrix;
a user implicit-behavior feedback matrix decomposition module, which decomposes the user-product rating matrix of the implicit-behavior feedback matrix into several sub-matrices;
a parallelization module, which switches the solving of the optimal matrix model from serial to parallel processing and trains the offline recommendation model;
a user implicit-behavior online collection module, which collects the implicit-behavior logs in the online real-time data stream;
an online user implicit-behavior processing module, which filters and processes the online implicit-behavior logs to obtain the corresponding product IDs and user IDs;
a real-time recommendation evaluation module, which generates the user's real-time product recommendation list online from the product IDs and user IDs and evaluates the real-time recommendation results against users' implicit behavior;
a prediction accuracy judging module, which judges whether the prediction accuracy is greater than or equal to the threshold Q;
a real-time recommendation list generation module, which generates the real-time recommendation list according to the prediction accuracy judgment and returns it to the user.
Compared with the prior art, the real-time recommendation system and method of the invention have the following advantages. The invention designs and builds distributed log collection and distributed log transport modules based on LogStash and Kafka. Software embedded in the application gateway captures the access logs produced whenever external users call system interfaces and delivers them to a Kafka cluster for unified transport. This solves cross-system log collection and transport, and reduces the labor cost of collecting and transporting the logs of many business systems when the system cluster is large;
The invention uses Spark Streaming real-time stream processing to handle the user-behavior logs delivered by the Kafka cluster in one place: it filters the data in real time, classifies each type of user behavior, and writes the results to Hive. This gives the recommender system a unified data source and suits big data application scenarios well.
The invention provides a real-time recommendation system that uses Spark SQL to read user-behavior logs from Hive, applies penalization and normalization, and then trains a Spark-parallelized matrix factorization model on that data source. The results are written to a Redis cache to improve website performance. In parallel, Spark Streaming processes live user access logs to produce real-time recommendations and updates Redis, merging the real-time and offline results. This improves the user experience of the e-commerce website and helps increase its transaction volume.
Brief description of the drawings
Fig. 1 is the flow chart of the real-time recommendation method for e-commerce websites in a big data environment according to the invention;
Fig. 2 is a schematic diagram of the recommendation engine;
Fig. 3 is the flow chart of the Spark-parallelized offline recommendation model computation;
Fig. 4 is the architecture diagram of the Spark-based real-time recommendation system.
Detailed description of the invention
Specific embodiments of the invention are described below with reference to the drawings.
The invention provides a real-time recommendation method for e-commerce websites in a big data environment, comprising the following steps:
Step 1: train an offline recommendation model on the implicit-behavior logs of the e-commerce website's users.
Training the offline recommendation model comprises: collecting a large volume of implicit-behavior information from the website's users to form the user implicit-behavior matrix, and training the optimal offline recommendation model by decomposing that matrix.
Step 2: collect users' implicit-behavior logs online, and rapidly process the collected mass of logs with distributed storage and distributed stream processing.
Collecting and processing online implicit-behavior logs means collecting the logs online at the website's front-end gateway and filtering them, which provides the data source for online real-time recommendation.
Step 3: merge the trained offline recommendation model with the latest implicit-behavior information produced by the distributed stream processing to give the user an up-to-date product recommendation list, achieving real-time online recommendation.
Fig. 1 shows an embodiment of the real-time recommendation method for e-commerce websites in a big data environment. The method comprises the following steps:
S101: collect the implicit-behavior logs of the e-commerce website's users.
An e-commerce website has a massive user base of registered and unregistered users. An e-commerce website in a big data environment (1) has more than 100,000 users, both registered and unregistered, and (2) offers more than ten thousand kinds of products for users to browse or buy online, including both existing products and newly added ones.
The number of users is closely tied to the recommendation quality of the website's recommender system and directly affects the website's economic and social benefit.
User implicit behavior includes browsing a product, adding a product to the shopping cart, removing a product from the shopping cart, paying for an order, cancelling an order, and adding a product to favorites. Assigning weights to these behaviors by type yields each user's score for each product on the website.
S102: assign a weight to each user's scoring of each product and build the two-dimensional user-product rating matrix, giving the user implicit-behavior feedback matrix.
Because each user's interests and purchasing power are limited, the products a user interacts with make up a very small percentage of all products, so the user-product rating weight matrix is very sparse. Moreover, as the website expands, the user implicit-behavior feedback matrix keeps growing.
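The weighting in S101 and S102 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the behavior names and the numeric weights in `BEHAVIOR_WEIGHTS` are hypothetical (the source gives no values), and summing weights per user-product pair is one plausible aggregation rule rather than the patent's stated one.

```python
from collections import defaultdict

# Hypothetical weights per implicit behavior; the source does not give values.
BEHAVIOR_WEIGHTS = {
    "view": 1.0,
    "add_to_cart": 3.0,
    "remove_from_cart": -2.0,
    "pay_order": 5.0,
    "cancel_order": -3.0,
    "favorite": 2.0,
}


def build_rating_matrix(events):
    """Aggregate (user_id, item_id, behavior) events into sparse scores.

    Returns a dict mapping (user_id, item_id) to the summed weight.
    """
    ratings = defaultdict(float)
    for user_id, item_id, behavior in events:
        weight = BEHAVIOR_WEIGHTS.get(behavior)
        if weight is None:
            continue  # ignore behaviors without a configured weight
        ratings[(user_id, item_id)] += weight
    return dict(ratings)


def density(ratings, n_users, n_items):
    """Fraction of user-item cells that hold a score."""
    return len(ratings) / (n_users * n_items)


events = [
    ("u1", "i1", "view"),
    ("u1", "i1", "add_to_cart"),
    ("u1", "i2", "view"),
    ("u2", "i1", "pay_order"),
    ("u2", "i3", "scroll"),  # unknown behavior, skipped
]
ratings = build_rating_matrix(events)
```

Storing only the observed pairs keeps memory proportional to the number of interactions, which matters because most cells of the matrix are empty.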
S103: decompose the user-product rating matrix of the implicit-behavior feedback matrix into several sub-matrices, reducing the complexity of computing offline recommendations.
In S103, decomposing the user implicit-behavior feedback matrix comprises:
Let the two-dimensional user-product rating matrix be $R$, where each element $r_{ui}$ of $R$ is the score of user $u$ for item $i$. By the principle of singular value decomposition, $R$ can be decomposed into a product of matrices, as in formula (1):
$$R = U^{T} M \tag{1}$$
In formula (1), the user matrix $U^{T}$ holds the latent semantic attribute parameters of the users (hidden user characteristics), and the item matrix $M$ holds the latent semantic attribute parameters of the items (hidden item characteristics).
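Once $U$ and $M$ are known, formula (1) gives a predicted score $u_i^{T} m_j$ for every user-item pair, and ranking those scores produces a recommendation list. The following NumPy sketch shows that step only; the function name `top_n_items`, the `exclude` parameter, and the toy factor values are illustrative assumptions, not part of the source.

```python
import numpy as np


def top_n_items(U, M, user_index, n=2, exclude=()):
    """Rank items for one user by the predicted score u_i^T m_j.

    U has shape (n_f, n_users) and M has shape (n_f, n_items),
    so that R is approximated by U.T @ M as in formula (1).
    """
    scores = U[:, user_index] @ M               # predicted score for every item
    order = np.argsort(-scores, kind="stable")  # highest score first
    skip = set(exclude)
    return [int(j) for j in order if int(j) not in skip][:n]


U = np.array([[1.0, 0.0],
              [0.0, 1.0]])       # 2 latent features, 2 users
M = np.array([[0.9, 0.1, 0.5],
              [0.2, 0.8, 0.4]])  # 2 latent features, 3 items

first_user = top_n_items(U, M, 0)
first_user_unseen = top_n_items(U, M, 0, exclude=(0,))
second_user = top_n_items(U, M, 1)
```

The `exclude` argument is one way to drop items the user has already interacted with before returning the list.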
The user matrix $U^{T}$ and the item matrix $M$ in formula (1) are solved as follows:
1) Construct the loss function $f(U, M)$, as in formula (2);
$$f(U, M) = \sum_{(i,j) \in I} \left( r_{ij} - u_i^{T} m_j \right)^2 + \lambda \left( \sum_i n_{u_i} \lVert u_i \rVert^2 + \sum_j n_{m_j} \lVert m_j \rVert^2 \right) \tag{2}$$
In formula (2), $I$ is the set of user-item ratings, $r_{ij}$ is an element of the matrix $R$, $u_i^{T}$ is an element (row) of the matrix $U^{T}$ being solved for and describes user $i$ across all items, and $m_j$ is an element (column) of the matrix $M$ being solved for and describes item $j$ across all of its users. The term to the right of the plus sign is added to the loss function to prevent overfitting during solving.
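To make formula (2) concrete, the sketch below evaluates it with NumPy for a small example. The source does not define $n_{u_i}$ and $n_{m_j}$; the code assumes they are the number of observed ratings of user $i$ and of item $j$, which is the usual convention for this weighted regularization. The function name `als_loss` and the boolean `mask` representation of the set $I$ are also assumptions.

```python
import numpy as np


def als_loss(R, mask, U, M, lam):
    """Evaluate the loss of formula (2).

    R:    (n_users, n_items) rating matrix
    mask: boolean array, True where a rating is observed (the set I)
    U:    (n_f, n_users) user factors
    M:    (n_f, n_items) item factors
    """
    pred = U.T @ M
    squared_error = np.sum(((R - pred) ** 2)[mask])
    n_u = mask.sum(axis=1)  # assumed meaning: ratings observed per user
    n_m = mask.sum(axis=0)  # assumed meaning: ratings observed per item
    penalty = np.sum(n_u * np.sum(U ** 2, axis=0)) \
        + np.sum(n_m * np.sum(M ** 2, axis=0))
    return squared_error + lam * penalty


R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
mask = np.array([[True, False],
                 [False, True]])
U = np.eye(2)
M = np.eye(2)
loss = als_loss(R, mask, U, M, lam=0.5)
```

Only the observed cells contribute to the squared error, so the unobserved zeros in `R` are never treated as ratings.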
2) Minimize the loss function $f(U, M)$ with alternating least squares (ALS, Alternating Least Squares).
That is, the input is the user rating matrix $R$ and the output is the matrices $U$ and $M$. Initially, a matrix $U$ representing users and a matrix $M$ representing items are generated; repeated iteration then yields the optimal user matrix $U$ and the optimal item matrix $M$.
The least squares method is a mathematical optimization technique. It finds the best-fitting function for the data by minimizing the sum of squared errors. Least squares makes it straightforward to estimate unknown values such that the sum of squared errors between the estimates and the actual data is minimal.
Minimizing the loss function $f(U, M)$ with alternating least squares proceeds as follows:
a) Hold $M$ fixed, then compute and update each feature $u_{ki}$ of each user, where $k$ indexes any one feature of the user. Take the partial derivative of formula (2) with respect to $u_{ki}$, set it to 0, and solve:
$$\frac{1}{2} \frac{\partial f}{\partial u_{ki}} = 0, \quad \forall i, k \tag{3}$$
$$\Rightarrow \sum_{j \in I_i} \left( u_i^{T} m_j - r_{ij} \right) m_{kj} + \lambda n_{u_i} u_{ki} = 0, \quad \forall i, k \tag{4}$$
$$\Rightarrow \sum_{j \in I_i} m_{kj} m_j^{T} u_i + \lambda n_{u_i} u_{ki} = \sum_{j \in I_i} m_{kj} r_{ij}, \quad \forall i, k \tag{5}$$
$$\Rightarrow \left( M_{I_i} M_{I_i}^{T} + \lambda n_{u_i} E \right) u_i = M_{I_i} R^{T}(i, I_i), \quad \forall i \tag{6}$$
$$\Rightarrow u_i = A_i^{-1} V_i, \quad \forall i \tag{7}$$
In formula (7), $A_i = M_{I_i} M_{I_i}^{T} + \lambda n_{u_i} E$ and $V_i = M_{I_i} R^{T}(i, I_i)$, where $E$ is the $n_f \times n_f$ identity matrix, $M_{I_i}$ is the sub-matrix of $M$ formed by the columns $j \in I_i$, and $R(i, I_i)$ is the row vector formed from the $i$-th row of $R$ by taking the columns $j \in I_i$.
b) Hold $U$ fixed, then compute and update each feature $m_{kj}$ of each item, where $k$ indexes any one feature of the item. Taking the partial derivative with respect to $m_{kj}$, setting it to 0, and applying the method of formula (3) to $m_j$ gives formula (8):
$$\Rightarrow m_j = A_j^{-1} V_j, \quad \forall j \tag{8}$$
In formula (8), $A_j = U_{I_j} U_{I_j}^{T} + \lambda n_{m_j} E$ and $V_j = U_{I_j} R(I_j, j)$, where $U_{I_j}$ is the sub-matrix of $U$ formed by the columns $i \in I_j$, and $R(I_j, j)$ is the column vector formed from the $j$-th column of $R$ by taking the rows $i \in I_j$.
c) With the user matrix $U$ and the item matrix $M$ obtained from formulas (7) and (8), compute the root-mean-square error (RMSE) of the matrix model. Repeated iteration yields the optimal user matrix $U$ and the optimal item matrix $M$, that is, the optimal model of $U$ and $M$. Because the matrix factorization model based on implicit feedback greatly reduces the dimensionality of the user-product rating matrix, computational efficiency improves.
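Steps a) to c) can be sketched as a serial NumPy loop. This is a minimal illustration, not the Spark-parallelized implementation the patent describes. It assumes that $I_i$ is the set of items rated by user $i$, that $I_j$ is the set of users who rated item $j$, and that $n_{u_i}$ and $n_{m_j}$ are the sizes of those sets. The random initialization, the default hyperparameters, and the helper `rmse` (which uses the conventional mean over observed ratings) are likewise assumptions.

```python
import numpy as np


def als(R, mask, n_f=2, lam=0.1, n_iters=20, seed=0):
    """Alternating least squares following formulas (7) and (8)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_f, n_users))
    M = rng.normal(scale=0.1, size=(n_f, n_items))
    E = np.eye(n_f)
    for _ in range(n_iters):
        # Step a): hold M fixed and solve A_i u_i = V_i for every user i.
        for i in range(n_users):
            I_i = np.flatnonzero(mask[i])
            if I_i.size == 0:
                continue  # user without ratings keeps its current factors
            M_Ii = M[:, I_i]
            A_i = M_Ii @ M_Ii.T + lam * I_i.size * E
            V_i = M_Ii @ R[i, I_i]
            U[:, i] = np.linalg.solve(A_i, V_i)
        # Step b): hold U fixed and solve A_j m_j = V_j for every item j.
        for j in range(n_items):
            I_j = np.flatnonzero(mask[:, j])
            if I_j.size == 0:
                continue  # item without ratings keeps its current factors
            U_Ij = U[:, I_j]
            A_j = U_Ij @ U_Ij.T + lam * I_j.size * E
            V_j = U_Ij @ R[I_j, j]
            M[:, j] = np.linalg.solve(A_j, V_j)
    return U, M


def rmse(R, mask, U, M):
    """Conventional RMSE over the observed ratings (step c)."""
    err = (R - U.T @ M)[mask]
    return float(np.sqrt(np.mean(err ** 2)))


R = np.outer([1.0, 2.0, 3.0], [1.0, 2.0])  # 3 users x 2 items, rank 1
mask = np.ones_like(R, dtype=bool)         # every rating observed
U, M = als(R, mask, n_f=2, lam=0.01, n_iters=30)
error = rmse(R, mask, U, M)
```

Solving the linear system with `np.linalg.solve` avoids forming $A_i^{-1}$ explicitly, which is cheaper and numerically safer than inverting the matrix.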
Here, the root-mean-square error is defined as in formula (9):
$$\mathrm{RMSE} = \sqrt{\sum_{(u,i) \in T} \left( r_{ui} - \hat{r}_{ui} \right)^2} \tag{9}$$
In formula (9), $u$ and $i$ are a user and an item in the test data set $T$, $\hat{r}_{ui}$ is the score predicted by the recommender system, and $r_{ui}$ is the true score of user $u$ for item $i$.
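A short Python sketch of the error measure follows. Formula (9) as it appears in the source text shows no division by the size of the test set, whereas the conventional RMSE divides the summed squared error by $|T|$ before taking the root. Since it is unclear which form the original figure intended, the hypothetical `mean` flag below exposes both.

```python
import math


def rmse(pairs, mean=True):
    """pairs: iterable of (true_score, predicted_score) over the test set T.

    mean=True divides the squared-error sum by |T| (the conventional RMSE);
    mean=False returns the plain root of the summed squared errors.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("test set is empty")
    sse = sum((r - r_hat) ** 2 for r, r_hat in pairs)
    return math.sqrt(sse / len(pairs)) if mean else math.sqrt(sse)


test_set = [(5.0, 4.0), (3.0, 3.0), (1.0, 2.0), (4.0, 4.0)]
conventional = rmse(test_set)
unnormalized = rmse(test_set, mean=False)
```

The normalized form is comparable across test sets of different sizes, which is why it is the one normally used for model selection.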
S104: switch data processing from serial to parallel and train the offline recommendation model.
Training the offline recommendation model in parallel means switching the solving of the optimal user matrix $U^{T}$ and the optimal item matrix $M$ from serial to parallel processing, which speeds up their training and improves data-processing efficiency in a big data environment.
S105: collect the implicit-behavior logs in the online real-time data stream.
Here, the online real-time data stream is the set of data streams arriving at or leaving the front-end servers while the website operates normally.
The user implicit-behavior log records users' implicit behaviors toward items on the e-commerce website.
S106: filter and process the online implicit-behavior logs: remove interfering information from the logs collected online in real time, and extract the corresponding product IDs and user IDs from the logs for use by the real-time recommender system.
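The filtering in S106 might look like the following sketch. The log format is entirely hypothetical: the source does not say how log lines are encoded, so the code assumes one JSON object per line with `user_id`, `item_id`, and `behavior` fields, and treats everything else as interference to be dropped.

```python
import json


def extract_user_item(lines):
    """Yield (user_id, item_id, behavior) from raw JSON log lines.

    Lines that are malformed or lack a required field are dropped.
    """
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # noise: not valid JSON
        if not isinstance(record, dict):
            continue  # noise: valid JSON but not a log record
        user_id = record.get("user_id")
        item_id = record.get("item_id")
        behavior = record.get("behavior")
        if user_id and item_id and behavior:
            yield user_id, item_id, behavior


raw = [
    '{"user_id": "u1", "item_id": "i9", "behavior": "view"}',
    'GET /healthcheck 200',
    '{"user_id": "u2", "behavior": "view"}',
    '{"user_id": "u2", "item_id": "i3", "behavior": "add_to_cart"}',
]
cleaned = list(extract_user_item(raw))
```

Writing the filter as a generator lets the same function run over a bounded batch or an unbounded stream of lines.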
S107: from the product IDs and user IDs obtained in S106, generate the user's real-time product recommendation list online, and evaluate the real-time recommendation results against users' implicit behavior.
Here, real-time recommendation means generating the user's product recommendation list from the implicit-behavior log data that the website collects online in real time, merged with the offline recommendation model.
S108: judge whether the prediction accuracy is greater than or equal to the threshold Q; if so, go to S109; otherwise, go to S104. The preset threshold Q is the minimum prediction accuracy that the e-commerce website will accept.
The prediction accuracy is judged using the root-mean-square error and is defined by formula (10), in which $k$ is a proportionality coefficient.
S109: generate the real-time recommendation list and return it to the user.
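The control flow of S104 and S107 to S109 can be sketched as a retry loop. Everything concrete here is an assumption made for illustration: the callables `train`, `recommend`, and `accuracy` stand in for steps the source describes only in prose, the accuracy value is taken as given because formula (10) is not reproduced in the text, and `max_rounds` is a hypothetical guard that the source does not mention.

```python
def recommend_with_retraining(train, recommend, accuracy, threshold_q, max_rounds=5):
    """Run the S104 -> S107 -> S108 -> S109 loop.

    train(), recommend(model) and accuracy(model, recs) are caller-supplied
    callables; max_rounds is a hypothetical guard against looping forever.
    """
    for round_no in range(1, max_rounds + 1):
        model = train()                            # S104: (re)train the offline model
        recs = recommend(model)                    # S107: generate the real-time list
        if accuracy(model, recs) >= threshold_q:   # S108: compare against Q
            return recs, round_no                  # S109: return the list to the user
    raise RuntimeError("accuracy stayed below Q after max_rounds rounds")


# Stub callables: the accuracy improves on each retraining round.
_scores = iter([0.4, 0.7, 0.9])
recs, rounds = recommend_with_retraining(
    train=lambda: "model",
    recommend=lambda model: ["i1", "i2"],
    accuracy=lambda model, recs: next(_scores),
    threshold_q=0.8,
)
```

Bounding the number of rounds is a design choice of this sketch: without it, a model that never reaches Q would keep the user waiting indefinitely.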
The invention also discloses a real-time recommendation system for e-commerce websites in a big data environment. The system comprises:
an e-commerce website user implicit-behavior collection module, which collects the implicit-behavior logs of the website's users;
a user implicit-behavior feedback matrix building module, which, based on users' implicit behavior, assigns weights to each user's scoring of each product and builds the two-dimensional user-product rating matrix, giving the user implicit-behavior feedback matrix;
a user implicit-behavior feedback matrix decomposition module, which decomposes the user-product rating matrix of the implicit-behavior feedback matrix into several sub-matrices;
a parallelization module, which switches the solving of the optimal matrix model from serial to parallel processing and trains the offline recommendation model;
a user implicit-behavior online collection module, which collects the implicit-behavior logs in the online real-time data stream;
an online user implicit-behavior processing module, which filters and processes the online implicit-behavior logs to obtain the corresponding product IDs and user IDs;
a real-time recommendation evaluation module, which generates the user's real-time product recommendation list online from the product IDs and user IDs and evaluates the real-time recommendation results against users' implicit behavior;
a prediction accuracy judging module, which judges whether the prediction accuracy is greater than or equal to the threshold Q;
a real-time recommendation list generation module, which generates the real-time recommendation list according to the prediction accuracy judgment and returns it to the user.
Fig. 2 is a schematic diagram of the recommendation engine. The engine acts as a black box: the upper layer is the collected data sources, and the lower layer is the result delivered to each user. A recommender system receives many kinds of data. For an e-commerce website, it can receive basic item information, including product name, price, and specification; basic user information such as gender, age, and interests; and user preferences gathered through technical means to build a user profile. User behavior preferences fall into two main classes: explicit behavior and implicit behavior. Explicit behavior comes from users' direct ratings of and comments on products. It shows preferences more directly, but it imposes an extra operating cost on the user, so the volume of data collected is smaller than for implicit behavior. Implicit behavior comes from operations users perform spontaneously while expressing their own preferences: visiting a product detail page, adding a product to the shopping cart, removing a product from the shopping cart, paying for an order, following a brand, and so on. By defining positive and negative implicit behaviors, the user's preferences can be inferred well.
As it is shown on figure 3, be the method flow of off-line recommended models based on Spark parallelization calculating, its concrete steps bag Contain:
S301, obtain the user rating data as an RDD (Resilient Distributed Dataset). An RDD is an abstraction of distributed memory: a fault-tolerant, parallel data structure that can split a large data set into small data blocks stored on one or more nodes of the cluster, and that lets the user explicitly control where the data is stored and how it is partitioned.
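The splitting of a rating data set into blocks can be pictured with a few lines of plain Python. This is only an illustration of the partitioning idea and does not use the Spark API; the modulo rule on the user ID and the name `partition_by_user` are assumptions.

```python
from typing import Dict, List, Tuple

Rating = Tuple[int, int, float]  # (user_id, item_id, score)


def partition_by_user(ratings: List[Rating], num_partitions: int) -> Dict[int, List[Rating]]:
    """Assign every rating to a partition chosen from its user ID."""
    partitions: Dict[int, List[Rating]] = {p: [] for p in range(num_partitions)}
    for rating in ratings:
        partitions[rating[0] % num_partitions].append(rating)
    return partitions


ratings = [(1, 10, 5.0), (2, 10, 3.0), (3, 11, 1.0), (4, 12, 2.0), (1, 12, 4.0)]
partitions = partition_by_user(ratings, num_partitions=2)

# Each partition can be processed independently, e.g. a per-partition rating count.
counts = {p: len(block) for p, block in partitions.items()}
```

Because all ratings of one user land in the same block, per-user work can run on each block without looking at the others.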
S302, generate the RDDs of the user matrix U and the item matrix M.
S303, fix U and solve for M, corresponding to formula (8) above; at the same time update the original M and broadcast M in Spark, because in a parallel environment the data of M is distributed across different nodes and the other nodes need to know that M has been updated.
S304, fix M and solve for U, corresponding to formula (7) above; at the same time update the original U and broadcast U in Spark, because in a parallel environment the data of U is distributed across different nodes and the other nodes need to know that U has been updated.
S305, according to the set number of iterations, observe the change in RMSE and select the optimal training parameters.
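Steps S303 to S305 can be sketched on a single machine in plain Python. This is a minimal, non-distributed illustration under stated assumptions: there is no Spark and no broadcast, the function names (`solve`, `update_side`, `als`), the default values for `n_f`, `lam`, and `iterations`, and the random initialization are hypothetical, and the RMSE uses the conventional mean over the rated pairs. Each update solves the regularized normal equations of formulas (7) and (8).

```python
import math
import random
from typing import Dict, List, Tuple

Ratings = Dict[Tuple[int, int], float]  # (user i, item j) -> score r_ij
Vector = List[float]


def solve(a: List[Vector], b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def update_side(fixed: List[Vector], rated: Dict[int, List[Tuple[int, float]]],
                count: int, n_f: int, lam: float) -> List[Vector]:
    """For each row, solve (sum v v^T + lam * n * E) x = sum r * v."""
    updated = []
    for idx in range(count):
        pairs = rated.get(idx, [])
        if not pairs:
            updated.append([0.0] * n_f)  # no ratings: nothing to fit
            continue
        a = [[lam * len(pairs) if p == q else 0.0 for q in range(n_f)]
             for p in range(n_f)]
        v = [0.0] * n_f
        for other, r in pairs:
            vec = fixed[other]
            for p in range(n_f):
                v[p] += r * vec[p]
                for q in range(n_f):
                    a[p][q] += vec[p] * vec[q]
        updated.append(solve(a, v))
    return updated


def als(ratings: Ratings, n_users: int, n_items: int, n_f: int = 2,
        lam: float = 0.1, iterations: int = 8, seed: int = 0):
    rng = random.Random(seed)
    users = [[rng.random() for _ in range(n_f)] for _ in range(n_users)]
    items = [[0.0] * n_f for _ in range(n_items)]
    by_user: Dict[int, List[Tuple[int, float]]] = {}
    by_item: Dict[int, List[Tuple[int, float]]] = {}
    for (i, j), r in ratings.items():
        by_user.setdefault(i, []).append((j, r))
        by_item.setdefault(j, []).append((i, r))
    trace = []  # (regularized loss, RMSE) after each iteration
    for _ in range(iterations):
        items = update_side(users, by_item, n_items, n_f, lam)  # S303: fix U, solve M
        users = update_side(items, by_user, n_users, n_f, lam)  # S304: fix M, solve U
        sq_err = sum((r - sum(p * q for p, q in zip(users[i], items[j]))) ** 2
                     for (i, j), r in ratings.items())
        penalty = sum(len(by_user.get(i, [])) * sum(x * x for x in users[i])
                      for i in range(n_users))
        penalty += sum(len(by_item.get(j, [])) * sum(x * x for x in items[j])
                       for j in range(n_items))
        # S305: record the values to observe across iterations.
        trace.append((sq_err + lam * penalty, math.sqrt(sq_err / len(ratings))))
    return users, items, trace
```

Calling `als` on a small rating dictionary returns the two factor lists plus a per-iteration trace. Since each half-step exactly minimizes the regularized loss over one side, the loss column of the trace should not increase, which gives a simple sanity check when comparing training parameters.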
The offline recommendation model based on Spark is built on an ALS (Alternating Least Squares) matrix factorization model. In an e-commerce website's processing flow, the recommendation system needs to be adjusted according to the website's sales strategy.
Figure 4 shows the architecture of a Spark-based real-time recommendation system, which is divided into three layers: the offline processing layer, the service layer, and the real-time processing layer.
In the service layer, a distributed log collection agent is installed on the application gateway server to collect the log information of each business system. Because an e-commerce website produces a huge volume of logs, reliable messaging middleware is needed as the link between model training and data source collection; the system builds message distribution middleware on a Kafka cluster to publish log data in a unified way. Because the log data contains both the logs of each business system and the users' clickstream logs, it must go through unified data cleaning before entering the offline or real-time recommendation stage. The Spark Streaming technology of the Spark platform is used to process the logs in real time: Spark Streaming slices the stream by time and batch-processes the data received within each fixed time interval, which achieves the effect of real-time processing with very high throughput.
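The cleaning and time-slicing described above can be illustrated without any Spark or Kafka dependency. The sketch below assumes a hypothetical comma-separated log format (`timestamp,user_id,item_id,behavior`) and a 2-second interval; neither is specified by the source, and the functions `parse_line` and `slice_into_batches` are illustrative names only.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

LogEvent = Tuple[float, str, str, str]  # (timestamp_seconds, user_id, item_id, behavior)


def parse_line(line: str) -> Optional[LogEvent]:
    """Cleaning step: return None for lines that do not match the assumed format."""
    parts = line.strip().split(",")
    if len(parts) != 4:
        return None
    timestamp, user_id, item_id, behavior = parts
    try:
        return (float(timestamp), user_id, item_id, behavior)
    except ValueError:
        return None


def slice_into_batches(events: List[LogEvent],
                       interval_seconds: float) -> Dict[int, List[LogEvent]]:
    """Group events into consecutive time slices of fixed length."""
    batches: Dict[int, List[LogEvent]] = defaultdict(list)
    for event in events:
        batches[int(event[0] // interval_seconds)].append(event)
    return dict(sorted(batches.items()))


raw_lines = [
    "0.5,u1,p1,view_detail",
    "broken line",
    "1.2,u2,p3,add_to_cart",
    "5.0,u1,p2,pay_order",
    "x,u1,p2,pay_order",
]
events = [e for e in map(parse_line, raw_lines) if e is not None]
batches = slice_into_batches(events, interval_seconds=2.0)
```

Each value in `batches` plays the role of one micro-batch: the events that arrived within the same fixed interval are handed to the downstream logic together.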
In the offline processing layer, after the data sources for real-time recommendation have been collected, the user behaviors in the data sources are classified by weight to obtain the user's basic score for a given product, which is then input to recommendation model training. The traditional approach is offline recommendation model training on the Hadoop platform, but the Hadoop platform has three problems: first, its level of abstraction is low, so a lot of redundant code is needed to complete an operation; second, it provides only two operations, Map and Reduce, so its expressiveness is limited; third, intermediate results are stored in the HDFS file system, which makes iterative computation slow (intermediate data has to pass through the hard disk). The Spark platform used in the present invention uses RDDs for abstraction, so the data logic is implemented more concisely than on the Hadoop platform; it also provides many transformations and actions and therefore has strong expressiveness. In addition, unlike the Hadoop platform, the Spark platform can cache intermediate results in memory, which improves computational efficiency for recommendation tasks that require many iterations.
In the real-time processing layer, the user's behavior preferences are extracted from the user's implicit behavior for the current day or for each time period of the current day, and this implicit behavior data is used to update in real time the product recommendation list that the offline recommendation model provides for the user, thereby providing the user with real-time recommendations.
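One simple way to picture this real-time update is to re-rank the offline scores with a boost derived from same-day implicit behavior. The source does not specify how the two signals are combined, so the additive rule, the weight `alpha`, and the function name `rerank` below are purely assumptions for illustration.

```python
from typing import Dict, List


def rerank(offline_scores: Dict[str, float], realtime_boost: Dict[str, float],
           alpha: float = 0.5, top_n: int = 3) -> List[str]:
    """Add alpha * boost to each offline score and return the top_n item IDs."""
    combined = {
        item: score + alpha * realtime_boost.get(item, 0.0)
        for item, score in offline_scores.items()
    }
    # Sort by descending combined score; ties are broken by item ID.
    return sorted(combined, key=lambda item: (-combined[item], item))[:top_n]


offline_scores = {"p1": 4.0, "p2": 3.5, "p3": 3.0, "p4": 1.0}
realtime_boost = {"p3": 3.0, "p1": -2.0}  # e.g. p3 added to cart today, p1 removed
top_items = rerank(offline_scores, realtime_boost)
```

In this toy example p3 moves to the front of the list and p1 drops to third place, which is the kind of same-day adjustment the real-time layer is meant to deliver.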
The method is illustrated below with an embodiment.
An e-commerce website built a Spark-based e-commerce real-time recommendation system on 3 cloud servers hosted on Alibaba Cloud, handling up to 16 million user visits per day. From November 2015 to December 2015 the website used a Hadoop offline recommendation system, and from January 2016 to April 2016 it used the Spark-based real-time recommendation system. Its shopping cart conversion rate is shown in Table 1 and its order conversion rate in Table 2. Table 1 shows that the shopping cart conversion rate improved by about 3.5 times, and Table 2 shows that the order conversion rate improved by about 2.5 times.
Table 1: Shopping cart conversion rate
Table 2: Order conversion rate
Although the content of the present invention has been described in detail through the preferred embodiment above, it should be understood that the above description is not to be regarded as limiting the present invention. Various modifications and substitutions will be apparent to those skilled in the art after reading the above. Therefore, the scope of protection of the present invention shall be defined by the appended claims.

Claims (10)

1. A real-time recommendation method for e-commerce websites in a big data environment, characterized in that the method comprises:
Training an offline recommendation model with the implicit behavior log information of e-commerce website users;
Collecting users' implicit behavior log information online, and rapidly processing the collected massive implicit behavior log information with distributed storage technology and distributed stream processing technology;
Merging the trained offline recommendation model with the latest user implicit behavior information processed by distributed stream processing, to provide the user with an up-to-date product recommendation list.
2. The real-time recommendation method for e-commerce websites in a big data environment as claimed in claim 1, characterized in that said training of an offline recommendation model with the implicit behavior log information of e-commerce website users comprises:
S101, collecting the implicit behavior log information of e-commerce website users;
S102, establishing the weights with which a user scores each product, and constructing the user-product rating two-dimensional matrix to obtain the user implicit behavior feedback matrix;
S103, decomposing the user-product rating two-dimensional matrix of the user implicit behavior feedback matrix into several sub-matrices;
S104, converting the computation of the optimal sub-matrices from serial processing to parallel processing, and training the offline recommendation model.
3. The real-time recommendation method for e-commerce websites in a big data environment as claimed in claim 2, characterized in that said collecting of users' implicit behavior log information online, and rapid processing of the collected massive implicit behavior log information with distributed storage technology and distributed stream processing technology, comprises:
S105, collecting the user implicit behavior log information in the online real-time data stream;
S106, filtering and processing the online user implicit behavior log information to obtain the corresponding product ID and user ID.
4. The real-time recommendation method for e-commerce websites in a big data environment, characterized in that said merging of the trained offline recommendation model with the latest user implicit behavior information processed by distributed stream processing, to provide the user with an up-to-date product recommendation list, comprises:
S107, generating the user's real-time product recommendation list online according to the product ID and user ID, and evaluating the real-time recommendation results according to the user's implicit behavior;
S108, judging whether the prediction accuracy is greater than or equal to a threshold Q; if so, jumping to S109; otherwise, jumping to S104;
S109, generating the real-time recommendation list and feeding it back to the user.
5. The real-time recommendation method for e-commerce websites in a big data environment as claimed in claim 2, characterized in that in said S103, the decomposition of the user implicit behavior feedback matrix comprises:
Let the user-product rating two-dimensional matrix be $R$, where each element $r_{ui}$ of $R$ represents user $u$'s rating of item $i$. By the principle of singular value matrix decomposition, $R$ can be decomposed into a product of several matrices, as in formula (1):
$$R = U^{T} M \tag{1}$$
In formula (1), the user matrix $U^{T}$ represents the latent semantic attribute parameters of users, and the item matrix $M$ represents the latent semantic attribute parameters of items.
6. The real-time recommendation method for e-commerce websites in a big data environment as claimed in claim 5, characterized in that the method for solving said user matrix $U^{T}$ and item matrix $M$ comprises:
Constructing a loss function $f(U, M)$, as in formula (2);
$$f(U, M) = \sum_{(i,j) \in I} \left( r_{ij} - u_i^{T} m_j \right)^2 + \lambda \left( \sum_{i} n_{u_i} \lVert u_i \rVert^2 + \sum_{j} n_{m_j} \lVert m_j \rVert^2 \right) \tag{2}$$
In formula (2), $I$ denotes the set of user-item ratings, $r_{ij}$ is an element of the matrix $R$, $u_i$ is an element of the matrix $U^{T}$ to be solved, $n_{u_i}$ is the size of the set of all items of user $i$, $m_j$ is an element of the matrix $M$ to be solved, and $n_{m_j}$ is the size of the set of all users of item $j$;
Minimizing the loss function $f(U, M)$ by alternating least squares, and obtaining the optimal user matrix $U$ and the optimal item matrix $M$ through repeated iteration.
7. The real-time recommendation method for e-commerce websites in a big data environment as claimed in claim 6, characterized in that minimizing said loss function $f(U, M)$ by alternating least squares comprises:
a) With $M$ held fixed, compute and update each feature $u_{ki}$ of each user, where $k$ denotes any one feature of the user. Take the partial derivative of formula (2) with respect to $u_{ki}$ and set it equal to 0 to solve:
$$\frac{1}{2} \frac{\partial f}{\partial u_{ki}} = 0, \quad \forall i, k \tag{3}$$
$$\Rightarrow \sum_{j \in I_i} \left( u_i^{T} m_j - r_{ij} \right) m_{kj} + \lambda n_{u_i} u_{ki} = 0, \quad \forall i, k \tag{4}$$
$$\Rightarrow \sum_{j \in I_i} m_{kj} m_j^{T} u_i + \lambda n_{u_i} u_{ki} = \sum_{j \in I_i} m_{kj} r_{ij}, \quad \forall i, k \tag{5}$$
$$\Rightarrow \left( M_{I_i} M_{I_i}^{T} + \lambda n_{u_i} E \right) u_i = M_{I_i} R^{T}(i, I_i), \quad \forall i \tag{6}$$
$$\Rightarrow u_i = A_i^{-1} V_i, \quad \forall i \tag{7}$$
In formula (7), $A_i = M_{I_i} M_{I_i}^{T} + \lambda n_{u_i} E$ and $V_i = M_{I_i} R^{T}(i, I_i)$, where $E$ is the $n_f \times n_f$ identity matrix, $M_{I_i}$ denotes the sub-matrix of $M$ formed by selecting the columns $j \in I_i$, and $R(i, I_i)$ denotes the row vector formed by the columns $j \in I_i$ of the $i$-th row of $R$;
b) With $U$ held fixed, compute and update each feature $m_{kj}$ of each item, where $k$ denotes any one feature of the item. Taking the partial derivative with respect to $m_{kj}$, setting it equal to 0, and applying the method of formula (3) to $m_j$ gives formula (8):
$$\Rightarrow m_j = A_j^{-1} V_j, \quad \forall j \tag{8}$$
In formula (8), $A_j = U_{I_j} U_{I_j}^{T} + \lambda n_{m_j} E$ and $V_j = U_{I_j} R(I_j, j)$, where $U_{I_j}$ denotes the sub-matrix of $U$ formed by selecting the columns $i \in I_j$, and $R(I_j, j)$ denotes the column vector formed by the rows $i \in I_j$ of the $j$-th column of $R$;
c) Obtain the user matrix $U$ and the item matrix $M$ according to formula (7) and formula (8), compute the root-mean-square error (RMSE) of the matrix model, and obtain the optimal user matrix $U$ and the optimal item matrix $M$, that is, the optimal model of $U$ and $M$, through multiple iterations.
8. The real-time recommendation method for e-commerce websites in a big data environment as claimed in claim 7, characterized in that said root-mean-square error is defined as in formula (9):
$$RMSE = \sqrt{\sum_{u,i \in T} \left( r_{ui} - \hat{r}_{ui} \right)^2} \tag{9}$$
In formula (9), $u$ and $i$ are a user and an item in the test data set $T$, $\hat{r}_{ui}$ is the predicted rating obtained by the recommendation system, and $r_{ui}$ is user $u$'s true rating of item $i$.
9. The real-time recommendation method for e-commerce websites in a big data environment, characterized in that in said S108, the prediction accuracy is judged using the root-mean-square error, and is defined as shown in formula (10):
where $k$ is a proportionality coefficient.
10. A real-time recommendation system for e-commerce websites in a big data environment, characterized in that the system comprises:
An e-commerce website user implicit behavior collection module, which collects the implicit behavior log information of e-commerce website users;
A user implicit behavior feedback matrix construction module, which establishes the weights with which a user scores each product according to the user's implicit behavior, and constructs the user-product rating two-dimensional matrix to obtain the user implicit behavior feedback matrix;
A user implicit behavior feedback matrix decomposition module, which decomposes the user-product rating two-dimensional matrix of the user implicit behavior feedback matrix into several sub-matrices;
A parallelization module, which converts the computation of the optimal matrix model from serial processing to parallel processing and trains the offline recommendation model;
A user implicit behavior online collection module, which collects the user implicit behavior log information in the online real-time data stream;
An online user implicit behavior processing module, which filters and processes the online user implicit behavior log information to obtain the corresponding product ID and user ID;
A real-time recommendation result evaluation module, which generates the user's real-time product recommendation list online according to the product ID and user ID, and evaluates the real-time recommendation results according to the user's implicit behavior;
A prediction accuracy judgment module, which judges whether the prediction accuracy is greater than or equal to a threshold Q;
A real-time recommendation list generation module, which generates the real-time recommendation list according to the prediction accuracy judgment result and feeds it back to the user.
CN201610710881.5A 2016-08-23 2016-08-23 Electric business website real-time recommendation System and method under big data environment Pending CN106296305A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610710881.5A CN106296305A (en) 2016-08-23 2016-08-23 Electric business website real-time recommendation System and method under big data environment

Publications (1)

Publication Number Publication Date
CN106296305A true CN106296305A (en) 2017-01-04

Family

ID=57615816

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610710881.5A Pending CN106296305A (en) 2016-08-23 2016-08-23 Electric business website real-time recommendation System and method under big data environment

Country Status (1)

Country Link
CN (1) CN106296305A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103345698A (en) * 2013-07-09 2013-10-09 焦点科技股份有限公司 Personalized recommendation method based on cloud processing mode and applied in e-business environment
CN104133837A (en) * 2014-06-24 2014-11-05 上海交通大学 Internet information putting channel optimizing system based on distributed computing
CN105183841A (en) * 2015-09-06 2015-12-23 南京游族信息技术有限公司 Recommendation method in combination with frequent item set and deep learning under big data environment
CN105488216A (en) * 2015-12-17 2016-04-13 上海中彦信息科技有限公司 Recommendation system and method based on implicit feedback collaborative filtering algorithm

Cited By (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018176937A1 (en) * 2017-03-27 2018-10-04 华南理工大学 Quantitative scoring method for implicit feedback of user
CN109690571B (en) * 2017-04-20 2020-09-18 北京嘀嘀无限科技发展有限公司 Learning-based group tagging system and method
CN109690571A (en) * 2017-04-20 2019-04-26 北京嘀嘀无限科技发展有限公司 Group echo system and method based on study
WO2018205853A1 (en) * 2017-05-10 2018-11-15 腾讯科技(深圳)有限公司 Distributed computing system and method and storage medium
CN107171843A (en) * 2017-05-23 2017-09-15 上海海事大学 A kind of system of selection of preferable cloud service provider and system
CN107171843B (en) * 2017-05-23 2019-07-09 上海海事大学 A kind of selection method and system of ideal cloud service provider
CN107341206B (en) * 2017-06-23 2019-11-29 南京甄视智能科技有限公司 The method for constructing accurately user's portrait system based on multiple data sources
CN107341206A (en) * 2017-06-23 2017-11-10 南京甄视智能科技有限公司 Accurately user's portrait system and method is built based on multiple data sources
CN107277159B (en) * 2017-07-10 2020-05-08 东南大学 Ultra-dense network small station caching method based on machine learning
CN107277159A (en) * 2017-07-10 2017-10-20 东南大学 A kind of super-intensive network small station caching method based on machine learning
CN107590213A (en) * 2017-08-29 2018-01-16 重庆邮电大学 Mixing commending system based on mobile phone big data
CN110754075A (en) * 2017-10-13 2020-02-04 美的集团股份有限公司 Method and system for providing personalized live information exchange
US10789638B2 (en) 2017-10-13 2020-09-29 Midea Group Co., Ltd. Method and system for providing personalized on-location information exchange
CN107895026A (en) * 2017-11-17 2018-04-10 联奕科技有限公司 A kind of implementation method of campus user portrait
CN109903103A (en) * 2017-12-07 2019-06-18 华为技术有限公司 A kind of method and apparatus for recommending article
CN108596720A (en) * 2018-04-23 2018-09-28 广东奥园奥买家电子商务有限公司 A method of commercial product recommending is carried out according to the behavioral data of user
CN108875776B (en) * 2018-05-02 2021-08-20 北京三快在线科技有限公司 Model training method and device, service recommendation method and device, and electronic device
CN108875776A (en) * 2018-05-02 2018-11-23 北京三快在线科技有限公司 Model training method and device, business recommended method and apparatus, electronic equipment
CN108876508A (en) * 2018-05-03 2018-11-23 上海海事大学 A kind of electric business collaborative filtering recommending method
CN109087162A (en) * 2018-07-05 2018-12-25 杭州朗和科技有限公司 Data processing method, system, medium and calculating equipment
CN109146606B (en) * 2018-07-09 2022-02-22 广州品唯软件有限公司 Brand recommendation method, electronic equipment, storage medium and system
CN109146606A (en) * 2018-07-09 2019-01-04 广州品唯软件有限公司 A kind of brand recommended method, electronic equipment, storage medium and system
CN109102127B (en) * 2018-08-31 2021-10-26 杭州贝购科技有限公司 Commodity recommendation method and device
CN109102127A (en) * 2018-08-31 2018-12-28 杭州贝购科技有限公司 Method of Commodity Recommendation and device
CN109408711A (en) * 2018-09-29 2019-03-01 北京三快在线科技有限公司 Data filtering method, device, electronic equipment and storage medium
CN109635186A (en) * 2018-11-16 2019-04-16 华南理工大学 A kind of real-time recommendation method based on Lambda framework
CN109783465B (en) * 2018-12-25 2023-09-08 吉林动画学院 Mass three-dimensional model integration system under cloud computing framework
CN109783465A (en) * 2018-12-25 2019-05-21 同济大学 Magnanimity threedimensional model integrated platform under a kind of cloud computing framework
CN109816495A (en) * 2019-02-13 2019-05-28 北京达佳互联信息技术有限公司 Merchandise news method for pushing, system and server and storage medium
CN109816495B (en) * 2019-02-13 2020-11-24 北京达佳互联信息技术有限公司 Commodity information pushing method, system, server and storage medium
CN110674964A (en) * 2019-05-14 2020-01-10 南京邮电大学 Search prediction system and method based on agricultural traceability information
CN110175287A (en) * 2019-05-22 2019-08-27 湖南大学 A kind of matrix decomposition implicit feedback recommended method and system based on Flink
CN110674408A (en) * 2019-09-30 2020-01-10 北京三快在线科技有限公司 Service platform, and real-time generation method and device of training sample
CN110674408B (en) * 2019-09-30 2021-06-04 北京三快在线科技有限公司 Service platform, and real-time generation method and device of training sample
CN111681085A (en) * 2020-06-10 2020-09-18 创新奇智(成都)科技有限公司 Commodity pushing method and device, server and readable storage medium
CN113112333A (en) * 2021-04-27 2021-07-13 湖南云畅网络科技有限公司 Data stream processing method and system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20170104