CN110569446B - Method and system for constructing recommended object candidate set - Google Patents

Method and system for constructing recommended object candidate set

Info

Publication number
CN110569446B
Authority
CN
China
Prior art keywords
classes
user
candidate set
target
objects
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910831714.XA
Other languages
Chinese (zh)
Other versions
CN110569446A (en)
Inventor
刘正夫
周振华
张孝丹
武润鹏
伍思恒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
4Paradigm Beijing Technology Co Ltd
Original Assignee
4Paradigm Beijing Technology Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0631 Item recommendations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Finance (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Accounting & Taxation (AREA)
  • Probability & Statistics with Applications (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method and system for constructing a candidate set of recommended objects are provided. The method comprises: acquiring an object data set and a user behavior data set, wherein the user behavior data comprises association information between users and objects; clustering the object data set to form a plurality of classes; for each target object, obtaining a first number of classes most similar to the class to which the target object belongs by calculating the similarity between different classes, and obtaining a second number of objects closest to the target object as an object candidate set of the target object by using the user behavior data to calculate the similarity between the target object and the objects in the class to which the target object belongs and in the first number of classes; and for each target user, constructing a recommended object candidate set for the target user based on the user behavior data set and the object candidate set of each target object.

Description

Method and system for constructing recommended object candidate set
Technical Field
The invention relates to a method and a system for constructing a candidate set of recommended objects.
Background
An e-commerce website hosts a large number of objects (e.g., goods, services, virtual goods, etc.). To increase user behavior (e.g., buying, clicking, collecting) on these objects, each large e-commerce platform builds its own recommendation system, which recommends different objects to different users and thereby achieves precise marketing.
A current e-commerce recommendation system generally comprises two steps: 1. generating an object candidate set, in which a small number of objects (for example, on the order of hundreds of commodities) is screened out from a large number of objects (for example, on the order of tens of millions of commodities), so as to reduce the computational load of the subsequent step; and 2. sorting, in which the objects in the object candidate set are scored and sorted by score, and the top-ranked objects are recommended to the corresponding users. Traditional methods for constructing the object candidate set are mainly object content-based recommendation and collaborative filtering-based recommendation.
"Object content-based recommendation" mainly calculates the similarity between objects according to the attributes of the objects, and then recommends similar objects to the relevant users. This method usually calculates the similarity between all pairs of objects, so when the number of objects is N, its time complexity is $O(N^2)$. In e-commerce scenarios, where the number of objects is very large, such a method spends a great deal of time calculating the similarity between objects.
"Collaborative filtering-based recommendation" mainly calculates the similarity between users or the similarity between objects according to user behavior data. However, the user-object matrix in e-commerce scenarios is often sparse, and collaborative filtering algorithms do not achieve good results on sparse matrices.
Disclosure of Invention
The invention aims to provide a method and a system for constructing a candidate set of recommended objects, which aim to solve the prior-art problems of excessive time complexity and/or the poor recommendation results of collaborative filtering algorithms.
The invention provides a method for constructing a candidate set of recommended objects, which comprises the following steps: acquiring an object data set and a user behavior data set, wherein the user behavior data comprises association information between users and objects; clustering the object data set to form a plurality of classes; for each target object, obtaining a first number of classes most similar to the class to which the target object belongs by calculating the similarity between different classes, and obtaining a second number of objects closest to the target object as an object candidate set of the target object by using the user behavior data to calculate the similarity between the target object and the objects in the class to which the target object belongs and in the first number of classes; and for each target user, constructing a recommended object candidate set for the target user based on the user behavior data set and the object candidate set of each target object.
In an embodiment according to the inventive concept, the step of calculating the similarity between the different classes may include: for each class in a plurality of classes, calculating the average value of all objects in the class on each feature to obtain the average feature vector corresponding to the class; for any two of the multiple classes, the similarity between the two classes is obtained by calculating the cosine distance between two average feature vectors respectively corresponding to the two classes.
In an embodiment according to the inventive concept, the step of using the user behavior data to calculate the similarity between the target object and the objects in the class to which the target object belongs and in the first number of classes may include: for each object in the class to which the target object belongs and in the first number of classes, the similarity between that object and the target object satisfies the following equation:
Figure GDA0002374934800000021
where i denotes an object in the class to which the target object belongs or in the first number of classes, j denotes the target object, $L_{ij}$ denotes the similarity between i and j, $C_{ij}$ denotes the number of users who have produced behavior on both i and j, and $C_i$ and $C_j$ denote the numbers of users who have produced behavior on i and on j, respectively.
In an embodiment according to the inventive concept, each piece of object data in the object data set may include at least one of a price of the object, a category of the object, a discount of the object, and a brand of the object.
In an embodiment according to the inventive concept, the user behavior data may include at least one of purchase data of the object by the user, a browsing record of the object by the user, and a collection record of the object by the user.
In an embodiment according to the inventive concept, the clustering the object data set to form a plurality of classes may include: and clustering the object data sets by adopting a K-means algorithm.
In an embodiment according to the inventive concept, the method may further comprise, prior to clustering the object data set, at least one of the following data processing steps: performing one-hot encoding on discrete features in the object data; normalizing continuous features of the object data; and performing dimension reduction on sparse features among the features of the object data.
In an embodiment according to the inventive concept, the step of performing dimension reduction on the sparse features among the features of the object data may include performing at least one of Singular Value Decomposition (SVD) and Principal Component Analysis (PCA).
In an embodiment according to the inventive concept, the step of constructing, for each target user, a recommended object candidate set for the target user based on the user behavior data set and the object candidate set of each target object may include: for each target user, excluding from the object candidate set the objects on which the user has already produced behavior, to generate the recommended object candidate set.
Another aspect of the present invention provides a system for constructing a candidate set of recommended objects, the system comprising: a data acquisition unit that acquires an object data set and a user behavior data set, wherein the user behavior data comprises association information between users and objects; a classification unit that clusters the object data set to form a plurality of classes; an object candidate set generating unit that, for each target object, obtains a first number of classes most similar to the class to which the target object belongs by calculating the similarity between different classes, and obtains a second number of objects closest to the target object as an object candidate set of the target object by using the user behavior data to calculate the similarity between the target object and the objects in the class to which the target object belongs and in the first number of classes; and a recommended object candidate set generating unit that constructs, for each target user, a recommended object candidate set for the target user based on the user behavior data set and the object candidate set of each target object.
In an embodiment according to the inventive concept, the object candidate set generating unit may include: a class similarity calculation unit calculating a similarity between different classes; and an object similarity calculation unit that calculates a similarity between the objects using the user behavior data.
In an embodiment according to the inventive concept, the class similarity calculation unit may include: an average feature vector calculation unit that, for each class in the plurality of classes, calculates the average value of all objects in the class on each feature to obtain an average feature vector corresponding to the class; and a cosine calculation unit that, for any two classes in the plurality of classes, calculates the cosine distance between the two average feature vectors respectively corresponding to the two classes to obtain the similarity between the two classes.
In an embodiment according to the inventive concept, the object similarity calculation unit may calculate the similarity between the target object and each object in the class to which the target object belongs and in the first number of classes, where the similarity satisfies the following equation:
Figure GDA0002374934800000031
where i denotes an object in the class to which the target object belongs or in the first number of classes, j denotes the target object, $L_{ij}$ denotes the similarity between i and j, $C_{ij}$ denotes the number of users who have produced behavior on both i and j, and $C_i$ and $C_j$ denote the numbers of users who have produced behavior on i and on j, respectively.
In an embodiment according to the inventive concept, the classification unit may cluster the object data sets using a K-means algorithm.
In an embodiment according to the inventive concept, the system further comprises: a preprocessing unit that performs at least one of the following data processing steps before the classification unit clusters the object data set: performing one-hot encoding on discrete features in the object data; normalizing continuous features of the object data; and performing dimension reduction on sparse features among the features of the object data.
In an embodiment according to the inventive concept, the preprocessing unit may perform the dimensionality reduction on the sparse feature among the features of the object data using at least one of SVD and PCA.
In an embodiment according to the inventive concept, the recommended object candidate set generating unit may, for each target user, exclude from the object candidate set the objects on which the user has already produced behavior, to generate the recommended object candidate set.
Another aspect of the present invention provides a computer-readable storage medium storing computer instructions, wherein the instructions, when executed by at least one computing device, cause the at least one computing device to perform the method of constructing a candidate set of recommended objects as described above.
Another aspect of the present invention provides a system comprising at least one computing device and at least one storage device storing instructions, wherein the instructions, when executed by the at least one computing device, cause the at least one computing device to perform the method of constructing a candidate set of recommended objects as described above.
According to one or more aspects of the invention, the method and the system for constructing the candidate set of recommended objects generate the candidate set of recommended objects based on the similarity between different classes and the user behavior data, so that the time complexity of calculation is reduced, the problem of large calculation amount of object content-based recommendation is solved, and the user behavior can be better characterized.
Drawings
These and/or other aspects and advantages of the present disclosure will become more apparent and more readily appreciated from the following detailed description of the embodiments of the present disclosure, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flowchart illustrating a method of building a candidate set of recommendation objects according to an exemplary embodiment of the present disclosure;
FIG. 2 is a flowchart illustrating step S30 in FIG. 1;
FIG. 3 is a block diagram illustrating a system for building a candidate set of recommendation objects in accordance with an exemplary embodiment of the present disclosure; and
FIG. 4 is a schematic diagram illustrating an environment in which recommendation object candidate set construction using the system of FIG. 3 is applied according to an exemplary embodiment of the present disclosure.
Detailed Description
Embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart illustrating a method of constructing a recommendation object candidate set according to an exemplary embodiment of the present disclosure.
Referring to FIG. 1, in step S10, an object data set and a user behavior data set may be obtained, wherein the user behavior data includes association information between users and objects. The object data set relates only to the objects themselves, e.g., "price of object", "category of object", "discount of object", "brand of object", etc. The user behavior data set primarily records associations between users and objects, where an association may refer to a behavior that a user has produced on an object. For example, the user behavior data set may include "purchase data of the object by the user", "browsing records of the object by the user", "collection records of the object by the user", and so on.
In embodiments of the present disclosure, an "object" may be a good, service, or virtual good.
The step S10 will be explained below by way of example.
For example, the object data set acquired at step S10 is as follows:
Figure GDA0002374934800000051
Here, p1 through p5 are object identifiers (e.g., pids), and each object has three features: price, discount, and color.
For example, the user behavior data set acquired at step S10 is as follows:
Figure GDA0002374934800000052
Here, u1 through u5 are user identifiers (e.g., uids), and p1 through p5 are the pids in the object data set. An entry is 1 if the user has produced behavior on the corresponding object, and 0 otherwise. The behavior of the user may include purchase of the object by the user, browsing of the object by the user, collection of the object by the user, and the like.
In step S20, the object data set may be clustered to form a plurality of classes. The K-means algorithm (Kmeans) may be selected as the clustering algorithm. Kmeans is currently one of the most common clustering algorithms in the industry because it is simple to implement, easy to parallelize, and fast to compute. After clustering is completed, each object is assigned a category identifier.
In an example embodiment, before clustering the object data set (i.e., step S20), a step of performing preprocessing on the object data set may be further included (not shown).
For example, the preprocessing step may include performing one-hot encoding on the discrete features in the object data, so that the discrete data is quantized for calculation in the subsequent steps.
For example, the preprocessing step may include normalizing the continuous features of the object data. The normalization may, for example, apply a logarithmic function (y = ln(x)) to bring different feature dimensions onto a comparable scale, so that features with relatively large values do not completely dominate the clustering.
For example, the preprocessing step may include performing dimension reduction on sparse features among the features of the object data. If there are many sparse features (i.e., features whose value is 0 for most objects), dimension reduction is required. A typical dimension reduction method may employ at least one of Singular Value Decomposition (SVD) and Principal Component Analysis (PCA).
In the above-described example embodiments, clustering the objects is performed after performing preprocessing on the object data set. It should be noted that corresponding preprocessing steps may be performed according to the structure and characteristics of the object data set, and in some example embodiments, one, more, or all of the preprocessing steps may be omitted.
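The one-hot encoding and logarithmic normalization described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the raw feature values, the helper names `one_hot` and `preprocess`, and the alphabetical ordering of the color categories are all hypothetical, and dimension reduction is left out because the toy data is not sparse.

```python
import math

# Hypothetical raw object data (illustrative values, not taken from the figures).
objects = {
    "p1": {"price": 900.0, "discount": 7.0, "color": "red"},
    "p2": {"price": 1000.0, "discount": 7.0, "color": "red"},
    "p3": {"price": 14.0, "discount": 1.5, "color": "blue"},
}


def one_hot(value, categories):
    """Encode a discrete value as a 0/1 vector over an ordered category list."""
    return [1.0 if value == c else 0.0 for c in categories]


def preprocess(objects, continuous=("price", "discount"), discrete="color"):
    """Return {pid: feature vector}: ln-normalized continuous features
    followed by the one-hot encoding of the discrete feature."""
    categories = sorted({o[discrete] for o in objects.values()})
    features = {}
    for pid, o in objects.items():
        # y = ln(x); this assumes every continuous value is strictly positive.
        vec = [math.log(o[name]) for name in continuous]
        vec += one_hot(o[discrete], categories)
        features[pid] = vec
    return features


features = preprocess(objects)
```

Each resulting vector holds the two log-scaled continuous features followed by one 0/1 entry per color, which is the shape of input the clustering step expects.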
The preprocessing steps are described in detail below using the object data set obtained in step S10. The raw data contains 5 objects, p1 through p5, each with 3 features (i.e., price, discount, color), where price and discount are continuous features and color is a discrete feature. The data obtained after one-hot encoding of the discrete feature is as follows:
Figure GDA0002374934800000061
Then, the continuous features are normalized, with the following results:
Figure GDA0002374934800000071
Since the result is not a sparse matrix (in the prior art, a matrix is generally considered sparse when more than 65% of its entries are 0), the dimension reduction step can be omitted. The present exemplary embodiment is merely an example and should not be construed as limiting the inventive concept.
After the preprocessing step, step S20 may be performed, for example, the preprocessed results may be clustered using a kmeans algorithm to form a plurality of classes, resulting in the following results:
Figure GDA0002374934800000072
The category column gives the class label of each object after clustering, denoted A, B, or C.
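For readers unfamiliar with K-means, the clustering step can be sketched with a minimal version of Lloyd's algorithm. This is only an illustration: the function name `kmeans`, the two-dimensional sample points, and the choice of the first k points as initial centroids (made here so the run is deterministic) are assumptions, and a production system would more likely rely on an established, parallel library implementation.

```python
def kmeans(points, k, iterations=100):
    """Minimal Lloyd's algorithm on a list of equal-length float lists.

    Initial centroids are simply the first k points, which is a
    simplification chosen to keep this sketch deterministic.
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid (squared Euclidean).
        new_labels = []
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            new_labels.append(dists.index(min(dists)))
        if new_labels == labels:
            break  # assignments are stable, so the centroids are final
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical two-feature points forming two visibly separate groups.
points = [[6.8, 1.9], [6.9, 2.0], [2.6, 0.3], [2.7, 0.4], [6.2, 1.6]]
labels, centroids = kmeans(points, k=2)
```

The returned `labels` list plays the role of the category column: objects that share a label belong to the same class.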
In step S30, for each target object, the k classes most similar to the class to which the target object belongs are obtained by calculating the similarity between different classes, and the M objects closest to the target object are obtained as the object candidate set of the target object by using the user behavior data to calculate the similarity between the target object and the objects in the class to which the target object belongs and in the k classes.
Fig. 2 is a flowchart illustrating step S30 in fig. 1, and step S30 is further described with reference to fig. 1 and 2.
In step S32, the step of calculating the similarity between the different classes may include: for each class in the plurality of classes, calculating the average value of all objects in the class on each feature to obtain an average feature vector corresponding to the class, so that each class can be represented by a single average vector; and for any two of the classes, obtaining the similarity between the two classes by calculating the cosine distance between the two average feature vectors respectively corresponding to the two classes. Because the number of classes is very small relative to the number of objects, calculating cosine distances here does not consume much computation.
Specifically, in step S32, the average values of different classes in each feature dimension are calculated based on the results obtained after the step S20 is executed in the above example, and the specific results are as follows:
Price   Discount   Color (red)   Color (blue)   Color (green)   Category
6.85    1.95       1             0              0               A
2.65    0.35       0             1              0               B
6.20    1.60       0             0              1               C
Each class is represented here by a vector of the 5 features listed above. Classes A, B, and C can thus be represented by the vectors (6.85, 1.95, 1, 0, 0), (2.65, 0.35, 0, 1, 0), and (6.20, 1.60, 0, 0, 1), respectively. The cosine distance between each pair of classes can then be calculated from these vectors; the distance between A and B is
Figure GDA0002374934800000081
Using the same calculation, $D_{AC} = 0.978$ and $D_{BC} = 0.918$.
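The class-level comparison can be checked with a short Python sketch that uses the three average feature vectors listed above. The helper names `mean_vector`, `cosine`, and `most_similar_classes` are hypothetical. One assumption is worth stating plainly: the quantity the text calls cosine distance is treated here as the cosine of the angle between the vectors, so that a larger value means more similar, which is how the worked example uses it.

```python
import math


def mean_vector(vectors):
    """Average a list of equal-length feature vectors, feature by feature."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Average feature vectors of classes A, B, and C from the table above.
class_means = {
    "A": [6.85, 1.95, 1, 0, 0],
    "B": [2.65, 0.35, 0, 1, 0],
    "C": [6.20, 1.60, 0, 0, 1],
}


def most_similar_classes(name, class_means, k):
    """Return the k other classes with the largest cosine to class `name`."""
    others = [c for c in class_means if c != name]
    # Sort by descending cosine; ties are broken by class name.
    others.sort(key=lambda c: (-cosine(class_means[name], class_means[c]), c))
    return others[:k]


d_ac = cosine(class_means["A"], class_means["C"])
d_bc = cosine(class_means["B"], class_means["C"])
nearest_to_a = most_similar_classes("A", class_means, k=1)
```

With these vectors, `d_ac` and `d_bc` agree with the reported 0.978 and 0.918 to within about 0.001, and class C comes out as the class most similar to class A.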
In step S34, the step of using the user behavior data to calculate the similarity between the target object and the objects in the class to which the target object belongs and in the k classes may include: for each object in the class to which the target object belongs and in the k classes, the similarity between the target object and that object satisfies the following equation:
Figure GDA0002374934800000082
where i denotes one of the objects in the class to which the target object belongs or in the k classes, j denotes the target object, $L_{ij}$ denotes the similarity between i and j, $C_{ij}$ denotes the number of users who have produced behavior on both i and j, and $C_i$ and $C_j$ denote the numbers of users who have produced behavior on i and on j, respectively.
This similarity between objects only requires counting user behaviors on different objects, and is therefore faster to compute than a similarity measured by cosine distance.
Specifically, in step S34, the similarity between p1 and p3 can be calculated from the user behavior data set obtained in step S10 in the above example, as follows:
Figure GDA0002374934800000083
Here, $C_1 = 3$, $C_3 = 3$, and $C_{13} = 2$, and thus $L_{13} = 0.22$. The similarity of any two objects can be obtained by the same calculation. Compared with the traditional cosine distance, the similarity measure provided by the invention only requires counting user behaviors (e.g., purchases) on different objects, which reduces the amount of computation and makes it better suited to e-commerce scenarios.
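The counting involved can be sketched in Python. Because the equation itself appears only as a figure, the form $L_{ij} = C_{ij} / (C_i \cdot C_j)$ used below is an assumption, chosen because it reproduces the worked values ($2 / (3 \times 3) \approx 0.22$). The behavior data is also hypothetical: only the rows for u1 and u2 follow what the text states about those users, and the remaining rows were picked so that the counts match the example. The helper names `user_count` and `item_similarity` are illustrative.

```python
# Hypothetical behavior data: each user maps to the set of objects acted on.
# Rows u3 to u5 are invented so that C_1 = 3, C_3 = 3, and C_13 = 2.
behavior = {
    "u1": {"p1", "p3"},
    "u2": {"p1", "p2", "p3", "p4"},
    "u3": {"p1", "p4"},
    "u4": {"p2", "p3", "p5"},
    "u5": {"p2", "p5"},
}


def user_count(behavior, *items):
    """Number of users who have produced behavior on every object in `items`."""
    return sum(1 for acted in behavior.values() if all(i in acted for i in items))


def item_similarity(behavior, i, j):
    """Assumed form L_ij = C_ij / (C_i * C_j); 0.0 if either object is unused."""
    c_i = user_count(behavior, i)
    c_j = user_count(behavior, j)
    if c_i == 0 or c_j == 0:
        return 0.0
    return user_count(behavior, i, j) / (c_i * c_j)


l_13 = item_similarity(behavior, "p1", "p3")
l_12 = item_similarity(behavior, "p1", "p2")
l_15 = item_similarity(behavior, "p1", "p5")
```

Each similarity needs only three counts over the behavior data, with no vector arithmetic, which reflects the speed advantage the text claims over a cosine-based measure.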
In step S36, M objects closest to the target object are obtained as an object candidate set of the target object.
Specifically, to generate the M objects closest to an object j as the object candidate set of object j, the following steps may be performed:
1. obtaining the class c to which the object belongs, and obtaining the k classes most similar to class c according to the similarity between the classes obtained in step S32, where k may be 3 in an embodiment;
2. merging the objects in class c and the k classes, and then calculating, according to step S34, the M objects closest to object j among the objects in class c and the k classes to obtain the object candidate set, where M may be 100 in one embodiment and 20 in another embodiment.
It should be noted that the values of k and M given above are only for ease of understanding and should not be construed as limiting the inventive concept.
For ease of explanation, step S30 is described in more detail below by building the object candidate set of object p1 based on the results obtained after performing step S20 in the above example. First, the class labels of all objects are obtained:
Figure GDA0002374934800000091
Object p1 belongs to class A. From the similarities between classes obtained in step S32, class C is the class most similar to class A. For convenience, k is set to 1 here. Classes A and C are then merged to obtain the set {p1, p2, p5}. In step S34, the similarity between object p1 and each other object in the set {p1, p2, p5} is calculated, giving $L_{12} = 0.11$ and $L_{15} = 0$. In step S36, M is likewise set to 1 for convenience, so that the object candidate set of object p1 is {p2}; that is, when a user is interested in object p1, object p2 can be recommended to that user.
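The two steps for one target object can be combined into a small Python sketch. The function name `object_candidates` and the lookup-table layout are hypothetical. The class assignment is also an assumption: the actual assignment is given only in a figure, and the one used here was chosen to be consistent with the merged set {p1, p2, p5}. The A-B similarity of 0.918 was computed from the average vectors in the text rather than quoted from it.

```python
# Assumed class assignment, consistent with merging A and C into {p1, p2, p5}.
class_of = {"p1": "A", "p2": "A", "p3": "B", "p4": "B", "p5": "C"}

# Class similarities keyed by unordered class pairs (A-B value is computed).
class_sim = {
    frozenset(("A", "B")): 0.918,
    frozenset(("A", "C")): 0.978,
    frozenset(("B", "C")): 0.918,
}

# Object similarities from the worked example, keyed by unordered object pairs.
item_sim = {
    frozenset(("p1", "p2")): 0.11,
    frozenset(("p1", "p5")): 0.0,
}


def object_candidates(target, class_of, class_sim, item_sim, k, m):
    """Return the m objects most similar to `target`, searched only within
    the target's own class and the k classes most similar to that class."""
    own = class_of[target]
    others = sorted(
        (c for c in set(class_of.values()) if c != own),
        key=lambda c: (-class_sim[frozenset((own, c))], c),
    )[:k]
    allowed = {own, *others}
    pool = [p for p, c in class_of.items() if c in allowed and p != target]
    # Missing pairs are treated as similarity 0.0; ties are broken by object id.
    pool.sort(key=lambda p: (-item_sim.get(frozenset((target, p)), 0.0), p))
    return pool[:m]


candidates_p1 = object_candidates("p1", class_of, class_sim, item_sim, k=1, m=1)
```

Restricting the search to a few classes is what keeps the number of object-to-object comparisons well below the all-pairs case.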
Similar to the method of obtaining the recommended object of the object p1, an object candidate set for each object can be obtained as follows:
Object   Candidate set
p1       p2
p2       p5
p3       p4, p5
p4       p3
p5       p2
Referring back to FIG. 1, in step S40, for each target user, a recommended object candidate set for the target user is constructed based on the user behavior data set and the object candidate set of each target object.
Specifically, for each target user, the object candidate sets of the objects on which the user has produced behavior are merged according to the behavior data of the user, and the objects on which the user has already produced behavior are excluded from the merged object candidate set to generate the recommended object candidate set.
For convenience of explanation, the recommended object candidate set is generated below using the results obtained after performing step S30 in the above example. For example, user u1 has already produced behavior on objects p1 and p3, so the recommended object candidate set of user u1 may be {p2, p4, p5}. As another example, user u2 has already produced behavior on objects p1, p2, p3, and p4, so the objects p2, p3, and p4 on which behavior has been produced can be excluded from {p2, p3, p4, p5}, giving {p5} as the recommended object candidate set of user u2. The specific results are as follows:
Figure GDA0002374934800000101
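The merge-and-exclude logic of step S40 can be sketched in Python using the object candidate table above and the behaviors stated for users u1 and u2. The function name `recommended_candidates` and the dictionary layout are illustrative assumptions.

```python
# Object candidate sets from the table above.
candidate_sets = {
    "p1": ["p2"],
    "p2": ["p5"],
    "p3": ["p4", "p5"],
    "p4": ["p3"],
    "p5": ["p2"],
}

# Objects on which each user has already produced behavior, as stated above.
behavior = {
    "u1": {"p1", "p3"},
    "u2": {"p1", "p2", "p3", "p4"},
}


def recommended_candidates(user, behavior, candidate_sets):
    """Merge the candidate sets of every object the user acted on, then
    remove the objects the user has already acted on."""
    acted = behavior[user]
    merged = set()
    for obj in acted:
        merged.update(candidate_sets.get(obj, []))
    return merged - acted


rec_u1 = recommended_candidates("u1", behavior, candidate_sets)
rec_u2 = recommended_candidates("u2", behavior, candidate_sets)
```

For these inputs the sketch yields {p2, p4, p5} for u1 and {p5} for u2, matching the two cases worked through in the text.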
FIG. 3 is a block diagram illustrating a system 10 for building a candidate set of recommendation objects according to an exemplary embodiment of the present disclosure. As an example, the methods illustrated in FIGS. 1 and 2 may be performed by the system 10 illustrated in FIG. 3.
As shown in FIG. 3, system 10 may be a system for performing the construction of a candidate set of recommended objects. The system 10 may include: a data acquisition unit 110, a classification unit 120, an object candidate set generation unit 130, and a recommended object candidate set generation unit 140.
The data acquisition unit 110 is configured to acquire an object data set and a user behavior data set, where the user behavior data includes association information between users and objects. The data acquisition unit 110 may be configured to perform the method described with reference to step S10 above, and thus redundant description is omitted herein.
The classification unit 120 clusters the object data set to form a plurality of classes. The classification unit 120 may cluster the object data sets using a K-means algorithm. The classification unit 120 may be configured to perform the method described with reference to the above-described step S20, and thus redundant description is omitted herein.
In an embodiment, the system 10 may further comprise a pre-processing unit (not shown) which may perform a pre-processing step on the object data set before the classification unit 120 clusters the object data set. The preprocessing unit may be configured to perform the method described with reference to the preprocessing step described above, and thus redundant description is omitted herein.
For each target object, the object candidate set generating unit 130 may obtain the k classes that are most similar to the class to which the target object belongs by calculating similarities between the different classes, and may obtain the M objects closest to the target object as the object candidate set of the target object by using the user behavior data to calculate similarities between the target object and the objects in the class to which the target object belongs and in the k classes, where k and M are positive integers. The object candidate set generating unit 130 may be configured to perform the method described with reference to step S30 above, and thus redundant description is omitted herein.
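As a non-limiting illustration, the selection of the M closest objects can be sketched as below. The class labels, the table of most similar classes, and the similarity scores are all hypothetical inputs; the sketch only shows how the search is restricted to the target's own class plus its k most similar classes.

```python
def object_candidate_set(target, labels, similar_classes, similarity, m):
    """Return the M objects most similar to `target` within its class and the k similar classes."""
    own = labels[target]
    allowed = {own, *similar_classes[own]}
    pool = [o for o, c in labels.items() if c in allowed and o != target]
    # Rank by behavior-based similarity, highest first; ties broken by object id.
    pool.sort(key=lambda o: (-similarity(target, o), o))
    return pool[:m]


# Hypothetical inputs: class labels, the k=1 most similar class per class,
# and a symmetric table of behavior-based similarities.
labels = {"p1": 0, "p2": 0, "p3": 1, "p4": 1, "p5": 2}
similar_classes = {0: [1], 1: [0], 2: [1]}
scores = {("p1", "p2"): 0.9, ("p1", "p3"): 0.2, ("p1", "p4"): 0.6, ("p1", "p5"): 0.8}


def similarity(a, b):
    """Look up a score in either key order; unknown pairs count as 0.0."""
    return scores.get((a, b), scores.get((b, a), 0.0))


print(object_candidate_set("p1", labels, similar_classes, similarity, m=2))
```

In this assumed data, p5 has a high score but lies in a class that is not among the similar classes of p1, so it is never compared with p1; that omission is the source of the reduced computation.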
In an example embodiment, the object candidate set generating unit 130 may include a class similarity calculating unit 210 and an object similarity calculating unit 220.
The class similarity calculation unit 210 may calculate the similarity between different classes. The class similarity calculation unit 210 may include an average feature vector calculation unit 211 and a cosine calculation unit 212. In an embodiment, for each of a plurality of classes, the average feature vector calculating unit 211 may calculate an average value of all objects in the class over respective features to obtain an average feature vector corresponding to the class. In an embodiment, for any two of the multiple classes, the cosine calculating unit 212 may obtain the similarity between the two classes by calculating a cosine distance between two average feature vectors respectively corresponding to the two classes. The class similarity calculation unit 210 may be configured to perform the method described with reference to the above-described step S32, and thus redundant description is omitted herein.
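As a non-limiting illustration, the average feature vector and cosine computation can be sketched as follows. The sketch treats the cosine of the angle between two average vectors as the class similarity (a higher value means more similar), which is one reading of "cosine distance"; the class contents and function names are hypothetical.

```python
import math


def mean_vector(vectors):
    """Average a list of equal-length feature vectors feature by feature."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar_classes(class_members, k):
    """For each class, list the k other classes with the highest cosine similarity."""
    means = {c: mean_vector(vs) for c, vs in class_members.items()}
    return {
        c: sorted(
            (o for o in means if o != c),
            key=lambda o: cosine_similarity(means[c], means[o]),
            reverse=True,
        )[:k]
        for c in means
    }


# Hypothetical classes, each holding the feature vectors of its objects.
class_members = {
    "A": [[1.0, 0.0], [0.8, 0.2]],
    "B": [[0.9, 0.3], [0.7, 0.1]],
    "C": [[0.0, 1.0], [0.2, 0.8]],
}
print(most_similar_classes(class_members, k=1))
```

Because only one average vector per class is compared, this step involves on the order of m squared comparisons for m classes, rather than comparisons between individual objects.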
The object similarity calculation unit 220 may calculate the similarity between objects using the user behavior data. Specifically, in an example embodiment, the object similarity calculation unit 220 may calculate the similarity between objects using the following equation:
Figure GDA0002374934800000111
where i and j represent objects, L_ij denotes the degree of similarity between i and j, C_ij denotes the number of users who have produced behavior on both i and j, and C_i and C_j denote the number of users who have produced behavior on i and on j, respectively. The object similarity calculation unit 220 may be configured to perform the method described with reference to step S34 above, and thus redundant description is omitted herein.
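As a non-limiting illustration, the three counts used by the equation can be gathered from a behavior log as sketched below. Since the equation itself appears only as a figure, the way the counts are combined here, namely the co-occurrence count divided by the square root of the product of the two single counts, is an assumption and is not necessarily the equation of the embodiment. The log contents and function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def behavior_counts(user_objects):
    """Count C_i (users acting on i) and C_ij (users acting on both i and j)."""
    c_single = defaultdict(int)
    c_pair = defaultdict(int)
    for objs in user_objects.values():
        unique = sorted(set(objs))
        for o in unique:
            c_single[o] += 1
        # Pairs are stored with the smaller object id first.
        for a, b in combinations(unique, 2):
            c_pair[(a, b)] += 1
    return c_single, c_pair


def similarity(i, j, c_single, c_pair):
    """Assumed normalization C_ij / sqrt(C_i * C_j); not necessarily the patent's equation."""
    c_ij = c_pair.get((min(i, j), max(i, j)), 0)
    denom = math.sqrt(c_single.get(i, 0) * c_single.get(j, 0))
    return c_ij / denom if denom else 0.0


# Hypothetical behavior log: user -> objects the user acted on.
user_objects = {"u1": ["p1", "p3"], "u2": ["p1", "p2", "p3", "p4"], "u3": ["p2", "p4"]}
c_single, c_pair = behavior_counts(user_objects)
print(c_pair[("p1", "p3")], c_single["p1"], c_single["p3"])
print(similarity("p1", "p3", c_single, c_pair))
```

Only counts over the behavior data are needed, so no object feature vectors enter this step.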
For each target user, the recommended object candidate set generating unit 140 may construct a recommended object candidate set for the target user based on the user behavior data set and the object candidate set of each target object. The recommended object candidate set generating unit 140 may be configured to perform the method described with reference to step S40 above, and thus redundant description is omitted herein.
In summary, to address the large amount of computation required by recommendation based on object content, the invention first clusters the objects according to their features, then calculates the similarity between classes, and then calculates the similarity between objects within several similar classes. If the total number of objects is N, the N objects are grouped into m classes, and the average number of objects in each class is n, the method of the invention can reduce the time complexity of calculating object similarity from O(N^2) to O(n^2).
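As a non-limiting illustration, a rough count of object-to-object comparisons is sketched below for assumed sizes. The count ignores the cost of clustering and of comparing class average vectors, and the sizes (10,000 objects, 100 classes, k = 2) are hypothetical.

```python
def comparison_counts(total_objects, num_classes, k):
    """Compare brute-force pair counts with counts restricted to k + 1 classes."""
    avg_class_size = total_objects // num_classes
    # Brute force: every target object is compared with every other object.
    brute_force = total_objects * (total_objects - 1)
    # Clustered: each target is compared only with objects in its own class
    # and in the k most similar classes.
    per_target = (k + 1) * avg_class_size - 1
    clustered = total_objects * per_target
    return brute_force, clustered


# Assumed sizes: N = 10,000 objects, m = 100 classes (so n = 100), k = 2.
brute_force, clustered = comparison_counts(10_000, 100, 2)
print(brute_force, clustered, round(brute_force / clustered, 1))
```

Under these assumptions each target object is compared with 299 objects instead of 9,999, so the saving grows as the classes become small relative to the full object set.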
To better characterize user behavior, the similarity between objects in similar classes is calculated from user behavior data rather than directly from the object features. The object similarity calculation defined by the inventive concept differs from the traditional approach in collaborative filtering, which calculates similarity using the cosine distance: it not only makes use of the users' behavior information but also reduces the amount of computation.
FIG. 4 is a schematic diagram illustrating an environment in which recommendation object candidate set construction using the system 10 of FIG. 3 according to an exemplary embodiment of the present disclosure is applied. It should be noted that the scenarios illustrated in the figures are only examples and are not intended to limit the exemplary embodiments of the present invention in any way.
The environment shown in FIG. 4 may include a system 10 for building a candidate set of recommended objects, a network 20, and user terminals 30 and 40. Here, it should be noted that the user terminal 30 and the user terminal 40 may respectively refer to a plurality of terminals.
Here, the system 10 may be the system 10 described above with reference to FIG. 3. The system 10 may be deployed at an IT facility of an entity engaged in material distribution, a carrier, an e-commerce website, or the like, or at an IT facility of an entity that specializes in providing recommendation services. The network 20 may include routers, switches, servers, cloud servers, and the like. The user terminals 30 and 40 may include any type of electronic product that can access the network 20, such as a cellular phone, a smart phone, a tablet computer, a wearable device, a personal digital assistant (PDA), a portable multimedia player (PMP), a digital camera, a music player, a portable game console, a navigation system, a digital television, a 3D television, a personal computer (PC), a home appliance, a laptop computer, and the like. The user terminals 30 and 40 may also be desktop computers, workstation computers, or servers. The user terminals 30 and 40 may access the network 20 and/or servers in the network 20 via an Ethernet protocol, an Internet Protocol (IP) based protocol, a Transmission Control Protocol (TCP) based protocol, a User Datagram Protocol (UDP) based protocol, a Remote Direct Memory Access (RDMA) based protocol, an NVMe over Fabrics (NVMe-oF) based protocol, or a combination thereof.
Further, the user terminal 30 may be an object provider and upload objects to the network 20 and/or servers in the network 20 to form an object data set. The user terminal 40 may issue a recommendation request to the network 20 and/or a server in the network 20, which is then forwarded to the system 10 by the network 20 and/or the server in the network 20, and/or the user terminal 40 may issue a recommendation request directly to the system 10.
Upon receipt of the recommendation request by system 10, a candidate set of recommended objects may be generated based on the object data set and the user behavior data set corresponding to user terminal 40 stored within itself, network 20, and/or a server in network 20. The method of generating the candidate set of recommendation objects may be the same as the method described above with reference to fig. 1, 2, and 3, and will not be described herein again.
After the system 10 generates the candidate set of recommended objects, it may be provided to the user terminal 40 through the network 20 directly or via a third party.
The units included in the system 10 for building a candidate set of recommended objects according to an exemplary embodiment of the present invention may each be configured as software, hardware, firmware, or any combination thereof that performs a specific function. These units may correspond, for example, to an application-specific integrated circuit, to pure software code, or to a module combining software and hardware. Further, one or more functions implemented by these units may also be collectively performed by components in a physical device (e.g., a processor, a client, a server, or the like).
A computing device for constructing a candidate set of recommended objects is also provided in an exemplary embodiment of the invention. The computing device may be deployed in a server or a client, or on a node device in a distributed network environment. Further, the computing device may be a personal computer, a tablet device, a personal digital assistant, a smart phone, a web application, or another device capable of executing the set of instructions described above.
The computing device need not be a single computing device, but can be any device or collection of circuits capable of executing the instructions (or sets of instructions) described above, individually or in combination. The computing device may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces locally or remotely (e.g., via wireless transmission).
In the computing device, the processor may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a programmable logic device, a special purpose processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, processors may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like.
Some of the operations described in the method for constructing a candidate set of recommended objects according to an exemplary embodiment of the present invention may be implemented by software, some of the operations may be implemented by hardware, and further, the operations may be implemented by a combination of hardware and software.
The processor may execute instructions or code stored in one of the storage components, which may also store data. Instructions and data may also be transmitted and received over a network via a network interface device, which may employ any known transmission protocol.
The storage component may be integral to the processor, e.g., with RAM or flash memory disposed within an integrated circuit microprocessor or the like. Further, the storage component may comprise a stand-alone device, such as an external disk drive, a storage array, or any other storage device usable by a database system. The storage component and the processor may be operatively coupled or may communicate with each other, for example through an I/O port or a network connection, so that the processor can read files stored in the storage component.
Further, the computing device may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of the computing device may be connected to each other via a bus and/or a network.
Operations involved in a method for building a candidate set of recommended objects according to an exemplary embodiment of the present invention may be described as various interconnected or coupled functional blocks or functional diagrams. However, these functional blocks or functional diagrams may equally be integrated into a single logic device or operate according to non-exact boundaries.
For example, as described above, a system is provided comprising at least one computing device and at least one storage device storing instructions, wherein the instructions, when executed by the at least one computing device, cause the at least one computing device to perform the steps as described with reference to fig. 1 and 2.
That is, the method for constructing a candidate set of recommendation objects shown in fig. 1 and 2 may be performed by the computing apparatus described above. Since the above-mentioned method for constructing the candidate set of recommendation objects has been described in detail in fig. 1 and fig. 2, the contents of this part of the present invention are not repeated.
Alternatively, the system and the computing device for constructing the candidate set of recommended objects may be integrated in a server on the platform side (e.g., an e-commerce website); for example, they may be integrated in a server of an application program that provides objects (e.g., goods or services). In addition, they may be integrated in a third-party server that provides the candidate set of recommended objects (for example, through an API provided by the third-party server), and the platform side can make recommendations to the user according to that candidate set.
It is to be understood that the method for constructing a candidate set of recommended objects according to an exemplary embodiment of the present invention may be implemented by a program recorded on a computer-readable medium. For example, according to an exemplary embodiment of the present invention, there may be provided a computer-readable medium storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform the steps described with reference to FIGS. 1 and 2.
The computer program in the computer-readable medium may be executed in an environment deployed in a computer device such as a client, a host, a proxy device, or a server. It should be noted that the computer program may also be used to perform additional steps beyond the above steps, or to perform more specific processing when the above steps are performed. The additional steps and the further processing have been described with reference to FIGS. 1 and 2 and are not described again here to avoid repetition.
It should be noted that the system for constructing a candidate set of recommended objects according to an exemplary embodiment of the present invention may rely entirely on the execution of the computer program to realize the corresponding functions. That is, each unit corresponds to a step in the functional architecture of the computer program, so that the entire system is invoked through a dedicated software package (e.g., a library) to realize the corresponding functions.
On the other hand, the respective units included in the system for constructing a candidate set of recommended objects according to an exemplary embodiment of the present invention may also be implemented by hardware, software, firmware, middleware, microcode, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the corresponding operations may be stored in a computer-readable medium such as a storage medium, so that a processor may perform the corresponding operations by reading and executing the corresponding program code or code segments.
While exemplary embodiments of the invention have been described above, it should be understood that the above description is illustrative only and not exhaustive, and that the invention is not limited to the exemplary embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. Therefore, the protection scope of the present invention should be subject to the scope of the claims.

Claims (21)

1. A method of constructing a candidate set of recommended objects, the method comprising:
acquiring an object data set and a user behavior data set, wherein the user behavior data comprises associated information between a user and an object;
clustering the object data sets to form a plurality of classes;
for each target object, obtaining a first number of classes which are most similar to the class to which the target object belongs by calculating similarity between different classes, and obtaining a second number of objects which are closest to the target object as an object candidate set of the target object by using user behavior data to calculate similarity between the target object and the objects in the class to which the target object belongs and in the first number of classes; and
for each target user, constructing a recommended object candidate set for the target user based on the user behavior data set and the object candidate set for each target object.
2. The method of claim 1, wherein the step of calculating the similarity between the different classes comprises:
for each class in the multiple classes, calculating the average value of all objects in the class on each feature to obtain the average feature vector corresponding to the class;
and for any two classes in the multiple classes, calculating the cosine distance between two average feature vectors respectively corresponding to the two classes to obtain the similarity between the two classes.
3. The method of claim 1, wherein the step of utilizing user behavior data to calculate the similarity between the target object and the objects in the class to which the target object belongs and the first number of classes comprises: for each object in the class to which the target object belongs and in the first number of classes, the similarity between the target object and that object satisfies the following equation:
Figure FDA0003519125460000011
wherein
i represents an object among the objects in the class to which the target object belongs and in the first number of classes, j represents the target object,
L_ij represents the degree of similarity between i and j,
C_ij represents the number of users who have produced behavior on both i and j, and
C_i and C_j represent the number of users who have produced behavior on i and on j, respectively.
4. The method of claim 1, wherein each piece of object data in the set of object data includes at least one of a price of an object, a category of an object, a discount of an object, and a brand of an object.
5. The method of claim 1, wherein the user behavior data comprises at least one of user purchase data for the object, user browsing records for the object, and user collection records for the object.
6. The method of claim 1, wherein the clustering the object data sets to form a plurality of classes comprises: clustering the object data sets by adopting a K-means algorithm.
7. The method according to claim 1, wherein prior to clustering the object data set, the method further comprises at least one of the following data processing steps:
carrying out one-hot coding processing on discrete features in the object data;
normalizing the continuous characteristic of the object data;
and performing dimension reduction on sparse features in the features of the object data.
8. The method of claim 7, wherein the step of dimension-reducing sparse ones of the features of the object data comprises: at least one of singular value decomposition and principal component analysis.
9. The method of claim 1, wherein the step of constructing, for each target user, a recommended object candidate set for the target user based on the user behavior data set and the object candidate set of each target object comprises: for each target user, excluding from the object candidate set the objects on which the user has already produced behavior, to produce the recommended object candidate set.
10. A system for constructing a candidate set of recommended objects, the system comprising:
a data acquisition unit that acquires an object data set and a user behavior data set, wherein the user behavior data comprises association information between a user and an object;
a classification unit that clusters the object data set to form a plurality of classes;
an object candidate set generating unit that obtains, for each target object, a first number of classes that are most similar to the class to which the target object belongs by calculating a similarity between different classes, and obtains, as an object candidate set of the target object, a second number of objects that are closest to the target object by using user behavior data to calculate a similarity between the target object and the objects in the class to which the target object belongs and in the first number of classes; and
a recommended object candidate set generating unit that constructs, for each target user, a recommended object candidate set for the target user based on the user behavior data set and the object candidate set of each target object.
11. The system of claim 10, wherein the object candidate set generating unit comprises:
a class similarity calculation unit that calculates the similarity between different classes; and
an object similarity calculation unit that calculates the similarity between objects by using the user behavior data.
12. The system of claim 11, wherein the class similarity calculation unit comprises:
an average feature vector calculating unit that, for each class in the multiple classes, calculates the average value of all objects in the class on each feature to obtain an average feature vector corresponding to the class; and
a cosine calculating unit that, for any two classes in the multiple classes, calculates the cosine distance between the two average feature vectors respectively corresponding to the two classes to obtain the similarity between the two classes.
13. The system according to claim 11, wherein the similarity calculated by the object similarity calculation unit between the target object and each object in the class to which the target object belongs and in the first number of classes satisfies the following equation:
Figure FDA0003519125460000031
wherein
i represents an object among the objects in the class to which the target object belongs and in the first number of classes, j represents the target object,
L_ij represents the degree of similarity between i and j,
C_ij represents the number of users who have produced behavior on both i and j, and
C_i and C_j represent the number of users who have produced behavior on i and on j, respectively.
14. The system of claim 10, wherein each piece of object data in the set of object data includes at least one of a price of an object, a category of an object, a discount on an object, and a brand of an object.
15. The system of claim 10, wherein the user behavior data includes at least one of user purchase data for the object, user browsing records for the object, and user collection records for the object.
16. The system of claim 10, wherein the classification unit clusters the object data sets using a K-means algorithm.
17. The system of claim 10, further comprising: a preprocessing unit that performs at least one of the following data processing steps before the classifying unit clusters the object data set:
carrying out one-hot coding processing on discrete features in the object data;
normalizing the continuous characteristic of the object data;
and performing dimension reduction on sparse features in the features of the object data.
18. The system of claim 17, wherein the preprocessing unit performs dimensionality reduction on sparse ones of the features of the object data using at least one of singular value decomposition and principal component analysis.
19. The system according to claim 10, wherein the recommended object candidate set generating unit, for each target user, excludes from the object candidate set the objects on which the user has already produced behavior, to produce the recommended object candidate set.
20. A computer-readable storage medium storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform the method of any of claims 1 to 9.
21. A system comprising at least one computing device and at least one storage device storing instructions that, when executed by the at least one computing device, cause the at least one computing device to perform the method of any of claims 1 to 9.
CN201910831714.XA 2019-09-04 2019-09-04 Method and system for constructing recommended object candidate set Active CN110569446B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910831714.XA CN110569446B (en) 2019-09-04 2019-09-04 Method and system for constructing recommended object candidate set

Publications (2)

Publication Number Publication Date
CN110569446A CN110569446A (en) 2019-12-13
CN110569446B true CN110569446B (en) 2022-05-17

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111159578B (en) * 2019-12-31 2023-10-13 第四范式(北京)技术有限公司 Method and system for recommending objects
CN111291264B (en) * 2020-01-23 2023-06-23 腾讯科技(深圳)有限公司 Access object prediction method and device based on machine learning and computer equipment
CN113469773A (en) * 2020-06-05 2021-10-01 海信集团有限公司 Intelligent terminal and object recommendation method

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105095256A (en) * 2014-05-07 2015-11-25 阿里巴巴集团控股有限公司 Information push method and apparatus based on similarity degree between users
CN105677825A (en) * 2016-01-04 2016-06-15 成都陌云科技有限公司 Analysis method for client browsing operation
CN106022865A (en) * 2016-05-10 2016-10-12 江苏大学 Goods recommendation method based on scores and user behaviors
CN107092616A (en) * 2016-11-02 2017-08-25 北京小度信息科技有限公司 A kind of object order method and device
CN109257398A (en) * 2017-07-12 2019-01-22 阿里巴巴集团控股有限公司 A kind of method for pushing and equipment of business object
CN109598278A (en) * 2018-09-20 2019-04-09 阿里巴巴集团控股有限公司 Clustering processing method, apparatus, electronic equipment and computer readable storage medium
CN110110225A (en) * 2019-04-17 2019-08-09 重庆第二师范学院 Online education recommended models and construction method based on user behavior data analysis
CN110134783A (en) * 2018-02-09 2019-08-16 阿里巴巴集团控股有限公司 Method, apparatus, equipment and the medium of personalized recommendation
CN110162706A (en) * 2019-05-22 2019-08-23 南京邮电大学 A kind of personalized recommendation method and system based on interaction data cluster

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10262330B2 (en) * 2013-01-04 2019-04-16 PlaceIQ, Inc. Location-based analytic platform and methods
CN106960248B (en) * 2016-01-08 2021-02-23 阿里巴巴集团控股有限公司 Method and device for predicting user problems based on data driving

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
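Design and Implementation of a Video Recommendation System for Campus Networks; Ding Xin et al.; Journal on Communications (通信学报); 2013-09-30; pp. 175-179 *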

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant