CN114780861B - Clustering technology-based user multi-interest recommendation method, device, equipment and medium - Google Patents

Publication number
CN114780861B
Authority
CN
China
Prior art keywords
user
cluster
article
clustering
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210694014.2A
Other languages
Chinese (zh)
Other versions
CN114780861A (en
Inventor
李多海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai 2345 Network Technology Co ltd
Original Assignee
Shanghai 2345 Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai 2345 Network Technology Co ltd
Priority to CN202210694014.2A
Publication of CN114780861A
Application granted
Publication of CN114780861B
Priority to PCT/CN2022/136870 (WO2023245999A1)
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Abstract

The application discloses a clustering technology-based user multi-interest recommendation method, device, equipment and medium. The method comprises the following steps: S1, constructing a network graph of articles based on user behavior sequence data; S2, obtaining a vector representation of each article node by using a Graph Embedding method; S3, clustering the vector data of the article nodes to generate K class clusters; S4, dividing the articles on which each user has acted into M sets according to the class clusters in S3, and obtaining vector representations of the M class clusters of each user; S5, obtaining the Top-N similar recommendation results under each class cluster corresponding to the user through similarity calculation; S6, calculating the preference score of the user for each corresponding class cluster; and S7, performing weighted sampling on the recall results of the user's class clusters according to the user's cluster preference scores to obtain a final recommendation result set. The recommendation results of the application are diverse, highly accurate, and better matched to the user's multiple interests.

Description

Clustering technology-based user multi-interest recommendation method, device, equipment and medium
Technical Field
The invention relates to the technical field of internet information, in particular to a user multi-interest recommendation method, device, equipment and medium based on a clustering technology.
Background
Recommendation refers to suggesting products that may interest a user based on the user's data and behavior. Personalized recommendation is an important research field in the internet service era, and recommendation technologies are now widely applied in fields such as e-commerce and social networks. The development of information technology and the rapid growth of data pose huge challenges to traditional recommendation algorithms. People's interests are diverse, and this diversity makes diversified recommendation results very important in the field of recommendation algorithms. Generally, a recommender either recommends items with high scores to the user, or recommends items that are highly similar to the user's favorite items. However, in many scenarios the user should be recommended preferred items from as many categories as possible, that is, the user's multiple interests should be satisfied.
Collaborative filtering is currently the most widely applied and most developed recommendation algorithm. In essence, it uses user rating information to find a neighbor set of a target user or a target item, and predicts the target user's rating of an unknown item from the ratings of the neighbor set. However, traditional collaborative filtering is only suitable for single-interest recommendation and cannot adequately meet a user's multi-interest recommendation needs, which affects recommendation precision.
Therefore, how to implement accurate multi-interest recommendation based on user behavior sequence is a problem to be solved by those skilled in the art.
Disclosure of Invention
The invention aims to provide a user multi-interest recommendation method, device, equipment and medium based on a clustering technology, so as to solve the problems in the technical background.
To achieve this purpose, the invention adopts the following technical solution:
A first aspect of the application provides a user multi-interest recommendation method based on a clustering technology, comprising the following steps:
S1, constructing a network Graph of articles based on user behavior sequence data, wherein each user behavior sequence comprises a plurality of behaviors performed by a user on different articles in chronological order, the network Graph of articles is composed of the article nodes clicked by users, and one article node represents one article corresponding to a user behavior;
S2, obtaining a vector representation of each article node by using a Graph Embedding method;
S3, clustering the vector data of the article nodes through a preset clustering algorithm to generate K class clusters, wherein K is a positive integer;
S4, dividing the articles on which each user has acted into M sets according to the class clusters in step S3, and summing and averaging the vector data of the article nodes in each set to obtain vector representations of the M class clusters of each user, wherein M is in the range [1, K];
S5, computing the similarity between the vector representation of each class cluster of each user and the vector set of the article nodes of the corresponding class cluster, to obtain the Top-N similar recommendation results under each class cluster corresponding to the user;
S6, performing decay accumulation over the number of days between each of the user's behaviors under each class cluster and the current date, and then performing a proportion calculation to obtain the user's preference score for each class cluster corresponding to the user;
and S7, performing weighted sampling on the recall results of the user's class clusters according to the user's cluster preference scores to obtain a final recommendation result set for the user.
Preferably, step S1 includes: forming a directed, unweighted network Graph from the users' click behaviors according to the time at which the behaviors occurred, wherein the network Graph comprises a plurality of article nodes.
Preferably, in step S2, the Graph Embedding method includes one or more of a random walk algorithm and a Node2Vec algorithm; obtaining the vector representation of each article node by the Graph Embedding method specifically comprises the following steps:
S21, taking any article node in the network Graph as an initial walk point;
S22, performing a random walk starting from the initial walk point, a single walk taking L steps in total and generating a sequence of length L, wherein L represents the number of steps of a single random walk and L is a positive integer;
S23, repeating steps S21 and S22 N times for each article node in the network Graph, finally obtaining N×V node sequences of length L, wherein N represents the number of random walks started at each article node, V represents the number of article nodes included in the network Graph, and N and V are positive integers;
and S24, applying a Word2Vec model to the generated sequence data to obtain the vector representation of each article node.
More preferably, the random walk process in step S22 specifically includes:
starting from any initial walk point of the network Graph, at each step of the walk randomly selecting one of the article nodes connected to the current article node, and repeating this process until the set walk length is reached, at which point the walk stops, thereby obtaining new user behavior sequence data.
Preferably, in step S3, the preset clustering algorithm is the KMeans clustering algorithm.
Preferably, in step S5, the similarity value is obtained by using a preset similarity calculation method, where the preset similarity calculation method includes, but is not limited to, a Faiss algorithm and a Milvus algorithm.
Preferably, step S6 specifically includes the following steps:
S61, setting parameters: decay_rate, days_i and user_cluster_score, wherein:
decay_rate represents the decay rate, with value range (0, 1);
days_i represents the number of days between the date of the user's i-th behavior under a given class cluster and the current date;
user_cluster_score represents the user's preference score under a given class cluster;
S62, performing decay accumulation on the user's behaviors under each corresponding class cluster using the following formula:
user_cluster_score = sum( decay_rate ^ days_i );
S63, performing a proportion calculation on the accumulated sums obtained in step S62 to obtain the user's preference score under each corresponding class cluster, that is, calculating the ratio of the user's accumulated score under a given class cluster to the sum of the accumulated scores of all class clusters corresponding to the user.
Preferably, step S7 specifically includes the following steps:
S71, performing probability sampling over the class clusters corresponding to the user according to the user's preference score for each class cluster;
S72, from the similar results of the class cluster sampled in step S71, taking the highest-ranked article that has not yet been drawn for the user and putting it into the final recommendation result set;
and S73, repeating steps S71 and S72 a preset number of times to obtain the final recommendation result set corresponding to the user, wherein the final recommendation result set contains the preset number of recommended articles.
In a second aspect, the present application provides a device for recommending multiple interests of a user based on a clustering technique, including:
the network Graph construction module is used for constructing a network Graph of the article based on the user behavior sequence data, wherein each user behavior sequence comprises a plurality of user behaviors of the user on different articles in sequence according to the time sequence, the network Graph of the article is composed of article nodes clicked by the user, and one article node represents one article corresponding to the user behavior;
the article node vector generation module is used for obtaining the vector representation of each article node by using a Graph Embedding method;
the clustering module is used for clustering the vector data of the article nodes through a preset clustering algorithm to generate K clusters, wherein K is a positive integer;
a class-cluster vector generation module, configured to divide the articles on which each user has acted into M sets according to the class clusters generated by the clustering module, and to sum and average the vector data of the article nodes in each set to obtain vector representations of the M class clusters of each user, where M is in the range [1, K];
the first calculation module is used for calculating the similarity between the vector representation of each class cluster of each user and the vector set of the article nodes of the corresponding class cluster to obtain the Top-N similar recommendation results under each class cluster corresponding to the user;
the second calculation module is used for performing attenuation accumulation on days from the behavior of the user under each class cluster to the current date, and then performing proportion calculation to obtain the preference score of the user on each class cluster corresponding to the user;
and the final recommendation result set generation module is used for performing weight sampling on various cluster recall results of the user according to the cluster preference scores of the user to obtain a final recommendation result set of the user.
Preferably, in the clustering module, the preset clustering algorithm is the KMeans clustering algorithm.
Preferably, the second calculation module includes:
a parameter setting unit for setting parameters: decay_rate, days_i and user_cluster_score, wherein decay_rate represents the decay rate, with value range (0, 1); days_i represents the number of days between the date of the user's i-th behavior under a given class cluster and the current date; user_cluster_score represents the user's preference score under a given class cluster;
a decay accumulation unit for performing decay accumulation on the user's behaviors under each corresponding class cluster using the following formula: user_cluster_score = sum( decay_rate ^ days_i );
a proportion calculation unit for performing a proportion calculation on the accumulated sums obtained by the decay accumulation unit to obtain the user's preference score under each corresponding class cluster, that is, calculating the ratio of the user's accumulated score under a given class cluster to the sum of the accumulated scores of all class clusters corresponding to the user.
A third aspect of the present application discloses an electronic device, comprising:
a processor; and
a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the method of the first aspect of the present application.
A fourth aspect of the present application provides a computer-readable storage medium having stored thereon executable code, which, when executed by a processor of an electronic device, causes the processor to perform the method of the first aspect of the present application.
In the above, the article includes text, pictures, audio or video.
Compared with the prior art, the technical scheme of the invention has the following beneficial effects:
Compared with traditional recommendation methods, the recommendation results computed by this method better match the user's multiple interests, and the results are diverse and highly accurate.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application. In the drawings:
FIG. 1 is a flow chart diagram illustrating a user multi-interest recommendation method based on clustering technology;
FIG. 2 is a schematic diagram showing the structure of an item network Graph constructed based on user behavior sequence data in one embodiment;
FIG. 3 illustrates an example graph of clustering vector data for item nodes, resulting in K cluster classes in one embodiment;
FIG. 4 illustrates an example diagram of partitioning the behavioral items of user A into multiple sets according to corresponding clusters of classes in one embodiment;
FIG. 5 illustrates an exemplary diagram of obtaining similar items under the user A corresponding cluster in one embodiment;
FIG. 6 illustrates an example diagram of computing a cluster preference score for user A in one embodiment;
FIG. 7 illustrates an exemplary diagram of obtaining a final recommendation result set for user A in one embodiment;
FIG. 8 is a schematic diagram illustrating a structure of a user multi-interest recommendation device based on a clustering technique;
FIG. 9 shows a schematic structural diagram of an electronic device provided in an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and effects of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein merely illustrate the invention and do not limit it.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order, it being understood that the data so used may be interchanged under appropriate circumstances. Moreover, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example:
In many scenarios, user behavior sequences need to be analyzed and processed. A user behavior sequence is the record of a series of events, such as clicks, visits and purchases, generated during a user's daily operation and use of a product. It can be represented as a time-ordered set of events and captures fine-grained characteristics such as the user's habits and preferences.
Fig. 1 is a schematic flow chart of a user multi-interest recommendation method based on a clustering technique.
Referring to fig. 1, a user multi-interest recommendation method based on a clustering technique specifically includes the following steps:
Step S1: Construct a network Graph of articles based on user behavior sequence data.
Each user behavior sequence comprises a plurality of behaviors performed by the user on different articles in chronological order, the network Graph of articles consists of the article nodes clicked by users, and one article node represents one article corresponding to a user behavior.
In an information-feed product, a large amount of user click behavior is generated. Ordered by the time at which they occur, these click behaviors form a directed, unweighted network Graph comprising a plurality of article nodes.
It should be noted that the articles on which users act may be text, pictures, audio or video.
Taking a news App as an example, a large amount of user behavior data, such as click behavior data, can be collected in the App. The clustering-technology-based user multi-interest recommendation method of the application is described in detail below in connection with this application scenario.
Referring to FIG. 2, for example, if user A clicks news a, news b and news c in sequence, two edges are generated in the network Graph: news a points to news b, and news b points to news c. The click behaviors of other users are handled in the same way, and together they form a network Graph containing V article nodes.
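The graph construction in step S1 can be sketched as follows. This is a minimal illustration rather than the applicant's implementation: the function name `build_item_graph`, the adjacency-dictionary representation and the second click sequence are assumptions made for the example.

```python
def build_item_graph(behavior_sequences):
    """Build a directed, unweighted graph from time-ordered click sequences.

    Assumption: the graph is stored as {node: sorted list of successor nodes}.
    """
    graph = {}
    for sequence in behavior_sequences:
        # Register every clicked article as a node, even if it has no out-edges.
        for item in sequence:
            graph.setdefault(item, set())
        # Each pair of consecutive clicks yields one directed edge.
        for src, dst in zip(sequence, sequence[1:]):
            graph[src].add(dst)  # a set keeps the edge unweighted
    return {node: sorted(successors) for node, successors in graph.items()}


# User A clicks a, b, c in order; a second (hypothetical) user clicks b, d.
sequences = [["a", "b", "c"], ["b", "d"]]
graph = build_item_graph(sequences)
print(graph)
```

For user A's sequence this produces the two edges described in the text, a → b and b → c.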
Step S2: Obtain the vector representation of each article node by using a Graph Embedding method.
Classic Graph Embedding techniques mainly include random walk, Node2Vec and similar methods.
Obtaining the vector representation of each article node by the Graph Embedding method specifically comprises the following steps:
Step S21: Take any article node in the network Graph as an initial walk point.
Step S22: Perform a random walk starting from the initial walk point. A single walk takes L steps in total and generates a sequence of length L, where L represents the number of steps of a single random walk and is a positive integer.
The random walk process is specifically as follows: starting from any initial walk point of the network Graph, at each step of the walk one article node is randomly selected from the article nodes connected to the current article node; this is repeated until the set walk length is reached, at which point the walk stops, yielding a new piece of user behavior sequence data.
Step S23: Repeat steps S21 and S22 N times for each article node in the network Graph, finally obtaining N×V node sequences of length L, where N represents the number of random walks started at each article node, V represents the number of article nodes in the network Graph, and N and V are positive integers.
Step S24: Apply a Word2Vec model to the generated sequence data to obtain the vector representation of each article node.
The Word2Vec algorithm is an open-source algorithm that represents words as vectors of a specified dimension, so that the relationship between words can be measured through their vectors. In this embodiment, it is applied to article sequence data.
In this embodiment, assume that the sequence obtained by the walk sampling is:
[ [a, b, d, e, g, h, v, f],
[w, r, f, h, v, s, n, d, k],
……]
The vector representation of each news article is then computed by the Word2Vec model, for example:
the vector of news a is denoted a: [0.22,0.45,0.88,0.06,0.01,0.32];
the vector of news c is denoted c: [0.24,0.47,0.86,0.12,0.03,0.28];
……
It should be noted that different algorithms may use different random walk strategies, and the randomness is controlled by the parameters of the algorithm in question.
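The walk sampling of steps S21 to S23 can be sketched as follows, as a rough illustration under stated assumptions. The names `random_walk` and `generate_walks`, the seeded generator and the choice to stop early at a node with no outgoing edge are all assumptions, since the description does not say how dead ends are handled. The Word2Vec training of step S24 is not shown; the resulting sequences would be passed to any Word2Vec implementation as if they were sentences.

```python
import random


def random_walk(graph, start, walk_length, rng):
    """One walk of at most walk_length nodes, following directed edges."""
    walk = [start]
    while len(walk) < walk_length:
        successors = graph.get(walk[-1], [])
        if not successors:
            break  # assumption: stop early at a dead end
        walk.append(rng.choice(successors))
    return walk


def generate_walks(graph, walks_per_node, walk_length, seed=0):
    """Start walks_per_node walks from every node: N x V sequences in total."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for node in graph:
            walks.append(random_walk(graph, node, walk_length, rng))
    return walks


# Hypothetical toy graph in {node: successors} form.
graph = {"a": ["b"], "b": ["c", "d"], "c": ["a"], "d": []}
walks = generate_walks(graph, walks_per_node=3, walk_length=4)
print(len(walks), walks[0])
```

With N = 3 walks per node and V = 4 nodes, the sketch yields 12 sequences, each at most L = 4 nodes long.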
Step S3: Cluster the obtained news vector data using the KMeans clustering algorithm to generate K class clusters, where K is a positive integer, usually between 5 and 20.
Clustering with the KMeans algorithm means that frequently run approximate queries later in the pipeline, such as Top-N retrieval, can be solved with a greatly reduced amount of computation.
The KMeans algorithm is an unsupervised clustering algorithm that partitions a data set of N samples into K clusters such that the sum of the within-cluster variances is minimized.
In this embodiment, the obtained news vector data is clustered into K class clusters, so that nodes whose vectors are close to each other are assigned to the same class cluster as far as possible, while nodes whose vectors are far apart are assigned to different class clusters. Because articles with similar vectors are usually similar in some respect, such as title text, category, author or some implicit semantics, different class clusters represent different types.
Referring to FIG. 3, the vectors of news a, news f, news w and news c are close to one another and represent the same type, so they are assigned to class cluster C1 after clustering; the vectors of news b and news h are close to each other and represent another type, so they are assigned to class cluster C2.
Step S4: Divide the articles on which each user has acted into M sets according to the class clusters in step S3, then sum and average the article vectors in each set to obtain the vector representations of the M class clusters of each user, where M is in the range [1, K].
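The clustering step can be illustrated with a small pure-Python version of Lloyd's iteration for KMeans. This is a sketch under assumptions, not the embodiment's code: the two-dimensional vectors are made up for readability, the initial centroids are drawn with a fixed seed, and a production system would more likely call an existing KMeans library.

```python
import random


def squared_distance(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def mean_vector(vectors):
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def kmeans(vectors, k, iterations=20, seed=0):
    """Lloyd's algorithm: alternate between assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)  # assumption: random initial centroids
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_vector(members)
    return labels, centroids


# Hypothetical 2-D article vectors forming two obvious groups.
item_vectors = {
    "a": [0.22, 0.45], "c": [0.24, 0.47], "f": [0.20, 0.44],
    "b": [0.90, 0.10], "h": [0.88, 0.12],
}
names = list(item_vectors)
labels, centroids = kmeans([item_vectors[n] for n in names], k=2)
cluster_of = dict(zip(names, labels))
print(cluster_of)
```

On this toy data, news a, c and f end up in one class cluster and news b and h in the other, mirroring the grouping described for FIG. 3.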
Referring to FIG. 4, among the articles on which user A has acted, news a and news c fall into the same class cluster C1 and news b falls into class cluster C2. The user's vector for class cluster C1 is therefore the average of the vectors of news a and news c: [0.23, 0.46, 0.87, 0.09, 0.02, 0.30]. User A has only one article, news b, under class cluster C2, so the vector of news b is the user's vector representation for class cluster C2.
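The per-cluster averaging of step S4 can be sketched as follows. The function name `user_cluster_vectors` and the vector used for news b are hypothetical (the description gives no vector for news b); the vectors for news a and news c are the ones from the example above.

```python
from collections import defaultdict


def user_cluster_vectors(clicked_items, item_vectors, cluster_of):
    """Average the vectors of a user's clicked articles within each cluster."""
    groups = defaultdict(list)
    for item in clicked_items:
        groups[cluster_of[item]].append(item_vectors[item])
    return {
        cluster: [sum(column) / len(vectors) for column in zip(*vectors)]
        for cluster, vectors in groups.items()
    }


item_vectors = {
    "a": [0.22, 0.45, 0.88, 0.06, 0.01, 0.32],
    "c": [0.24, 0.47, 0.86, 0.12, 0.03, 0.28],
    "b": [0.70, 0.10, 0.05, 0.60, 0.40, 0.02],  # hypothetical vector for news b
}
cluster_of = {"a": "C1", "c": "C1", "b": "C2"}

vectors = user_cluster_vectors(["a", "b", "c"], item_vectors, cluster_of)
print([round(x, 2) for x in vectors["C1"]])
```

After rounding to two decimals, the C1 vector matches the averaged vector given for user A, and the C2 vector is simply the vector of news b. Here M = 2 because user A's clicks fall into two class clusters.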
Step S5: Compute the similarity between the vector representation of each class cluster of each user and the vector set of the article nodes of the corresponding class cluster, and obtain the S most similar articles under each class cluster corresponding to the user.
Referring to FIG. 5, for example, user A has two class clusters, C1 and C2, which yields 2 × S similar articles.
The vector similarity calculation can use a commonly available library, such as Faiss, or a deployed Milvus vector similarity service, to retrieve the vectors most similar to the target vector.
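As a rough illustration of the per-cluster retrieval in step S5, the sketch below uses a brute-force cosine similarity search in place of Faiss or Milvus. Cosine similarity, the optional exclusion of already-clicked articles, the function names and the three-dimensional vectors are all assumptions for the example; the description does not fix the similarity measure.

```python
import math


def cosine_similarity(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def top_s_similar(query, candidates, s, exclude=()):
    """Return the s articles in one cluster most similar to the query vector."""
    scored = [
        (cosine_similarity(query, vector), item)
        for item, vector in candidates.items()
        if item not in exclude  # assumption: skip articles already clicked
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [item for _, item in scored[:s]]


# Hypothetical article vectors belonging to class cluster C1.
cluster_c1_items = {
    "a": [0.22, 0.45, 0.88], "c": [0.24, 0.47, 0.86],
    "f": [0.20, 0.44, 0.90], "w": [0.60, 0.40, 0.50],
}
user_c1_vector = [0.23, 0.46, 0.87]
recalled = top_s_similar(user_c1_vector, cluster_c1_items, s=2, exclude={"a", "c"})
print(recalled)
```

Because the search is restricted to the articles of one class cluster, each query scans only a fraction of the catalogue, which is where the clustering reduces computation.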
Step S6: Calculate the user's preference score for each corresponding class cluster.
This specifically comprises the following steps:
Step S61: Set the parameters decay_rate, days_i and user_cluster_score, where:
decay_rate represents the decay rate, with value range (0, 1);
days_i represents the number of days between the date of the user's i-th behavior under a given class cluster and the current date;
user_cluster_score represents the user's preference score under a given class cluster.
Step S62: Perform decay accumulation on the user's behaviors under each corresponding class cluster using the following formula:
user_cluster_score = sum( decay_rate ^ days_i );
Step S63: Perform a proportion calculation on the accumulated sums obtained in step S62 to obtain the user's preference score under each corresponding class cluster, that is, calculate the ratio of the user's accumulated score under a given class cluster to the sum of the accumulated scores of all class clusters corresponding to the user.
Referring to FIG. 6, user A clicked news a 1 day before the current date, news b 2 days before, and news c 3 days before.
With the decay rate decay_rate set to 0.9 and the cluster assignment above, the accumulated score of user A for class cluster C1 is 0.9^1 + 0.9^3 = 1.629, and the accumulated score of user A for class cluster C2 is 0.9^2 = 0.81.
The proportion calculation then gives:
cluster C1: 1.629/(1.629 + 0.81) = 0.668;
cluster C2: 0.81/(1.629 + 0.81) = 0.332.
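The decay accumulation and proportion calculation of step S6 can be checked with a few lines of Python. The function name `cluster_preference_scores` and the dictionary input format are assumptions for this sketch; the formula and the numbers are those of the worked example.

```python
def cluster_preference_scores(days_by_cluster, decay_rate=0.9):
    """Decay-accumulate per cluster, then normalise so the scores sum to 1."""
    accumulated = {
        cluster: sum(decay_rate ** days for days in days_list)
        for cluster, days_list in days_by_cluster.items()
    }
    total = sum(accumulated.values())
    return {cluster: score / total for cluster, score in accumulated.items()}


# User A: news a (C1) 1 day ago, news c (C1) 3 days ago, news b (C2) 2 days ago.
scores = cluster_preference_scores({"C1": [1, 3], "C2": [2]}, decay_rate=0.9)
print({cluster: round(score, 3) for cluster, score in scores.items()})
# prints {'C1': 0.668, 'C2': 0.332}
```

A smaller decay_rate makes older behaviors count less, so recent clicks dominate the preference score.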
Step S7: Perform weighted sampling on the recall results of the user's class clusters according to the user's cluster preference scores to obtain the final recommendation result set for the user.
This specifically comprises the following steps:
Step S71: Perform probability sampling over the class clusters corresponding to the user according to the user's preference score for each class cluster.
Step S72: From the similar results of the class cluster sampled in step S71, take the highest-ranked article that has not yet been drawn for the user and put it into the final recommendation result set.
Step S73: Repeat steps S71 and S72 a preset number of times to obtain the final recommendation result set corresponding to the user, which contains the preset number of recommended articles.
Referring to FIG. 7, suppose S articles are finally to be recommended to user A, while 2 × S candidate articles were obtained for the user in step S5; probability sampling can then be performed according to the user's cluster preference scores. Taking user A as an example, the procedure is as follows:
a) Sample once between class clusters C1 and C2 with probabilities [0.668, 0.332]; suppose C1 is sampled;
b) From the similar results corresponding to C1, take the highest-ranked article that has not yet been drawn for user A and put it into the final recommendation set;
c) Repeat steps a) and b) S times to obtain the final S recommended articles.
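The sampling loop of step S7 can be sketched as follows. The function name `sample_recommendations`, the candidate article identifiers and the handling of an exhausted cluster (it is simply dropped from further sampling) are assumptions; in the scenario described, each cluster holds S candidates and only S draws are made, so no cluster runs out.

```python
import random


def sample_recommendations(recall_lists, preference, count, seed=None):
    """Draw `count` articles, choosing the source cluster by preference score."""
    rng = random.Random(seed)
    remaining = {cluster: list(items) for cluster, items in recall_lists.items()}
    result = []
    while len(result) < count:
        available = [cluster for cluster in remaining if remaining[cluster]]
        if not available:
            break  # assumption: stop if every cluster is exhausted
        weights = [preference[cluster] for cluster in available]
        cluster = rng.choices(available, weights=weights)[0]
        # Take the highest-ranked article not yet drawn from that cluster.
        result.append(remaining[cluster].pop(0))
    return result


# Hypothetical ranked recall lists (S = 3 per cluster) and user A's scores.
recall_lists = {"C1": ["f", "w", "x"], "C2": ["h", "k", "m"]}
preference = {"C1": 0.668, "C2": 0.332}
final_set = sample_recommendations(recall_lists, preference, count=3, seed=42)
print(final_set)
```

Over many draws roughly two thirds of the recommendations come from C1, while C2 still contributes, which is how the result set reflects several interests at once.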
It should be noted that the above application scenario is only one example; the embodiments of the present invention are not limited to it and may be applied to any other suitable application scenario.
On the other hand, the application also discloses a user multi-interest recommendation device based on the clustering technology.
Referring to FIG. 8, a user multi-interest recommendation device based on a clustering technology includes: a network Graph building module 100, an article node vector generation module 200, a clustering module 300, a class-cluster vector generation module 400, a first calculating module 500, a second calculating module 600 and a final recommendation result set generating module 700.
The network Graph building module 100 is configured to build a network Graph of an article based on user behavior sequence data, where each user behavior sequence includes a plurality of user behaviors that occur in sequence to different articles according to a time sequence, the network Graph of the article is composed of article nodes clicked by a user, and one article node represents one article corresponding to a user behavior.
The item node vector generation module 200 is configured to obtain a vector representation of each item node by using a Graph Embedding method.
The clustering module 300 is configured to cluster the vector data of the article nodes through a preset clustering algorithm to generate K clusters, where K is a positive integer. The preset clustering algorithm is preferably a KMeans clustering algorithm.
The cluster vector generating module 400 is configured to divide the articles on which each user has performed behaviors into M sets according to the clusters generated by the cluster processing module 300, and to average the vector data of the article nodes in each set to obtain vector representations of the M clusters of each user, where M is in the range [1, K].
The first calculating module 500 is configured to calculate the similarity between the vector representation of each cluster of each user and the vectors of the article nodes in the corresponding cluster, so as to obtain the Top-N similar recommendation results in each cluster corresponding to the user.
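The averaging and Top-N steps can be sketched in Python as below. The text does not fix the similarity measure, so cosine similarity is an assumption here, as are the function names, the dictionary inputs `item_vec` and `item_cluster`, and the optional `exclude` argument.

```python
import math
from collections import defaultdict


def user_cluster_vectors(user_items, item_vec, item_cluster):
    """Average the vectors of a user's articles within each cluster (M <= K results)."""
    groups = defaultdict(list)
    for item in user_items:
        groups[item_cluster[item]].append(item_vec[item])
    return {c: [sum(col) / len(vecs) for col in zip(*vecs)] for c, vecs in groups.items()}


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_n_in_cluster(user_vec, cluster_id, item_vec, item_cluster, n, exclude=()):
    """Rank the articles of one cluster by similarity to the user's cluster vector."""
    candidates = [i for i, c in item_cluster.items() if c == cluster_id and i not in exclude]
    candidates.sort(key=lambda i: cosine(user_vec, item_vec[i]), reverse=True)
    return candidates[:n]
```

A user who has interacted with articles in M clusters gets M vectors, and `top_n_in_cluster` is then called once per cluster to produce that cluster's recall list.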
The second calculating module 600 is configured to perform decay accumulation over the number of days between each behavior of the user in each cluster and the current date, and then to perform a proportion calculation to obtain the user's preference score for each corresponding cluster.
The final recommendation result set generating module 700 is configured to perform weighted sampling on the recall results of the user's clusters according to the user's cluster preference scores, so as to obtain the final recommendation result set of the user.
Wherein the second calculating module 600 comprises:
a parameter setting unit, configured to set the parameters decay_rate, days_i and user_cluster_score, where decay_rate represents the decay rate, with value range (0, 1); days_i represents the number of days between the date of the user's i-th behavior in a given cluster and the current date; and user_cluster_score represents the user's preference score for a given cluster;
a decay accumulation unit, configured to perform decay accumulation on the user's behaviors in each corresponding cluster using the following formula: user_cluster_score = sum(decay_rate ^ days_i);
a proportion calculation unit, configured to perform a proportion calculation on each accumulated sum obtained by the decay accumulation unit to obtain the user's preference score for each corresponding cluster, that is, to calculate the ratio of the user's accumulated score in a given cluster to the sum of the accumulated scores of all clusters corresponding to the user.
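The decay accumulation and proportion calculation can be written compactly in Python. The sketch below follows the formula user_cluster_score = sum(decay_rate ^ days_i) and then normalises the sums; the function name `cluster_preference_scores` and the input layout `days_by_cluster` are hypothetical.

```python
def cluster_preference_scores(days_by_cluster, decay_rate):
    """Return {cluster_id: preference share} for one user.

    days_by_cluster: {cluster_id: [days_i, ...]}, where days_i is the number of
    days between the user's i-th behavior in that cluster and the current date.
    """
    if not 0 < decay_rate < 1:
        raise ValueError("decay_rate must lie in the open interval (0, 1)")
    # Decay accumulation: recent behaviors (small days_i) contribute close to 1.
    raw = {c: sum(decay_rate ** d for d in days) for c, days in days_by_cluster.items()}
    total = sum(raw.values())
    # Proportion calculation: each cluster's share of the user's total score.
    return {c: (score / total if total else 0.0) for c, score in raw.items()}
```

For instance, with decay_rate = 0.5, behaviors 0 and 1 days ago in one cluster accumulate to 1.5, while a single behavior 2 days ago in another cluster accumulates to 0.25, so the first cluster receives 1.5 / 1.75 of the preference mass.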
FIG. 9 illustrates a schematic structural diagram of a computing device according to an embodiment of the present disclosure.
As shown in fig. 9, a computing device 800 disclosed herein may include a processor 810 and a memory 820. The memory 820 is used for storing a computer program, and the processor 810 executes the computer program in the memory 820 to implement the methods provided by the method embodiments described above. For the specific implementation process, reference may be made to the related description above, and details are not repeated here.
In this embodiment, an electronic device is used as an example of the user multi-interest recommendation device based on the clustering technology provided by the application. The processor may be a Central Processing Unit (CPU) or another form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device to perform desired functions.
The memory may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory may include, for example, random access memory (RAM), cache memory, or the like. Non-volatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on a computer-readable storage medium and executed by a processor to implement the methods of the various embodiments of the application described above and/or other desired functions. Various contents such as an input signal, a signal component, a noise component, etc. may also be stored in the computer-readable storage medium.
Furthermore, the present invention also provides a computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, is adapted to carry out the method provided by the method embodiments described above.
In practice, the computer program in the present embodiment may be written in any combination of one or more programming languages for performing the operations of the embodiments of the present application, including object-oriented programming languages such as Java or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
In practice, the computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Those of skill in the art would further appreciate that the various illustrative logical blocks, modules, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus and methods according to embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The embodiments of the present invention have been described in detail, but they are merely examples, and the present invention is not limited to the embodiments described above. Any equivalent modifications and substitutions made by those skilled in the art are also within the scope of the present invention. Accordingly, equivalent changes and modifications made without departing from the spirit and scope of the present invention should be covered by the present invention.

Claims (8)

1. A user multi-interest recommendation method based on clustering technology, characterized by comprising the following steps:
S1, constructing a network Graph of articles based on user behavior sequence data, wherein each user behavior sequence comprises a plurality of behaviors performed by a user on different articles in chronological order, the network Graph is composed of article nodes clicked by users, and each article node represents one article corresponding to a user behavior;
S2, obtaining a vector representation of each article node by using a Graph Embedding method;
S3, clustering the vector data of the article nodes through a preset clustering algorithm to generate K clusters, wherein K is a positive integer;
S4, dividing the articles on which each user has performed behaviors into M sets according to the clusters of step S3, and averaging the vector data of the article nodes in each set to obtain vector representations of the M clusters of each user, wherein M is in the range [1, K];
S5, calculating the similarity between the vector representation of each cluster of each user and the vectors of the article nodes in the corresponding cluster, to obtain the Top-N similar recommendation results of each cluster corresponding to the user;
S6, performing decay accumulation over the number of days between each behavior of the user in each cluster and the current date, and then performing a proportion calculation to obtain the user's preference score for each corresponding cluster;
S7, performing weighted sampling on the recall results of the user's clusters according to the user's cluster preference scores to obtain the final recommendation result set of the user;
wherein, the step S6 specifically includes the following steps:
S61, setting the parameters decay_rate, days_i and user_cluster_score, wherein:
decay_rate represents the decay rate, with value range (0, 1);
days_i represents the number of days between the date of the user's i-th behavior in a given cluster and the current date;
user_cluster_score represents the user's preference score for a given cluster;
S62, performing decay accumulation on the user's behaviors in each corresponding cluster using the following formula: user_cluster_score = sum(decay_rate ^ days_i);
S63, performing a proportion calculation on each accumulated sum calculated in step S62 to obtain the user's preference score for each corresponding cluster, that is, calculating the ratio of the user's accumulated score in a given cluster to the sum of the accumulated scores of all clusters corresponding to the user.
2. The clustering-technology-based user multi-interest recommendation method according to claim 1, wherein the step S1 comprises: forming a directed, unweighted network Graph from the users' click behaviors according to the time at which the behaviors occurred, wherein the network Graph comprises a plurality of article nodes.
3. The clustering-technology-based user multi-interest recommendation method according to claim 1, wherein in the step S2, the Graph Embedding method includes one or more of the RandomWalk algorithm and the Node2Vector algorithm; and obtaining the vector representation of each article node by using the Graph Embedding method specifically comprises the following steps:
S21, taking any article node in the network Graph as an initial walk point;
S22, performing a random walk starting from the initial walk point, taking L steps in a single walk to generate a sequence of length L, wherein L represents the number of steps of a single random walk and is a positive integer;
S23, repeating steps S21 and S22 N times for each article node in the network Graph, finally obtaining N*V sequences of length L, wherein N represents the number of random walks started at each article node, V represents the number of article nodes in the network Graph, and N and V are positive integers;
S24, applying a Word2Vec model to the generated sequence data to obtain the vector representation of each article node.
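Steps S21 to S23 can be illustrated with the short Python sketch below, which generates N walks from each of the V nodes of a directed graph. It is a sketch under assumptions: the function name `generate_walks`, the adjacency-set input, and stopping a walk early at a node with no outgoing edge are not taken from the claims. Step S24 is not shown; the resulting walks would be passed as "sentences" to a Word2Vec implementation.

```python
import random


def generate_walks(graph, walk_length, walks_per_node, seed=None):
    """Return N*V random walks; graph is {node: iterable of successor nodes}."""
    rng = random.Random(seed)
    walks = []
    for start in graph:                       # S21: each node serves as a start point
        for _ in range(walks_per_node):       # S23: N walks per node
            walk = [start]
            while len(walk) < walk_length:    # S22: extend until the sequence has L nodes
                # Sorting makes the walk reproducible for a fixed seed.
                neighbors = sorted(graph.get(walk[-1], ()))
                if not neighbors:
                    break                     # assumption: stop early at a dead end
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks
```

Articles that frequently follow one another in user behavior tend to co-occur in these walks, which is what lets the subsequent Word2Vec step place them close together in the vector space.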
4. The clustering-technology-based user multi-interest recommendation method according to claim 1, wherein in the step S3, the preset clustering algorithm is the KMeans clustering algorithm.
5. The clustering-technology-based user multi-interest recommendation method according to claim 1, wherein the step S7 specifically comprises the following steps:
S71, performing probability sampling on the clusters corresponding to the user according to the user's preference score for each corresponding cluster;
S72, obtaining, from the similarity results of the cluster sampled in step S71, the highest-ranked article not yet extracted for the user, and placing it into the final recommendation result set;
S73, repeating steps S71 and S72 a preset number of times to obtain the final recommendation result set corresponding to the user, wherein the number of recommended articles in the final recommendation result set equals the preset number of times.
6. A user multi-interest recommendation device based on a clustering technology is characterized by comprising:
the network Graph construction module is used for constructing a network Graph of the article based on the user behavior sequence data, wherein each user behavior sequence comprises a plurality of user behaviors of the user on different articles in sequence according to the time sequence, the network Graph of the article is composed of article nodes clicked by the user, and one article node represents one article corresponding to the user behavior;
the article node vector generation module is used for obtaining the vector representation of each article node by using a Graph Embedding method;
the clustering processing module is used for clustering the vector data of the article nodes through a preset clustering algorithm to generate K clusters, wherein K is a positive integer;
a cluster vector generation module, configured to divide the articles on which each user has performed behaviors into M sets according to the clusters generated by the clustering processing module, and to average the vector data of the article nodes in each set to obtain vector representations of the M clusters of each user, wherein M is in the range [1, K];
a first calculation module, configured to calculate the similarity between the vector representation of each cluster of each user and the vectors of the article nodes in the corresponding cluster, to obtain the Top-N similar recommendation results of each cluster corresponding to the user;
a second calculation module, configured to perform decay accumulation over the number of days between each behavior of the user in each cluster and the current date, and then to perform a proportion calculation to obtain the user's preference score for each corresponding cluster;
a final recommendation result set generation module, configured to perform weighted sampling on the recall results of the user's clusters according to the user's cluster preference scores to obtain the final recommendation result set of the user;
wherein the second calculation module comprises:
a parameter setting unit, configured to set the parameters decay_rate, days_i and user_cluster_score, wherein decay_rate represents the decay rate, with value range (0, 1); days_i represents the number of days between the date of the user's i-th behavior in a given cluster and the current date; and user_cluster_score represents the user's preference score for a given cluster;
a decay accumulation unit, configured to perform decay accumulation on the user's behaviors in each corresponding cluster using the following formula: user_cluster_score = sum(decay_rate ^ days_i);
a proportion calculation unit, configured to perform a proportion calculation on each accumulated sum obtained by the decay accumulation unit to obtain the user's preference score for each corresponding cluster, that is, to calculate the ratio of the user's accumulated score in a given cluster to the sum of the accumulated scores of all clusters corresponding to the user.
7. A computing device, comprising:
a processor; and
a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the clustering technique based user multi-interest recommendation method of any one of claims 1-5.
8. A computer-readable storage medium having executable code stored thereon, wherein the executable code, when executed by a processor of an electronic device, causes the processor to perform the clustering technique based user multi-interest recommendation method of any one of claims 1-5.
CN202210694014.2A 2022-06-20 2022-06-20 Clustering technology-based user multi-interest recommendation method, device, equipment and medium Active CN114780861B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210694014.2A CN114780861B (en) 2022-06-20 2022-06-20 Clustering technology-based user multi-interest recommendation method, device, equipment and medium
PCT/CN2022/136870 WO2023245999A1 (en) 2022-06-20 2022-12-06 User multi-interest recommendation method and apparatus based on clustering technology, and device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210694014.2A CN114780861B (en) 2022-06-20 2022-06-20 Clustering technology-based user multi-interest recommendation method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN114780861A CN114780861A (en) 2022-07-22
CN114780861B true CN114780861B (en) 2022-10-21

Family

ID=82420730

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210694014.2A Active CN114780861B (en) 2022-06-20 2022-06-20 Clustering technology-based user multi-interest recommendation method, device, equipment and medium

Country Status (2)

Country Link
CN (1) CN114780861B (en)
WO (1) WO2023245999A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114780861B (en) * 2022-06-20 2022-10-21 上海二三四五网络科技有限公司 Clustering technology-based user multi-interest recommendation method, device, equipment and medium
CN117520406B (en) * 2023-12-29 2024-03-26 安徽思高智能科技有限公司 RPA flow recommendation method, device and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008121872A1 (en) * 2007-03-30 2008-10-09 Amazon Technologies, Inc. Cluster-based assessment of user interests
CN111611499A (en) * 2020-05-28 2020-09-01 贝壳技术有限公司 Collaborative filtering method, collaborative filtering device and collaborative filtering system
CN113239271A (en) * 2021-05-11 2021-08-10 四川大学 Recommendation method based on interest drift

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102254028A (en) * 2011-07-22 2011-11-23 青岛理工大学 Personalized commodity recommending method and system which integrate attributes and structural similarity
US20150262069A1 (en) * 2014-03-11 2015-09-17 Delvv, Inc. Automatic topic and interest based content recommendation system for mobile devices
CN111523964A (en) * 2020-03-25 2020-08-11 北京三快在线科技有限公司 Clustering-based recall method and apparatus, electronic device and readable storage medium
CN114780861B (en) * 2022-06-20 2022-10-21 上海二三四五网络科技有限公司 Clustering technology-based user multi-interest recommendation method, device, equipment and medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Algorithms and System Architecture for Immediate Personalized News Recommendations";Takeshi Yoneda等;《 2019 IEEE/WIC/ACM International Conference on Web Intelligence (WI)》;20191125;第1-8页 *
"新闻推荐系统关键技术的研究与实现";刘军;《中国优秀博硕士学位论文全文数据库(硕士) 信息科技辑》;20180215;第I138-2939页 *

Also Published As

Publication number Publication date
WO2023245999A1 (en) 2023-12-28
CN114780861A (en) 2022-07-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant