CN115129984A - Dual-clustering hot recommendation method and device based on user behavior sequence - Google Patents


Info

Publication number
CN115129984A
Authority
CN
China
Prior art keywords
user
cluster
clustering
article
clusters
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210738909.1A
Other languages
Chinese (zh)
Inventor
李多海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai 2345 Network Technology Co ltd
Original Assignee
Shanghai 2345 Network Technology Co ltd
Application filed by Shanghai 2345 Network Technology Co ltd filed Critical Shanghai 2345 Network Technology Co ltd
Priority to CN202210738909.1A priority Critical patent/CN115129984A/en
Publication of CN115129984A publication Critical patent/CN115129984A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G06F16/9535: Search customisation based on user profiles and personalisation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/906: Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a dual-clustering hot recommendation method and device based on user behavior sequences. The method comprises the following steps: constructing a network graph of the articles based on user behavior sequence data; obtaining a vector representation of each article node by using a Graph Embedding method; clustering the vectors of the article nodes into K class clusters; dividing the articles on which each user has generated behaviors into M sets according to the class clusters, and obtaining vector representations of the M class clusters for each user; calculating the user's preference score for each class cluster; computing a weighted average of the user's vectors under the class clusters according to those preference scores to obtain a unique interest vector representation of the user; clustering the users' interest vectors to obtain the class cluster of each user; counting user behaviors under each class cluster to form a per-cluster hot list; and pushing to each user the hot list of the class cluster to which the user belongs. By adopting a dual-clustering algorithm, the method and device achieve high accuracy in the recommendation results and improve the recommendation effect.

Description

Dual-clustering hot recommendation method and device based on user behavior sequence
Technical Field
The invention relates to the technical field of internet information, in particular to a user behavior sequence-based dual-clustering hot recommendation method and device.
Background
Clustering is an unsupervised learning method that groups data objects into multiple classes or clusters, with the goal that objects in the same cluster are highly similar while objects in different clusters differ substantially. With the rapid development of the internet, clustering plays an increasingly important role in recommendation systems. Users can be grouped according to basic attributes such as age, region, and education level, or according to their behavior preferences. For example, in news browsing, users who prefer entertainment content may be clustered into an entertainment cluster, and entertainment information may then be pushed primarily to those users. From another perspective, a wide variety of articles may also be divided into different clusters. The core idea of clustering is that like gathers with like.
The clustering applications above rest on a premise: some information about the user or the article, such as the user's age or the article's category, must be obtained in advance. In practice, this information is often hard to obtain. How to apply clustering in a recommendation system based only on users' behavior data, so as to make better recommendations to users, is therefore a problem to be solved by those skilled in the art.
Disclosure of Invention
The present invention provides a method and an apparatus for dual cluster hot recommendation based on user behavior sequences, so as to solve the problems set forth in the foregoing technical background.
In order to achieve the purpose, the invention adopts the following technical scheme:
In a first aspect, the application provides a dual-clustering hot recommendation method based on a user behavior sequence, comprising the following steps:
S1, constructing a network graph of the articles based on user behavior sequence data, wherein each user behavior sequence comprises a plurality of behaviors performed by a user on different articles in chronological order, the network graph of the articles is composed of article nodes clicked by users, and each article node represents one article corresponding to a user behavior;
S2, obtaining a vector representation of each article node by using a Graph Embedding method;
S3, clustering the vector data of the article nodes through a preset clustering algorithm to generate K class clusters, wherein K is a positive integer;
S4, dividing the articles on which each user has generated behaviors into M sets according to the class clusters of step S3, and summing and averaging the vector data of the article nodes in each set to obtain vector representations of the M class clusters of each user, wherein the range of M is [1, K];
S5, for each class cluster, performing attenuation accumulation over the number of days between each of the user's behaviors under that class cluster and the current date, and then performing proportion calculation to obtain the user's preference score for each class cluster corresponding to the user;
S6, computing a weighted average of the user's vectors under each class cluster according to the user's preference score for each class cluster corresponding to the user, to obtain the unique interest vector representation of the user;
S7, clustering the unique interest vectors of the users generated in step S6 again by using a preset clustering algorithm, and recording the class cluster of each user;
S8, counting the user behaviors under each class cluster, calculating the click rate of each article, and sorting in descending order of click rate to form a hot list for each class cluster;
and S9, recommending to each user the hot list of the class cluster to which the user belongs.
Preferably, step S1 comprises: forming a directed, unweighted network graph from the users' click behaviors according to the time at which the behaviors occurred, wherein the network graph comprises a plurality of article nodes.
Preferably, in step S2, the Graph Embedding method comprises one or more of a RandomWalk algorithm and a Node2Vector algorithm; obtaining the vector representation of each article node by using the Graph Embedding method specifically comprises the following steps:
S21, taking any item node in the network graph as an initial walking point;
S22, performing a random walk starting from the initial walking point, taking L steps in a single walk to generate a sequence of length L, wherein L represents the number of steps of a single random walk and L is a positive integer;
S23, repeating steps S21 and S22 N times for each item node in the network graph, finally obtaining N × V sequences of length L, wherein N represents the number of random walks started at each item node, V represents the number of item nodes in the network graph, and N and V are positive integers;
and S24, applying a Word2Vec model to the generated sequence data to obtain the vector representation of each article node.
More preferably, the random walk process in step S22 specifically comprises:
starting from any initial walking point in the network graph, randomly selecting, at each step, one of the article nodes connected to the current article node, and repeating this process until the set walk length is reached, at which point the walk stops, thereby obtaining new user behavior sequence data.
Preferably, in step S3, the preset clustering algorithm is a KMeans clustering algorithm.
Preferably, step S5 specifically comprises the following steps:
S51, setting parameters: decay_rate, days_i, user_cluster_score; wherein,
decay_rate represents the decay rate, with a value range of (0, 1);
days_i represents the number of days between the date of the user's i-th behavior under a given class cluster and the current date;
user_cluster_score represents the user's preference score under a given class cluster;
S52, performing attenuation accumulation on the user's behaviors under each corresponding class cluster by using the following formula:
user_cluster_score = sum(decay_rate^days_i);
S53, performing proportion calculation on each accumulated sum obtained in step S52 to obtain the user's preference score under each corresponding class cluster; that is, calculating the ratio of the user's accumulated score under a given class cluster to the sum of the user's accumulated scores over all class clusters corresponding to the user.
In a second aspect, the present application provides a dual-clustering hot recommendation apparatus based on a user behavior sequence, including:
the network Graph construction module is used for constructing a network Graph of the article based on the user behavior sequence data, wherein each user behavior sequence comprises a plurality of user behaviors of the user on different articles in sequence according to the time sequence, the network Graph of the article is composed of article nodes clicked by the user, and one article node represents one article corresponding to the user behavior;
the article node vector generation module is used for obtaining the vector representation of each article node by using a Graph Embedding method;
the first clustering processing module is used for clustering the vector data of the article nodes through a preset clustering algorithm to generate K clusters, wherein K is a positive integer;
a cluster-like vector generation module, configured to divide the articles on which each user has generated behaviors into M sets according to the class clusters generated by the first clustering processing module, and to sum and average the vector data of the article nodes in each set to obtain vector representations of the M class clusters of each user, wherein the range of M is [1, K];
the first calculation module is used for performing attenuation accumulation on days of the behavior of the user under each class cluster from the current date, and then performing proportion calculation to obtain the preference score of the user on each class cluster corresponding to the user;
the second calculation module is used for carrying out weighted average on the vectors of the users under each class cluster according to the preference scores of the users on each class cluster corresponding to the users to obtain the unique interest vector representation of the users;
the second clustering processing module is used for clustering the unique interest vectors of the users generated by the second calculating module again by using a preset clustering algorithm and recording the cluster of each user;
the cluster recommendation list generation module is used for counting the user behaviors under various clusters, calculating the click rate of each article, and performing descending order according to the click rate to form hot lists of various clusters;
and the hot list recommending module is used for recommending a hot list under the cluster where each user is located for each user.
Preferably, in the first clustering processing module and the second clustering processing module, the preset clustering algorithm is a KMeans clustering algorithm.
Preferably, the first calculation module includes:
a parameter setting unit for setting parameters: decay_rate, days_i, user_cluster_score; wherein decay_rate represents the decay rate, with a value range of (0, 1); days_i represents the number of days between the date of the user's i-th behavior under a given class cluster and the current date; user_cluster_score represents the user's preference score under a given class cluster;
and the attenuation accumulation unit is used for performing attenuation accumulation on the user's behaviors under each corresponding class cluster by using the following formula: user_cluster_score = sum(decay_rate^days_i);
the proportion calculation unit is used for carrying out proportion calculation on each accumulated sum obtained by the attenuation accumulation unit respectively to obtain a preference score of the user under each corresponding class cluster; and calculating the proportion of the preference scores of the user in a certain cluster to the sum of the preference scores of all the clusters corresponding to the user.
A third aspect of the present application discloses an electronic device, comprising:
a processor; and
a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the method of the first aspect of the present application.
A fourth aspect of the present application provides a computer-readable storage medium having stored thereon executable code, which, when executed by a processor of an electronic device, causes the processor to perform the method of the first aspect of the present application.
In the above, the article includes text, pictures, audio or video.
Compared with the prior art, the technical scheme of the invention has the following beneficial effects:
According to the method and the device, a dual-clustering technique is applied to users' behavior data, which overcomes the low precision and large error of traditional recommendation methods. Compared with the result of a single clustering pass, the recommendations produced by the dual-clustering algorithm are more targeted, the recommendation effect is improved, and better article recommendations can be made for the target user.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application. In the drawings:
FIG. 1 is a flow diagram illustrating a dual-clustering hot recommendation method based on a user behavior sequence;
FIG. 2 is a schematic diagram showing the structure of an item network graph constructed based on user behavior sequence data in one embodiment;
FIG. 3 illustrates an example of clustering the vector data of item nodes into K class clusters in one embodiment;
FIG. 4 illustrates an example of dividing the items on which user A has generated behaviors into multiple sets according to the corresponding class clusters in one embodiment;
FIG. 5 illustrates an example of computing user A's preference scores for the class clusters in one embodiment;
FIG. 6 is a schematic diagram showing the structure of a dual-clustering hot recommendation device based on a user behavior sequence;
FIG. 7 shows a schematic structural diagram of an electronic device provided in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and effects of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order, it being understood that the data so used may be interchanged under appropriate circumstances. Furthermore, the terms "comprises," "comprising," and any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Embodiment:
In many scenarios, a sequence of user behaviors needs to be analyzed and processed. A user behavior sequence is the record of a series of events, such as clicking, accessing, and purchasing, generated during a user's daily operation and use of a product. It can be represented as a time series of event sets, and it captures fine-grained characteristics of the user such as habits and preferences.
Fig. 1 is a flowchart illustrating a dual-clustering hot recommendation method based on a user behavior sequence.
Referring to fig. 1, a dual-clustering hot recommendation method based on a user behavior sequence specifically includes the following steps:
Step S1: constructing a network graph of the articles based on the user behavior sequence data.
Each user behavior sequence comprises a plurality of behaviors performed by a user on different articles in chronological order, the network graph of the articles is composed of article nodes clicked by users, and each article node represents one article corresponding to a user behavior.
An information flow product generates a large amount of user click behavior. Ordered by the time at which they occur, these click behaviors form a directed, unweighted network graph comprising a plurality of item nodes.
It should be noted that the items on which users perform behaviors may be texts, pictures, audio or videos.
Taking a news App as an example, a large amount of user behavior data, such as click behavior data, can be collected in the App. The dual-clustering hot recommendation method of the application is described in detail below in combination with this application scenario.
Referring to fig. 2, for example, if user A clicks news a, news b, and news c in sequence, two edges are generated in the network graph: one from news a to news b, and one from news b to news c. The click behaviors of other users are handled in the same way, and together they form a network graph containing V article nodes.
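The graph construction of step S1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the function name build_item_graph, the adjacency-dictionary representation, and the decision to skip self-loops caused by repeated clicks on the same article are all hypothetical.

```python
from collections import defaultdict


def build_item_graph(sequences):
    """Build a directed, unweighted item graph from click sequences.

    `sequences` is a list of item-id lists, each already sorted by click time.
    Returns a dict mapping every item to the set of items clicked right after it.
    """
    graph = defaultdict(set)
    for seq in sequences:
        for item in seq:
            graph[item]  # touch the key so items without outgoing edges are still nodes
        for src, dst in zip(seq, seq[1:]):
            if src != dst:  # assumption: ignore self-loops from repeated clicks
                graph[src].add(dst)  # a set keeps the graph unweighted
    return dict(graph)


# User A clicks a, b, c in sequence; a second (hypothetical) user clicks b, d.
graph = build_item_graph([["a", "b", "c"], ["b", "d"]])
```

User A's sequence contributes the two edges a -> b and b -> c described above, and the second sequence adds b -> d.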
Step S2: obtaining the vector representation of each article node by using a Graph Embedding method.
Classic Graph Embedding techniques mainly include methods such as RandomWalk and Node2Vector.
Obtaining the vector representation of each article node by using the Graph Embedding method specifically comprises the following steps:
Step S21, taking any item node in the network graph as an initial walking point.
Step S22, performing a random walk starting from the initial walking point, taking L steps in a single walk to generate a sequence of length L, where L represents the number of steps of a single random walk and L is a positive integer.
The random walk process is specifically as follows: starting from any initial walking point in the network graph, one of the article nodes connected to the current article node is randomly selected at each step, and this process is repeated until the set walk length is reached, at which point the walk stops, thereby obtaining new user behavior sequence data.
Step S23, repeating steps S21 and S22 N times for each item node in the network graph, finally obtaining N × V sequences of length L, where N represents the number of random walks started at each item node, V represents the number of item nodes in the network graph, and N and V are positive integers.
Step S24, applying the Word2Vec model to the generated sequence data to calculate the vector representation of each item node.
Word2Vec is an open-source algorithm that generates word vectors from text sentences: it represents each word as a vector of a specified dimension and uses these vectors to measure the relationships among words. In the present embodiment, it is applied to article sequence data.
In this embodiment, assume that the sequence obtained by the walk sampling is:
[[a,b,d,e,g,h,v,f],
[w,r,f,h,v,s,n,d,k],
……]
The vector representation of each news item is calculated by the Word2Vec model, for example:
the vector for news a is denoted as a: [0.22, 0.45, 0.88, 0.06, 0.01, 0.32];
the vector for news c is denoted as c: [0.24, 0.47, 0.86, 0.12, 0.03, 0.28];
……
It should be noted that different algorithms use different random walk strategies, and the randomness is controlled by the parameters of the relevant algorithm.
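The walk sampling of steps S21 to S23 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the method of the application: the function name random_walks, the adjacency-dictionary input, the fixed seed, and the choice to end a walk early at a node with no outgoing edges are all hypothetical. Training the Word2Vec model of step S24 on the resulting sequences is not shown.

```python
import random


def random_walks(graph, walk_length, walks_per_node, seed=0):
    """Sample `walks_per_node` uniform random walks from every node.

    `graph` maps each node to a collection of its successor nodes.
    Each walk holds at most `walk_length` nodes, including the start node.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    walks = []
    for start in sorted(graph):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = sorted(graph.get(walk[-1], ()))
                if not neighbors:
                    break  # assumption: stop early at a node with no outgoing edge
                walk.append(rng.choice(neighbors))  # uniform choice among successors
            walks.append(walk)
    return walks


# A small cyclic graph, so every walk reaches the full length.
walks = random_walks({"a": {"b"}, "b": {"c"}, "c": {"a"}}, walk_length=5, walks_per_node=2)
```

With V = 3 nodes and N = 2 walks per node, the call produces N × V = 6 sequences, which would then serve as the "sentences" fed to a Word2Vec implementation.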
Step S3: clustering the obtained news vector data through a KMeans clustering algorithm to generate K clusters, wherein K is a positive integer and is usually between 5 and 20.
The KMeans algorithm is an unsupervised clustering algorithm: given a data set of N samples, it partitions the data set into K clusters so that the sum of the within-cluster variances is minimized.
In this embodiment, the obtained news vector data is clustered into K class clusters, so that nodes whose vectors are close to each other are placed in the same class cluster as far as possible, and nodes whose vectors are far apart are placed in different class clusters. This is because items with similar vectors are typically similar in some respect, such as title text, category, author, or some implied semantics, so different class clusters represent different types.
Referring to fig. 3, the vectors of news a, news f, news w, and news c are close to one another and represent the same type, so they are assigned to class cluster C1 after clustering; the vectors of news b and news h are close to each other and represent another type, so they are assigned to class cluster C2.
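To make the clustering of step S3 concrete, the following Python sketch runs a bare-bones version of Lloyd's algorithm, the usual iteration behind KMeans. It is a simplified stand-in, not the implementation of the application: the function name kmeans, the fixed iteration count, and the initialization from the first k points are assumptions, and a production system would more likely call a library implementation. The vector for news b is the one assumed later in this embodiment.

```python
import math


def kmeans(points, k, iterations=20):
    """Minimal Lloyd's algorithm; returns (centroids, labels)."""
    # Assumption: use the first k points as initial centroids (real systems
    # usually use a randomized initialization such as k-means++).
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


news_a = [0.22, 0.45, 0.88, 0.06, 0.01, 0.32]
news_c = [0.24, 0.47, 0.86, 0.12, 0.03, 0.28]
news_b = [0.10, 0.22, 0.12, 0.33, 0.40, 0.12]
centroids, labels = kmeans([news_a, news_c, news_b], k=2)
```

On these three vectors the sketch groups news a with news c and leaves news b in its own cluster, in line with the C1/C2 division described above.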
Step S4: dividing the articles on which each user has generated behaviors into M sets according to the class clusters of step S3, and then summing and averaging the article vectors in each set to obtain vector representations of the M class clusters of each user, wherein the range of M is [1, K].
Referring to fig. 4, among the articles on which user A has generated behaviors, news a and news c are assigned to the same class cluster C1, and news b is assigned to class cluster C2. The vector representation of user A under class cluster C1 is therefore the average of the vectors of news a and news c, which is calculated as [0.23, 0.46, 0.87, 0.09, 0.02, 0.30]. User A has only one news item, news b, under class cluster C2, so the vector of news b is user A's vector representation under class cluster C2, which is assumed here to be [0.10, 0.22, 0.12, 0.33, 0.40, 0.12].
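The averaging of step S4 can be reproduced with a short Python sketch. The function name user_cluster_vectors and the dictionary-based inputs are assumptions made for illustration; the vectors and the cluster assignment are the ones given in this embodiment.

```python
from collections import defaultdict


def user_cluster_vectors(clicked_items, item_vectors, item_cluster):
    """Average the vectors of a user's clicked items within each class cluster."""
    groups = defaultdict(list)
    for item in clicked_items:
        groups[item_cluster[item]].append(item_vectors[item])
    # Element-wise mean of the vectors collected for each cluster.
    return {
        cluster: [sum(col) / len(vecs) for col in zip(*vecs)]
        for cluster, vecs in groups.items()
    }


item_vectors = {
    "a": [0.22, 0.45, 0.88, 0.06, 0.01, 0.32],
    "b": [0.10, 0.22, 0.12, 0.33, 0.40, 0.12],
    "c": [0.24, 0.47, 0.86, 0.12, 0.03, 0.28],
}
item_cluster = {"a": "C1", "b": "C2", "c": "C1"}
vectors_a = user_cluster_vectors(["a", "b", "c"], item_vectors, item_cluster)
```

For user A this yields M = 2 cluster vectors: the mean of news a and news c for C1, and the vector of news b unchanged for C2.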
Step S5: calculating the user's preference score for each corresponding class cluster.
This specifically comprises the following steps:
Step S51, setting parameters: decay_rate, days_i, user_cluster_score; wherein,
decay_rate represents the decay rate, with a value range of (0, 1);
days_i represents the number of days between the date of the user's i-th behavior under a given class cluster and the current date;
user_cluster_score represents the user's preference score under a given class cluster.
Step S52, performing attenuation accumulation on the user's behaviors under each corresponding class cluster by using the following formula:
user_cluster_score = sum(decay_rate^days_i);
Step S53, performing proportion calculation on each accumulated sum obtained in step S52 to obtain the user's preference score under each corresponding class cluster; that is, calculating the ratio of the user's accumulated score under a given class cluster to the sum of the user's accumulated scores over all class clusters corresponding to the user.
Referring to fig. 5, user A clicked news a 1 day before the current time, news b 2 days before the current time, and news c 3 days before the current time.
Setting the decay rate decay_rate to 0.9 and using the cluster division above, the accumulated score of user A for class cluster C1 is 0.9^1 + 0.9^3 = 1.629, and the accumulated score of user A for class cluster C2 is 0.9^2 = 0.81.
The proportion calculation is then performed as follows:
class cluster C1: 1.629/(1.629+0.81) = 0.668;
class cluster C2: 0.81/(1.629+0.81) = 0.332.
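The decay accumulation and proportion calculation of step S5 can be checked with a few lines of Python. The function name cluster_preferences and the dictionary input are assumptions for illustration; the formula and the numbers are those of this embodiment.

```python
def cluster_preferences(days_by_cluster, decay_rate=0.9):
    """Return normalized preference scores per class cluster.

    `days_by_cluster` maps a cluster id to the list of days_i values, i.e. how
    many days before the current date each of the user's behaviors occurred.
    """
    # user_cluster_score = sum(decay_rate ^ days_i), computed per cluster
    raw = {
        cluster: sum(decay_rate ** days for days in day_list)
        for cluster, day_list in days_by_cluster.items()
    }
    total = sum(raw.values())
    # Proportion step: each cluster's share of the user's total score.
    return {cluster: score / total for cluster, score in raw.items()}


# User A: news a (C1) 1 day ago, news b (C2) 2 days ago, news c (C1) 3 days ago.
prefs = cluster_preferences({"C1": [1, 3], "C2": [2]})
```

Rounded to three decimals, the result matches the scores above: 0.668 for C1 and 0.332 for C2. Because the scores are proportions, they always sum to 1 for a given user.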
Step S6: computing a weighted average of the user's vectors under each class cluster according to the user's preference score for each class cluster corresponding to the user, to obtain the unique interest vector representation of the user.
Specifically, in the above step S4, it is calculated that the vector of the user a for the class cluster C1 is represented as [0.23,0.46,0.87,0.09,0.02,0.30], and the vector of the user a for the class cluster C2 is represented as [0.10,0.22,0.12,0.33,0.40,0.12 ]. In the above step S5, it has been calculated that the preference score of the user a for the class cluster C1 is 0.668, and the preference score of the user a for the class cluster C2 is 0.332. Therefore, the interest vector of the user A can be calculated by the following method:
[0.23,0.46,0.87,0.09,0.02,0.30]*0.668+[0.10,0.22,0.12,0.33,0.40,0.12]*0.332=[0.19,0.38,0.62,0.17,0.15,0.24]
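The weighted average of step S6 can be verified with the following Python sketch. The function name interest_vector is an assumption for illustration; the cluster vectors and preference scores are those calculated for user A in steps S4 and S5.

```python
def interest_vector(cluster_vectors, preferences):
    """Weighted average of a user's cluster vectors, weighted by preference score."""
    dim = len(next(iter(cluster_vectors.values())))
    total = sum(preferences[c] for c in cluster_vectors)  # 1.0 when scores are proportions
    return [
        sum(preferences[c] * vec[i] for c, vec in cluster_vectors.items()) / total
        for i in range(dim)
    ]


cluster_vectors = {
    "C1": [0.23, 0.46, 0.87, 0.09, 0.02, 0.30],
    "C2": [0.10, 0.22, 0.12, 0.33, 0.40, 0.12],
}
user_a = interest_vector(cluster_vectors, {"C1": 0.668, "C2": 0.332})
rounded = [round(x, 2) for x in user_a]
```

Rounded to two decimals, the result is [0.19, 0.38, 0.62, 0.17, 0.15, 0.24], the interest vector of user A given above.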
Step S7: clustering the unique interest vectors of the users generated in step S6 again, using the KMeans clustering algorithm of step S3, so that the users are grouped into X class clusters. Each user is assigned to exactly one of these class clusters, and the users in each class cluster have similar behavior preferences, for example a preference for information in categories such as entertainment or history.
Step S8: counting, for each of the X class clusters, the number of clicks on each piece of information by the users in that class cluster. For example, in class cluster X1, news a has 500 clicks, news b has 450 clicks, and news c has 200 clicks; sorting in descending order of clicks forms the hot list of class cluster X1, i.e. [a, b, c].
Step S9: when a user accesses the product App, recommending to the user the hot list of the class cluster to which the user belongs.
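Steps S8 and S9 can be sketched in Python as follows. This is an illustration under stated assumptions: the function names build_hot_lists and recommend, the (user, item) click-log format, the top_n cutoff, and the empty-list fallback for users without a class cluster are all hypothetical. The sketch ranks by raw click counts, as in the example of this embodiment.

```python
from collections import Counter


def build_hot_lists(click_log, user_cluster, top_n=10):
    """Rank items by click count within each user class cluster."""
    counts = {}
    for user, item in click_log:
        cluster = user_cluster.get(user)
        if cluster is None:
            continue  # assumption: ignore clicks from users without a class cluster
        counts.setdefault(cluster, Counter())[item] += 1
    # most_common sorts by count in descending order.
    return {
        cluster: [item for item, _ in counter.most_common(top_n)]
        for cluster, counter in counts.items()
    }


def recommend(user, user_cluster, hot_lists):
    """Return the hot list of the user's class cluster (empty if unknown)."""
    return hot_lists.get(user_cluster.get(user), [])


user_cluster = {"u1": "X1", "u2": "X1", "u3": "X2"}
click_log = [("u1", "a"), ("u2", "a"), ("u1", "b"), ("u3", "c"),
             ("u1", "a"), ("u2", "b"), ("u2", "c")]
hot_lists = build_hot_lists(click_log, user_cluster)
```

In this toy log, class cluster X1 sees three clicks on a, two on b, and one on c, so recommend("u1", user_cluster, hot_lists) returns ["a", "b", "c"].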
It should be noted that the above application scenario is only one example of the embodiment of the present invention, and the embodiment of the present invention is not limited to the above application scenario, but may be applied to any application scenario to which the embodiment of the present invention is applied.
In another aspect, the application also discloses a dual-clustering hot recommendation device based on the user behavior sequence.
Referring to fig. 6, a dual-clustering hot recommendation device based on a user behavior sequence includes: a network graph construction module 100, an article node vector generation module 200, a first clustering processing module 300, a cluster-like vector generation module 400, a first calculation module 500, a second calculation module 600, a second clustering processing module 700, a cluster recommendation list generation module 800 and a hot list recommending module 900.
The network Graph construction module 100 is configured to construct a network Graph of an article based on user behavior sequence data, where each user behavior sequence includes a plurality of user behaviors that occur in sequence to different articles by the user according to a time sequence, the network Graph of the article is composed of article nodes clicked by the user, and one article node represents one article corresponding to a user behavior.
The item node vector generation module 200 is configured to obtain a vector representation of each item node by using a Graph Embedding method.
The first clustering module 300 is configured to cluster the vector data of the article nodes by using a preset clustering algorithm to generate K clusters, where K is a positive integer; the preset clustering algorithm is preferably a KMeans clustering algorithm.
The cluster-like vector generation module 400 is configured to divide the articles on which each user has generated behaviors into M sets according to the class clusters generated by the first clustering module 300, and to sum and average the vector data of the article nodes in each set to obtain vector representations of the M class clusters of each user, where the range of M is [1, K].
The first calculating module 500 is configured to perform attenuation accumulation on days from the current date of the behavior of the user in each class cluster, and then perform proportion calculation to obtain a preference score of the user for each class cluster corresponding to the user.
The second calculating module 600 is configured to perform weighted average on the vectors of the users in each class cluster according to the preference score of the user for each class cluster corresponding to the user, so as to obtain a unique interest vector representation of the user.
The second clustering module 700 is configured to re-cluster the unique interest vectors of the users generated by the second computing module 600 using a preset clustering algorithm (for example, using the KMeans clustering algorithm in the first clustering module 300), and record the cluster class of each user.
The cluster recommendation list generation module 800 is configured to count user behaviors under various clusters, calculate click rate of each article, and perform descending order according to the click rate to form a hot list of the various clusters.
And a hot list recommending module 900, configured to recommend a hot list under the class cluster where each user is located for each user.
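The per-cluster hot list produced by modules 800 and 900 can be illustrated with a minimal sketch. The disclosure does not define how the click rate is measured, so the sketch assumes it is clicks divided by impressions within a user cluster; the record layout and the names hot_lists, recommend and top_n are hypothetical.

```python
from collections import defaultdict


def hot_lists(events, user_cluster, top_n=10):
    """Build one hot list per user cluster.

    events: iterable of (user_id, item_id, clicked) impression records.
    user_cluster: dict mapping user_id -> cluster id from the second clustering.
    Click rate is ASSUMED to be clicks / impressions inside the cluster.
    Returns {cluster: [(item_id, click_rate), ...]} in descending click rate.
    """
    shown = defaultdict(lambda: defaultdict(int))
    clicks = defaultdict(lambda: defaultdict(int))
    for user_id, item_id, clicked in events:
        cluster = user_cluster.get(user_id)
        if cluster is None:
            continue  # user was never assigned to a cluster
        shown[cluster][item_id] += 1
        if clicked:
            clicks[cluster][item_id] += 1
    result = {}
    for cluster, items in shown.items():
        ranked = sorted(
            ((item, clicks[cluster][item] / n) for item, n in items.items()),
            key=lambda pair: pair[1],
            reverse=True,
        )
        result[cluster] = ranked[:top_n]
    return result


def recommend(user_id, user_cluster, lists):
    """Return the hot list (item ids only) of the user's own cluster."""
    return [item for item, _ in lists.get(user_cluster.get(user_id), [])]
```

A user who has no recorded cluster simply receives an empty list in this sketch; a production system would likely fall back to a global hot list.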
Wherein the first computing module 500 comprises:
a parameter setting unit, configured to set the parameters decay_rate, days_i and user_cluster_score; wherein decay_rate represents the decay rate, with a value range of (0, 1); days_i represents the number of days between the date of the user's i-th behavior in a given cluster and the current date; and user_cluster_score represents the preference score of the user under a given cluster;
an attenuation accumulation unit, configured to perform attenuation accumulation on the user's behaviors under each corresponding cluster using the following formula: user_cluster_score = sum(decay_rate^days_i);
a proportion calculation unit, configured to perform a proportion calculation on each accumulated sum obtained by the attenuation accumulation unit to obtain the user's preference score under each corresponding cluster, that is, to calculate the ratio of the user's accumulated score under a given cluster to the sum of the user's accumulated scores over all clusters corresponding to the user.
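As an illustration of the attenuation accumulation and proportion calculation described above, the following is a minimal sketch. The function name preference_scores, the input layout (one (cluster, date) pair per behavior) and the default decay_rate of 0.9 are assumptions made for the example and are not taken from the disclosure.

```python
from collections import defaultdict
from datetime import date


def preference_scores(behaviors, today, decay_rate=0.9):
    """Return {cluster_id: share of the user's total decayed score}.

    behaviors: iterable of (cluster_id, behavior_date) pairs for ONE user.
    decay_rate must lie in the open interval (0, 1).
    """
    if not 0 < decay_rate < 1:
        raise ValueError("decay_rate must lie in (0, 1)")
    raw = defaultdict(float)
    for cluster_id, behavior_date in behaviors:
        days_i = (today - behavior_date).days
        # user_cluster_score = sum(decay_rate ^ days_i)
        raw[cluster_id] += decay_rate ** days_i
    if not raw:
        return {}
    total = sum(raw.values())
    # Proportion calculation: each cluster's share of the overall sum.
    return {cluster_id: score / total for cluster_id, score in raw.items()}


scores = preference_scores(
    [(0, date(2022, 6, 28)), (0, date(2022, 6, 27)), (1, date(2022, 6, 26))],
    today=date(2022, 6, 28),
    decay_rate=0.5,
)
```

With decay_rate = 0.5, cluster 0 accumulates 0.5^0 + 0.5^1 = 1.5 and cluster 1 accumulates 0.5^2 = 0.25, so the shares are 1.5/1.75 and 0.25/1.75. Recent behaviors therefore dominate the score, and the shares of one user always sum to 1.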
FIG. 7 illustrates a schematic structural diagram of a computing device according to an embodiment of the present disclosure.
As shown in FIG. 7, the computing device disclosed herein may include a processor and a memory. The memory is used for storing a computer program, and the processor executes the computer program in the memory to implement the methods provided by the method embodiments described above. For a specific implementation process, reference may be made to the related description above, and details are not described herein again.
In the embodiment, the double-clustering hot recommendation device based on the user behavior sequence provided by the application is exemplified by an electronic device. The processor may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device to perform desired functions.
The memory may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory can include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on a computer-readable storage medium and executed by a processor to implement the methods of the various embodiments of the present application above and/or other desired functions. Various contents such as an input signal, a signal component, a noise component, etc. may also be stored in the computer-readable storage medium.
Furthermore, the present invention also provides a computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, is adapted to carry out the method provided by the method embodiments described above.
In practice, the computer program in this embodiment may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages, for performing the operations of the embodiments of the present application. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
In practice, the computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Those of skill in the art would further appreciate that the various illustrative logical blocks, modules, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus and methods according to embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The embodiments of the present invention have been described in detail, but the embodiments are merely examples, and the present invention is not limited to the embodiments described above. Any equivalent modifications and substitutions made by those skilled in the art are also within the scope of the present invention. Accordingly, equivalent changes and modifications made without departing from the spirit and scope of the present invention should be covered by the present invention.

Claims (10)

1. A double-clustering hot recommendation method based on user behavior sequences is characterized by comprising the following steps:
S1, constructing a network Graph of articles based on user behavior sequence data, wherein each user behavior sequence comprises a plurality of behaviors that a user performs on different articles in chronological order, the network Graph is composed of the article nodes clicked by users, and each article node represents the article corresponding to one user behavior;
S2, obtaining a vector representation of each article node by using a Graph Embedding method;
S3, clustering the vector data of the article nodes by a preset clustering algorithm to generate K clusters, wherein K is a positive integer;
S4, dividing the articles on which each user has performed behaviors into M sets according to the clusters in step S3, and summing and averaging the vector data of the article nodes in each set to obtain the vector representations of the M clusters of each user, wherein M lies in the range [1, K];
S5, performing attenuation accumulation over the number of days between each of the user's behaviors under each cluster and the current date, and then performing a proportion calculation to obtain the user's preference score for each corresponding cluster;
S6, computing a weighted average of the user's vectors under each cluster according to the user's preference score for each corresponding cluster, to obtain the unique interest vector representation of the user;
S7, clustering the unique interest vectors of the users generated in step S6 again by using a preset clustering algorithm, and recording the cluster of each user;
S8, counting the user behaviors under each cluster, calculating the click rate of each article, and sorting the articles in descending order of click rate to form a hot list for each cluster;
and S9, recommending to each user the hot list of the cluster where the user is located.
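By way of illustration only, steps S4 and S6 of the above method can be sketched as follows. The sketch assumes that article vectors and article-to-cluster assignments are already available as dictionaries; the names cluster_mean_vectors and interest_vector are hypothetical and form no part of the claim.

```python
from collections import defaultdict


def cluster_mean_vectors(item_ids, item_vector, item_cluster):
    """S4: group one user's articles by cluster and average each group."""
    groups = defaultdict(list)
    for item in item_ids:
        groups[item_cluster[item]].append(item_vector[item])
    return {
        cluster: [sum(column) / len(vectors) for column in zip(*vectors)]
        for cluster, vectors in groups.items()
    }


def interest_vector(cluster_vectors, scores):
    """S6: weighted average of the per-cluster vectors by preference score."""
    total = sum(scores[cluster] for cluster in cluster_vectors)
    dim = len(next(iter(cluster_vectors.values())))
    result = [0.0] * dim
    for cluster, vector in cluster_vectors.items():
        weight = scores[cluster] / total
        for k, value in enumerate(vector):
            result[k] += weight * value
    return result


item_vector = {"a": [1.0, 0.0], "b": [3.0, 0.0], "c": [0.0, 2.0]}
item_cluster = {"a": 0, "b": 0, "c": 1}
means = cluster_mean_vectors(["a", "b", "c"], item_vector, item_cluster)
user_vec = interest_vector(means, {0: 0.75, 1: 0.25})
```

Here the user touches M = 2 clusters; the cluster means are [2.0, 0.0] and [0.0, 2.0], and the preference scores 0.75 and 0.25 combine them into the single vector [1.5, 0.5] that would be passed to the second clustering.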
2. The double-clustering hot recommendation method based on user behavior sequences according to claim 1, wherein said step S1 comprises: forming a directed, unweighted network Graph from the click behaviors of the users according to the time at which the behaviors occurred, wherein the network Graph comprises a plurality of article nodes.
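By way of illustration only, a directed, unweighted graph of this kind can be sketched as an adjacency mapping. The sketch assumes that an edge runs from each clicked article to the article clicked next in the same sequence and that repeated clicks on the same article are ignored; the name build_item_graph is hypothetical.

```python
def build_item_graph(sequences):
    """Build a directed, unweighted article graph.

    sequences: iterable of per-user click sequences, each a list of
    article ids already sorted by behavior time.
    Returns {article_id: set of articles clicked immediately afterwards}.
    """
    graph = {}
    for seq in sequences:
        for item in seq:
            graph.setdefault(item, set())  # every clicked article is a node
        for src, dst in zip(seq, seq[1:]):
            if src != dst:  # ASSUMPTION: self-loops are skipped
                graph[src].add(dst)  # a set keeps the edge unweighted
    return graph


graph = build_item_graph([["a", "b", "c"], ["b", "a"], ["c"]])
```

Storing successors in a set means that a transition observed many times still yields a single edge, which matches the unweighted property stated in the claim.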
3. The method according to claim 1, wherein in step S2, the Graph Embedding method includes one or more of a RandomWalk algorithm and a Node2Vector algorithm; and obtaining the vector representation of each article node by using the Graph Embedding method specifically comprises the following steps:
S21, taking any one article node in the network Graph as an initial walking point;
S22, performing a random walk starting from the initial walking point, taking L steps in a single walk to generate a sequence of length L, wherein L represents the number of steps of a single random walk and is a positive integer;
S23, repeating steps S21 and S22 N times for each article node in the network Graph, finally obtaining N × V sequences of node length L, wherein N represents the number of random walks started at each article node, V represents the number of article nodes in the network Graph, and N and V are positive integers;
and S24, applying a Word2Vec model to the generated sequence data to obtain the vector representation of each article node.
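By way of illustration only, steps S21 to S23 can be sketched with uniform random walks over an adjacency mapping. The names random_walks, walk_length (L) and walks_per_node (N) are hypothetical, and the sketch assumes that a walk stops early when it reaches a node with no outgoing edge, a case the claim does not address.

```python
import random


def random_walks(graph, walk_length, walks_per_node, seed=None):
    """Generate walks_per_node walks from every node of a directed graph.

    graph: dict mapping node -> iterable of successor nodes.
    Returns a list of N * V node sequences, each of length at most L.
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in graph:
            walk = [start]
            while len(walk) < walk_length:
                # Sort so that results are reproducible for a fixed seed.
                neighbors = sorted(graph[walk[-1]])
                if not neighbors:
                    break  # ASSUMPTION: stop at dead ends
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


cycle = {"a": ["b"], "b": ["c"], "c": ["a"]}
walks = random_walks(cycle, walk_length=4, walks_per_node=2, seed=0)
```

The resulting sequences play the role of sentences in step S24 and could be passed to any Word2Vec implementation to obtain one vector per article node.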
4. The method as claimed in claim 1, wherein in step S3, the preset clustering algorithm is a KMeans clustering algorithm.
5. The method according to claim 1, wherein the step S5 specifically includes the following steps:
S51, setting parameters: decay_rate, days_i, user_cluster_score; wherein
decay_rate represents the decay rate, with a value range of (0, 1);
days_i represents the number of days between the date of the user's i-th behavior in a given cluster and the current date;
user_cluster_score represents the preference score of the user under a given cluster;
S52, performing attenuation accumulation on the user's behaviors under each corresponding cluster using the following formula:
user_cluster_score = sum(decay_rate^days_i);
S53, performing a proportion calculation on each accumulated sum calculated in step S52 to obtain the user's preference score under each corresponding cluster, that is, calculating the ratio of the user's accumulated score under a given cluster to the sum of the user's accumulated scores over all clusters corresponding to the user.
6. A double-clustering hot recommendation device based on user behavior sequences is characterized by comprising:
the network Graph construction module is used for constructing a network Graph of the article based on the user behavior sequence data, wherein each user behavior sequence comprises a plurality of user behaviors of the user on different articles in sequence according to the time sequence, the network Graph of the article is composed of article nodes clicked by the user, and one article node represents one article corresponding to the user behavior;
the article node vector generation module is used for obtaining the vector representation of each article node by using a Graph Embedding method;
the first clustering processing module is used for clustering the vector data of the article nodes through a preset clustering algorithm to generate K clusters, wherein K is a positive integer;
a class-cluster vector generation module, configured to divide the articles on which each user has performed behaviors into M sets according to the clusters generated by the first clustering processing module, and to sum and average the vector data of the article nodes in each set to obtain the vector representations of the M clusters of each user, wherein M lies in the range [1, K];
the first calculation module is used for performing attenuation accumulation on days of the behavior of the user under each class cluster from the current date and then performing proportion calculation to obtain the preference score of the user on each class cluster corresponding to the user;
the second calculation module is used for carrying out weighted average on the vectors of the users under each class cluster according to the preference scores of the users on each class cluster corresponding to the users to obtain the unique interest vector representation of the users;
the second clustering processing module is used for clustering the unique interest vectors of the users generated by the second computing module again by using a preset clustering algorithm and recording the cluster of each user;
the cluster recommendation list generation module is used for counting the user behaviors under various clusters, calculating the click rate of each article, and performing descending order according to the click rate to form hot lists of various clusters;
and the hot list recommending module is used for recommending a hot list under the cluster where each user is located for each user.
7. The apparatus according to claim 6, wherein the predetermined clustering algorithm in the first clustering module and the second clustering module is KMeans clustering algorithm.
8. The apparatus of claim 6, wherein the first computing module comprises:
a parameter setting unit, configured to set the parameters decay_rate, days_i and user_cluster_score; wherein decay_rate represents the decay rate, with a value range of (0, 1); days_i represents the number of days between the date of the user's i-th behavior in a given cluster and the current date; and user_cluster_score represents the preference score of the user under a given cluster;
an attenuation accumulation unit, configured to perform attenuation accumulation on the user's behaviors under each corresponding cluster using the following formula: user_cluster_score = sum(decay_rate^days_i);
a proportion calculation unit, configured to perform a proportion calculation on each accumulated sum obtained by the attenuation accumulation unit to obtain the user's preference score under each corresponding cluster, that is, to calculate the ratio of the user's accumulated score under a given cluster to the sum of the user's accumulated scores over all clusters corresponding to the user.
9. A computing device, comprising:
a processor; and
a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform a method of user behavior sequence based dual cluster hot recommendation as claimed in any one of claims 1 to 5.
10. A computer readable storage medium having executable code stored thereon, wherein the executable code, when executed by a processor of an electronic device, causes the processor to perform a method for user behavior sequence based bi-cluster hot recommendation according to any of claims 1-5.
CN202210738909.1A 2022-06-28 2022-06-28 Dual-clustering hot recommendation method and device based on user behavior sequence Pending CN115129984A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210738909.1A CN115129984A (en) 2022-06-28 2022-06-28 Dual-clustering hot recommendation method and device based on user behavior sequence

Publications (1)

Publication Number Publication Date
CN115129984A true CN115129984A (en) 2022-09-30

Family

ID=83379125

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210738909.1A Pending CN115129984A (en) 2022-06-28 2022-06-28 Dual-clustering hot recommendation method and device based on user behavior sequence

Country Status (1)

Country Link
CN (1) CN115129984A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116541731A (en) * 2023-05-26 2023-08-04 北京百度网讯科技有限公司 Processing method, device and equipment of network behavior data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination