CN111931053A

CN111931053A - Item pushing method and device based on clustering and matrix decomposition

Info

Publication number: CN111931053A
Application number: CN202010793860.0A
Authority: CN
Inventors: 马晓楠; 权爱荣; 王雅楠
Original assignee: Industrial and Commercial Bank of China Ltd ICBC; ICBC Technology Co Ltd
Current assignee: Industrial and Commercial Bank of China Ltd ICBC; ICBC Technology Co Ltd
Priority date: 2020-08-10
Filing date: 2020-08-10
Publication date: 2020-11-13

Abstract

The application provides a method and a device for pushing items based on clustering and matrix decomposition. The method comprises: first constructing a user label matrix; then clustering the users to obtain a plurality of user group clusters; then generating a user-to-item scoring matrix for each user group cluster; and then performing matrix operations through an ALS matrix decomposition algorithm so as to recommend items. Because the users are first clustered by label and the matrix operations are performed afterwards, the amount of matrix computation is effectively reduced, and the resource consumption of the system is reduced. This scheme addresses the technical problem that existing item pushing has low pushing efficiency, achieves efficient and accurate pushing of government affair items, and improves the item pushing recall rate.

Description

Item pushing method and device based on clustering and matrix decomposition

Technical Field

The application belongs to the technical field of big data processing, and particularly relates to a method and a device for pushing items based on clustering and matrix decomposition.

Background

At present, government affairs APPs generally recommend items based on popularity and rarely make personalized recommendations for different users. Some government affairs APPs use a collaborative filtering algorithm to recommend items, but collaborative filtering performs poorly here. The principle of collaborative filtering is to find items similar to a given item and recommend them to the user, and in the government affairs field a few items are handled far more often than all the others. As a result, the items found to be similar to almost any item are the most popular ones. Using collaborative filtering alone therefore gives most users similar recommendations, which fails to achieve personalized recommendation, and the actual recommendation accuracy is low.

Existing machine learning algorithms from the e-commerce field could in principle be used for item recommendation in the government affairs field. In practice, however, the government affairs scenario has few items, only about two hundred, and there is little association between them. In e-commerce scenarios, by contrast, the number of goods is huge, with tens of thousands or more candidate commodities, and the commodities are strongly associated with one another, so acceptable commodities can readily be recommended once a sufficient consumption profile of the user has been built.

Therefore, whether a collaborative filtering algorithm or a machine learning algorithm is used for government affairs recommendation, it is difficult to obtain an accurate personalized recommendation effect. No effective solution has yet been proposed for obtaining an accurate personalized recommendation effect.

Disclosure of Invention

The application aims to provide a method and a device for pushing items based on clustering and matrix decomposition, which can efficiently and accurately push government affair items, so that the recall rate is improved.

The application provides a method and a device for pushing items based on clustering and matrix decomposition, which are realized as follows:

a method of item pushing based on clustering and matrix factorization, the method comprising:

acquiring historical user behavior data of a target application in a preset time period;

constructing a user tag matrix according to the user historical behavior data, the user attribute data and the item tag system;

clustering the users of the target application according to the user label matrix and the historical user behavior data to obtain a user group cluster under each of a plurality of categories;

acquiring historical behavior information of each user under each category from the historical behavior data of the users to determine the processing operation of each user on items;

according to the processing operation of each user on the items in each user group cluster, a scoring matrix of the user on the items in each user group cluster is constructed, wherein one user group cluster corresponds to one scoring matrix;

performing matrix operation on the scoring matrix of each group cluster through an ALS matrix decomposition algorithm to obtain an operation result matrix;

and pushing items to the user of the target application according to the operation result matrix.

In one embodiment, clustering the users of the target application according to the user tag matrix and the historical user behavior data to obtain a user group cluster under each of a plurality of categories includes:

calculating the average distortion degree of the data under different quantity categories by clustering according to the user label matrix;

taking the average distortion degree as a vertical coordinate and the category number as a horizontal coordinate, and drawing to obtain a target curve;

determining the category number of clusters from the target curve through an elbow rule;

and according to the determined category number, dividing the users of the target application into the user group cluster with the category number through a KMeans algorithm.

In one embodiment, dividing the users of the target application into the user group cluster of the category number according to the determined category number by using a KMeans algorithm includes:

selecting sample data of the users with the number of the categories from the historical user behavior data as an initial clustering center;

calculating the distance from the sample data of each user in the user historical behavior data to each initial clustering center, and dividing the current user into the class where the initial clustering center with the smallest distance is located;

after the division is finished, calculating a clustering center for each category, taking the calculated clustering center as an optimized clustering center, and taking the optimized clustering center as an initial clustering center to perform division and calculation of the clustering center until the calculated clustering center is unchanged.

In one embodiment, constructing a user tag matrix according to the user historical behavior data, the user attribute data and the item tag system includes:

based on the service attribute, dividing different categories for items to establish an item label system;

acquiring historical behavior data and user attribute data of a user;

comparing the historical user behavior data and the user attribute data with the item label system to determine labels carried by the users;

and constructing the user label matrix according to the determined labels carried by the users.

In one embodiment, the constructing a scoring matrix of the user to the item in each user group cluster according to the processing operation of each user to the item in each user group cluster comprises:

carrying out weighted accumulation on the processing operation of the current user on each item in the current user group cluster to obtain a scoring weighted value of the current user on each item;

obtaining a scoring table of the current user group cluster according to the scoring weighted value of each item of each user in the current user group cluster;

and according to the scoring table of the current user group cluster, constructing a scoring matrix of the user to items of the current user group cluster.

In one embodiment, the predicting the item with the score of 0 in the scoring matrix of each group cluster by the ALS matrix decomposition algorithm comprises:

performing multiple iterations on the entries with a value of 0 in the scoring matrix by minimizing a loss function, and, when a preset fitting condition is met, taking the values determined for those entries as predicted values;

and taking the obtained predicted value as the predicted value of the item with the score of 0 in the scoring matrix.

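In one embodiment, the minimization loss function is:

$$\min_{x,y} \sum_{u,i} \left(r_{ui} - x_u^{T} y_i\right)^2 + \lambda\left(\sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2\right)$$

wherein $r_{ui}$ denotes the score of user $u$ for item $i$, $x_u$ denotes the preference matrix of user $u$ over the implicit features, $y_i$ denotes the matrix of implicit features contained in item $i$, and $\lambda$ denotes the regularization coefficient.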

In one embodiment, pushing items to a user of the target application according to the obtained scoring matrix in which no item has a score of 0 comprises:

determining, from that scoring matrix, a preset number of items with the highest scores for the target user;

and pushing the determined preset number of items to the target user as the push items.

In one embodiment, the user historical behavior data includes at least one of: user historical item collection data, user historical item clicking behavior and user historical item handling behavior.

An item pushing device based on clustering and matrix decomposition comprises:

the first acquisition module is used for acquiring historical user behavior data of the target application in a preset time period;

the first construction module is used for constructing a user tag matrix according to the user historical behavior data, the user attribute data and the item tag system;

the clustering module is used for clustering the users of the target application according to the user label matrix and the historical user behavior data to obtain a user group cluster under each of a plurality of categories;

the second acquisition module is used for acquiring historical behavior information of each user in each category from the historical behavior data of the users so as to determine the processing operation of each user on items;

the second construction module is used for constructing a scoring matrix of the user to the items in each user group cluster according to the processing operation of each user to the items in each user group cluster, wherein one user group cluster corresponds to one scoring matrix;

the prediction module is used for predicting items with the score of 0 in the scoring matrix of each group cluster through an ALS matrix decomposition algorithm, so that the scoring matrix corresponding to each group cluster has no items with the score of 0;

and the pushing module is used for pushing items to the user of the target application according to the obtained scoring matrix in which no item has a score of 0.

In one embodiment, the clustering module is specifically configured to calculate an average distortion degree of data clustered into different number categories according to the user tag matrix; taking the average distortion degree as a vertical coordinate and the category number as a horizontal coordinate, and drawing to obtain a target curve; determining the category number of clusters from the target curve through an elbow rule; and according to the determined category number, dividing the users of the target application into the user group cluster with the category number through a KMeans algorithm.

In one embodiment, dividing the users of the target application into the user group cluster of the category number according to the determined category number by using a KMeans algorithm includes: selecting sample data of the users with the number of the categories from the historical user behavior data as an initial clustering center; calculating the distance from the sample data of each user in the user historical behavior data to each initial clustering center, and dividing the current user into the class where the initial clustering center with the smallest distance is located; after the division is finished, calculating a clustering center for each category, taking the calculated clustering center as an optimized clustering center, and taking the optimized clustering center as an initial clustering center to perform division and calculation of the clustering center until the calculated clustering center is unchanged.

In one embodiment, the first building module is specifically configured to partition different categories for the item based on the service attribute to establish an item label system; acquiring historical behavior data and user attribute data of a user; comparing the historical user behavior data and the user attribute data with the item label system to determine labels carried by the users; and constructing the user label matrix according to the determined labels carried by the users.

In one embodiment, the second building module is specifically configured to perform weighted accumulation on processing operations of a current user on each item in a current user group cluster to obtain a scoring weighted value of the current user on each item; obtaining a scoring table of the current user group cluster according to the scoring weighted value of each item of each user in the current user group cluster; and according to the scoring table of the current user group cluster, constructing a scoring matrix of the user to items of the current user group cluster.

In one embodiment, the prediction module is specifically configured to perform multiple iterations on the entries with a value of 0 in the scoring matrix by minimizing a loss function, and, when a preset fitting condition is met, to take the values determined for those entries as predicted values; the obtained predicted values are used as the predicted values of the items with a score of 0 in the scoring matrix.

In one embodiment, the above-mentioned minimization loss function may be:

In one embodiment, the pushing module is specifically configured to determine, from the scoring matrix in which no item has a score of 0, a preset number of items with the highest scores for the target user; and to push the determined preset number of items to the target user as the push items.

In one embodiment, the user historical behavior data may include, but is not limited to, at least one of: user historical item collection data, user historical item clicking behavior and user historical item handling behavior.

A terminal device comprising a processor and a memory for storing processor-executable instructions, the instructions when executed by the processor implementing the steps of the method of:

A computer readable storage medium having stored thereon computer instructions which, when executed, implement the steps of a method comprising:

According to the item pushing method and device based on clustering and matrix decomposition, a user label matrix is firstly established, clustering processing is carried out on users, a plurality of user group clusters are obtained, a scoring matrix of the users for items is generated for each user group cluster, and then matrix operation is carried out through an ALS matrix decomposition algorithm, so that item recommendation is carried out. Because the label clustering is carried out on the user firstly and then the matrix operation is carried out, the calculation amount of the matrix operation can be effectively reduced, and the resource consumption of the system is reduced. The technical problem that the existing item pushing is low in pushing efficiency is solved through the scheme, the technical effect of efficient and accurate pushing of government affair items is achieved, and the item pushing recall rate is improved.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only some embodiments described in the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without any creative effort.

FIG. 1 is a flowchart of a method of one embodiment of a method for pushing items based on clustering and matrix factorization provided herein;

FIG. 2 is a graph illustrating the determination of the number of clusters by the elbow rule provided in the present application;

FIG. 3 is an architecture diagram of a computer terminal provided herein;

FIG. 4 is a block diagram of an item pushing apparatus based on clustering and matrix decomposition according to the present application.

Detailed Description

In order to make those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Existing government affairs recommendation generally adopts collaborative filtering or popular-item recommendation; however, the recall rate of these approaches is low, and the data volume they require is large. If the ALS (Alternating Least Squares) matrix decomposition algorithm is used alone, the amount of computation is large. Therefore, in this embodiment, label recommendation and a clustering algorithm are introduced on top of the ALS matrix decomposition algorithm: users are given personalized labels through expert rules, and all users are then classified according to those labels, so that users within each class are more similar to one another and the fitting effect is better. After the users are clustered by label into k classes, the matrix operation is performed with the ALS algorithm, with one class of users corresponding to one matrix. Each matrix is therefore small, which greatly reduces the resources consumed by matrix computation, and the recommendations for the classes can be computed in parallel, so the computation time is significantly shorter than with the original ALS algorithm.
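The "one class of users, one matrix, computed in parallel" idea can be sketched roughly as follows. This is only an illustration: the user ids, the cluster labels, and the `process_cluster` stand-in (which merely counts members instead of building a scoring matrix and running ALS) are hypothetical, and a thread pool is just one possible way to run the clusters side by side.

```python
from concurrent.futures import ThreadPoolExecutor

def split_by_cluster(user_ids, labels):
    """Group user ids by their cluster label."""
    clusters = {}
    for uid, lab in zip(user_ids, labels):
        clusters.setdefault(lab, []).append(uid)
    return clusters

def process_cluster(members):
    # Hypothetical stand-in for "build this cluster's scoring matrix and run ALS".
    return len(members)

# Made-up users and cluster labels, for illustration only.
user_ids = ["u1", "u2", "u3", "u4", "u5"]
labels = [0, 1, 0, 2, 1]

clusters = split_by_cluster(user_ids, labels)
with ThreadPoolExecutor() as pool:
    # pool.map keeps the input order, so results line up with the cluster keys.
    sizes = dict(zip(clusters, pool.map(process_cluster, clusters.values())))
```

Because each cluster is handled independently, the per-cluster work could equally be distributed across processes or machines.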

FIG. 1 is a flowchart of a method of an embodiment of a method for pushing items based on clustering and matrix factorization as described herein. Although the present application provides method operational steps or apparatus configurations as illustrated in the following examples or figures, more or fewer operational steps or modular units may be included in the methods or apparatus based on conventional or non-inventive efforts. In the case of steps or structures which do not logically have the necessary cause and effect relationship, the execution sequence of the steps or the module structure of the apparatus is not limited to the execution sequence or the module structure described in the embodiments and shown in the drawings of the present application. When the described method or module structure is applied in an actual device or end product, the method or module structure according to the embodiments or shown in the drawings can be executed sequentially or executed in parallel (for example, in a parallel processor or multi-thread processing environment, or even in a distributed processing environment).

Specifically, as shown in fig. 1, a method for pushing items based on clustering and matrix factorization according to an embodiment of the present application may include the following steps:

step 101: acquiring historical user behavior data of a target application in a preset time period;

Specifically, the historical behavior data of the user in the preset time period may be obtained from the history records of the application. The trigger time can be set as required. For example, acquisition may be triggered at fixed times, such as twelve a.m. and twelve p.m. each day, or when certain conditions are met, such as application start-up or shut-down. The specific trigger condition or conditions may be determined according to the application scenario or requirements, which is not limited in the present application.

The user historical behavior data may include, but is not limited to, at least one of the following: user historical item collection data, user historical item clicking behavior and user historical item handling behavior.

Step 102: constructing a user tag matrix according to the user historical behavior data, the user attribute data and the item tag system;

when the method is implemented, different categories can be divided for items based on the service attributes so as to establish an item label system; acquiring historical behavior data and user attribute data of a user; comparing the historical user behavior data and the user attribute data with the item label system to determine labels carried by the users; and constructing the user label matrix according to the determined labels carried by the users.

For example, personalized labels may be assigned to users through expert rules, so that the users finally grouped into the same user cluster are those most similar to one another. For example, users may be grouped using basic attribute and identity information: users who are administrative staff of an organization and handle social security business each time may be classified into one category. Alternatively, cluster analysis may be performed on the users' historical behavior data, so that users who regularly apply for similar services are placed in the same category. Users may also be grouped by identity; for example, doctors may be classified into one category. To achieve this division, corresponding labels are added to different users, and the users are then classified through cluster analysis of the labels.

That is, in implementation, a classification label may be set for a user in combination with personal information of the user and historical behavior (i.e., business handling habits), so as to implement final accurate clustering.
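A minimal sketch of how such a user label matrix might be assembled is given below. The tag names, the item-to-tag mapping, and the input layout are all hypothetical; the text does not specify the label system or the expert rules, so a simple "user carries a tag if any behavior or attribute maps to it" rule is assumed.

```python
# Hypothetical tag system and item-to-tag mapping, for illustration only.
TAGS = ["social_security", "medical", "housing_fund"]
ITEM_TAGS = {"item_a": "social_security", "item_b": "medical", "item_c": "housing_fund"}

def build_user_tag_matrix(behaviors, attributes):
    """Return (user_ids, matrix); matrix[r][c] is 1 if user r carries tag c.

    behaviors:  dict user_id -> list of item ids the user clicked, handled, or collected
    attributes: dict user_id -> list of tags derived from user attribute data
    """
    user_ids = sorted(set(behaviors) | set(attributes))
    col = {tag: j for j, tag in enumerate(TAGS)}
    matrix = []
    for uid in user_ids:
        row = [0] * len(TAGS)
        # Tags implied by historical behavior (assumed rule: any interaction counts).
        for item in behaviors.get(uid, []):
            tag = ITEM_TAGS.get(item)
            if tag is not None:
                row[col[tag]] = 1
        # Tags implied by user attributes such as occupation.
        for tag in attributes.get(uid, []):
            if tag in col:
                row[col[tag]] = 1
        matrix.append(row)
    return user_ids, matrix

behaviors = {"u1": ["item_a", "item_a"], "u2": ["item_b"]}
attributes = {"u2": ["medical"], "u3": ["housing_fund"]}
user_ids, user_tag_matrix = build_user_tag_matrix(behaviors, attributes)
```

Each row of `user_tag_matrix` can then serve as the feature vector of one user in the clustering step.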

Step 103: clustering the users of the target application according to the user label matrix and the historical user behavior data to obtain a user group cluster under each of a plurality of categories;

when the method is implemented, the average distortion degree of the data under different quantity categories can be calculated according to the user label matrix; taking the average distortion degree as a vertical coordinate and the category number as a horizontal coordinate, and drawing to obtain a target curve; determining the category number of clusters from the target curve through an elbow rule; and according to the determined category number, dividing the users of the target application into the user group cluster with the category number through a KMeans algorithm.

The KMeans algorithm (K-means clustering algorithm) is an iterative clustering analysis algorithm. To divide the data into K groups, K objects are randomly selected as initial cluster centers, the distance between each object and each cluster center is calculated, and each object is assigned to the nearest cluster center. A cluster center together with the objects assigned to it represents a cluster. Each time samples are assigned, the center of each cluster is recalculated from the objects currently in the cluster. This process is repeated until a termination condition is met. The termination condition may be that no (or a minimum number of) objects are reassigned to different clusters, that no (or a minimum number of) cluster centers change, or that the sum of squared errors reaches a local minimum.

According to the determined number of categories, dividing the users of the target application into the user group cluster with the number of categories through a KMeans algorithm, which may include: selecting sample data of the users with the number of the categories from the historical user behavior data as an initial clustering center; calculating the distance from the sample data of each user in the user historical behavior data to each initial clustering center, and dividing the current user into the class where the initial clustering center with the smallest distance is located; after the division is finished, calculating a clustering center for each category, taking the calculated clustering center as an optimized clustering center, and taking the optimized clustering center as an initial clustering center to perform division and calculation of the clustering center until the calculated clustering center is unchanged.

For example, k samples may first be randomly selected from the data as the initial clustering centers $C = \{c_1, c_2, \ldots, c_k\}$. Then, for each sample $X_i$ in the dataset, the distance from the sample to each cluster center is calculated, and the sample is assigned to the class of the cluster center with the smallest distance, wherein the distance measurement formula can be as follows:

$$d(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

wherein $X$ and $Y$ respectively represent the two samples whose distance is to be calculated, $x_i$ and $y_i$ are the values of the two samples in dimension $i$, and $n$ is the number of dimensions; in a two-dimensional coordinate system, the formula gives the straight-line distance between the two coordinates. Then, for each assigned class $C_i$, the cluster center of the class is recalculated as

$$c_i = \frac{1}{|C_i|} \sum_{x \in C_i} x$$

(i.e., the centroid of all samples belonging to the class). Finally, the steps of calculating the distances and re-determining the cluster centers are repeated until the positions of the cluster centers no longer change.
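The assign-and-recompute loop described above can be sketched in plain Python as follows. The `seed` and `max_iter` parameters and the handling of an empty cluster are assumptions added for the illustration; the sample data is made up.

```python
import math
import random

def euclidean(a, b):
    """Straight-line distance between two samples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(samples, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Randomly pick k samples as the initial cluster centers.
    centers = [list(s) for s in rng.sample(samples, k)]
    labels = [0] * len(samples)
    for _ in range(max_iter):
        # Assign every sample to the nearest center.
        labels = [min(range(k), key=lambda j: euclidean(s, centers[j])) for s in samples]
        # Recompute each center as the centroid of its members.
        new_centers = []
        for j in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == j]
            if members:
                dim = len(members[0])
                new_centers.append([sum(m[d] for m in members) / len(members) for d in range(dim)])
            else:
                new_centers.append(centers[j])  # assumption: an empty cluster keeps its center
        if new_centers == centers:  # stop once the centers no longer change
            break
        centers = new_centers
    return labels, centers

samples = [[0, 0], [0, 1], [10, 10], [10, 11]]  # made-up, well-separated points
labels, centers = kmeans(samples, k=2)
```

With real user label vectors, each row of the user label matrix would take the place of one entry in `samples`.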

The elbow rule calculates the sum of squared errors of the clustering for different numbers of clusters and plots the result. As shown in FIG. 2, the k value at which the curve changes from a sharp drop to a gentle drop is the number of clusters to select. That is, referring to the curve in FIG. 2, k = 3 may be selected as the number of user cluster categories.
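The elbow is usually read off the plotted curve by eye. If an automatic choice is wanted, one simple heuristic (an assumption here, not something the text prescribes) is to pick the k at which the drop in distortion slows down the most, i.e. the largest second difference. The distortion values below are made up and are not the data behind FIG. 2.

```python
def pick_elbow(ks, distortions):
    """Return the k where the curve bends most sharply (largest second difference)."""
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(ks) - 1):
        drop_before = distortions[i - 1] - distortions[i]
        drop_after = distortions[i] - distortions[i + 1]
        bend = drop_before - drop_after
        if bend > best_bend:
            best_k, best_bend = ks[i], bend
    return best_k

ks = [1, 2, 3, 4, 5, 6]
distortions = [10.0, 6.0, 2.5, 2.0, 1.7, 1.5]  # hypothetical average distortion per k
chosen_k = pick_elbow(ks, distortions)
```

For these made-up values the curve flattens after k = 3, which is the value the heuristic returns.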

Step 104: acquiring historical behavior information of each user under each category from the historical behavior data of the users to determine the processing operation of each user on items;

step 105: according to the processing operation of each user on the items in each user group cluster, a scoring matrix of the user on the items in each user group cluster is constructed, wherein one user group cluster corresponds to one scoring matrix;

in one embodiment, the processing operations of the current user on each item in the current user group cluster can be weighted and accumulated to obtain a scoring weighted value of the current user on each item; obtaining a scoring table of the current user group cluster according to the scoring weighted value of each item of each user in the current user group cluster; and according to the scoring table of the current user group cluster, constructing a scoring matrix of the user to items of the current user group cluster.

For example, the historical behavior information of each user under each category can be obtained from the historical data, and the user's click and handling records for all items are summarized. The scoring weighted value of the user for the items is obtained according to $R = \sum (k_1 C_i + k_2 F_i + k_3 E_i)$, wherein $C_i$, $F_i$, and $E_i$ are respectively the numbers of times the user has clicked, handled, and collected item $i$, and the user's scoring table for the items is constructed in turn. For all users in each category, items on which a user has past behavior have scores, and items on which the user has no behavior have a score of 0;
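A minimal sketch of building one cluster's scoring matrix from behavior counts is shown below. It assumes the weighted sum is applied per user and per item, and the weight values `k1`, `k2`, `k3` are hypothetical since the text gives no concrete numbers.

```python
def build_scoring_matrix(users, items, counts, k1=1.0, k2=3.0, k3=2.0):
    """Build a user-by-item scoring matrix for one user group cluster.

    counts: dict (user, item) -> (clicks, handles, collects)
    k1, k2, k3: hypothetical weights for clicking, handling, and collecting
    """
    matrix = []
    for u in users:
        row = []
        for it in items:
            c, f, e = counts.get((u, it), (0, 0, 0))
            # Items with no recorded behavior keep a score of 0.
            row.append(k1 * c + k2 * f + k3 * e)
        matrix.append(row)
    return matrix

users = ["u1", "u2"]
items = ["a", "b", "c"]
counts = {("u1", "a"): (2, 1, 0), ("u2", "c"): (1, 0, 1)}  # made-up counts
scores = build_scoring_matrix(users, items, counts)
```

The zeros left in `scores` are the entries that the matrix decomposition step is later asked to predict.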

step 106: performing matrix operation on the scoring matrix of each group cluster through an ALS matrix decomposition algorithm to obtain an operation result matrix;

The ALS matrix decomposition algorithm uses ALS (alternating least squares) to find optimal values for the empty entries of the scoring matrix, such that the predictions for the non-empty entries are as close as possible to their known values.

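Specifically, multiple iterations can be performed on the entries with a value of 0 in the scoring matrix by minimizing a loss function, and, when a preset fitting condition is met, the values determined for those entries are taken as predicted values; the obtained predicted values are used as the predicted values of the items with a score of 0 in the scoring matrix. The minimization loss function can be expressed as:

$$\min_{x,y} \sum_{u,i} \left(r_{ui} - x_u^{T} y_i\right)^2 + \lambda\left(\sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2\right)$$

wherein $r_{ui}$ denotes the score of user $u$ for item $i$, $x_u$ denotes the preference of user $u$ over the implicit features, $y_i$ denotes the implicit features contained in item $i$, and $\lambda$ denotes the regularization coefficient.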
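One way to carry out this kind of minimization is sketched below in Python with NumPy. It assumes the commonly used regularized squared-error objective evaluated over the non-zero (observed) entries only, a fixed number of iterations in place of the preset fitting condition, and hypothetical values for `rank`, `lam`, and `n_iter`; the matrix `R` is made up.

```python
import numpy as np

def als(R, rank=2, lam=0.1, n_iter=20, seed=0):
    """Alternating least squares on the observed (non-zero) entries of R."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    mask = R > 0  # 0 means "no behavior yet", i.e. an entry to be predicted
    X = rng.normal(scale=0.1, size=(n_users, rank))  # user factors
    Y = rng.normal(scale=0.1, size=(n_items, rank))  # item factors
    reg = lam * np.eye(rank)
    for _ in range(n_iter):
        # Fix Y and solve a small ridge regression for each user.
        for u in range(n_users):
            idx = mask[u]
            if idx.any():
                Yu = Y[idx]
                X[u] = np.linalg.solve(Yu.T @ Yu + reg, Yu.T @ R[u, idx])
        # Fix X and solve a small ridge regression for each item.
        for i in range(n_items):
            idx = mask[:, i]
            if idx.any():
                Xi = X[idx]
                Y[i] = np.linalg.solve(Xi.T @ Xi + reg, Xi.T @ R[idx, i])
    return X, Y

R = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 1.0],
              [0.0, 2.0, 1.0]])
X, Y = als(R)
predicted = X @ Y.T
# Keep known scores and fill only the zero entries with predictions.
filled = np.where(R > 0, R, predicted)
```

Because each user group cluster has its own, smaller `R`, the linear systems solved here stay small, which is the source of the savings the embodiment describes.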

Step 107: and pushing items to the user of the target application according to the operation result matrix.

For effective pushing, in implementation, a predetermined number of items with the highest scores for a target user may be determined from the operation result matrix, and the determined predetermined number of items are pushed to the target user as the push items.
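Selecting the highest-scoring items for one user can be sketched as follows. The optional `already_handled` filter is an assumption added for illustration (the text does not say whether previously handled items are excluded), and the scores are made up.

```python
def top_n_items(result_row, item_ids, n, already_handled=()):
    """Return the n item ids with the highest scores in one user's result row."""
    candidates = [
        (score, item)
        for score, item in zip(result_row, item_ids)
        if item not in already_handled  # optional filter, empty by default
    ]
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in candidates[:n]]

row = [0.2, 4.1, 3.3, 1.0]       # hypothetical scores for one user
item_ids = ["a", "b", "c", "d"]
push_list = top_n_items(row, item_ids, n=2)
push_list_new_only = top_n_items(row, item_ids, n=2, already_handled={"b"})
```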

In the above example, a user tag matrix is first constructed, then clustering is performed on users to obtain a plurality of user group clusters, then a scoring matrix of user-to-item is generated for each user group cluster, and then matrix operation is performed through an ALS matrix decomposition algorithm, so as to perform item recommendation. Because the label clustering is carried out on the user firstly and then the matrix operation is carried out, the calculation amount of the matrix operation can be effectively reduced, and the resource consumption of the system is reduced. The technical problem that the existing item pushing is low in pushing efficiency is solved through the scheme, the technical effect of efficient and accurate pushing of government affair items is achieved, and the item pushing recall rate is improved.

The recall rate may be obtained by dividing the number of items in the intersection of the items recommended to the user and the items the user actually clicked by the number of items the user actually clicked; it can be understood as the percentage of the user's clicked items that were recommended.
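The recall computation described above can be written directly as a set operation. Returning 0.0 for a user with no clicks is an assumption made to avoid division by zero; the item ids are made up.

```python
def recall(recommended, clicked):
    """Share of the user's clicked items that were among the recommended items."""
    clicked = set(clicked)
    if not clicked:
        return 0.0  # assumption: define recall as 0 when the user clicked nothing
    return len(set(recommended) & clicked) / len(clicked)

user_recall = recall(["a", "b", "c"], ["b", "c", "d", "e"])
```

Here two of the four clicked items were recommended, so the recall is 0.5.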

The above method is described below with reference to a specific example, however, it should be noted that the specific example is only for better describing the present application and is not to be construed as limiting the present application.

Considering that in the government affair recommendation, if only the ALS matrix decomposition algorithm is adopted, a good recommendation effect can be obtained, but in a scene with a large number of users and affairs, the user-affair scoring matrix obtained by construction is extremely huge, and at this time, the problem of long time consumption occurs when the ALS matrix decomposition algorithm is applied, so that the ALS matrix decomposition algorithm is not suitable for the current scene.

To overcome two defects, namely that existing methods cannot make personalized recommendations whose effect meets user requirements, and that applying the ALS matrix decomposition algorithm alone is very time-consuming, this embodiment provides a government affairs APP item recommendation method based on user tag clustering and the ALS matrix decomposition algorithm. The method overcomes the inability of existing recommendation methods to provide personalized recommendations, greatly improves the evaluation index compared with other algorithms, solves the long running time caused by using only the ALS algorithm, and improves accuracy to a certain extent compared with the ALS algorithm alone.

Specifically, a government affair APP item recommendation method based on user tag clustering and ALS matrix decomposition algorithm is provided, which comprises the following steps:

S1: Obtain behavior data of the online users of an APP over a period of time, where the behavior data may include information on the users' clicking, handling and collecting of items and on other behaviors. Then filter and clean the behavior data to remove erroneous and problematic data.

S2: From a business perspective, divide the items into different categories and establish an item label system; according to the historical behavior data of the users, construct a user behavior label matrix, which contains the labels that each user carries under the label system, namely a user-label table;

Specifically, personalized tags can be assigned to users by expert rules, so that the users finally grouped into one user cluster are those closest to one another. For example, users may be grouped into one category based on basic attribute and identity information, such as administrative staff of an organization who need to handle social security business each time. Alternatively, cluster analysis may be performed on the users' historical behavior data, so that users who regularly apply for similar services are placed in the same category. Users may also be grouped by identity, for example, doctors may be classified into one category. To achieve this division, corresponding labels may be added to different users, and the users are then classified by cluster analysis of the labels.
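The user-label table of step S2 can be pictured as a matrix with one row per user and one column per label. The sketch below assumes a binary encoding (1 if the user carries the label, 0 otherwise); the encoding, the function name and the label names are illustrative assumptions, since the application does not fix them.

```python
def build_user_tag_matrix(user_tags, tag_system):
    """Return (user_ids, matrix), where matrix[r][c] is 1 if user r carries label c."""
    user_ids = sorted(user_tags)
    matrix = [
        [1 if tag in user_tags[user] else 0 for tag in tag_system]
        for user in user_ids
    ]
    return user_ids, matrix


# Hypothetical label system and user labels, for illustration only.
tag_system = ["social_security", "tax", "medical", "unit_admin"]
user_tags = {
    "u1": {"social_security", "unit_admin"},
    "u2": {"medical"},
    "u3": {"social_security", "tax"},
}
user_ids, tag_matrix = build_user_tag_matrix(user_tags, tag_system)
```

Each row of `tag_matrix` can then serve as the feature vector of one user in the clustering step.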

S3: According to the generated user label matrix, perform a clustering operation on the users using the KMeans algorithm.

Specifically, k samples may be randomly selected from the data as the initial cluster centers $C = \{c_1, c_2, \ldots, c_k\}$; then, for each sample $X_i$ in the dataset, the distance from the sample to each cluster center is calculated and the sample is assigned to the class of the cluster center with the smallest distance, where the distance measurement formula can be as follows:

$$d(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

where X and Y respectively represent the two samples between which the distance is to be calculated, $x_i$ and $y_i$ are the values of the two samples in dimension i, and n is the number of dimensions; in a two-dimensional coordinate system, the formula represents the straight-line distance between two coordinates.

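Then, for each category $C_i$ obtained by this assignment, the cluster center of the category is recalculated as the mean of the samples assigned to it:

$$c_i = \frac{1}{|C_i|} \sum_{X \in C_i} X$$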

Furthermore, in order to compare the clustering effect under different values of k and select the best number of cluster categories k, the within-cluster sum of squared errors can be introduced:

$$SSE = \sum_{i=1}^{k} \sum_{X \in C_i} \lVert X - c_i \rVert^2$$

The average distortion degree of the data is calculated for different numbers of categories, a curve is drawn with the average distortion degree as the y-axis and the number of cluster categories k as the x-axis, and the optimal number of cluster categories is determined according to the "elbow rule" (namely, the point at which the ordinate changes from an obvious decrease to leveling off);
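The clustering procedure of step S3 can be sketched in plain Python as follows: random initial centers, assignment by straight-line (Euclidean) distance, recomputation of each center as the mean of its members, and an average-distortion value per k for the elbow plot. Taking the average distortion as the within-cluster squared error divided by the number of samples is an assumption, as are all function names and the toy data.

```python
import math
import random


def euclidean(a, b):
    """Straight-line distance between two samples of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(samples, k, iterations=50, seed=0):
    """Plain KMeans: returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(s) for s in rng.sample(samples, k)]
    labels = [0] * len(samples)
    for _ in range(iterations):
        # Assignment step: each sample joins its nearest center.
        new_labels = [
            min(range(k), key=lambda j: euclidean(s, centers[j])) for s in samples
        ]
        # Update step: each center becomes the mean of its members.
        for j in range(k):
            members = [s for s, lab in zip(samples, new_labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers


def average_distortion(samples, labels, centers):
    """Assumption: within-cluster squared error divided by the sample count."""
    sse = sum(euclidean(s, centers[lab]) ** 2 for s, lab in zip(samples, labels))
    return sse / len(samples)


def distortion_curve(samples, k_values):
    """Average distortion for each candidate k, to be plotted for the elbow rule."""
    curve = {}
    for k in k_values:
        labels, centers = kmeans(samples, k)
        curve[k] = average_distortion(samples, labels, centers)
    return curve


# Toy data: two well-separated groups of users in a 2-dimensional label space.
toy_samples = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_curve = distortion_curve(toy_samples, [1, 2, 3])
```

Plotting `toy_curve` with k on the x-axis gives the curve on which the elbow is read. The sketch leaves the choice of k to visual inspection, as the text describes, rather than automating it.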

S4: Perform the clustering operation on the users according to the obtained optimal number of cluster categories, dividing the users into the specified k categories and obtaining the user group cluster under each category;

S5: Obtain the historical behavior information of each user under each category from the historical data, summarize each user's click, handling and collection records for all items, and obtain the user's scoring weight value for all items according to $R = \sum (k_1 C_i + k_2 F_i + k_3 E_i)$, where $C_i$, $F_i$ and $E_i$ are respectively the numbers of times the user has clicked, handled and collected item i, and $k_1$, $k_2$ and $k_3$ are the corresponding weights; the user's item scoring table is then constructed accordingly;

S6: Construct a user-item scoring matrix from the constructed user-item scoring table, obtaining the scores of all users under each category for the items on which they have acted, where an item with no action is scored 0;
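Steps S5 and S6 can be illustrated with the following sketch, which turns per-user behavior counts into one scoring matrix per user group cluster. Each cell holds the weighted term for one user and one item, and items without any action score 0. The weight values, the function names and the `(user, item) -> (clicks, handles, collects)` input layout are assumptions made for illustration; the application does not specify them.

```python
from collections import defaultdict


def item_score(clicks, handles, collects, k1=1.0, k2=3.0, k3=2.0):
    """Weighted score k1*C_i + k2*F_i + k3*E_i; the weights here are hypothetical."""
    return k1 * clicks + k2 * handles + k3 * collects


def build_cluster_matrices(user_ids, labels, behavior_counts, items):
    """Return {cluster_label: (member_user_ids, scoring_matrix)}."""
    clusters = defaultdict(list)
    for user, label in zip(user_ids, labels):
        clusters[label].append(user)
    matrices = {}
    for label, members in clusters.items():
        rows = [
            [item_score(*behavior_counts.get((user, item), (0, 0, 0))) for item in items]
            for user in members
        ]
        matrices[label] = (members, rows)
    return matrices


# Hypothetical counts: (clicks, handles, collects) per (user, item).
counts = {
    ("u1", "tax"): (2, 1, 0),
    ("u2", "passport"): (1, 0, 1),
    ("u3", "tax"): (0, 0, 1),
}
cluster_matrices = build_cluster_matrices(
    ["u1", "u2", "u3"], [0, 1, 0], counts, ["tax", "passport"]
)
```

Keeping one matrix per cluster mirrors the statement that one user group cluster corresponds to one scoring matrix, so the later factorization runs on each smaller matrix separately.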

S7: Predict the items with a score of 0 in the constructed user-item scoring matrix using the ALS matrix decomposition algorithm, by minimizing the loss function

$$\min_{x, y} \sum_{(u,i)} \left(r_{ui} - x_u^{T} y_i\right)^2 + \lambda \left(\sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2\right)$$

(where $r_{ui}$ denotes user u's score for item i, $x_u$ denotes the preference vector of user u over the implicit features, $y_i$ denotes the vector of implicit features contained in item i, and $\lambda$ denotes the regularization coefficient used to prevent overfitting), iteratively computing values for the 0 entries in the matrix; the predicted values for the 0 entries are taken when the fit to the matrix is best, thereby obtaining a user-item scoring matrix without 0 entries;

The user-item scoring matrix is constructed by calculating each user's scores for the items, in a certain proportion, from the user's historical clicking, handling and other behaviors. The column header of the matrix lists all items and the row header lists all users; the known scores of users for items are filled into the matrix, and the scores of the cells without values are then predicted.
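A minimal pure-Python sketch of alternating least squares for step S7 is given below. It treats non-zero cells as observed scores, alternately re-solves the user factors and the item factors of the regularized squared-error loss, and fills only the 0 cells with predictions. The rank, regularization value, iteration count and function names are assumptions; a production system would more likely rely on an existing ALS implementation.

```python
import random


def solve(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[r][n] / m[r][r] for r in range(n)]


def update_factors(ratings, fixed, count, rank, lam, by_user):
    """Re-solve every user (or item) factor with the other side held fixed."""
    updated = []
    for idx in range(count):
        # Normal equations: (sum f f^T + lam * I) * factor = sum r * f
        a = [[lam if p == q else 0.0 for q in range(rank)] for p in range(rank)]
        b = [0.0] * rank
        for (u, i), r in ratings.items():
            own, other = (u, i) if by_user else (i, u)
            if own != idx:
                continue
            f = fixed[other]
            for p in range(rank):
                b[p] += r * f[p]
                for q in range(rank):
                    a[p][q] += f[p] * f[q]
        updated.append(solve(a, b))
    return updated


def als_fill(matrix, rank=2, lam=0.1, iterations=20, seed=0):
    """Return a copy of matrix whose 0 cells are replaced by ALS predictions."""
    rng = random.Random(seed)
    n_users, n_items = len(matrix), len(matrix[0])
    # Assumption: a 0 cell means "no action yet", so only non-zero cells are fitted.
    ratings = {
        (u, i): matrix[u][i]
        for u in range(n_users)
        for i in range(n_items)
        if matrix[u][i] != 0
    }
    x = [[rng.random() for _ in range(rank)] for _ in range(n_users)]
    y = [[rng.random() for _ in range(rank)] for _ in range(n_items)]
    for _ in range(iterations):
        x = update_factors(ratings, y, n_users, rank, lam, by_user=True)
        y = update_factors(ratings, x, n_items, rank, lam, by_user=False)
    return [
        [
            matrix[u][i] if matrix[u][i] != 0
            else sum(p * q for p, q in zip(x[u], y[i]))
            for i in range(n_items)
        ]
        for u in range(n_users)
    ]


toy_matrix = [[5.0, 3.0, 0.0], [4.0, 0.0, 1.0], [0.0, 2.0, 1.0]]
filled_matrix = als_fill(toy_matrix)
```

Because the regularization coefficient is positive, each small linear system is positive definite and can be solved directly. Note that a user with no observed scores receives an all-zero factor, so in that edge case the predicted cells remain 0.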

S8: According to the obtained user-item scoring matrix, obtain the predicted scores of the user for all items, and select the top N (TopN) items with the highest scores, that is, the items the user is most likely to prefer and handle, as the items recommended to the user.
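The TopN selection of step S8 amounts to picking the N highest-scoring items from one user's row of the predicted matrix. The following sketch uses `heapq.nlargest` from the Python standard library; the function name and the sample scores are illustrative assumptions.

```python
import heapq


def top_n_items(score_row, items, n):
    """Return the n items with the highest predicted scores, best first."""
    ranked = heapq.nlargest(n, zip(score_row, items))
    return [item for _, item in ranked]


# Hypothetical predicted scores of one user for four items.
recommended = top_n_items([0.2, 4.5, 3.1, 0.9], ["a", "b", "c", "d"], 2)
```

For this row the two highest scores belong to items "b" and "c", which would be the items pushed to the user.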

In this example, the ALS matrix decomposition algorithm is used to predict the preference scores of all users for all items and thereby obtain the items that each user is more likely to handle. Personalized recommendation is thus provided: each user is recommended items according to personal preference, which overcomes the insufficient personalization common to collaborative filtering and popular-item recommendation. In addition, in view of the distinct user groups in the government affairs field, the method proposes for the first time to apply cluster calculation to government affair recommendation: users with the same characteristics are grouped into one class, and recommendations are then calculated for each class separately, which can greatly improve the recommendation effect. In the government affairs field, the recall rate of the existing collaborative filtering algorithm and of popular-item recommendation only reaches about 30%, which cannot bring effective recommendations to users. With the clustering and ALS algorithm provided by this embodiment, the recommendation recall rate can reach over 75%, which greatly improves the item recommendation effect and the user's experience of the application.

The method embodiments provided in the above embodiments of the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. Taking the example of running on a computer terminal, fig. 3 is a block diagram of a hardware structure of the computer terminal of the item pushing method based on clustering and matrix decomposition according to the embodiment of the present invention. As shown in fig. 3, the computer terminal 10 may include one or more (only one shown) processors 102 (the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA), a memory 104 for storing data, and a transmission module 106 for communication functions. It will be understood by those skilled in the art that the structure shown in fig. 3 is only an illustration and is not intended to limit the structure of the electronic device. For example, the computer terminal 10 may also include more or fewer components than shown in FIG. 3, or have a different configuration than shown in FIG. 3.

The memory 104 may be used to store software programs and modules of application software, such as program instructions/modules corresponding to the clustering and matrix factorization-based item pushing method in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by running the software programs and modules stored in the memory 104, that is, implementing the clustering and matrix factorization-based item pushing method of the application program. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the computer terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The transmission module 106 is used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 10. In one example, the transmission module 106 includes a Network Interface Controller (NIC) that can be connected to other network devices through a base station so as to communicate with the internet. In one example, the transmission module 106 may be a Radio Frequency (RF) module, which is used to communicate with the internet wirelessly.

At the software level, the above item pushing apparatus based on clustering and matrix factorization may be as shown in fig. 4, and includes:

a first obtaining module 401, configured to obtain historical user behavior data of a target application in a preset time period;

a first constructing module 402, configured to construct a user tag matrix according to the user historical behavior data, the user attribute data, and the item tag system;

a clustering module 403, configured to cluster the users of the target application according to the user tag matrix and the user historical behavior data to obtain a user group cluster under each of multiple categories;

a second obtaining module 404, configured to obtain historical behavior information of each user in each category from the user historical behavior data, so as to determine the processing operations of each user on items;

a second construction module 405, configured to construct a scoring matrix for the items by the users in each user group cluster according to the processing operation for the items by each user in each user group cluster, where one user group cluster corresponds to one scoring matrix;

the prediction module 406 is configured to perform matrix operation on the scoring matrix of each group cluster through an ALS matrix decomposition algorithm to obtain an operation result matrix;

and the pushing module 407 is configured to push items to the user of the target application according to the operation result matrix.

In an embodiment, the clustering module 403 may be specifically configured to calculate an average distortion degree of data clustered into different number categories according to the user tag matrix; taking the average distortion degree as a vertical coordinate and the category number as a horizontal coordinate, and drawing to obtain a target curve; determining the category number of clusters from the target curve through an elbow rule; and according to the determined category number, dividing the users of the target application into the user group cluster with the category number through a KMeans algorithm.

In one embodiment, the first building module 402 may be specifically configured to divide the items into different categories based on the service attributes to establish an item label system; acquiring historical behavior data and user attribute data of a user; comparing the historical user behavior data and the user attribute data with the item label system to determine labels carried by the users; and constructing the user label matrix according to the determined labels carried by the users.

In an embodiment, the second building module 405 may be specifically configured to perform weighted accumulation on processing operations of a current user on each item in a current user group cluster, so as to obtain a scoring weight value of the current user on each item; obtaining a scoring table of the current user group cluster according to the scoring weighted value of each item of each user in the current user group cluster; and according to the scoring table of the current user group cluster, constructing a scoring matrix of the user to items of the current user group cluster.

In one embodiment, the prediction module 406 may be specifically configured to perform multiple iterative operations on the 0 values in the scoring matrix by minimizing a loss function, and to determine the predicted values for the 0 values in the matrix when a preset fitting condition is met; the obtained predicted values are taken as the predicted scores of the items scored 0 in the scoring matrix.

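In one embodiment, the above-mentioned minimization loss function may be:

$$\min_{x, y} \sum_{(u,i)} \left(r_{ui} - x_u^{T} y_i\right)^2 + \lambda \left(\sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2\right)$$

where $r_{ui}$ denotes user u's score for item i, $x_u$ denotes the preference vector of user u over the implicit features, $y_i$ denotes the vector of implicit features contained in item i, and $\lambda$ denotes the regularization coefficient used to prevent overfitting.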

In one embodiment, the pushing module 407 may be specifically configured to determine, from the scoring matrix that no longer contains items scored 0, a predetermined number of items with the highest scores for the target user, and to push the determined predetermined number of items to the target user as the push items.

An embodiment of the present application further provides a specific implementation manner of an electronic device, which is capable of implementing all steps in the item pushing method based on clustering and matrix decomposition in the foregoing embodiment, where the electronic device specifically includes the following contents: a processor (processor), a memory (memory), a communication Interface (Communications Interface), and a bus; the processor, the memory and the communication interface complete mutual communication through the bus; the processor is configured to call a computer program in the memory, and the processor implements all the steps in the item pushing method based on clustering and matrix factorization in the above embodiment when executing the computer program, for example, the processor implements the following steps when executing the computer program:

Step 1: acquiring historical user behavior data of a target application in a preset time period;

Step 2: constructing a user tag matrix according to the user historical behavior data, the user attribute data and the item tag system;

Step 3: clustering the users of the target application according to the user label matrix and the historical user behavior data to obtain a user group cluster under each of a plurality of categories;

Step 4: acquiring historical behavior information of each user under each category from the historical behavior data of the users to determine the processing operation of each user on items;

Step 5: according to the processing operation of each user on the items in each user group cluster, constructing a scoring matrix of the user on the items in each user group cluster, wherein one user group cluster corresponds to one scoring matrix;

Step 6: performing matrix operation on the scoring matrix of each group cluster through an ALS matrix decomposition algorithm to obtain an operation result matrix;

Step 7: pushing items to the user of the target application according to the operation result matrix.

As can be seen from the above description, in the embodiment of the present application, a user tag matrix is first constructed, then clustering is performed on users to obtain a plurality of user group clusters, then a scoring matrix of user-to-item is generated for each user group cluster, and then matrix operation is performed through an ALS matrix decomposition algorithm, so as to perform item recommendation. Because the label clustering is carried out on the user firstly and then the matrix operation is carried out, the calculation amount of the matrix operation can be effectively reduced, and the resource consumption of the system is reduced. The technical problem that the existing item pushing is low in pushing efficiency is solved through the scheme, the technical effect of efficient and accurate pushing of government affair items is achieved, and the item pushing recall rate is improved.

Embodiments of the present application further provide a computer-readable storage medium capable of implementing all steps in the clustering and matrix factorization based transaction pushing method in the above embodiments, where the computer-readable storage medium stores thereon a computer program, and when the computer program is executed by a processor, the computer program implements all steps of the clustering and matrix factorization based transaction pushing method in the above embodiments, for example, when the processor executes the computer program, the processor implements the following steps:

The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the hardware + program class embodiment, since it is substantially similar to the method embodiment, the description is simple, and the relevant points can be referred to the partial description of the method embodiment.

The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

Although the present application provides method steps as described in an embodiment or flowchart, additional or fewer steps may be included based on conventional or non-inventive efforts. The order of steps recited in the embodiments is merely one manner of performing the steps in a multitude of orders and does not represent the only order of execution. When an actual apparatus or client product executes, it may execute sequentially or in parallel (e.g., in the context of parallel processors or multi-threaded processing) according to the embodiments or methods shown in the figures.

The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a vehicle-mounted human-computer interaction device, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

Although embodiments of the present description provide method steps as described in embodiments or flowcharts, more or fewer steps may be included based on conventional or non-inventive means. The order of steps recited in the embodiments is merely one manner of performing the steps in a multitude of orders and does not represent the only order of execution. When an actual apparatus or end product executes, it may execute sequentially or in parallel (e.g., parallel processors or multi-threaded environments, or even distributed data processing environments) according to the method shown in the embodiment or the figures. The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the presence of additional identical or equivalent elements in a process, method, article, or apparatus that comprises the recited elements is not excluded.

For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. Of course, in implementing the embodiments of the present description, the functions of each module may be implemented in one or more software and/or hardware, or a module implementing the same function may be implemented by a combination of multiple sub-modules or sub-units, and the like. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer readable program code, the same functionality can be implemented by logically programming method steps such that the controller is in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller may therefore be considered as a hardware component, and the means included therein for performing the various functions may also be considered as a structure within the hardware component. Or even means for performing the functions may be regarded as being both a software module for performing the method and a structure within a hardware component.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.

As will be appreciated by one skilled in the art, embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.

The embodiments of this specification may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The described embodiments may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment. In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of an embodiment of the specification. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.

The above description is only an example of the embodiments of the present disclosure, and is not intended to limit the embodiments of the present disclosure. Various modifications and variations to the embodiments described herein will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the embodiments of the present specification should be included in the scope of the claims of the embodiments of the present specification.

Claims

1. A method for pushing items based on clustering and matrix decomposition is characterized by comprising the following steps:

2. The method of claim 1, wherein clustering the users of the target application according to the user tag matrix and user historical behavior data to obtain user population clusters under each of a plurality of categories comprises:

3. The method of claim 2, wherein partitioning the users of the target application into the cluster of the number of categories of user groups by a KMeans algorithm according to the determined number of categories comprises:

4. The method of claim 1, wherein constructing a user tag matrix based on the user historical behavior data, user attribute data, and transaction tag hierarchy comprises:

acquiring historical behavior data and user attribute data of a user;

5. The method of claim 1, wherein constructing a scoring matrix for the user-to-item in each user group cluster according to the processing operation of each user to item in each user group cluster comprises:

6. The method as claimed in claim 1, wherein the scoring matrix of each group cluster is subjected to matrix operation by ALS matrix decomposition algorithm to obtain an operation result matrix, comprising:

7. The method of claim 6, wherein the minimization loss function is:

8. The method of claim 6, wherein pushing a transaction to a user of the target application according to the matrix of operation results comprises:

determining a preset number of items with the highest target user score from the operation result matrix;

9. The method of any of claims 1 to 8, wherein the user historical behavior data comprises at least one of: user historical item collection data, user historical item clicking behavior and user historical item handling behavior.

10. A transaction pushing device based on clustering and matrix decomposition is characterized by comprising:

the prediction module is used for respectively carrying out matrix operation on the scoring matrix of each group cluster through an ALS matrix decomposition algorithm to obtain an operation result matrix;

and the pushing module is used for pushing items to the user of the target application according to the operation result matrix.

11. A terminal device comprising a processor and a memory for storing processor-executable instructions, the instructions when executed by the processor implementing the steps of the method of:

12. A computer readable storage medium having stored thereon computer instructions which, when executed, implement the steps of a method comprising: