CN109101642B - Method for reducing group recommendation list based on subgroup and social behavior

Info

Publication number: CN109101642B
Application number: CN201810951568.XA
Authority: CN
Inventors: 毛宇佳; 刘学军; 何瑾琳; 张军强; 陆淑娟
Original assignee: Nanjing Tech University
Current assignee: Nanjing Tech University
Priority date: 2018-08-20
Filing date: 2018-08-20
Publication date: 2022-06-24
Anticipated expiration: 2038-08-20
Also published as: CN109101642A

Abstract

The invention discloses a method for reducing a group recommendation list based on subgroups and social behaviors, and belongs to the technical field of data processing. The method divides a data set into a plurality of groups, obtains an item topic feature model and a user topic preference model, and divides each group into a plurality of subgroups; obtains an initial group recommendation list, subgroup preferences and subgroup weights from the subgroups; obtains the group preference through a weighting model from the subgroup weights and the subgroup preferences; and performs similarity matching between the group preference and the initial group recommendation list to obtain a final group recommendation list. The invention reduces the group recommendation list as far as possible while maintaining recommendation accuracy and fairness, so that group members can make a selection more conveniently.

Description

Method for reducing group recommendation list based on subgroup and social behavior

Technical Field

The invention belongs to the technical field of data processing, and particularly relates to a method for reducing a group recommendation list based on subgroups and social behaviors.

Background

With the rise of the internet era, people's daily lives have changed greatly. Users shop, read and browse online; they enjoy the convenience of the internet while supplying it with a great deal of personal information, and they not only consume information but also produce it, which has caused explosive growth in data. This rapidly growing data must be fully analyzed and utilized to serve users better, and how to quickly and accurately retrieve the content users want from massive data is a problem to be solved. A recommendation system aims to provide users with items of interest in an overloaded search space. Recommendation systems have already been applied successfully in fields such as education, electronic commerce and financial investment.

As recommendation systems have developed, the channels for acquiring user preferences have become more diverse, and additional information such as social relations, check-in records, comments and user-uploaded pictures is now used to analyze users' needs and preferences. The goal of recommendation has likewise expanded from improving accuracy alone to improving timeliness and diversity, so as to improve the user experience and serve users in a more user-friendly way. However, a conventional recommendation system makes recommendations for a single user, whereas many recommendations are in practice made for a group of people. The system must then consider the needs of every user in the group and produce recommendations that satisfy the group members, which gives rise to group recommendation systems. Current research on group recommendation mainly aims to improve the accuracy and fairness of the results; the relationship between the size of the recommendation list and recommendation satisfaction has received little study. A list that is too large and a list that is too small both affect the members' final selection: the larger the list, the greater its diversity but the harder it is for members to choose, while a list that is too small cannot guarantee that as many of its items as possible meet the members' preferences.

Disclosure of Invention

The invention aims to provide a method for reducing a group recommendation list based on subgroups and social behaviors, which can solve the problem of overlarge scale of the recommendation list in group recommendation and can ensure the fairness of recommendation.

Specifically, the invention is realized by adopting the following technical scheme, comprising the following steps:

dividing the data set into a plurality of groups to obtain an item topic feature model and a user topic preference model, and dividing each group into a plurality of subgroups;

acquiring an initial group recommendation list, subgroup preferences and subgroup weights according to the subgroups;

obtaining group preference through a weighting model according to the subgroup weight and the subgroup preference;

and performing similarity matching on the group preference and the initial group recommendation list to obtain a final group recommendation list.

Further, the step of obtaining the item topic feature model and the user topic preference model includes:

acquiring the item topic feature model by adopting an LDA topic model;

dividing users into active users and inactive users according to historical scoring records; an active user obtains a user topic preference model through TF-IDF and a time factor; an inactive user obtains a topic preference model through external experts;

dividing members with high similarity into a subgroup by utilizing the similarity of the members' preferences for attributes.

Further, the step of obtaining the initial group recommendation list comprises:

obtaining the similar subgroups of a target subgroup from two perspectives, scoring information and cross-item attribute information, through a cosine similarity formula;

and obtaining, according to this similarity, the similar subgroups of each subgroup in the current group and the recommendation list of each subgroup in the current group, wherein the set of these recommendation lists is used as the initial group recommendation list.

Further, calculating the subgroup preference means calculating the similarity between the members and taking the preferences on which the subgroup members are highly similar as the current subgroup preference.

Further, calculating the subgroup weight means further processing the data and calculating the weight of the subgroup in the current group from the obtained tolerance and altruistic behavior index of the members in the group.

The invention has the following beneficial effects: the method for reducing the group recommendation list based on the subgroups and the social behaviors maximally reduces the group recommendation list on the premise of meeting the recommendation accuracy and fairness, so that group members can make selections more conveniently.

Drawings

Fig. 1 is a system framework diagram of embodiment 1 of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the following examples and the accompanying drawings.

Example 1:

This embodiment of the present invention takes a movie list as an example to introduce the method for reducing a group recommendation list based on subgroups and social behaviors; the implementation process is shown in Fig. 1.

Step one: divide each group into subgroups.

Useless data is filtered out of the obtained original data set, and the data set is preprocessed (randomly divided into a plurality of groups) to obtain an item topic feature model (the topic features being movie types) and a user topic preference model; each group is then divided into a plurality of subgroups. The item topic feature model and the user topic preference model are obtained as follows:

1-1) An LDA (Latent Dirichlet Allocation) topic model is adopted to obtain the following item topic feature model:

where each entry of the model is a binary indicator of whether the s-th movie m_s contains the topic feature g_i: one value indicates that m_s contains g_i, and the other indicates that m_s does not contain g_i.
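As an illustration only, the indicator model described above can be sketched as a 0/1 vector per movie. The sketch assumes that the topics of each movie have already been inferred (for instance by an LDA model) and that the indicator values are 1 and 0; the function name and the topic labels are hypothetical.

```python
def topic_feature_vector(movie_topics, all_topics):
    """Return a 0/1 list over all_topics: 1 where the movie contains the topic."""
    present = set(movie_topics)
    return [1 if g in present else 0 for g in all_topics]


# Hypothetical topic features g_1..g_k and one movie m_s.
topics = ["action", "comedy", "drama"]
vector = topic_feature_vector(["drama", "action"], topics)
```

Representing every movie over the same ordered topic list keeps the vectors directly comparable in the later similarity steps.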

1-2) dividing users into active and inactive users according to historical scoring records.

Active users obtain the user topic preference model through TF-IDF (Term Frequency-Inverse Document Frequency) and a time factor. The preference of active user u_j for topic feature g_i is calculated as follows:

where the formula uses the binary indicator of whether movie m_s contains the topic feature g_i (one value when m_s contains g_i, the other when it does not); n is the number of movies in the current group; k is the number of movie topic features; and the time-weighted rating term denotes user u_j's rating of movie m_s after the forgetting function has been applied.

This time-weighted rating is calculated as follows:

where the rating term denotes user u_j's rating of movie m_s and f(Δt) denotes the forgetting function, which is calculated as follows:

where Δt denotes the time difference between the user's rating behavior and the current time point, and T₀ is a decay coefficient that controls the speed of interest decay: the larger T₀ is, the slower interest decays. For temporary preferences, i.e. preferences that have just formed, the time difference Δt is small and the user forgets quickly; for fixed preferences, forgetting is slower.
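The exact form of f(Δt) is not reproduced in this text. The following is a minimal sketch that assumes an exponential forgetting curve, exp(-Δt/T₀), chosen only because it matches the stated behavior (a larger T₀ gives slower decay, and the curve falls fastest when Δt is small). The function names are hypothetical.

```python
import math


def forgetting(delta_t, t0):
    """Assumed exponential forgetting weight in (0, 1]; a larger t0 decays more slowly."""
    return math.exp(-delta_t / t0)


def decayed_rating(rating, delta_t, t0):
    """Rating of a movie after the (assumed) forgetting function is applied."""
    return rating * forgetting(delta_t, t0)


# Same rating and age, two decay coefficients: the larger t0 keeps more weight.
slow = decayed_rating(4.0, delta_t=30.0, t0=365.0)
fast = decayed_rating(4.0, delta_t=30.0, t0=30.0)
```

Any other monotonically decreasing function of Δt could be substituted without changing the rest of the pipeline.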

Thus, the topic preference model of active user u_j is:

where m is the total number of users, i.e. the sum of the number of active users and the number of inactive users.

Inactive users obtain the topic preference model through external experts. Based on observation of users on Sina Weibo, the following two types of users are collectively called external experts:

● Social celebrities, who have a large number of followers and are recognized by the general public in real life.

● Experts, who are widely recognized in a particular field.

The preference of external expert e_t for topic feature g_i is calculated as follows:

where the formula uses the number of occurrences of topic feature g_i in the microblogs of external expert e_t, weighted by the forgetting function (whose formula is the same as the forgetting function f(Δt) above); the total number of microblogs of external expert e_t; k, the number of movie topic features; and the set of external experts followed by user u_j.

Therefore, the topic preference model of external expert e_t is:

The inactive user then derives a topic preference model from the external experts. The topic preference model of an inactive user is calculated as follows:

where the formula combines the topic preference models of the external experts with the common-behavior ratio between user u_j and external expert e_t, which is calculated as follows:

where the formula uses the number of microblogs posted by e_t that user u_j has forwarded or commented on, and the set of external experts followed by user u_j.
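The formulas for the inactive user are not reproduced in this text. As a hedged sketch, the code below assumes that the common-behavior ratio is the share of the user's forwards and comments that go to each followed expert, and that the inactive user's preference is the ratio-weighted sum of the experts' preference vectors. All names and the equal-weight fallback are assumptions.

```python
def common_behavior_ratios(interactions):
    """interactions: {expert_id: posts by that expert the user forwarded or commented on}."""
    total = sum(interactions.values())
    if total == 0:
        # No interactions at all: fall back to equal weights (an assumption).
        return {e: 1.0 / len(interactions) for e in interactions}
    return {e: n / total for e, n in interactions.items()}


def inactive_user_preference(interactions, expert_prefs):
    """Ratio-weighted sum of the followed experts' topic preference vectors."""
    ratios = common_behavior_ratios(interactions)
    k = len(next(iter(expert_prefs.values())))
    pref = [0.0] * k
    for expert, weight in ratios.items():
        for i, p in enumerate(expert_prefs[expert]):
            pref[i] += weight * p
    return pref


prefs = inactive_user_preference(
    {"e1": 3, "e2": 1},
    {"e1": [1.0, 0.0], "e2": [0.0, 1.0]},
)
```

In this example the user interacts three times as often with e1 as with e2, so the derived preference leans toward e1's topics.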

1-3) Members with high similarity are placed in the same subgroup, using the similarity of the members' preferences for attributes.

The similarity between members u_j and u_l in their preference for attribute g_i is calculated as follows:

where the formula uses the degree of preference of user u_j for g_i and the degree of preference of user u_l for g_i, both calculated from equation (2) above, and k is the number of movie topic features.
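The similarity formula and the grouping rule are not reproduced in this text. The sketch below assumes cosine similarity over the members' topic preference vectors and a simple greedy grouping against a threshold; the threshold value and the function names are hypothetical, and a clustering algorithm could equally be used.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def split_into_subgroups(prefs, threshold=0.8):
    """Greedy grouping: a member joins the first subgroup whose seed member is similar enough."""
    subgroups = []  # each subgroup is a list of member ids; the first entry is the seed
    for member, vec in prefs.items():
        for sg in subgroups:
            if cosine(vec, prefs[sg[0]]) >= threshold:
                sg.append(member)
                break
        else:
            subgroups.append([member])
    return subgroups


members = {"u1": [1.0, 0.0], "u2": [0.9, 0.1], "u3": [0.0, 1.0]}
subgroups = split_into_subgroups(members)
```

Here u1 and u2 share nearly the same preference direction and end up together, while u3 forms its own subgroup.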

Step two: obtain the initial group recommendation list, calculate the subgroup preferences and calculate the subgroup weights.

2a) Obtain the initial group recommendation list.

2a-1) The similar subgroups of a target subgroup are obtained from two perspectives, scoring information and cross-item attribute information, through the following cosine-similarity-based measure sim(SG_x, SG_y), which is calculated as follows:

sim(SG_x, SG_y) = λ·sim_R(SG_x, SG_y) + (1-λ)·sim_p(SG_x, SG_y)    (11)

where sim_R(SG_x, SG_y) denotes the similarity between subgroups SG_x and SG_y based on scoring information, sim_p(SG_x, SG_y) denotes the similarity between subgroups SG_x and SG_y based on cross-item attributes, and λ is a weight factor with a value range of 0 to 1.

In equation (11), the score-based similarity sim_R(SG_x, SG_y) between subgroups SG_x and SG_y is calculated as follows:

where the formula uses the set of movies that members of SG_x and SG_y have both watched, together with the score of subgroup SG_x for movie m_s and the score of subgroup SG_y for movie m_s; each subgroup score is obtained as the mean of the subgroup members' scores for movie m_s.

In equation (11), the cross-item-attribute similarity sim_p(SG_x, SG_y) between subgroups SG_x and SG_y is calculated as follows:

where the formula uses the set of movies that members of SG_x and SG_y have both watched, the preference value of subgroup SG_x for g_i, and the preference value of subgroup SG_y for g_i.
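Equation (11) can be illustrated with a short sketch. The two component formulas are not reproduced in this text, so the code assumes plain cosine similarity for both: over the mean scores of co-watched movies for sim_R, and over the subgroups' topic preference vectors for sim_p. The function and parameter names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def subgroup_similarity(scores_x, scores_y, prefs_x, prefs_y, lam=0.5):
    """Equation (11): lam * sim_R + (1 - lam) * sim_p, with assumed cosine components.

    scores_*: {movie: mean subgroup score}; prefs_*: topic preference vectors.
    """
    common = sorted(set(scores_x) & set(scores_y))  # co-watched movies
    sim_r = cosine([scores_x[m] for m in common], [scores_y[m] for m in common])
    sim_p = cosine(prefs_x, prefs_y)
    return lam * sim_r + (1 - lam) * sim_p


s = subgroup_similarity(
    {"m1": 4.0, "m2": 2.0},
    {"m1": 4.0, "m2": 2.0, "m3": 5.0},
    [1.0, 0.0],
    [0.0, 1.0],
    lam=0.5,
)
```

In this example the two subgroups agree on the co-watched movies but have orthogonal topic preferences, so λ directly controls how similar they are judged to be.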

2a-2) According to the similarity sim(SG_x, SG_y), the similar subgroups of each subgroup SG_x in the current group G are obtained, the recommendation list of every subgroup in G is obtained, and the set of these recommendation lists is taken as the initial group recommendation list top-N. top-N is obtained as follows:

where the formula uses the recommendation list of subgroup SG_x, which consists of the top k movies ranked by the score of subgroup SG_x for movie m_s.
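The construction of top-N can be sketched as the union of the per-subgroup top-k lists. The sketch assumes that each subgroup's score for each candidate movie is already available (for example, derived from its similar subgroups); the tie-breaking rule and the function names are assumptions.

```python
def top_k(scores, k):
    """Return the k highest-scored movies, best first (ties broken by movie id)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [movie for movie, _ in ranked[:k]]


def initial_group_list(subgroup_scores, k):
    """Union of each subgroup's top-k list, preserving first-seen order."""
    merged = []
    for scores in subgroup_scores.values():
        for movie in top_k(scores, k):
            if movie not in merged:
                merged.append(movie)
    return merged


initial = initial_group_list(
    {
        "SG1": {"a": 5.0, "b": 3.0, "c": 4.0},
        "SG2": {"c": 5.0, "d": 4.5, "a": 1.0},
    },
    k=2,
)
```

Because the lists are merged as a set, a movie favored by several subgroups appears only once, which is why the initial list can be shorter than k times the number of subgroups.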

2b) Using the similarity between members calculated in 1-3), the common preferences of the subgroup members (i.e. the attribute preferences on which they are highly similar) are taken as the current subgroup preference.

2c) The subgroup weights are calculated.

The data is further processed: from the tolerance and altruistic behavior index obtained for the members of the group, the weight of subgroup SG_x in the current group is calculated. The specific steps are as follows:

2c-1) The reactions of different users in conflict situations when they face strangers are analyzed to obtain the users' tolerance index, which is calculated as follows:

where X₁ and X₂ denote the social activity and social influence of the user, and α, β and γ are parameters; α and β take values from -1 to 0, and γ takes values from 0 to 1. X₁ and X₂ are calculated as follows:

where the formulas use the set of accounts followed by user u_j, the set of preferred topic features of the accounts followed by u_j, the length of time for which the user has been registered, the set of the user's followers, and the set of mutual followers of user u_j.

2c-2) Considering the case where user u_j faces a friend u_l, the altruistic behavior index of user u_j is obtained from their social relationship and is calculated as follows:

where the formula uses the set of mutual followers of user u_j and the set of mutual followers of user u_l.

2c-3) From the obtained user tolerance and altruistic behavior index, the weight of subgroup SG_x in the current group is calculated as follows:

where |SG_x| denotes the number of members in subgroup SG_x; the formula also uses the tolerance index of user u_j and the altruistic behavior index of user u_j when facing friend u_l; and SG_x - {u_j} denotes the users in subgroup SG_x other than u_j.

Step three: obtain the group preference.

The group preference is obtained through a weighting model from the subgroup weights and the subgroup preferences, and is calculated as follows:

where SG_x is the current subgroup, the weight term is the weight of subgroup SG_x in the current group G, and the preference term is the subgroup preference, i.e. the topic preference of subgroup SG_x, which has already been obtained in step 2b).
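The weighting model can be sketched as a weighted sum of the subgroup preference vectors. The formula itself is not reproduced in this text; normalizing the weights so that they sum to 1 is an assumption of this sketch, as are the function and variable names.

```python
def group_preference(weights, subgroup_prefs):
    """Weighted sum of subgroup topic preference vectors (weights normalized to sum to 1)."""
    total = sum(weights.values())
    k = len(next(iter(subgroup_prefs.values())))
    pref = [0.0] * k
    for sg, w in weights.items():
        for i, p in enumerate(subgroup_prefs[sg]):
            pref[i] += (w / total) * p
    return pref


g_pref = group_preference(
    {"SG1": 3.0, "SG2": 1.0},
    {"SG1": [1.0, 0.0], "SG2": [0.0, 1.0]},
)
```

A subgroup with a larger weight pulls the group preference toward its own topics, which is how tolerance and altruistic behavior influence the final list.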

Step four: obtain the final group recommendation list.

Similarity matching is performed between the group preference and the initial group recommendation list to obtain the final group recommendation list. The matching uses the cross-item attribute similarity formula and is calculated as follows:

where the formula uses the preference value of the target group G for g_i and the binary indicator of whether movie m_s contains the topic feature g_i; the movies m_s range over the set of movies contained in the initial group recommendation list top-N, and k is the number of movie topic features.
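The matching step can be illustrated as follows. The sketch assumes cosine similarity between the group preference vector and each movie's 0/1 topic indicator vector, and it keeps a fixed number of best matches; the cut-off rule (the `size` parameter) and the function names are assumptions, since this text does not state how the reduced list is truncated.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def final_list(group_pref, movie_features, size):
    """Rank the movies of the initial list by similarity to the group preference."""
    ranked = sorted(
        movie_features,
        key=lambda m: (-cosine(group_pref, movie_features[m]), m),
    )
    return ranked[:size]


reduced = final_list(
    [0.8, 0.2, 0.0],
    {"m1": [1, 0, 0], "m2": [0, 1, 0], "m3": [0, 0, 1]},
    size=2,
)
```

Only movies from the initial list top-N are candidates, so this step can shrink the list but never introduces new items.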

To evaluate the performance of the algorithm, the data set can be divided into a training set and a test set, with precision, recall and the F-measure used as evaluation indexes. Each evaluation index is calculated as follows:

where E_* denotes the items that appear in the final recommendation lists of both the training set and the test set, i.e. the correctly predicted items, E_D denotes the items in the final recommendation list of the test set, and E_r denotes the items in the final recommendation list of the training set. The larger the precision, recall and F values, the better the algorithm performs.
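The metric formulas are not reproduced in this text. The sketch below uses the conventional definitions and assumes that precision divides the number of correctly predicted items by the size of the training-set list (E_r), that recall divides it by the size of the test-set list (E_D), and that F is their harmonic mean. The function name is hypothetical.

```python
def evaluate(train_list, test_list):
    """Return (precision, recall, F) for one group's final recommendation lists."""
    e_r, e_d = set(train_list), set(test_list)
    hits = len(e_r & e_d)  # items present in both lists (E_*)
    precision = hits / len(e_r) if e_r else 0.0
    recall = hits / len(e_d) if e_d else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_value = 2 * precision * recall / (precision + recall)
    return precision, recall, f_value


p, r, f = evaluate(["a", "b", "c", "d"], ["a", "b", "e"])
```

Averaging these three values over all groups gives the overall result of one group partition.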

For each group, the precision, recall and F value are calculated from the final recommendation lists obtained for the group on the training set and the test set, and the averages of the precision, recall and F value over all groups are taken as the final result of the algorithm.

The data set can also be partitioned into groups again in a different way, the final recommendation lists obtained by the method above, and the averages of the precision, recall and F value over all groups calculated. These results are compared with those of the previous partition; when the averages of the precision, recall and F value of the two results tend to converge, the current group partition of the data set and the resulting final recommendation can be considered appropriate.

Although the present invention has been described in terms of the preferred embodiment, it is not intended that the invention be limited to the embodiment. Any equivalent changes or modifications made without departing from the spirit and scope of the present invention also belong to the protection scope of the present invention. The scope of the invention should therefore be determined with reference to the appended claims.

Claims

1. The method for reducing the group recommendation list based on the subgroup and the social behavior is characterized by comprising the following steps:

performing similarity matching on the group preference and the initial group recommendation list to obtain a final group recommendation list;

the step of obtaining the item topic feature model and the user topic preference model and dividing each group into a plurality of subgroups comprises the following steps:

acquiring the item topic feature model by adopting an LDA topic model;

dividing users into active users and inactive users according to historical scoring records; the active user obtains a user topic preference model through TF-IDF and a time factor; an inactive user obtains a topic preference model through external experts;

dividing members with high similarity into a subgroup by utilizing the similarity of the members' preferences for attributes;

the active user obtains the user topic preference model through TF-IDF and the time factor by the following steps:

computing the preference of active user u_j for topic feature g_i, calculated as follows:

where the time-weighted rating term denotes user u_j's rating of movie m_s after the forgetting function has been applied, and is calculated as follows:

where Δt denotes the time difference between the user's rating behavior and the current time point, and T₀ is a decay coefficient that controls the speed of interest decay: the larger T₀ is, the slower interest decays; for temporary preferences, i.e. preferences that have just formed, the time difference Δt is small and the user forgets quickly, and for fixed preferences, forgetting is slower;

computing the topic preference model of active user u_j as:

where m is the total number of users, i.e. the sum of the number of active users and the number of inactive users;

the inactive user obtains the topic preference model through external experts by the following steps:

computing the preference of external expert e_t for topic feature g_i, calculated as follows:

where the formula uses the number of occurrences of topic feature g_i in the microblogs of external expert e_t, weighted by the forgetting function (whose formula is the same as the forgetting function f(Δt)); the total number of microblogs of external expert e_t; k, the number of movie topic features; and the set of external experts followed by user u_j;

defining the topic preference model of external expert e_t as:

computing the topic preference model of the inactive user, calculated as follows:

where the formula uses the topic preference models of the external experts and the set of external experts followed by user u_j;

the step of obtaining the initial group recommendation list comprises:

obtaining the similar subgroups of a target subgroup from two perspectives, scoring information and cross-item attribute information, through the cosine-similarity-based measure sim(SG_x, SG_y);

obtaining, according to sim(SG_x, SG_y), the similar subgroups of each subgroup SG_x in the current group G, obtaining the recommendation lists of all subgroups in the current group G, and taking the set of these recommendation lists as the initial group recommendation list top-N;

the similarity sim(SG_x, SG_y) is calculated as follows:

sim(SG_x, SG_y) = λ·sim_R(SG_x, SG_y) + (1-λ)·sim_p(SG_x, SG_y)    (11)

where sim_R(SG_x, SG_y) denotes the similarity between subgroups SG_x and SG_y based on scoring information, sim_p(SG_x, SG_y) denotes the similarity between subgroups SG_x and SG_y based on cross-item attributes, and λ is a weight factor with a value range of 0 to 1;

in equation (11), the score-based similarity sim_R(SG_x, SG_y) between subgroups SG_x and SG_y is calculated as follows:

where the formula uses the set of movies that members of SG_x and SG_y have both watched, together with the score of subgroup SG_x for movie m_s and the score of subgroup SG_y for movie m_s, each obtained as the average of the subgroup members' scores for movie m_s;

in equation (11), the cross-item-attribute similarity sim_p(SG_x, SG_y) between subgroups SG_x and SG_y is calculated as follows:

where the formula uses the preference value of subgroup SG_x for g_i and the preference value of subgroup SG_y for g_i;

the initial group recommendation list top-N is obtained as follows:

where the formula uses the recommendation list of subgroup SG_x, which consists of the movies with the highest score by subgroup SG_x for movie m_s;

the subgroup preference refers to the preferences on which the subgroup members are highly similar;

the subgroup weight refers to the weight of the subgroup in the current group, calculated from the tolerance and the altruistic behavior index of the members in the group;

performing similarity matching between the group preference and the initial group recommendation list to obtain the final group recommendation list comprises the following step:

similarity matching is carried out by utilizing the cross-item attribute similarity formula, calculated as follows:

where the formula uses the preference value of the target group G for g_i and the binary indicator of whether movie m_s contains the topic feature g_i; the movies m_s range over the set of movies contained in the initial group recommendation list top-N, and k is the number of movie topic features.

2. The method of claim 1, wherein the tolerance index of the members in the group is a measure of the probability of the members in the group contracting the group recommendation list based on their subgroups and social behaviors