CN110909257A - Scoring prediction method and device - Google Patents

Info

Publication number
CN110909257A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911152840.9A
Other languages
Chinese (zh)
Inventor
张恒汝
钱捷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest Petroleum University
Original Assignee
Southwest Petroleum University
Application filed by Southwest Petroleum University
Priority to CN201911152840.9A
Publication of CN110909257A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9536Search customisation based on social or collaborative filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/906Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a score prediction method and a score prediction device. The score prediction method comprises: acquiring global scoring information, wherein the global scoring information comprises scores given by at least two users to at least two items, the at least two users comprise a target user serving as the user object of the score prediction, and the at least two items comprise a target item serving as the item object of the score prediction; clustering the global scoring information to obtain local scoring information, wherein the local scoring information comprises scores given by at least one user other than the target user to at least two items including the target item and at least one reference item, and in the local scoring information different users have similar preferences and/or different items have similar popularity; and taking the local scoring information as the input of a collaborative filtering algorithm to obtain a prediction score. The scheme can improve the accuracy of predicting user scores.

Description

Scoring prediction method and device
Technical Field
The invention relates to the technical field of data processing, in particular to a score prediction method and a score prediction device.
Background
With the continuous development of computer technology and big data processing technology, many industries determine user preferences by analyzing historical data generated by users, and then recommend commodities according to those preferences to improve the user experience. For example, in an application scenario that recommends items such as movies, dramas, or books, a user's score for a target item may be predicted from the scores the user has historically given to other items, and whether to recommend the target item to the user may then be decided according to the predicted score.
At present, to predict a target user's score for a target item, it is necessary to acquire the scores the target user has historically given to other items, as well as the scores historically given to other items by other users who have scored the target item. The target user's score for the target item is then predicted by a collaborative filtering algorithm from the acquired scoring information.
In the existing score prediction method, the historical scoring information of the target user and of the other users is input, as global scoring information, into a collaborative filtering algorithm to calculate the prediction score. Because one user may have scored many items and many users may have scored the same item, the global scoring information has many dimensions, and overfitting is likely to occur when the collaborative filtering algorithm calculates the prediction score, so the accuracy of predicting user scores is poor.
Disclosure of Invention
The embodiment of the invention provides a score prediction method and a score prediction device, which can improve the accuracy of prediction of user scores.
In a first aspect, an embodiment of the present invention provides a score prediction method, including:
acquiring global scoring information, wherein the global scoring information comprises scores given by at least two users to at least two items, the at least two users comprise a target user serving as the user object of the score prediction, the at least two items comprise a target item serving as the item object of the score prediction, the global scoring information comprises the score given by the target user to at least one item other than the target item, and the global scoring information further comprises the score given to at least one item by each user other than the target user;
clustering the global scoring information to obtain local scoring information, wherein the local scoring information comprises the scoring of the target user on at least one reference item except the target item, and the local scoring information further comprises the scoring of at least two items including the target item and at least one reference item by at least one user except the target user respectively, and different users have similar preference and/or different items have similar popularity in the local scoring information;
and taking the local scoring information as the input of a collaborative filtering algorithm to obtain a prediction score, wherein the prediction score is the predicted score of the target user on the target item.
In a first possible implementation manner, with reference to the first aspect, the clustering the global scoring information to obtain local scoring information includes:
clustering the global scoring information to obtain a user cluster, and determining the user cluster as the local scoring information, wherein the user cluster comprises the scores respectively given to at least two first items by the target user and by at least one first user having a similar preference to the target user, and the at least two first items comprise the target item and at least one reference item.
In a second possible implementation manner, with reference to the first aspect, the clustering the global scoring information to obtain local scoring information includes:
clustering the global scoring information to obtain an item cluster, and determining the item cluster as the local scoring information, wherein the item cluster comprises the scores given by at least two second users to the target item and to at least one second item having a similar popularity to the target item, and the at least two second users comprise the target user.
In a third possible implementation manner, with reference to the first aspect, the clustering the global scoring information to obtain local scoring information includes:
clustering the global scoring information to obtain a user-item cluster, and determining the user-item cluster as the local scoring information, wherein the user-item cluster comprises scores of at least two third users with similar preferences on at least two third items with similar popularity, respectively, the at least two third users comprise the target user, and the at least two third items comprise the target item and at least one reference item.
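By way of illustration only, the three kinds of cluster described above (user cluster, item cluster, user-item cluster) can be viewed as three ways of restricting the same score table. The sketch below assumes the user group and item group have already been produced by some clustering step, which this text does not specify; the function name `restrict_ratings`, the dictionary layout, and all score values are hypothetical.

```python
from typing import Dict, Iterable, Optional

# user -> {item -> score}; this layout is an assumption made for the sketch
Ratings = Dict[str, Dict[str, float]]


def restrict_ratings(
    ratings: Ratings,
    users: Optional[Iterable[str]] = None,
    items: Optional[Iterable[str]] = None,
) -> Ratings:
    """Keep only the scores whose user and item fall inside the given groups.

    users only -> user cluster
    items only -> item cluster
    both       -> user-item cluster
    """
    user_set = set(ratings) if users is None else set(users)
    item_set = None if items is None else set(items)
    local: Ratings = {}
    for user, scored in ratings.items():
        if user not in user_set:
            continue
        kept = {t: s for t, s in scored.items() if item_set is None or t in item_set}
        if kept:  # drop users left with no scores after the restriction
            local[user] = kept
    return local


# Made-up global scoring information (illustrative values only).
global_scores: Ratings = {
    "u1": {"t2": 4, "t3": 5},
    "u2": {"t1": 2, "t2": 1, "t4": 3},
    "u3": {"t1": 5, "t2": 4, "t3": 5},
}

user_cluster = restrict_ratings(global_scores, users=["u1", "u3"])
item_cluster = restrict_ratings(global_scores, items=["t1", "t2"])
user_item_cluster = restrict_ratings(
    global_scores, users=["u1", "u3"], items=["t1", "t2"]
)
```

In each case the result keeps the target user (here taken to be `u1`) and the scores other users gave to the target item (here `t1`), which is what the later collaborative filtering step needs.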
In a fourth possible implementation manner, with reference to the first aspect, after the obtaining the prediction score, the method further includes:
determining a first recommended behavior threshold and a second recommended behavior threshold according to a misclassification cost and a promotion cost, wherein the first recommended behavior threshold is smaller than the second recommended behavior threshold, the misclassification cost represents the cost of recommending to the user an item the user dislikes or of not recommending to the user an item the user likes, and the promotion cost represents the cost of promoting an item to a user whose preference is ambiguous;
comparing the predicted score with the first recommended behavior threshold and the second recommended behavior threshold, and performing:
if the prediction score is less than the first recommended behavior threshold, not recommending the target item to the target user;
if the prediction score is greater than or equal to the first recommended behavior threshold and the prediction score is less than the second recommended behavior threshold, promoting the target item to the target user;
recommending the target item to the target user if the prediction score is greater than or equal to the second recommended behavior threshold.
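The comparison against the two thresholds can be sketched as a small function. This is a minimal illustration rather than the claimed implementation; the function name `choose_action` and the returned labels are hypothetical, and the threshold values in the usage lines are made up.

```python
def choose_action(predicted_score: float, beta: float, alpha: float) -> str:
    """Map a prediction score to one of three behaviors.

    beta  : first recommended behavior threshold (lower)
    alpha : second recommended behavior threshold (higher)
    """
    if not beta < alpha:
        raise ValueError("the first threshold must be smaller than the second")
    if predicted_score < beta:
        return "not_recommend"
    if predicted_score < alpha:
        return "promote"
    return "recommend"


# Illustrative thresholds on a 1-to-5 score scale (made-up values).
actions = [choose_action(s, beta=2.5, alpha=4.0) for s in (2.0, 2.5, 3.9, 4.0)]
```

Note that both boundaries are inclusive on the upper behavior: a score equal to the first threshold is promoted, and a score equal to the second threshold is recommended.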
In a fifth possible implementation manner, with reference to the fourth possible implementation manner, the determining a first recommended behavior threshold and a second recommended behavior threshold according to the misclassification cost and the promotion cost includes:
obtaining first to sixth costs, wherein the first cost is a cost for recommending an item to a user when the user likes the item, the second cost is a cost for promoting the item to the user when the user likes the item, the third cost is a cost for not recommending the item to the user when the user likes the item, the fourth cost is a cost for recommending the item to the user when the user dislikes the item, the fifth cost is a cost for promoting the item to the user when the user dislikes the item, and the sixth cost is a cost for not recommending the item to the user when the user dislikes the item;
calculating the first recommended behavior threshold value by a first formula and calculating the second recommended behavior threshold value by a second formula according to the first to sixth costs;
the first formula includes:
Figure BDA0002284016770000041
the second formula includes:
Figure BDA0002284016770000042
wherein $\beta^*$ characterizes the first recommended behavior threshold, $\alpha^*$ characterizes the second recommended behavior threshold, $\lambda_{PP}$ characterizes the first cost, $\lambda_{BP}$ characterizes the second cost, $\lambda_{NP}$ characterizes the third cost, $\lambda_{PN}$ characterizes the fourth cost, $\lambda_{BN}$ characterizes the fifth cost, $\lambda_{NN}$ characterizes the sixth cost, $r_h$ characterizes the highest score an item can receive, $r_l$ characterizes the lowest score an item can receive, and the costs $\lambda_{PP}$, $\lambda_{BP}$, $\lambda_{NP}$ and $\lambda_{PN}$, $\lambda_{BN}$, $\lambda_{NN}$ are further constrained by:
Figure BDA0002284016770000043
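The first and second formulas are available here only as image placeholders, so the sketch below does not attempt to restate them. It only illustrates the role the six costs play, by picking the behavior with the lowest expected cost. Two things are assumptions made purely for this illustration and are not taken from the text: the probability that the user likes the item is modelled as a linear function of the prediction score between $r_l$ and $r_h$, and the cost values are invented.

```python
from typing import Dict


def like_probability(score: float, r_low: float, r_high: float) -> float:
    """ASSUMPTION: linear map from a prediction score to P(user likes the item)."""
    p = (score - r_low) / (r_high - r_low)
    return min(1.0, max(0.0, p))


def min_cost_action(
    score: float, costs: Dict[str, float], r_low: float, r_high: float
) -> str:
    """Return the behavior whose expected cost is lowest.

    Cost keys follow the subscripts of the six costs:
    PP/BP/NP = recommend/promote/not recommend when the user likes the item,
    PN/BN/NN = recommend/promote/not recommend when the user dislikes it.
    """
    p = like_probability(score, r_low, r_high)
    expected = {
        "recommend": p * costs["PP"] + (1 - p) * costs["PN"],
        "promote": p * costs["BP"] + (1 - p) * costs["BN"],
        "not_recommend": p * costs["NP"] + (1 - p) * costs["NN"],
    }
    return min(expected, key=expected.get)


# Made-up cost values for illustration only.
example_costs = {"PP": 0, "BP": 2, "NP": 8, "PN": 10, "BN": 3, "NN": 0}
low = min_cost_action(1.0, example_costs, r_low=1, r_high=5)
middle = min_cost_action(3.0, example_costs, r_low=1, r_high=5)
high = min_cost_action(5.0, example_costs, r_low=1, r_high=5)
```

With these invented costs, a low score leads to not recommending, a middling score to promoting, and a high score to recommending, which is the qualitative behavior the two thresholds are meant to produce.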
In a sixth possible implementation manner, with reference to the first aspect and any one of the first possible implementation manner, the second possible implementation manner, the third possible implementation manner, the fourth possible implementation manner, and the fifth possible implementation manner of the first aspect, the obtaining a prediction score by using the local score information as an input of the collaborative filtering algorithm includes:
S1: for each item other than the target item in the local scoring information, determining the users who have scored both the item and the target item in the local scoring information, and taking the set of the determined users as the user set corresponding to the item;
S2: for each determined user set, calculating the score deviation corresponding to the user set by the following third formula;
the third formula includes:
Figure BDA0002284016770000051
wherein $S_{i,j}$ characterizes the user set corresponding to item $t_i$ in the local scoring information, $|S_{i,j}|$ characterizes the number of users in $S_{i,j}$, $dev_{i,j}$ characterizes the score deviation corresponding to $S_{i,j}$, $u_k$ characterizes the $k$-th user in $S_{i,j}$, $r_{k,j}$ characterizes the score given by user $u_k$ to the target item $t_j$, and $r_{k,i}$ characterizes the score given by user $u_k$ to item $t_i$;
S3: calculating the prediction score from the calculated score deviations by the following fourth formula;
the fourth formula includes:
Figure BDA0002284016770000052
wherein $P_{u,j}$ characterizes the prediction score of target user $u_u$ for target item $t_j$, $N_{u,j}$ characterizes an item set in which each item has been scored, in the local scoring information, by at least two users including the target user, $|N_{u,j}|$ characterizes the number of items in the item set, $t_i$ characterizes the $i$-th item in $N_{u,j}$, and $r_{u,i}$ characterizes the score given by target user $u_u$ to item $t_i$.
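The third and fourth formulas survive here only as image placeholders. The symbol definitions are consistent with a Slope One style computation, and the sketch below follows that reading as an assumption: the score deviation is taken as the mean of $r_{k,j} - r_{k,i}$ over the user set, and the prediction score as the mean of $dev_{i,j} + r_{u,i}$ over the item set. The sign convention, the unweighted averaging, the function name `predict_score`, and the sample scores are all hypothetical.

```python
from typing import Dict, List, Optional

# user -> {item -> score}; layout assumed for the sketch
Ratings = Dict[str, Dict[str, float]]


def predict_score(
    local: Ratings, target_user: str, target_item: str
) -> Optional[float]:
    """Slope One style prediction over local scoring information (assumed reading)."""
    terms: List[float] = []
    for item, r_ui in local[target_user].items():
        if item == target_item:
            continue
        # S1: users who scored both this item and the target item.
        diffs = [
            scores[target_item] - scores[item]
            for scores in local.values()
            if item in scores and target_item in scores
        ]
        if not diffs:
            continue  # no user set for this item, so it contributes nothing
        # S2: score deviation for this item relative to the target item.
        dev = sum(diffs) / len(diffs)
        terms.append(dev + r_ui)
    if not terms:
        return None
    # S3: average the per-item estimates.
    return sum(terms) / len(terms)


# Made-up local scoring information; u1 has not scored t1.
local_scores: Ratings = {
    "u1": {"t2": 4, "t3": 5},
    "u3": {"t1": 5, "t2": 4, "t3": 5},
    "u7": {"t1": 3, "t2": 4},
}
prediction = predict_score(local_scores, "u1", "t1")  # 4.5
```

For `t2` the deviations are 1 and -1 (mean 0), giving an estimate of 4; for `t3` the single deviation is 0, giving 5; the average of the two estimates is 4.5.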
In a second aspect, an embodiment of the present invention further provides a score prediction apparatus, including:
an information obtaining module, configured to obtain global scoring information, where the global scoring information includes scores of at least two items by at least two users, where the at least two users include a target user as a score prediction user object, the at least two items include a target item as a score prediction item object, the global scoring information includes a score of at least one item other than the target item by the target user, and the global scoring information further includes a score of at least one item by each user other than the target user;
an information clustering module, configured to cluster the global scoring information acquired by the information acquisition module to acquire local scoring information, where the local scoring information includes a score of the target user on at least one reference item other than the target item, and the local scoring information further includes a score of at least one user other than the target user on at least two items including the target item and the at least one reference item, respectively, and in the local scoring information, different users have similar preferences and/or different items have similar popularity;
and the score calculation module is used for taking the local score information obtained by the information clustering module as the input of a collaborative filtering algorithm to obtain a prediction score, wherein the prediction score is the predicted score of the target user on the target item.
In a first possible implementation manner, with reference to the second aspect, the information clustering module is configured to cluster the global scoring information to obtain a user cluster, and determine the user cluster as the local scoring information, where the user cluster includes scores of at least two first items by the target user and at least one first user having a similar preference to the target user, respectively, and the at least two first items include the target item and at least one reference item.
In a second possible implementation manner, with reference to the second aspect, the information clustering module is configured to cluster the global scoring information to obtain an item cluster, and determine the item cluster as the local scoring information, where the item cluster includes the scores given by at least two second users to the target item and to at least one second item having a similar popularity to the target item, and the at least two second users include the target user.
In a third possible implementation manner, with reference to the second aspect, the information clustering module is configured to cluster the global scoring information to obtain a user-item cluster, and determine the user-item cluster as the local scoring information, where the user-item cluster includes scores of at least two third users with similar preferences for at least two third items with similar popularity, respectively, where the at least two third users include the target user, and the at least two third items include the target item and at least one reference item.
In a fourth possible implementation manner, with reference to the second aspect, the score prediction apparatus further includes:
a threshold obtaining module, configured to determine a first recommended behavior threshold and a second recommended behavior threshold according to a misclassification cost and a promotion cost, where the first recommended behavior threshold is smaller than the second recommended behavior threshold, the misclassification cost represents the cost of recommending to the user an item the user dislikes or of not recommending to the user an item the user likes, and the promotion cost represents the cost of promoting an item to a user whose preference is ambiguous;
an action execution module, configured to compare the predicted score obtained by the score calculation module with the first recommended behavior threshold and the second recommended behavior threshold obtained by the threshold acquisition module, if the predicted score is smaller than the first recommended behavior threshold, not recommend the target item to the target user, if the predicted score is greater than or equal to the first recommended behavior threshold and the predicted score is smaller than the second recommended behavior threshold, promote the target item to the target user, and if the predicted score is greater than or equal to the second recommended behavior threshold, recommend the target item to the target user.
In a fifth possible implementation manner, with reference to the fourth possible implementation manner, the threshold obtaining module includes:
a cost obtaining unit, configured to obtain first to sixth costs, where the first cost is a cost for recommending an item to a user when the user likes the item, the second cost is a cost for promoting the item to the user when the user likes the item, the third cost is a cost for not recommending the item to the user when the user likes the item, the fourth cost is a cost for recommending the item to the user when the user dislikes the item, the fifth cost is a cost for promoting the item to the user when the user dislikes the item, and the sixth cost is a cost for not recommending the item to the user when the user dislikes the item;
a threshold calculation unit, configured to calculate, according to the first to sixth costs acquired by the cost acquisition unit, the first recommended behavior threshold by using a first formula as follows, and calculate the second recommended behavior threshold by using a second formula as follows;
the first formula includes:
Figure BDA0002284016770000071
the second formula includes:
Figure BDA0002284016770000081
wherein $\beta^*$ characterizes the first recommended behavior threshold, $\alpha^*$ characterizes the second recommended behavior threshold, $\lambda_{PP}$ characterizes the first cost, $\lambda_{BP}$ characterizes the second cost, $\lambda_{NP}$ characterizes the third cost, $\lambda_{PN}$ characterizes the fourth cost, $\lambda_{BN}$ characterizes the fifth cost, $\lambda_{NN}$ characterizes the sixth cost, $r_h$ characterizes the highest score an item can receive, $r_l$ characterizes the lowest score an item can receive, and the costs $\lambda_{PP}$, $\lambda_{BP}$, $\lambda_{NP}$ and $\lambda_{PN}$, $\lambda_{BN}$, $\lambda_{NN}$ are further constrained by:
Figure BDA0002284016770000082
In a sixth possible implementation manner, with reference to the second aspect and any one of the first possible implementation manner, the second possible implementation manner, the third possible implementation manner, the fourth possible implementation manner, and the fifth possible implementation manner of the second aspect, the score calculating module includes:
a user set determining unit, configured to determine, for each item in the local scoring information except for the target item, the user whose score is present for both the item and the target item in the local scoring information, and determine a set of the determined users as a user set corresponding to the item;
a deviation calculating unit for calculating, for each of the user sets determined by the user set determining unit, a score deviation corresponding to the user set by a third formula;
the third formula includes:
Figure BDA0002284016770000083
wherein $S_{i,j}$ characterizes the user set corresponding to item $t_i$ in the local scoring information, $|S_{i,j}|$ characterizes the number of users in $S_{i,j}$, $dev_{i,j}$ characterizes the score deviation corresponding to $S_{i,j}$, $u_k$ characterizes the $k$-th user in $S_{i,j}$, $r_{k,j}$ characterizes the score given by user $u_k$ to the target item $t_j$, and $r_{k,i}$ characterizes the score given by user $u_k$ to item $t_i$;
a score predicting unit for calculating said predicted score by the following fourth formula based on each of said score deviations calculated by said deviation calculating unit;
the fourth formula includes:
Figure BDA0002284016770000091
wherein $P_{u,j}$ characterizes the prediction score of target user $u_u$ for target item $t_j$, $N_{u,j}$ characterizes an item set in which each item has been scored, in the local scoring information, by at least two users including the target user, $|N_{u,j}|$ characterizes the number of items in the item set, $t_i$ characterizes the $i$-th item in $N_{u,j}$, and $r_{u,i}$ characterizes the score given by target user $u_u$ to item $t_i$.
According to the above technical scheme, when the target user's score for the target item needs to be predicted, global scoring information is first acquired. The global scoring information comprises the scores given by a plurality of users to a plurality of items, the plurality of items comprise the target item serving as the item object of the score prediction, and the plurality of users comprise the target user serving as the user object of the score prediction, but the global scoring information does not comprise the target user's score for the target item. The global scoring information is then clustered to obtain local scoring information that includes the target user and the target item, in which the users have similar preferences and/or the items have similar popularity, and the local scoring information is used as the input of a collaborative filtering algorithm to predict the target user's score for the target item. By clustering the global scoring information, local scoring information comprising users with similar preferences, items with similar popularity, or both can be obtained, which reduces the dimensionality of the scoring information. When the local scoring information is input into the collaborative filtering algorithm to obtain the prediction score, the overfitting problem can therefore be avoided, and the accuracy of predicting user scores is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flowchart of a score prediction method according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating global scoring information and local scoring information, according to an embodiment of the present invention;
FIG. 3 is a flow chart of a method for recommending items according to an embodiment of the present invention;
FIG. 4 is a flowchart of a recommended behavior threshold determination method according to an embodiment of the present invention;
FIG. 5 is a flow chart of a method for calculating a prediction score according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a score prediction apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of another score prediction apparatus according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of yet another score prediction apparatus according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of still another score prediction apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer and more complete, the technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention, and based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without creative efforts belong to the scope of the present invention.
As shown in FIG. 1, an embodiment of the present invention provides a score prediction method, which may include the following steps:
Step 101: acquiring global scoring information, wherein the global scoring information comprises scores given by at least two users to at least two items, the at least two users comprise a target user serving as the user object of the score prediction, the at least two items comprise a target item serving as the item object of the score prediction, the global scoring information comprises the score given by the target user to at least one item other than the target item, and the global scoring information further comprises the score given to at least one item by each user other than the target user;
Step 102: clustering the global scoring information to obtain local scoring information, wherein the local scoring information comprises the score given by the target user to at least one reference item other than the target item, the local scoring information further comprises the scores given by at least one user other than the target user to at least two items including the target item and the at least one reference item, and in the local scoring information different users have similar preferences and/or different items have similar popularity;
Step 103: taking the local scoring information as the input of a collaborative filtering algorithm to obtain a prediction score, wherein the prediction score is the predicted score of the target user for the target item.
In the embodiment of the invention, when the target user's score for the target item needs to be predicted, global scoring information is first acquired. The global scoring information comprises the scores given by a plurality of users to a plurality of items, the plurality of items comprise the target item serving as the item object of the score prediction, and the plurality of users comprise the target user serving as the user object of the score prediction, but the global scoring information does not comprise the target user's score for the target item. The global scoring information is then clustered to obtain local scoring information that includes the target user and the target item, in which the users have similar preferences and/or the items have similar popularity, and the local scoring information is used as the input of a collaborative filtering algorithm to predict the target user's score for the target item. By clustering the global scoring information, local scoring information comprising users with similar preferences, items with similar popularity, or both can be obtained, which reduces the dimensionality of the scoring information. When the local scoring information is input into the collaborative filtering algorithm to obtain the prediction score, the overfitting problem can therefore be avoided, and the accuracy of predicting user scores is improved.
In the embodiment of the present invention, the obtained global scoring information generally includes a large number of users and items, that is, each user in the global scoring information scores a large number of items, and the local scoring information obtained by clustering the global scoring information includes fewer users and items relative to the global scoring information.
In the embodiment of the present invention, to predict the target user's score for the target item from the local scoring information, the local scoring information must include at least two users and two items, and must include the target user and the target item. For each user in the local scoring information other than the target user, the local scoring information must include that user's score for the target item, and must also include that user's score for at least one item that the target user has scored. Taken together, these requirements mean that the local scoring information needs to include at least three scores, at least one of which is given by the target user.
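The requirements above can be checked mechanically before the collaborative filtering step is run. The following is a minimal sketch under the assumption that scoring information is held as a nested dictionary; the function name `is_valid_local_info` and the sample scores are hypothetical.

```python
from typing import Dict

# user -> {item -> score}; layout assumed for the sketch
Ratings = Dict[str, Dict[str, float]]


def is_valid_local_info(local: Ratings, target_user: str, target_item: str) -> bool:
    """Check the minimum content needed to predict the target user's score."""
    if target_user not in local:
        return False
    # Reference items: what the target user has scored, excluding the target item.
    reference_items = set(local[target_user]) - {target_item}
    if not reference_items:
        return False
    others = [scores for user, scores in local.items() if user != target_user]
    if not others:
        return False
    for scores in others:
        if target_item not in scores:
            return False  # every other user must have scored the target item
        if not reference_items & set(scores):
            return False  # and must share at least one item with the target user
    return True


# Smallest valid case: three scores, one of them from the target user u1.
minimal = {"u1": {"t2": 4}, "u3": {"t1": 5, "t2": 4}}
missing_target_score = {"u1": {"t2": 4}, "u3": {"t2": 4}}
no_shared_item = {"u1": {"t2": 4}, "u3": {"t1": 5, "t3": 2}}
```

The `minimal` example contains exactly three scores, matching the lower bound stated above.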
Optionally, on the basis of the score prediction method shown in FIG. 1, when the global scoring information is clustered in step 102 to obtain local scoring information, the clustering may yield local scoring information including a plurality of users with similar preferences, local scoring information including a plurality of items with similar popularity, or local scoring information including both a plurality of users with similar preferences and a plurality of items with similar popularity. The three types of local scoring information are described below.
For local scoring information comprising a plurality of users with similar preferences:
The global scoring information is clustered to obtain user clusters, and the obtained user cluster is determined as the local scoring information, wherein the user cluster comprises the scores given to at least two first items by the target user and by at least one first user having preferences similar to those of the target user, and the at least two first items comprise the target item and at least one reference item.
As shown in FIG. 2, the data set shown in diagram (a) is the global scoring information, which includes the scores of users $u_1$ to $u_6$ for items $t_1$ to $t_6$. By clustering the global scoring information, it is determined that users $u_1$ and $u_3$ have similar preferences, users $u_2$ and $u_5$ have similar preferences, and users $u_4$ and $u_6$ have similar preferences. The set of scores of users $u_1$ and $u_3$ for items $t_1$ to $t_6$ is taken as the first user cluster, the set of scores of users $u_2$ and $u_5$ for items $t_1$ to $t_6$ is taken as the second user cluster, and the set of scores of users $u_4$ and $u_6$ for items $t_1$ to $t_6$ is taken as the third user cluster; that is, the set of data in each dashed box in diagram (b) is determined as a user cluster.
After the global scoring information is clustered into a plurality of user clusters, the user cluster that includes both the target user and the target item is determined as the local scoring information. For example, as shown in diagram (b) of FIG. 2, if the target user is user $u_1$ and the target item is item $t_1$, the first user cluster, which includes both user $u_1$ and item $t_1$, is determined as the local scoring information.
For local scoring information comprising a plurality of items having similar popularity:
The global scoring information is clustered to obtain item clusters, and the obtained item cluster is determined as the local scoring information, wherein the item cluster comprises the scores given by at least two second users to the target item and to at least one second item having popularity similar to that of the target item, and the at least two second users comprise the target user.
As shown in FIG. 2, by clustering the global scoring information shown in diagram (a), it is determined that items $t_1$ and $t_4$ have similar popularity, items $t_3$ and $t_5$ have similar popularity, and items $t_2$ and $t_6$ have similar popularity. The set of scores of users $u_1$ to $u_6$ for items $t_1$ and $t_4$ is taken as the first item cluster, the set of scores of users $u_1$ to $u_6$ for items $t_3$ and $t_5$ is taken as the second item cluster, and the set of scores of users $u_1$ to $u_6$ for items $t_2$ and $t_6$ is taken as the third item cluster; that is, the set of data in each dashed box in diagram (c) is determined as an item cluster.
After the global scoring information is clustered into a plurality of item clusters, the item cluster that includes both the target user and the target item is determined as the local scoring information. For example, as shown in diagram (c) of FIG. 2, if the target user is user $u_1$ and the target item is item $t_1$, the first item cluster, which includes both user $u_1$ and item $t_1$, is determined as the local scoring information.
For local scoring information comprising both a plurality of users with similar preferences and a plurality of items with similar popularity:
The global scoring information is clustered to obtain user-item clusters, and the obtained user-item cluster is determined as the local scoring information, wherein the user-item cluster comprises the scores given by at least two third users with similar preferences to at least two third items with similar popularity, the at least two third users comprise the target user, and the at least two third items comprise the target item and at least one reference item.
As shown in FIG. 2, by clustering the global scoring information shown in diagram (a), it is determined that users $u_1$ and $u_3$ have similar preferences, users $u_2$ and $u_5$ have similar preferences, and users $u_4$ and $u_6$ have similar preferences, and that items $t_1$ and $t_4$ have similar popularity, items $t_3$ and $t_5$ have similar popularity, and items $t_2$ and $t_6$ have similar popularity. The set of scores of users $u_1$ and $u_3$ for items $t_1$ and $t_4$ is taken as the first user-item cluster, for items $t_2$ and $t_6$ as the second user-item cluster, and for items $t_3$ and $t_5$ as the third user-item cluster. The set of scores of users $u_2$ and $u_5$ for items $t_1$ and $t_4$ is taken as the fourth user-item cluster, for items $t_2$ and $t_6$ as the fifth user-item cluster, and for items $t_3$ and $t_5$ as the sixth user-item cluster. The set of scores of users $u_4$ and $u_6$ for items $t_1$ and $t_4$ is taken as the seventh user-item cluster, for items $t_2$ and $t_6$ as the eighth user-item cluster, and for items $t_3$ and $t_5$ as the ninth user-item cluster. That is, the set of data in each dashed box in diagram (d) is determined as a user-item cluster.
After the global scoring information is clustered into a plurality of user-item clusters, the user-item cluster that includes both the target user and the target item is determined as the local scoring information. For example, as shown in diagram (d) of FIG. 2, if the target user is user $u_1$ and the target item is item $t_1$, the first user-item cluster, which includes both user $u_1$ and item $t_1$, is determined as the local scoring information.
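The selection of a user-item cluster described above can be sketched as follows. This is a minimal illustration, assuming the user groups and item groups have already been produced by some clustering step; the function names and the dictionary layout of the scores are hypothetical and are not taken from the embodiment, and the score values are made up because the values in FIG. 2 are not reproduced in the text.

```python
from itertools import product

def user_item_clusters(user_groups, item_groups):
    """Pair every user group with every item group (hypothetical helper)."""
    return [(frozenset(us), frozenset(ts))
            for us, ts in product(user_groups, item_groups)]

def select_local_cluster(clusters, target_user, target_item):
    """Return the cluster containing both the target user and the target item."""
    for users, items in clusters:
        if target_user in users and target_item in items:
            return users, items
    return None

def local_scores(global_scores, users, items):
    """Keep only the scores whose user and item both belong to the cluster."""
    return {(u, t): r for (u, t), r in global_scores.items()
            if u in users and t in items}

# Groups as described for FIG. 2 (d); the cluster order matches the text.
user_groups = [{"u1", "u3"}, {"u2", "u5"}, {"u4", "u6"}]
item_groups = [{"t1", "t4"}, {"t2", "t6"}, {"t3", "t5"}]
clusters = user_item_clusters(user_groups, item_groups)

users, items = select_local_cluster(clusters, "u1", "t1")

# Hypothetical score values, for illustration only.
global_scores = {("u1", "t4"): 3, ("u3", "t1"): 5, ("u3", "t4"): 4,
                 ("u2", "t1"): 2, ("u3", "t2"): 1}
local = local_scores(global_scores, users, items)
```

With three user groups and three item groups, nine user-item clusters result, and the one selected for user $u_1$ and item $t_1$ is the first.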
In the embodiment of the present invention, by clustering global scoring information, the obtained local scoring information satisfies any one of the following three conditions: the included users have similar preferences, the included items have similar popularity, the included users have similar preferences and the included items have similar popularity. Because at least one of the users and the items included in the local scoring information has similarity, the dimensionality of the local scoring information can be reduced, and the accuracy of predicting the user scoring is further ensured.
In the embodiment of the present invention, when clustering global scoring information, it is necessary to determine whether any two users have similar preferences, and it is necessary to determine whether any two items have similar popularity, specifically, the similarity between two users or two items may be calculated by a cosine similarity calculation formula or a pearson correlation coefficient formula, and when the calculation result is greater than a set threshold, it may be determined that corresponding two users have similar preferences or corresponding two items have similar popularity.
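The similarity test described above can be sketched as follows. This is an illustrative sketch only: it assumes each user (or item) is represented by a vector of scores over the same co-rated positions, and the threshold value and function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length score vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: treat an all-zero vector as dissimilar
    return dot / (norm_a * norm_b)

def pearson_correlation(a, b):
    """Pearson correlation coefficient of two equal-length score vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    dev_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    dev_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if dev_a == 0 or dev_b == 0:
        return 0.0  # assumption: constant vectors carry no correlation signal
    return cov / (dev_a * dev_b)

def is_similar(a, b, threshold, measure=cosine_similarity):
    """Two users (or items) are similar when the measure exceeds the threshold."""
    return measure(a, b) > threshold

similar_pair = is_similar([5, 4, 1], [4, 5, 1], threshold=0.9)
dissimilar_pair = is_similar([5, 1], [1, 5], threshold=0.9)
```

Either measure can be passed as `measure`; which one suits a given data set, and what threshold to set, is left open by the text.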
Alternatively, on the basis of the score prediction method shown in fig. 1, after the prediction score is obtained, the behavior of recommending the target item to the target user may be determined according to the size of the obtained prediction score. As shown in fig. 3, the method of performing recommended actions based on the prediction scores includes the steps of:
Step 301: Determine a first recommended behavior threshold and a second recommended behavior threshold according to the misclassification cost and the promotion cost.
The first recommendation behavior threshold is smaller than the second recommendation behavior threshold, the misclassification cost is used for representing the cost generated by recommending the disliked items to the user or not recommending the favorite items to the user, and the promotion cost is used for representing the cost generated by promoting the items to the user with indefinite preference.
Step 302: Determine whether the prediction score is smaller than the first recommended behavior threshold; if so, execute step 303; otherwise, execute step 304.
Step 303: Do not recommend the target item to the target user, and end the current flow.
Step 304: Determine whether the prediction score is greater than or equal to the second recommended behavior threshold; if so, execute step 305; otherwise, execute step 306.
Step 305: Recommend the target item to the target user, and end the current flow.
Step 306: Promote the target item to the target user.
In the embodiment of the invention, the misclassification cost is used for representing the cost generated by recommending a disliked item to a user or not recommending a favorite item to the user, the promotion cost is used for representing the cost generated by promoting the item to a user with indefinite preference, after the misclassification cost and the promotion cost are obtained based on experience, a first recommendation action threshold value and a second recommendation action threshold value can be determined according to the misclassification cost and the promotion cost, and then actions such as recommendation/promotion/non-recommendation are taken according to the size relationship between the predicted score and the first recommendation action threshold value and the second recommendation action threshold value, so that the project is recommended to the user based on the predicted score, and the success rate of recommending the project is improved.
In the embodiment of the present invention, when the prediction score is greater than or equal to the first recommended behavior threshold and smaller than the second recommended behavior threshold, the target item can be promoted to the target user. The main means of promoting the target item include issuing coupons and the like, so as to determine whether a target user with indefinite preference likes the target item. Adopting one of the three recommendation behaviors (recommend, do not recommend, promote) according to the obtained prediction score allows favorite items to be recommended to the user more accurately and allows items to be promoted to the user when the user's preference is ambiguous, which can improve both the user experience and the probability of successful recommendation.
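Steps 302 to 306 amount to a comparison against two thresholds, which can be sketched as follows. The function name and the returned labels are hypothetical; the threshold values in the usage lines are illustrative only.

```python
def recommendation_action(prediction, first_threshold, second_threshold):
    """Map a prediction score to one of three recommendation behaviors.

    first_threshold must be smaller than second_threshold, as required
    for the first and second recommended behavior thresholds.
    """
    if first_threshold >= second_threshold:
        raise ValueError("first threshold must be smaller than second threshold")
    if prediction < first_threshold:       # steps 302 and 303
        return "not_recommend"
    if prediction >= second_threshold:     # steps 304 and 305
        return "recommend"
    return "promote"                       # step 306

# Illustrative thresholds on a 0-5 scoring scale (assumed values).
actions = [recommendation_action(p, 2.0, 4.0) for p in (1.5, 2.0, 3.9, 4.0)]
```

A score exactly equal to the first threshold falls into the promotion branch, and a score exactly equal to the second threshold falls into the recommendation branch, matching the comparisons in steps 302 and 304.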
In the embodiment of the invention, the items mainly comprise movies, television dramas, books, commodities and the like, and after the prediction score of a user for one item is obtained, the item can be recommended to the user on the sale website of the item according to the size of the prediction score, or a coupon for purchasing the item can be issued to the user.
Alternatively, on the basis of the method for performing recommendation behavior according to the prediction scores shown in fig. 3, the first recommendation behavior threshold and the second recommendation behavior threshold may be calculated according to the misclassification cost, the promotion cost, and the highest score and the lowest score that can be obtained by the item. As shown in fig. 4, the method for calculating the first recommended behavior threshold and the second recommended behavior threshold includes the following steps:
Step 401: Obtain first to sixth costs.
The first cost is the cost of recommending an item to a user when the user likes the item, the second cost is the cost of promoting an item to a user when the user likes the item, the third cost is the cost of not recommending an item to a user when the user likes the item, the fourth cost is the cost of recommending an item to a user when the user dislikes the item, the fifth cost is the cost of promoting an item to a user when the user dislikes the item, and the sixth cost is the cost of not recommending an item to a user when the user dislikes the item.
In the embodiment of the present invention, the misclassification cost includes the third cost and the fourth cost, and the promotion cost includes the second cost and the fifth cost; the first cost and the sixth cost are normally equal to zero.
Step 402: the first recommended behavior threshold is calculated by a first formula as follows, and the second recommended behavior threshold is calculated by a second formula as follows, according to the first to sixth costs.
The first formula includes:
Figure BDA0002284016770000161
the second formula includes:
Figure BDA0002284016770000162
wherein $\beta^*$ characterizes the first recommended behavior threshold, $\alpha^*$ characterizes the second recommended behavior threshold, $\lambda_{PP}$ characterizes the first cost, $\lambda_{BP}$ characterizes the second cost, $\lambda_{NP}$ characterizes the third cost, $\lambda_{PN}$ characterizes the fourth cost, $\lambda_{BN}$ characterizes the fifth cost, $\lambda_{NN}$ characterizes the sixth cost, $r_h$ characterizes the highest score obtainable for an item, $r_l$ characterizes the lowest score obtainable for an item, $\lambda_{PP}\ \lambda_{BP}\ \lambda_{NP}$, $\lambda_{PN}\ \lambda_{BN}\ \lambda_{NN}$
Figure BDA0002284016770000163
In the embodiment of the present invention, six costs can be determined by combining the three recommendation behaviors with the two preference types, as shown in Table 1 below, where the first cost $\lambda_{PP}$ is the cost of recommending an item to a user when the user likes the item, the second cost $\lambda_{BP}$ is the cost of promoting an item to a user when the user likes the item, the third cost $\lambda_{NP}$ is the cost of not recommending an item to a user when the user likes the item, the fourth cost $\lambda_{PN}$ is the cost of recommending an item to a user when the user dislikes the item, the fifth cost $\lambda_{BN}$ is the cost of promoting an item to a user when the user dislikes the item, and the sixth cost $\lambda_{NN}$ is the cost of not recommending an item to a user when the user dislikes the item.
TABLE 1
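| Recommendation behavior / preference | Like | Dislike |
| --- | --- | --- |
| Recommend | $\lambda_{PP}$ | $\lambda_{PN}$ |
| Promote | $\lambda_{BP}$ | $\lambda_{BN}$ |
| Do not recommend | $\lambda_{NP}$ | $\lambda_{NN}$ |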
In the embodiment of the present invention, after the six costs shown in table 1 are determined, the first recommended behavior threshold and the second recommended behavior threshold may be calculated according to the six costs, so that it is ensured that the like degree of a user to an item can be predicted more accurately according to the calculated first recommended behavior threshold and the calculated second recommended behavior threshold, and further, the accuracy of the adopted recommended behavior is ensured.
In the embodiment of the present invention, in the first formula and the second formula, $r_h$ and $r_l$ characterize the highest score and the lowest score that a user can give an item, respectively. However, $r_h$ and $r_l$ are not necessarily the highest and lowest scores contained in the local scoring information, because the local scoring information may not contain the highest possible score or the lowest possible score. $r_h$ and $r_l$ are determined by the rule for scoring items: the highest score that the scoring rule allows a user to give is $r_h$, and the lowest score that the scoring rule allows a user to give is $r_l$. For example, when scoring movies, if the scoring rule specifies that a user's score for a movie must be between 0 and 5, then $r_h$ is 5 and $r_l$ is 0. As another example, when scoring books, if the scoring rule specifies that a user's score for a book must be between 0 and 100, then $r_h$ is 100 and $r_l$ is 0.
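The first and second formulas appear only as figures in this text, so the following is not a reproduction of them. It is a hedged sketch that assumes the standard decision-theoretic three-way decision thresholds computed from the six costs, and further assumes that the resulting values in [0, 1] are mapped linearly onto the scoring scale from $r_l$ to $r_h$. The function name, the dictionary keys and all cost values are hypothetical.

```python
def three_way_thresholds(costs, r_low, r_high):
    """Return (first_threshold, second_threshold) on the scoring scale.

    costs maps "PP", "BP", "NP", "PN", "BN", "NN" to the six costs.
    Assumption: standard three-way decision thresholds, then a linear
    mapping onto [r_low, r_high]; the patent's own formulas may differ.
    Assumes the costs are ordered so that no denominator is zero.
    """
    alpha = (costs["PN"] - costs["BN"]) / (
        (costs["PN"] - costs["BN"]) + (costs["BP"] - costs["PP"]))
    beta = (costs["BN"] - costs["NN"]) / (
        (costs["BN"] - costs["NN"]) + (costs["NP"] - costs["BP"]))
    span = r_high - r_low
    return r_low + beta * span, r_low + alpha * span

# Illustrative cost values only (first and sixth costs set to zero).
costs = {"PP": 0, "BP": 10, "NP": 40, "PN": 50, "BN": 20, "NN": 0}
first_threshold, second_threshold = three_way_thresholds(costs, r_low=1, r_high=5)
```

Under these assumed costs the unscaled thresholds are 0.4 and 0.75, so on a 1-to-5 scale the first threshold is about 2.6 and the second is 4.0.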
Alternatively, on the basis of the score prediction method provided in each of the above embodiments, when the step 103 determines the prediction score according to the local score information, the score deviation may be calculated according to each score included in the local score information, and the prediction score may be calculated according to the calculated score deviation. As shown in fig. 5, the method of calculating the prediction score includes the steps of:
Step 501: For each item other than the target item in the local scoring information, determine the users who have scored both that item and the target item in the local scoring information, and determine the set of these users as the user set corresponding to that item;
Step 502: For each determined user set, calculate the score deviation corresponding to the user set by the following third formula;
The third formula includes:
Figure BDA0002284016770000181
wherein $S_{i,j}$ characterizes the user set corresponding to item $t_i$ in the local scoring information, $|S_{i,j}|$ characterizes the number of users in $S_{i,j}$, $dev_{i,j}$ characterizes the score deviation corresponding to $S_{i,j}$, $u_k$ characterizes the $k$-th user in $S_{i,j}$, $r_{k,j}$ characterizes the score of user $u_k$ for target item $t_j$, and $r_{k,i}$ characterizes the score of user $u_k$ for item $t_i$;
Step 503: Calculate the prediction score by the following fourth formula according to each calculated score deviation;
The fourth formula includes:
Figure BDA0002284016770000182
wherein $P_{u,j}$ characterizes the prediction score of target user $u_u$ for target item $t_j$; $N_{u,j}$ characterizes an item set, where for each item in the at least one item included in the item set, the local scoring information contains scores of that item by at least two users including the target user; $|N_{u,j}|$ characterizes the number of items in the item set; $t_i$ characterizes the $i$-th item in $N_{u,j}$; and $r_{u,i}$ characterizes the score of target user $u_u$ for item $t_i$.
In the embodiment of the present invention, after the local scoring information is obtained, for each item other than the target item in the local scoring information, the users who have scored both that item and the target item in the local scoring information are determined, and the set of these users is used as the user set corresponding to that item. Then, for each determined user set, the score deviation corresponding to the user set is calculated; the score deviation characterizes the difference between the score given to the item corresponding to the user set and the score given to the target item by each user in that user set. Finally, the prediction score of the target user for the target item is calculated according to the score deviation corresponding to each user set.
Because different users with similar preferences usually give similar scores to the same item, and the same user usually gives similar scores to items with similar popularity, the score deviation is determined according to the scores of other users with similar preferences to the target user to the target item and other items, or the score deviation is determined according to the scores of other users to the target item and other items with similar popularity to the target item, at the moment, the score deviation reflects the score rules of other users with similar preferences to the target user, or the score deviation reflects the score rules of the user to other items with similar popularity to the target item, the score of the target user to the target item can be predicted based on the score deviation, and the accuracy of the obtained predicted score can be ensured.
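Steps 501 to 503 can be sketched as follows. The third and fourth formulas appear only as figures in this text, so this sketch assumes they take the standard Slope One form: the score deviation is the mean of $r_{k,j} - r_{k,i}$ over the user set, and the prediction is the mean of $dev_{i,j} + r_{u,i}$ over the item set. The function names, the dictionary layout and the score values are hypothetical.

```python
def score_deviation(scores, item_i, target_j):
    """Mean of (score for target_j - score for item_i) over users who scored both.

    Assumed Slope One form of the third formula. Returns None when no user
    has scored both items.
    """
    users_i = {u for (u, t) in scores if t == item_i}
    users_j = {u for (u, t) in scores if t == target_j}
    user_set = users_i & users_j          # step 501: the user set S_{i,j}
    if not user_set:
        return None
    total = sum(scores[(u, target_j)] - scores[(u, item_i)] for u in user_set)
    return total / len(user_set)          # step 502

def predict_score(scores, target_user, target_j):
    """Mean of (deviation + target user's score) over usable items (step 503)."""
    rated = sorted(t for (u, t) in scores if u == target_user and t != target_j)
    terms = []
    for item_i in rated:
        dev = score_deviation(scores, item_i, target_j)
        if dev is not None:
            terms.append(dev + scores[(target_user, item_i)])
    if not terms:
        return None
    return sum(terms) / len(terms)

# Hypothetical local scoring information: u1 has not scored the target t1.
local_scores = {("u1", "t4"): 3, ("u1", "t2"): 2,
                ("u3", "t1"): 5, ("u3", "t4"): 4, ("u3", "t2"): 3}
prediction = predict_score(local_scores, "u1", "t1")
```

Because the target user has no score for the target item, the target user never enters a user set, which matches the worked cases in which the user set contains only the other users of the cluster.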
Since the local scoring information may be a user cluster, an item cluster or a user-item cluster, the method of calculating the prediction score is described below with reference to the user cluster, item cluster and user-item cluster shown in FIG. 2, taking user $u_1$ as the target user and item $t_1$ as the target item as an example.
For the case where the local scoring information is the first user cluster in the graph (b) in fig. 2:
Since only items $t_2$, $t_4$ and $t_5$ have scores from both user $u_1$ and user $u_3$, only the user sets corresponding to items $t_2$, $t_4$ and $t_5$ need to be determined. Since the local scoring information at this time includes only users $u_1$ and $u_3$, and user $u_1$ is the target user, the user set $S_{2,1}$ corresponding to item $t_2$, the user set $S_{4,1}$ corresponding to item $t_4$ and the user set $S_{5,1}$ corresponding to item $t_5$ each include only the single element $u_3$.
Compute the score deviation $dev_{2,1}$ corresponding to user set $S_{2,1}$:
Figure BDA0002284016770000191
Compute the score deviation $dev_{4,1}$ corresponding to user set $S_{4,1}$:
Figure BDA0002284016770000192
Compute the score deviation $dev_{5,1}$ corresponding to user set $S_{5,1}$:
Figure BDA0002284016770000193
Calculate the prediction score of target user $u_1$ for target item $t_1$ according to all the score deviations:
Figure BDA0002284016770000201
For the case where the local scoring information is the first item cluster in diagram (c) of FIG. 2:
Since the local scoring information includes only items $t_1$ and $t_4$, and item $t_1$ is the target item, only the user set $S_{4,1}$ corresponding to item $t_4$ needs to be obtained. Since users $u_2$, $u_3$ and $u_4$ have each scored both item $t_1$ and item $t_4$, the user set $S_{4,1}$ includes the three elements $u_2$, $u_3$ and $u_4$.
Compute the score deviation $dev_{4,1}$ corresponding to user set $S_{4,1}$:
Figure BDA0002284016770000202
Calculate the prediction score of target user $u_1$ for target item $t_1$ according to the score deviation:
Figure BDA0002284016770000203
For the case where the local scoring information is the first user-item cluster in diagram (d) of FIG. 2:
Since the local scoring information includes only items $t_1$ and $t_4$, and item $t_1$ is the target item, only the user set $S_{4,1}$ corresponding to item $t_4$ needs to be obtained. Since the local scoring information at this time includes only users $u_1$ and $u_3$, and user $u_1$ is the target user, the user set $S_{4,1}$ corresponding to item $t_4$ includes only the single element $u_3$.
Compute the score deviation $dev_{4,1}$ corresponding to user set $S_{4,1}$:
Figure BDA0002284016770000204
Calculate the prediction score of target user $u_1$ for target item $t_1$ according to the score deviation:
Figure BDA0002284016770000205
It should be noted that, in addition to the method for calculating the prediction score provided in the above embodiment, the prediction score may also be calculated by another collaborative filtering algorithm such as the k-nearest neighbors (kNN) algorithm.
It should be noted that the user-item cluster includes both local user information and local item information. The local user information and the local item information are each obtained by clustering the global scoring information, and the user-item cluster can be regarded as an integration of the two, so that user preference and item popularity are expressed at a finer granularity. The local information is embedded as input into a three-way recommendation algorithm, replacing the traditional two-way collaborative filtering recommendation mode that takes global information as input. In the embodiment of the present invention, user clusters/item clusters are obtained by a clustering algorithm, and embedding the local information effectively avoids the overfitting problem compared with the traditional approach of using global information as input. Three-way decision is a decision model that is more consistent with general cognition. In addition to the recommend and do-not-recommend behaviors of traditional collaborative filtering algorithms, the three-way recommendation method also considers the case in which the user's preference cannot be determined with sufficient confidence, extending the underlying two-class problem into a three-class problem.
As shown in fig. 6, an embodiment of the present invention provides a score prediction apparatus, including:
an information obtaining module 601, configured to obtain global scoring information, where the global scoring information includes scores of at least two items by at least two users, where the at least two users include target users serving as scoring prediction user objects, the at least two items include target items serving as scoring prediction item objects, the global scoring information includes scores of at least one item other than the target items by the target users, and the global scoring information further includes a score of at least one item by each user other than the target users;
an information clustering module 602, configured to cluster the global scoring information acquired by the information acquisition module 601, and acquire local scoring information, where the local scoring information includes a score of the target user on at least one reference item other than the target item, and the local scoring information further includes a score of at least one user other than the target user on at least two items including the target item and the at least one reference item, respectively, where different users have similar preferences and/or different items have similar popularity in the local scoring information;
a score calculating module 603, configured to use the local score information obtained by the information clustering module 602 as an input of a collaborative filtering algorithm to obtain a predicted score, where the predicted score is a score of the predicted target user on the target item.
In the embodiment of the present invention, the information obtaining module 601 may be configured to perform step 101 in the above-described method embodiment, the information clustering module 602 may be configured to perform step 102 in the above-described method embodiment, and the score calculating module 603 may be configured to perform step 103 in the above-described method embodiment.
Alternatively, on the basis of the score prediction apparatus shown in fig. 6,
the information clustering module 602 is configured to cluster the global scoring information to obtain a user cluster, and determine the user cluster as local scoring information, where the user cluster includes a target user and at least one first user having a preference similar to that of the target user and scores of at least two first items, respectively, where the at least two first items include a target item and at least one reference item.
Alternatively, on the basis of the score prediction apparatus shown in fig. 6,
the information clustering module 602 is configured to cluster the global scoring information to obtain an item cluster, and determine the item cluster as local scoring information, where the item cluster includes the scores given by at least two second users to the target item and to at least one second item having popularity similar to that of the target item, and the at least two second users include the target user.
Alternatively, on the basis of the score prediction apparatus shown in fig. 6,
the information clustering module 602 is configured to cluster the global scoring information to obtain a user-item cluster, and determine the user-item cluster as local scoring information, where the user-item cluster includes scores of at least two third users with similar preferences for at least two third items with similar popularity, respectively, the at least two third users include a target user, and the at least two third items include a target item and at least one reference item.
Optionally, on the basis of the score prediction apparatus shown in fig. 6, as shown in fig. 7, the score prediction apparatus further includes:
a threshold obtaining module 604, configured to determine a first recommended behavior threshold and a second recommended behavior threshold according to a misclassification cost and a promotion cost, where the first recommended behavior threshold is smaller than the second recommended behavior threshold, the misclassification cost is used to represent a cost resulting from recommending to the user an item that the user does not like or not recommending to the user an item that the user likes, and the promotion cost is used to represent a cost resulting from promoting the item to a user with ambiguous preference;
an action executing module 605, configured to compare the predicted score obtained by the score calculating module 603 with the first recommended behavior threshold and the second recommended behavior threshold obtained by the threshold obtaining module 604, if the predicted score is smaller than the first recommended behavior threshold, not recommend the target item to the target user, if the predicted score is greater than or equal to the first recommended behavior threshold and the predicted score is smaller than the second recommended behavior threshold, promote the target item to the target user, and if the predicted score is greater than or equal to the second recommended behavior threshold, recommend the target item to the target user.
In an embodiment of the present invention, the threshold obtaining module 604 may be configured to perform step 301 in the above-described method embodiment, and the action performing module 605 may be configured to perform steps 302 to 306 in the above-described method embodiment.
Optionally, on the basis of the score predicting apparatus shown in fig. 7, as shown in fig. 8, the threshold obtaining module 604 includes:
a cost obtaining unit 6041 configured to obtain first to sixth costs, where the first cost is a cost for recommending an item to a user when the user likes the item, the second cost is a cost for promoting the item to the user when the user likes the item, the third cost is a cost for not recommending the item to the user when the user likes the item, the fourth cost is a cost for recommending the item to the user when the user dislikes the item, the fifth cost is a cost for promoting the item to the user when the user dislikes the item, and the sixth cost is a cost for not recommending the item to the user when the user dislikes the item;
a threshold calculation unit 6042, configured to calculate, according to the first to sixth costs obtained by the cost obtaining unit 6041, a first recommended behavior threshold by using a first formula as follows, and a second recommended behavior threshold by using a second formula as follows;
the first formula includes:
Figure BDA0002284016770000231
the second formula includes:
Figure BDA0002284016770000232
wherein $\beta^*$ characterizes the first recommended behavior threshold, $\alpha^*$ characterizes the second recommended behavior threshold, $\lambda_{PP}$ characterizes the first cost, $\lambda_{BP}$ characterizes the second cost, $\lambda_{NP}$ characterizes the third cost, $\lambda_{PN}$ characterizes the fourth cost, $\lambda_{BN}$ characterizes the fifth cost, $\lambda_{NN}$ characterizes the sixth cost, $r_h$ characterizes the highest score obtainable for an item, $r_l$ characterizes the lowest score obtainable for an item, $\lambda_{PP}\ \lambda_{BP}\ \lambda_{NP}$, $\lambda_{PN}\ \lambda_{BN}\ \lambda_{NN}$
Figure BDA0002284016770000233
In the embodiment of the present invention, the cost obtaining unit 6041 may be configured to perform step 401 in the above-described method embodiment, and the threshold calculation unit 6042 may be configured to perform step 402 in the above-described method embodiment.
Alternatively, on the basis of the score predicting device shown in fig. 6, as shown in fig. 9, the score calculating module 603 includes:
a user set determining unit 6031, configured to determine, for each item except for the target item in the local rating information, a user whose rating is present for both the item and the target item in the local rating information, and determine a set of the determined users as a user set corresponding to the item;
a deviation calculation unit 6032 for calculating, for each user set determined by the user set determination unit 6031, a score deviation corresponding to the user set by the following third formula;
the third formula includes:
Figure BDA0002284016770000241
wherein $S_{i,j}$ characterizes the user set corresponding to item $t_i$ in the local scoring information, $|S_{i,j}|$ characterizes the number of users in $S_{i,j}$, $dev_{i,j}$ characterizes the score deviation corresponding to $S_{i,j}$, $u_k$ characterizes the $k$-th user in $S_{i,j}$, $r_{k,j}$ characterizes the score of user $u_k$ for target item $t_j$, and $r_{k,i}$ characterizes the score of user $u_k$ for item $t_i$;
a score prediction unit 6033 for calculating a prediction score by the following fourth formula based on each score deviation calculated by the deviation calculation unit 6032;
the fourth formula includes:
Figure BDA0002284016770000242
wherein $P_{u,j}$ characterizes the prediction score of the target user $u_u$ for the target item $t_j$; $N_{u,j}$ characterizes an item set such that, for each of the at least one item included in the item set, the local scoring information contains scores for that item from at least two users including the target user; $|N_{u,j}|$ characterizes the number of items in the item set; $t_i$ characterizes the $i$-th item in $N_{u,j}$; and $r_{u,i}$ characterizes the score of the target user $u_u$ for item $t_i$.
In the embodiment of the present invention, the user set determination unit 6031 may be configured to perform step 501 in the above-described method embodiment, the deviation calculation unit 6032 may be configured to perform step 502 in the above-described method embodiment, and the score prediction unit 6033 may be configured to perform step 503 in the above-described method embodiment.
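As an illustration of steps 501 to 503, the following is a minimal sketch in Python. The third and fourth formulas are given only as figures, so the sketch assumes they take the standard Slope One form that the symbol definitions suggest: the score deviation is the mean of $r_{k,j} - r_{k,i}$ over the user set $S_{i,j}$, and the prediction score is the mean of $dev_{i,j} + r_{u,i}$ over the item set. The names `Ratings`, `score_deviation`, and `predict_score` are hypothetical, and the sketch approximates the item set by the items the target user has scored for which the user set is non-empty.

```python
from typing import Dict, Optional

# Hypothetical layout for local scoring information: user -> {item -> score}.
Ratings = Dict[str, Dict[str, float]]


def score_deviation(local: Ratings, item_i: str, target_j: str) -> Optional[float]:
    """Mean of (r_kj - r_ki) over users who scored both items (assumed form)."""
    diffs = [scores[target_j] - scores[item_i]
             for scores in local.values()
             if item_i in scores and target_j in scores]
    # An empty user set gives no deviation for this item.
    return sum(diffs) / len(diffs) if diffs else None


def predict_score(local: Ratings, target_user: str, target_j: str) -> Optional[float]:
    """Mean of (dev_ij + r_ui) over usable reference items (assumed form)."""
    terms = []
    for item_i, r_ui in local[target_user].items():
        if item_i == target_j:
            continue
        dev = score_deviation(local, item_i, target_j)
        if dev is not None:
            terms.append(dev + r_ui)
    return sum(terms) / len(terms) if terms else None
```

Returning `None` when no reference item is usable is a design choice of this sketch; the source does not state how that case is handled.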
It is to be understood that the illustrated structure of the embodiment of the present invention does not specifically limit the score prediction device. In other embodiments of the invention, the score prediction means may comprise more or fewer components than shown, or some components may be combined, some components may be split, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Because the information interaction, execution process, and other contents between the units in the device are based on the same concept as the method embodiment of the present invention, specific contents may refer to the description in the method embodiment of the present invention, and are not described herein again.
The embodiment of the invention also provides another scoring prediction device, which comprises: at least one memory and at least one processor;
the at least one memory to store a machine readable program;
the at least one processor is configured to invoke the machine readable program to perform the scoring prediction method provided in any of the above embodiments.
Embodiments of the present invention further provide a computer-readable medium, on which computer instructions are stored, and when executed by a processor, the computer instructions cause the processor to execute the score prediction method in any embodiment of the present invention. Specifically, a system or an apparatus equipped with a storage medium on which software program codes that realize the functions of any of the above-described embodiments are stored may be provided, and a computer (or a CPU or MPU) of the system or the apparatus is caused to read out and execute the program codes stored in the storage medium.
In this case, the program code itself read from the storage medium can realize the functions of any of the above-described embodiments, and thus the program code and the storage medium storing the program code constitute a part of the present invention.
Examples of the storage medium for supplying the program code include a floppy disk, a hard disk, a magneto-optical disk, an optical disk (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD + RW), a magnetic tape, a nonvolatile memory card, and a ROM. Alternatively, the program code may be downloaded from a server computer via a communications network.
Further, it should be clear that the functions of any one of the above-described embodiments may be implemented not only by executing the program code read out by the computer, but also by causing an operating system or the like operating on the computer to perform a part or all of the actual operations based on instructions of the program code.
Further, it is to be understood that the program code read out from the storage medium is written to a memory provided in an expansion board inserted into the computer or to a memory provided in an expansion unit connected to the computer, and then causes a CPU or the like mounted on the expansion board or the expansion unit to perform part or all of the actual operations based on instructions of the program code, thereby realizing the functions of any of the above-described embodiments.
It should be noted that not all steps and modules in the above flows and system structure diagrams are necessary, and some steps or modules may be omitted according to actual needs. The execution order of the steps is not fixed and can be adjusted as required. The system structure described in the above embodiments may be a physical structure or a logical structure, that is, some modules may be implemented by the same physical entity, or some modules may be implemented by a plurality of physical entities, or some components in a plurality of independent devices may be implemented together.
In the above embodiments, the hardware unit may be implemented mechanically or electrically. For example, a hardware element may comprise permanently dedicated circuitry or logic (such as a dedicated processor, FPGA or ASIC) to perform the corresponding operations. The hardware elements may also comprise programmable logic or circuitry, such as a general purpose processor or other programmable processor, that may be temporarily configured by software to perform the corresponding operations. The specific implementation (mechanical, or dedicated permanent, or temporarily set) may be determined based on cost and time considerations.
While the invention has been shown and described in detail in the drawings and in the preferred embodiments, it is not intended to limit the invention to the embodiments disclosed, and it will be apparent to those skilled in the art that the technical means in the various embodiments described above may be combined to obtain further embodiments of the invention, which are also within the scope of the invention.

Claims (10)

1. A score prediction method, comprising:
acquiring global scoring information, wherein the global scoring information comprises scores of at least two items by at least two users, the at least two users comprise a target user serving as the user object of score prediction, the at least two items comprise a target item serving as the item object of score prediction, the global scoring information comprises a score of the target user for at least one item other than the target item, and the global scoring information further comprises a score of each user other than the target user for at least one item;
clustering the global scoring information to obtain local scoring information, wherein the local scoring information comprises the scoring of the target user on at least one reference item except the target item, and the local scoring information further comprises the scoring of at least two items including the target item and at least one reference item by at least one user except the target user respectively, and different users have similar preference and/or different items have similar popularity in the local scoring information;
and taking the local scoring information as the input of a collaborative filtering algorithm to obtain a prediction score, wherein the prediction score is the predicted score of the target user on the target item.
2. The method of claim 1, wherein the clustering the global scoring information to obtain local scoring information comprises:
clustering the global scoring information to obtain a user cluster, and determining the user cluster as the local scoring information, wherein the user cluster comprises scores of at least two first items by the target user and by at least one first user with a similar preference to the target user, respectively, and the at least two first items comprise the target item and at least one reference item;
or,
clustering the global scoring information to obtain an item cluster, and determining the item cluster as the local scoring information, wherein the item cluster comprises scores of at least two second users for the target item and for at least one second item with a similar popularity to the target item, and the at least two second users comprise the target user;
or,
clustering the global scoring information to obtain a user-item cluster, and determining the user-item cluster as the local scoring information, wherein the user-item cluster comprises scores of at least two third users with similar preferences on at least two third items with similar popularity, respectively, the at least two third users comprise the target user, and the at least two third items comprise the target item and at least one reference item.
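The user-cluster variant above can be sketched as follows. This is only an illustration under stated assumptions: the claim does not name a clustering algorithm, so the sketch takes a precomputed mapping `cluster_of` (hypothetical) from any clustering step, and it assumes that the reference items are the items the target user has already scored. The function name `local_from_user_cluster` is also hypothetical.

```python
from typing import Dict

# Hypothetical layout for scoring information: user -> {item -> score}.
Ratings = Dict[str, Dict[str, float]]


def local_from_user_cluster(global_ratings: Ratings,
                            cluster_of: Dict[str, int],
                            target_user: str,
                            target_item: str) -> Ratings:
    """Extract local scoring information from the target user's cluster."""
    label = cluster_of[target_user]
    # Assumption: reference items are those the target user has scored.
    keep_items = set(global_ratings[target_user]) | {target_item}
    local: Ratings = {}
    for user, scores in global_ratings.items():
        if cluster_of.get(user) != label:
            continue  # user does not share the target user's preference cluster
        kept = {item: s for item, s in scores.items() if item in keep_items}
        if kept:
            local[user] = kept
    return local
```

The result has the same shape as the input, so it can be passed directly to whichever collaborative filtering step follows.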
3. The method of claim 1, wherein after said obtaining a prediction score, further comprising:
determining a first recommended behavior threshold and a second recommended behavior threshold according to a misclassification cost and a promotion cost, wherein the first recommended behavior threshold is smaller than the second recommended behavior threshold, the misclassification cost is used for representing a cost generated by recommending to the user an item that the user dislikes or by not recommending to the user an item that the user likes, and the promotion cost is used for representing a cost generated by promoting an item to a user whose preference is ambiguous;
comparing the predicted score with the first recommended behavior threshold and the second recommended behavior threshold, and performing:
if the prediction score is less than the first recommended behavior threshold, not recommending the target item to the target user;
if the prediction score is greater than or equal to the first recommended behavior threshold and the prediction score is less than the second recommended behavior threshold, promoting the target item to the target user;
recommending the target item to the target user if the prediction score is greater than or equal to the second recommended behavior threshold.
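The three-way comparison in claim 3 can be sketched as a small function. The function name `recommendation_action` and the returned labels are hypothetical; the thresholds are taken as inputs because the formulas that compute them appear only as figures.

```python
def recommendation_action(prediction: float, beta: float, alpha: float) -> str:
    """Map a prediction score to one of the three actions of claim 3.

    beta is the first recommended behavior threshold and alpha the second;
    the claim requires beta < alpha.
    """
    if not beta < alpha:
        raise ValueError("first threshold must be smaller than second threshold")
    if prediction < beta:
        return "not_recommend"
    if prediction < alpha:
        return "promote"
    return "recommend"
```

For example, with thresholds 2.5 and 4.0, a prediction score of 3.0 falls in the middle band and yields the promote action.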
4. The method of claim 3, wherein determining the first recommended behavior threshold and the second recommended behavior threshold according to the misclassification cost and the promotion cost comprises:
obtaining first to sixth costs, wherein the first cost is a cost for recommending an item to a user when the user likes the item, the second cost is a cost for promoting the item to the user when the user likes the item, the third cost is a cost for not recommending the item to the user when the user likes the item, the fourth cost is a cost for recommending the item to the user when the user dislikes the item, the fifth cost is a cost for promoting the item to the user when the user dislikes the item, and the sixth cost is a cost for not recommending the item to the user when the user dislikes the item;
calculating the first recommended behavior threshold value by a first formula and calculating the second recommended behavior threshold value by a second formula according to the first to sixth costs;
the first formula includes:
Figure FDA0002284016760000031
the second formula includes:
Figure FDA0002284016760000032
wherein $\beta^*$ characterizes the first recommended behavior threshold, $\alpha^*$ characterizes the second recommended behavior threshold, $\lambda_{PP}$ characterizes the first cost, $\lambda_{BP}$ characterizes the second cost, $\lambda_{NP}$ characterizes the third cost, $\lambda_{PN}$ characterizes the fourth cost, $\lambda_{BN}$ characterizes the fifth cost, $\lambda_{NN}$ characterizes the sixth cost, $r_h$ characterizes the highest score the item can obtain, $r_l$ characterizes the lowest score the item can obtain, and $\lambda_{PP} < \lambda_{BP} < \lambda_{NP}$, $\lambda_{PN} < \lambda_{BN} < \lambda_{NN}$
Figure FDA0002284016760000033
5. The method according to any one of claims 1 to 4, wherein the obtaining of the prediction score using the local scoring information as an input to a collaborative filtering algorithm comprises:
s1: for each item except the target item in the local scoring information, determining the users who have scored both the item and the target item in the local scoring information, and determining the set of the determined users as the user set corresponding to the item;
s2: for each determined user set, calculating a score deviation corresponding to the user set through the following third formula;
the third formula includes:
Figure FDA0002284016760000034
wherein $S_{i,j}$ characterizes the user set corresponding to the item $t_i$ in the local scoring information, $|S_{i,j}|$ characterizes the number of the users in $S_{i,j}$, $dev_{i,j}$ characterizes the score deviation corresponding to $S_{i,j}$, $u_k$ characterizes the $k$-th user in $S_{i,j}$, $r_{k,j}$ characterizes the score of the user $u_k$ for the target item $t_j$, and $r_{k,i}$ characterizes the score of the user $u_k$ for the item $t_i$;
s3: calculating the prediction score according to each of the calculated score deviations by a fourth formula as follows;
the fourth formula includes:
Figure FDA0002284016760000041
wherein $P_{u,j}$ characterizes the prediction score of the target user $u_u$ for the target item $t_j$; $N_{u,j}$ characterizes an item set such that, for each of the at least one item included in the item set, the local scoring information contains scores for that item from at least two users including the target user; $|N_{u,j}|$ characterizes the number of the items in the item set; $t_i$ characterizes the $i$-th item in $N_{u,j}$; and $r_{u,i}$ characterizes the score of the target user $u_u$ for the item $t_i$.
6. A score prediction device, comprising:
an information obtaining module, configured to obtain global scoring information, where the global scoring information includes scores of at least two items by at least two users, where the at least two users include a target user as a score prediction user object, the at least two items include a target item as a score prediction item object, the global scoring information includes a score of at least one item other than the target item by the target user, and the global scoring information further includes a score of at least one item by each user other than the target user;
an information clustering module, configured to cluster the global scoring information acquired by the information acquisition module to acquire local scoring information, where the local scoring information includes a score of the target user on at least one reference item other than the target item, and the local scoring information further includes a score of at least one user other than the target user on at least two items including the target item and the at least one reference item, respectively, and in the local scoring information, different users have similar preferences and/or different items have similar popularity;
and the score calculation module is used for taking the local score information obtained by the information clustering module as the input of a collaborative filtering algorithm to obtain a prediction score, wherein the prediction score is the predicted score of the target user on the target item.
7. The apparatus of claim 6,
the information clustering module is configured to cluster the global scoring information to obtain a user cluster, and determine the user cluster as the local scoring information, where the user cluster includes scores of at least two first items by the target user and at least one first user having a similar preference to the target user, respectively, and the at least two first items include the target item and at least one reference item;
or,
the information clustering module is configured to cluster the global scoring information to obtain an item cluster, and determine the item cluster as the local scoring information, where the item cluster includes scores of at least two second users for the target item and for at least one second item with a similar popularity to the target item, and the at least two second users include the target user;
or,
the information clustering module is configured to cluster the global scoring information to obtain a user-item cluster, and determine the user-item cluster as the local scoring information, where the user-item cluster includes scores of at least two third users with similar preferences for at least two third items with similar popularity, respectively, where the at least two third users include the target user, and the at least two third items include the target item and at least one reference item.
8. The apparatus of claim 6, further comprising:
a threshold obtaining module, configured to determine a first recommended behavior threshold and a second recommended behavior threshold according to a misclassification cost and a promotion cost, where the first recommended behavior threshold is smaller than the second recommended behavior threshold, the misclassification cost is used to represent a cost resulting from recommending to the user an item that the user dislikes or from not recommending to the user an item that the user likes, and the promotion cost is used to represent a cost resulting from promoting an item to a user whose preference is ambiguous;
an action execution module, configured to compare the predicted score obtained by the score calculation module with the first recommended behavior threshold and the second recommended behavior threshold obtained by the threshold acquisition module, if the predicted score is smaller than the first recommended behavior threshold, not recommend the target item to the target user, if the predicted score is greater than or equal to the first recommended behavior threshold and the predicted score is smaller than the second recommended behavior threshold, promote the target item to the target user, and if the predicted score is greater than or equal to the second recommended behavior threshold, recommend the target item to the target user.
9. The apparatus of claim 8, wherein the threshold acquisition module comprises:
a cost obtaining unit, configured to obtain first to sixth costs, where the first cost is a cost for recommending an item to a user when the user likes the item, the second cost is a cost for promoting the item to the user when the user likes the item, the third cost is a cost for not recommending the item to the user when the user likes the item, the fourth cost is a cost for recommending the item to the user when the user dislikes the item, the fifth cost is a cost for promoting the item to the user when the user dislikes the item, and the sixth cost is a cost for not recommending the item to the user when the user dislikes the item;
a threshold calculation unit, configured to calculate, according to the first to sixth costs acquired by the cost acquisition unit, the first recommended behavior threshold by using a first formula as follows, and calculate the second recommended behavior threshold by using a second formula as follows;
the first formula includes:
Figure FDA0002284016760000061
the second formula includes:
Figure FDA0002284016760000062
wherein $\beta^*$ characterizes the first recommended behavior threshold, $\alpha^*$ characterizes the second recommended behavior threshold, $\lambda_{PP}$ characterizes the first cost, $\lambda_{BP}$ characterizes the second cost, $\lambda_{NP}$ characterizes the third cost, $\lambda_{PN}$ characterizes the fourth cost, $\lambda_{BN}$ characterizes the fifth cost, $\lambda_{NN}$ characterizes the sixth cost, $r_h$ characterizes the highest score the item can obtain, $r_l$ characterizes the lowest score the item can obtain, and $\lambda_{PP} < \lambda_{BP} < \lambda_{NP}$, $\lambda_{PN} < \lambda_{BN} < \lambda_{NN}$
Figure FDA0002284016760000071
10. The apparatus of any one of claims 6 to 9, wherein the score calculation module comprises:
a user set determining unit, configured to determine, for each item in the local scoring information except for the target item, the user whose score is present for both the item and the target item in the local scoring information, and determine a set of the determined users as a user set corresponding to the item;
a deviation calculating unit for calculating, for each of the user sets determined by the user set determining unit, a score deviation corresponding to the user set by a third formula;
the third formula includes:
Figure FDA0002284016760000072
wherein $S_{i,j}$ characterizes the user set corresponding to the item $t_i$ in the local scoring information, $|S_{i,j}|$ characterizes the number of the users in $S_{i,j}$, $dev_{i,j}$ characterizes the score deviation corresponding to $S_{i,j}$, $u_k$ characterizes the $k$-th user in $S_{i,j}$, $r_{k,j}$ characterizes the score of the user $u_k$ for the target item $t_j$, and $r_{k,i}$ characterizes the score of the user $u_k$ for the item $t_i$;
a score predicting unit for calculating said predicted score by the following fourth formula based on each of said score deviations calculated by said deviation calculating unit;
the fourth formula includes:
Figure FDA0002284016760000073
wherein $P_{u,j}$ characterizes the prediction score of the target user $u_u$ for the target item $t_j$; $N_{u,j}$ characterizes an item set such that, for each of the at least one item included in the item set, the local scoring information contains scores for that item from at least two users including the target user; $|N_{u,j}|$ characterizes the number of the items in the item set; $t_i$ characterizes the $i$-th item in $N_{u,j}$; and $r_{u,i}$ characterizes the score of the target user $u_u$ for the item $t_i$.
CN201911152840.9A 2019-11-22 2019-11-22 Scoring prediction method and device Pending CN110909257A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911152840.9A CN110909257A (en) 2019-11-22 2019-11-22 Scoring prediction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911152840.9A CN110909257A (en) 2019-11-22 2019-11-22 Scoring prediction method and device

Publications (1)

Publication Number Publication Date
CN110909257A true CN110909257A (en) 2020-03-24

Family

ID=69818798

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911152840.9A Pending CN110909257A (en) 2019-11-22 2019-11-22 Scoring prediction method and device

Country Status (1)

Country Link
CN (1) CN110909257A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112163733A (en) * 2020-08-28 2021-01-01 南京星耀智能科技有限公司 Fighting capacity assessment method based on expert knowledge combined collaborative filtering algorithm
CN113032687A (en) * 2021-03-04 2021-06-25 重庆邮电大学 Collaborative filtering movie recommendation method based on three-decision-making user clustering

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105678430A (en) * 2016-02-29 2016-06-15 大连大学 Improved user recommendation method based on neighbor project slope one algorithm
CN106021428A (en) * 2016-05-16 2016-10-12 武汉理工大学 KNN and three-way decision-based movie recommendation method
CN106484876A (en) * 2016-10-13 2017-03-08 中山大学 A kind of based on typical degree and the collaborative filtering recommending method of trust network
CN108205682A (en) * 2016-12-19 2018-06-26 同济大学 It is a kind of for the fusion content of personalized recommendation and the collaborative filtering method of behavior

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HENG-RU ZHANG ET AL.: "Regression-based three-way recommendation", 《INFORMATION SCIENCE》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200324