CN113269609A - User similarity calculation method, calculation system, device and storage medium - Google Patents
User similarity calculation method, calculation system, device and storage medium
- Publication number
- CN113269609A (CN202110570380.2A)
- Authority
- CN
- China
- Prior art keywords
- user
- commodity
- score
- calculating
- scoring
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0631—Item recommendations
Landscapes
- Business, Economics & Management (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- Development Economics (AREA)
- Economics (AREA)
- Marketing (AREA)
- Strategic Management (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present disclosure provides a user similarity calculation method, a calculation system, a computer device, and a storage medium. The method includes: acquiring a user-commodity scoring matrix; correcting the scores in the user-commodity scoring matrix based on a preset time weight to obtain a new user-commodity scoring matrix; calculating, for the new user-commodity scoring matrix, the score difference between any two users for each commonly scored commodity; classifying the score differences and calculating the frequency of each category of score difference; calculating an improved information entropy over all categories of score difference; and calculating the similarity between any two users in the new user-commodity scoring matrix according to the improved information entropy and a preset similarity calculation method. With this technical scheme, the scores reflect user preferences more faithfully; in addition, introducing the information entropy alleviates the data sparsity problem, so that the similarity calculation result better matches the actual situation and commodity recommendation is more accurate.
Description
Technical Field
The disclosure belongs to the technical field of electronic commerce, and particularly relates to a user similarity calculation method, a user similarity calculation system, a computer device and a computer readable storage medium.
Background
Collaborative filtering (CF) is a representative class of recommendation algorithms and is widely used on large e-commerce platforms. It mainly comprises user-based collaborative filtering (User-CF) and commodity-based collaborative filtering (Item-CF). As shown in Fig. 1, the key to the User-CF algorithm is to find users similar to a target user, aggregate the commodities those similar users prefer, and recommend them to the target user. The algorithm has three steps: 1. acquire user-commodity scoring information; 2. calculate user similarity from the user-commodity scoring information, sort users by similarity, and take the N most similar users as the neighbor user set; 3. predict the target user's scores for commodities the user has not yet scored from the scores given by the neighbor user set, and recommend the commodities with the highest predicted scores.
User similarity calculation is therefore the key to the User-CF algorithm. It is performed on a user-commodity scoring matrix, and the measures that can be used include cosine similarity, adjusted cosine similarity, the Pearson correlation coefficient, and Jaccard similarity.
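As an illustration of step 2 above, the following minimal sketch selects the N most similar neighbors of a target user. It assumes cosine similarity computed over commonly scored commodities; the function names and the sample data are hypothetical and are not taken from the disclosure.

```python
import math

def cosine_similarity(ratings_u, ratings_v):
    """Cosine similarity computed over the items both users have scored."""
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return 0.0
    dot = sum(ratings_u[i] * ratings_v[i] for i in common)
    norm_u = math.sqrt(sum(ratings_u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(ratings_v[i] ** 2 for i in common))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def top_n_neighbours(target, ratings, n):
    """Return the n users most similar to `target`, most similar first."""
    scored = [
        (cosine_similarity(ratings[target], ratings[other]), other)
        for other in ratings
        if other != target
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [user for _, user in scored[:n]]

# Hypothetical user -> {commodity: score} data.
ratings = {
    "alice": {"book": 5, "lamp": 3, "mug": 4},
    "bob": {"book": 5, "lamp": 3, "pen": 2},
    "carol": {"book": 1, "lamp": 5, "mug": 2},
    "dave": {"pen": 4},
}
neighbours = top_n_neighbours("alice", ratings, 2)
```

In this sample, "dave" shares no commodity with "alice" and receives similarity 0, which is the sparsity situation that makes such measures unreliable.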
Because existing user similarity calculation relies on the user-commodity scoring matrix, it needs dense data with sufficient user behavior information. When users have little historical behavior, or a new user has none at all, users share too few commonly scored commodities; that is, the user-commodity scoring matrix is sparse, the similarity calculated between users is inaccurate, and accurate recommendation is difficult. Moreover, existing collaborative filtering algorithms treat all commodities a user has accessed equally and do not adequately account for the greater contribution of recently accessed commodities to measuring the user's interest, so the reliability and precision of the recommendation system are low.
Disclosure of Invention
The present disclosure provides a method, a system, a computer device and a storage medium for calculating user similarity, so that a score can reflect user preferences more truly; and the problem of data sparseness is relieved, the similarity calculation result is more in line with the actual situation, and the commodity recommendation is more accurate.
In a first aspect, an embodiment of the present disclosure provides a method for calculating user similarity, including:
acquiring a user-commodity scoring matrix;
correcting the scores in the user-commodity scoring matrix based on a preset time weight to obtain a new user-commodity scoring matrix;
calculating, for the new user-commodity scoring matrix, the score difference between any two users for each commonly scored commodity;
classifying the score differences and calculating the frequency of each category of score difference;
calculating an improved information entropy over all categories of score difference according to the frequency of each category;
and calculating the similarity between any two users in the new user-commodity scoring matrix according to the improved information entropy and a preset similarity calculation method.
Further, the scores in the user-commodity scoring matrix are corrected based on the preset time weight using the following formulas:
In formulas (1) and (2), $t(u_i)$ and $t(v_i)$ denote the times at which user u and user v scored commodity i; $w_t(u_i)$ and $w_t(v_i)$ denote the preset time weights for user u and user v; $t(0)$ denotes the earliest time at which user u and user v scored a commodity; $\alpha$ denotes the time decay parameter, which reflects how quickly user interest changes; $T$ denotes the time window; $u_i$ and $v_i$ denote the scores given by user u and user v to commodity i; $u'_i$ and $v'_i$ denote the corrected scores of user u and user v for commodity i; and $i = 1, \ldots, n$.
Further, the score difference between any two users for each commonly scored commodity is calculated for the new user-commodity scoring matrix using the following formula:
$$\mathrm{dif}(u', v') = (u'_1 - v'_1, \ldots, u'_i - v'_i, \ldots, u'_n - v'_n) = (d_1, \ldots, d_i, \ldots, d_n) \qquad (3)$$
In formula (3), $\mathrm{dif}(u', v')$ denotes the score differences of user u and user v on their commonly scored commodities; $d_1, \ldots, d_i, \ldots, d_n$ denote the score differences of user u and user v on commonly scored commodity 1, ..., commodity i, ..., commodity n, respectively.
Further, the frequency of each category of score difference is calculated using the following formula:
$$\mathrm{fre}(\mathrm{dif}(u', v')) = (p_1, p_2, \ldots, p_j, \ldots, p_k) \qquad (4)$$
In formula (4), $\mathrm{fre}(\mathrm{dif}(u', v'))$ denotes the frequency of each category of score difference after the score differences of user u and user v on their commonly scored commodities are divided into k categories; $\mathrm{dif}(u', v')$ denotes the score differences of user u and user v on each commonly scored commodity; k denotes the number of categories into which the score differences are divided; and $p_j$ denotes the probability with which the j-th category of score difference occurs.
Further, the improved information entropy over all categories of score difference is calculated using the following formula:
In formula (5), $H'(\mathrm{fre}(\mathrm{dif}(u', v')))$ denotes the improved information entropy over all categories of score difference after the score differences of user u and user v on their commonly scored commodities are divided into k categories; the right-hand side of formula (5) is the improved information entropy calculation, in which $d(p_j)$ denotes the score difference whose distribution probability is $p_j$.
Further, the similarity between any two users in the new user-commodity scoring matrix is calculated according to the improved information entropy and a preset similarity calculation method using the following formula:
In formula (6), $\mathrm{sim}(u', v')$ denotes the similarity between user u and user v; $I_u$ and $I_v$ denote the sets of commodities scored by user u and user v, respectively; and $|I_u \cap I_v| / |I_u \cup I_v|$ is the Jaccard similarity calculation formula.
In a second aspect, an embodiment of the present disclosure provides a system for calculating user similarity, including:
an acquisition module configured to acquire a user-commodity scoring matrix;
a score correction module configured to correct the scores in the user-commodity scoring matrix based on a preset time weight to obtain a new user-commodity scoring matrix;
a first calculation module configured to calculate, for the new user-commodity scoring matrix, the score difference between any two users for each commonly scored commodity, and
to classify the score differences and calculate the frequency of each category of score difference;
a second calculation module configured to calculate an improved information entropy over all categories of score difference according to the frequency of each category, and
to calculate the similarity between any two users in the new user-commodity scoring matrix according to the improved information entropy and a preset similarity calculation method.
Further, the score correction module is specifically configured to:
correct the scores in the user-commodity scoring matrix using formula (1) and formula (2):
In formulas (1) and (2), $t(u_i)$ and $t(v_i)$ denote the times at which user u and user v scored commodity i; $w_t(u_i)$ and $w_t(v_i)$ denote the preset time weights for user u and user v; $t(0)$ denotes the earliest time at which user u and user v scored a commodity; $\alpha$ denotes the time decay parameter, which reflects how quickly user interest changes; $T$ denotes the time window; $u_i$ and $v_i$ denote the scores given by user u and user v to commodity i; $u'_i$ and $v'_i$ denote the corrected scores of user u and user v for commodity i; and $i = 1, \ldots, n$.
In a third aspect, an embodiment of the present disclosure further provides a computer device, including a memory and a processor, where the memory stores a computer program, and when the processor runs the computer program stored in the memory, the processor executes the method for calculating the user similarity according to any one of the first aspect.
In a fourth aspect, an embodiment of the present disclosure further provides a computer-readable storage medium storing a computer program which, when run on a computer, causes the computer to perform the method of calculating user similarity according to any one of the first aspect.
Advantageous effects:
The user similarity calculation method, calculation system, computer device and storage medium provided by the present disclosure acquire a user-commodity scoring matrix; correct the scores in the user-commodity scoring matrix based on a preset time weight to obtain a new user-commodity scoring matrix; calculate, for the new user-commodity scoring matrix, the score difference between any two users for each commonly scored commodity; classify the score differences and calculate the frequency of each category of score difference; calculate an improved information entropy over all categories of score difference according to the frequency of each category; and calculate the similarity between any two users in the new user-commodity scoring matrix according to the improved information entropy and a preset similarity calculation method. The technical scheme takes the influence of time on user interest into account and introduces a time weight to correct the user scores, so that the scores reflect user preferences more faithfully. It also introduces the idea of information entropy into the user similarity calculation, which alleviates the data sparsity problem, so that the similarity calculation result better matches the actual situation and commodity recommendation is more accurate.
Drawings
FIG. 1 is a schematic diagram of a user-based collaborative filtering recommendation algorithm in the prior art;
fig. 2 is a schematic flowchart of a method for calculating user similarity according to a first embodiment of the present disclosure;
fig. 3 is an architecture diagram of a computing system for user similarity according to a second embodiment of the present disclosure;
fig. 4 is an architecture diagram of a computer device according to a third embodiment of the present disclosure.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those skilled in the art, the present disclosure is further described in detail below with reference to the accompanying drawings and examples.
The terminology used in the embodiments of the disclosure is for the purpose of describing particular embodiments only and is not intended to limit the disclosure. As used in the disclosed embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
Because existing user similarity calculation relies on the user-commodity scoring matrix, it needs dense data with sufficient user behavior information. When users have little historical behavior, or a new user has none at all, users share too few commonly scored commodities; that is, the user-commodity scoring matrix is sparse, the similarity calculated between users is inaccurate, and accurate recommendation is difficult. In addition, traditional collaborative filtering algorithms treat all commodities a user has accessed equally and do not adequately account for the contribution of recently accessed commodities to measuring the user's interest, so recommendation reliability and precision are low.
The following describes the technical solutions of the present disclosure and how to solve the above problems in detail with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 2 is a schematic flowchart of a method for calculating user similarity in a collaborative filtering algorithm according to an embodiment of the present disclosure. As shown in Fig. 2, the method includes:
step S101: acquiring a user-commodity scoring matrix;
step S102: correcting the scores in the user-commodity scoring matrix based on a preset time weight to obtain a new user-commodity scoring matrix;
step S103: calculating, for the new user-commodity scoring matrix, the score difference between any two users for each commonly scored commodity;
step S104: classifying the score differences and calculating the frequency of each category of score difference;
step S105: calculating an improved information entropy over all categories of score difference according to the frequency of each category;
step S106: calculating the similarity between any two users in the new user-commodity scoring matrix according to the improved information entropy and a preset similarity calculation method.
User similarity calculation is the key to the User-CF algorithm. It is performed on a user-commodity scoring matrix, for example a user-commodity scoring matrix $R_{m \times n}$ of the following form:
$$R_{m \times n} = \begin{pmatrix} R_{11} & R_{12} & \cdots & R_{1n} \\ R_{21} & R_{22} & \cdots & R_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ R_{m1} & R_{m2} & \cdots & R_{mn} \end{pmatrix}$$
where m is the number of users, n is the number of commodities, and $R_{mn}$ denotes the score given by the m-th user to the n-th commodity. User similarity is calculated on the row vectors of the matrix. The measures that can be used include cosine similarity, adjusted cosine similarity, the Pearson correlation coefficient, and the like.
Because user interest changes over time, a time weight is introduced to correct the scores in the commodity scoring matrix and construct a new user-commodity scoring matrix that reflects the user's scoring more faithfully. After the correction, a user's recent scores carry more weight, so the scores reflect the user's current interests.
Next, based on the new commodity scoring matrix, the score differences on the commodities scored by both user u and user v are calculated. A frequency analysis then classifies the score differences and calculates the frequency of each category, and the information entropy is calculated from these frequencies to obtain the user similarity. Information entropy measures the uncertainty of a discrete random event from the probabilities of its outcomes; it reflects the degree of disorder of a system, and the lower the information entropy, the more ordered the system. User similarity is inversely related to the information entropy: the larger the information entropy, the greater the difference between two users and the less similar they are; the smaller the information entropy, the smaller the difference and the more similar they are. The information entropy is calculated as follows:
$$H(U) = -\sum_{i=1}^{n} p_i \log p_i \qquad (7)$$
In formula (7), n denotes the number of information types in the sample U, and $p_i$ denotes the probability that the information numbered i occurs in the sample U. In one implementation of the embodiment of the present disclosure, the information entropy may be improved to take into account not only the frequency of the score differences but also the score differences themselves, which also influence the calculation result; the score difference is therefore added to the information entropy calculation of formula (7).
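The standard (unimproved) information entropy described above can be sketched as follows. The base-2 logarithm and the function name are assumptions made for illustration; the disclosure does not state the base.

```python
import math

def shannon_entropy(probabilities):
    """H = -sum(p * log2(p)); zero-probability entries contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# A single certain outcome is perfectly ordered; a uniform spread is the most disordered.
ordered = shannon_entropy([1.0])
mixed = shannon_entropy([0.25, 0.5, 0.25])
uniform = shannon_entropy([0.25, 0.25, 0.25, 0.25])
```

The three values increase from `ordered` to `uniform`, matching the statement that lower entropy corresponds to a more ordered system.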
Further, the scores in the user-commodity scoring matrix are corrected based on the preset time weight using the following formulas:
In formulas (1) and (2), $t(u_i)$ and $t(v_i)$ denote the times at which user u and user v scored commodity i; $w_t(u_i)$ and $w_t(v_i)$ denote the preset time weights for user u and user v; $t(0)$ denotes the earliest time at which user u and user v scored a commodity; $\alpha$ denotes the time decay parameter, which reflects how quickly user interest changes; $T$ denotes the time window; $u_i$ and $v_i$ denote the scores given by user u and user v to commodity i; $u'_i$ and $v'_i$ denote the corrected scores of user u and user v for commodity i; and $i = 1, \ldots, n$.
The time weight $w_t$ reduces the proportion of a user's long-term interest and increases the proportion of short-term interest, so the corrected scores better reflect the user's current interest. The time decay parameter $\alpha$ in the time weight $w_t$ is the same for all users.
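The effect of a time weight can be illustrated with a short sketch. The exact form of formulas (1) and (2) is not reproduced in this text, so the exponential decay below is purely an assumed stand-in and not the disclosure's formula; `time_weight`, `corrected_score` and the sample values are hypothetical.

```python
import math

def time_weight(t, t0, window, alpha):
    """Hypothetical decay weight (an assumption, not formula (1)).

    Gives 1.0 to a score made at the end of the time window and
    exp(-alpha) to a score made at the earliest scoring time t0.
    """
    age = window - (t - t0)
    return math.exp(-alpha * age / window)

def corrected_score(score, t, t0, window, alpha):
    """Scale the raw score by the assumed time weight."""
    return time_weight(t, t0, window, alpha) * score

# The same raw score of 4, given at the start and at the end of a 100-unit window.
old = corrected_score(4, t=0, t0=0, window=100, alpha=0.5)
recent = corrected_score(4, t=100, t0=0, window=100, alpha=0.5)
```

Under this assumed weight the older score shrinks while the recent one is unchanged, which is the qualitative behavior described above; a larger `alpha` discounts old scores more aggressively.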
Further, the score difference between any two users for each commonly scored commodity is calculated for the new user-commodity scoring matrix using the following formula:
$$\mathrm{dif}(u', v') = (u'_1 - v'_1, \ldots, u'_i - v'_i, \ldots, u'_n - v'_n) = (d_1, \ldots, d_i, \ldots, d_n) \qquad (3)$$
In formula (3), $\mathrm{dif}(u', v')$ denotes the score differences of user u and user v on their commonly scored commodities; $d_1, \ldots, d_i, \ldots, d_n$ denote the score differences of user u and user v on commonly scored commodity 1, ..., commodity i, ..., commodity n, respectively.
Through the corrected commodity scoring matrix, the scoring difference of the two users on the commonly scored commodities under the current condition can be obtained, and the influence of scoring of the two users at different time on the similarity of the two users is eliminated.
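Formula (3) can be sketched as a difference taken only over the commodities both users have scored. The dictionary representation and the sample scores are assumptions for illustration.

```python
def score_differences(scores_u, scores_v):
    """Differences u'_i - v'_i over commonly scored items, in sorted item order."""
    common = sorted(set(scores_u) & set(scores_v))
    return [scores_u[i] - scores_v[i] for i in common]

# Hypothetical corrected scores; "mug" and "pen" are not commonly scored and are ignored.
u = {"book": 4.0, "lamp": 3.0, "mug": 5.0}
v = {"book": 3.0, "lamp": 1.0, "pen": 2.0}
diffs = score_differences(u, v)
```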
Further, the frequency of each category of score difference is calculated using the following formula:
$$\mathrm{fre}(\mathrm{dif}(u', v')) = (p_1, p_2, \ldots, p_j, \ldots, p_k) \qquad (4)$$
In formula (4), $\mathrm{fre}(\mathrm{dif}(u', v'))$ denotes the frequency of each category of score difference after the score differences of user u and user v on their commonly scored commodities are divided into k categories; $\mathrm{dif}(u', v')$ denotes the score differences of user u and user v on each commonly scored commodity; k denotes the number of categories into which the score differences are divided; and $p_j$ denotes the probability with which the j-th category of score difference occurs.
A frequency analysis of the score differences yields their distribution. For example, if the score differences of user u and user v on their commonly scored commodities are (1, 2, 2, 3), the three categories with score differences 1, 2 and 3 have frequencies (1/4, 1/2, 1/4).
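The example above can be reproduced with a few lines of code. Treating each distinct difference value as its own category is an assumption; the disclosure does not specify the classification rule, and real-valued corrected scores would likely need binning first.

```python
from collections import Counter
from fractions import Fraction

def difference_frequencies(diffs):
    """Map each distinct score difference to its relative frequency."""
    counts = Counter(diffs)
    total = len(diffs)
    return {value: Fraction(count, total) for value, count in sorted(counts.items())}

freqs = difference_frequencies([1, 2, 2, 3])
```

`Fraction` is used here only so the result matches the exact fractions in the text rather than floating-point approximations.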
Further, the improved information entropy over all categories of score difference is calculated using the following formula:
In formula (5), $H'(\mathrm{fre}(\mathrm{dif}(u', v')))$ denotes the improved information entropy over all categories of score difference after the score differences of user u and user v on their commonly scored commodities are divided into k categories; the right-hand side of formula (5) is the improved information entropy calculation, in which $d(p_j)$ denotes the score difference whose distribution probability is $p_j$.
Besides the frequency of the score differences, the score differences themselves also influence the result. For example, if dif(u', v') = (1, 2, 3) and dif(u', w') = (3, 4, 5), the two information entropy values are identical, yet user u is actually more similar to user v than to user w. The information entropy formula is therefore improved by incorporating the score difference.
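The limitation in this example can be checked numerically. The sketch below uses the standard entropy with a base-2 logarithm (an assumption) and does not implement the improved formula (5), whose exact form is not reproduced in this text; the mean absolute difference is included only to show what plain entropy ignores.

```python
import math
from collections import Counter

def entropy_of_differences(diffs):
    """Standard entropy of the frequency distribution of the score differences."""
    total = len(diffs)
    return -sum((c / total) * math.log2(c / total) for c in Counter(diffs).values())

dif_uv = [1, 2, 3]
dif_uw = [3, 4, 5]

h_uv = entropy_of_differences(dif_uv)
h_uw = entropy_of_differences(dif_uw)

# Magnitude of disagreement, which frequency-only entropy cannot see.
mean_uv = sum(abs(d) for d in dif_uv) / len(dif_uv)
mean_uw = sum(abs(d) for d in dif_uw) / len(dif_uw)
```

Both entropies are equal because each list has three equally frequent categories, while the mean absolute difference is twice as large for the pair (u, w).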
Further, the similarity between any two users in the new user-commodity scoring matrix is calculated according to the improved information entropy and a preset similarity calculation method using the following formula:
In formula (6), $\mathrm{sim}(u', v')$ denotes the similarity between user u and user v; $I_u$ and $I_v$ denote the sets of commodities scored by user u and user v, respectively; and $|I_u \cap I_v| / |I_u \cup I_v|$ is the Jaccard similarity calculation formula.
The Jaccard similarity ignores the score values a user gives and considers only whether the user has scored a commodity; it is the ratio of the number of commodities scored by both users to the number of commodities scored by either user, where $I_u$ and $I_v$ denote the sets of commodities scored by user u and user v, respectively. Its value lies in [0, 1]: a value of 0 means the two users have no preference in common, and a value of 1 means the two users' preferences coincide.
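The Jaccard component can be sketched directly on Python sets. How it is combined with the improved information entropy in formula (6) is not shown here; the function name and sample sets are illustrative assumptions.

```python
def jaccard(items_u, items_v):
    """|I_u ∩ I_v| / |I_u ∪ I_v|, defined as 0.0 when both sets are empty."""
    union = items_u | items_v
    if not union:
        return 0.0
    return len(items_u & items_v) / len(union)

partial = jaccard({"book", "lamp", "mug"}, {"book", "lamp", "pen"})
disjoint = jaccard({"book"}, {"pen"})
identical = jaccard({"book", "lamp"}, {"book", "lamp"})
```

Returning 0.0 for two empty sets is a design choice made here to avoid division by zero for users with no scores at all.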
The embodiment of the disclosure takes the change of user interest over time into account and introduces a time weight to correct the user scores, so that the user's current interests and preferences are reflected more faithfully. It also introduces the idea of information entropy, improves it, and combines it with the Jaccard similarity to calculate user similarity, which alleviates the data sparsity problem, so that the similarity calculation result better matches the actual situation and the recommendation result is more accurate.
Fig. 3 is an architecture diagram of a computing system for user similarity according to a second embodiment of the present disclosure, as shown in fig. 3, including:
an acquisition module 1 configured to acquire a user-commodity scoring matrix;
a score correction module 2 configured to correct the scores in the user-commodity scoring matrix based on a preset time weight to obtain a new user-commodity scoring matrix;
a first calculation module 3 configured to calculate, for the new user-commodity scoring matrix, the score difference between any two users for each commonly scored commodity, and
to classify the score differences and calculate the frequency of each category of score difference;
a second calculation module 4 configured to calculate an improved information entropy over all categories of score difference according to the frequency of each category, and
to calculate the similarity between any two users in the new user-commodity scoring matrix according to the improved information entropy and a preset similarity calculation method.
Further, the score correction module 2 is specifically configured to:
correct the scores in the user-commodity scoring matrix using formula (1) and formula (2):
In formulas (1) and (2), $t(u_i)$ and $t(v_i)$ denote the times at which user u and user v scored commodity i; $w_t(u_i)$ and $w_t(v_i)$ denote the preset time weights for user u and user v; $t(0)$ denotes the earliest time at which user u and user v scored a commodity; $\alpha$ denotes the time decay parameter, which reflects how quickly user interest changes; $T$ denotes the time window; $u_i$ and $v_i$ denote the scores given by user u and user v to commodity i; $u'_i$ and $v'_i$ denote the corrected scores of user u and user v for commodity i; and $i = 1, \ldots, n$.
Further, the first calculation module 3 is specifically configured to:
calculate the score difference between any two users for each commonly scored commodity using formula (3):
$$\mathrm{dif}(u', v') = (u'_1 - v'_1, \ldots, u'_i - v'_i, \ldots, u'_n - v'_n) = (d_1, \ldots, d_i, \ldots, d_n) \qquad (3)$$
In formula (3), $\mathrm{dif}(u', v')$ denotes the score differences of user u and user v on their commonly scored commodities; $d_1, \ldots, d_i, \ldots, d_n$ denote the score differences of user u and user v on commonly scored commodity 1, ..., commodity i, ..., commodity n, respectively.
Further, the first calculation module 3 is further configured to:
calculate the frequency of each category of score difference using the following formula:
$$\mathrm{fre}(\mathrm{dif}(u', v')) = (p_1, p_2, \ldots, p_j, \ldots, p_k) \qquad (4)$$
In formula (4), $\mathrm{fre}(\mathrm{dif}(u', v'))$ denotes the frequency of each category of score difference after the score differences of user u and user v on their commonly scored commodities are divided into k categories; $\mathrm{dif}(u', v')$ denotes the score differences of user u and user v on each commonly scored commodity; k denotes the number of categories into which the score differences are divided; and $p_j$ denotes the probability with which the j-th category of score difference occurs.
Further, the second calculation module 4 is specifically configured to:
calculate the improved information entropy over all categories of score difference using the following formula:
In formula (5), $H'(\mathrm{fre}(\mathrm{dif}(u', v')))$ denotes the improved information entropy over all categories of score difference after the score differences of user u and user v on their commonly scored commodities are divided into k categories; the right-hand side of formula (5) is the improved information entropy calculation, in which $d(p_j)$ denotes the score difference whose distribution probability is $p_j$.
Further, the second calculation module 4 is further configured to:
calculate the similarity between any two users in the new user-commodity scoring matrix using the following formula:
In formula (6), $\mathrm{sim}(u', v')$ denotes the similarity between user u and user v; $I_u$ and $I_v$ denote the sets of commodities scored by user u and user v, respectively; and $|I_u \cap I_v| / |I_u \cup I_v|$ is the Jaccard similarity calculation formula.
The user similarity calculation system in this embodiment of the present disclosure implements the user similarity calculation method of the first (method) embodiment, so it is described only briefly here; for details, refer to the related description in the first embodiment.
Furthermore, as shown in fig. 4, a computer device according to a third embodiment of the present disclosure further includes a memory 10 and a processor 20, where the memory 10 stores a computer program, and when the processor 20 runs the computer program stored in the memory 10, the processor 20 executes the above-mentioned methods for calculating the user similarity.
In addition, the embodiments of the present disclosure also provide a computer-readable storage medium in which computer-executable instructions are stored; when at least one processor of user equipment executes the computer-executable instructions, the user equipment performs the various possible methods described above.
Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. The storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC (Application Specific Integrated Circuit), and the ASIC may reside in user equipment. Alternatively, the processor and the storage medium may reside as discrete components in a communication device.
It is to be understood that the above embodiments are merely exemplary embodiments that are employed to illustrate the principles of the present disclosure, and that the present disclosure is not limited thereto. It will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the disclosure, and these are to be considered as the scope of the disclosure.
Claims (10)
1. A method for calculating user similarity, characterized by comprising the following steps:
acquiring a user-commodity scoring matrix;
modifying the scores in the user-commodity scoring matrix based on the preset time weight to obtain a new user-commodity scoring matrix;
for the new user-commodity scoring matrix, calculating the score difference of any two users for each commonly scored commodity;
classifying the score differences and calculating the frequency of the score differences in each category;
calculating the improved information entropy of the score differences of all categories according to the frequency of the score differences in each category;
and calculating the similarity between any two users in the new user-commodity scoring matrix according to the information entropy and a preset similarity calculation method.
2. The calculation method according to claim 1, wherein the score in the user-commodity score matrix is corrected based on the preset time weight, and the following formula is adopted:
in formulae (1) and (2), $t(u_i)$ and $t(v_i)$ respectively represent the times at which user u and user v scored commodity i; $w_{t(u_i)}$ and $w_{t(v_i)}$ are respectively the preset time weights of user u and user v; $t(0)$ represents the earliest time at which user u and user v scored a commodity; $\alpha$ represents a time attenuation parameter and reflects how quickly a user's interest changes; $t$ represents a time window; $u_i$ and $v_i$ respectively represent the scores of user u and user v for commodity i; $u'_i$ and $v'_i$ respectively represent the corrected scores of user u and user v for commodity i; and $i = 1, \dots, n$.
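Formulas (1) and (2) are not reproduced in this text, so the correction step can only be illustrated under assumptions. The following minimal sketch uses a hypothetical exponential recency weight built from the quantities defined above (the scoring time, the earliest scoring time, the time attenuation parameter, and the time window). The weight function itself, the names `time_weight` and `correct_scores`, and the dictionary layout are illustrative assumptions and are not the claimed formulas.

```python
import math


def time_weight(score_time, earliest_time, alpha, window):
    """Hypothetical recency weight in (0, 1]; NOT the patent's formula (1).

    A score given `window` time units after the earliest score gets weight 1,
    and a score given at the earliest time gets weight exp(-alpha).
    """
    # Clamp the elapsed time into [0, window] so the weight stays in (0, 1].
    elapsed = min(max(score_time - earliest_time, 0.0), window)
    return math.exp(-alpha * (window - elapsed) / window)


def correct_scores(scores, times, earliest_time, alpha, window):
    """Multiply each raw score by its (assumed) time weight.

    `scores` and `times` map a commodity id to the raw score and to the time
    at which that score was given. This stands in for formula (2).
    """
    return {
        item: time_weight(times[item], earliest_time, alpha, window) * score
        for item, score in scores.items()
    }
```

Under this assumed weight, a larger attenuation parameter discounts older scores more strongly, which matches the stated role of the parameter as a measure of how quickly user interest changes.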
3. The calculation method according to claim 2, wherein, for the new user-commodity scoring matrix, the score difference of any two users for each commonly scored commodity is calculated by using the following formula:
$$\mathrm{dif}(u', v') = (u'_1 - v'_1, \dots, u'_i - v'_i, \dots, u'_n - v'_n) = (d_1, \dots, d_i, \dots, d_n) \quad (3)$$
in formula (3), $\mathrm{dif}(u', v')$ represents the differences between the scores of user u and user v for the commonly scored commodities; and $d_1, \dots, d_i, \dots, d_n$ respectively represent the differences between the scores of user u and user v for commonly scored commodity 1, …, commodity i, …, and commodity n.
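Formula (3) can be sketched as follows. The sketch assumes each user's corrected scores are held in a dictionary keyed by commodity id, which is a hypothetical data layout; the function name `score_differences` is likewise illustrative.

```python
def score_differences(u_scores, v_scores):
    """Formula (3): differences u'_i - v'_i over commodities both users scored.

    `u_scores` and `v_scores` map a commodity id to a corrected score.
    Commodities scored by only one of the two users are ignored.
    """
    # Sort the common commodity ids so the output order is deterministic.
    common = sorted(set(u_scores) & set(v_scores))
    return [u_scores[item] - v_scores[item] for item in common]
```

Restricting the computation to commonly scored commodities means the length n of the difference vector can differ from one pair of users to the next.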
4. The calculation method according to claim 2, wherein the frequency of the score differences in each category is calculated by using the following formula:
$$\mathrm{fre}(\mathrm{dif}(u', v')) = (p_1, p_2, \dots, p_j, \dots, p_k) \quad (4)$$
in formula (4), $\mathrm{fre}(\mathrm{dif}(u', v'))$ represents the frequencies of the score differences in each category after the score differences of user u and user v for the commonly scored commodities are divided into k categories; $\mathrm{dif}(u', v')$ represents the differences between the scores of user u and user v for the commonly scored commodities; k represents the number of categories into which the score differences for the commonly scored commodities are divided; and $p_j$ represents the probability with which a score difference of the j-th category occurs.
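A minimal sketch of formula (4) is given below. How the score differences are divided into k categories is not specified in this claim, so the sketch assumes that differences are rounded to a fixed number of decimal places and that each distinct rounded value forms one category; the rounding rule and the names `difference_frequencies` and `ndigits` are assumptions for illustration only.

```python
from collections import Counter


def difference_frequencies(diffs, ndigits=0):
    """Formula (4): relative frequency p_j of each score-difference category.

    Assumption: a category is a distinct value of round(d, ndigits).
    Returns a dict mapping each category to its relative frequency.
    """
    if not diffs:
        return {}
    categories = [round(d, ndigits) for d in diffs]
    counts = Counter(categories)
    total = len(categories)
    # The returned frequencies sum to 1 and have k = len(counts) entries.
    return {category: counts[category] / total for category in sorted(counts)}
```

With this rule, the number of categories k is simply the number of distinct rounded differences observed for the pair of users.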
5. The calculation method according to claim 4, wherein the improved information entropy of all the category score differences is calculated by using the following formula:
in formula (5), $H'(\mathrm{fre}(\mathrm{dif}(u', v')))$ represents the improved information entropy of the score differences of all categories after the score differences of user u and user v for the commonly scored commodities are divided into k categories; the expression in formula (5) is the improved information entropy calculation formula, wherein $d(p_j)$ represents the score difference whose distribution probability is $p_j$.
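Formula (5) is not reproduced in this text, so the improved information entropy itself cannot be shown. As a point of reference only, the sketch below computes the standard Shannon entropy of the category frequencies; it does not use $d(p_j)$ and is therefore not the claimed improved form. The function name `shannon_entropy` is illustrative.

```python
import math


def shannon_entropy(probabilities):
    """Standard Shannon entropy in bits; a baseline, NOT formula (5).

    Terms with zero probability contribute nothing and are skipped.
    """
    return -sum(p * math.log2(p) for p in probabilities if p > 0)
```

The baseline already conveys the intuition behind the method: when all score differences of two users fall into a single category the entropy is zero, and the entropy grows as the differences spread over more categories.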
6. The calculation method according to claim 5, wherein the similarity between any two users in the new user-commodity rating matrix is calculated according to the information entropy and a preset similarity calculation method, and is obtained by adopting the following formula:
7. A system for calculating user similarity, comprising:
an acquisition module configured to acquire a user-commodity scoring matrix;
a score correction module configured to correct the scores in the user-commodity scoring matrix based on a preset time weight to obtain a new user-commodity scoring matrix;
a first calculation module configured to calculate, for the new user-commodity scoring matrix, the score difference of any two users for each commonly scored commodity, and
to classify the score differences and calculate the frequency of the score differences in each category; and
a second calculation module configured to calculate the improved information entropy of the score differences of all categories according to the frequency of the score differences in each category, and
to calculate the similarity between any two users in the new user-commodity scoring matrix according to the information entropy and a preset similarity calculation method.
8. The calculation system according to claim 7, wherein the score correction module is specifically configured to:
correct the scores in the user-commodity scoring matrix by using formula (1) and formula (2):
in formulae (1) and (2), $t(u_i)$ and $t(v_i)$ respectively represent the times at which user u and user v scored commodity i; $w_{t(u_i)}$ and $w_{t(v_i)}$ are respectively the preset time weights of user u and user v; $t(0)$ represents the earliest time at which user u and user v scored a commodity; $\alpha$ represents a time attenuation parameter and reflects how quickly a user's interest changes; $t$ represents a time window; $u_i$ and $v_i$ respectively represent the scores of user u and user v for commodity i; $u'_i$ and $v'_i$ respectively represent the corrected scores of user u and user v for commodity i; and $i = 1, \dots, n$.
9. A computer device characterized by comprising a memory in which a computer program is stored and a processor that executes the user similarity calculation method according to any one of claims 1 to 6 when the processor runs the computer program stored in the memory.
10. A computer-readable storage medium, comprising a computer program which, when run on a computer, causes the computer to carry out the method for calculating user similarity according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110570380.2A CN113269609A (en) | 2021-05-25 | 2021-05-25 | User similarity calculation method, calculation system, device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110570380.2A CN113269609A (en) | 2021-05-25 | 2021-05-25 | User similarity calculation method, calculation system, device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113269609A true CN113269609A (en) | 2021-08-17 |
Family
ID=77232725
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110570380.2A Withdrawn CN113269609A (en) | 2021-05-25 | 2021-05-25 | User similarity calculation method, calculation system, device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113269609A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103678431A (en) * | 2013-03-26 | 2014-03-26 | 南京邮电大学 | Recommendation method based on standard labels and item grades |
CN104935970A (en) * | 2015-07-09 | 2015-09-23 | 三星电子(中国)研发中心 | Method for recommending television content and television client |
CN107247753A (en) * | 2017-05-27 | 2017-10-13 | 深圳大学 | A kind of similar users choosing method and device |
CN109241203A (en) * | 2018-09-27 | 2019-01-18 | 天津理工大学 | A kind of user preference and distance weighted clustering method of time of fusion factor |
CN109408734A (en) * | 2018-09-28 | 2019-03-01 | 嘉兴学院 | A kind of collaborative filtering recommending method of fuse information Entropy conformability degree and dynamic trust |
Non-Patent Citations (1)
Title |
---|
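Liu Wenlong (刘文龙): "Collaborative filtering algorithm based on weighted information entropy similarity" (基于加权信息熵相似度的协同过滤算法), China Master's Theses Full-text Database (《中国优秀硕士学位论文全文数据库》) * |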
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109783734B (en) | Mixed collaborative filtering recommendation algorithm based on project attributes | |
US7206780B2 (en) | Relevance value for each category of a particular search result in the ranked list is estimated based on its rank and actual relevance values | |
CN108920503A (en) | A kind of micro- video personalized recommendation algorithm based on social networks degree of belief | |
US8738436B2 (en) | Click through rate prediction system and method | |
CN105787061A (en) | Information pushing method | |
US20100262454A1 (en) | System and method for sentiment-based text classification and relevancy ranking | |
US20150161529A1 (en) | Identifying Related Events for Event Ticket Network Systems | |
CN106021298B (en) | A kind of collaborative filtering recommending method and system based on asymmetric Weighted Similarity | |
CN109635206B (en) | Personalized recommendation method and system integrating implicit feedback and user social status | |
US9830643B2 (en) | Adaptive risk-based verification and authentication platform | |
CN104766219B (en) | Based on the user's recommendation list generation method and system in units of list | |
CN113129053B (en) | Information recommendation model training method, information recommendation method and storage medium | |
CN113191838A (en) | Shopping recommendation method and system based on heterogeneous graph neural network | |
CN111400585A (en) | Book recommendation method and device | |
CN115439139A (en) | User interest analysis method based on E-commerce big data | |
CN111563787A (en) | Recommendation system and method based on user comments and scores | |
Smith | Structural breaks in grouped heterogeneity | |
CN111382265B (en) | Searching method, device, equipment and medium | |
CN113269609A (en) | User similarity calculation method, calculation system, device and storage medium | |
He et al. | Understanding Users' Coupon Usage Behaviors in E-Commerce Environments | |
Liao et al. | Accumulative Time Based Ranking Method to Reputation Evaluation in Information Networks | |
CN114912031A (en) | Mixed recommendation method and system based on clustering and collaborative filtering | |
Priyati et al. | The comparison study of matrix factorization on collaborative filtering recommender system | |
CN106951462A (en) | A kind of film based on Time Trust similarities recommends method | |
CN110825967A (en) | Recommendation list re-ranking method for improving diversity of recommendation system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WW01 | Invention patent application withdrawn after publication | Application publication date: 20210817 |