CN113269609A - User similarity calculation method, calculation system, device and storage medium - Google Patents

User similarity calculation method, calculation system, device and storage medium

Info

Publication number
CN113269609A
Authority
CN
China
Prior art keywords
user
commodity
score
calculating
scoring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110570380.2A
Other languages
Chinese (zh)
Inventor
霍慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China United Network Communications Group Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China United Network Communications Group Co Ltd
Priority to CN202110570380.2A
Publication of CN113269609A
Legal status: Withdrawn

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/06: Buying, selling or leasing transactions
    • G06Q30/0601: Electronic shopping [e-shopping]
    • G06Q30/0631: Item recommendations

Landscapes

  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides a user similarity calculation method, a calculation system, a computer device, and a storage medium. The method includes: acquiring a user-commodity scoring matrix; correcting the scores in the user-commodity scoring matrix based on a preset time weight to obtain a new user-commodity scoring matrix; calculating, from the new user-commodity scoring matrix, the score difference of any two users for each commonly scored commodity; classifying the score differences and calculating the frequency of each category of score difference; calculating an improved information entropy over all categories of score differences; and calculating the similarity between any two users in the new user-commodity scoring matrix according to the improved information entropy and a preset similarity calculation method. The technical solution makes the scores reflect user preferences more faithfully; in addition, introducing the information entropy alleviates the data sparsity problem, so that the similarity results better match the actual situation and commodity recommendations are more accurate.

Description

User similarity calculation method, calculation system, device and storage medium
Technical Field
The disclosure belongs to the technical field of electronic commerce, and particularly relates to a user similarity calculation method, a user similarity calculation system, a computer device and a computer readable storage medium.
Background
The Collaborative Filtering (CF) algorithm is a representative recommendation-system algorithm and is widely used on major e-commerce platforms. Collaborative filtering mainly comprises the user-based collaborative filtering (User-CF) algorithm and the commodity-based collaborative filtering (Item-CF) algorithm. As shown in FIG. 1, the key to the User-CF algorithm is to find users similar to a target user, aggregate the commodities those similar users prefer, and recommend them to the target user. The algorithm comprises three steps: 1. acquiring user-commodity scoring information; 2. calculating user similarity from the user-commodity scoring information, sorting the users by similarity, and taking the top N most similar users as the neighbor user set; 3. predicting, from the scores given by the neighbor user set, the scores of commodities the target user has not yet scored, and recommending the commodities with the highest predicted scores to the user.
It can be seen that user similarity calculation is the key to the User-CF algorithm. The calculation is based on the user-commodity scoring matrix, and the measures that can be used include cosine similarity, modified cosine similarity, the Pearson correlation coefficient, Jaccard similarity, and the like.
Because existing user similarity calculation is based on the user-commodity scoring matrix, it requires sufficiently dense data and enough user behavior information. When users have little historical behavior, or a new user has none at all, users lack enough commonly scored commodities; that is, the user-commodity scoring matrix is sparse, the similarity calculated between users is inaccurate, and accurate recommendations are difficult to make. Moreover, existing collaborative filtering algorithms treat all commodities accessed by a user equally and do not fully consider the contribution of recently accessed commodities to measuring user interest, so the reliability and precision of the recommendation system are low.
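The three User-CF steps described above can be sketched as follows. This is a minimal illustration rather than part of the disclosure: the function names `cosine_on_common` and `recommend`, the dictionary layout of the scores, and the similarity-weighted mean used for prediction are all assumptions chosen for brevity.

```python
from math import sqrt

def cosine_on_common(a, b):
    """Cosine similarity restricted to the items both users scored."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(ratings, target, top_n=2, k=1):
    # Step 2: rank the other users by similarity and keep the top N neighbours.
    sims = {u: cosine_on_common(ratings[target], r)
            for u, r in ratings.items() if u != target}
    neighbours = sorted(sims, key=sims.get, reverse=True)[:top_n]
    # Step 3: predict scores of unseen items as a similarity-weighted mean
    # (an assumed prediction rule) and return the k best items.
    num, den = {}, {}
    for u in neighbours:
        for item, score in ratings[u].items():
            if item not in ratings[target]:
                num[item] = num.get(item, 0.0) + sims[u] * score
                den[item] = den.get(item, 0.0) + sims[u]
    preds = {i: num[i] / den[i] for i in num if den[i] > 0}
    return sorted(preds, key=preds.get, reverse=True)[:k]

# Step 1: hypothetical user-commodity scoring information.
ratings = {
    "u": {"a": 5, "b": 3},
    "v": {"a": 5, "b": 3, "c": 4},
    "w": {"a": 1, "b": 5, "d": 2},
}
print(recommend(ratings, "u", top_n=1, k=1))
```

With only one neighbour kept, user v (whose scores on the shared commodities match those of user u) is selected, so the only candidate for recommendation is commodity c.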
Disclosure of Invention
The present disclosure provides a user similarity calculation method, a calculation system, a computer device, and a storage medium, so that scores reflect user preferences more faithfully, the data sparsity problem is alleviated, the similarity results better match the actual situation, and commodity recommendations are more accurate.
In a first aspect, an embodiment of the present disclosure provides a method for calculating user similarity, including:
acquiring a user-commodity scoring matrix;
correcting the scores in the user-commodity scoring matrix based on a preset time weight to obtain a new user-commodity scoring matrix;
calculating, from the new user-commodity scoring matrix, the score difference of any two users for each commonly scored commodity;
classifying the score differences and calculating the frequency of each category of score difference;
calculating the improved information entropy over all categories of score differences according to the frequency of each category;
and calculating the similarity between any two users in the new user-commodity scoring matrix according to the improved information entropy and a preset similarity calculation method.
Further, the scores in the user-commodity scoring matrix are corrected based on the preset time weight using the following formulas:
[Formula (1): equation image not available in this text]
[Formula (2): equation image not available in this text]
In formulas (1) and (2), $t(u_i)$ and $t(v_i)$ denote the times at which user u and user v scored commodity i; $w_t(u_i)$ and $w_t(v_i)$ denote the preset time weights for user u and user v; $t(0)$ denotes the earliest time at which user u and user v scored a commodity; $\alpha$ denotes the time decay parameter, which reflects how fast user interest changes; $T$ denotes the time window; $u_i$ and $v_i$ denote the scores of user u and user v for commodity i; $u'_i$ and $v'_i$ denote the corrected scores of user u and user v for commodity i; and $i = 1, \dots, n$.
Further, the score difference of any two users for each commonly scored commodity is calculated from the new user-commodity scoring matrix using the following formula:
$$\mathrm{dif}(u', v') = (u'_1 - v'_1, \dots, u'_i - v'_i, \dots, u'_n - v'_n) = (d_1, \dots, d_i, \dots, d_n) \tag{3}$$
In formula (3), $\mathrm{dif}(u', v')$ denotes the score differences of user u and user v over their commonly scored commodities; $d_1, \dots, d_i, \dots, d_n$ denote the score differences of user u and user v for commonly scored commodity 1, ..., commodity i, ..., and commodity n, respectively.
Further, the frequency of each category of score difference is calculated using the following formula:
$$\mathrm{fre}(\mathrm{dif}(u', v')) = (p_1, p_2, \dots, p_j, \dots, p_k) \tag{4}$$
In formula (4), $\mathrm{fre}(\mathrm{dif}(u', v'))$ denotes the frequency of each category after the score differences of user u and user v over their commonly scored commodities are divided into k categories; $\mathrm{dif}(u', v')$ denotes the score differences of user u and user v for each commonly scored commodity; k denotes the number of categories into which the score differences are divided; and $p_j$ denotes the probability with which a score difference of the j-th category occurs.
Further, the improved information entropy over all categories of score differences is calculated using the following formula:
[Formula (5): equation image not available in this text]
In formula (5), $H'(\mathrm{fre}(\mathrm{dif}(u', v')))$ denotes the improved information entropy over all categories after the score differences of user u and user v over their commonly scored commodities are divided into k categories; [expression image not available in this text] is the improved information entropy calculation formula, where $d(p_j)$ denotes the score difference whose distribution probability is $p_j$.
Further, the similarity between any two users in the new user-commodity scoring matrix is calculated according to the improved information entropy and the preset similarity calculation method using the following formula:
[Formula (6): equation image not available in this text]
In formula (6), $\mathrm{sim}(u', v')$ denotes the similarity between user u and user v; $I_u$ and $I_v$ denote the sets of commodities scored by user u and user v, respectively; and
$$\frac{|I_u \cap I_v|}{|I_u \cup I_v|}$$
is the Jaccard similarity calculation formula.
In a second aspect, an embodiment of the present disclosure provides a system for calculating user similarity, including:
an acquisition module configured to acquire a user-commodity scoring matrix;
a score correction module configured to correct the scores in the user-commodity scoring matrix based on a preset time weight to obtain a new user-commodity scoring matrix;
a first calculation module configured to calculate, from the new user-commodity scoring matrix, the score difference of any two users for each commonly scored commodity, and
to classify the score differences and calculate the frequency of each category of score difference; and
a second calculation module configured to calculate the improved information entropy over all categories of score differences according to the frequency of each category, and
to calculate the similarity between any two users in the new user-commodity scoring matrix according to the improved information entropy and a preset similarity calculation method.
Further, the score correction module is specifically configured to:
correct the scores in the user-commodity scoring matrix using formula (1) and formula (2):
[Formula (1): equation image not available in this text]
[Formula (2): equation image not available in this text]
In formulas (1) and (2), $t(u_i)$ and $t(v_i)$ denote the times at which user u and user v scored commodity i; $w_t(u_i)$ and $w_t(v_i)$ denote the preset time weights for user u and user v; $t(0)$ denotes the earliest time at which user u and user v scored a commodity; $\alpha$ denotes the time decay parameter, which reflects how fast user interest changes; $T$ denotes the time window; $u_i$ and $v_i$ denote the scores of user u and user v for commodity i; $u'_i$ and $v'_i$ denote the corrected scores of user u and user v for commodity i; and $i = 1, \dots, n$.
In a third aspect, an embodiment of the present disclosure further provides a computer device including a memory and a processor, where the memory stores a computer program and, when the processor runs the computer program stored in the memory, the processor executes the user similarity calculation method according to any implementation of the first aspect.
In a fourth aspect, an embodiment of the present disclosure further provides a computer-readable storage medium storing a computer program which, when run on a computer, causes the computer to perform the user similarity calculation method according to any implementation of the first aspect.
Beneficial effects:
The user similarity calculation method, calculation system, computer device, and storage medium provided by the present disclosure acquire a user-commodity scoring matrix; correct the scores in the user-commodity scoring matrix based on a preset time weight to obtain a new user-commodity scoring matrix; calculate, from the new user-commodity scoring matrix, the score difference of any two users for each commonly scored commodity; classify the score differences and calculate the frequency of each category of score difference; calculate the improved information entropy over all categories of score differences according to the frequency of each category; and calculate the similarity between any two users in the new user-commodity scoring matrix according to the improved information entropy and a preset similarity calculation method. The technical solution takes the influence of time on user interest into account and introduces a time weight to correct user scores, so that the scores reflect user preferences more faithfully. In addition, information entropy is introduced into the user similarity calculation, which alleviates the data sparsity problem, so that the similarity results better match the actual situation and commodity recommendations are more accurate.
Drawings
FIG. 1 is a schematic diagram of a prior-art user-based collaborative filtering recommendation algorithm;
FIG. 2 is a schematic flowchart of a user similarity calculation method according to a first embodiment of the present disclosure;
FIG. 3 is an architecture diagram of a user similarity calculation system according to a second embodiment of the present disclosure;
FIG. 4 is an architecture diagram of a computer device according to a third embodiment of the present disclosure.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those skilled in the art, the present disclosure is further described in detail below with reference to the accompanying drawings and examples.
The terminology used in the embodiments of the disclosure is for the purpose of describing particular embodiments only and is not intended to limit the disclosure. As used in the disclosed embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
Because existing user similarity calculation is based on the user-commodity scoring matrix, it requires sufficiently dense data and enough user behavior information. When users have little historical behavior, or a new user has none at all, users lack enough commonly scored commodities; that is, the user-commodity scoring matrix is sparse, the similarity calculated between users is inaccurate, and accurate recommendations are difficult to make. In addition, traditional collaborative filtering algorithms treat all commodities accessed by a user equally and do not fully consider the contribution of recently accessed commodities to measuring user interest, so recommendation reliability and precision are low.
The following describes the technical solutions of the present disclosure and how to solve the above problems in detail with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
FIG. 2 is a schematic flowchart of a method for calculating user similarity in a collaborative filtering algorithm according to an embodiment of the present disclosure. As shown in FIG. 2, the method includes:
step S101: acquiring a user-commodity scoring matrix;
step S102: correcting the scores in the user-commodity scoring matrix based on a preset time weight to obtain a new user-commodity scoring matrix;
step S103: calculating, from the new user-commodity scoring matrix, the score difference of any two users for each commonly scored commodity;
step S104: classifying the score differences and calculating the frequency of each category of score difference;
step S105: calculating the improved information entropy over all categories of score differences according to the frequency of each category;
step S106: calculating the similarity between any two users in the new user-commodity scoring matrix according to the improved information entropy and a preset similarity calculation method.
User similarity calculation is the key to the User-CF algorithm. It is performed on a user-commodity scoring matrix, for example a user-commodity scoring matrix $R_{mn}$ of the following form:
$$R_{mn} = \begin{pmatrix} R_{11} & R_{12} & \cdots & R_{1n} \\ R_{21} & R_{22} & \cdots & R_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ R_{m1} & R_{m2} & \cdots & R_{mn} \end{pmatrix}$$
where m is the number of users, n is the number of commodities, and the entry $R_{mn}$ denotes the score given by the m-th user to the n-th commodity. User similarity is calculated on the row vectors. The measures that can be used include cosine similarity, modified cosine similarity, the Pearson correlation coefficient, and the like.
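As an illustration of computing similarity on row vectors, the sketch below applies the Pearson correlation coefficient to two rows of a small scoring matrix. The matrix values, the use of `None` for "not scored", and the choice to restrict the calculation to commonly scored commodities are assumptions made for this example.

```python
from math import sqrt

# Hypothetical 3-user x 4-commodity scoring matrix; None marks "not scored".
R = [
    [5, 3, None, 1],
    [4, 2, 4, 1],
    [1, 5, None, 5],
]

def pearson(row_u, row_v):
    """Pearson correlation over the commodities scored by both users."""
    pairs = [(a, b) for a, b in zip(row_u, row_v)
             if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mean_u = sum(a for a, _ in pairs) / n
    mean_v = sum(b for _, b in pairs) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in pairs)
    dev_u = sqrt(sum((a - mean_u) ** 2 for a, _ in pairs))
    dev_v = sqrt(sum((b - mean_v) ** 2 for _, b in pairs))
    return cov / (dev_u * dev_v) if dev_u and dev_v else 0.0

print(pearson(R[0], R[1]))  # users 1 and 2 score in the same direction
print(pearson(R[0], R[2]))  # users 1 and 3 score in opposite directions
```

The first pair yields a strongly positive coefficient and the second a negative one, which matches the intuition that the first two rows rank the shared commodities in the same order.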
Considering that user interest changes over time, a time weight is introduced to correct the scores in the user-commodity scoring matrix and construct a new user-commodity scoring matrix, so that the scores reflect the user's situation more faithfully. After the correction, a user's recent scores are weighted more heavily, so the scores better reflect the user's current interests.
Then, based on the new scoring matrix, the score differences of the commodities scored by both user u and user v are calculated, and a frequency analysis is performed: the score differences are classified and the frequency of each category is calculated. The information entropy is then calculated and used to compute user similarity. Information entropy can be understood in terms of the occurrence probability of specific information (the occurrence probability of discrete random events); it reflects the degree of disorder of a system, and the lower the information entropy, the more ordered the system. User similarity is inversely related to the information entropy: the larger the information entropy, the greater the difference between two users and the less similar they are; the smaller the information entropy, the smaller the difference and the more similar the two users are. The information entropy is calculated as follows:
$$H(U) = -\sum_{i=1}^{n} p_i \log p_i \tag{7}$$
In formula (7), n denotes the number of information types in the sample U, and $p_i$ denotes the probability that the information numbered i occurs in the sample U. In one implementation of the embodiments of the present disclosure, the information entropy is improved: in addition to the frequency of the score differences, the score differences themselves affect the calculation result, so the score difference itself is added to the information entropy calculation of formula (7).
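The ordinary (unimproved) information entropy of formula (7) can be sketched as below. The function name `shannon_entropy` and the choice of a base-2 logarithm are assumptions; the text above does not state which logarithm base is used.

```python
from math import log2

def shannon_entropy(probs):
    """Entropy of a discrete distribution; zero-probability terms contribute 0."""
    return -sum(p * log2(p) for p in probs if p > 0)

# A distribution over three categories of score difference.
print(shannon_entropy([0.25, 0.5, 0.25]))
# A single category (no disorder) gives zero entropy.
print(shannon_entropy([1.0]))
```

A distribution concentrated on one category gives the lowest entropy, which is consistent with the statement that lower entropy corresponds to a more ordered system.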
Further, the scores in the user-commodity scoring matrix are corrected based on the preset time weight using the following formulas:
[Formula (1): equation image not available in this text]
[Formula (2): equation image not available in this text]
In formulas (1) and (2), $t(u_i)$ and $t(v_i)$ denote the times at which user u and user v scored commodity i; $w_t(u_i)$ and $w_t(v_i)$ denote the preset time weights for user u and user v; $t(0)$ denotes the earliest time at which user u and user v scored a commodity; $\alpha$ denotes the time decay parameter, which reflects how fast user interest changes; $T$ denotes the time window; $u_i$ and $v_i$ denote the scores of user u and user v for commodity i; $u'_i$ and $v'_i$ denote the corrected scores of user u and user v for commodity i; and $i = 1, \dots, n$.
The time weight $w_t$ reduces the proportion of the user's long-term interest and increases the proportion of short-term interest, so that the user's current interest is better reflected. The time decay parameter in the time weight $w_t$ is the same for all users.
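The effect of a time weight on scores can be illustrated with the sketch below. Because formulas (1) and (2) are not reproduced in this text, the logistic weight used here is purely an assumed stand-in, not the formula of the disclosure; the names `time_weight` and `correct_scores` and the default values of `alpha` and `window` are likewise hypothetical. The sketch only shows the qualitative behaviour described above: scores given more recently keep more of their value.

```python
from math import exp

def time_weight(t, t0, alpha, window):
    """ASSUMED logistic recency weight in [0.5, 1); not the patent's formula."""
    return 1.0 / (1.0 + exp(-alpha * (t - t0) / window))

def correct_scores(scores, times, t0, alpha=1.0, window=30.0):
    """Multiply each score by the weight of the time at which it was given."""
    return {item: score * time_weight(times[item], t0, alpha, window)
            for item, score in scores.items()}

# Two equal scores given 60 time units apart; t0 is the earliest scoring time.
u_scores = {"a": 4.0, "b": 4.0}
u_times = {"a": 0.0, "b": 60.0}
corrected = correct_scores(u_scores, u_times, t0=0.0)
print(corrected)
```

In this sketch the same `alpha` is passed for every user, mirroring the statement that the time decay parameter is shared across users.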
Further, the score difference of any two users for each commonly scored commodity is calculated from the new user-commodity scoring matrix using the following formula:
$$\mathrm{dif}(u', v') = (u'_1 - v'_1, \dots, u'_i - v'_i, \dots, u'_n - v'_n) = (d_1, \dots, d_i, \dots, d_n) \tag{3}$$
In formula (3), $\mathrm{dif}(u', v')$ denotes the score differences of user u and user v over their commonly scored commodities; $d_1, \dots, d_i, \dots, d_n$ denote the score differences of user u and user v for commonly scored commodity 1, ..., commodity i, ..., and commodity n, respectively.
With the corrected scoring matrix, the score differences of the two users on their commonly scored commodities under current conditions can be obtained, which removes the effect on their similarity of the two users having scored at different times.
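Formula (3) can be sketched as a per-commodity subtraction over the commodities that both users have scored. The dictionary representation and the function name `score_differences` are assumptions for this example; the input scores stand for already corrected scores.

```python
def score_differences(u_scores, v_scores):
    """Signed differences u'_i - v'_i over the commonly scored commodities."""
    common = sorted(u_scores.keys() & v_scores.keys())
    return [u_scores[i] - v_scores[i] for i in common]

# Hypothetical corrected scores; only "a" and "b" are scored by both users.
u_prime = {"a": 5, "b": 3, "c": 4}
v_prime = {"a": 4, "b": 1, "d": 2}
print(score_differences(u_prime, v_prime))
```

Commodities scored by only one of the two users ("c" and "d" here) do not contribute a difference.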
Further, the frequency of each category of score difference is calculated using the following formula:
$$\mathrm{fre}(\mathrm{dif}(u', v')) = (p_1, p_2, \dots, p_j, \dots, p_k) \tag{4}$$
In formula (4), $\mathrm{fre}(\mathrm{dif}(u', v'))$ denotes the frequency of each category after the score differences of user u and user v over their commonly scored commodities are divided into k categories; $\mathrm{dif}(u', v')$ denotes the score differences of user u and user v for each commonly scored commodity; k denotes the number of categories into which the score differences are divided; and $p_j$ denotes the probability with which a score difference of the j-th category occurs.
Frequency analysis of the score differences yields their distribution. For example, if the score differences of user u and user v over their commonly scored commodities are (1, 2, 2, 3), the frequencies of the 3 categories with score differences 1, 2, and 3 are (1/4, 1/2, 1/4).
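The frequency analysis of formula (4) can be reproduced for the example (1, 2, 2, 3) with the sketch below. Treating each distinct difference value as its own category is an assumption (the text does not specify how categories are formed), and the function name `difference_frequencies` is hypothetical.

```python
from collections import Counter
from fractions import Fraction

def difference_frequencies(diffs):
    """Map each distinct score difference to its relative frequency."""
    counts = Counter(diffs)
    total = len(diffs)
    return {d: Fraction(c, total) for d, c in sorted(counts.items())}

freqs = difference_frequencies([1, 2, 2, 3])
print(freqs)
```

Exact fractions are used so that the result can be compared directly with the frequencies (1/4, 1/2, 1/4) given in the example.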
Further, the improved information entropy over all categories of score differences is calculated using the following formula:
[Formula (5): equation image not available in this text]
In formula (5), $H'(\mathrm{fre}(\mathrm{dif}(u', v')))$ denotes the improved information entropy over all categories after the score differences of user u and user v over their commonly scored commodities are divided into k categories; [expression image not available in this text] is the improved information entropy calculation formula, where $d(p_j)$ denotes the score difference whose distribution probability is $p_j$.
In addition to the frequency of the score differences, the score differences themselves affect the calculation result. For example, if $\mathrm{dif}(u', v') = (1, 2, 3)$ and $\mathrm{dif}(u', w') = (3, 4, 5)$, the information entropy results are identical, yet user u is actually more similar to user v than to user w. The information entropy calculation formula is therefore improved by adding the score difference.
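The observation that both difference vectors give the same ordinary information entropy can be checked by hand with formula (7), assuming each distinct difference value forms its own category. In both cases there are three categories, each with probability 1/3, so the entropy depends only on the frequencies and not on the size of the differences:

```latex
% dif(u', v') = (1, 2, 3): p_1 = p_2 = p_3 = 1/3
H\bigl(\tfrac13, \tfrac13, \tfrac13\bigr) = -3 \cdot \tfrac13 \log \tfrac13 = \log 3

% dif(u', w') = (3, 4, 5): p_1 = p_2 = p_3 = 1/3
H\bigl(\tfrac13, \tfrac13, \tfrac13\bigr) = -3 \cdot \tfrac13 \log \tfrac13 = \log 3
```

Since the two values coincide although the differences in the second vector are larger, the frequencies alone cannot separate the two pairs of users, which is the motivation given for including the score difference itself.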
Further, the similarity between any two users in the new user-commodity scoring matrix is calculated according to the improved information entropy and the preset similarity calculation method using the following formula:
[Formula (6): equation image not available in this text]
In formula (6), $\mathrm{sim}(u', v')$ denotes the similarity between user u and user v; $I_u$ and $I_v$ denote the sets of commodities scored by user u and user v, respectively; and
$$\frac{|I_u \cap I_v|}{|I_u \cup I_v|}$$
is the Jaccard similarity calculation formula.
Jaccard similarity ignores the scores users give to commodities and considers only whether a user has expressed a preference for a commodity; it is the ratio of the number of commodities scored by both users to the number of commodities scored by at least one of them. Its value lies in [0, 1]: a value of 0 means the two users have no common preferences, and a value of 1 means their preferences coincide. It is calculated as:
$$\frac{|I_u \cap I_v|}{|I_u \cup I_v|}$$
where $I_u$ and $I_v$ denote the sets of commodities scored by user u and user v, respectively.
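The Jaccard similarity described above can be sketched with Python sets. The function name `jaccard` and the convention of returning 0.0 when neither user has scored anything are assumptions for this example.

```python
def jaccard(items_u, items_v):
    """Size of the intersection divided by the size of the union."""
    union = items_u | items_v
    if not union:
        return 0.0  # assumed convention when both sets are empty
    return len(items_u & items_v) / len(union)

# Hypothetical sets of commodities scored by two users.
print(jaccard({"a", "b", "c"}, {"b", "c", "d"}))  # two shared out of four
print(jaccard({"a"}, {"b"}))                      # no shared commodities
print(jaccard({"a"}, {"a"}))                      # identical sets
```

Only membership in the sets matters here; the scores themselves play no part, which is why the disclosure combines this measure with the entropy of the score differences.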
The embodiment of the disclosure takes into account that user interest changes over time and introduces a time weight to correct user scores, so that the user's current interests and preferences are reflected more faithfully. In addition, information entropy is introduced: user similarity is calculated by combining the improved information entropy with the Jaccard similarity, which alleviates the data sparsity problem, so that the similarity results better match the actual situation and the recommendation results are more accurate.
FIG. 3 is an architecture diagram of a user similarity calculation system according to a second embodiment of the present disclosure. As shown in FIG. 3, the system includes:
an acquisition module 1 configured to acquire a user-commodity scoring matrix;
a score correction module 2 configured to correct the scores in the user-commodity scoring matrix based on a preset time weight to obtain a new user-commodity scoring matrix;
a first calculation module 3 configured to calculate, from the new user-commodity scoring matrix, the score difference of any two users for each commonly scored commodity, and
to classify the score differences and calculate the frequency of each category of score difference; and
a second calculation module 4 configured to calculate the improved information entropy over all categories of score differences according to the frequency of each category, and
to calculate the similarity between any two users in the new user-commodity scoring matrix according to the improved information entropy and a preset similarity calculation method.
Further, the score correction module 2 is specifically configured to:
correct the scores in the user-commodity scoring matrix using formula (1) and formula (2):
[Formula (1): equation image not available in this text]
[Formula (2): equation image not available in this text]
In formulas (1) and (2), $t(u_i)$ and $t(v_i)$ denote the times at which user u and user v scored commodity i; $w_t(u_i)$ and $w_t(v_i)$ denote the preset time weights for user u and user v; $t(0)$ denotes the earliest time at which user u and user v scored a commodity; $\alpha$ denotes the time decay parameter, which reflects how fast user interest changes; $T$ denotes the time window; $u_i$ and $v_i$ denote the scores of user u and user v for commodity i; $u'_i$ and $v'_i$ denote the corrected scores of user u and user v for commodity i; and $i = 1, \dots, n$.
Further, the first calculating module 3 is specifically configured to:
calculate the score difference of any two users for each commonly scored commodity using formula (3):
$$\mathrm{dif}(u', v') = (u'_1 - v'_1, \dots, u'_i - v'_i, \dots, u'_n - v'_n) = (d_1, \dots, d_i, \dots, d_n) \tag{3}$$
In formula (3), $\mathrm{dif}(u', v')$ denotes the score differences of user u and user v over their commonly scored commodities; $d_1, \dots, d_i, \dots, d_n$ denote the score differences of user u and user v for commonly scored commodity 1, ..., commodity i, ..., and commodity n, respectively.
Further, the first calculating module 3 is further configured to:
calculate the frequency of each category of score difference using the following formula:
$$\mathrm{fre}(\mathrm{dif}(u', v')) = (p_1, p_2, \dots, p_j, \dots, p_k) \tag{4}$$
In formula (4), $\mathrm{fre}(\mathrm{dif}(u', v'))$ denotes the frequency of each category after the score differences of user u and user v over their commonly scored commodities are divided into k categories; $\mathrm{dif}(u', v')$ denotes the score differences of user u and user v for each commonly scored commodity; k denotes the number of categories into which the score differences are divided; and $p_j$ denotes the probability with which a score difference of the j-th category occurs.
Further, the second calculating module 4 is specifically configured to:
calculate the improved information entropy over all categories of score differences using the following formula:
[Formula (5): equation image not available in this text]
In formula (5), $H'(\mathrm{fre}(\mathrm{dif}(u', v')))$ denotes the improved information entropy over all categories after the score differences of user u and user v over their commonly scored commodities are divided into k categories; [expression image not available in this text] is the improved information entropy calculation formula, where $d(p_j)$ denotes the score difference whose distribution probability is $p_j$.
Further, the second calculating module 4 is specifically further configured to:
calculate the similarity between any two users in the new user-commodity scoring matrix using the following formula:
[Formula (6): equation image not available in this text]
In formula (6), $\mathrm{sim}(u', v')$ denotes the similarity between user u and user v; $I_u$ and $I_v$ denote the sets of commodities scored by user u and user v, respectively; and
$$\frac{|I_u \cap I_v|}{|I_u \cup I_v|}$$
is the Jaccard similarity calculation formula.
The user similarity calculation system in this embodiment of the present disclosure implements the user similarity calculation method of the first method embodiment, so it is described only briefly; for details, refer to the related description in the first method embodiment, which is not repeated here.
Furthermore, as shown in Fig. 4, a third embodiment of the present disclosure provides a computer device that includes a memory 10 and a processor 20. The memory 10 stores a computer program, and when the processor 20 runs the computer program stored in the memory 10, the processor 20 executes the user similarity calculation method described above.
In addition, the embodiments of the present disclosure also provide a computer-readable storage medium in which computer-executable instructions are stored; when at least one processor of the user equipment executes the computer-executable instructions, the user equipment performs the user similarity calculation method in any of the possible implementations described above.
Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC (Application Specific Integrated Circuit). Additionally, the ASIC may reside in user equipment. Of course, the processor and the storage medium may also reside as discrete components in a communication device.
It is to be understood that the above embodiments are merely exemplary embodiments employed to illustrate the principles of the present disclosure, and that the present disclosure is not limited thereto. It will be apparent to those skilled in the art that various changes and modifications can be made without departing from the spirit and scope of the disclosure, and such changes and modifications are also considered to fall within the scope of the disclosure.

Claims (10)

1. A method for calculating user similarity, characterized by comprising the following steps:
acquiring a user-commodity scoring matrix;
correcting the scores in the user-commodity scoring matrix based on a preset time weight to obtain a new user-commodity scoring matrix;
calculating, for the new user-commodity scoring matrix, the score difference of any two users on each common scored commodity;
classifying the score differences and calculating the frequency of each category of score differences;
calculating the improved information entropy of the score differences of all categories according to the frequency of each category of score differences;
and calculating the similarity between any two users in the new user-commodity scoring matrix according to the information entropy and a preset similarity calculation method.
2. The calculation method according to claim 1, wherein the scores in the user-commodity scoring matrix are corrected based on the preset time weight using the following formulas:
Figure FDA0003082430690000011
Figure FDA0003082430690000012
In formulas (1) and (2), t(ui) and t(vi) respectively represent the times at which user u and user v scored commodity i; wt(ui) and wt(vi) are the preset time weight calculation formulas for user u and user v, respectively; t(0) represents the earliest time at which user u and user v scored a commodity; α represents a time decay parameter that reflects how quickly a user's interests change; T represents a time window; ui and vi respectively represent the scores of user u and user v on commodity i; u′i and v′i respectively represent the corrected scores of user u and user v on commodity i; and i = 1, …, n.
3. The calculation method according to claim 2, wherein, for the new user-commodity scoring matrix, the score difference of any two users on each common scored commodity is calculated using the following formula:
dif(u′,v′)=(u1′-v1′,…,ui′-vi′,…,un′-vn′)=(d1,…,di,…,dn) (3)
In formula (3), dif(u′, v′) represents the score differences of user u and user v on the common scored commodities; d1, …, di, …, dn respectively represent the score differences of user u and user v on common scored commodity 1, …, commodity i, …, and commodity n.
4. The method of claim 2, wherein the frequency of each category of score differences is calculated using the following formula:
fre(dif(u′,v′))=(p1,p2,…,pj,…,pk) (4)
In formula (4), fre(dif(u′, v′)) represents the frequency of each category of score differences after the score differences of user u and user v on their common scored commodities are divided into k categories; dif(u′, v′) represents the score differences of user u and user v on each common scored commodity; k represents the number of categories into which these score differences are divided; and pj represents the probability with which a score difference of the j-th category occurs.
5. The calculation method according to claim 4, wherein the improved information entropy of all the category score differences is calculated by using the following formula:
Figure FDA0003082430690000021
In formula (5), H′(fre(dif(u′, v′))) represents the improved information entropy of the score differences of all categories after the score differences of user u and user v on the common scored commodities are divided into k categories; the expression shown in Figure FDA0003082430690000022 is the improved information entropy calculation formula, where d(pj) represents the score difference whose distribution probability is pj.
6. The calculation method according to claim 5, wherein the similarity between any two users in the new user-commodity scoring matrix is calculated according to the information entropy and the preset similarity calculation method using the following formula:
Figure FDA0003082430690000023
In formula (6), sim(u′, v′) represents the similarity between user u and user v; Iu and Iv respectively represent the sets of commodities scored by user u and user v; and the expression shown in Figure FDA0003082430690000031 is the Jaccard similarity calculation formula.
7. A system for calculating user similarity, comprising:
an acquisition module configured to acquire a user-commodity scoring matrix;
a score correction module configured to correct the scores in the user-commodity scoring matrix based on a preset time weight to obtain a new user-commodity scoring matrix;
a first calculation module configured to calculate, for the new user-commodity scoring matrix, the score difference of any two users on each common scored commodity, and
to classify the score differences and calculate the frequency of each category of score differences;
a second calculation module configured to calculate the improved information entropy of the score differences of all categories according to the frequency of each category of score differences, and
to calculate the similarity between any two users in the new user-commodity scoring matrix according to the information entropy and a preset similarity calculation method.
8. The calculation system according to claim 7, wherein the score correction module is specifically configured to:
correct the scores in the user-commodity scoring matrix using formula (1) and formula (2):
Figure FDA0003082430690000032
Figure FDA0003082430690000033
In formulas (1) and (2), t(ui) and t(vi) respectively represent the times at which user u and user v scored commodity i; wt(ui) and wt(vi) are the preset time weight calculation formulas for user u and user v, respectively; t(0) represents the earliest time at which user u and user v scored a commodity; α represents a time decay parameter that reflects how quickly a user's interests change; T represents a time window; ui and vi respectively represent the scores of user u and user v on commodity i; u′i and v′i respectively represent the corrected scores of user u and user v on commodity i; and i = 1, …, n.
9. A computer device, characterized by comprising a memory and a processor, wherein the memory stores a computer program, and the processor executes the user similarity calculation method according to any one of claims 1 to 6 when running the computer program stored in the memory.
10. A computer-readable storage medium, comprising a computer program which, when run on a computer, causes the computer to carry out the user similarity calculation method according to any one of claims 1 to 6.
CN202110570380.2A 2021-05-25 2021-05-25 User similarity calculation method, calculation system, device and storage medium Withdrawn CN113269609A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110570380.2A CN113269609A (en) 2021-05-25 2021-05-25 User similarity calculation method, calculation system, device and storage medium

Publications (1)

Publication Number Publication Date
CN113269609A true CN113269609A (en) 2021-08-17

Family

ID=77232725

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110570380.2A Withdrawn CN113269609A (en) 2021-05-25 2021-05-25 User similarity calculation method, calculation system, device and storage medium

Country Status (1)

Country Link
CN (1) CN113269609A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103678431A (en) * 2013-03-26 2014-03-26 南京邮电大学 Recommendation method based on standard labels and item grades
CN104935970A (en) * 2015-07-09 2015-09-23 三星电子(中国)研发中心 Method for recommending television content and television client
CN107247753A (en) * 2017-05-27 2017-10-13 深圳大学 A kind of similar users choosing method and device
CN109241203A (en) * 2018-09-27 2019-01-18 天津理工大学 A kind of user preference and distance weighted clustering method of time of fusion factor
CN109408734A (en) * 2018-09-28 2019-03-01 嘉兴学院 A kind of collaborative filtering recommending method of fuse information Entropy conformability degree and dynamic trust

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘文龙: "基于加权信息熵相似度的协同过滤算法", 《中国优秀硕士学位论文全文数据库》 *

Similar Documents

Publication Publication Date Title
CN109783734B (en) Mixed collaborative filtering recommendation algorithm based on project attributes
US7206780B2 (en) Relevance value for each category of a particular search result in the ranked list is estimated based on its rank and actual relevance values
CN108920503A (en) A kind of micro- video personalized recommendation algorithm based on social networks degree of belief
US8738436B2 (en) Click through rate prediction system and method
CN105787061A (en) Information pushing method
US20100262454A1 (en) System and method for sentiment-based text classification and relevancy ranking
US20150161529A1 (en) Identifying Related Events for Event Ticket Network Systems
CN106021298B (en) A kind of collaborative filtering recommending method and system based on asymmetric Weighted Similarity
CN109635206B (en) Personalized recommendation method and system integrating implicit feedback and user social status
US9830643B2 (en) Adaptive risk-based verification and authentication platform
CN104766219B (en) Based on the user's recommendation list generation method and system in units of list
CN113129053B (en) Information recommendation model training method, information recommendation method and storage medium
CN113191838A (en) Shopping recommendation method and system based on heterogeneous graph neural network
CN111400585A (en) Book recommendation method and device
CN115439139A (en) User interest analysis method based on E-commerce big data
CN111563787A (en) Recommendation system and method based on user comments and scores
Smith Structural breaks in grouped heterogeneity
CN111382265B (en) Searching method, device, equipment and medium
CN113269609A (en) User similarity calculation method, calculation system, device and storage medium
He et al. Understanding Users' Coupon Usage Behaviors in E-Commerce Environments
Liao et al. Accumulative Time Based Ranking Method to Reputation Evaluation in Information Networks
CN114912031A (en) Mixed recommendation method and system based on clustering and collaborative filtering
Priyati et al. The comparison study of matrix factorization on collaborative filtering recommender system
CN106951462A (en) A kind of film based on Time Trust similarities recommends method
CN110825967A (en) Recommendation list re-ranking method for improving diversity of recommendation system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20210817)