CN111428145A - Recommendation method and system fusing tag data and naive Bayesian classification - Google Patents

Recommendation method and system fusing tag data and naive Bayesian classification

Info

Publication number
CN111428145A
Authority
CN
China
Prior art keywords
user
label
tag
users
recommendation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010194133.2A
Other languages
Chinese (zh)
Other versions
CN111428145B (en)
Inventor
何登平
何泽灵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Information Technology Designing Co ltd
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing Information Technology Designing Co ltd
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-03-19
Filing date
2020-03-19
Publication date
2020-07-17
Application filed by Chongqing Information Technology Designing Co ltd and Chongqing University of Post and Telecommunications
Priority to CN202010194133.2A
Publication of CN111428145A
Application granted
Publication of CN111428145B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155 Bayesian classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a recommendation method and system that fuse tag data with naive Bayesian classification. The user's tag data are used as the user's attribute features and, taking the user's perspective, an association between users and tags is established using statistics and probability theory, so that user preferences are expressed more accurately. This user-tag association is combined with a naive Bayes classification algorithm to classify existing users and to match new users to a category. In addition, tag extensibility and temporal context information are taken into account to further reduce the impact of data sparsity. Finally, the average ratings that users in the matched category gave to items are calculated to produce a Top-N recommendation.

Description

Recommendation method and system fusing tag data and naive Bayesian classification
Technical Field
The invention belongs to the field of personalized recommendation, and particularly relates to a recommendation method fusing tag data and naive Bayesian classification.
Background
Recommender systems play a central role in many online applications and products, such as e-commerce services, movies, music, and articles. Large companies such as Amazon, eBay, and Netflix use recommendation techniques in their systems to predict customers' latent preferences and recommend related items to them. Recommendation performance has a tremendous impact on these companies' commercial success in terms of revenue and user satisfaction. There are three mainstream types of recommendation methods: content-based, collaborative filtering, and hybrid. Collaborative filtering is currently the most popular approach to recommender system design. It uses large amounts of data collected from users' historical behavior to predict which items a user will like; it does not need to analyze item content and instead relies on the relationships between users and items.
However, collaborative filtering suffers from the cold start problem. Depending on which entity in the system is affected, cold start can be divided into the item cold start problem and the user cold start problem; depending on whether the number of rating records is zero, it can be divided into the complete and the incomplete cold start problem. Cold start is a special case of the data sparsity problem. Collaborative filtering needs a user to have rated many items, or an item to have received many ratings, before it can produce effective recommendations, so it does not work for new users and new items, which have few or no ratings in the system. In this case, predicting the relationship between items and users and producing effective recommendations is a very challenging problem.
Disclosure of Invention
The present invention aims to solve the problems of the prior art by providing a recommendation method and system that fuse tag data with naive Bayesian classification, thereby addressing the user cold start problem of traditional collaborative filtering algorithms.
The technical scheme of the invention is as follows:
a recommendation method fusing tag data and naive Bayesian classification comprises the following steps:
first, setting a threshold for tag data expansion; using the user's tag data as the user's attribute features and, from the user's perspective, establishing an association between users and tags by means of statistics and probability theory; combining this user-tag association with a naive Bayes classification algorithm to classify the old (existing) users, and then matching target users to a category;
and finally, taking tag extensibility and temporal context information into account, calculating the average ratings that users in the matched category gave to items, and producing a Top-N recommendation.
Further, setting the threshold for tag data expansion specifically includes: the threshold is set by calculating the similarity between tags. For a tag $t$, let $N(t)$ be the set of items labeled with tag $t$, and let $n_{t,i}$ be the number of users who labeled item $i$ with tag $t$. The similarity between tag $t$ and tag $t'$ is calculated with the cosine similarity formula:
$$\mathrm{sim}(t,t') = \frac{\sum_{i \in N(t) \cap N(t')} n_{t,i}\, n_{t',i}}{\sqrt{\sum_{i \in N(t)} n_{t,i}^{2}}\;\sqrt{\sum_{i \in N(t')} n_{t',i}^{2}}} \qquad (1)$$
The similarity between tag $t$ and its similar tags is calculated according to formula (1), and a threshold $y$ is set; if $\mathrm{sim}(t,t') > y$, tag $t'$ belongs to the expansion set of tag $t$.
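Formula (1) and the threshold test can be sketched in Python as follows. This is a minimal sketch, not the patent's implementation: it assumes the tagging data arrive as hypothetical `(user, item, tag)` triples, and the function names `tag_item_counts`, `tag_similarity`, and `expansion_set` are illustrative.

```python
from collections import defaultdict
from math import sqrt


def tag_item_counts(records):
    """Build counts[t][i] = n_{t,i}, the number of distinct users who applied tag t to item i.

    records: iterable of (user, item, tag) triples (assumed input format).
    """
    taggers = defaultdict(set)
    for user, item, tag in records:
        taggers[(tag, item)].add(user)
    counts = defaultdict(dict)
    for (tag, item), users in taggers.items():
        counts[tag][item] = len(users)
    return counts


def tag_similarity(counts, t, t2):
    """Cosine similarity of formula (1) between tags t and t2."""
    a, b = counts.get(t, {}), counts.get(t2, {})
    numerator = sum(a[i] * b[i] for i in a.keys() & b.keys())  # items in N(t) ∩ N(t2)
    denominator = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return numerator / denominator if denominator else 0.0


def expansion_set(counts, t, y):
    """Tags t2 != t with sim(t, t2) > y, i.e. the expansion set of tag t."""
    return {t2 for t2 in counts if t2 != t and tag_similarity(counts, t, t2) > y}
```

Because `tag_similarity` is symmetric, each tag pair only needs to be computed once in practice; the sketch recomputes it for clarity.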
Further, classifying the old users in the system specifically includes: first, $x$ active users in the system are randomly selected as the classification categories, and the users are then classified with a tag-based naive Bayes classifier according to formula (2):
$$\hat{C}(u) = \arg\max_{C} P(C) \prod_{i=1}^{d} P(u_i \mid C) \qquad (2)$$
where $P(C)$ is the class prior probability of user category $C$, $P(u_i \mid C)$ is the conditional probability of the $i$-th attribute of user $u$, $C$ denotes the user category, and $d$ is the number of attributes describing the user.
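Formula (2) can be evaluated as in the following Python sketch. It is an illustration under stated assumptions, not the patent's code: the priors and likelihoods are supplied by the caller, every likelihood is assumed to be strictly positive, and the function name `classify` is hypothetical. Summing logarithms instead of multiplying probabilities is a standard numerical choice that does not change the arg max.

```python
import math


def classify(attributes, priors, likelihood):
    """Return the category C maximising P(C) * prod_i P(u_i | C), as in formula (2).

    attributes: the user's attributes u_1..u_d (here, tags).
    priors: dict mapping category -> P(C).
    likelihood: callable (attribute, category) -> P(u_i | C); assumed > 0.
    """
    best, best_score = None, -math.inf
    for category, prior in priors.items():
        # Log-space sum: same arg max as the product, but avoids underflow for large d.
        score = math.log(prior) + sum(math.log(likelihood(a, category)) for a in attributes)
        if score > best_score:
            best, best_score = category, score
    return best
```

With no attributes, the classifier falls back to the category with the largest prior.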
Further, classifying the users with the tag-based naive Bayes classifier specifically comprises the following steps:
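(1) first, the class prior probability of each user category is determined. A user category is defined as $C = \{u_1, u_2, \dots, u_x\} \in U = \{C_1, C_2, \dots, C_m\}$, where $x$ is the number of users in category $C$ and $m$ is the number of user categories. The prior is given by formula (3):
[Equation (3): class prior probability $P(C)$; image not reproduced]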
(2) the conditional probability of each attribute of user $u$, that is, each distinct tag the user has applied, is estimated according to formula (4):
[Equation (4): conditional probability of a tag attribute given category $C$; image not reproduced]
In the formula, $\mathrm{SIM}(t)$ denotes the expanded tag set of tag $t$; the per-tag count term denotes the number of times user $u$ applied tag $t_k$; $N(t)$ denotes the total number of tagging actions performed by user $u$; and $f(\tau_k)$ denotes a time decay function.
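Because formulas (3) and (4) are not reproduced in this text, the following Python sketch uses assumed forms rather than the patent's exact expressions: the prior is taken as each category's share of the classified users, $f(\tau)$ is assumed to be an exponential decay $e^{-\lambda\tau}$ with a hypothetical rate `lam`, and the per-tag probability is assumed to be the decayed count of a user's actions on $t$ or on any tag in $\mathrm{SIM}(t)$, normalised by that user's total tagging actions and averaged over the category. All function names are illustrative.

```python
import math


def class_priors(categories):
    """Assumed form of formula (3): P(C) = |C| / total number of classified users.

    categories: dict mapping category -> list of user ids (assumed non-empty overall).
    """
    total = sum(len(users) for users in categories.values())
    return {c: len(users) / total for c, users in categories.items()}


def decay(age_days, lam=0.01):
    """Hypothetical time decay f(tau) = exp(-lam * tau); lam is an illustrative rate."""
    return math.exp(-lam * age_days)


def tag_likelihood(tag, category_users, events, expansion, lam=0.01):
    """Assumed form of formula (4) for one tag attribute t and one category C.

    events: dict user -> list of (tag, age_days) tagging actions.
    expansion: dict tag -> SIM(t), the set of similar tags from formula (1).
    For each user u in C: sum f(tau_k) over u's actions whose tag is t or in SIM(t),
    divided by u's total number of tagging actions. The result is averaged over C.
    """
    tags = {tag} | expansion.get(tag, set())
    per_user = []
    for u in category_users:
        actions = events.get(u, [])
        if not actions:
            continue  # users with no tagging history contribute nothing
        weighted = sum(decay(age, lam) for t_k, age in actions if t_k in tags)
        per_user.append(weighted / len(actions))
    return sum(per_user) / len(per_user) if per_user else 0.0
```

With `lam=0` the decay vanishes and the estimate reduces to a plain frequency; larger values of `lam` make older tagging actions count for less, which is how the sketch reflects interests drifting over time.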
Further, matching the target user to a category specifically includes: after the target user enters the system, the target user's interest tags are extracted, and the target user's category is obtained with the same steps and formulas used to classify users with the tag-based naive Bayes classifier.
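For a cold-start user, the only attributes available are the interest tags selected on entering the system. The following hypothetical Python sketch matches such a user to a category from precomputed priors and per-category tag probabilities. The `eps` floor for tags a category has never seen is an assumption added here (without it, a single unseen tag would give that category probability zero); the patent does not specify how such cases are handled.

```python
import math


def match_category(interest_tags, priors, likelihoods, eps=1e-6):
    """Match a new (cold-start) user to an existing category.

    interest_tags: tags the new user picked on entering the system.
    priors: dict category -> P(C), learned from the classified old users.
    likelihoods: dict category -> dict tag -> P(tag | C).
    eps: hypothetical floor for tags never seen in a category.
    """
    def log_score(c):
        table = likelihoods.get(c, {})
        return math.log(priors[c]) + sum(
            math.log(max(table.get(t, 0.0), eps)) for t in interest_tags)

    return max(priors, key=log_score)
```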
Further, generating the recommendation list specifically includes: two factors are considered, tag extensibility and temporal context information. Tag extensibility widens the range of a tag by calculating tag similarity with formula (1); temporal context information is incorporated by introducing the decay function $f(\tau_k)$.
According to the category obtained for the target user, the average ratings that all users in that category gave to each item are extracted and used as the target user's predicted ratings, and the top N items with the highest predicted ratings are recommended to the user.
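The recommendation step can be sketched in Python as follows, assuming ratings are stored as a hypothetical `user -> {item: rating}` mapping. The prediction for each item is the mean rating among category members who rated it; the `exclude` parameter (for items the target user already has) and the alphabetical tie-break are illustrative choices, not part of the described method.

```python
from collections import defaultdict


def top_n(category_users, ratings, n=10, exclude=()):
    """Predict each item's score as the mean rating given by users in the matched category.

    category_users: user ids in the target user's category.
    ratings: dict user -> dict item -> rating.
    exclude: items to leave out (hypothetical parameter).
    Returns up to n (item, predicted_score) pairs, highest score first.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for u in category_users:
        for item, r in ratings.get(u, {}).items():
            sums[item] += r
            counts[item] += 1
    predicted = {i: sums[i] / counts[i] for i in sums if i not in exclude}
    # Sort by descending score; break ties by item id so the output is deterministic.
    return sorted(predicted.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```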
A recommendation system that fuses tag data and naive Bayes classification, comprising:
a classification module, configured to set the threshold for tag data expansion, use the user's tag data as the user's attribute features, establish from the user's perspective an association between users and tags by means of statistics and probability theory, and combine this user-tag association with a naive Bayes classification algorithm to classify old users;
a matching module, configured to match target users to a category;
a recommendation module, configured to take tag extensibility and temporal context information into account, calculate the average ratings that users in the matched category gave to items, and produce a Top-N recommendation.
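The three modules can be wired together as in the following hypothetical Python sketch. The class names, the dictionary inputs, the frequency-based estimate of P(tag | C), and the `eps` floor are all assumptions made for illustration; tag expansion and time decay are deliberately omitted to keep the module boundaries visible.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class ClassificationModule:
    """Holds the categories of old users and derives naive Bayes parameters from them."""
    categories: dict  # category -> non-empty list of user ids (a zero prior would break log)
    user_tags: dict   # user -> list of tags the user applied

    def priors(self):
        total = sum(len(users) for users in self.categories.values())
        return {c: len(users) / total for c, users in self.categories.items()}

    def likelihoods(self):
        # Simplified P(tag | C): relative tag frequency within the category.
        table = {}
        for c, users in self.categories.items():
            counts = Counter(t for u in users for t in self.user_tags.get(u, []))
            total = sum(counts.values()) or 1
            table[c] = {t: n / total for t, n in counts.items()}
        return table


@dataclass
class MatchingModule:
    """Assigns a target user to a category from the tags the user selected."""
    priors: dict
    likelihoods: dict
    eps: float = 1e-6  # hypothetical floor for tags unseen in a category

    def match(self, interest_tags):
        def score(c):
            table = self.likelihoods.get(c, {})
            return math.log(self.priors[c]) + sum(
                math.log(max(table.get(t, 0.0), self.eps)) for t in interest_tags)
        return max(self.priors, key=score)


@dataclass
class RecommendationModule:
    """Recommends the top-N items by average rating within a category."""
    categories: dict
    ratings: dict  # user -> {item: rating}

    def recommend(self, category, n=10):
        sums, counts = defaultdict(float), defaultdict(int)
        for u in self.categories[category]:
            for item, r in self.ratings.get(u, {}).items():
                sums[item] += r
                counts[item] += 1
        predicted = {i: sums[i] / counts[i] for i in sums}
        ranked = sorted(predicted.items(), key=lambda kv: (-kv[1], kv[0]))
        return [item for item, _ in ranked[:n]]
```

In use, `ClassificationModule` is built once from the old users, its `priors()` and `likelihoods()` feed `MatchingModule`, and the category returned by `match` is passed to `RecommendationModule.recommend`.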
The invention has the following advantages and beneficial effects:
by using the user's tag data as the user's attribute features and applying statistics and probability theory to establish an association between users and tags, the invention expresses user preferences more accurately and addresses the user cold start problem. The user-tag association is combined with a naive Bayes classification algorithm to classify users and to match new users to a category. In addition, tag extensibility and temporal context information are considered to further reduce the impact of data sparsity. As a result, the method and system improve the accuracy of recommendations for new users.
The invention mainly addresses making more accurate recommendations to new users in the system. Its innovations are: 1. collaborative filtering is accurate for old users in the system but ineffective for new users, who lack sufficient data, whereas tag data reflect users' interests more comprehensively, and new users in the system tend to select tags they are interested in; 2. during category matching, because different users express the same meaning in different words and users' interests change over time, the tags are preprocessed, that is, similar tags are grouped into a tag set, and a time decay function is introduced, which further reduces the impact of data sparsity and makes the classification results more accurate.
Drawings
FIG. 1 is a schematic flow diagram of a preferred embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and in detail with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.
The technical scheme for solving the technical problems is as follows:
As shown in FIG. 1, step one: set a threshold for tag data expansion.
The threshold is set by calculating the similarity between tags. For a tag $t$, let $N(t)$ be the set of items labeled with tag $t$, and let $n_{t,i}$ be the number of users who labeled item $i$ with tag $t$. The similarity between tag $t$ and tag $t'$ is calculated with the cosine similarity formula:
$$\mathrm{sim}(t,t') = \frac{\sum_{i \in N(t) \cap N(t')} n_{t,i}\, n_{t',i}}{\sqrt{\sum_{i \in N(t)} n_{t,i}^{2}}\;\sqrt{\sum_{i \in N(t')} n_{t',i}^{2}}} \qquad (1)$$
The similarity between tag $t$ and its similar tags is calculated according to formula (1), and a suitable threshold $y$ is selected; if $\mathrm{sim}(t,t') > y$, tag $t'$ belongs to the expansion set of tag $t$.
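As a worked illustration of formula (1), consider hypothetical counts that are not taken from the patent: tag $t$ was applied to item $i_1$ by 2 users and to item $i_2$ by 1 user, while tag $t'$ was applied to $i_1$ by 1 user and to $i_3$ by 3 users. Then $N(t) \cap N(t') = \{i_1\}$ and:

```latex
\mathrm{sim}(t,t')
  = \frac{n_{t,i_1}\, n_{t',i_1}}{\sqrt{n_{t,i_1}^2 + n_{t,i_2}^2}\;\sqrt{n_{t',i_1}^2 + n_{t',i_3}^2}}
  = \frac{2 \cdot 1}{\sqrt{2^2 + 1^2}\;\sqrt{1^2 + 3^2}}
  = \frac{2}{\sqrt{50}} \approx 0.283
```

With a threshold of $y = 0.2$, $t'$ would join the expansion set of $t$; with $y = 0.3$ it would not. The choice of $y$ therefore directly controls how aggressively tags are merged.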
Step two: classify the old users in the system.
Randomly select $x$ active users in the system as the classification categories and classify the users in the system with the tag-based naive Bayes classifier. First, determine the class prior probability of each user category, defining a user category as $C = \{u_1, u_2, \dots, u_x\} \in U = \{C_1, C_2, \dots, C_m\}$, as shown in formula (3):
[Equation (3): class prior probability $P(C)$; image not reproduced]
Then, estimate the conditional probability of each attribute of user $u$ (that is, each distinct tag the user has applied), as shown in formula (4):
[Equation (4): conditional probability of a tag attribute given category $C$; image not reproduced]
In the formula, $\mathrm{SIM}(t)$ denotes the tag set after tag expansion; the per-tag count term denotes the number of times the user applied the tag; $N(t)$ denotes the total number of tagging actions performed by the user; and $f(\tau_k)$ denotes a time decay function.
Finally, the category is calculated with formula (2):
$$\hat{C}(u) = \arg\max_{C} P(C) \prod_{i=1}^{d} P(u_i \mid C) \qquad (2)$$
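A hypothetical two-category example (all probabilities invented for illustration) shows how formula (2) trades the prior off against the tag likelihoods. Suppose $P(C_1) = 0.6$, $P(C_2) = 0.4$, and a user has two tag attributes $u_1, u_2$ with $P(u_1 \mid C_1) = 0.2$, $P(u_2 \mid C_1) = 0.1$, $P(u_1 \mid C_2) = 0.3$, and $P(u_2 \mid C_2) = 0.25$:

```latex
\begin{aligned}
P(C_1)\,P(u_1 \mid C_1)\,P(u_2 \mid C_1) &= 0.6 \times 0.2 \times 0.1 = 0.012 \\
P(C_2)\,P(u_1 \mid C_2)\,P(u_2 \mid C_2) &= 0.4 \times 0.3 \times 0.25 = 0.030
\end{aligned}
```

Since $0.030 > 0.012$, the user is assigned to $C_2$, even though $C_1$ has the larger prior.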
Step three: match the target user to a category.
As soon as the target user enters the system, the target user's interest tags are extracted, and the target user's category is then obtained with the calculation method of step two.
Step four: generate the recommendation list.
According to the category obtained in step three, extract the average ratings that all users in that category gave to each item, use them as the target user's predicted ratings, and recommend the top N items with the highest predicted ratings to the user.
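A small hypothetical example of step four, with ratings invented for illustration: suppose the matched category contains users $u_1, u_2, u_3$, where $u_1$ rated item $a$ 4 and item $b$ 2, $u_2$ rated $a$ 5 and $c$ 3, and $u_3$ rated $b$ 4. Averaging over the users who rated each item gives the predicted ratings ($\hat{r}$ is an illustrative symbol):

```latex
\hat{r}_a = \frac{4 + 5}{2} = 4.5, \qquad
\hat{r}_b = \frac{2 + 4}{2} = 3.0, \qquad
\hat{r}_c = \frac{3}{1} = 3.0
```

so with $N = 1$ the target user would be recommended item $a$. Averaging only over the users who actually rated an item is an assumption here; the text states only that the category's average ratings are used.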
The above examples should be construed as merely illustrative and not as limiting the rest of the disclosure. After reading this description, a skilled person can make various changes or modifications to the invention, and such equivalent changes and modifications also fall within the scope of the invention as defined by the claims.

Claims (7)

1. A recommendation method fusing tag data and naive Bayesian classification is characterized by comprising the following steps:
first, setting a threshold for tag data expansion; using the user's tag data as the user's attribute features and, from the user's perspective, establishing an association between users and tags by means of statistics and probability theory; combining this user-tag association with a naive Bayes classification algorithm to classify the old (existing) users, and then matching target users to a category;
and finally, taking tag extensibility and temporal context information into account, calculating the average ratings that users in the matched category gave to items, and producing a Top-N recommendation.
2. The recommendation method fusing tag data and naive Bayes classification as claimed in claim 1, wherein setting the threshold for tag data expansion specifically comprises: setting the threshold by calculating the similarity between tags, where, for a tag $t$, $N(t)$ is the set of items labeled with tag $t$ and $n_{t,i}$ is the number of users who labeled item $i$ with tag $t$, and calculating the similarity between tag $t$ and tag $t'$ with the cosine similarity formula:
$$\mathrm{sim}(t,t') = \frac{\sum_{i \in N(t) \cap N(t')} n_{t,i}\, n_{t',i}}{\sqrt{\sum_{i \in N(t)} n_{t,i}^{2}}\;\sqrt{\sum_{i \in N(t')} n_{t',i}^{2}}} \qquad (1)$$
calculating the similarity between tag $t$ and its similar tags according to formula (1) and setting a threshold $y$, wherein if $\mathrm{sim}(t,t') > y$, tag $t'$ belongs to the expansion set of tag $t$.
3. The recommendation method fusing tag data and naive Bayes classification as claimed in claim 1 or 2, wherein classifying the old users in the system specifically comprises: first, randomly selecting $x$ active users in the system as the classification categories, and then classifying the users with a tag-based naive Bayes classifier according to formula (2):
$$\hat{C}(u) = \arg\max_{C} P(C) \prod_{i=1}^{d} P(u_i \mid C) \qquad (2)$$
where $P(C)$ is the class prior probability of user category $C$, $P(u_i \mid C)$ is the conditional probability of the $i$-th attribute of user $u$, $C$ denotes the user category, and $d$ is the number of attributes describing the user.
4. The recommendation method fusing tag data and naive Bayes classification as claimed in claim 3, wherein classifying the users with the tag-based naive Bayes classifier specifically comprises the following steps:
(1) first, determining the class prior probability of each user category, a user category being defined as $C = \{u_1, u_2, \dots, u_x\} \in U = \{C_1, C_2, \dots, C_m\}$, where $x$ is the number of users in category $C$ and $m$ is the number of user categories, as shown in formula (3):
[Equation (3): class prior probability $P(C)$; image not reproduced]
(2) estimating the conditional probability of each attribute of user $u$, that is, each distinct tag the user has applied, as shown in formula (4):
[Equation (4): conditional probability of a tag attribute given category $C$; image not reproduced]
where $\mathrm{SIM}(t)$ denotes the expanded tag set of tag $t$, the per-tag count term denotes the number of times user $u$ applied tag $t_k$, $N(t)$ denotes the total number of tagging actions performed by user $u$, and $f(\tau_k)$ denotes a time decay function.
5. The recommendation method fusing tag data and naive Bayes classification as claimed in claim 4, wherein matching the target user to a category specifically comprises: after the target user enters the system, extracting the target user's interest tags, and obtaining the target user's category with the steps and formulas used to classify users with the tag-based naive Bayes classifier.
6. The recommendation method fusing tag data and naive Bayesian classification as claimed in claim 5, wherein generating the recommendation list specifically comprises: considering two factors, tag extensibility and temporal context information, where tag extensibility widens the range of a tag by calculating tag similarity with formula (1), and temporal context information is incorporated by introducing the decay function $f(\tau_k)$;
and, according to the obtained category of the target user, extracting the average ratings that all users in that category gave to each item, using them as the target user's predicted ratings, and recommending the top N items with the highest predicted ratings to the user.
7. A recommendation system fusing tag data and naive Bayesian classification, comprising:
a classification module, configured to set the threshold for tag data expansion, use the user's tag data as the user's attribute features, establish from the user's perspective an association between users and tags by means of statistics and probability theory, and combine this user-tag association with a naive Bayes classification algorithm to classify old users;
a matching module, configured to match target users to a category;
a recommendation module, configured to take tag extensibility and temporal context information into account, calculate the average ratings that users in the matched category gave to items, and produce a Top-N recommendation.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010194133.2A CN111428145B (en) 2020-03-19 2020-03-19 Recommendation method and system fusing tag data and naive Bayesian classification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010194133.2A CN111428145B (en) 2020-03-19 2020-03-19 Recommendation method and system fusing tag data and naive Bayesian classification

Publications (2)

Publication Number Publication Date
CN111428145A (en) 2020-07-17
CN111428145B (en) 2022-12-27

Family

ID=71549514

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010194133.2A Active CN111428145B (en) 2020-03-19 2020-03-19 Recommendation method and system fusing tag data and naive Bayesian classification

Country Status (1)

Country Link
CN (1) CN111428145B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102541921A (en) * 2010-12-24 2012-07-04 华东师范大学 Control method and device for recommending resources through tag extension
CN102968506A (en) * 2012-12-14 2013-03-13 北京理工大学 Personalized collaborative filtering recommendation method based on extension characteristic vectors
US20130204833A1 (en) * 2012-02-02 2013-08-08 Bo PANG Personalized recommendation of user comments
CN106886872A (en) * 2017-01-20 2017-06-23 淮阴工学院 Method is recommended in a kind of logistics based on cluster and cosine similarity
CN108334558A (en) * 2018-01-02 2018-07-27 南京师范大学 A kind of collaborative filtering recommending method of combination tag and time factor
CN108876470A (en) * 2018-06-29 2018-11-23 腾讯科技(深圳)有限公司 Tagging user extended method, computer equipment and storage medium
CN109840833A (en) * 2019-02-13 2019-06-04 苏州大学 Bayes's collaborative filtering recommending method
CN110390058A (en) * 2019-06-28 2019-10-29 哈尔滨理工大学 Consider the credible mixed recommendation method of Web service of timeliness

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
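李欢 (Li Huan): "新型协同过滤推荐算法研究" (Research on Novel Collaborative Filtering Recommendation Algorithms), 《中国优秀硕士学位论文全文数据库(信息科技辑)》 (China Master's Theses Full-text Database, Information Science and Technology) *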

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112269942A (en) * 2020-12-03 2021-01-26 北京达佳互联信息技术有限公司 Method, device and system for recommending object and electronic equipment
CN112269942B (en) * 2020-12-03 2021-03-23 北京达佳互联信息技术有限公司 Method, device and system for recommending object and electronic equipment
CN112561479A (en) * 2020-12-16 2021-03-26 中国平安人寿保险股份有限公司 Enterprise employee increase method and device based on intelligent decision and computer equipment
CN112561479B (en) * 2020-12-16 2023-09-19 中国平安人寿保险股份有限公司 Intelligent decision-making-based enterprise personnel increasing method and device and computer equipment
CN113032675A (en) * 2021-03-26 2021-06-25 李蕊男 User similarity multi-factor evaluation method in personalized recommendation
CN116049573A (en) * 2023-03-28 2023-05-02 南京邮电大学 User hierarchical recommendation method for improving collaborative filtering
CN116049573B (en) * 2023-03-28 2023-09-01 南京邮电大学 User hierarchical recommendation method for improving collaborative filtering

Also Published As

Publication number Publication date
CN111428145B (en) 2022-12-27

Similar Documents

Publication Publication Date Title
Choi et al. Identifying machine learning techniques for classification of target advertising
CN111428145B (en) Recommendation method and system fusing tag data and naive Bayesian classification
Dogan et al. Customer segmentation by using RFM model and clustering methods: a case study in retail industry
Sivapalan et al. Recommender systems in e-commerce
CN103246980B (en) Information output method and server
CN105426528A (en) Retrieving and ordering method and system for commodity data
CN102411754A (en) Personalized recommendation method based on commodity property entropy
KR20150023432A (en) Method and apparatus for inferring user demographics
US10970296B2 (en) System and method for data mining and similarity estimation
CN111310038B (en) Information recommendation method and device, electronic equipment and computer-readable storage medium
Liu et al. Riding the tide of sentiment change: sentiment analysis with evolving online reviews
Xu et al. Intelligent Classification and Personalized Recommendation of E-commerce Products Based on Machine Learning
Zheng et al. A scalable purchase intention prediction system using extreme gradient boosting machines with browsing content entropy
CN113744019A (en) Commodity recommendation method, commodity recommendation device, commodity recommendation equipment and storage medium
CN114841760B (en) Advertisement recommendation management method and system based on audience behavior characteristic analysis
Wu et al. [Retracted] Using the Mathematical Model on Precision Marketing with Online Transaction Data Computing
CN108416611B (en) Supermarket path recommendation system and method thereof
CN115131108A (en) E-commerce commodity screening system
Qiu A predictive model for customer purchase behavior in e-commerce context
CN114912031A (en) Mixed recommendation method and system based on clustering and collaborative filtering
Choeh et al. Personalized Approach for Recommending Useful Product Reviews Based on Information Gain.
Jouyandeh et al. IPARS: An image-based personalized advertisement recommendation system on social networks
Christodoulou et al. Leveraging Natural Language Processing in Persuasive Marketing
Liu et al. Improving Recommendation Accuracy by Considering Electronic Word-of-Mouth and the Effects of Its Propagation Using Collective Matrix Factorization
Alkhatib et al. Image Process Based Recommender System for Social Media Marketing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant