CN111428145A - Recommendation method and system fusing tag data and naive Bayesian classification - Google Patents
Recommendation method and system fusing tag data and naive Bayesian classification
- Publication number
- CN111428145A (application CN202010194133.2A)
- Authority
- CN
- China
- Prior art keywords
- user
- label
- tag
- users
- recommendation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06F18/24155—Bayesian classification
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a recommendation method and a recommendation system fusing tag data and naive Bayesian classification. User tag data are used as the attribute features of a user and, from the user's own perspective, the association between users and tags is established using statistics and probability theory, so that user preference information is expressed more accurately. This user-tag association is combined with a naive Bayes classification algorithm to classify existing users and to match a new user to a category. In addition, two factors, tag expansibility and time context information, are taken into account to further reduce the influence of data sparsity. Finally, the average ratings that users in the matched category have given to items are calculated to realize Top-N recommendation.
Description
Technical Field
The invention belongs to the field of personalized recommendation, and particularly relates to a recommendation method fusing tag data and naive Bayesian classification.
Background
Recommender systems play a central role in many online applications and products, such as e-commerce services, movies, music, and articles. Large companies such as Amazon, eBay, and Netflix have adopted recommendation techniques in their systems to predict customers' potential preferences and recommend related items. Recommendation performance has a considerable impact on the commercial success of these companies in terms of revenue and user satisfaction. There are three mainstream types of recommendation method: content-based recommendation, collaborative filtering recommendation, and hybrid recommendation. Collaborative filtering is currently the most popular approach to recommender system design. It uses a large amount of data collected from users' historical behavior to predict which items a user will like, and it does not need to analyze the content of the items; instead, it relies on the relationships between users and items.
However, collaborative filtering suffers from the cold start problem. According to the entity involved, the cold start problem can be divided into item cold start and user cold start; according to whether the number of rating records is zero, it can be divided into complete cold start and incomplete cold start. Cold start is a special case of the data sparsity problem. Collaborative filtering needs a user to have rated many items, or an item to have received many ratings, before it can produce effective recommendations; it is therefore not applicable to new users and new items, which have few or no ratings in the system. In this case, predicting the relationship between items and users and giving effective recommendations is a very challenging problem.
Disclosure of Invention
The present invention aims to solve the above problems of the prior art. A recommendation method and a recommendation system fusing tag data and naive Bayesian classification are provided, which solve the user cold start problem of traditional collaborative filtering algorithms.
The technical scheme of the invention is as follows:
A recommendation method fusing tag data and naive Bayesian classification comprises the following steps:
first, setting a threshold for tag data expansion; using user tag data as the attribute features of a user and, from the user's own perspective, establishing the association between users and tags using statistics and probability theory; combining this association with a naive Bayes classification algorithm to classify old users; and performing category matching on the target user;
and finally, taking into account two factors, tag expansibility and time context information, calculating the average ratings that users in the matched category have given to items, and realizing Top-N recommendation.
Further, setting the threshold for tag data expansion specifically includes: the threshold is set by calculating the similarity between tags. For a tag t, let $N(t)$ be the set of items labeled with tag t, and let $n_{t,i}$ be the number of users who applied tag t to item i. The similarity between tag t and tag t' is calculated using the cosine similarity formula:
$$\mathrm{sim}(t,t') = \frac{\sum_{i \in N(t) \cap N(t')} n_{t,i}\, n_{t',i}}{\sqrt{\sum_{i \in N(t)} n_{t,i}^{2}}\,\sqrt{\sum_{i \in N(t')} n_{t',i}^{2}}} \qquad (1)$$
The similarity between tag t and each of its similar tags is calculated according to formula (1), and a threshold y is set; if sim(t, t') > y, tag t' belongs to the expansion set of tag t.
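As a hedged illustration of the thresholding rule, assume formula (1) takes the usual cosine form over the per-item counts $n_{t,i}$, and take hypothetical counts that are not from the patent: tag t is applied to item 1 by 3 users and to item 2 by 4 users, while tag t' is applied to item 2 by 4 users and to item 3 by 3 users. Only item 2 is shared, so:

```latex
\[
\mathrm{sim}(t,t')
  = \frac{n_{t,2}\,n_{t',2}}
         {\sqrt{n_{t,1}^{2}+n_{t,2}^{2}}\;\sqrt{n_{t',2}^{2}+n_{t',3}^{2}}}
  = \frac{4\cdot 4}{\sqrt{3^{2}+4^{2}}\;\sqrt{4^{2}+3^{2}}}
  = \frac{16}{25} = 0.64
\]
```

With an assumed threshold y = 0.5, sim(t, t') > y holds and t' would join the expansion set of t; with y = 0.7 it would not.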
Further, classifying the old users in the system specifically includes: first, x active users are randomly selected from the system to serve as the classification categories, and the users are then classified using a tag-based naive Bayes classifier, as shown in formula (2):
$$P(C \mid u) \propto P(C) \prod_{i=1}^{d} P(u_i \mid C) \qquad (2)$$
where $P(C)$ denotes the class prior probability of the user category, $P(u_i \mid C)$ denotes the conditional probability of each attribute of user u, C denotes the user category, and d denotes the number of attributes describing the user.
Further, classifying the users using the tag-based naive Bayes classifier specifically comprises the following steps:
(1) First, the class prior probability of the user category is determined. The user categories are defined as $C(u_1, u_2, \ldots, u_x) \in U(C_1, C_2, \ldots, C_m)$, where x denotes the number of users in category C and m denotes the number of user categories,
as shown in formula (3):
(2) Then, the conditional probability of each attribute of user u, that is, of each distinct tag the user has applied, is estimated, as shown in formula (4):
where SIM(t) denotes the tag set obtained by expanding tag t, the tag count denotes the number of times user u applied tag $t_k$, N(t) denotes the total number of tagging actions generated by user u, and $f(\tau_k)$ denotes a time decay function.
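The exact form of formula (4) is not given in the text here, so the following is only a sketch of one way the described ingredients (the expansion set SIM(t), per-tag counts, the total number of tagging actions, and a time decay) could be combined. The exponential decay, the rate `lam`, the `(tag, age)` data layout, and both function names are assumptions for illustration, not the patent's definitions.

```python
from math import exp

def time_decay(tau, lam=0.1):
    """Hypothetical decay f(tau) = exp(-lam * tau); tau is the age of a tagging action."""
    return exp(-lam * tau)

def tag_conditional(actions, tag, expanded, lam=0.1):
    """Decayed share of tagging actions that fall in the expanded set of `tag`.

    actions:  list of (tag, age) pairs, one per tagging action.
    expanded: dict mapping a tag to its expansion set SIM(t).
    """
    if not actions:
        return 0.0
    # A tag counts as a match if it is the tag itself or one of its similar tags.
    group = {tag} | expanded.get(tag, set())
    weighted = sum(time_decay(age, lam) for t, age in actions if t in group)
    # Normalise by the total number of tagging actions.
    return weighted / len(actions)
```

Under this sketch, older tagging actions contribute less, and a user who tagged with a similar tag still gets credit for the original tag, which is how the two factors would soften data sparsity.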
Further, performing category matching on the target user specifically includes: after a target user enters the system, the user's interest tags are extracted, and the user's category is obtained using the steps and formulas described above for classifying users with the tag-based naive Bayes classifier.
Further, generating the recommendation list specifically includes: taking into account two factors, tag expansibility and time context information, where tag expansibility means widening the range of a tag by calculating tag similarity using formula (1), and time context information is incorporated by introducing a decay function $f(\tau_k)$;
and, according to the obtained category of the target user, extracting the average ratings given to items by all users in that category, using them as the predicted ratings of the target user, and recommending the N items with the highest predicted ratings to the user.
A recommendation system fusing tag data and naive Bayesian classification comprises:
a classification module, used for setting a threshold for tag data expansion, using user tag data as the attribute features of a user, establishing, from the user's own perspective, the association between users and tags using statistics and probability theory, and combining this association with a naive Bayes classification algorithm to classify old users;
a matching module, used for performing category matching on the target user;
a recommendation module, used for calculating, with the two factors of tag expansibility and time context information taken into account, the average ratings that users in the matched category have given to items, and realizing Top-N recommendation.
The invention has the following advantages and beneficial effects:
The invention uses user tag data as the attribute features of a user and establishes the association between users and tags using statistics and probability theory, thereby expressing user preference information more accurately and solving the user cold start problem. This association is combined with a naive Bayes classification algorithm to classify existing users and to match a new user to a category. In addition, two factors, tag expansibility and time context information, are taken into account to further reduce the influence of data sparsity. The method and system effectively improve the accuracy of recommendations for new users.
The invention mainly addresses more accurate recommendation for new users in the system, and its innovations are: 1. conventional methods are fairly accurate for old users in the system but ineffective for new users who lack sufficient data, whereas tag data reflect user interests from more angles and new users tend to select tags that interest them; 2. during category matching, considering that different users express the same meaning in different words and that user interests change over time, the tags are preprocessed, that is, similar tags are grouped into a tag set and a time decay function is introduced, which further reduces the influence of data sparsity and makes the classification result more accurate.
Drawings
FIG. 1 is a schematic flow diagram of a preferred embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described in detail and clearly with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.
The technical scheme for solving the above technical problems is as follows:
As shown in FIG. 1, step one: setting a threshold for tag data expansion.
The threshold is set by calculating the similarity between tags. For a tag t, let $N(t)$ be the set of items labeled with tag t, and let $n_{t,i}$ be the number of users who applied tag t to item i. The similarity between tag t and tag t' is calculated using the cosine similarity formula:
$$\mathrm{sim}(t,t') = \frac{\sum_{i \in N(t) \cap N(t')} n_{t,i}\, n_{t',i}}{\sqrt{\sum_{i \in N(t)} n_{t,i}^{2}}\,\sqrt{\sum_{i \in N(t')} n_{t',i}^{2}}} \qquad (1)$$
The similarity between tag t and each of its similar tags is calculated according to formula (1), and a suitable threshold y is selected; if sim(t, t') > y, tag t' belongs to the expansion set of tag t.
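Step one can be sketched in a few lines, assuming the cosine similarity is taken over per-item counts as described above. The nested-dictionary layout `counts[tag][item]`, the function names, and the sample data are hypothetical and chosen only for illustration.

```python
from math import sqrt

def tag_similarity(counts, t, t2):
    """Cosine similarity between two tags.

    counts[tag][item] holds n_{tag,item}: how many users applied `tag` to `item`.
    """
    a, b = counts.get(t, {}), counts.get(t2, {})
    # Only items carrying both tags contribute to the numerator.
    dot = sum(n * b[i] for i, n in a.items() if i in b)
    norm = sqrt(sum(n * n for n in a.values())) * sqrt(sum(n * n for n in b.values()))
    return dot / norm if norm else 0.0

def expansion_set(counts, t, y):
    """Tags whose similarity to t exceeds threshold y (t itself excluded)."""
    return {t2 for t2 in counts if t2 != t and tag_similarity(counts, t, t2) > y}

# Hypothetical tagging counts.
counts = {
    "scifi": {"item1": 2, "item2": 1},
    "space": {"item1": 1, "item3": 2},
    "romance": {"item4": 3},
}
print(expansion_set(counts, "scifi", 0.3))
```

The threshold y trades coverage against noise: a lower y pulls more loosely related tags into the expansion set, which helps sparse users but risks blurring categories.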
Step two: classifying old users in the system.
x active users are randomly selected from the system to serve as the classification categories, and the users in the system are classified using a tag-based naive Bayes classifier. First, the class prior probability of the user categories is determined, with the user categories defined as $C(u_1, u_2, \ldots, u_x) \in U(C_1, C_2, \ldots, C_m)$, as shown in formula (3):
Then, the conditional probability of each attribute of user u (that is, each distinct tag the user has applied) is estimated, as shown in formula (4):
where SIM(t) denotes the tag set after tag expansion, the tag count denotes the number of times the user applied the tag, N(t) denotes the total number of tagging actions generated by the user, and $f(\tau_k)$ denotes a time decay function.
Finally, the category is calculated using formula (2):
$$P(C \mid u) \propto P(C) \prod_{i=1}^{d} P(u_i \mid C) \qquad (2)$$
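A minimal sketch of the final classification step, assuming the standard naive Bayes decision rule (pick the category that maximizes the prior times the product of per-tag conditional probabilities). The priors and conditionals are taken as given inputs because formulas (3) and (4) are not reproduced here; the floor `eps` for unseen tags, the function name, and the sample values are assumptions for illustration.

```python
from math import log

def classify(user_tags, priors, cond, eps=1e-6):
    """Return the category maximising log P(C) + sum_i log P(u_i | C).

    user_tags: the d tag attributes of the user.
    priors:    dict category -> P(C).
    cond:      dict category -> dict tag -> P(tag | C).
    eps:       hypothetical floor for unseen tags so log() stays defined.
    """
    def score(c):
        # Work in log space to avoid underflow when d is large.
        return log(priors[c]) + sum(
            log(max(cond[c].get(t, 0.0), eps)) for t in user_tags
        )
    return max(priors, key=score)

# Hypothetical probabilities for two categories.
priors = {"C1": 0.5, "C2": 0.5}
cond = {
    "C1": {"scifi": 0.6, "space": 0.3},
    "C2": {"romance": 0.7, "drama": 0.2},
}
print(classify(["scifi", "space"], priors, cond))
```

Summing logarithms rather than multiplying probabilities gives the same ranking of categories while remaining numerically stable.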
Step three: performing category matching on the target user.
When a target user has just entered the system, the user's interest tags are extracted, and the user's category is then obtained using the calculation method of step two.
Step four: generating a recommendation list.
According to the category obtained in step three, the average ratings given to items by all users in that category are extracted and used as the predicted ratings of the target user, and the N items with the highest predicted ratings are recommended to the user.
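Step four can be sketched as follows, assuming ratings are stored per user and the matched category is represented by a list of its member users. The data layout, the `seen` parameter, the tie-breaking rule, and the sample ratings are assumptions for illustration.

```python
def top_n(ratings, category_users, n, seen=()):
    """Rank items by mean rating among users of the target user's category.

    ratings:        dict user -> dict item -> score.
    category_users: users assigned to the matched category.
    seen:           items to exclude (hypothetical; a brand-new user has none).
    """
    totals, counts = {}, {}
    for u in category_users:
        for item, score in ratings.get(u, {}).items():
            totals[item] = totals.get(item, 0.0) + score
            counts[item] = counts.get(item, 0) + 1
    # The category's mean rating serves as the predicted rating.
    means = {i: totals[i] / counts[i] for i in totals if i not in seen}
    # Sort by predicted rating, breaking ties by item id for determinism.
    return sorted(means, key=lambda i: (-means[i], i))[:n]

# Hypothetical ratings.
ratings = {
    "u1": {"a": 5, "b": 3},
    "u2": {"a": 4, "c": 2},
    "u3": {"b": 1},
}
print(top_n(ratings, ["u1", "u2"], 2))
```

Because the prediction depends only on the matched category, a new user with no ratings of their own still receives a ranked list.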
The above examples are to be construed as merely illustrative and not limitative of the remainder of the disclosure. After reading the description of the invention, the skilled person can make various changes or modifications to the invention, and these equivalent changes and modifications also fall into the scope of the invention defined by the claims.
Claims (7)
1. A recommendation method fusing tag data and naive Bayesian classification, characterized by comprising the following steps:
first, setting a threshold for tag data expansion; using user tag data as the attribute features of a user and, from the user's own perspective, establishing the association between users and tags using statistics and probability theory; combining this association with a naive Bayes classification algorithm to classify old users; and performing category matching on the target user;
and finally, taking into account two factors, tag expansibility and time context information, calculating the average ratings that users in the matched category have given to items, and realizing Top-N recommendation.
2. The recommendation method fusing tag data and naive Bayesian classification as claimed in claim 1, wherein setting the threshold for tag data expansion specifically comprises: setting the threshold by calculating the similarity between tags; for a tag t, letting $N(t)$ be the set of items labeled with tag t and $n_{t,i}$ be the number of users who applied tag t to item i, the similarity between tag t and tag t' is calculated using the cosine similarity formula:
$$\mathrm{sim}(t,t') = \frac{\sum_{i \in N(t) \cap N(t')} n_{t,i}\, n_{t',i}}{\sqrt{\sum_{i \in N(t)} n_{t,i}^{2}}\,\sqrt{\sum_{i \in N(t')} n_{t',i}^{2}}} \qquad (1)$$
the similarity between tag t and each of its similar tags is calculated according to formula (1), and a threshold y is set; if sim(t, t') > y, tag t' belongs to the expansion set of tag t.
3. The recommendation method fusing tag data and naive Bayesian classification as claimed in claim 1 or 2, wherein classifying old users in the system specifically comprises: first, randomly selecting x active users in the system to serve as the classification categories, and then classifying the users using a tag-based naive Bayes classifier, as shown in formula (2):
$$P(C \mid u) \propto P(C) \prod_{i=1}^{d} P(u_i \mid C) \qquad (2)$$
where $P(C)$ denotes the class prior probability of the user category, $P(u_i \mid C)$ denotes the conditional probability of each attribute of user u, C denotes the user category, and d denotes the number of attributes describing the user.
4. The recommendation method fusing tag data and naive Bayesian classification as claimed in claim 3, wherein classifying the users using the tag-based naive Bayes classifier specifically comprises the following steps:
(1) first, determining the class prior probability of the user category, with the user categories defined as $C(u_1, u_2, \ldots, u_x) \in U(C_1, C_2, \ldots, C_m)$, where x denotes the number of users in category C and m denotes the number of user categories,
as shown in formula (3):
(2) then, estimating the conditional probability of each attribute of user u, that is, of each distinct tag the user has applied, as shown in formula (4):
5. The recommendation method fusing tag data and naive Bayesian classification as claimed in claim 4, wherein performing category matching on the target user specifically comprises: after a target user enters the system, extracting the user's interest tags, and obtaining the user's category using the steps and formulas for classifying users with the tag-based naive Bayes classifier.
6. The recommendation method fusing tag data and naive Bayesian classification as claimed in claim 5, wherein generating the recommendation list specifically comprises: taking into account two factors, tag expansibility and time context information, where tag expansibility means widening the range of a tag by calculating tag similarity using formula (1), and time context information is incorporated by introducing a decay function $f(\tau_k)$;
and, according to the obtained category of the target user, extracting the average ratings given to items by all users in that category, using them as the predicted ratings of the target user, and recommending the N items with the highest predicted ratings to the user.
7. A recommendation system fusing tag data and naive Bayesian classification, comprising:
a classification module, used for setting a threshold for tag data expansion, using user tag data as the attribute features of a user, establishing, from the user's own perspective, the association between users and tags using statistics and probability theory, and combining this association with a naive Bayes classification algorithm to classify old users;
a matching module, used for performing category matching on the target user;
a recommendation module, used for calculating, with the two factors of tag expansibility and time context information taken into account, the average ratings that users in the matched category have given to items, and realizing Top-N recommendation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010194133.2A CN111428145B (en) | 2020-03-19 | 2020-03-19 | Recommendation method and system fusing tag data and naive Bayesian classification |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010194133.2A CN111428145B (en) | 2020-03-19 | 2020-03-19 | Recommendation method and system fusing tag data and naive Bayesian classification |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111428145A (en) | 2020-07-17 |
CN111428145B (en) | 2022-12-27 |
Family
ID=71549514
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010194133.2A (Active, CN111428145B) | 2020-03-19 | 2020-03-19 | Recommendation method and system fusing tag data and naive Bayesian classification |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111428145B (en) |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102541921A (en) * | 2010-12-24 | 2012-07-04 | 华东师范大学 | Control method and device for recommending resources through tag extension |
US20130204833A1 (en) * | 2012-02-02 | 2013-08-08 | Bo PANG | Personalized recommendation of user comments |
CN102968506A (en) * | 2012-12-14 | 2013-03-13 | 北京理工大学 | Personalized collaborative filtering recommendation method based on extension characteristic vectors |
CN106886872A (en) * | 2017-01-20 | 2017-06-23 | 淮阴工学院 | Method is recommended in a kind of logistics based on cluster and cosine similarity |
CN108334558A (en) * | 2018-01-02 | 2018-07-27 | 南京师范大学 | A kind of collaborative filtering recommending method of combination tag and time factor |
CN108876470A (en) * | 2018-06-29 | 2018-11-23 | 腾讯科技(深圳)有限公司 | Tagging user extended method, computer equipment and storage medium |
CN109840833A (en) * | 2019-02-13 | 2019-06-04 | 苏州大学 | Bayes's collaborative filtering recommending method |
CN110390058A (en) * | 2019-06-28 | 2019-10-29 | 哈尔滨理工大学 | Consider the credible mixed recommendation method of Web service of timeliness |
Non-Patent Citations (1)
Title |
---|
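李欢 (Li Huan): "Research on New Collaborative Filtering Recommendation Algorithms", China Master's Theses Full-text Database (Information Science and Technology) * |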
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112269942A (en) * | 2020-12-03 | 2021-01-26 | 北京达佳互联信息技术有限公司 | Method, device and system for recommending object and electronic equipment |
CN112269942B (en) * | 2020-12-03 | 2021-03-23 | 北京达佳互联信息技术有限公司 | Method, device and system for recommending object and electronic equipment |
CN112561479A (en) * | 2020-12-16 | 2021-03-26 | 中国平安人寿保险股份有限公司 | Enterprise employee increase method and device based on intelligent decision and computer equipment |
CN112561479B (en) * | 2020-12-16 | 2023-09-19 | 中国平安人寿保险股份有限公司 | Intelligent decision-making-based enterprise personnel increasing method and device and computer equipment |
CN113032675A (en) * | 2021-03-26 | 2021-06-25 | 李蕊男 | User similarity multi-factor evaluation method in personalized recommendation |
CN116049573A (en) * | 2023-03-28 | 2023-05-02 | 南京邮电大学 | User hierarchical recommendation method for improving collaborative filtering |
CN116049573B (en) * | 2023-03-28 | 2023-09-01 | 南京邮电大学 | User hierarchical recommendation method for improving collaborative filtering |
Also Published As
Publication number | Publication date |
---|---|
CN111428145B (en) | 2022-12-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Choi et al. | Identifying machine learning techniques for classification of target advertising | |
CN111428145B (en) | Recommendation method and system fusing tag data and naive Bayesian classification | |
Dogan et al. | Customer segmentation by using RFM model and clustering methods: a case study in retail industry | |
Sivapalan et al. | Recommender systems in e-commerce | |
CN103246980B (en) | Information output method and server | |
CN105426528A (en) | Retrieving and ordering method and system for commodity data | |
CN102411754A (en) | Personalized recommendation method based on commodity property entropy | |
KR20150023432A (en) | Method and apparatus for inferring user demographics | |
US10970296B2 (en) | System and method for data mining and similarity estimation | |
CN111310038B (en) | Information recommendation method and device, electronic equipment and computer-readable storage medium | |
Liu et al. | Riding the tide of sentiment change: sentiment analysis with evolving online reviews | |
Xu et al. | Intelligent Classification and Personalized Recommendation of E-commerce Products Based on Machine Learning | |
Zheng et al. | A scalable purchase intention prediction system using extreme gradient boosting machines with browsing content entropy | |
CN113744019A (en) | Commodity recommendation method, commodity recommendation device, commodity recommendation equipment and storage medium | |
CN114841760B (en) | Advertisement recommendation management method and system based on audience behavior characteristic analysis | |
Wu et al. | [Retracted] Using the Mathematical Model on Precision Marketing with Online Transaction Data Computing | |
CN108416611B (en) | Supermarket path recommendation system and method thereof | |
CN115131108A (en) | E-commerce commodity screening system | |
Qiu | A predictive model for customer purchase behavior in e-commerce context | |
CN114912031A (en) | Mixed recommendation method and system based on clustering and collaborative filtering | |
Choeh et al. | Personalized Approach for Recommending Useful Product Reviews Based on Information Gain. | |
Jouyandeh et al. | IPARS: An image-based personalized advertisement recommendation system on social networks | |
Christodoulou et al. | Leveraging Natural Language Processing in Persuasive Marketing | |
Liu et al. | Improving Recommendation Accuracy by Considering Electronic Word-of-Mouth and the Effects of Its Propagation Using Collective Matrix Factorization | |
Alkhatib et al. | Image Process Based Recommender System for Social Media Marketing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||