CN108228771A - An algorithm based on user tags - Google Patents
- Publication number
- CN108228771A (application number CN201711452260.2A)
- Authority
- CN
- China
- Prior art keywords
- user
- labels
- label
- messages
- extracting
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an algorithm based on user tags. User information is obtained from the Herpink platform and analyzed quantitatively; the analysis takes as its criteria the number of accounts the user follows, the user's number of fans, and the messages the user has published. Based on the analysis results, corresponding tag recommendation methods are proposed for user groups with different characteristics. This gives users greater influence and helps them choose better, preferred accounts to follow, which in turn can increase the user's value to their fans.
Description
Technical Field
The invention combines user tags with user interests to build a user interest model and a description of user attributes, and uses them for personalized recommendation: chiefly, recommending tags a user is likely to be interested in, or recommending other users the user is likely to be interested in.
Background
The degree of association between a user and a tag has to be calculated over the tags of users with similar interests or of the accounts the user follows, because not every tag attached to an interest point reflects the user's real interest. For example, in a user interest recommendation system, a user may comment on a tag of a food "big V" (a verified, high-profile account) with remarks such as "I like this dish", "it looks tasty", or "it is food". The system cannot conclude that this is the user's interest preference merely because the user applied the tag "dish". The association degree between user and tag therefore needs to be calculated to infer whether the tag really describes the user's interest preference. In a tag system, the less often a tag appears in the system overall but the more often a particular user uses it, the more likely the tag is to describe that user's interest preference. This matches the core idea of the traditional TF-IDF algorithm, so that algorithm is adopted when calculating the association degree between user and tag.

The method clusters the tags the user has used with a similarity-based clustering method and describes each of the user's interests with one cluster of tags. The specific steps are: calculate the similarity between all tags the user has used; then cluster the tags according to a set threshold, producing several tag sets, each describing one of the user's interest points. The resulting overall interest model H_u of user u can be represented by a k-dimensional vector:

$$H_u = (\mathrm{interest}_1, \mathrm{interest}_2, \dots, \mathrm{interest}_k)$$

where k is the number of the user's interest points and interest_i is the weight of the i-th interest point. The weight can simply be taken as the total frequency of the tags contained in that interest point.
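The clustering step above can be sketched in Python (the tool named in claim 1). The patent does not say how tag similarity is measured or which clustering algorithm is used, so this sketch assumes that two tags are similar when the items they were applied to overlap (Jaccard similarity), and that tags linked at or above a threshold form one interest point (single-linkage grouping via union-find). The function names and the default threshold of 0.3 are hypothetical. Each interest point's weight is the total frequency of its tags, as described above.

```python
from collections import Counter
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_interest_model(tag_uses, threshold=0.3):
    """Cluster one user's tags into interest points (hypothetical helper).

    tag_uses: list of (item_id, tag) pairs, one per time the user applied a tag.
    Assumption: tags are linked when the Jaccard similarity of the item sets
    they were applied to is >= threshold; each connected group is one interest point.
    Returns [(tag_set, weight), ...] sorted by weight, weight = total tag frequency.
    """
    freq = Counter(tag for _, tag in tag_uses)
    items_of = {}
    for item, tag in tag_uses:
        items_of.setdefault(tag, set()).add(item)

    # Union-find over tags, with path halving in find().
    parent = {t: t for t in freq}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    # Link every pair of tags whose similarity reaches the threshold.
    for a, b in combinations(sorted(freq), 2):
        if jaccard(items_of[a], items_of[b]) >= threshold:
            parent[find(a)] = find(b)

    # Collect the connected groups: these are the interest points.
    clusters = {}
    for t in freq:
        clusters.setdefault(find(t), set()).add(t)

    model = [(tags, sum(freq[t] for t in tags)) for tags in clusters.values()]
    model.sort(key=lambda pair: -pair[1])
    return model
```

For example, if a user applied "food" and "dish" to the same two messages and "travel" to a third, the sketch yields two interest points, {food, dish} with weight 4 and {travel} with weight 1, so H_u = (4, 1).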
Disclosure of Invention
Even within a single interest category, a user can show different interest characteristics, so for better recommendations the user-tag association degree must be calculated for each specific interest category. Combining the recommendation method proposed in this patent with the TF-IDF concept, the association degree is calculated by finding, for a given interest category of user A, the tag t that best describes that category, i.e. by calculating the association degree rel(i, t) between interest category i and tag t. The steps are as follows:
Following the TF-IDF method, the association degree rel(i, t) between user interest i and tag t is calculated using the following definitions:
TAGS: the set of all tags under a given interest category of the user;
i: the set representing a user interest;
the number of times item i is marked with tag t is the basic count from which the term frequency TF(i, t) is computed.
The association degree is given by formula (1):

$$\mathrm{rel}(i, t) = \mathrm{TF}(i, t) \times \mathrm{IDF}(t) \tag{1}$$
where formula (2) expresses how frequently tags are used under a given interest category of the user; a higher value indicates that tag t is used more frequently with user interest i.
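The text does not reproduce the TF and IDF formulas, so the following Python sketch assumes the standard definitions: TF(i, t) is the count of tag t within interest category i divided by the total tag count in that category, and IDF(t) = log(N / df(t)), where N is the number of interest categories and df(t) is how many of them contain t. The function name rel_scores and its inputs are hypothetical.

```python
import math
from collections import Counter


def rel_scores(category_tags, all_categories):
    """rel(i, t) = TF(i, t) * IDF(t) for every tag t in interest category i.

    category_tags: tags applied within category i (repeats count as separate uses).
    all_categories: one tag list per interest category; must include category_tags
        itself, so that df(t) >= 1 for every tag being scored.
    Assumed definitions (not spelled out in the patent text):
        TF(i, t) = n(i, t) / total number of tag uses in category i
        IDF(t)   = log(N / df(t)), N = number of categories,
                   df(t) = number of categories containing t
    """
    counts = Counter(category_tags)
    total = sum(counts.values())
    n_categories = len(all_categories)
    # Document frequency: count each tag at most once per category.
    df = Counter(t for tags in all_categories for t in set(tags))
    return {
        t: (n / total) * math.log(n_categories / df[t])
        for t, n in counts.items()
    }
```

For example, with category tags ["dish", "dish", "dish", "food"] and a second category ["food", "beach"], rel for "dish" is 0.75 × log 2 ≈ 0.52 while rel for "food" is 0: under these assumptions, a tag the user applies often but that is rare across categories outranks one that appears everywhere, which is the behavior the Background section motivates.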
Drawings
Fig. 1 is an architecture diagram of the user-tag-based algorithm according to an exemplary embodiment of the present application.
Detailed Description
1. Process the data set. Clean special characters such as question marks and double quotation marks out of the tag data so that the tags remain readable. To reduce data sparsity, keep users who have more tags, follow more accounts, or post more messages, and filter out users with few tags: users whose message count is below the threshold of 20 are called inactive users and are filtered out. Tags from users similar to the target user are then used for prediction. From the system's behavior records for each user, generate a data set of specified tags and user comments in a fixed format. Process the generated data set as required and split it into M parts according to the required rule, using M-1 parts as the training set and the remaining part as the test set. Train the recommendation algorithm on the M-1 training parts and test it on the test part, selecting a different test part each time, for M tests in total. Obtain a prediction result on each test set with a defined evaluation metric, and take the average over the M tests as the final prediction result.
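The preprocessing and evaluation procedure above can be sketched in Python. Several details are assumptions, not taken from the patent: the exact set of special characters, reading the inactivity rule as "keep users with at least 20 messages", splitting the records round-robin into M parts, and train_and_score, a hypothetical callback that trains the recommender and returns the evaluation metric for one test part.

```python
import re
from statistics import mean

# Assumption: the "special characters" are ASCII/full-width question marks and quotes.
SPECIAL_CHARS = re.compile(r'[?？"“”]')
# Threshold from the text; treating 20 messages as still active is an assumption.
MIN_MESSAGES = 20


def clean_tag(tag):
    """Strip special characters so the tag stays readable."""
    return SPECIAL_CHARS.sub("", tag).strip()


def filter_active(users):
    """Drop inactive users (fewer than MIN_MESSAGES messages)."""
    return [u for u in users if u["messages"] >= MIN_MESSAGES]


def m_fold_evaluate(records, m, train_and_score):
    """Split records into m parts; train on m-1 parts and test on the rest, m times.

    train_and_score(train, test) is a hypothetical callback returning a metric.
    Returns the mean of the m metric values.
    """
    folds = [records[k::m] for k in range(m)]  # assumed round-robin split rule
    scores = []
    for k in range(m):
        test = folds[k]
        train = [r for j, fold in enumerate(folds) if j != k for r in fold]
        scores.append(train_and_score(train, test))
    return mean(scores)
```

A round-robin split keeps the M parts the same size (to within one record); a randomized split, shuffling the records first, would be an equally valid reading of "according to a required rule".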
Claims (2)
1. An algorithm based on user tags, comprising the following steps:
(1) acquiring from the user information the corresponding data, such as the UID (user identifier), the user's real name or nickname, user tags, gender, number of fans, number of followed accounts, number of messages, creation time, and other basic attributes of the user;
(2) extracting the data with the Python data processing tool, taking as criteria the number of accounts the user follows, the user's number of fans, and the messages the user has published;
(3) performing feature analysis on the user data and filtering out inactive users, i.e. users whose message count is below the threshold of 20;
(4) finally, since the accounts a user follows reflect the user's characteristics and interests, extracting the tags of those followed accounts as original tags, extracting potential tags from the user's published messages and adding them to the candidate tags as a supplement, screening out the tags with higher frequency, and recommending the highest-frequency results to the user.
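Step (4) of claim 1 can be sketched as follows. The patent does not say how potential tags are extracted from messages; this sketch assumes a message word is a potential tag when it appears in a known tag vocabulary, uses simple whitespace tokenisation (Chinese text would need a word segmenter), and treats "higher frequency" as a hypothetical min_count cutoff. The function names are illustrative.

```python
from collections import Counter


def build_candidates(followee_tags, messages, tag_vocabulary):
    """Original tags from followed accounts plus potential tags from messages.

    followee_tags: one list of tags per account the user follows.
    messages: the user's published messages.
    tag_vocabulary: known tags; assumption: a message word is a potential tag
        only if it appears in this vocabulary.
    Repeats are kept so that frequency can be screened afterwards.
    """
    original = [tag for tags in followee_tags for tag in tags]
    potential = [
        word
        for text in messages
        for word in text.lower().split()  # assumption: whitespace tokenisation
        if word in tag_vocabulary
    ]
    return original + potential


def screen_candidates(candidates, min_count=2):
    """Keep tags occurring at least min_count times (hypothetical cutoff)."""
    counts = Counter(candidates)
    return sorted(tag for tag, c in counts.items() if c >= min_count)
```

For instance, followed-account tags [["food", "travel"], ["food"], ["music"]] plus the messages "Great food today" and "New music album" give "food" three occurrences and "music" two, so with min_count=2 the screened tags are ["food", "music"].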
2. The method of claim 1, wherein the specific algorithm comprises the following steps:
(1) extracting all tags of the accounts the user follows;
(2) calculating a weight for every collected tag, where the weight of a tag is the number of times it appears among all collected tags;
(3) ranking all tags by number of occurrences and giving recommendation results in descending order of rank.
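Claim 2's weighting and ranking reduce to counting occurrences and sorting. A minimal Python sketch, assuming ties are broken alphabetically (the claim does not specify tie handling); rank_tags is a hypothetical name:

```python
from collections import Counter


def rank_tags(collected_tags):
    """Weight each tag by its occurrence count and rank from high to low.

    Ties are broken alphabetically (an assumption; the claim is silent on ties).
    Returns a list of (tag, weight) pairs.
    """
    weights = Counter(collected_tags)
    return sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
```

For example, rank_tags(["food", "music", "food", "travel", "music", "food"]) returns [("food", 3), ("music", 2), ("travel", 1)]. An explicit sort key is used instead of Counter.most_common so that the order of tied tags does not depend on insertion order.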
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711452260.2A CN108228771A (en) | 2017-12-26 | 2017-12-26 | An algorithm based on user tags |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711452260.2A CN108228771A (en) | 2017-12-26 | 2017-12-26 | An algorithm based on user tags |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108228771A true CN108228771A (en) | 2018-06-29 |
Family ID: 62648184
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711452260.2A Pending CN108228771A (en) | 2017-12-26 | 2017-12-26 | An algorithm based on user tags |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108228771A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102929892A (en) * | 2011-08-12 | 2013-02-13 | 莫润刚 (Mo Rungang) | Accurate information promoting system and method based on social network |
CN103377262A (en) * | 2012-04-28 | 2013-10-30 | 国际商业机器公司 (International Business Machines Corporation) | Method and device for grouping users |
US20150149469A1 (en) * | 2012-06-14 | 2015-05-28 | Nokia Corporation | Methods and apparatus for associating interest tags with media items based on social diffusions among users |
Non-Patent Citations (1)
Title |
---|
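陈渊 (Chen Yuan) et al.: "一种面向微博用户的标签推荐方法" (A tag recommendation method for microblog users), 《智能计算机与应用》 (Intelligent Computer and Applications) * |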
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108427769A (en) * | 2018-03-29 | 2018-08-21 | 苏州大学 (Soochow University) | A method for extracting personal interest tags based on social networks |
CN108427769B (en) * | 2018-03-29 | 2021-10-08 | 苏州大学 (Soochow University) | Character interest tag extraction method based on social network |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Mostafa | Clustering halal food consumers: A Twitter sentiment analysis | |
CN107424043B (en) | Product recommendation method and device and electronic equipment | |
CN109299994B (en) | Recommendation method, device, equipment and readable storage medium | |
US10198635B2 (en) | Systems and methods for associating an image with a business venue by using visually-relevant and business-aware semantics | |
CN107346496B (en) | Target user orientation method and device | |
CN108346075B (en) | Information recommendation method and device | |
CN104281622A (en) | Information recommending method and information recommending device in social media | |
US20210264463A1 (en) | Creating Meta-Descriptors of Marketing Messages to Facilitate In Delivery Performance Analysis, Delivery Performance Prediction and Offer Selection | |
US20190220902A1 (en) | Information analysis apparatus, information analysis method, and information analysis program | |
CN108805598B (en) | Similarity information determination method, server and computer-readable storage medium | |
CN107291755B (en) | Terminal pushing method and device | |
Mukherjee et al. | Read what you need: Controllable aspect-based opinion summarization of tourist reviews | |
JP7210958B2 (en) | Product recommendation device and program | |
CN113256397A (en) | Commodity recommendation method and system based on big data and computer-readable storage medium | |
KR102593996B1 (en) | Apparatus and Method For Providing Recommendation Service Through Social Media Activity Analysis | |
JP2013246747A (en) | Program and campaign management device | |
KR20160107079A (en) | User customized product recommendation apparatus and method based on web activity of users | |
CN109522487A (en) | A kind of dining room personalized recommendation method based on comment | |
CN108228771A (en) | One kind is based on user tag algorithm | |
KR102078541B1 (en) | Issue interest based news value evaluation apparatus and method, storage media storing the same | |
KR101754124B1 (en) | The restaurant recommending system and the recommending method thereof | |
CN111651590A (en) | Data processing method and device, electronic equipment and storage medium | |
JP6696270B2 (en) | Information providing server device, program and information providing method | |
CN111178934B (en) | Method and device for acquiring target object | |
CN114519100A (en) | Catering data analysis method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20180629 |