CN108228771A - A user tag based algorithm - Google Patents

A user tag based algorithm

Info

Publication number
CN108228771A
Authority
CN
China
Prior art keywords
user
labels
label
messages
extracting
Legal status
Pending
Application number
CN201711452260.2A
Other languages
Chinese (zh)
Inventor
万迅 (Wan Xun)
Current Assignee
Ai Pink Technology (wuhan) Ltd By Share Ltd
Original Assignee
Ai Pink Technology (wuhan) Ltd By Share Ltd
Priority date
2017-12-26
Filing date
2017-12-26
Publication date
2018-06-29
Application filed by Ai Pink Technology (wuhan) Ltd By Share Ltd
Priority to CN201711452260.2A
Publication of CN108228771A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a user tag based algorithm. User information is obtained from the Herpink platform and analyzed quantitatively, using as criteria the number of accounts the user follows, the user's number of fans, and the messages the user has published. Based on the analysis results, corresponding tag recommendation methods are proposed for user groups with different characteristics. This helps users gain influence and helps fans choose better, preferred accounts to follow, thereby increasing the value of users.

Description

User label algorithm
Technical Field
The invention combines user tags with user interests to build a user interest model and a description of user attributes, and uses them to make personalized interest recommendations, chiefly recommending tags of interest to a user or recommending users of interest to a user.
Background
Among the tags of users who share similar interest points or whom the user follows, the degree of association between the user and each tag must be calculated, because not every tag attached to an interest point reflects the user's real interest. For example, in a user interest recommendation system, a user may tag a food "big V" (a verified influencer account) with his or her own evaluation, such as "I like the dish", "looks tasty", or "delicious food". The system cannot, however, treat "dish" as an interest preference of the user merely because the user applied the tag "dish". The degree of association between the user and a tag therefore has to be calculated to infer whether the tag really describes the user's interest preferences.
In a tag system, the less frequently a tag appears across the whole system and the more frequently a given user uses it, the more likely the tag is to describe that user's interest preferences. This property matches the core idea of the traditional TF-IDF algorithm, so that algorithm is introduced when calculating the degree of association between the user and a tag.
The method clusters the tags a user has used with a similarity-based clustering method and describes each of the user's interests by one cluster of tags. The specific steps are: calculate the similarity between all pairs of tags used by the user; then cluster the tags according to a set threshold, producing several tag sets, each describing one of the user's interest points. The resulting overall interest model $H_u$ of user u can be represented as a k-dimensional vector:
$$H_u = (\mathit{interest}_1, \mathit{interest}_2, \ldots, \mathit{interest}_k)$$
where k is the number of the user's interest points and $\mathit{interest}_i$ is the weight of the i-th interest point. The weight can simply be taken as the total frequency of the tags contained in that interest point.
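A minimal sketch of the clustering step and of $H_u$ follows. The text does not name the similarity measure or the clustering procedure, so this sketch assumes Jaccard similarity between the sets of items each tag was applied to, and links any two tags whose similarity reaches the threshold (single-linkage: each connected group of tags is one interest point). The names `build_interest_model`, `tag_items`, `tag_counts`, and the default threshold of 0.3 are hypothetical.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two item sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_interest_model(tag_items: dict, tag_counts: dict, threshold: float = 0.3):
    """Cluster one user's tags; return (clusters, H_u).

    tag_items:  tag -> set of items the user marked with that tag (hypothetical)
    tag_counts: tag -> number of times the user used that tag
    threshold:  hypothetical cut-off; tag pairs at or above it are linked
    """
    tags = sorted(tag_items)
    parent = {t: t for t in tags}  # union-find forest over the tags

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    # Step 1: pairwise similarity; step 2: link pairs that reach the threshold
    for a, b in combinations(tags, 2):
        if jaccard(tag_items[a], tag_items[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for t in tags:
        groups.setdefault(find(t), []).append(t)
    clusters = sorted(groups.values())  # each cluster is one interest point
    # interest_k = total usage frequency of the tags in cluster k
    h_u = [sum(tag_counts.get(t, 0) for t in c) for c in clusters]
    return clusters, h_u
```

For example, with `tag_items = {"food": {"a", "b", "c"}, "dish": {"a", "b"}, "travel": {"x", "y"}, "hiking": {"y", "z"}}` and `tag_counts = {"food": 5, "dish": 2, "travel": 3, "hiking": 1}`, the sketch yields two interest points, `[["dish", "food"], ["hiking", "travel"]]`, and $H_u$ = `[7, 4]`. A different similarity measure or a proper hierarchical clustering could be swapped in without changing how $H_u$ is built.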
Disclosure of Invention
A user may show different interest characteristics even within one interest category, so to make better recommendations the degree of association between the user and a tag has to be calculated for each specific interest category. Combining the recommendation method of this patent with the TF-IDF concept, this degree is calculated by finding, under a given interest category of user A, the tag t that best describes that category, i.e., by calculating the degree of relation rel(i, t) between interest category i and tag t. The steps are as follows:
Following the idea of the TF-IDF method, the degree of relation rel(i, t) between user interest i and tag t is calculated, with the following definitions:
TAGS: the set of all tags under one interest category of the user;
I: the set of the user's interests;
rel(i, t): the number of times item i is marked with tag t.
The calculation is given by formula (1):
$$\mathrm{rel}(i, t) = \mathrm{TF}(i, t) \times \mathrm{IDF}(t) \tag{1}$$
Wherein,
Formula (2) expresses how frequently tags are used under a given interest category of the user: the higher its value, the more often tag t is used in connection with user interest i.
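The text does not give explicit forms for TF and IDF. A minimal sketch, assuming the textbook TF-IDF definitions with n(i, t) the number of times tag t is used under interest i: TF(i, t) = n(i, t) / Σ_{t' ∈ TAGS} n(i, t') and IDF(t) = log(|I| / |{i ∈ I : n(i, t) > 0}|). Both forms, and the names `tag_relevance` and `best_tag`, are assumptions rather than the patent's own formulas.

```python
import math
from typing import Dict, Optional


def tag_relevance(counts: Dict[str, Dict[str, int]]) -> Dict[str, Dict[str, float]]:
    """rel(i, t) = TF(i, t) * IDF(t) for every interest i and tag t, formula (1).

    counts[i][t] = n(i, t), the number of times tag t is used under interest i.
    """
    num_interests = len(counts)  # |I|
    # Number of interests in which each tag occurs at least once
    df: Dict[str, int] = {}
    for tag_counts in counts.values():
        for t, n in tag_counts.items():
            if n > 0:
                df[t] = df.get(t, 0) + 1

    rel: Dict[str, Dict[str, float]] = {}
    for i, tag_counts in counts.items():
        total = sum(tag_counts.values())  # sum of n(i, t') over t' in TAGS
        rel[i] = {}
        for t, n in tag_counts.items():
            if n <= 0:
                continue
            tf = n / total                          # assumed form of formula (2)
            idf = math.log(num_interests / df[t])   # assumed IDF form
            rel[i][t] = tf * idf
    return rel


def best_tag(rel: Dict[str, Dict[str, float]], interest: str) -> Optional[str]:
    """Tag t with the highest rel(interest, t), or None if there is none."""
    scores = rel.get(interest, {})
    return max(scores, key=scores.get) if scores else None
```

With `counts = {"food": {"dish": 6, "travel": 1, "photo": 3}, "travel": {"hiking": 4, "photo": 4}, "music": {"guitar": 5, "photo": 1}}`, `best_tag(tag_relevance(counts), "food")` returns `"dish"`. The tag `"photo"` occurs under every interest, so its IDF is log(1) = 0 and it never wins, which mirrors the Background's point that tags common across the whole system say little about one specific interest.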
Drawings
Fig. 1 is an architecture diagram of the user tag based algorithm according to an exemplary embodiment of the present application.
Detailed Description
1. Process the data set. Clean special characters, such as question marks and double quotation marks, out of the tag data so that the tags remain readable. To reduce data sparsity, prefer users who have more tags, follow more accounts, or post more messages: users with few tags are filtered out, and in particular users with fewer than 20 messages are called inactive users and are removed. Similar tags are then used for the target users to be predicted. From the system's behavior record for each user, generate a data set of specified tags and user comments in a given format. Process the generated data set as required and split it into M parts according to the required rule, using M-1 parts as the training set and the remaining part as the test set. Train the recommendation algorithm on the M-1 training parts and test it on the remaining part, choosing a different part as the test set each time, for M tests in total. Compute a prediction result on each test set with a well-defined evaluation metric, and take the average over the M tests as the final prediction result.
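The cleaning and evaluation protocol described above (strip special characters from tags, then M-fold cross-validation with the M scores averaged) can be sketched as follows. This is a minimal sketch under assumptions: the exact special-character set, the random split standing in for the unspecified "required rule", and the caller-supplied `train_fn` and `eval_fn` callables are all hypothetical.

```python
import random
import re
from typing import Callable, Iterator, Tuple

# Assumed set of "special characters": ASCII and full-width question marks,
# straight and curly double quotes. The source only names examples.
SPECIAL_CHARS = re.compile(r'[?？"“”]')


def clean_tag(tag: str) -> str:
    """Remove special characters and surrounding whitespace from a tag."""
    return SPECIAL_CHARS.sub("", tag).strip()


def m_fold_splits(records: list, m: int, seed: int = 0) -> Iterator[Tuple[list, list]]:
    """Split records into M parts; yield (train, test) with each part as test once."""
    if m < 2 or m > len(records):
        raise ValueError("m must be between 2 and len(records)")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # hypothetical "required rule": random split
    folds = [shuffled[k::m] for k in range(m)]
    for k in range(m):
        train = [r for j, fold in enumerate(folds) if j != k for r in fold]
        yield train, folds[k]


def cross_validate(records: list, m: int,
                   train_fn: Callable[[list], object],
                   eval_fn: Callable[[object, list], float]) -> float:
    """Train on M-1 parts, score on the held-out part, return the mean of M scores."""
    scores = []
    for train, test in m_fold_splits(records, m):
        model = train_fn(train)
        scores.append(eval_fn(model, test))
    return sum(scores) / len(scores)
```

A fixed seed keeps the split reproducible across runs, so different recommendation algorithms can be compared on identical folds.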

Claims (2)

1. A user tag based algorithm, comprising the steps of:
(1) acquiring from the user information data such as the user's UID (user identification), real name or nickname, user tags, gender, number of fans, number of followed accounts, number of messages, creation time, and other basic attributes;
(2) extracting the data with the data processing tool Python, taking as criteria the number of accounts the user follows, the user's number of fans, and the messages the user has published;
(3) performing feature analysis on the user data and filtering out inactive users, i.e., users with fewer than 20 messages;
(4) finally, since the accounts a user follows reflect the user's characteristics and interests, extracting the tags of the followed accounts as original tags, extracting potential tags from the messages the user has published and adding them to the candidate tags as a supplement, selecting the tags with higher frequency, and recommending these higher-frequency results to the user.
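Steps (3) and (4) of claim 1 can be sketched as below. This is a minimal sketch, assuming that potential tags are found by naive substring matching of a known tag vocabulary against message text; the claim does not say how potential tags are extracted. The `User` record, `tags_by_uid`, `vocabulary`, and the `min_freq` cut-off used for "screening the tags with higher frequency" are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field

MIN_MESSAGES = 20  # step (3): users with fewer messages count as inactive


@dataclass
class User:
    uid: str
    message_count: int
    followees: list = field(default_factory=list)  # UIDs of followed accounts
    messages: list = field(default_factory=list)   # texts of published messages


def active_users(users: list) -> list:
    """Step (3): keep only users with at least MIN_MESSAGES messages."""
    return [u for u in users if u.message_count >= MIN_MESSAGES]


def candidate_tags(user: User, tags_by_uid: dict, vocabulary: set) -> Counter:
    """Step (4): original tags of followed accounts plus potential tags from messages.

    tags_by_uid (account UID -> its tags) and vocabulary (known tags to look
    for in message text) are hypothetical inputs.
    """
    counts = Counter()
    for followee in user.followees:          # original tags
        counts.update(tags_by_uid.get(followee, []))
    for text in user.messages:               # potential tags, naive substring match
        counts.update(tag for tag in vocabulary if tag in text)
    return counts


def screen_tags(counts: Counter, min_freq: int = 2) -> list:
    """Keep higher-frequency candidates, most frequent first (min_freq is hypothetical)."""
    kept = [t for t, n in counts.items() if n >= min_freq]
    return sorted(kept, key=lambda t: (-counts[t], t))
```

Each message contributes a vocabulary tag at most once, so one long message repeating a word cannot dominate the tags inherited from followed accounts.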
2. The method of claim 1, wherein the specific algorithm comprises the following steps:
(1) extracting the tags of all accounts the user follows;
(2) calculating a weight for every collected tag, where the weight of a tag is the number of times it appears among all collected tags;
(3) ranking all tags for recommendation by number of occurrences and giving the recommendation results in order from highest to lowest.
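The three steps of claim 2 reduce to counting and sorting. A minimal sketch, assuming the tags of each followed account arrive as a list of strings; the function name and the alphabetical tie-break are assumptions, since the claim does not say how equal weights are ordered.

```python
from collections import Counter


def rank_followee_tags(followee_tags: list) -> list:
    """Claim 2, steps (1)-(3), as a sketch.

    followee_tags: one list of tags per account the user follows (assumed input).
    Returns (tag, weight) pairs ranked from highest to lowest weight.
    """
    # Step (1): collect the tags of every followed account
    all_tags = [tag for tags in followee_tags for tag in tags]
    # Step (2): a tag's weight is the number of times it occurs among all tags
    weights = Counter(all_tags)
    # Step (3): rank by weight, high to low; ties broken alphabetically (an assumption)
    return sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
```

For instance, `rank_followee_tags([["food", "travel"], ["food"], ["music", "travel"], ["food"]])` returns `[("food", 3), ("travel", 2), ("music", 1)]`.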

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711452260.2A CN108228771A (en) 2017-12-26 2017-12-26 A user tag based algorithm

Publications (1)

Publication Number Publication Date
CN108228771A (en) 2018-06-29

Family

ID=62648184

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711452260.2A Pending CN108228771A (en) 2017-12-26 2017-12-26 A user tag based algorithm

Country Status (1)

Country Link
CN (1) CN108228771A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102929892A (en) * 2011-08-12 2013-02-13 莫润刚 (Mo Rungang) Accurate information promoting system and method based on social network
CN103377262A (en) * 2012-04-28 2013-10-30 国际商业机器公司 (International Business Machines Corporation) Method and device for grouping users
US20150149469A1 (en) * 2012-06-14 2015-05-28 Nokia Corporation Methods and apparatus for associating interest tags with media items based on social diffusions among users

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
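陈渊 (Chen Yuan) et al.: "一种面向微博用户的标签推荐方法" ("A tag recommendation method for microblog users"), 《智能计算机与应用》 (Intelligent Computer and Applications) *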

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108427769A (en) * 2018-03-29 2018-08-21 苏州大学 (Soochow University) A kind of personage's interest tags extracting method based on social networks
CN108427769B (en) * 2018-03-29 2021-10-08 苏州大学 (Soochow University) Character interest tag extraction method based on social network

Similar Documents

Publication Publication Date Title
Mostafa Clustering halal food consumers: A Twitter sentiment analysis
CN107424043B (en) Product recommendation method and device and electronic equipment
CN109299994B (en) Recommendation method, device, equipment and readable storage medium
US10198635B2 (en) Systems and methods for associating an image with a business venue by using visually-relevant and business-aware semantics
CN107346496B (en) Target user orientation method and device
CN108346075B (en) Information recommendation method and device
CN104281622A (en) Information recommending method and information recommending device in social media
US20210264463A1 (en) Creating Meta-Descriptors of Marketing Messages to Facilitate In Delivery Performance Analysis, Delivery Performance Prediction and Offer Selection
US20190220902A1 (en) Information analysis apparatus, information analysis method, and information analysis program
CN108805598B (en) Similarity information determination method, server and computer-readable storage medium
CN107291755B (en) Terminal pushing method and device
Mukherjee et al. Read what you need: Controllable aspect-based opinion summarization of tourist reviews
JP7210958B2 (en) Product recommendation device and program
CN113256397A (en) Commodity recommendation method and system based on big data and computer-readable storage medium
KR102593996B1 (en) Apparatus and Method For Providing Recommendation Service Through Social Media Activity Analysis
JP2013246747A (en) Program and campaign management device
KR20160107079A (en) User customized product recommendation apparatus and method based on web activity of users
CN109522487A (en) A kind of dining room personalized recommendation method based on comment
CN108228771A (en) A user tag based algorithm
KR102078541B1 (en) Issue interest based news value evaluation apparatus and method, storage media storing the same
KR101754124B1 (en) The restaurant recommending system and the recommending method thereof
CN111651590A (en) Data processing method and device, electronic equipment and storage medium
JP6696270B2 (en) Information providing server device, program and information providing method
CN111178934B (en) Method and device for acquiring target object
CN114519100A (en) Catering data analysis method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 2018-06-29