CN108228771A

CN108228771A - A user-tag-based algorithm

Info

Publication number: CN108228771A
Application number: CN201711452260.2A
Authority: CN
Inventors: 万迅
Original assignee: Ai Pink Technology (wuhan) Ltd By Share Ltd
Current assignee: Ai Pink Technology (wuhan) Ltd By Share Ltd
Priority date: 2017-12-26
Filing date: 2017-12-26
Publication date: 2018-06-29

Abstract

The invention discloses a user-tag-based algorithm. User information is obtained from the Herpink platform and analyzed quantitatively, taking as criteria the number of users followed, the user's number of followers, and the messages published. Based on the analysis results, a corresponding tag recommendation method is proposed for user groups with different characteristics. This gives users greater influence and gives followers better, preferred accounts to follow, thereby increasing the value delivered to users.

Description

A user-tag-based algorithm

Technical Field

The invention combines user tags with user interests, establishes a user interest model and a user attribute description, and makes personalized interest recommendations for the user, chiefly by recommending tags of interest or users of interest.

Background

The degree of association between a user and a tag is calculated over the tags of users who share similar interest points or whom the user follows, because not every tag attached to an interest point reflects the user's real interests. For example, in a user interest recommendation system, a user may tag a well-known food account (a "big V") simply to express a personal rating, such as "I like this dish", "it looks tasty", or "food". The system should not conclude that this is an interest preference of the user merely because the user applied the tag "dish". The degree of association between the user and the tag therefore needs to be calculated to determine whether the tag really describes the user's interest preferences.

In a tag system, the less often a tag appears across the system and the more often a given user uses it, the more likely the tag is to describe that user's interest preferences. This matches the core idea of the traditional TF-IDF algorithm, so that algorithm is introduced when calculating the degree of association between user and tag.

The method clusters the tags used by a user with a similarity-based clustering method and describes each interest of the user by one cluster of tags. The specific steps are: calculate the similarity between all tags used by the user; then cluster the tags according to a set threshold to generate several tag sets, each describing one interest point of the user. The resulting overall interest model H_u of user u can be represented as a k-dimensional vector: H_u = (interest_1, interest_2, …, interest_k), where k is the number of interest points of the user and interest_i is the weight of the i-th interest point. The weight can simply be taken as the total frequency of the tags contained under that interest point.
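The clustering step and the interest vector H_u can be sketched in Python as follows. The source does not say which similarity measure or clustering procedure is used, so this sketch assumes Jaccard similarity over the sets of items each tag was applied to, and merges any two tags whose similarity reaches the threshold (single-link grouping). The function names and the sample data are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (an assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_tags(tag_items, threshold):
    """Group tags whose pairwise similarity reaches the threshold.

    tag_items maps each tag to the set of items the user applied it to.
    Tags are merged transitively (single-link) with a small union-find.
    """
    parent = {t: t for t in tag_items}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for a, b in combinations(sorted(tag_items), 2):
        if jaccard(tag_items[a], tag_items[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for t in sorted(tag_items):
        clusters.setdefault(find(t), set()).add(t)
    return list(clusters.values())


def interest_model(clusters, tag_counts):
    """H_u: one weight per interest point, the summed frequency of its tags."""
    return [sum(tag_counts[t] for t in cluster) for cluster in clusters]


# Hypothetical data: which posts each tag was applied to, and how often.
tag_items = {
    "noodles": {"p1", "p2"},
    "hotpot": {"p2", "p3"},
    "lipstick": {"p4"},
    "mascara": {"p4", "p5"},
}
tag_counts = {"noodles": 3, "hotpot": 2, "lipstick": 1, "mascara": 1}

clusters = cluster_tags(tag_items, threshold=0.3)
h_u = interest_model(clusters, tag_counts)
print(clusters, h_u)
```

With this data the food tags and the cosmetics tags fall into two separate interest points, and each weight is the total number of times the tags in that cluster were used.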

Disclosure of Invention

Different interest characteristics also exist within a single interest category of a user, so to make better recommendations the user-tag degree of association needs to be calculated for each specific interest category. Combining the recommendation method proposed in this patent with the TF-IDF concept, the user-tag degree of association is calculated as follows: under a given interest category of user A, find the tag t that best describes that category, that is, calculate the degree of association rel(i, t) between interest category i and tag t. The steps are as follows:

Following the idea of the TF-IDF method, the degree of association rel(i, t) between user interest i and tag t is calculated, with the following definitions:

TAGS: the set of all tags under a given interest category of a user;

i: a set representing user interests;

rel (i, t) indicates the number of times item i is marked with tag t.

The formula is shown in formula (1):

rel(i, t) = TF(i, t) × IDF(t)    (1)

Wherein,

Formula (2) gives the frequency with which tags are used under a given interest category of the user; a higher value indicates that tag t is used more frequently with user interest i.
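Formula (1) can be illustrated with a short Python sketch. The exact definitions of TF and IDF are not reproduced in the text, so the sketch assumes the common textbook forms: TF(i, t) is the share of tag t among all tag uses in interest category i, and IDF(t) is the logarithm of the number of interest categories divided by the number of categories in which t appears. The data layout and names are hypothetical.

```python
import math


def tf(counts, t):
    """TF(i, t): share of tag t among all tag uses in one interest category."""
    total = sum(counts.values())
    return counts.get(t, 0) / total if total else 0.0


def idf(all_counts, t):
    """IDF(t): log(number of categories / categories in which t appears)."""
    containing = sum(1 for c in all_counts.values() if c.get(t, 0) > 0)
    return math.log(len(all_counts) / containing) if containing else 0.0


def rel(all_counts, i, t):
    """Formula (1): rel(i, t) = TF(i, t) * IDF(t)."""
    return tf(all_counts[i], t) * idf(all_counts, t)


# Hypothetical tag-use counts per interest category.
all_counts = {
    "food": {"dish": 1, "hotpot": 4},
    "beauty": {"dish": 1, "lipstick": 3},
}

print(rel(all_counts, "food", "dish"))    # "dish" occurs in every category
print(rel(all_counts, "food", "hotpot"))  # "hotpot" is specific to "food"
```

Under these assumptions a tag such as "dish" that appears in every category receives an IDF of zero and so does not count as describing the interest, while a tag that is frequent within one category and rare elsewhere scores highest.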

Drawings

Fig. 1 is a schematic architecture diagram of the user-tag-based algorithm according to an exemplary embodiment of the present application.

Detailed Description

1. Process the data set. Special characters in the tag data, such as question marks and double quotation marks, are cleaned out to keep the tag data readable. To reduce data sparsity, users with more tags are selected, that is, users who follow more accounts or post more messages. Users with few tags, whose number of messages is less than 20 (not more than 20), are called inactive users and are filtered out. Similar tags are then used for the predicted target users.

A data set of specified tags and user comments is generated in a fixed format from the system's behavior record for each user. The generated data set is processed as required and cut into M parts according to the required rule, with M-1 parts used as the training set and the remaining part as the test set. The recommendation algorithm is trained on the M-1 training parts and tested on the test part, and a different part is selected as the test set each time so that M tests are performed. The prediction result on each test set is obtained with a predefined evaluation metric, and the average over the M tests is taken as the final prediction result.
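The preprocessing and the M-part evaluation described above resemble M-fold cross-validation and can be sketched in Python as follows. Several details are assumptions: the set of characters removed, the reading of the inactivity threshold as "fewer than 20 messages", a random shuffle as the splitting rule, and the `train_fn` and `score_fn` callables, which stand in for the recommendation algorithm and the evaluation metric that the text leaves unspecified.

```python
import random
import re


def clean_tag(tag):
    """Remove question marks and double quotes (ASCII and full-width forms)."""
    return re.sub(r'[?？"“”]', "", tag).strip()


def drop_inactive(users, min_messages=20):
    """Keep only users with at least min_messages messages (assumed rule)."""
    return [u for u in users if u["messages"] >= min_messages]


def m_fold_splits(records, m, seed=0):
    """Yield (train, test) pairs; each of the m parts is the test set once."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # assumed splitting rule
    folds = [shuffled[k::m] for k in range(m)]
    for k in range(m):
        train = [r for j, fold in enumerate(folds) if j != k for r in fold]
        yield train, folds[k]


def cross_validate(records, m, train_fn, score_fn):
    """Average a score over the m rounds of training and testing."""
    scores = []
    for train, test in m_fold_splits(records, m):
        model = train_fn(train)                # hypothetical trainer
        scores.append(score_fn(model, test))   # hypothetical metric
    return sum(scores) / m


# Hypothetical usage with toy data.
users = [{"uid": 1, "messages": 5}, {"uid": 2, "messages": 20}]
active = drop_inactive(users)
splits = list(m_fold_splits(range(10), m=5))
```

Each record lands in exactly one test part, so every observation is used for testing once and for training M-1 times before the scores are averaged.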

Claims

1. A user-tag-based algorithm, comprising the following steps:

(1) acquiring, from the user information, the corresponding data, such as the UID (user identification), the user's real name or nickname, user tags, gender, number of followers, number of users followed, number of messages, creation time, and the user's basic attributes;

(2) extracting the data with the data processing tool Python, taking as criteria the number of users followed, the number of the user's followers, and the messages published;

(3) performing feature analysis on the user data, and filtering out inactive users, namely those whose number of messages is less than 20 (not more than 20);

(4) finally, since the accounts a user follows represent the user's characteristics and interests, extracting the tags of the accounts the user follows as original tags, extracting potential tags from the published messages and adding them to the candidate tags as a supplement, screening for the tags with higher frequency, and recommending the higher-frequency results to the user.

2. The user-tag-based algorithm of claim 1, wherein the specific algorithm comprises the following steps:

(1) extracting all tags of the accounts the user follows;

(2) calculating a weight for every collected tag, wherein the weight of each tag is the number of times that the tag appears among all the tags;

(3) ranking all the tags for recommendation by number of occurrences, and giving the recommendation results in descending order of rank.
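The counting and ranking steps above can be illustrated with a minimal Python sketch using `collections.Counter`. It assumes that the tags of followed accounts and the potential tags from messages have already been extracted as lists of strings; how potential tags are derived from message text is not specified, and the function name and `top_n` parameter are hypothetical.

```python
from collections import Counter


def recommend_tags(followed_tags, message_tags=(), top_n=5):
    """Rank candidate tags by how often they occur, most frequent first.

    followed_tags: one list of tags per followed account (original tags).
    message_tags: potential tags taken from published messages (supplement).
    """
    counts = Counter()
    for tags in followed_tags:
        counts.update(tags)       # weight = number of occurrences
    counts.update(message_tags)
    return [tag for tag, _ in counts.most_common(top_n)]


# Hypothetical data: tags of three followed accounts plus one message tag.
followed = [["food", "travel"], ["food", "beauty"], ["food", "travel"]]
recommended = recommend_tags(followed, message_tags=["fitness"], top_n=2)
print(recommended)
```

Here "food" occurs three times and "travel" twice, so those two tags are recommended in that order.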