CN102262653A

CN102262653A - Label recommendation method and system based on user motivation orientation

Info

Publication number: CN102262653A
Application number: CN 201110154353
Authority: CN
Inventors: 李瑞轩; 靳延安; 文坤梅; 辜希武; 李玉华
Original assignee: Huazhong University of Science and Technology
Current assignee: Huazhong University of Science and Technology
Priority date: 2011-06-09
Filing date: 2011-06-09
Publication date: 2011-11-30
Anticipated expiration: 2031-06-09
Also published as: CN102262653B

Abstract

The invention provides a label recommendation method based on user motivation orientation. The method comprises the following steps: calculating the motivation orientation of the user, of each labeled resource and of the resource to be labeled according to the user triples; selecting, from the labeled resources, resources whose motivation orientation is similar to that of the resource to be labeled, to obtain non-user-dependent similar resources; selecting, from the non-user-dependent similar resources, resources whose motivation orientation is similar to that of the user, to obtain label recommendation candidate resources; merging all labels in the label recommendation candidate resources to obtain a merged label set; calculating the recommendation importance of each label in the merged label set; and finally recommending labels in descending order of recommendation importance. The method can identify the user's motivation for labeling network information resources and then recommend to the user a list of multiple labels that matches the user's intention. The invention also provides a label recommendation system based on the method.

Description

Label recommendation method and system based on user motivation tendency

Technical Field

The invention belongs to the field of Web information resource processing and utilization, and particularly relates to a method for recommending a label for a Web information resource based on user motivation tendency and a recommendation system based on the method.

Background

With the development of the Internet, network information resources are growing at a speed that is hard to imagine, and the emergence of Web 2.0 has made this growth even more rapid. In Web 2.0, the Internet has changed from top-down centralized control dominated by a few resource controllers to a bottom-up system driven by the collective intelligence and strength of a large number of users. The user is not only a browser of network information resources but also a producer of them. Although user-created content in Web 2.0 enriches the sources of information and accelerates its diffusion, it also causes problems such as information overload, increased search burden and reduced information quality. How users can organize and manage the overwhelming mass of network information resources in a fine-grained way, and how to obtain suitable, high-quality information quickly, cheaply and effectively, has become an unavoidable major research topic.

An ideal organization of network information resources should be user-centered and make full use of emerging technologies and accumulated experience, and the organization system should be highly practical and usable. In the Web 2.0 environment, social label systems play an important role as a very effective method for organizing network information resources. As an organizational approach, unlike traditional top-down, rigid, controlled hierarchical classification systems, the social label system has three advantages: (1) social labels are generated when users label network resources, and identical social labels, once collected, form a new classification, which is built from the bottom up; (2) social labels are not controlled by experts, and users can label with any words, so social labels are extremely flexible, easy to use and reflect subjective cognition, and a network resource can flexibly belong to several popular classifications at once; (3) in a social label system, a user can label network resources from multiple dimensions and at multiple levels, so the structure is non-hierarchical.

However, while having many advantages, the label method also has disadvantages, mainly in the following two aspects: (1) most social label systems allow users to input labels freely. This makes labeling easy for the user to control, but the arbitrariness of labeling also fills the labels with noise, misspellings, ambiguity and user-defined labels of no practical significance, which hinders the practicability of the labels. For this reason, some social label systems have to give users explicit guidelines. (2) Data sparseness: label-based browsing is a new mode of information organization that has not yet been widely applied. Especially among Chinese-language resources, network resources organized in this way are very rare. In addition, users are not used to adding a large number of labels to network resources, so the label data currently available on the network is very sparse.

In recent years, driven by this practical demand, label recommendation technology has received a great deal of attention from academia and Internet enterprises. Label recommendation provides a series of high-quality labels as candidates for the network information resource to be labeled by investigating, analyzing and mining the content of network information resources, users' labeling histories and the explicit or implicit relations among users. The main purposes of recommendation are as follows: (1) to simplify the labeling procedure and make it easier for users, thereby increasing the usability and stickiness of the social label system; (2) to improve label quality, reduce misspellings, ambiguity and the like, and strengthen the role of labels in organizing, retrieving, utilizing and discovering information resources; (3) to change the structure of the label space so that it stabilizes and converges more quickly, which in turn lets semantics emerge.

At present, a number of mature social label recommendation systems for various network information resources exist in China and abroad, and they play very important roles in information resource organization, retrieval, sharing, discovery and the like. These systems include: Amazon, which recommends labels for commodities; Delicious, which recommends labels for webpage resources; Flickr, which recommends labels for pictures; Bibsonomy, which recommends labels for academic papers; Douban, which recommends labels for books and movies; Tudou, which provides recommended labels for video sharing; and so on. Existing label recommendation systems mainly adopt the traditional techniques used for recommending commodities in e-commerce systems, chiefly content-based recommendation, collaborative filtering, association-rule-based recommendation, and hybrids of these techniques. These traditional techniques base their recommendations either on the content of the resource or on the user's labeling history, and most of the algorithms used come from data mining or machine learning. Traditional label recommendation techniques solve the problems of information overload and of organizing, classifying and retrieving information resources to a certain extent, but their effect is not ideal; in particular, they cannot recommend labels that meet the user's information needs.

Disclosure of Invention

In order to meet the information needs of users, the invention starts from users' motivation for using social label systems, identifies the users' information goals, and recommends more accurate social labels to them. The invention also provides a label recommendation system based on the method.

The invention is realized by adopting the following technical scheme: the invention provides a label recommendation method based on user motivation tendency, which comprises the following steps:

(1) calculating the motivation tendency of the user, the motivation tendency of each labeled resource and the motivation tendency of the resource to be labeled according to the user triples; the user triple comprises the labeling history of the user, the labeled resources and the corresponding labels as well as the resources to be labeled and the corresponding labels;

(2) selecting resources with similar motivation tendentiousness to resources to be labeled from the labeled resources, and calling the obtained resources as non-user-dependent similar resources;

(3) selecting resources similar to the motivation tendency of the user from the non-user-dependent similar resources, and calling the obtained resources as label recommendation candidate resources;

(4) merging all labels in the label recommendation candidate resources to obtain a merged label set;

(5) calculating the recommendation importance of each label in the combined label set;

(6) and recommending the labels according to the recommendation importance of each label from large to small.

The invention also provides a label recommendation system based on the user motivation tendency, which comprises a motivation tendency calculation module, a module for selecting non-user-dependent similar resources, a module for selecting label recommendation candidate resources, a label combination module, a recommendation importance calculation module and an output module;

the motivation tendency calculation module is used for calculating motivation tendency of the user, motivation tendency of each marked resource and motivation tendency of the resource to be marked;

the non-user-dependent similar resource selecting module is used for selecting resources similar to the motivation tendency of the resources to be labeled from the labeled resources to obtain the non-user-dependent similar resources;

the label recommendation candidate resource selection module is used for selecting resources similar to the motivation tendency of the user from the non-user-dependent similar resources to obtain label recommendation candidate resources;

the label merging module is used for merging all labels in the label recommendation candidate resources to obtain a merged label set;

the recommendation importance calculating module is used for calculating the recommendation importance of each label in the combined label set;

the output module is used for recommending the labels according to the recommendation importance of each label from large to small.

Conventional label recommendation methods in social label systems mainly rely on the content of the resource, the co-occurrence structure of labels, and the like. The method provided by the invention instead starts directly from the user's relatively stable labeling motivation tendency: it obtains that tendency and recommends labels according to it, so the recommended labels better match the user's intention and the recommendation effect is better. The method can identify the user's motivation for labeling network information resources. The discovered motivation provides a good design reference for label recommendation systems and can guide ontology learning in the label space, which further benefits the stability of the social label structure and the emergence of social label semantics.

Drawings

FIG. 1 is a flow of tag recommendation based on user motivational tendencies;

FIG. 2 is a schematic diagram of a special tag usage query in accordance with the present invention;

FIG. 3 is a tag cloud of a user with a description motivation tendency according to the present invention;

FIG. 4 is a block diagram of a tag recommendation system according to the present invention.

Detailed Description

The present invention is described in further detail below with reference to the attached drawings and examples.

There are two main categories of motivational tendency described in the present invention, namely, categorical motivational tendency and descriptive motivational tendency, and their characteristics are shown in table 1.

TABLE 1 Characteristics of classification motivation tendency and description motivation tendency

	Classification motivation tendency	Description motivation tendency
Purpose	Facilitating later browsing	Facilitating later query and retrieval
Resource label rate	Low	High
Vocabulary size	Limited	Unlimited
Occurrence of synonyms	Few	Many
Tags from resource titles	Few	Many
Cost of changing a tag	High	Low

Specifically, a labeling user with a classification motivation tendency uses tags to support later browsing of the tagged resources. Such users therefore wish to build a stable vocabulary according to their own preferences. For easy browsing, the simpler and less redundant the vocabulary the better, so the labeling user avoids words with the same meaning and selects words that are easy to understand and remember. For example, when labeling a car, often only "car" appears in the vocabulary of a user with a classification motivation tendency, and words with the same meaning such as "automobile" and "vehicle" are not used. Judging from the labeling results, such a vocabulary closely resembles a semantic classification system. As with conventional classification systems, the cost of modifying it is relatively high.

A labeling user with a description motivation tendency uses tags to describe the content of the tagged resource accurately for later query and retrieval. To better support query and retrieval, the user's vocabulary may include many uncommon words and synonyms; for example, "car", "automobile" and "vehicle" may all appear in the vocabulary when describing a car. In addition, such users often wish to describe resources from many angles and do not limit the number of words used. Words with the same meaning may also change as the user's understanding develops during labeling. Thus, the vocabulary of a user with a description motivation tendency is open and dynamic.

In the invention, the following symbols denote the relevant parameters of the social label system: u represents a user; r represents a resource, such as a webpage; U represents the set of all users u who have labeled the resource r; R represents the set of all resources labeled by the users; R_u represents all resources that user u has tagged, and |R_u| represents the number of resources in the set R_u; t represents any label, and t₁, t₂, …, t_n each represent a specific label; T represents the set of labels given to all labeled resources; T_u represents the set of all tags used by user u, and |T_u| represents the number of labels in the set T_u; T_r represents all tags assigned to resource r by all users, and |T_r| represents the number of tags in the set T_r; R_u(t) represents the resources that user u has tagged with tag t.

The invention provides a label recommendation method based on user motivation tendency, which comprises the following steps:

(1) calculating the motivation tendency of the user, the motivation tendency of each labeled resource and the motivation tendency of the resource to be labeled according to the user triples; the user triple comprises the labeling history of the user, the labeled resources and the corresponding labels as well as the resources to be labeled and the corresponding labels; and the label corresponding to the resource comprises the label of all users to the resource.

The motivational tendency of user u may be represented by a vector M_u, i.e.:

M_u＝(TRR_u，LFTU_u，TRCE_u，TSOF_u，STR_u) (1)

where TRR_u, LFTU_u, TRCE_u, TSOF_u and STR_u are 5 measures of motivational tendency, whose meaning and calculation are as follows:

a) user's labeled resource average label rate (Tags/Resources Ratio, TRR)

The user's average label rate per labeled resource, TRR_u, measures the average number of tags the user uses to label each resource. It is the ratio of the size of the user's vocabulary to the total number of resources labeled by the user, as shown in equation (2).

TRR_u＝e-|T_u|/|R_u| (2)

Users with a description motivation tendency may select a wide variety of words to describe resources, and in theory the number is unlimited. Users with a classification motivation tendency tend to select fewer words to label resources, for the sake of browsing, so their vocabulary is limited. Generally, users with a classification motivation tendency score significantly lower on this measure than users with a description motivation tendency. That is, the smaller the user's TRR_u value, the more likely the user is a classifier; the larger the TRR_u value, the more likely the user is a describer.
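As a rough illustration of this measure, the following Python sketch computes the plain ratio |T_u|/|R_u| described in the prose from a user's (resource, tag) pairs. The function name and the sample data are hypothetical, and any further transformation applied in equation (2) is not reproduced here.

```python
def tags_per_resource_ratio(tag_assignments):
    """Ratio of vocabulary size to number of tagged resources for one user.

    tag_assignments: list of (resource, tag) pairs (hypothetical input format).
    """
    resources = {resource for resource, _ in tag_assignments}
    vocabulary = {tag for _, tag in tag_assignments}
    if not resources:
        raise ValueError("the user has not tagged any resource")
    return len(vocabulary) / len(resources)


# Illustrative data only: a small, reused vocabulary versus many tags per resource.
classifier_like = [("r1", "car"), ("r2", "car"), ("r3", "music"), ("r4", "music")]
describer_like = [("r1", "car"), ("r1", "automobile"), ("r1", "vehicle"),
                  ("r2", "jazz"), ("r2", "sax")]

print(tags_per_resource_ratio(classifier_like))  # 2 tags over 4 resources
print(tags_per_resource_ratio(describer_like))   # 5 tags over 2 resources
```

Under these sample inputs the classifier-like user obtains the lower ratio, which matches the direction stated in the text.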

b) User Low Frequency Tag usage (LFTR)

To characterize tag usage, LFTR_u is used to measure the extent to which the user uses low-frequency tags; it equals the proportion of low-frequency tags among all tags in the user's vocabulary. A low-frequency tag is a tag the user uses infrequently, that is, a tag applied to only a few resources. The measure is calculated using equation (3):

{LFTR}_{u} = \frac{|\{t \in T_{u} : |R (t)| \le n\}|}{|T_{u}|} \qquad (3)

where t represents any label, t_max is the label most frequently used by the user, |R(t)| and |R(t_max)| are the numbers of resources labeled with t and with t_max respectively, and n is the integer obtained by taking p percent of |R(t_max)|, with 0 < p ≤ 100. The set in the numerator is the low-frequency tag set of user u, that is, the tags that each label no more than n resources, and its size is the number of low-frequency tags of user u.

Obviously, 0 ≤ LFTR_u ≤ 1. When LFTR_u = 1, every tag of the user is used no more than n times; the tags describe resources from different angles and aspects, and the user does not mind using low-frequency words. Such a user may be considered to have a description motivation tendency. When LFTR_u = 0, the user rarely uses low-frequency tags and regards them as unfavorable for classified browsing: introducing a low-frequency tag amounts to introducing noise, which undermines the consistency of the classification, so the user is very cautious about using low-frequency words. Such a user may be considered to have a classification motivation tendency.
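The low-frequency tag measure can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the text only says that n is "the integer" obtained from p percent of |R(t_max)|, so rounding up is an assumption here, and the function name, input format and default value of p are hypothetical.

```python
import math
from collections import defaultdict


def low_frequency_tag_rate(tag_assignments, p=1.0):
    """Share of a user's tags that label no more than n resources.

    tag_assignments: list of (resource, tag) pairs for one user (hypothetical format).
    p: percentage with 0 < p <= 100; n is p percent of |R(t_max)|,
       rounded up (the rounding direction is an assumption).
    """
    if not 0 < p <= 100:
        raise ValueError("p must satisfy 0 < p <= 100")
    resources_by_tag = defaultdict(set)
    for resource, tag in tag_assignments:
        resources_by_tag[tag].add(resource)
    if not resources_by_tag:
        raise ValueError("the user has not used any tag")
    # |R(t_max)|: number of resources labeled with the most frequent tag.
    max_count = max(len(rs) for rs in resources_by_tag.values())
    n = math.ceil(p / 100 * max_count)
    low_frequency = [t for t, rs in resources_by_tag.items() if len(rs) <= n]
    return len(low_frequency) / len(resources_by_tag)


# Illustrative data: "a" labels four resources, "b" and "c" label one each.
sample = [("r1", "a"), ("r2", "a"), ("r3", "a"), ("r4", "a"),
          ("r1", "b"), ("r2", "c")]
print(low_frequency_tag_rate(sample, p=25))   # n = 1, so "b" and "c" are low-frequency
print(low_frequency_tag_rate(sample, p=100))  # n = 4, so every tag counts
```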

c) Relative Conditional Entropy (TRCE) of each label of the user

Users with a classification motivation tendency want their tags to have the greatest possible discriminating power, because only then can they browse efficiently. The process by which a user selects a tag can therefore be compared to the process of encoding information. The most effective information encoding maximizes the information entropy of the code, and selecting a discriminating, useful label likewise maximizes the information entropy of the label. In other words, users with a classification tendency want all tags to be used with the same frequency, which maximizes the information entropy of the tags and thus facilitates browsing. Users with a description motivation tendency, by contrast, are not concerned with this.

When the user uses the tag to encode the resource, its conditional entropy H_u(R | T) reflects the effectiveness of this encoding process and can be calculated according to equation (4):

H_{u} (R | T) = - \sum_{r \in R} \sum_{t \in T} p (r, t) \log_{2} p (r | t) \qquad (4)

wherein p (r, t) is the joint distribution probability of the label t on the resource r, and p (r | t) is the probability of labeling the resource r by the label t.

In the calculation of the label information entropy, labels and resources are treated as random variables. The conditional entropy can be interpreted as the uncertainty about the resource that remains once the tag is known, and it is mainly affected by the number of resources and the size of the vocabulary. In order to distinguish differences between users, the conditional entropy is normalized: the actually observed conditional entropy H_u(R | T) is compared with the ideal conditional entropy H_opt(R | T). When every label has the same discriminating power, i.e. p(r | t) is the same for all labels, the ideal conditional entropy H_opt(R | T) is obtained, and the conditional entropy is then at its maximum. The conditional entropy in the ideal case can therefore be used as a normalization factor, on the basis of which the relative conditional entropy of the labels is calculated as in formula (5):

{TRCE}_{u} = \frac{H_{opt} (R | T) - H_{u} (R | T)}{H_{opt} (R | T)} \qquad (5)

Obviously, 0 ≤ TRCE_u ≤ 1. The closer TRCE_u is to 0, the closer the user's conditional entropy is to the ideal case, and the closer the labels are to a uniform distribution. In this case the labels have strong discriminating power, and the user is likely to have a classification motivation tendency. Otherwise, the user is likely to have a description motivation tendency.
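The two formulas above can be illustrated with a short Python sketch. It estimates p(r, t) and p(r | t) from raw (resource, tag) counts, which is an assumption about how the probabilities are obtained. Because the text does not spell out how H_opt(R | T) is computed, the sketch takes it as an input rather than deriving it; all names are hypothetical.

```python
import math
from collections import Counter


def conditional_entropy(tag_assignments):
    """H(R|T) = -sum over (r, t) of p(r, t) * log2 p(r | t), estimated from counts.

    tag_assignments: list of (resource, tag) pairs for one user (hypothetical format).
    """
    pair_counts = Counter(tag_assignments)
    total = sum(pair_counts.values())
    tag_counts = Counter()
    for (_, tag), count in pair_counts.items():
        tag_counts[tag] += count
    entropy = 0.0
    for (_, tag), count in pair_counts.items():
        p_joint = count / total             # p(r, t)
        p_cond = count / tag_counts[tag]    # p(r | t)
        entropy -= p_joint * math.log2(p_cond)
    return entropy


def relative_conditional_entropy(h_observed, h_opt):
    """Formula (5): (H_opt - H_observed) / H_opt, with H_opt supplied by the caller."""
    if h_opt <= 0:
        raise ValueError("h_opt must be positive")
    return (h_opt - h_observed) / h_opt


# Illustrative data: two tags, each spread evenly over two resources.
sample = [("r1", "a"), ("r2", "a"), ("r3", "b"), ("r4", "b")]
h = conditional_entropy(sample)
print(h)
print(relative_conditional_entropy(h, h_opt=2.0))  # h_opt chosen only for illustration
```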

d) User's Tag Semantic repeat Factor (Tag Semantic overlay Factor, TSOF)

Users with a classification motivation tendency want as few synonyms as possible in their vocabulary, since this improves browsing efficiency. Users with a description motivation tendency do not care about this; on the contrary, synonyms allow them to describe resources more fully. The motivational tendency of the user can therefore be measured by calculating the similarity of the labels the user uses, as shown in formula (6).

where sim(t_i, t_j) is the similarity between two labels t_i and t_j, calculated by formula (7). In formula (7), f(t_i) and f(t_j) are the numbers of times labels t_i and t_j, respectively, appear in the tag set of user u, and f(t_i, t_j) is the number of times the two labels co-occur in the tag set of user u.

sim (t_{i}, t_{j}) = \frac{\max (\log f (t_{i}), \log f (t_{j})) - \log f (t_{i}, t_{j})}{\log N - \min (\log f (t_{i}), \log f (t_{j}))} \qquad (7)

Where N is the total number of words in the user's tagset.

When the similarity of all tags of a user approaches 0, TSOF_u is close to 0, indicating that the user's motivational tendency is a classification tendency; otherwise, the user's motivational tendency is a description tendency.
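The pairwise measure in formula (7) can be evaluated literally as in the following Python sketch. Only formula (7) is implemented; how the pairwise values are aggregated into TSOF_u is not shown in the text and is not attempted here. The function name and argument names are hypothetical, and raising an error for tags that never co-occur is an assumption, since the logarithm is undefined in that case.

```python
import math


def tag_similarity(f_i, f_j, f_ij, n_total):
    """Literal evaluation of formula (7).

    f_i, f_j: occurrence counts of the two tags in the user's tag set.
    f_ij: number of times the two tags co-occur.
    n_total: N, the total number of words in the user's tag set.
    """
    if min(f_i, f_j, f_ij) <= 0:
        raise ValueError("counts must be positive; log is undefined otherwise")
    log_i, log_j = math.log(f_i), math.log(f_j)
    numerator = max(log_i, log_j) - math.log(f_ij)
    denominator = math.log(n_total) - min(log_i, log_j)
    if denominator == 0:
        raise ValueError("N must differ from the smaller occurrence count")
    return numerator / denominator


# Two tags that always appear together: the numerator vanishes.
print(tag_similarity(f_i=4, f_j=4, f_ij=4, n_total=16))
# A frequent tag paired with a rare one that only appears alongside it.
print(tag_similarity(f_i=8, f_j=2, f_ij=2, n_total=32))
```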

e) User's Special tag usage rate (STR)

Statistical analysis of the labels in social label systems shows the following: when a user labels with interrogative adverbs such as where, what and how, the user's remaining labels are often taken from the titles of the resources, and comparative analysis shows that the user's intention to describe the page content is very obvious (as shown in FIG. 2). These interrogative adverb labels are defined as special labels. Meanwhile, as can be seen from FIG. 3, these users' motivations for labeling other resources also tend to be descriptive; for example, the other labeling records of the user "breneaux" in the first record clearly exhibit other features of a description motivation tendency (such as the tag semantic repeat factor).

Therefore, the usage rate of special words when labeling resources can also serve as one of the indicators of the user's motivation tendency. The higher a user's usage rate of special words, the stronger the user's description motivation tendency; conversely, the lower the usage rate, the weaker that tendency. The usage rate of special tags is measured using equation (8).

STR_u＝card(t∈T_str)/|T_u| (8)

where T_str = {who, when, what, where, how, …} is the special label set, which can be set to the set of all interrogative adverbs in English, and card(t ∈ T_str) is the number of tags used by user u that are contained in T_str, counting repetitions. Obviously, 0 ≤ STR_u ≤ 1. The closer STR_u is to 1, the more likely user u has a description motivation tendency; the closer STR_u is to 0, the more likely user u has a classification motivation tendency.
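A minimal Python sketch of the special tag usage rate follows. It makes two assumptions that go beyond the text: the special set contains only the words explicitly listed (the source leaves the set open-ended), and both numerator and denominator count tag occurrences with repetition, so that the result stays within [0, 1]. The names are hypothetical.

```python
# Only the explicitly listed members of T_str; the source leaves the set open-ended.
SPECIAL_TAGS = frozenset({"who", "what", "when", "where", "how"})


def special_tag_rate(tags_used, special_tags=SPECIAL_TAGS):
    """Share of a user's tag occurrences that fall in the special tag set.

    tags_used: list of tag occurrences for one user, repeats included
               (counting the denominator with repeats is an assumption).
    """
    if not tags_used:
        raise ValueError("the user has not used any tag")
    hits = sum(1 for tag in tags_used if tag.lower() in special_tags)
    return hits / len(tags_used)


print(special_tag_rate(["how", "python", "how", "tutorial"]))  # two of four occurrences
print(special_tag_rate(["car", "music"]))                      # no special tags
```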

The motivational tendency of each labeled resource can also be represented by a vector of 5 motivation measures; for a resource r, its motivational tendency may be represented by a vector M_r, i.e.:

M_r＝(TRR_r，LFTU_r，TRCE_r，TSOF_r，STR_r) (9)

where TRR_r, LFTU_r, TRCE_r, TSOF_r and STR_r are 5 measures of the motivational tendency of the resource, whose meaning and calculation are as follows:

a) average label rate per user for tagged Resources (Tags/Resources Ratio, TRR)

The per-user average label rate of a tagged resource, TRR_r, measures the average number of tags each user applies to the resource. It equals the ratio of the number of all tags assigned to the resource to the number of users who have labeled the resource, as shown in formula (10).

TRR_r＝e-|T_r|/|U_r| (10)

where |T_r| represents the number of all tags given to resource r by all users, and |U_r| represents the number of all users who have tagged resource r. Generally, resources with a classification motivation tendency score significantly lower on this measure than resources with a description motivation tendency. That is, the smaller the TRR_r value of a resource, the more likely the resource has a classification tendency; the larger the TRR_r value, the more likely it has a description tendency.

b) Low Frequency Tag usage of tagged resources (LFTR)

To characterize tag usage, LFTR_r is used to measure the extent to which low-frequency labels are assigned to the resource; it equals the proportion of low-frequency labels among all labels assigned to the resource. A low-frequency label is a label that is not often assigned to the resource. The measure is calculated using equation (11):

{LFTR}_{r} = \frac{|\{t \in T_{r} : |R_{u} (t)| \le m\}|}{|T_{r}|} \qquad (11)

where t represents any label of the resource r, t_max' is the label most frequently used for resource r, |R_u(t)| is the number of users using the label t, |R_u(t_max')| is the number of users using the most frequent label t_max', and m is the integer obtained by taking q percent of |R_u(t_max')|, with 0 < q ≤ 100. The set in the numerator is the low-frequency label set of resource r, that is, the labels each used by no more than m users, and its size is the number of low-frequency labels of resource r.

c) Relative Conditional Entropy (TRCE) of each label of a tagged resource

For resources with a classification motivation tendency, the labels assigned to them should have the greatest possible discriminating power, because only then is browsing most efficient for the user. The process of tagging resources can therefore be compared to the process of encoding information. The most effective information encoding maximizes the information entropy of the code, and selecting discriminating labels likewise maximizes the information entropy of the labels. When users use tags to encode the resource, the conditional entropy H_r(U | T) reflects the effectiveness of this encoding process and can be calculated according to equation (12):

H_{r} (U | T) = - \sum_{u \in U} \sum_{t \in T} p (u, t) \log_{2} p (u | t) \qquad (12)

where p(u, t) is the joint distribution probability of the user u using the label t, and p(u | t) is the probability of the user u using the label t to label the resource r. When every label has the same discriminating power, i.e. p(u | t) is the same for all labels, the ideal conditional entropy H_ropt(U | T) is obtained, and the conditional entropy is then at its maximum.

In the calculation of the label information entropy, labels and users are treated as random variables. The conditional entropy can be interpreted as the uncertainty about the user that remains once the tag is known, and it is mainly influenced by the number of users and the size of the vocabulary. In order to distinguish differences between resources, the conditional entropy is normalized: the actually observed conditional entropy H_r(U | T) is compared with the ideal conditional entropy H_ropt(U | T). The conditional entropy in the ideal case can therefore be used as a normalization factor, on the basis of which the relative conditional entropy of the labels is calculated as in formula (13):

{TRCE}_{r} = \frac{H_{ropt} (U | T) - H_{r} (U | T)}{H_{ropt} (U | T)} \qquad (13)

d) Tagged Semantic repeat Factor for tagged resources (Tag Semantic overlay Factor, TSOF)

For resources with a classification motivation tendency, there should be as few synonyms as possible in the resource's vocabulary, since this improves browsing efficiency. For resources with a description motivation tendency, the opposite holds: synonymous tags allow the resource to be described more fully. The motivational tendency of the resource can therefore be measured by calculating the similarity of the labels assigned to the resource, as shown in equation (14).

where sim(t_i, t_j)' is the similarity between two labels t_i and t_j, calculated by formula (15). In formula (15), f(t_i)' and f(t_j)' are the numbers of times labels t_i and t_j, respectively, appear in the tag set of resource r, and f(t_i, t_j)' is the number of times the two labels co-occur in the tag set of resource r.

Where N' is the total number of words in the tagset for resource r.

e) Special tag usage of tagged resources (STR)

The special labels of a resource are defined in the same way as the special labels of a user. The usage rate of special labels can also serve as one of the indicators of the labeling motivation tendency of a resource. The higher the usage rate of special labels on a resource, the stronger its description-motivation tendency; conversely, the lower the rate, the stronger its classification-motivation tendency. The special tag usage rate is measured by formula (16).

STR_r＝card(t∈T_str)′/|T_r| (16)

where T_str = {who, when, what, where, how, …} is the special label set, which can be set to the set of all interrogative words in English. card(t∈T_str)′ is the number of tags used on resource r that are contained in T_str, counting repeats. Clearly 0 ≤ STR_r ≤ 1. The closer STR_r is to 1, the more likely resource r is to have a description-motivation tendency; the closer STR_r is to 0, the more likely resource r is to have a classification-motivation tendency.
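Formula (16) can be sketched in Python as follows. The sketch assumes that |T_r| counts every tag assignment on the resource, repeats included, so that the ratio stays between 0 and 1. The function name, the `SPECIAL_TAGS` constant, the lowercase matching and the treatment of an untagged resource are illustrative assumptions, not part of the method as described.

```python
# Illustrative subset of the special label set T_str (assumption).
SPECIAL_TAGS = {"who", "what", "when", "where", "how"}


def special_tag_ratio(tag_assignments, special_tags=SPECIAL_TAGS):
    """Formula (16): share of tag assignments on a resource that are special tags."""
    if not tag_assignments:
        return 0.0  # assumption: a resource without tags gets ratio 0
    # Repeated assignments of the same special tag are all counted.
    hits = sum(1 for tag in tag_assignments if tag.lower() in special_tags)
    return hits / len(tag_assignments)
```

For the assignments `["how", "python", "what", "how"]` the ratio is 3/4, which would point toward a description-motivation tendency.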

Likewise, the motivation tendency of the resource to be labeled \hat{r} can also be expressed as a vector of 5 motivation metrics, namely:

M_{\hat{r}} = ({TRR}_{\hat{r}}, {LFTU}_{\hat{r}}, {TRCE}_{\hat{r}}, {TSOF}_{\hat{r}}, {STR}_{\hat{r}}) \qquad (17)

where {TRR}_{\hat{r}}, {LFTU}_{\hat{r}}, {TRCE}_{\hat{r}}, {TSOF}_{\hat{r}} and {STR}_{\hat{r}} are the 5 metrics of the motivation tendency of resource \hat{r}; they are calculated in the same way as the metrics of the motivation tendency of a labeled resource.

(2) Selecting, from the labeled resources, resources whose motivation tendency is similar to that of the resource to be labeled, to obtain the non-user-dependent similar resources; that is, calculating the similarity between the motivation tendency of each labeled resource and that of the resource to be labeled, selecting the labeled resources whose similarity is greater than a threshold α, and combining the selected resources into the non-user-dependent similar resources R_sim;

Specifically, when labeling a resource, a user often reuses labels that he or she has used before. Resources similar to the resource to be labeled are therefore found first, and the degree to which they match the user's current motivation tendency is then calculated. The labels of the resources with a high matching degree form the recommendation candidate set, from which labels that meet the user's intention can be obtained. The similarity between the motivation tendency of the resource to be labeled and that of a resource already labeled by the user can be calculated with the vector-space cosine measure, as in formula (18).

{sim}_{r \in R_{u}} (M_{r}, M_{\hat{r}}) = \frac{M_{r} \cdot M_{\hat{r}}}{| M_{r} | | M_{\hat{r}} |} \qquad (18)

where M_r is the motivation tendency vector of a resource already labeled by the user, and M_{\hat{r}} is the motivation tendency vector of the resource to be labeled. A threshold α is set as a control factor (α takes a value between 0 and 1, can be determined experimentally, and a value of 0.6 is suggested). If the similarity is greater than or equal to α, the motivation tendencies of the labeled resource and of the resource to be labeled are highly similar. These highly similar labeled resources are combined, and R_sim denotes the combined non-user-dependent similar resources, i.e.

R_{sim} = \{ r \in R_{u} \mid {sim}_{r \in R_{u}} (M_{r}, M_{\hat{r}}) \geq \alpha \}

The similarity can also be calculated by other methods, such as mutual information or Pearson similarity.
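Step (2) can be sketched in Python as a cosine similarity over the 5-component motivation vectors, followed by a threshold filter. The names `cosine_similarity` and `select_similar`, the dictionary layout and the handling of zero vectors are assumptions made for illustration; the default of 0.6 follows the value suggested above.

```python
import math


def cosine_similarity(a, b):
    """Formula (18): cosine of the angle between two motivation vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: a zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def select_similar(labeled, target_vector, alpha=0.6):
    """Keep labeled resources whose vector is at least alpha-similar to the target.

    labeled maps a resource id to its motivation vector (TRR, LFTU, TRCE, TSOF, STR).
    """
    return {
        rid: vec
        for rid, vec in labeled.items()
        if cosine_similarity(vec, target_vector) >= alpha
    }
```

Replacing `cosine_similarity` with another measure, for example a Pearson-based one, would leave the filtering logic unchanged.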

(3) Selecting, from the non-user-dependent similar resources R_sim, resources whose motivation tendency is similar to that of the user, to obtain the label recommendation candidate resources R_cad;

Formula (19) is used to calculate the similarity between the motivation tendency of each resource in the non-user-dependent similar resources R_sim and the motivation tendency of the user:

{sim}_{r \in R_{sim}} (M_{r}, M_{u}) = \frac{M_{r} \cdot M_{u}}{| M_{r} | | M_{u} |} \qquad (19)

where M_r is the motivation tendency vector of a resource, among the resources labeled by the user, whose motivation tendency is similar to that of the resource to be labeled, and M_u is the motivation tendency vector of the user. A threshold β is set as a control factor (β takes a value between 0 and 1, can be determined experimentally, and a value of 0.6 is suggested). If the similarity is greater than or equal to β, the labels of the resource can be recommended as labels that meet the user's intention. The non-user-dependent similar resources whose similarity is greater than or equal to β are selected as the label recommendation candidate resources, denoted R_cad, i.e.

R_{cad} = \{ r \in R_{sim} \mid {sim}_{r \in R_{sim}} (M_{r}, M_{u}) \geq \beta \}

(4) Merging all labels in the label recommendation candidate resources R_cad to obtain a merged label set; that is, the labels of every resource in R_cad are merged according to formula (20):

\hat{T}_{u} = \bigcup_{(r \mid {sim}_{r \in R_{sim}} (M_{r}, M_{u}) \geq \beta)} T_{r} \qquad (20)
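Steps (3) and (4) can be sketched together in Python: each resource in R_sim is compared with the user's motivation vector, and the tag sets of the resources that pass the threshold β are merged by set union, as in formula (20). The function names and the `(vector, tags)` data layout are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def merged_tag_set(similar_resources, user_vector, beta=0.6):
    """similar_resources maps a resource id to (motivation vector, set of tags)."""
    merged = set()
    for vector, tags in similar_resources.values():
        if cosine(vector, user_vector) >= beta:  # resource belongs to R_cad
            merged |= set(tags)                  # union of T_r, formula (20)
    return merged
```

Because the result is a set, a tag that appears on several candidate resources is kept only once; its weight is decided later by the recommendation importance.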

(5) Calculating the recommendation importance of each label in the merged label set, according to formula (21):

p (t | \hat{r}) = \sum_{w \in \hat{r}, t \in \hat{T}_{u}} p (w) s (w, t) \qquad (21)

where p(w) is the content importance of each word w of the resource to be labeled \hat{r} within \hat{r}, calculated according to formula (22); s(w, t) is the correlation between the word w and the label t in the merged label set, calculated according to formula (23).

p (w) = \log \left( \frac{tf (w, \hat{r})}{N_{\hat{r}}} + 1 \right) \log \left( \frac{N_{R_{cad}}}{| R_{cad} (w) |} + 1 \right) \qquad (22)

s (w, t) = \frac{\max \{ \log tf (w, \hat{r}), \log tf (t, \hat{r}) \} - \log tf (w, t, \hat{r})}{\log N_{\hat{r}} - \min \{ \log tf (w, \hat{r}), \log tf (t, \hat{r}) \}} \qquad (23)

where tf(w, \hat{r}) is the number of occurrences of the word w in resource \hat{r}, tf(t, \hat{r}) is the number of occurrences of the label t in resource \hat{r}, tf(w, t, \hat{r}) is the number of times the word w and the label t occur together in resource \hat{r}, N_{\hat{r}} is the number of all words in resource \hat{r}, N_{R_{cad}} is the number of all words contained in all label recommendation candidate resources, |R_cad(w)| is the number of label recommendation candidate resources that contain the word w, and the word w is an English word.
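Formulas (22) and (23) can be sketched in Python as below. The source does not state the logarithm base or how zero counts are handled, so the sketch assumes natural logarithms and raises an error whenever a logarithm or the denominator of formula (23) would be undefined. The function and parameter names are hypothetical, and `n_words` and `n_cad_with_w` are assumed to be non-zero.

```python
import math


def content_importance(tf_w, n_words, n_cad_words, n_cad_with_w):
    """Formula (22): importance p(w) of word w in the resource to be labeled.

    tf_w          occurrences of w in the resource to be labeled
    n_words       number of all words in the resource to be labeled
    n_cad_words   number of all words in the candidate resources R_cad
    n_cad_with_w  number of candidate resources that contain w
    """
    return math.log(tf_w / n_words + 1) * math.log(n_cad_words / n_cad_with_w + 1)


def correlation(tf_w, tf_t, tf_wt, n_words):
    """Formula (23): correlation s(w, t) between word w and tag t."""
    if min(tf_w, tf_t, tf_wt) <= 0:
        raise ValueError("all counts must be positive to take logarithms")
    log_w, log_t = math.log(tf_w), math.log(tf_t)
    denominator = math.log(n_words) - min(log_w, log_t)
    if denominator == 0:
        raise ValueError("formula (23) is undefined when the denominator is zero")
    return (max(log_w, log_t) - math.log(tf_wt)) / denominator
```

Since formula (23) is a ratio of logarithms, its value does not depend on the base chosen; the value of formula (22) does.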

(6) Recommending the labels in descending order of their recommendation importance p(t | \hat{r}).
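Steps (5) and (6) can be combined in a short Python sketch: every label in the merged label set is scored with formula (21), and the labels are then returned in descending order of score. The function name, the dictionary inputs, the `top_k` parameter and the choice to treat a missing word-label pair as correlation 0 are assumptions made for illustration.

```python
def rank_tags(word_importance, word_tag_correlation, merged_tags, top_k=None):
    """Score each tag by formula (21) and return (tag, score) pairs, best first.

    word_importance:       dict word -> p(w)
    word_tag_correlation:  dict (word, tag) -> s(w, t); missing pairs count as 0
    """
    scores = {
        tag: sum(
            p_w * word_tag_correlation.get((word, tag), 0.0)
            for word, p_w in word_importance.items()
        )
        for tag in merged_tags
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k] if top_k is not None else ranked
```

Passing `top_k` limits the recommendation list to the highest-scoring labels, which corresponds to presenting the user with a short list of several labels.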

The invention also provides a label recommendation system based on user motivation tendency. As shown in Fig. 4, the system comprises a motivation tendency calculation module (100), a non-user-dependent similar resource selection module (200), a label recommendation candidate resource selection module (300), a label merging module (400), a recommendation importance calculation module (500) and an output module (600);

the motivation tendency calculation module (100) is used for calculating the motivation tendency of the user, the motivation tendency of each labeled resource and the motivation tendency of the resource to be labeled;

the non-user-dependent similar resource selection module (200) is used for selecting resources similar to the motivation tendency of the resources to be labeled from the labeled resources to obtain the non-user-dependent similar resources;

the label recommendation candidate resource selection module (300) is used for selecting resources similar to the motivation tendency of the user from the non-user-dependent similar resources to obtain label recommendation candidate resources;

the tag merging module (400) is used for merging all tags in the tag recommendation candidate resources to obtain a merged tag set;

the recommendation importance calculating module (500) is used for calculating the recommendation importance of each label in the combined label set;

the output module (600) is used for recommending the labels according to the recommendation importance of each label from large to small.

The present invention is not limited to the above embodiments. Those skilled in the art can implement the invention in various other embodiments according to this disclosure, and any embodiment that adopts the design structure and idea of the present invention falls within its protection scope.

Claims

1. A label recommendation method based on user motivation tendency comprises the following steps:

2. The tag recommendation method according to claim 1, wherein the motivation tendency of user u in step (1) is M_u＝(TRR_u，LFTU_u，TRCE_u，TSOF_u，STR_u), where TRR_u, LFTU_u, TRCE_u, TSOF_u and STR_u are the metrics of the motivation tendency of user u, each calculated according to the following formulas:

(a) TRR_u = e^{-|T_u|/|R_u|};

where T_u represents the set of all tags used by user u, |T_u| represents the number of tags in set T_u, R_u represents the set of all resources that user u has tagged, and |R_u| represents the number of resources in set R_u;

where t represents any label, t_max is the label most frequently used by user u, |R(t)| is the number of resources containing label t, |R(t_max)| is the number of resources containing the most frequent label t_max, and n is p percent of |R(t_max)|, rounded to an integer, with 0 < p ≤ 100; the remaining two symbols of the formula denote the set of low-frequency tags of user u and the number of low-frequency tags of user u, respectively;

where p(r, t) is the joint probability of label t on resource r, p(r | t) is the probability that resource r is labeled with label t, R represents the set of all resources labeled by user u, T represents the set of labels assigned to all resources labeled by user u, and H_opt(R | T) is the value of H_u(R | T) when p(r | t) is the same for all tags;

where sim(t_i, t_j) represents the similarity between two labels t_i and t_j, f(t_i) and f(t_j) are the numbers of occurrences of labels t_i and t_j in the tag set of user u, f(t_i, t_j) is the number of times the two labels t_i and t_j co-occur in the tag set of user u, and N is the total number of words in the tag set of user u;

(e)STR_u＝card(t∈T_str)/|T_u|；

where T_str is the special label set, and card(t∈T_str) is the number of tags used by user u that are contained in T_str, counting repeats;

the motivation tendency of any resource r among the labeled resources and the resource to be labeled in step (1) is M_r＝(TRR_r，LFTU_r，TRCE_r，TSOF_r，STR_r), where TRR_r, LFTU_r, TRCE_r, TSOF_r and STR_r are the metrics of the motivation tendency of resource r, each calculated according to the following formulas:

(a’) TRR_r = e^{-|T_r|/|U_r|};

where |T_r| represents the number of all tags given to resource r by all users, and |U_r| represents the number of all users who have tagged resource r;

where t represents any label of resource r, t_max′ is the label most frequently used on resource r, |R_u(t)| is the number of users using label t, |R_u(t_max′)| is the number of users using the most frequent label t_max′, and m is q percent of |R_u(t_max′)|, rounded to an integer, with 0 < q ≤ 100; the remaining two symbols of the formula denote the set of low-frequency tags of resource r and the number of low-frequency tags of resource r, respectively;

where p(u, t) is the joint probability that user u uses label t, p(u | t) is the probability that user u uses label t to label resource r, U represents the set of all users who labeled resource r, and H_ropt(U | T) is the value of H_r(U | T) when p(u | t) is the same for all tags;

where sim(t_i, t_j)′ is the similarity between two labels t_i and t_j, f(t_i)′ and f(t_j)′ are the numbers of occurrences of labels t_i and t_j in the tag set of resource r, f(t_i, t_j)′ is the number of times the two labels t_i and t_j co-occur in the tag set of resource r, and N′ is the total number of words in the tag set of resource r;

(e’) STR_r＝card(t∈T_str)′/|T_r|；

where T_str is the special label set, and card(t∈T_str)′ is the number of tags used on resource r that are contained in T_str, counting repeats.

3. The tag recommendation method according to claim 1 or 2, wherein the following method is adopted in step (2) to obtain the non-user-dependent similar resource:

(3.1) respectively calculating the similarity between the motive tendency of each marked resource and the motive tendency of the resource to be marked;

(3.2) selecting the labeled resources whose similarity is greater than the threshold α to obtain the non-user-dependent similar resources, where 0 < α < 1.

4. The tag recommendation method according to claim 1 or 2, wherein the tag recommendation candidate resource is obtained in step (3) by adopting the following method:

(4.1) calculating the similarity between the motivational tendency of each resource in the non-user-dependent similar resources and the motivational tendency of the user;

(4.2) selecting the non-user-dependent similar resources whose similarity is greater than the threshold β as the label recommendation candidate resources, where 0 < β < 1.

5. The tag recommendation method according to claim 1 or 2, wherein step (5) calculates the recommendation importance of each tag in the merged tag set \hat{T}_{u} by the following method:

(5.1) calculating the content importance p(w) of each word w of the resource to be labeled \hat{r} within \hat{r}:

p (w) = \log \left( \frac{tf (w, \hat{r})}{N_{\hat{r}}} + 1 \right) \log \left( \frac{N_{R_{cad}}}{| R_{cad} (w) |} + 1 \right)

where tf(w, \hat{r}) is the number of occurrences of the word w in the resource to be labeled \hat{r}, N_{\hat{r}} is the number of all words in the resource to be labeled \hat{r}, N_{R_{cad}} is the number of all words contained in all label recommendation candidate resources, and |R_cad(w)| is the number of label recommendation candidate resources that contain the word w;

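(5.2) calculating the correlation s(w, t) between the word w and the label t in the merged tag set \hat{T}_{u}:

s (w, t) = \frac{\max \{ \log tf (w, \hat{r}), \log tf (t, \hat{r}) \} - \log tf (w, t, \hat{r})}{\log N_{\hat{r}} - \min \{ \log tf (w, \hat{r}), \log tf (t, \hat{r}) \}}

where tf(t, \hat{r}) is the number of occurrences of the label t in the resource to be labeled \hat{r}, and tf(w, t, \hat{r}) is the number of times the word w and the label t occur together in the resource to be labeled \hat{r};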

(5.3) calculating the recommendation importance of tag t:

p (t | \hat{r}) = \sum_{w \in \hat{r}, t \in \hat{T}_{u}} p (w) s (w, t)

6. A label recommendation system based on user motivation tendency, comprising a motivation tendency calculation module (100), a non-user-dependent similar resource selection module (200), a label recommendation candidate resource selection module (300), a label merging module (400), a recommendation importance calculation module (500) and an output module (600);