CN102262653A - Label recommendation method and system based on user motivation orientation - Google Patents
- Publication number
- CN102262653A, CN201110154353A
- Authority
- CN
- China
- Prior art keywords
- user
- label
- resource
- resources
- tendency
- Legal status
- Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention provides a label recommendation method based on user motivation orientation. The method comprises the following steps: calculating the motivation orientation of the user, the motivation orientation of each labeled resource and the motivation orientation of the resource to be labeled according to user triples; selecting, from the labeled resources, resources whose motivation orientation is similar to that of the resource to be labeled, to obtain non-user-dependent similar resources; selecting, from the non-user-dependent similar resources, resources whose motivation orientation is similar to that of the user, to obtain label recommendation candidate resources; merging all labels of the label recommendation candidate resources into a merged label set; calculating the recommendation importance of each label in the merged label set; and finally recommending labels in descending order of recommendation importance. The method can identify the user's motivation for labeling network information resources and then recommend to the user a list of multiple labels that matches the user's intention. The invention also provides a label recommendation system based on the method.
Description
Technical Field
The invention belongs to the field of Web information resource processing and utilization, and particularly relates to a method for recommending a label for a Web information resource based on user motivation tendency and a recommendation system based on the method.
Background
With the continuing development of the Internet, network information resources are growing at an almost unimaginable rate, and the emergence of Web 2.0 has accelerated this growth further. In Web 2.0, the Internet has shifted from top-down, centralized control dominated by a few resource controllers to bottom-up collective intelligence driven by a large number of users. Users are not only browsers of network information resources but also their producers. Although user-created content in Web 2.0 enriches information sources and accelerates the diffusion of information, it also causes information overload, a heavier search burden, lower information quality, and similar problems. How users can finely organize and manage the overwhelming volume of network information resources, and how they can obtain suitable, high-quality information quickly, cheaply and effectively, has become an unavoidable major research topic.
The ideal organization of network information resources should be user-centered, make full use of emerging technologies and accumulated human experience, and offer high practicability and usability. In the Web 2.0 environment, social tag systems play an important role as a very effective method for organizing network information resources. As an organizational approach, and unlike traditional top-down, rigid, controlled hierarchical classification systems, the social label system has three advantages: (1) social labels are generated when users label network resources, and identical social labels, once aggregated, form a new classification that grows from the bottom up; (2) social labels are not controlled by experts and users may label with any words, so social labels are highly flexible and usable and reflect subjective cognition, and a network resource can flexibly belong to several popular classifications at once; (3) in a social tagging system, a user can label network resources along multiple dimensions and at multiple levels, so the structure is non-hierarchical.
However, despite these advantages, the labeling method also has disadvantages, mainly in two respects: (1) most social label systems let users enter labels freely. This makes labeling easy for users to control, but the randomness of labeling also fills the label space with noise, misspellings, ambiguity and meaningless user-defined labels, which hinders the usefulness of the labels. For this reason, some social tagging systems have to give users specific guidelines. (2) Data sparseness: tag-based browsing is a new way of organizing information that has not yet been widely adopted, and network resources organized this way are very rare, especially among Chinese resources; moreover, users are not used to adding many tags to network resources, so existing tag resources on the network are very scarce.
In recent years, driven by exactly this practical demand, tag recommendation technology has received a great deal of attention from academia and Internet companies. Label recommendation provides a series of high-quality candidate labels for a network information resource to be labeled by investigating, analyzing and mining the content of network information resources as well as the user's labeling history and explicit or implicit relations. Its main purposes are: (1) to simplify the labeling procedure and make it easier for users, increasing the usability and stickiness of the social label system; (2) to improve label quality by reducing misspellings, ambiguity and the like, strengthening the role of labels in organizing, retrieving, using and discovering information resources; (3) to change the structure of the label space so that it stabilizes and converges faster, allowing its semantics to emerge.
At present, a number of social label recommendation systems for various network information resources have been developed in China and abroad, and they play very important roles in organizing, retrieving, sharing and discovering information resources. These systems include Amazon, which recommends labels for commodities; Delicious, for web page resources; Flickr, for pictures; BibSonomy, for academic papers; Douban, which recommends labels for books and movies; Tudou, which provides recommended labels for shared videos; and the like. Existing label recommendation systems mainly adopt the traditional techniques used to recommend commodities in e-commerce systems, chiefly content-based recommendation, collaborative filtering-based recommendation, association rule-based recommendation, and hybrids of these techniques. These traditional techniques recommend on the basis of either the content of the resource or the user's historical labeling results, and most of them rely on data mining or machine learning algorithms. They solve the problems of information overload and of organizing, classifying and retrieving information resources to a certain extent, but their effect is not ideal; in particular, they cannot recommend labels that meet users' information requirements.
Disclosure of Invention
In order to meet users' information requirements, the invention starts from users' motivation for using social label systems, identifies their information goals, and recommends more accurate social labels to them. To this end, the invention provides a label recommendation method based on user motivation tendency, and also provides a label recommendation system based on the method.
The invention is realized by adopting the following technical scheme: the invention provides a label recommendation method based on user motivation tendency, which comprises the following steps:
(1) calculating the motivation tendency of the user, the motivation tendency of each labeled resource and the motivation tendency of the resource to be labeled according to the user triples; the user triple comprises the labeling history of the user, the labeled resources and the corresponding labels as well as the resources to be labeled and the corresponding labels;
(2) selecting, from the labeled resources, resources whose motivation tendency is similar to that of the resource to be labeled; the resources so obtained are called non-user-dependent similar resources;
(3) selecting resources similar to the motivation tendency of the user from the non-user-dependent similar resources, and calling the obtained resources as label recommendation candidate resources;
(4) merging all labels in the label recommendation candidate resources to obtain a merged label set;
(5) calculating the recommendation importance of each label in the combined label set;
(6) recommending the labels in descending order of recommendation importance.
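Steps (2) to (6) can be sketched as a filter, merge and rank pipeline. This is only an illustrative sketch resting on assumptions that this section does not state: motivation vectors are ordered as (TRR, LFTR, TRCE, TSOF, STR) and compared with cosine similarity, the thresholds `sim_r` and `sim_u` and the cut-off `k` are hypothetical parameters, and, because the recommendation importance of step (5) is not defined here, a placeholder importance (the number of candidate resources carrying the tag) is used.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two motivation vectors (an assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend_tags(m_user, m_target, labeled, tags_of, sim_r=0.9, sim_u=0.9, k=5):
    """Hypothetical sketch of steps (2)-(6).

    m_user, m_target: motivation vectors of the user and the resource to be labeled
    labeled: {resource: motivation vector} for the labeled resources
    tags_of: {resource: iterable of tags assigned to it}
    """
    # Step 2: non-user-dependent similar resources
    similar = [r for r, m in labeled.items() if cosine(m, m_target) >= sim_r]
    # Step 3: label recommendation candidate resources
    candidates = [r for r in similar if cosine(labeled[r], m_user) >= sim_u]
    # Step 4: merged label set; step 5 (placeholder importance): the number of
    # candidate resources that carry each tag
    importance = Counter()
    for r in candidates:
        importance.update(set(tags_of[r]))
    # Step 6: descending importance, ties broken alphabetically
    ranked = sorted(importance.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:k]]


labeled = {
    "r1": (0.2, 0.1, 0.1, 0.0, 0.0),
    "r2": (0.9, 0.8, 0.7, 0.6, 0.5),
    "r3": (0.21, 0.1, 0.1, 0.0, 0.0),
}
tags_of = {"r1": ["car", "travel"], "r2": ["how", "automobile"], "r3": ["car"]}
m = (0.2, 0.1, 0.1, 0.0, 0.0)
print(recommend_tags(m, m, labeled, tags_of))  # ['car', 'travel']
```

Here `r2` is dropped in step (2) because its vector points in a clearly different direction (cosine about 0.84), so only the tags of `r1` and `r3` are merged and ranked.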
The invention also provides a label recommendation system based on user motivation tendency, which comprises a motivation tendency calculation module, a non-user-dependent similar resource selection module, a label recommendation candidate resource selection module, a label merging module, a recommendation importance calculation module and an output module;
the motivation tendency calculation module is used for calculating the motivation tendency of the user, the motivation tendency of each labeled resource and the motivation tendency of the resource to be labeled;
the non-user-dependent similar resource selecting module is used for selecting resources similar to the motivation tendency of the resources to be labeled from the labeled resources to obtain the non-user-dependent similar resources;
the label recommendation candidate resource selection module is used for selecting resources similar to the motivation tendency of the user from the non-user-dependent similar resources to obtain label recommendation candidate resources;
the label merging module is used for merging all labels in the label recommendation candidate resources to obtain a merged label set;
the recommendation importance calculating module is used for calculating the recommendation importance of each label in the combined label set;
the output module is used for recommending the labels according to the recommendation importance of each label from large to small.
Conventional label recommendation methods in social label systems focus mainly on the content of the resource, the co-occurrence structure of labels, and the like. The method provided by the invention instead starts directly from the user's relatively stable labeling motivation tendency: it obtains that tendency and recommends labels according to it, so that the recommended labels better match the user's intention and the recommendation effect is better. The method can identify the user's motivation for labeling network information resources; this discovery of motivation provides a useful design reference for label recommendation systems, can guide ontology learning in the label space, and thus helps stabilize the social label structure and lets the semantics of social labels emerge.
Drawings
FIG. 1 is a flow of tag recommendation based on user motivational tendencies;
FIG. 2 is a schematic diagram of a special tag usage query in accordance with the present invention;
FIG. 3 is a tag cloud depicting motivational users according to the present invention;
FIG. 4 is a block diagram of a tag recommendation system according to the present invention.
Detailed Description
The present invention is described in further detail below with reference to the attached drawings and examples.
There are two main categories of motivational tendency described in the present invention, namely, categorical motivational tendency and descriptive motivational tendency, and their characteristics are shown in table 1.
Table 1: Characteristics of the classification motivation tendency and the description motivation tendency
| Feature | Classification motivation tendency | Description motivation tendency |
|---|---|---|
| Purpose | Facilitate later browsing | Facilitate future queries and retrieval |
| Tags per resource | Low | High |
| Vocabulary size | Limited | Unlimited |
| Occurrence of synonyms | Few | Many |
| Tags taken from resource titles | Few | Many |
| Cost of changing a tag | High | Low |
Specifically, a tagging user with a classification motivation tendency uses tags to aid browsing of the tagged resources. Such users therefore want to build a stable vocabulary according to their preferences. For easy browsing, the simpler and less redundant the vocabulary, the better, so these users avoid words with the same meaning and choose words that are easy to understand and remember. For example, when labeling a car, the vocabulary of a user with a classification motivation tendency will often contain only "car", and synonyms such as "automobile" or "vehicle" will not be used. Judging from the labeling results, such a vocabulary therefore closely resembles a semantic classification system. As with conventional classification systems, however, modifying it is relatively costly.
A labeling user with a description motivation tendency uses labels to describe the content of the labeled resources accurately for later query and retrieval. To better support querying and browsing, such a user's vocabulary may include many uncommon words and synonyms; for example, "car", "automobile" and "vehicle" may all appear in it when describing a car. In addition, such users often wish to describe resources from several angles, without limiting the number of words used, and the words they use for the same meaning may change as their understanding develops during annotation. Thus, the vocabulary of a user with a description motivation tendency is open and dynamic.
In the invention, the following symbols denote the relevant quantities in the social label system: $u$ denotes a user and $r$ a resource, such as a web page; $U$ denotes the set of all users who label resources, and $U_r$ the set of users who have labeled resource $r$; $R$ denotes the set of all labeled resources; $R_u$ denotes all resources that user $u$ has tagged, and $|R_u|$ the number of resources in $R_u$; $t$ denotes any label, and $t_1, t_2, \ldots, t_n$ denote specific labels; $T$ denotes the set of labels given to all labeled resources; $T_u$ denotes the set of all tags used by user $u$, and $|T_u|$ the number of labels in $T_u$; $T_r$ denotes all tags assigned to resource $r$ by all users, and $|T_r|$ the number of tags in $T_r$; $R_u(t)$ denotes the resources that user $u$ has tagged with tag $t$.
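As a minimal sketch of this notation, assuming the labeling data is available as (user, tag, resource) triples, all of the sets can be derived in a single pass. The triple layout and the function name `build_index` are illustrative assumptions, not part of the source.

```python
from collections import defaultdict


def build_index(triples):
    """Derive R_u, T_u, T_r, U_r and R_u(t) from (user, tag, resource) triples."""
    R_u = defaultdict(set)   # resources tagged by each user u
    T_u = defaultdict(set)   # tags used by each user u
    T_r = defaultdict(set)   # tags given to each resource r by all users
    U_r = defaultdict(set)   # users who tagged each resource r
    R_ut = defaultdict(set)  # R_u(t), keyed by the pair (u, t)
    for u, t, r in triples:
        R_u[u].add(r)
        T_u[u].add(t)
        T_r[r].add(t)
        U_r[r].add(u)
        R_ut[(u, t)].add(r)
    return R_u, T_u, T_r, U_r, R_ut


triples = [
    ("alice", "car", "r1"),
    ("alice", "travel", "r1"),
    ("bob", "car", "r1"),
    ("alice", "car", "r2"),
]
R_u, T_u, T_r, U_r, R_ut = build_index(triples)
print(len(R_u["alice"]), len(T_u["alice"]), sorted(U_r["r1"]))  # 2 2 ['alice', 'bob']
```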
The invention provides a label recommendation method based on user motivation tendency, which comprises the following steps:
(1) calculating the motivation tendency of the user, the motivation tendency of each labeled resource and the motivation tendency of the resource to be labeled according to the user triples; the user triple comprises the labeling history of the user, the labeled resources and their labels, and the resources to be labeled and their labels; the labels of a resource comprise the labels assigned to it by all users.
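The motivation tendency of user $u$ can be represented by a vector $M_u$, i.e.:
$$M_u = (\mathrm{TRR}_u, \mathrm{LFTR}_u, \mathrm{TRCE}_u, \mathrm{TSOF}_u, \mathrm{STR}_u) \qquad (1)$$
where $\mathrm{TRR}_u$, $\mathrm{LFTR}_u$, $\mathrm{TRCE}_u$, $\mathrm{TSOF}_u$ and $\mathrm{STR}_u$ are the 5 measures of motivation tendency, whose meaning and calculation are as follows: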
a) user's labeled resource average label rate (Tags/Resources Ratio, TRR)
The average label rate of the user's labeled resources, $\mathrm{TRR}_u$, measures the average number of tags the user applies to each resource; it is computed from the ratio of the size of the user's vocabulary to the total number of resources the user has labeled, as shown in equation (2).
$$\mathrm{TRR}_u = e^{-|T_u|/|R_u|} \qquad (2)$$
Users with a description motivation tendency may choose many words to describe resources, and in principle the number is not limited. Users with a classification motivation tendency tend to choose fewer words to label resources for browsing purposes, so their vocabulary is limited. Generally, users with a classification motivation tendency score significantly lower on this metric than users with a description motivation tendency. That is, the smaller $\mathrm{TRR}_u$, the more likely the user is of the classification type; the larger $\mathrm{TRR}_u$, the more likely the user is of the description type.
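The prose defines $\mathrm{TRR}_u$ through the ratio $|T_u|/|R_u|$ and states that classification-type users score lower than description-type users. The minimal sketch below computes only that raw ratio from (user, tag, resource) triples; the exponential transform written in formula (2) is deliberately left out, and the function name `trr_user` is illustrative.

```python
def trr_user(triples, u):
    """Raw tags/resources ratio |T_u| / |R_u| for user u (exponential of (2) omitted)."""
    tags = {t for uu, t, r in triples if uu == u}       # T_u
    resources = {r for uu, t, r in triples if uu == u}  # R_u
    if not resources:
        raise ValueError(f"user {u!r} has no tagged resources")
    return len(tags) / len(resources)


categorizer = [("c", "car", "r1"), ("c", "car", "r2"), ("c", "car", "r3"), ("c", "travel", "r4")]
describer = [("d", "car", "r1"), ("d", "automobile", "r1"), ("d", "vehicle", "r1"), ("d", "how", "r2")]
print(trr_user(categorizer, "c"), trr_user(describer, "d"))  # 0.5 2.0
```

The user who reuses "car" across many resources scores low, while the user who piles synonyms onto a few resources scores high, matching the direction described above.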
b) User Low Frequency Tag usage (LFTR)
To characterize tag usage, $\mathrm{LFTR}_u$ describes the extent to which the user uses low-frequency tags; it equals the proportion of low-frequency tags among all tags in the user's vocabulary. A low-frequency tag is one the user rarely uses, i.e., one assigned to only a few resources. The metric is calculated using equation (3):
$$\mathrm{LFTR}_u = \frac{|T_u^{\mathrm{low}}|}{|T_u|}, \qquad T_u^{\mathrm{low}} = \{\, t \in T_u : |R_u(t)| \le n \,\}, \qquad n = \left\lfloor \frac{p}{100}\,|R_u(t_{\max})| \right\rfloor \qquad (3)$$
where $t$ denotes any label, $t_{\max}$ is the user's most frequently used tag, $|R_u(t)|$ and $|R_u(t_{\max})|$ are the numbers of resources carrying tag $t$ and tag $t_{\max}$ respectively, $n$ is the integer part of $p$ percent of the number of resources carrying $t_{\max}$, with $0 < p \le 100$, $T_u^{\mathrm{low}}$ is the set of low-frequency tags of user $u$ (tags that label no more than $n$ resources), and $|T_u^{\mathrm{low}}|$ is the number of low-frequency tags of user $u$.
Clearly, $0 \le \mathrm{LFTR}_u \le 1$. When $\mathrm{LFTR}_u = 1$, none of the user's tags is used more than $n$ times: the tags describe resources from different angles and aspects, and the user does not mind using low-frequency words. Such a user can be regarded as having a description motivation tendency. When $\mathrm{LFTR}_u = 0$, the user uses no low-frequency tags, because low-frequency tags are considered unhelpful for classified browsing: introducing a low-frequency tag amounts to introducing noise that undermines a consistent classification, so the user avoids low-frequency words. Such a user can be regarded as having a classification motivation tendency.
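A minimal sketch of the low-frequency tag rate, following the definitions that accompany equation (3): tags are grouped by the resources they label, $n$ is taken as the floor of $p$ percent of the resource count of the most frequent tag (reading "integer" as the floor is an assumption), and the default `p=20` is an arbitrary illustrative value that the source does not specify.

```python
import math
from collections import defaultdict


def lftr_user(triples, u, p=20):
    """Low-frequency tag rate |T_u^low| / |T_u| of user u (hypothetical helper)."""
    if not 0 < p <= 100:
        raise ValueError("p must satisfy 0 < p <= 100")
    resources_of = defaultdict(set)  # R_u(t) for each tag t of user u
    for uu, t, r in triples:
        if uu == u:
            resources_of[t].add(r)
    if not resources_of:
        raise ValueError(f"user {u!r} has no tags")
    max_count = max(len(rs) for rs in resources_of.values())  # |R_u(t_max)|
    n = math.floor(p * max_count / 100)  # integer part of p% of |R_u(t_max)|
    low = [t for t, rs in resources_of.items() if len(rs) <= n]  # T_u^low
    return len(low) / len(resources_of)


triples = [("u", "car", f"r{i}") for i in range(10)]     # t_max, 10 resources
triples += [("u", "travel", "r0")]                         # 1 resource
triples += [("u", "sport", "r1"), ("u", "sport", "r2")]    # 2 resources
triples += [("u", "news", f"r{i}") for i in range(3, 6)]   # 3 resources
print(lftr_user(triples, "u"))  # n = 2, low = {travel, sport} -> 0.5
```

Note that with `p=100` the threshold equals the count of the most frequent tag, so every tag counts as low-frequency and the rate is always 1; smaller values of `p` make the metric more selective.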
c) Relative Conditional Entropy (TRCE) of each label of the user
Users with a classification motivation tendency want their tags to be as discriminative as possible, since only then can they browse efficiently. The user's tag selection can therefore be compared to information encoding: just as encoding seeks to maximize the information entropy of the code, selecting discriminative, useful tags seeks to maximize the information entropy of the tags. In other words, users with a classification tendency want all their tags to be used about equally often, which maximizes the information entropy of the tags and thus eases browsing. Users with a description motivation tendency, by contrast, do not care about this.
When the user uses tags to encode resources, the conditional entropy $H_u(R \mid T)$ reflects the effectiveness of this encoding and can be calculated according to equation (4):
$$H_u(R \mid T) = -\sum_{r \in R_u} \sum_{t \in T_u} p(r, t) \log p(r \mid t) \qquad (4)$$
wherein $p(r, t)$ is the joint probability of tag $t$ and resource $r$, and $p(r \mid t)$ is the probability that tag $t$ labels resource $r$.
In computing the tag information entropy, tags and resources are treated as random variables. The conditional entropy can be interpreted as the uncertainty about the resource given the tag, and it is mainly affected by the number of resources and the size of the vocabulary. To compare users, the conditional entropy is normalized while preserving the encoding information: the actually observed conditional entropy $H_u(R \mid T)$ is compared with the ideal conditional entropy $H_{\mathrm{opt}}(R \mid T)$. The ideal conditional entropy $H_{\mathrm{opt}}(R \mid T)$ is obtained when every tag is equally discriminative, i.e., when $p(r \mid t)$ is the same for all tags; the conditional entropy is then also at its maximum. The ideal conditional entropy can therefore serve as a normalization factor, on the basis of which the relative conditional entropy of the tags is calculated as in formula (5).
Clearly, $0 \le \mathrm{TRCE}_u \le 1$. The closer $\mathrm{TRCE}_u$ is to 0, the closer the user's conditional entropy is to the ideal case, and the closer the tags are to an equiprobable distribution. In this case the tags are highly discriminative, and the user likely has a classification motivation tendency. Otherwise, the user more likely has a description motivation tendency.
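The conditional entropy of equation (4) can be sketched under two assumptions not stated in the text: $p(r, t)$ is estimated as the empirical frequency of the user's distinct (resource, tag) pairs, and logarithms are taken to base 2. Since the ideal entropy $H_{\mathrm{opt}}(R \mid T)$ and formula (5) are not spelled out here, this hypothetical helper stops at $H_u(R \mid T)$ and does not compute $\mathrm{TRCE}_u$ itself.

```python
import math
from collections import Counter


def conditional_entropy_user(triples, u):
    """H_u(R|T) = -sum_{r,t} p(r,t) * log2 p(r|t), estimated from u's (resource, tag) pairs."""
    pairs = {(r, t) for uu, t, r in triples if uu == u}
    if not pairs:
        raise ValueError(f"user {u!r} has no tag assignments")
    p_joint = 1 / len(pairs)              # each distinct pair is equally likely
    n_res = Counter(t for _, t in pairs)  # |R_u(t)| for each tag t
    # p(r|t) = 1 / |R_u(t)|, hence -log2 p(r|t) = log2 |R_u(t)|
    return sum(p_joint * math.log2(n_res[t]) for _, t in pairs)


one_broad_tag = [("u", "car", f"r{i}") for i in range(4)]
distinct_tags = [("u", "car", "r1"), ("u", "travel", "r2")]
print(conditional_entropy_user(one_broad_tag, "u"))  # 2.0
```

A single tag spread over four resources leaves two bits of uncertainty about which resource is meant, while tags that each point to exactly one resource leave none.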
d) User's Tag Semantic Repeat Factor (Tag Semantic Overlap Factor, TSOF)
Users with a classification motivation tendency want as few synonyms as possible in their vocabulary, which improves browsing efficiency. Users with a description motivation tendency do not care about this; on the contrary, synonyms let them describe resources more fully. Therefore, the user's motivation tendency can be measured by computing the similarity of the labels the user uses, as shown in formula (6).
where $\mathrm{sim}(t_i, t_j)$ is the similarity between two labels $t_i$ and $t_j$, calculated by formula (7). In the formula, $f(t_i)$ and $f(t_j)$ are the numbers of times labels $t_i$ and $t_j$ appear in the tag set of user $u$, and $f(t_i, t_j)$ is the number of times the two labels co-occur in the user's tag set.
where $N$ is the total number of words in the user's tag set.
When the similarity between all of a user's tags approaches 0, $\mathrm{TSOF}_u$ approaches 0, indicating that the user's motivation tendency is a classification tendency; otherwise, the user's motivation tendency is a description tendency.
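Formulas (6) and (7) are not reproduced in this text, so the following is only a hypothetical stand-in that follows the prose: $\mathrm{TSOF}_u$ is taken as the average pairwise similarity of the user's tags, and $\mathrm{sim}(t_i, t_j)$ is assumed to be the cosine-style co-occurrence score $f(t_i, t_j)/\sqrt{f(t_i) f(t_j)}$, with co-occurrence counted per resource. It does not use $N$ and should not be read as formula (7).

```python
import math
from collections import Counter
from itertools import combinations


def tsof_user(triples, u):
    """Hypothetical TSOF_u: mean pairwise co-occurrence similarity of u's tags."""
    tags_on = {}  # resource -> set of tags user u gave it
    for uu, t, r in triples:
        if uu == u:
            tags_on.setdefault(r, set()).add(t)
    f = Counter()       # f(t): number of u's resources carrying t
    f_pair = Counter()  # f(ti, tj): number of u's resources carrying both
    for tags in tags_on.values():
        f.update(tags)
        f_pair.update(frozenset(pair) for pair in combinations(sorted(tags), 2))
    pairs = list(combinations(sorted(f), 2))
    if not pairs:
        return 0.0
    # assumed sim(ti, tj) = f(ti, tj) / sqrt(f(ti) * f(tj)), which lies in [0, 1]
    sims = [f_pair[frozenset((a, b))] / math.sqrt(f[a] * f[b]) for a, b in pairs]
    return sum(sims) / len(pairs)


categorizer = [("c", "car", "r1"), ("c", "travel", "r2"), ("c", "news", "r3")]
describer = [("d", "car", "r1"), ("d", "automobile", "r1"),
             ("d", "car", "r2"), ("d", "automobile", "r2")]
print(tsof_user(categorizer, "c"), tsof_user(describer, "d"))  # 0.0 1.0
```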
e) User's Special tag usage rate (STR)
Statistical analysis of the labels in social label systems shows the following: when users label with interrogative words such as where, what and how, their remaining labels are often taken from the resource titles, and comparative analysis shows that these users clearly intend to describe the page content (as shown in FIG. 2). These interrogative labels are defined as special labels. Meanwhile, as can be seen from FIG. 3, these users' motivations for labeling other resources also tend to be descriptive; for example, the other labeling records of the user "breneaux" in the first record clearly show other features of a description motivation tendency (such as the tag semantic repeat factor, TSOF).
Therefore, the usage rate of special words when labeling resources can also serve as an indicator of the user's motivation tendency: the higher a user's special-word usage rate, the stronger the description motivation tendency; the lower the rate, the weaker it is. The special tag usage rate is measured using equation (8).
$$\mathrm{STR}_u = \mathrm{card}(t \in T_{\mathrm{str}}) / |T_u| \qquad (8)$$
where $T_{\mathrm{str}} = \{\text{who}, \text{when}, \text{what}, \text{where}, \text{how}, \ldots\}$ is the set of special labels, which can be set to all interrogative words (query adverbs) in English, and $\mathrm{card}(t \in T_{\mathrm{str}})$ is the number of tags used by user $u$ that belong to $T_{\mathrm{str}}$, counting repeats. Clearly $0 \le \mathrm{STR}_u \le 1$: the closer $\mathrm{STR}_u$ is to 1, the more likely user $u$ has a description motivation tendency; the closer it is to 0, the more likely user $u$ has a classification motivation tendency.
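A minimal sketch of the special tag usage rate. Two assumptions are made here: $T_{\mathrm{str}}$ is fixed to the hypothetical set {who, what, when, where, why, how}, and the denominator counts all of the user's tag assignments including repeats, so that, like the numerator, it counts repeated uses and the value stays within [0, 1] as the text states. Tags are lower-cased before matching, which is also an assumption.

```python
SPECIAL_TAGS = frozenset({"who", "what", "when", "where", "why", "how"})  # assumed T_str


def str_user(triples, u, special=SPECIAL_TAGS):
    """Special-tag usage rate of user u (formula (8), denominator assumed to count repeats)."""
    used = [t.lower() for uu, t, r in triples if uu == u]  # all of u's tag assignments
    if not used:
        raise ValueError(f"user {u!r} has no tag assignments")
    return sum(t in special for t in used) / len(used)


describer = [("d", "how", "r1"), ("d", "car", "r1"), ("d", "How", "r2"), ("d", "vehicle", "r2")]
print(str_user(describer, "d"))  # 0.5
```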
The motivation tendency of each labeled resource can likewise be represented by a vector of 5 motivation metrics; for a resource $r$, its motivation tendency can be represented by a vector $M_r$, i.e.:
$$M_r = (\mathrm{TRR}_r, \mathrm{LFTR}_r, \mathrm{TRCE}_r, \mathrm{TSOF}_r, \mathrm{STR}_r) \qquad (9)$$
where $\mathrm{TRR}_r$, $\mathrm{LFTR}_r$, $\mathrm{TRCE}_r$, $\mathrm{TSOF}_r$ and $\mathrm{STR}_r$ are the 5 metrics of the resource's motivation tendency, whose meaning and calculation are as follows:
a) Per-user average label rate of labeled resources (Tags/Resources Ratio, TRR)
The per-user average label rate of a labeled resource, $\mathrm{TRR}_r$, measures the average number of tags each user assigns to the resource; it is computed from the ratio of the number of all tags assigned to the resource to the number of users who labeled it, as shown in formula (10).
$$\mathrm{TRR}_r = e^{-|T_r|/|U_r|} \qquad (10)$$
where $|T_r|$ is the number of all tags given to resource $r$ by all users and $|U_r|$ is the number of users who have tagged resource $r$. Generally, resources with a classification motivation tendency score significantly lower on this metric than resources with a description motivation tendency. That is, the smaller $\mathrm{TRR}_r$, the more likely the resource is of the classification type; the larger $\mathrm{TRR}_r$, the more likely it is of the description type.
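The resource-side ratio can be sketched the same way as the user-side one, with users in place of resources. As assumptions: $|T_r|$ is counted as the number of (user, tag) assignments on $r$, so that the result reads as "tags per tagging user", and only the raw ratio described in the prose is computed, leaving out the exponential written in formula (10). The function name is illustrative.

```python
def trr_resource(triples, r):
    """Raw per-user tag count |T_r| / |U_r| of resource r (exponential of (10) omitted)."""
    assignments = {(u, t) for u, t, rr in triples if rr == r}  # |T_r| counted per (user, tag)
    users = {u for u, _ in assignments}                       # U_r
    if not users:
        raise ValueError(f"resource {r!r} has no taggers")
    return len(assignments) / len(users)


triples = [("a", "car", "r1"), ("a", "automobile", "r1"), ("a", "vehicle", "r1"),
           ("b", "car", "r1"), ("b", "car", "r2")]
print(trr_resource(triples, "r1"))  # 4 assignments / 2 users = 2.0
```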
b) Low Frequency Tag usage of tagged resources (LFTR)
To characterize tag usage, $\mathrm{LFTR}_r$ describes how much the low-frequency tags assigned to the resource are used; it equals the proportion of low-frequency tags among all tags assigned to the resource. A low-frequency tag is one that is rarely assigned to the resource. The metric is calculated using equation (11):
$$\mathrm{LFTR}_r = \frac{|T_r^{\mathrm{low}}|}{|T_r|}, \qquad T_r^{\mathrm{low}} = \{\, t \in T_r : |U_r(t)| \le m \,\}, \qquad m = \left\lfloor \frac{q}{100}\,|U_r(t'_{\max})| \right\rfloor \qquad (11)$$
where $t$ denotes any label of resource $r$, $t'_{\max}$ is the label most frequently assigned to resource $r$, $|U_r(t)|$ is the number of users who assigned label $t$ to $r$, $|U_r(t'_{\max})|$ is the number of users who assigned $t'_{\max}$, $m$ is the integer part of $q$ percent of that number, with $0 < q \le 100$, $T_r^{\mathrm{low}}$ is the set of low-frequency tags of resource $r$ (tags used by no more than $m$ users), and $|T_r^{\mathrm{low}}|$ is the number of low-frequency tags of resource $r$.
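On the resource side the low-frequency test counts users rather than resources. The sketch below follows the definitions that accompany equation (11); reading "integer" as the floor and the default `q=20` are assumptions, and `lftr_resource` is a hypothetical name.

```python
import math
from collections import defaultdict


def lftr_resource(triples, r, q=20):
    """Low-frequency tag rate |T_r^low| / |T_r| of resource r (hypothetical helper)."""
    if not 0 < q <= 100:
        raise ValueError("q must satisfy 0 < q <= 100")
    users_of = defaultdict(set)  # users who gave tag t to resource r
    for u, t, rr in triples:
        if rr == r:
            users_of[t].add(u)
    if not users_of:
        raise ValueError(f"resource {r!r} has no tags")
    max_users = max(len(us) for us in users_of.values())  # users of t'_max
    m = math.floor(q * max_users / 100)  # integer part of q% of that count
    low = [t for t, us in users_of.items() if len(us) <= m]
    return len(low) / len(users_of)


triples = [(f"u{i}", "car", "r1") for i in range(10)]   # t'_max: 10 users
triples += [("u0", "vehicle", "r1")]                     # 1 user
triples += [(f"u{i}", "trip", "r1") for i in range(3)]  # 3 users
print(lftr_resource(triples, "r1"))  # m = 2, low = {vehicle} -> 1/3
```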
c) Relative Conditional Entropy (TRCE) of each label of a tagged resource
For resources with a classification motivation tendency, the labels assigned to them should be as discriminative as possible, since only then is browsing most efficient. Tagging resources can therefore be compared to information encoding: the most effective encoding maximizes the information entropy of the code, and selecting discriminative labels maximizes the information entropy of the labels. When users use tags to encode the resource, the conditional entropy $H_r(U \mid T)$ reflects the effectiveness of this encoding and can be calculated according to equation (12):
$$H_r(U \mid T) = -\sum_{u \in U_r} \sum_{t \in T_r} p(u, t) \log p(u \mid t) \qquad (12)$$
wherein $p(u, t)$ is the joint probability that user $u$ uses label $t$, and $p(u \mid t)$ is the probability that user $u$ uses label $t$ to label resource $r$. When every label is equally discriminative, i.e., $p(u \mid t)$ is the same for all labels, the ideal conditional entropy $H_r^{\mathrm{opt}}(U \mid T)$ is obtained, and the conditional entropy is then also at its maximum.
In computing the label information entropy, labels and users are treated as random variables. The conditional entropy can be interpreted as the uncertainty about the user given the tag, and it is mainly influenced by the number of users and the size of the vocabulary. To compare resources, the conditional entropy is normalized while preserving the encoding information: the actually observed conditional entropy $H_r(U \mid T)$ is compared with the ideal conditional entropy $H_r^{\mathrm{opt}}(U \mid T)$. The ideal conditional entropy can therefore serve as a normalization factor, on the basis of which the relative conditional entropy of the labels is calculated as in formula (13).
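Mirroring the user-side estimate, equation (12) can be sketched from the distinct (user, tag) pairs on a resource, again assuming empirical probabilities and base-2 logarithms. The normalization into $\mathrm{TRCE}_r$ via formula (13) is not given in explicit form here, so this hypothetical helper returns only $H_r(U \mid T)$.

```python
import math
from collections import Counter


def conditional_entropy_resource(triples, r):
    """H_r(U|T) = -sum_{u,t} p(u,t) * log2 p(u|t), estimated from r's (user, tag) pairs."""
    pairs = {(u, t) for u, t, rr in triples if rr == r}
    if not pairs:
        raise ValueError(f"resource {r!r} has no tag assignments")
    p_joint = 1 / len(pairs)                # each distinct (user, tag) pair equally likely
    n_users = Counter(t for _, t in pairs)  # number of users who gave tag t to r
    # p(u|t) = 1 / n_users[t], hence -log2 p(u|t) = log2 n_users[t]
    return sum(p_joint * math.log2(n_users[t]) for _, t in pairs)


triples = [(f"u{i}", "car", "r1") for i in range(4)]
print(conditional_entropy_resource(triples, "r1"))  # 2.0
```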
d) Tag Semantic Repeat Factor of labeled resources (Tag Semantic Overlap Factor, TSOF)
For a resource with a classification-motivation tendency, its tag vocabulary should contain as few synonyms as possible, which improves browsing efficiency. For a resource with a description-motivation tendency the opposite holds, because overlapping tags describe the resource more fully. The motivational tendency of a resource can therefore be measured by the similarity of the labels assigned to it, as shown in equation (14).
where $sim(t_i,t_j)'$ is the similarity between two labels $t_i$ and $t_j$, calculated by formula (15). In that formula, $f(t_i)'$ and $f(t_j)'$ are the numbers of occurrences of $t_i$ and $t_j$ in the tag set of resource $r$, $f(t_i,t_j)'$ is the number of times $t_i$ and $t_j$ co-occur in the tag set of resource $r$,
and $N'$ is the total number of words in the tag set of resource $r$.
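The following hypothetical Python sketch computes a TSOF value under two loudly labeled assumptions: $sim(t_i,t_j)'$ is taken to be positive pointwise mutual information built from $f(t_i)'$, $f(t_j)'$, $f(t_i,t_j)'$ and $N'$, and TSOF is taken to be the mean similarity over all distinct tag pairs. Each user's annotation of $r$ is modeled as one tag list, and two tags co-occur when they appear in the same annotation. PMI is a substitute chosen because it uses exactly those four counts; it is not confirmed as formula (15).

```python
import math
from collections import Counter
from itertools import combinations


def tag_semantic_overlap(posts):
    """Hypothetical TSOF for one resource r.

    posts: list of tag lists, one per user annotation of r.
    ASSUMPTION: sim(ti, tj) = max(0, log2(f(ti,tj) * N / (f(ti) * f(tj))))
    ASSUMPTION: TSOF = mean sim over all distinct tag pairs.
    """
    freq = Counter(tag for post in posts for tag in post)  # f(t)
    n_words = sum(freq.values())                            # N'
    co = Counter()                                          # f(ti, tj), ti < tj
    for post in posts:
        for ti, tj in combinations(sorted(set(post)), 2):
            co[(ti, tj)] += 1
    pairs = list(combinations(sorted(freq), 2))
    if not pairs:
        return 0.0
    total = 0.0
    for ti, tj in pairs:
        f_ij = co[(ti, tj)]
        if f_ij:  # pairs that never co-occur contribute 0
            pmi = math.log2(f_ij * n_words / (freq[ti] * freq[tj]))
            total += max(0.0, pmi)
    return total / len(pairs)
```

Two annotations that both read ["python", "code"] give a TSOF of 1.0, while tags that never co-occur give 0.0.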
e) Special tag usage rate (STR) of a tagged resource
Special labels are defined for resources in the same way as for users. The usage rate of these special words can also serve as an indicator of a resource's labeling-motivation tendency: the more often special words are applied to a resource, the stronger its description-motivation tendency; the less often, the stronger its classification-motivation tendency. The special tag usage rate is measured by equation (16).
$$STR_r = \frac{\mathrm{card}(t \in T_{str})'}{|T_r|} \qquad (16)$$
where $T_{str} = \{\text{who}, \text{when}, \text{what}, \text{where}, \text{how}, \ldots\}$ is the special label set, which can be taken as the set of all English interrogative words, and $\mathrm{card}(t \in T_{str})'$ is the number of tags applied to resource $r$ that belong to $T_{str}$, counting repeats. Clearly $0 \le STR_r \le 1$: the closer $STR_r$ is to 1, the more likely resource $r$ is to have a description-motivation tendency; the closer it is to 0, the more likely it is to have a classification-motivation tendency.
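STR is fully specified by equation (16), so this Python sketch only assumes that matching is case-insensitive and that $T_{str}$ holds just the five interrogatives listed above (the original set continues with "…"). The names `SPECIAL_TAGS` and `special_tag_rate` are hypothetical.

```python
# Only the interrogatives listed in the text; the original set is open-ended.
SPECIAL_TAGS = frozenset({"who", "when", "what", "where", "how"})


def special_tag_rate(tags, special=SPECIAL_TAGS):
    """STR_r = card(t in T_str)' / |T_r|, counting repeated tags.

    tags: every tag applied to resource r by all users, repeats included.
    ASSUMPTION: comparison is case-insensitive.
    """
    tags = list(tags)
    if not tags:
        return 0.0
    hits = sum(1 for t in tags if t.lower() in special)
    return hits / len(tags)
```

For example, the tags ["how", "python", "How", "tutorial"] give $STR_r = 2/4 = 0.5$.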
Likewise, the motivational tendency of the resource to be labeled can be expressed as a vector of the 5 motivational metrics (TRR, LFTU, TRCE, TSOF, STR) computed for that resource.
These 5 measures are calculated in the same way as the motivational tendency of the labeled resources.
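The five metrics can be held in a small record and flattened into the vector that the similarity steps compare. In this Python sketch the class and function names are hypothetical, and TRR follows claim 2(a'), reading its garbled exponent as $-|T_r|/|U_r|$ (an assumption).

```python
import math
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class Motivation:
    """Five-metric motivational tendency vector (TRR, LFTU, TRCE, TSOF, STR)."""
    trr: float
    lftu: float
    trce: float
    tsof: float
    str_: float  # trailing underscore avoids shadowing the built-in str

    def as_vector(self):
        """Flatten to a list in the fixed order TRR, LFTU, TRCE, TSOF, STR."""
        return list(astuple(self))


def trr_metric(n_tags, n_annotators):
    """TRR_r = exp(-|T_r| / |U_r|); ASSUMPTION: exponent read as a ratio."""
    if n_annotators <= 0:
        raise ValueError("resource must have at least one annotator")
    return math.exp(-n_tags / n_annotators)
```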
(2) Selecting, from the labeled resources, those whose motivational tendency is similar to that of the resource to be labeled, to obtain the non-user-dependent similar resources; that is, the similarity between the motivational tendency of each labeled resource and that of the resource to be labeled is calculated, the labeled resources whose similarity exceeds a threshold $\alpha$ are selected, and together they form the non-user-dependent similar resource set $R_{sim}$;
Specifically, when labeling a new resource, a user often reuses labels he or she has used before. The method therefore finds resources similar to the resource to be labeled and calculates how well each matches the user's current motivational tendency. Taking the labels of well-matching resources as the recommendation candidate set yields labels that fit the user's intention. The tendency similarity between the resource to be labeled and each resource the user has labeled can be calculated with the vector-space cosine method, as in formula (18).
where $M_r$ is the motivational-tendency vector of a resource the user has labeled, and the vector it is compared with is the motivational-tendency vector of the resource to be labeled. A threshold $\alpha$ acts as a control factor: if the similarity is greater than or equal to $\alpha$ ($0 \le \alpha \le 1$, determined experimentally, with 0.6 suggested), the motivational tendency of the labeled resource is very close to that of the resource to be labeled. These highly similar labeled resources are combined, and $R_{sim}$ denotes the resulting non-user-dependent similar resources, i.e. the labeled resources whose similarity reaches $\alpha$. The similarity can also be calculated with other measures, such as mutual information or Pearson correlation.
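Formula (18) is the vector-space cosine, and the $\alpha$ filter is then a set comprehension. A minimal Python sketch, assuming motivation vectors are plain lists of five floats keyed by a resource id; the dictionary layout and function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length motivation vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # a zero vector is treated as dissimilar


def select_similar_resources(labeled, target, alpha=0.6):
    """R_sim: labeled resources whose cosine with the target reaches alpha.

    labeled: dict mapping resource id -> motivation vector of a labeled resource.
    target: motivation vector of the resource to be labeled.
    """
    return {rid for rid, vec in labeled.items() if cosine(vec, target) >= alpha}
```

With the suggested $\alpha = 0.6$, a vector at 45 degrees to the target (cosine about 0.707) is kept, while an orthogonal one is dropped.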
(3) Selecting, from the non-user-dependent similar resources $R_{sim}$, those whose motivational tendency is similar to the user's, to obtain the label recommendation candidate resources $R_{cad}$;
The similarity between the motivational tendency of each resource in $R_{sim}$ and the motivational tendency of the user is calculated with formula (19),
where $M_r$ is the motivational-tendency vector of a resource in $R_{sim}$ (a resource labeled by the user whose motivational tendency is similar to that of the resource to be labeled), and $M_u$ is the motivational-tendency vector of the user. A threshold $\beta$ acts as a control factor: if the similarity is greater than or equal to $\beta$ ($0 \le \beta \le 1$, determined experimentally, with 0.6 suggested), the labels of that resource can be recommended as labels that fit the user's intention. The non-user-dependent similar resources whose similarity reaches $\beta$ are selected as the label recommendation candidate resources, denoted $R_{cad}$.
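Step (3) applies the same cosine measure, but compares each member of $R_{sim}$ with the user's vector $M_u$ and filters with $\beta$. This self-contained Python sketch defines its own cosine helper; the function names and the id-to-vector dictionary are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length motivation vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_candidates(r_sim, vectors, user_vec, beta=0.6):
    """R_cad: members of R_sim whose cosine with the user's vector M_u reaches beta.

    r_sim: set of resource ids selected in step (2).
    vectors: dict mapping resource id -> motivation vector M_r.
    user_vec: the user's motivation vector M_u.
    """
    return {rid for rid in r_sim if cosine(vectors[rid], user_vec) >= beta}
```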
(4) Merging all labels in the label recommendation candidate resources $R_{cad}$ to obtain a merged label set: the labels of every resource in $R_{cad}$ are combined according to formula (20);
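The simplest reading of the merge in step (4) is a union of the candidates' tags. This Python sketch assumes that reading (formula (20) may differ) and additionally records, per tag, how many candidate resources carry it, which a caller may ignore. The function name is hypothetical.

```python
from collections import Counter


def merge_candidate_tags(candidate_tags):
    """Merged tag set of R_cad.

    candidate_tags: dict mapping resource id in R_cad -> list of its tags.
    Returns a Counter whose keys are the union of all tags and whose values
    count the candidate resources carrying each tag (repeats within one
    resource are counted once).
    """
    merged = Counter()
    for tags in candidate_tags.values():
        merged.update(set(tags))
    return merged
```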
(5) Calculating the recommendation importance of each label in the merged label set according to equation (21),
where $p(w)$ is the content importance of each word $w$ of the resource to be recommended within that resource, calculated according to formula (22), and $s(w,t)$ is the correlation between word $w$ and tag $t$ in the merged tag set, calculated according to equation (23).
These formulas use the number of occurrences of word $w$ in the resource, the number of occurrences of tag $t$ in the resource, the number of co-occurrences of $w$ and $t$ in the resource, the total number of words in the resource, the total number of words contained in all label recommendation candidate resources, and $|R_{cad}(w)|$, the number of label recommendation candidate resources that contain word $w$; $w$ ranges over English words.
(6) Recommending the labels in descending order of recommendation importance $p(t|r)$.
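Steps (5) and (6) can be sketched as scoring followed by sorting. This Python sketch assumes equation (21) has the form $p(t|r) = \sum_w p(w)\,s(w,t)$, which combines the two ingredients the text names but is not confirmed by it, and it takes $p(w)$ and $s(w,t)$ as precomputed inputs rather than guessing formulas (22) and (23). The function name and the tie-breaking rule (alphabetical) are arbitrary choices.

```python
def recommend(merged_tags, p_w, s, top_k=None):
    """Rank merged tags by recommendation importance.

    merged_tags: iterable of candidate tags from step (4).
    p_w: dict word -> content importance p(w) in the resource to be labeled.
    s: dict (word, tag) -> correlation s(w, t); missing pairs count as 0.
    ASSUMPTION: p(t|r) = sum over w of p(w) * s(w, t).
    Returns (tag, score) pairs in descending score order.
    """
    scores = {
        t: sum(pw * s.get((w, t), 0.0) for w, pw in p_w.items())
        for t in merged_tags
    }
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked if top_k is None else ranked[:top_k]
```

For instance, with $p(\text{python}) = p(\text{tutorial}) = 0.5$, $s(\text{python}, \text{programming}) = 0.8$, $s(\text{tutorial}, \text{howto}) = 0.4$ and $s(\text{python}, \text{howto}) = 0.2$, the tag "programming" scores 0.4 and "howto" scores 0.3.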
The invention also provides a label recommendation system based on user motivation tendency which, as shown in Fig. 4, comprises a motivation tendency calculation module (100), a non-user-dependent similar resource selection module (200), a label recommendation candidate resource selection module (300), a label merging module (400), a recommendation importance calculation module (500) and an output module (600);
the motivation tendency calculation module (100) is used for calculating the motivation tendency of the user, the motivation tendency of each labeled resource and the motivation tendency of the resource to be labeled;
the non-user-dependent similar resource selection module (200) is used for selecting, from the labeled resources, resources whose motivational tendency is similar to that of the resource to be labeled, to obtain the non-user-dependent similar resources;
the label recommendation candidate resource selection module (300) is used for selecting resources similar to the motivation tendency of the user from the non-user-dependent similar resources to obtain label recommendation candidate resources;
the tag merging module (400) is used for merging all tags in the tag recommendation candidate resources to obtain a merged tag set;
the recommendation importance calculation module (500) is used for calculating the recommendation importance of each label in the merged label set;
the output module (600) is used for recommending the labels in descending order of recommendation importance.
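The module structure of the system maps naturally onto a small class. This hypothetical Python sketch wires modules (200) to (600) together, assuming cosine similarity with thresholds $\alpha$ and $\beta$, a union merge of candidate tags, and $p(t|r) = \sum_w p(w)\,s(w,t)$ with $p(w)$ and $s(w,t)$ supplied by the caller. Module (100) is assumed to have produced all motivation vectors beforehand, and the class and method names are illustrative only.

```python
import math
from collections import Counter


def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TagRecommender:
    """Hypothetical wiring of modules (200)-(600).

    labeled: dict resource id -> (motivation vector, list of tags), i.e. the
    output of module (100) for the user's labeled resources.
    """

    def __init__(self, alpha=0.6, beta=0.6):
        self.alpha = alpha
        self.beta = beta

    def select_similar(self, labeled, target_vec):              # module (200)
        return {rid for rid, (vec, _) in labeled.items()
                if _cosine(vec, target_vec) >= self.alpha}

    def select_candidates(self, r_sim, labeled, user_vec):       # module (300)
        return {rid for rid in r_sim
                if _cosine(labeled[rid][0], user_vec) >= self.beta}

    def merge(self, r_cad, labeled):                             # module (400)
        merged = Counter()
        for rid in r_cad:
            merged.update(set(labeled[rid][1]))
        return merged

    def importance(self, merged, p_w, s):                        # module (500)
        return {t: sum(pw * s.get((w, t), 0.0) for w, pw in p_w.items())
                for t in merged}

    def recommend(self, labeled, target_vec, user_vec, p_w, s):  # module (600)
        r_sim = self.select_similar(labeled, target_vec)
        r_cad = self.select_candidates(r_sim, labeled, user_vec)
        scores = self.importance(self.merge(r_cad, labeled), p_w, s)
        # Descending importance; ties broken alphabetically.
        return [t for t, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]
```

Keeping each module as a separate method mirrors the system claim and lets a caller inspect $R_{sim}$ or $R_{cad}$ on their own when tuning $\alpha$ and $\beta$.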
The present invention is not limited to the above embodiments. Those skilled in the art can implement it in other embodiments based on this disclosure; therefore, any design that adopts the structure and idea of the present invention falls within its scope of protection.
Claims (6)
1. A label recommendation method based on user motivation tendency, comprising the following steps:
(1) calculating the motivational tendency of the user, of each labeled resource, and of the resource to be labeled from the user triples, the user triples comprising the user's labeling history (the labeled resources and their labels) and the resource to be labeled and its labels;
(2) selecting, from the labeled resources, resources whose motivational tendency is similar to that of the resource to be labeled, the resources so obtained being called non-user-dependent similar resources;
(3) selecting, from the non-user-dependent similar resources, resources whose motivational tendency is similar to that of the user, the resources so obtained being called label recommendation candidate resources;
(4) merging all labels in the label recommendation candidate resources to obtain a merged label set;
(5) calculating the recommendation importance of each label in the merged label set;
(6) recommending the labels in descending order of their recommendation importance.
2. The tag recommendation method according to claim 1, wherein the motivational tendency of user $u$ in step (1) is $M_u = (TRR_u, LFTU_u, TRCE_u, TSOF_u, STR_u)$, where $TRR_u$, $LFTU_u$, $TRCE_u$, $TSOF_u$ and $STR_u$ are the metrics of the motivational tendency of user $u$, calculated as follows:
(a) $TRR_u = e^{-|T_u|/|R_u|}$;
where $T_u$ is the set of all tags used by user $u$, $|T_u|$ is the number of tags in $T_u$, $R_u$ is the set of all resources user $u$ has tagged, and $|R_u|$ is the number of resources in $R_u$;
where $t$ is any label, $t_{max}$ is the label most frequently used by user $u$, $|R(t)|$ is the number of resources carrying label $t$, $|R(t_{max})|$ is the number of resources carrying the most frequent label $t_{max}$, and $n$ is the integer equal to $p\%$ of $|R(t_{max})|$ with $0 < p \le 100$; the formula also uses the set of low-frequency tags of user $u$ and the number of such tags;
where $p(r,t)$ is the joint probability of label $t$ on resource $r$, $p(r|t)$ is the probability that label $t$ is used to label resource $r$, $R$ is the set of all resources labeled by user $u$, $T$ is the set of labels user $u$ assigned to those resources, and $H^{opt}(R|T)$ is the value of $H_u(R|T)$ when $p(r|t)$ is the same for all tags;
where $sim(t_i,t_j)$ is the similarity between labels $t_i$ and $t_j$, $f(t_i)$ and $f(t_j)$ are the numbers of times $t_i$ and $t_j$ appear in the tag set of user $u$, $f(t_i,t_j)$ is the number of times $t_i$ and $t_j$ co-occur in the tag set of user $u$, and $N$ is the total number of words in the tag set of user $u$;
(e) $STR_u = \mathrm{card}(t \in T_{str}) / |T_u|$;
where $T_{str}$ is the special label set and $\mathrm{card}(t \in T_{str})$ is the number of tags used by user $u$ that are contained in $T_{str}$, counting repeats;
the motivational tendency of any resource $r$, whether among the labeled resources or the resource to be labeled in step (1), is $M_r = (TRR_r, LFTU_r, TRCE_r, TSOF_r, STR_r)$, where these five quantities are the metrics of the motivational tendency of resource $r$, calculated as follows:
(a') $TRR_r = e^{-|T_r|/|U_r|}$;
where $|T_r|$ is the number of all tags given to resource $r$ by all users, and $|U_r|$ is the number of all users who have tagged resource $r$;
where $t$ is any label of resource $r$, $t'_{max}$ is the label most frequently used on resource $r$, $|R_u(t)|$ is the number of users who used label $t$, $|R_u(t'_{max})|$ is the number of users who used the most frequent label $t'_{max}$, and $m$ is the integer equal to $q\%$ of $|R_u(t'_{max})|$ with $0 < q \le 100$; the formula also uses the set of low-frequency tags of resource $r$ and the number of such tags;
where $p(u,t)$ is the joint probability that user $u$ uses label $t$, $p(u|t)$ is the probability that user $u$ uses label $t$ to label resource $r$, $U$ is the set of all users who labeled resource $r$, and $H_r^{opt}(U|T)$ is the value of $H_r(U|T)$ when $p(u|t)$ is the same for all tags;
where $sim(t_i,t_j)'$ is the similarity between labels $t_i$ and $t_j$, $f(t_i)'$ and $f(t_j)'$ are the numbers of occurrences of $t_i$ and $t_j$ in the tag set of resource $r$, $f(t_i,t_j)'$ is the number of times they co-occur in the tag set of resource $r$, and $N'$ is the total number of words in the tag set of resource $r$;
(e') $STR_r = \mathrm{card}(t \in T_{str})' / |T_r|$;
where $T_{str}$ is the special label set and $\mathrm{card}(t \in T_{str})'$ is the number of tags used for resource $r$ that are contained in $T_{str}$, counting repeats.
3. The tag recommendation method according to claim 1 or 2, wherein in step (2) the non-user-dependent similar resources are obtained as follows:
(3.1) calculating the similarity between the motivational tendency of each labeled resource and the motivational tendency of the resource to be labeled;
and (3.2) selecting the labeled resources whose similarity is greater than the threshold $\alpha$, thereby obtaining the non-user-dependent similar resources, where $0 < \alpha < 1$.
4. The tag recommendation method according to claim 1 or 2, wherein in step (3) the label recommendation candidate resources are obtained as follows:
(4.1) calculating the similarity between the motivational tendency of each non-user-dependent similar resource and the motivational tendency of the user;
and (4.2) selecting the non-user-dependent similar resources whose similarity is greater than the threshold $\beta$, these being the label recommendation candidate resources, where $0 < \beta < 1$.
5. The tag recommendation method according to claim 1 or 2, wherein step (5) calculates the recommendation importance of each tag in the merged tag set as follows:
(5.1) calculating the content importance $p(w)$ of each word $w$ of the resource to be labeled within that resource, using the number of occurrences of word $w$ in the resource to be labeled, the total number of words in the resource to be labeled, the total number of words contained in all label recommendation candidate resources, and $|R_{cad}(w)|$, the number of label recommendation candidate resources containing word $w$;
where the number of occurrences of tag $t$ in the resource to be labeled and the number of co-occurrences of word $w$ and tag $t$ in the resource to be labeled are also used;
6. A label recommendation system based on user motivation tendency, comprising a motivation tendency calculation module (100), a non-user-dependent similar resource selection module (200), a label recommendation candidate resource selection module (300), a label merging module (400), a recommendation importance calculation module (500) and an output module (600);
the motivation tendency calculation module (100) is used for calculating the motivation tendency of the user, the motivation tendency of each labeled resource and the motivation tendency of the resource to be labeled;
the non-user-dependent similar resource selection module (200) is used for selecting, from the labeled resources, resources whose motivational tendency is similar to that of the resource to be labeled, to obtain the non-user-dependent similar resources;
the label recommendation candidate resource selection module (300) is used for selecting resources similar to the motivation tendency of the user from the non-user-dependent similar resources to obtain label recommendation candidate resources;
the tag merging module (400) is used for merging all tags in the tag recommendation candidate resources to obtain a merged tag set;
the recommendation importance calculation module (500) is used for calculating the recommendation importance of each label in the merged label set;
the output module (600) is used for recommending the labels in descending order of recommendation importance.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201110154353 CN102262653B (en) | 2011-06-09 | 2011-06-09 | Label recommendation method and system based on user motivation orientation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201110154353 CN102262653B (en) | 2011-06-09 | 2011-06-09 | Label recommendation method and system based on user motivation orientation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102262653A true CN102262653A (en) | 2011-11-30 |
CN102262653B CN102262653B (en) | 2013-09-18 |
Family
ID=45009282
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 201110154353 Expired - Fee Related CN102262653B (en) | 2011-06-09 | 2011-06-09 | Label recommendation method and system based on user motivation orientation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102262653B (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103164463A (en) * | 2011-12-16 | 2013-06-19 | 国际商业机器公司 | Method and device for recommending labels |
CN103544510A (en) * | 2013-09-30 | 2014-01-29 | 小米科技有限责任公司 | Information processing method, information processing device and mobile terminal |
CN104199838A (en) * | 2014-08-04 | 2014-12-10 | 浙江工商大学 | User model building method based on label disambiguation |
CN104216881A (en) * | 2013-05-29 | 2014-12-17 | 腾讯科技(深圳)有限公司 | Method and device for recommending individual labels |
CN105989018A (en) * | 2015-01-29 | 2016-10-05 | 深圳市腾讯计算机系统有限公司 | Label generation method and label generation device |
CN107341242A (en) * | 2017-07-06 | 2017-11-10 | 太原理工大学 | A kind of label recommendation method and system |
CN107833082A (en) * | 2017-09-15 | 2018-03-23 | 广州唯品会研究院有限公司 | A kind of recommendation method and apparatus of commodity picture |
CN108334625A (en) * | 2018-02-09 | 2018-07-27 | 深圳壹账通智能科技有限公司 | Processing method, device, computer equipment and the storage medium of user information |
CN108415971A (en) * | 2018-02-08 | 2018-08-17 | 兰州智豆信息科技有限公司 | Recommend the method and apparatus of supply-demand information using knowledge mapping |
CN111221644A (en) * | 2018-11-27 | 2020-06-02 | 阿里巴巴集团控股有限公司 | Resource scheduling method, device and equipment |
CN112966682A (en) * | 2021-05-18 | 2021-06-15 | 江苏联著实业股份有限公司 | File classification method and system based on semantic analysis |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6655963B1 (en) * | 2000-07-31 | 2003-12-02 | Microsoft Corporation | Methods and apparatus for predicting and selectively collecting preferences based on personality diagnosis |
CN101751448A (en) * | 2009-07-22 | 2010-06-23 | 中国科学院自动化研究所 | Commendation method of personalized resource information based on scene information |
CN102004774A (en) * | 2010-11-16 | 2011-04-06 | 清华大学 | Personalized user tag modeling and recommendation method based on unified probability model |
2011
- 2011-06-09 CN CN 201110154353 patent/CN102262653B/en not_active Expired - Fee Related
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103164463A (en) * | 2011-12-16 | 2013-06-19 | 国际商业机器公司 | Method and device for recommending labels |
US9134957B2 (en) | 2011-12-16 | 2015-09-15 | International Business Machines Corporation | Recommending tags based on user ratings |
CN103164463B (en) * | 2011-12-16 | 2017-03-22 | 国际商业机器公司 | Method and device for recommending labels |
CN104216881A (en) * | 2013-05-29 | 2014-12-17 | 腾讯科技(深圳)有限公司 | Method and device for recommending individual labels |
CN103544510B (en) * | 2013-09-30 | 2016-10-26 | 小米科技有限责任公司 | Information processing method, device and mobile terminal |
CN103544510A (en) * | 2013-09-30 | 2014-01-29 | 小米科技有限责任公司 | Information processing method, information processing device and mobile terminal |
CN104199838B (en) * | 2014-08-04 | 2017-09-29 | 浙江工商大学 | A kind of user model constructing method based on label disambiguation |
CN104199838A (en) * | 2014-08-04 | 2014-12-10 | 浙江工商大学 | User model building method based on label disambiguation |
CN105989018A (en) * | 2015-01-29 | 2016-10-05 | 深圳市腾讯计算机系统有限公司 | Label generation method and label generation device |
CN105989018B (en) * | 2015-01-29 | 2020-04-21 | 深圳市腾讯计算机系统有限公司 | Label generation method and label generation device |
CN107341242A (en) * | 2017-07-06 | 2017-11-10 | 太原理工大学 | A kind of label recommendation method and system |
CN107833082A (en) * | 2017-09-15 | 2018-03-23 | 广州唯品会研究院有限公司 | A kind of recommendation method and apparatus of commodity picture |
CN108415971A (en) * | 2018-02-08 | 2018-08-17 | 兰州智豆信息科技有限公司 | Recommend the method and apparatus of supply-demand information using knowledge mapping |
CN108415971B (en) * | 2018-02-08 | 2021-07-23 | 兰州智豆信息科技有限公司 | Method and device for recommending supply and demand information by using knowledge graph |
CN108334625A (en) * | 2018-02-09 | 2018-07-27 | 深圳壹账通智能科技有限公司 | Processing method, device, computer equipment and the storage medium of user information |
CN108334625B (en) * | 2018-02-09 | 2020-05-29 | 深圳壹账通智能科技有限公司 | User information processing method and device, computer equipment and storage medium |
CN111221644A (en) * | 2018-11-27 | 2020-06-02 | 阿里巴巴集团控股有限公司 | Resource scheduling method, device and equipment |
CN111221644B (en) * | 2018-11-27 | 2023-06-13 | 阿里巴巴集团控股有限公司 | Resource scheduling method, device and equipment |
CN112966682A (en) * | 2021-05-18 | 2021-06-15 | 江苏联著实业股份有限公司 | File classification method and system based on semantic analysis |
Also Published As
Publication number | Publication date |
---|---|
CN102262653B (en) | 2013-09-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102262653A (en) | Label recommendation method and system based on user motivation orientation | |
Zhang et al. | Identification of the to-be-improved product features based on online reviews for product redesign | |
Cantador et al. | Categorising social tags to improve folksonomy-based recommendations | |
Liang et al. | Connecting users and items with weighted tags for personalized item recommendations | |
Welch et al. | Search result diversity for informational queries | |
CN112861990B (en) | Topic clustering method and device based on keywords and entities and computer readable storage medium | |
CN102004774A (en) | Personalized user tag modeling and recommendation method based on unified probability model | |
Anand et al. | Folksonomy-based fuzzy user profiling for improved recommendations | |
CN108021715B (en) | Heterogeneous label fusion system based on semantic structure feature analysis | |
Zhao et al. | WTL-CNN: A news text classification method of convolutional neural network based on weighted word embedding | |
CN102289514A (en) | Social label automatic labelling method and social label automatic labeller | |
Huang et al. | Multi-granular document-level sentiment topic analysis for online reviews | |
Yang et al. | Self-Attentive Neural Network for Hashtag Recommendation. | |
Hu et al. | EGC: A novel event-oriented graph clustering framework for social media text | |
Papadakis et al. | Content-based recommender systems taxonomy | |
CN116738068A (en) | Trending topic mining method, device, storage medium and equipment | |
Dattolo et al. | On social semantic relations for recommending tags and resources using folksonomies | |
CN105677830A (en) | Heterogeneous media similarity computing method and retrieving method based on entity mapping | |
Gorrab et al. | New hashtags’ weighting schemes for hashtag and user recommendation on twitter | |
Cuzzocrea et al. | An innovative user-attentive framework for supporting real-time detection and mining of streaming microblog posts | |
Ma et al. | Cross-media retrieval by cluster-based correlation analysis | |
Chen et al. | Multi-modal multi-layered topic classification model for social event analysis | |
Nagaraj et al. | A novel semantic level text classification by combining NLP and Thesaurus concepts | |
Marouf et al. | An integrated Approach to drive ontological structure from folksonomie | |
Shi et al. | An overview of sentence ordering task |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
C17 | Cessation of patent right | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20130918 Termination date: 20140609 |