CN105138572B - Method and device for acquiring relevance weight of user tag - Google Patents


Info

Publication number
CN105138572B
Authority
CN
China
Prior art keywords
user
self
users
media content
user behavior
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510446007.0A
Other languages
Chinese (zh)
Other versions
CN105138572A (en)
Inventor
杨帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201510446007.0A
Publication of CN105138572A
Application granted
Publication of CN105138572B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a method and a device for obtaining a relevance weight of a user tag. The method comprises the following steps: establishing a tag database, wherein the tag database comprises tags and category information corresponding to the tags; collecting statistics on user behaviors and extracting user behavior events, wherein each user behavior event comprises a user tag, the user tag being a tag in the tag database that is associated with the user behavior; and determining the relevance weight of the user tag based on the user behavior event. According to the scheme of the application, the user tag associated with a user behavior and the weight corresponding to that user tag can be obtained accurately.

Description

Method and device for acquiring relevance weight of user tag
Technical Field
The present disclosure relates generally to data analysis technologies, more particularly to data analysis based on user behavior, and specifically to a method and an apparatus for obtaining a relevance weight of a user tag.
Background
On the internet, user behavior is often associated with content that interests the user. For example, a user who is interested in a certain brand, organization, or business may follow its microblog account, forward the content it publishes to friends, or comment on the microblog posts it publishes.
Currently, there are generally three approaches to obtaining a user's behavior preferences accurately:
a) Machine learning: samples of users or user groups are collected, features are extracted from the sample user behaviors and a model is trained, and the model is then used to obtain user behavior interest tags.
b) Text-based keyword extraction is used to build user behavior interest tags, a user relationship graph is built from user interaction relationships, and user behavior interests are discovered with methods such as PageRank.
c) User behavior interests are discovered with a topic-model algorithm (LDA); this approach mines user behavior interests from user relationship information and user tag information.
However, the prior art described above has the following drawbacks:
Scheme a) is limited in that samples of users who carry commercial tags are difficult to collect.
Scheme b) does not need sample collection and mines users' broad interests and hobbies with good accuracy, but for tags that do not have a strong propagation effect its accuracy is low and misjudgment is likely.
For scheme c), it is difficult to classify the topic seed words for user behaviors that have only a weak topic.
Disclosure of Invention
In view of the foregoing defects or shortcomings in the prior art, it is desirable to provide a method and an apparatus for obtaining a relevance weight of a user tag, which can accurately obtain the user tag associated with a user behavior by collecting statistics on user behaviors.
In a first aspect, an embodiment of the present application provides a method for obtaining a relevance weight of a user tag, including: establishing a tag database, wherein the tag database comprises tags and category information corresponding to the tags; collecting statistics on user behaviors and extracting user behavior events, wherein each user behavior event comprises a user tag, the user tag being a tag in the tag database that is associated with the user behavior; and determining the relevance weight of the user tag based on the user behavior event.
In a second aspect, an embodiment of the present application further provides an apparatus for obtaining a relevance weight of a user tag, including: a creating module configured to establish a tag database, wherein the tag database comprises tags and category information corresponding to the tags; an extraction module configured to collect statistics on user behaviors and extract user behavior events, wherein each user behavior event comprises a user tag, the user tag being a tag in the tag database that is associated with the user behavior; and a determining module configured to determine the relevance weight of the user tag based on the user behavior event.
According to the scheme provided by the embodiments of the application, the user tag associated with a user behavior and the weight corresponding to that user tag can be obtained accurately.
In some implementations of the application, the weights of a user tag under different categories of user behavior can be calculated separately, and the weights corresponding to the individual user behaviors are then summed to obtain the weight of the user tag.
In some implementations of the application, the calculated weight of a user behavior can be corrected based on prior data, so that the final weight of the user tag is more consistent with the actual preference of the user.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings, in which:
FIG. 1 is a schematic flow chart illustrating a method for obtaining a relevance weight of a user tag according to an embodiment of the present application;
FIG. 2 is a schematic block diagram illustrating the determination of a relevance weight of a user tag based on a user behavior event in FIG. 1;
FIG. 3 is a schematic structural diagram illustrating an apparatus for obtaining a relevance weight of a user tag according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
It should be noted that the embodiments and the features of the embodiments in the present application may be combined with each other where no conflict arises. The present application will be described in detail below with reference to the embodiments and the attached drawings.
Referring to fig. 1, a schematic flowchart of a method for obtaining a relevance weight of a user tag according to an embodiment of the present application is shown.
Specifically, in step 110, a tag database is established, wherein the tag database comprises tags and category information corresponding to the tags.
In some implementations, tags that meet preset conditions can be obtained from various self-media platforms and added to the tag database. The self-media platforms can include microblogs, the WeChat public platform, WeChat Moments, the forums of various websites, and the like. When acquiring tags, for example, the authenticated official accounts that represent brands, organizations or merchants on the self-media platforms can be added to the tag database as tags.
In addition, tag names may coincide. For example, the tag "golf" may denote a model of automobile or the ball game of golf. Therefore, in some implementations, corresponding category information may be added to each tag when the tag database is established, so as to make the category of each tag explicit, disambiguate the tags, and avoid confusion among them.
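As a minimal sketch of the disambiguation described above, a tag database can key each tag by both its name and its category, so that identically named tags remain distinct. The `Tag` and `TagDatabase` classes, their method names, and the sample category strings are hypothetical and chosen only for illustration; the source does not prescribe a storage layout.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Tag:
    """A tag together with the category that disambiguates it."""
    name: str
    category: str


@dataclass
class TagDatabase:
    """Hypothetical in-memory tag database keyed by (name, category)."""
    _tags: set = field(default_factory=set)

    def add(self, name: str, category: str) -> Tag:
        tag = Tag(name, category)
        self._tags.add(tag)
        return tag

    def lookup(self, name: str) -> list:
        """Return every tag with this name, sorted by category."""
        return sorted((t for t in self._tags if t.name == name),
                      key=lambda t: t.category)


db = TagDatabase()
db.add("golf", "automobile")
db.add("golf", "ball game")
golf_tags = db.lookup("golf")  # two distinct tags share the name "golf"
```

Keeping the category inside the key means a behavior event can reference one specific "golf" tag rather than an ambiguous name.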
Next, in step 120, statistics on user behaviors are collected and user behavior events are extracted.
In some implementations, a user behavior event can include, for example, a user tag, which is a tag in the tag database that is associated with the user behavior.
In other implementations, the user behavior event may include, in addition to the user tag, at least one of a user name, a behavior occurrence time, and a user behavior category.
For example, if user U at time T1 produces a behavior of category C1 associated with tag A in the tag database, the user behavior event can be described by the four-tuple (U, T1, C1, A).
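The four-tuple above maps naturally onto a small record type. The following is an illustrative sketch only: the `BehaviorEvent` name, the `extract_events` helper, the category strings, and the choice of a yyyymmdd integer for the time field are assumptions, not part of the source.

```python
from typing import NamedTuple


class BehaviorEvent(NamedTuple):
    user: str       # U: user name
    time: int       # T1: occurrence time (a yyyymmdd integer here, an assumption)
    category: str   # C1: user behavior category
    tag: str        # A: tag from the tag database


def extract_events(raw_log, known_tags):
    """Keep only the log records whose tag exists in the tag database."""
    return [BehaviorEvent(*rec) for rec in raw_log if rec[3] in known_tags]


raw_log = [
    ("U", 20140508, "follow", "A"),
    ("U", 20140601, "comment", "not-in-database"),
]
events = extract_events(raw_log, known_tags={"A"})
```

Only the first record survives, because the second refers to a tag that is not in the database.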
Next, in step 130, the relevance weight of the user tag is determined based on the user behavior event.
For example, the relevance weight of the user tag may be determined according to at least one of the category of the user behavior, the time at which the user behavior occurred, the number of times the user behavior occurred, and the like.
In some implementations, the user behavior categories can include, for example, at least one of the following:
The user follows other users of the self-media platform. For example, if another user of the self-media platform that user U follows is the official account of a certain brand, this "following" behavior of user U can be considered to be associated with the tag corresponding to the brand, and thus the tag becomes a user tag of user U.
Other users followed by the user publish self-media content. For example, if the self-media content published by another user K followed by user U includes content related to a tag g in the tag database, user U may also be considered to be associated with that tag, and thus the tag becomes a user tag of user U.
The user publishes self-media content. For example, if user U publishes self-media content that mentions a certain brand, user U may also be considered to be associated with the tag corresponding to the brand, and thus the tag becomes a user tag of user U.
The user posts comments on self-media content published by other users of the self-media platform. For example, if user U mentions a certain brand when commenting on self-media content published by other users, user U may also be considered to be associated with the tag corresponding to the brand, and thus the tag becomes a user tag of user U.
The user forwards self-media content published by other users of the self-media platform. For example, if user U mentions a certain brand when forwarding self-media content published by other users, user U may also be considered to be associated with the tag corresponding to the brand, and thus the tag becomes a user tag of user U.
In some implementations, determining the relevance weight of the user tag based on the user behavior event may include, for example, step 131: summing the weights of the user behaviors associated with the user tag to obtain the relevance weight of the user tag.
Referring to FIG. 2, there is shown a schematic block diagram 200 of step 130 in FIG. 1, namely, determining a relevance weight of a user tag based on a user behavior event.
FIG. 2 shows the relevance weight of a user tag being obtained by summing the weights of the user behaviors associated with that user tag.
For example, in 201, the user follows the official account of a brand among the other users of the self-media platform, and the association degree $f_1$ between the user and the tag corresponding to the brand may be calculated based on this behavior (211).
Similarly, in 202, the self-media content published by other users followed by user U includes content related to a certain tag g in the tag database, and the association degree $f_2$ between user U and tag g can be calculated based on this behavior (212).
Similarly, in 203, the self-media content published by the user mentions a brand, and the association degree $f_3$ between the user and the tag corresponding to the brand may be calculated based on this behavior (213).
Similarly, in 204, the user mentions a brand in a comment posted on self-media content published by other users of the self-media platform, and the association degree $f_4$ between the user and the tag corresponding to the brand may be calculated based on this behavior (214).
Similarly, in 205, the user mentions a brand when forwarding self-media content published by other users of the self-media platform, and the association degree $f_5$ between the user and the tag corresponding to the brand may be calculated based on this behavior (215).
After the association degrees $f_1$ to $f_5$ of the various user behaviors associated with a certain user tag are obtained, the final relevance weight (230) between the user tag and the user can be obtained by accumulating (220) $f_1$ to $f_5$.
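The accumulation step (220) is a plain sum over the per-behavior association degrees. The sketch below illustrates it; the function name and the numeric values are made-up placeholders, not figures from the source.

```python
def accumulate(behavior_weights):
    """Sum the per-behavior association degrees f1..f5 for one user tag."""
    return sum(behavior_weights.values())


# Placeholder association degrees for one (user, tag) pair.
weights = {"f1": 0.5, "f2": 0.25, "f3": 0.0, "f4": 0.125, "f5": 0.125}
total = accumulate(weights)
```

A behavior category that never occurred for the tag simply contributes zero.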
In some implementations, when the user behavior includes the user following other users of the self-media platform, the weight $f_1$ of the user behavior can be calculated by equation (1):
$$f_1(g) = \frac{(C_g + \mathrm{TFIDF}_g) \times T_g}{2} \tag{1}$$
Here, $C_g$ is a hierarchical weight whose value is a non-negative real number. The value of $C_g$ when the user has set category information for the other user of the self-media platform that the user follows is greater than its value when the user has not set such category information.
For example, when user U has set category information for the official account of a brand that user U follows, the value of $C_g$ may be set to 1; if no category information has been set for the official account, the value of $C_g$ may be 0.
The category information itself need not be specifically limited. As long as user U has set category information for the official account, the attention user U pays to that official account can be considered higher than the attention paid to other followed accounts for which no category information has been set, and the relevance weight between the tag corresponding to the official account and the user is accordingly higher.
$\mathrm{TFIDF}_g$ characterizes the interest of all users in the sample in tag g. In some implementations, $\mathrm{TFIDF}_g$ can be calculated by equation (2) below:
$$\mathrm{TFIDF}_g = \frac{\mathrm{countUser}(g)}{\lg(\mathrm{totalUser} + 0.01)} \tag{2}$$
Here, totalUser is the number of users in the sample, countUser is the number of users in the sample associated with the user tag, and the operator lg denotes the base-10 logarithm.
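Equation (2) can be evaluated directly with a base-10 logarithm. The following sketch assumes a sample of 1000 users, 50 of whom are associated with tag g; both numbers and the function name are illustrative only.

```python
import math


def tfidf(count_user: int, total_user: int) -> float:
    """Equation (2): countUser(g) / lg(totalUser + 0.01)."""
    return count_user / math.log10(total_user + 0.01)


# Hypothetical sample: 1000 users, 50 of them associated with tag g.
tfidf_g = tfidf(count_user=50, total_user=1000)
```

Note that the result is not confined to the interval [0, 1]: it grows linearly with the number of associated users and only logarithmically shrinks with the sample size.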
$T_g$ is a time weight, which in some implementations can be calculated by equation (3) below:
$$T_g = \frac{T_c - T_0}{T_{now} - T_0} \tag{3}$$
Here, $T_c$ is the date on which the user behavior occurred, $T_0$ is a preset initial date, and $T_{now}$ is the current date.
For example, if the date on which the user followed the official account corresponding to tag g is May 8, 2014, the preset initial date is January 1, 2009, and the current date is July 20, 2015, then, in some implementations, $T_0$ is 20090101, $T_c$ is 20140508, and $T_{now}$ is 20150720.
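The example dates can be plugged into equations (3) and (1) as follows. This sketch takes the text literally and performs the arithmetic on the yyyymmdd integers; the source does not say whether elapsed calendar days are intended instead, so that reading is an assumption. The `tfidf_g` value passed to `f1` is a placeholder.

```python
def time_weight(t_c: int, t_0: int, t_now: int) -> float:
    """Equation (3), computed on yyyymmdd integers as in the example above."""
    return (t_c - t_0) / (t_now - t_0)


def f1(c_g: float, tfidf_g: float, t_g: float) -> float:
    """Equation (1): weight of a 'follow' behavior for tag g."""
    return (c_g + tfidf_g) * t_g / 2


t_g = time_weight(t_c=20140508, t_0=20090101, t_now=20150720)
weight = f1(c_g=1.0, tfidf_g=0.5, t_g=t_g)  # tfidf_g=0.5 is a placeholder
```

Because $T_c$ lies between $T_0$ and $T_{now}$, the time weight falls between 0 and 1, and more recent behaviors receive a larger weight.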
In some implementations, when the user behavior includes self-media content published by other users followed by the user, the weight $f_2$ of the user behavior can be calculated by the following formula (4):
$$f_2(g) = \mathrm{close\_f_2}(g) \times \mathrm{credibility\_f_2}(g) \tag{4}$$
Here, $\mathrm{close\_f_2}(g)$ is the affinity between the user and the other users followed by the user. For example, if the number and/or frequency of associated behaviors between user A and user B exceeds a preset value, user A and user B can be considered to have a higher affinity.
In that case, user B may be regarded as a close friend of user A, that is, user B belongs to the close-friend set co of user A, and the value of $\mathrm{close\_f_2}(g)$ may be obtained based on the following formula (5):
$\mathrm{credibility\_f_2}(g)$ is the credibility of the other users. In some implementations, user B may be deemed more trustworthy when, for example, the behavior of user B meets at least one condition, such as the number of pieces of self-media content published by user B exceeding a predetermined number, or the number of other users of the self-media platform followed by user B exceeding a predetermined number of users.
In these implementations, $\mathrm{credibility\_f_2}(g)$ may be calculated by the following equation (6):
Here, B may be a preset behavior parameter of the behavior of user B, and max is a predetermined threshold corresponding to the behavior parameter.
In some implementations, when the user behavior includes self-media content published by the user, the weight $f_3$ of the user behavior can be calculated by the following formula (7):
$$f_3(g) = \frac{L_g \times T_g + At_g}{2} \tag{7}$$
Here, $L_g$ is an emotional tendency flag. For example, when the self-media content published by the user has a positive emotional tendency, the value of $L_g$ is 1, and when it has a negative emotional tendency, the value of $L_g$ is -1.
$T_g$ is a time weight, and its value can be calculated using equation (3) as described above.
$At_g$ is a focused-attention flag: when the user calls focused attention to the user tag in the self-media content published by the user, $At_g$ is 1; otherwise $At_g$ is 0. In some implementations, for example, when the self-media content published by the user contains a symbol representing focused attention (for example, the content published on a microblog platform contains an "@" symbol followed by the official account corresponding to tag g), the value of $At_g$ may be set to 1; when the self-media content published by the user does not contain such a symbol, the value of $At_g$ may be set to 0.
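Formula (7) and the "@" rule can be sketched as below. The substring test for the mention and the account name `BrandX` are simplifying assumptions for illustration; the source does not specify how the mention or the emotional tendency is detected, so the sentiment flag is passed in as a given.

```python
def at_flag(content: str, official_account: str) -> int:
    """At_g: 1 if the content '@'-mentions the tag's official account, else 0."""
    return 1 if f"@{official_account}" in content else 0


def f3(l_g: int, t_g: float, at_g: int) -> float:
    """Formula (7): weight of self-media content published by the user."""
    if l_g not in (1, -1):
        raise ValueError("l_g must be 1 (positive) or -1 (negative)")
    if at_g not in (0, 1):
        raise ValueError("at_g must be 0 or 1")
    return (l_g * t_g + at_g) / 2


post = "Great test drive today @BrandX"  # hypothetical post and account name
mention = at_flag(post, "BrandX")
positive_with_mention = f3(l_g=1, t_g=0.5, at_g=mention)
negative_no_mention = f3(l_g=-1, t_g=0.5, at_g=0)
```

A negative emotional tendency makes the time term negative, so a negative post without a mention lowers the accumulated weight for the tag.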
In some implementations, when the user behavior includes comments posted by the user on self-media content published by other users of the self-media platform, the weight $f_4$ of the user behavior can be calculated by the following formula (8):
$$f_4(g) = \frac{L_g \times IR_g \times (T_g + At_g)}{2} \tag{8}$$
$L_g$ is an emotional tendency flag: when the self-media content published by the user has a positive emotional tendency, $L_g$ takes the value 1, and when it has a negative emotional tendency, $L_g$ takes the value -1.
$IR_g$ is a textual reference flag: when the comment that the user posts on, and/or the content that the user adds when forwarding, self-media content published by other users of the self-media platform directly references that self-media content, $IR_g$ is 1; otherwise it is 0.
$T_g$ is a time weight and can be calculated using equation (3) as described above.
$At_g$ is a focused-attention flag: when the user calls focused attention to the user tag in the self-media content published by the user, the value of $At_g$ is 1; otherwise, the value of $At_g$ is 0.
In some implementations, when the user behavior includes the user forwarding self-media content published by other users of the self-media platform, the weight $f_5(g)$ of the user behavior may be calculated using the above equation (8), i.e., $f_5(g) = L_g \times IR_g \times (T_g + At_g)/2$.
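Since comments and forwards share formula (8), one function can serve both. The sketch below takes all four flags as inputs; how they are derived from the text of a comment or forward is outside what the source specifies, and the sample values are placeholders.

```python
def f4(l_g: int, ir_g: int, t_g: float, at_g: int) -> float:
    """Formula (8): weight of a comment on another user's self-media content."""
    return l_g * ir_g * (t_g + at_g) / 2


# The text applies the same formula to forwards.
f5 = f4

quoted_positive = f4(l_g=1, ir_g=1, t_g=0.5, at_g=1)
not_quoted = f4(l_g=1, ir_g=0, t_g=0.5, at_g=1)
forward_negative = f5(l_g=-1, ir_g=1, t_g=0.5, at_g=0)
```

Because $IR_g$ multiplies the whole expression, a comment or forward that does not reference the original content contributes nothing, regardless of its emotional tendency or recency.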
In some implementations, determining the relevance weight of the user tag based on the user behavior event can further include step 132: correcting the weight of the user behavior based on a predetermined confidence factor. Here, the predetermined confidence factor may be associated with the confidence level of the user behavior. In some implementations, different confidence factors may be configured for different categories of user behavior. For example, the value of the confidence factor A may be determined by equation (9):
That is, when the user behavior is the user following other users of the self-media platform, other users followed by the user publishing self-media content, or the user publishing self-media content, the value of the confidence factor A is 0.4; and when the user behavior is the user commenting on and/or forwarding self-media content published by other users of the self-media platform, the value of the confidence factor A is 0.6.
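The per-category confidence values stated above amount to a lookup table. In the sketch below the category keys are hypothetical labels for the five behavior categories; only the values 0.4 and 0.6 come from the text.

```python
# Confidence factor A per behavior category (values from the text above;
# the category key names are hypothetical labels).
CONFIDENCE = {
    "follow": 0.4,              # user follows another user
    "followed_user_post": 0.4,  # a followed user publishes content
    "post": 0.4,                # user publishes content
    "comment": 0.6,             # user comments on others' content
    "forward": 0.6,             # user forwards others' content
}


def confidence_factor(category: str) -> float:
    """Return the confidence factor A for a behavior category."""
    return CONFIDENCE[category]
```

Comments and forwards carry the higher factor, which matches the text treating them as more reliable signals than following or posting.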
In some implementations, determining the relevance weight of the user tag based on the user behavior event can further include step 133: correcting the weight of the user behavior based on a predetermined accuracy factor.
For example, a subset of users may be selected from the sample, and the accuracy of the weight of each category of user behavior may be determined from the statistics of that category of user behavior and the calculated weight corresponding to it.
In some implementations, the relevance weight of user U for user tag g can be expressed, for example, by the following formula (10):
Here, $f_i$ is the weight for each category of user behavior, $A_i$ is the confidence factor corresponding to that category of user behavior, and $Z_i$ is the accuracy factor corresponding to that category of user behavior.
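Formula (10) itself is not reproduced in this text, so the exact way $f_i$, $A_i$ and $Z_i$ are combined is unknown here. The sketch below assumes one plausible reading, a sum of products $f_i \times A_i \times Z_i$ over the behavior categories; that combination rule, the function name, the category keys, and all numeric inputs other than the confidence values 0.4 and 0.6 are assumptions for illustration.

```python
def corrected_relevance_weight(behavior_weights, confidence, accuracy):
    """ASSUMED combination rule: sum of f_i * A_i * Z_i over categories.

    The source defines f_i, A_i and Z_i but its formula (10) is not
    available in this text, so the multiplicative form is a guess.
    """
    return sum(
        f_i * confidence[category] * accuracy[category]
        for category, f_i in behavior_weights.items()
    )


behavior_weights = {"follow": 0.5, "comment": 0.25}  # placeholder f_i values
confidence = {"follow": 0.4, "comment": 0.6}         # A_i from the text
accuracy = {"follow": 1.0, "comment": 0.5}           # placeholder Z_i values
total = corrected_relevance_weight(behavior_weights, confidence, accuracy)
```

Under this assumed rule, a behavior category with low measured accuracy is down-weighted before it enters the accumulated relevance weight.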
Referring to FIG. 3, a schematic structural diagram 300 of the apparatus for obtaining a relevance weight of a user tag according to an embodiment of the present application is shown.
As shown in FIG. 3, the apparatus for obtaining the relevance weight of the user tag includes a creating module 310, an extraction module 320, and a determining module 330.
The creating module 310 may be configured to establish a tag database, the tag database including tags and category information corresponding to the tags.
The extraction module 320 may be configured to collect statistics on user behaviors and extract user behavior events. In some implementations, a user behavior event may include, for example, a user tag, which is a tag in the tag database that is associated with the user behavior. Alternatively, in other implementations, the user behavior event may include, in addition to the user tag, at least one of a user name, a behavior occurrence time, and a user behavior category.
The determining module 330 may be configured to determine the relevance weight of a user tag based on a user behavior event.
In some implementations, the user behavior categories can include, for example, at least one of: the user following other users of the self-media platform, other users followed by the user publishing self-media content, the user posting comments on self-media content published by other users of the self-media platform, and the user forwarding self-media content published by other users of the self-media platform.
In some implementations, the determining module 330 may be further configured to sum the weights of the user behaviors associated with a user tag to obtain the relevance weight of the user tag.
In some implementations, when the user behavior includes the user following other users of the self-media platform, the determining module 330 may be further configured to determine the weight of the user behavior based on $f_1(g) = (C_g + \mathrm{TFIDF}_g) \times T_g / 2$.
Here, $C_g$ is a hierarchical weight whose value is a non-negative real number. The value of $C_g$ when the user has set category information for the other user of the self-media platform that the user follows is greater than its value when the user has not set such category information.
$\mathrm{TFIDF}_g = \mathrm{countUser}(g) / \lg(\mathrm{totalUser} + 0.01)$, where countUser is the number of users in the sample associated with the user tag and totalUser is the number of users in the sample.
$T_g$ is a time weight, and $T_g = (T_c - T_0)/(T_{now} - T_0)$, where $T_c$ is the date on which the user behavior occurred, $T_0$ is a preset initial date, and $T_{now}$ is the current date.
In some implementations, when the user behavior includes self-media content published by other users followed by the user, the determining module 330 may be further configured to determine the weight of the user behavior based on $f_2(g) = \mathrm{close\_f_2}(g) \times \mathrm{credibility\_f_2}(g)$.
Here, $\mathrm{close\_f_2}(g)$ is the affinity between the user and the other users followed by the user, and $\mathrm{credibility\_f_2}(g)$ is the credibility of those other users.
In some implementations, when the user behavior includes self-media content published by the user, the determining module 330 may be further configured to determine the weight of the user behavior based on $f_3(g) = (L_g \times T_g + At_g)/2$.
Here, $L_g$ is an emotional tendency flag: when the self-media content published by the user has a positive emotional tendency, the value of $L_g$ is 1, and when it has a negative emotional tendency, the value of $L_g$ is -1.
$T_g$ is a time weight, and $T_g = (T_c - T_0)/(T_{now} - T_0)$, where $T_c$ is the date on which the user behavior occurred, $T_0$ is a preset initial date, and $T_{now}$ is the current date.
$At_g$ is a focused-attention flag: when the user calls focused attention to the user tag in the self-media content published by the user, the value of $At_g$ is 1; otherwise, the value of $At_g$ is 0.
In some implementations, when the user behavior includes the user posting comments on self-media content published by other users of the self-media platform and/or the user forwarding self-media content published by other users of the self-media platform, the determining module 330 may be further configured to determine the weight of the user behavior based on $f_4(g) = L_g \times IR_g \times (T_g + At_g)/2$.
Here, $L_g$ is an emotional tendency flag: when the self-media content published by the user has a positive emotional tendency, the value of $L_g$ is 1, and when it has a negative emotional tendency, the value of $L_g$ is -1.
$IR_g$ is a textual reference flag: when the comment that the user posts on, and/or the content that the user adds when forwarding, self-media content published by other users of the self-media platform directly references that self-media content, $IR_g$ is 1; otherwise it is 0.
$T_g$ is a time weight, and $T_g = (T_c - T_0)/(T_{now} - T_0)$, where $T_c$ is the date on which the user behavior occurred, $T_0$ is a preset initial date, and $T_{now}$ is the current date.
$At_g$ is a focused-attention flag: when the user calls focused attention to the user tag in the self-media content published by the user, the value of $At_g$ is 1; otherwise, the value of $At_g$ is 0.
In some implementations, the determining module 330 may be further configured to correct the weight of the user behavior based on a predetermined confidence factor. Here, the predetermined confidence factor may be associated with, for example, the confidence level of the user behavior.
In some implementations, when the user behavior includes the user forwarding self-media content published by other users of the self-media platform, the weight $f_5(g)$ of the user behavior may be calculated in the same manner as $f_4(g)$, i.e., $f_5(g) = L_g \times IR_g \times (T_g + At_g)/2$.
In some implementations, the determining module 330 may be further configured to correct the weight of the user behavior based on a predetermined accuracy factor.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present application may be implemented by software or hardware. The described units or modules may also be provided in a processor, and may be described as: a processor includes a creation module, an extraction module, and a determination module. Where the names of such units or modules do not in some cases constitute a limitation of the unit or module itself, for example, the creation module may also be described as a "module for building a tag database".
As another aspect, the present application also provides a computer-readable storage medium, which may be the computer-readable storage medium included in the apparatus in the above-described embodiments, or may be a separate computer-readable storage medium not incorporated into the apparatus. The computer-readable storage medium stores one or more programs for use by one or more processors in performing the method for obtaining a relevance weight of a user tag described herein.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by a person skilled in the art that the scope of the invention as referred to in the present application is not limited to the embodiments with a specific combination of the above-mentioned features, but also covers other embodiments with any combination of the above-mentioned features or their equivalents without departing from the inventive concept. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (20)

1. A method for obtaining a relevance weight of a user tag, characterized by comprising:
establishing a tag database, wherein the tag database comprises tags and category information corresponding to the tags;
collecting statistics on user behaviors and extracting user behavior events, wherein each user behavior event comprises a user tag, the user tag being a tag in the tag database that is associated with the user behavior; and
determining a relevance weight of the user tag based on the user behavior events;
wherein, when a preset condition is met, the relevance weight of the user tag is positively correlated with a time weight Tg, where Tg = (Tc - T0)/(Tnow - T0), Tc is the date on which the user behavior occurred, T0 is a preset initial date, and Tnow is the current date;
wherein the preset condition comprises at least one of the following:
the user behavior comprises the user following other users of a self-media platform;
the user behavior comprises the user publishing self-media content;
the user behavior comprises the user posting comments on self-media content published by other users of the self-media platform and/or forwarding self-media content published by other users of the self-media platform.
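By way of illustration only, the time weight Tg recited in claim 1 might be computed as in the following minimal sketch. The function name `time_weight`, the choice of whole days as the unit of measure, and the error raised for a non-positive span are assumptions made for the example and are not specified by the claim.

```python
from datetime import date


def time_weight(tc: date, t0: date, tnow: date) -> float:
    """Return Tg = (Tc - T0) / (Tnow - T0), measuring each interval in days.

    Measuring in whole days is an assumption; the claim only speaks of dates.
    """
    span_days = (tnow - t0).days
    if span_days <= 0:
        # Assumption: the preset initial date must precede the current date.
        raise ValueError("Tnow must be later than T0")
    return (tc - t0).days / span_days


# A behavior halfway between the initial date and the current date.
t0 = date(2015, 1, 1)
tnow = date(2015, 1, 11)
halfway = time_weight(date(2015, 1, 6), t0, tnow)
```

Under this sketch a behavior that occurred on the current date receives Tg = 1 and a behavior on the initial date receives Tg = 0, so more recent behaviors contribute more.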
2. The method of claim 1, wherein the user behavior event further comprises at least one of:
a user name, a behavior occurrence time, and a user behavior category.
3. The method of claim 2, wherein the user behavior category comprises at least one of:
the user following other users of the self-media platform;
other users followed by the user publishing self-media content;
the user publishing self-media content;
the user posting comments on self-media content published by other users of the self-media platform; and
the user forwarding self-media content published by other users of the self-media platform.
4. The method of claim 1, wherein determining the relevance weight of the user tag based on the user behavior event comprises:
superposing the weights of the user behaviors associated with the user tag to obtain the relevance weight of the user tag.
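By way of illustration only, the superposition recited in claim 4 can be read as a per-tag sum of behavior weights, as in the sketch below. The function name `relevance_weights` and the representation of each behavior as a (tag, weight) pair are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def relevance_weights(behaviors: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sum the weight of every behavior associated with each user tag."""
    totals: Dict[str, float] = defaultdict(float)
    for tag, weight in behaviors:
        totals[tag] += weight
    return dict(totals)


# Three behaviors of one user, two of which are associated with "music".
result = relevance_weights([("music", 0.5), ("sports", 0.25), ("music", 0.25)])
```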
5. The method of claim 3, wherein when the user behavior comprises the user following other users of the self-media platform, the weight of the user behavior is:
f1(g) = (Cg + TFIDFg) × Tg / 2;
wherein Cg is a hierarchical weight whose value is a non-negative real number, and the value of Cg when the user has set category information for the other users of the self-media platform followed by the user is greater than the value of Cg when the user has not set category information for those other users;
TFIDFg is the degree of attention paid to tag g by all users in a sample, TFIDFg = countUser(g) / lg(totalUser + 0.01), where totalUser is the number of users in the sample and countUser(g) is the number of users in the sample associated with the user tag.
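By way of illustration only, the weight f1(g) of claim 5 might be evaluated as follows. The sketch assumes that "lg" denotes the base-10 logarithm and that the hierarchical weight Cg and the time weight Tg are supplied by the caller; the function names are hypothetical.

```python
import math


def tfidf_g(count_user: int, total_user: int) -> float:
    """TFIDFg = countUser(g) / lg(totalUser + 0.01); lg is assumed to be log base 10."""
    return count_user / math.log10(total_user + 0.01)


def f1(c_g: float, tfidf: float, t_g: float) -> float:
    """f1(g) = (Cg + TFIDFg) * Tg / 2 for the 'user follows other users' behavior."""
    if c_g < 0:
        raise ValueError("Cg must be a non-negative real number")
    return (c_g + tfidf) * t_g / 2


# 30 of 1000 sampled users are associated with the tag; Cg and Tg are example values.
attention = tfidf_g(30, 1000)
weight = f1(1.0, attention, 0.5)
```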
6. The method of claim 3, wherein when the user behavior comprises other users followed by the user publishing self-media content, the weight of the user behavior is:
f2(g) = close_f2(g) × credibility_f2(g);
wherein close_f2(g) is the closeness between the user and the other users followed by the user;
credibility_f2(g) is the credibility of the other users.
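By way of illustration only, f2(g) of claim 6 is a plain product of two scores. The claim does not state how closeness or credibility is obtained or what range either takes; the sketch below simply accepts both as inputs, and the example values are assumptions.

```python
def f2(closeness: float, credibility: float) -> float:
    """f2(g) = close_f2(g) * credibility_f2(g)."""
    return closeness * credibility


# Example values only: a moderately close, moderately credible followed user.
weight = f2(0.5, 0.5)
```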
7. The method of claim 3, wherein when the user behavior comprises the user publishing self-media content, the weight of the user behavior is:
f3(g) = (Lg × Tg + Atg) / 2;
wherein Lg is an emotional tendency identifier, Lg is 1 when the self-media content published by the user has a positive emotional tendency, and Lg is -1 when the self-media content published by the user has a negative emotional tendency;
Atg is an attention focus identifier, Atg is 1 when the user focuses on the user tag in the self-media content published by the user, and Atg is 0 otherwise.
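By way of illustration only, f3(g) of claim 7 might be evaluated as follows. Encoding the emotional tendency and the attention focus as booleans is an assumption made for the example; the claim defines Lg only for positive and negative tendencies, so no neutral case is modeled.

```python
def f3(positive: bool, t_g: float, tag_in_focus: bool) -> float:
    """f3(g) = (Lg * Tg + Atg) / 2 for the 'user publishes content' behavior."""
    l_g = 1 if positive else -1      # emotional tendency identifier
    at_g = 1 if tag_in_focus else 0  # attention focus identifier
    return (l_g * t_g + at_g) / 2


praise = f3(positive=True, t_g=0.5, tag_in_focus=True)
complaint = f3(positive=False, t_g=0.5, tag_in_focus=False)
```

In this reading, negative content that does not focus on the tag yields a negative weight, which lowers the tag's relevance when the weights are superposed.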
8. The method of claim 3, wherein when the user behavior comprises the user posting comments on self-media content published by other users of the self-media platform and/or forwarding self-media content published by other users of the self-media platform, the weight of the user behavior is:
f4(g) = Lg × IRg × (Tg + Atg) / 2;
wherein Lg is an emotional tendency identifier, Lg is 1 when the self-media content published by the user has a positive emotional tendency, and Lg is -1 when the self-media content published by the user has a negative emotional tendency;
IRg is a text pointing identifier, IRg is 1 when the comment that the user posts on self-media content published by other users of the self-media platform, and/or the content with which the user forwards self-media content published by other users of the self-media platform, points directly to the self-media content published by the other users, and IRg is 0 otherwise;
Atg is an attention focus identifier, Atg is 1 when the user focuses on the user tag in the self-media content published by the user, and Atg is 0 otherwise.
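By way of illustration only, f4(g) of claim 8 might be evaluated as follows. The boolean encoding of the three identifiers and the function name are assumptions made for the example.

```python
def f4(positive: bool, points_to_content: bool, t_g: float, tag_in_focus: bool) -> float:
    """f4(g) = Lg * IRg * (Tg + Atg) / 2 for comment and forward behaviors."""
    l_g = 1 if positive else -1           # emotional tendency identifier
    ir_g = 1 if points_to_content else 0  # text pointing identifier
    at_g = 1 if tag_in_focus else 0       # attention focus identifier
    return l_g * ir_g * (t_g + at_g) / 2


on_topic = f4(True, True, 0.5, True)
off_topic = f4(True, False, 0.5, True)
```

Because IRg multiplies the whole expression, a comment or forward that does not point directly to the original content contributes a weight of zero in this sketch.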
9. The method of any of claims 1-8, wherein determining the relevance weight of the user tag based on the user behavior event further comprises:
correcting the weight of the user behavior based on a preset credibility factor;
wherein the preset credibility factor is associated with the credibility of the user behavior.
10. The method of claim 9, wherein determining the relevance weight of the user tag based on the user behavior event further comprises:
correcting the weight of the user behavior based on a preset accuracy factor.
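By way of illustration only, the corrections of claims 9 and 10 might be applied as multiplicative factors. The claims do not state how the correction is performed, so the multiplicative form, the default values of 1.0, and the function name below are all assumptions.

```python
def corrected_weight(weight: float,
                     credibility_factor: float = 1.0,
                     accuracy_factor: float = 1.0) -> float:
    """Scale a behavior weight by preset factors (multiplicative form is assumed)."""
    return weight * credibility_factor * accuracy_factor


# Example values only: a behavior type judged half as credible and half as accurate.
adjusted = corrected_weight(0.5, credibility_factor=0.5, accuracy_factor=0.5)
```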
11. An apparatus for obtaining a relevance weight of a user tag, comprising:
a creation module configured to establish a tag database, wherein the tag database comprises tags and category information corresponding to the tags;
an extraction module configured to collect statistics on user behaviors and extract user behavior events, wherein each user behavior event comprises a user tag, the user tag being a tag in the tag database that is associated with the user behavior; and
a determination module configured to determine a relevance weight of the user tag based on the user behavior events;
wherein, when a preset condition is met, the relevance weight of the user tag is positively correlated with a time weight Tg, where Tg = (Tc - T0)/(Tnow - T0), Tc is the date on which the user behavior occurred, T0 is a preset initial date, and Tnow is the current date;
wherein the preset condition comprises at least one of the following:
the user behavior comprises the user following other users of a self-media platform;
the user behavior comprises the user publishing self-media content;
the user behavior comprises the user posting comments on self-media content published by other users of the self-media platform and/or forwarding self-media content published by other users of the self-media platform.
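By way of illustration only, the three modules of claim 11 might cooperate as in the sketch below. The record layout, the substring match used to associate a behavior with a tag, the precomputed per-behavior weights, and the summation in the determination step are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BehaviorEvent:
    user: str
    category: str  # e.g. "publish", "comment", "forward"
    tag: str       # a tag drawn from the tag database
    weight: float  # per-behavior weight, assumed to be computed elsewhere


def create_tag_database() -> Dict[str, str]:
    """Creation module: map each tag to its category information (toy data)."""
    return {"basketball": "sports", "jazz": "music"}


def extract_events(records: List[dict], tag_db: Dict[str, str]) -> List[BehaviorEvent]:
    """Extraction module: emit one event per known tag mentioned in a behavior."""
    events = []
    for record in records:
        for tag in tag_db:
            if tag in record["text"]:  # assumption: plain substring match
                events.append(BehaviorEvent(record["user"], record["category"],
                                            tag, record["weight"]))
    return events


def determine_weights(events: List[BehaviorEvent], user: str) -> Dict[str, float]:
    """Determination module: sum behavior weights per tag for one user."""
    totals: Dict[str, float] = {}
    for event in events:
        if event.user == user:
            totals[event.tag] = totals.get(event.tag, 0.0) + event.weight
    return totals


records = [
    {"user": "alice", "category": "publish", "text": "great basketball game", "weight": 0.5},
    {"user": "alice", "category": "comment", "text": "love this jazz set", "weight": 0.25},
    {"user": "alice", "category": "forward", "text": "basketball finals", "weight": 0.25},
]
weights = determine_weights(extract_events(records, create_tag_database()), "alice")
```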
12. The apparatus of claim 11, wherein the user behavior event further comprises at least one of:
a user name, a behavior occurrence time, and a user behavior category.
13. The apparatus of claim 12, wherein the user behavior category comprises at least one of:
the user following other users of the self-media platform;
other users followed by the user publishing self-media content;
the user publishing self-media content;
the user posting comments on self-media content published by other users of the self-media platform; and
the user forwarding self-media content published by other users of the self-media platform.
14. The apparatus of claim 11, wherein the determination module is further configured to:
superpose the weights of the user behaviors associated with the user tag to obtain the relevance weight of the user tag.
15. The apparatus of claim 13, wherein the determination module is further configured to determine the weight of the user behavior based on f1(g) = (Cg + TFIDFg) × Tg / 2 when the user behavior comprises the user following other users of the self-media platform;
wherein Cg is a hierarchical weight whose value is a non-negative real number, and the value of Cg when the user has set category information for the other users of the self-media platform followed by the user is greater than the value of Cg when the user has not set category information for those other users;
TFIDFg is the degree of attention paid to tag g by all users in a sample, TFIDFg = countUser(g) / lg(totalUser + 0.01), where countUser(g) is the number of users in the sample associated with the user tag and totalUser is the number of users in the sample.
16. The apparatus of claim 13, wherein the determination module is further configured to determine the weight of the user behavior based on f2(g) = close_f2(g) × credibility_f2(g) when the user behavior comprises other users followed by the user publishing self-media content;
wherein close_f2(g) is the closeness between the user and the other users followed by the user;
credibility_f2(g) is the credibility of the other users.
17. The apparatus of claim 13, wherein the determination module is further configured to determine the weight of the user behavior based on f3(g) = (Lg × Tg + Atg) / 2 when the user behavior comprises the user publishing self-media content;
wherein Lg is an emotional tendency identifier, Lg is 1 when the self-media content published by the user has a positive emotional tendency, and Lg is -1 when the self-media content published by the user has a negative emotional tendency;
Atg is an attention focus identifier, Atg is 1 when the user focuses on the user tag in the self-media content published by the user, and Atg is 0 otherwise.
18. The apparatus of claim 13, wherein the determination module is further configured to determine the weight of the user behavior based on f4(g) = Lg × IRg × (Tg + Atg) / 2 when the user behavior comprises the user posting comments on self-media content published by other users of the self-media platform and/or the user forwarding self-media content published by other users of the self-media platform;
wherein Lg is an emotional tendency identifier, Lg is 1 when the self-media content published by the user has a positive emotional tendency, and Lg is -1 when the self-media content published by the user has a negative emotional tendency;
IRg is a text pointing identifier, IRg is 1 when the comment that the user posts on self-media content published by other users of the self-media platform, and/or the content with which the user forwards self-media content published by other users of the self-media platform, points directly to the self-media content published by the other users, and IRg is 0 otherwise;
Atg is an attention focus identifier, Atg is 1 when the user focuses on the user tag in the self-media content published by the user, and Atg is 0 otherwise.
19. The apparatus of any of claims 11-18, wherein the determination module is further configured to:
correct the weight of the user behavior based on a preset credibility factor;
wherein the preset credibility factor is associated with the credibility of the user behavior.
20. The apparatus of claim 19, wherein the determination module is further configured to:
correct the weight of the user behavior based on a preset accuracy factor.
CN201510446007.0A 2015-07-27 2015-07-27 Method and device for acquiring relevance weight of user tag Active CN105138572B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510446007.0A CN105138572B (en) 2015-07-27 2015-07-27 Method and device for acquiring relevance weight of user tag

Publications (2)

Publication Number Publication Date
CN105138572A CN105138572A (en) 2015-12-09
CN105138572B true CN105138572B (en) 2019-12-10

Family

ID=54723921

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510446007.0A Active CN105138572B (en) 2015-07-27 2015-07-27 Method and device for acquiring relevance weight of user tag

Country Status (1)

Country Link
CN (1) CN105138572B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant