CN102867016A - Label-based social network user interest mining method and device - Google Patents

Info

Publication number
CN102867016A
Authority
CN
China
Prior art keywords
label
user
social networks
interest
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2012102495828A
Other languages
Chinese (zh)
Inventor
薛晔伟
马振江
伍星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING KAIXINREN INFORMATION TECHNOLOGY Co Ltd
Original Assignee
BEIJING KAIXINREN INFORMATION TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING KAIXINREN INFORMATION TECHNOLOGY Co Ltd filed Critical BEIJING KAIXINREN INFORMATION TECHNOLOGY Co Ltd
Priority to CN2012102495828A priority Critical patent/CN102867016A/en
Publication of CN102867016A publication Critical patent/CN102867016A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a label-based social network user interest mining method and a label-based social network user interest mining device. The label-based social network user interest mining method comprises the following steps of: collecting all data of a user in a social network, wherein the data comprises textual data generated by the user in the social network and association relationship data of the user in the social network with the textual content; acquiring all labels included in the textual data generated by the user and a weight corresponding to each label; converting the association relationship data of the user with the textual content into a user-relationship chain form; combining the labels of the user on all contents to obtain a label interest column of the user; vectorizing the label interest column; abstracting a label interest vector to obtain a low-category interest vector and a high-category interest vector; and providing corresponding interest information for the social network user according to a specific requirement of the application scene based on the low-category interest vector, the high-category interest vector and the label, thereby realizing social network user interest mining.

Description

A label-based method and device for mining the interests of social network users
Technical field
The present invention relates to the field of Web mining, and in particular to a label-based method and device for mining the interests of social network users.
Background art
Existing methods for acquiring and using user interests fall into the following systems:
1. Association systems: these methods use the direct associations formed between users and entities and apply crowd-based techniques such as collaborative filtering to link a user with entities that the user may be interested in but has not yet been associated with.
Advantages: simple and clear; they often give good results for users and entities that follow the mainstream.
Disadvantages: they cannot identify the interests of minority users; they cannot define a user's interests directly, and can only estimate how much a user likes a specific entity.
2. Entity label systems: these provide an input field wherever an entity appears and prompt users to describe the entity briefly; the short descriptions are collected and used as the labels of that entity.
Advantages: low cost; only the input function has to be provided, and the labels are generated by users.
Disadvantages: the labels cannot be standardized and are hard to use; most users are unwilling to fill them in; for a single entity, few labels are collected and the description is incomplete; user interests cannot be described directly.
3. Classification systems: these define a set of user interest categories and, when a user registers or first uses the product, require the user to select several categories of interest, which are then taken as the user's interests.
Advantages: little user resistance; user interests can be defined directly.
Disadvantages: entities must likewise be mapped to categories in advance, and when there are many entities the mapping is costly and imprecise; the approach is inflexible and cannot track changes in user interests; the number of categories is limited, so user interests cannot be described in fine detail.
Summary of the invention
The object of the invention is to address the above problems by providing a label-based method and device for mining user interests that overcomes the defects of the prior art and mines the interests of social network users as fully as possible.
To achieve the above object, the invention provides a label-based method for mining the interests of social network users, comprising:
collecting the data of a user on a social network;
generating a label interest column from the data, the label interest column being the set of all labels in the data together with the weight of each label;
recommending user interest information according to the label interest column, thereby mining the interests of the social network user.
Optionally, in one embodiment of the invention, recommending user interest information according to the label interest column comprises:
vectorizing the label interest column to obtain a label interest vector; abstracting the label interest vector to obtain an abstraction result; and recommending user interest information according to the abstraction result.
Optionally, in one embodiment of the invention, abstracting the label interest vector comprises:
treating the attribute of a label as an abstract category, an abstract category being a set of categories; dividing the label interest vector into low-level abstract categories and high-level abstract categories according to the attribute mapping between categories and labels; assigning the labels in the label interest vector to the corresponding categories; merging the labels and their weights within each abstract category to obtain a low-level category interest vector and a high-level category interest vector; and, according to the specific needs of the application scenario, providing corresponding interest information to the social network user based on the low-level category interest vector, the high-level category interest vector, and the labels, thereby mining the interests of the social network user.
Optionally, in one embodiment of the invention, the data comprise: the textual data generated by the user on the social network, and the data on association relationships between the user and textual content on the social network.
Optionally, in one embodiment of the invention, generating the label interest column from the data comprises:
obtaining all labels contained in the textual data generated by the user on the social network and the weight of each label; converting the data on association relationships between the user and textual content into user-relationship chain form; and merging the user's labels over all content to obtain the user's label interest column.
Optionally, in one embodiment of the invention, the method further comprises: cleaning all the collected data of the user on the social network.
Optionally, in one embodiment of the invention, the cleaning comprises: filtering out advertising content; taking only the first 500 words of long text as the object of analysis; and filtering out undesirable content by having the labels actively match the content.
Optionally, in one embodiment of the invention, the textual data generated by the user on the social network are divided into a title and a body; a string matching algorithm obtains all labels contained in the textual data, and the number of occurrences of each label is used as the weight of the content on that label.
Optionally, in one embodiment of the invention, when the title and the body both contain the same label, the weight of the label is the sum of the weight it obtains in the title and the weight it obtains in the body.
Optionally, in one embodiment of the invention, the textual data generated by the user on the social network are expressed as $\{\langle T_i, TF_i\rangle, \langle T_j, TF_j\rangle, \ldots, \langle T_k, TF_k\rangle\}$, where $T_i$ denotes a label and $TF_i$ denotes the weight of label $T_i$ in the content.
Optionally, in one embodiment of the invention, the user-relationship chain of the association relationship data between the user and textual content on the social network is expressed as $U \to \{C_1, C_2, C_3, \ldots\}$, where $U$ denotes a user and $C_i$ denotes a content item associated with user $U$.
Optionally, in one embodiment of the invention, the label interest column is expressed as $U \to \{\langle T_i, \sum TF_i\rangle, \langle T_j, \sum TF_j\rangle, \ldots, \langle T_k, \sum TF_k\rangle\}$.
Optionally, in one embodiment of the invention, the association relationship data between the user and textual content on the social network also include a time weight $WT_i$, where $WT_i$ denotes the time score of the establishment of the association between user $U$ and content $C_i$. The user-relationship chain is then expressed as $U \to \{\langle C_i, WT_i\rangle, \langle C_j, WT_j\rangle, \ldots, \langle C_k, WT_k\rangle\}$, and the label interest column is expressed as $U \to \{\langle T_i, W_i\rangle, \langle T_j, W_j\rangle, \ldots, \langle T_k, W_k\rangle\}$, where $W$ is a weight combining term frequency and the time factor.
Optionally, in one embodiment of the invention, the label interest vector is expressed as $V \to \{S_1, S_2, \ldots, S_i, \ldots, S_n\}$, where the vector $V$ represents the user's interest, $S_i$ is the coordinate of the vector on the dimension of label $T_i$, and $n$ is the total number of labels. If user $U$ has label $T_i$, the value of $S_i$ is $W_i$; otherwise the value of $S_i$ is 0.
Optionally, in one embodiment of the invention, the number of occurrences of label $T_i$ among all users is $DF_i$. If user $U$ has label $T_i$, the value of $S_i$ is $W_i / DF_i$; otherwise the value of $S_i$ is 0.
To achieve the above object, the invention also provides a label-based device for mining the interests of social network users, comprising:
a data collection module, configured to collect the data of a user on a social network;
a label interest column generation unit, configured to generate a label interest column from the data, the label interest column being the set of all labels in the data together with the weight of each label;
an interest mining unit, configured to recommend user interest information according to the label interest column, thereby mining the interests of the social network user.
Optionally, in one embodiment of the invention, the label interest column generation unit comprises:
a label interest vector generation module, configured to vectorize the label interest column to obtain a label interest vector;
a label interest vector abstraction module, configured to abstract the label interest vector to obtain an abstraction result;
an interest recommendation module, configured to recommend user interest information according to the abstraction result.
Optionally, in one embodiment of the invention, the label interest vector abstraction module treats the attribute of a label as an abstract category, an abstract category being a set of categories; divides the label interest vector into low-level abstract categories and high-level abstract categories according to the attribute mapping between categories and labels; assigns the labels in the label interest vector to the corresponding categories; and merges the labels and their weights within each abstract category to obtain a low-level category interest vector and a high-level category interest vector. The interest recommendation module, according to the specific needs of the application scenario, provides corresponding interest information to the social network user based on the low-level category interest vector, the high-level category interest vector, and the labels, thereby mining the interests of the social network user.
Optionally, in one embodiment of the invention, the data collected by the data collection module comprise: the textual data generated by the user on the social network, and the data on association relationships between the user and textual content on the social network.
Optionally, in one embodiment of the invention, the label interest column generation unit obtains all labels contained in the textual data generated by the user on the social network and the weight of each label; converts the data on association relationships between the user and textual content into user-relationship chain form; and merges the user's labels over all content to obtain the user's label interest column.
Optionally, in one embodiment of the invention, the device further comprises a data cleaning unit configured to clean all the collected data of the user on the social network.
Optionally, in one embodiment of the invention, the cleaning performed by the data cleaning unit comprises: filtering out advertising content; taking only the first 500 words of long text as the object of analysis; and filtering out undesirable content by having the labels actively match the content.
Optionally, in one embodiment of the invention, the textual data generated by the user on the social network and collected by the data collection module are divided into a title and a body; a string matching algorithm obtains all labels contained in the textual data, and the number of occurrences of each label is used as the weight of the content on that label.
Optionally, in one embodiment of the invention, when the title and the body both contain the same label, the weight of the label is the sum of the weight it obtains in the title and the weight it obtains in the body.
Optionally, in one embodiment of the invention, the textual data generated by the user on the social network and collected by the data collection module are expressed as $\{\langle T_i, TF_i\rangle, \langle T_j, TF_j\rangle, \ldots, \langle T_k, TF_k\rangle\}$, where $T_i$ denotes a label and $TF_i$ denotes the weight of label $T_i$ in the content.
Optionally, in one embodiment of the invention, the association relationship data between the user and textual content on the social network collected by the data collection module are expressed as $U \to \{C_1, C_2, C_3, \ldots\}$, where $U$ denotes a user and $C_i$ denotes a content item associated with user $U$.
Optionally, in one embodiment of the invention, the label interest column obtained by the label interest column generation unit is expressed as $U \to \{\langle T_i, \sum TF_i\rangle, \langle T_j, \sum TF_j\rangle, \ldots, \langle T_k, \sum TF_k\rangle\}$.
Optionally, in one embodiment of the invention, the association relationship data between the user and textual content on the social network collected by the data collection module also include a time weight $WT_i$, where $WT_i$ denotes the time score of the establishment of the association between user $U$ and content $C_i$. The user-relationship chain is then expressed as $U \to \{\langle C_i, WT_i\rangle, \langle C_j, WT_j\rangle, \ldots, \langle C_k, WT_k\rangle\}$, and the label interest column is expressed as $U \to \{\langle T_i, W_i\rangle, \langle T_j, W_j\rangle, \ldots, \langle T_k, W_k\rangle\}$, where $W$ is a weight combining term frequency and the time factor.
Optionally, in one embodiment of the invention, the label interest vector obtained by the label interest vector generation module is expressed as $V \to \{S_1, S_2, \ldots, S_i, \ldots, S_n\}$, where the vector $V$ represents the user's interest, $S_i$ is the coordinate of the vector on the dimension of label $T_i$, and $n$ is the total number of labels. If user $U$ has label $T_i$, the value of $S_i$ is $W_i$; otherwise the value of $S_i$ is 0.
Optionally, in one embodiment of the invention, the number of occurrences of label $T_i$ among all users is $DF_i$. If user $U$ has label $T_i$, the value of $S_i$ is $W_i / DF_i$; otherwise the value of $S_i$ is 0.
The above technical solution has the following beneficial effects:
The technical solution establishes textual content analysis and user interest mining based on the "label interest column", and can mine the interests of social network users as fully as possible.
Description of drawings
To illustrate the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is the first flowchart of the label-based social network user interest mining method proposed by the invention;
Fig. 2 is the second flowchart of the label-based social network user interest mining method proposed by the invention;
Fig. 3 is the third flowchart of the label-based social network user interest mining method proposed by the invention;
Fig. 4 is the first block diagram of the label-based social network user interest mining device proposed by the invention;
Fig. 5 is the second block diagram of the label-based social network user interest mining device proposed by the invention;
Fig. 6 is a block diagram of the label interest column generation unit in the label-based social network user interest mining device proposed by the invention;
Fig. 7 is a block diagram of an application case of the label-based social network user interest mining device proposed by the invention.
Embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the invention without creative effort fall within the scope of protection of the invention.
To overcome the shortcomings of existing interest mining, a label-based social network user interest mining method is proposed. Fig. 1 is the first flowchart of this method.
The method collects the data of a user on a social network, generates a label interest column from the data, and recommends user interest information according to the label interest column. As shown in Fig. 1, the method comprises:
Step 101: collect the data of a user on the social network;
Step 102: generate a label interest column from the data; the label interest column is the set of all labels in the data together with the weight of each label;
Step 103: recommend user interest information according to the label interest column, thereby mining the interests of the social network user.
Building on Fig. 1, Fig. 2 details how user interest information is recommended according to the label interest column. Fig. 2 is the second flowchart of the label-based social network user interest mining method proposed by the invention, comprising:
Step 201: collect all data of the user on the social network, where the data comprise the textual data generated by the user on the social network and the data on association relationships between the user and textual content on the social network.
In step 201, the user's data on the social network are collected, and the mining of the user's interests relies on these data. The data are of two kinds: the textual content generated by users on the social network, and the association relationships between users and that textual content. The former is the main body of the social network's content; the latter is the path along which information flows through the social network. For example, if user A publishes a publicly accessible article C on the social network, and user B views and forwards the article, then content C and the relationships $A \to C$ and $B \to C$ can be collected.
Data collection in step 201 requires no direct participation by the user, so collecting the data poses no difficulty. A unified method is used to mine and express user interests, which makes it convenient for subsequent products to apply them.
Step 202: obtain all labels contained in the textual data generated by the user on the social network and the weight score of each label; convert the data on association relationships between the user and textual content into user-relationship chain form; merge the user's labels over all content to obtain the user's label interest column.
All textual content is divided into two parts: the title (or other short description) and the body, and the two differ greatly in importance. A simple rule captures this difference: each occurrence of a label in the title receives 5 times the weight score of an occurrence in the body.
For each piece of text, a fast string matching algorithm obtains all the labels it contains, and the number of occurrences of each label is used as the weight score of the content on that label. If a label appears in both the title and the body, its weight is the sum of the weight it obtains in the title and the weight it obtains in the body. In this way, a piece of content can be represented by a series of labels and label weights. For example, with $T_i$ denoting a label and $TF_i$ denoting the weight of that label in the content, the content can be represented as $\{\langle T_i, TF_i\rangle, \langle T_j, TF_j\rangle, \ldots, \langle T_k, TF_k\rangle\}$.
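The title and body weighting rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the names `count_occurrences`, `content_label_weights`, and `TITLE_FACTOR` and the sample text are hypothetical, and `str.count` merely stands in for the fast string matching algorithm, which the text does not specify.

```python
TITLE_FACTOR = 5  # a title occurrence scores 5 times a body occurrence


def count_occurrences(text, label):
    """Count non-overlapping occurrences of label in text."""
    return text.count(label) if label else 0


def content_label_weights(title, body, labels):
    """Return {label: TF} for one piece of content.

    TF is the title score plus the body score, so a label found in
    both parts receives the sum of the two weights.
    """
    weights = {}
    for label in labels:
        tf = (TITLE_FACTOR * count_occurrences(title, label)
              + count_occurrences(body, label))
        if tf > 0:  # only labels that actually occur are kept
            weights[label] = tf
    return weights


weights = content_label_weights(
    "Weekend basketball",
    "We played basketball, then talked basketball tactics.",
    ["basketball", "chess"],
)
```

Here "basketball" scores 5 for its single title occurrence plus 2 for its body occurrences, while "chess" never occurs and is left out of the result. With a large label library, a multi-pattern matcher would likely be preferable to one scan per label.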
All relationship data are converted into user-relationship chain form. For example, let $U$ denote a user and $C_i$ denote a content item. If the relationship data contain $U \to C_1$, $U \to C_2$, $U \to C_3$, and so on, the relationship chain of user $U$ is expressed as $U \to \{C_1, C_2, C_3, \ldots\}$. Merging the labels of user $U$ over all content then gives the interest label column of user $U$: $U \to \{\langle T_i, \sum TF_i\rangle, \langle T_j, \sum TF_j\rangle, \ldots, \langle T_k, \sum TF_k\rangle\}$.
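The conversion to relationship chains and the merge into a label interest column might look like the sketch below. The function names, the pair-based input format, and the sample data are assumptions made for illustration; the text only fixes the two resulting forms.

```python
from collections import defaultdict


def build_relationship_chains(pairs):
    """Group (user, content_id) pairs into user -> [content_id, ...]."""
    chains = defaultdict(list)
    for user, content_id in pairs:
        chains[user].append(content_id)
    return dict(chains)


def label_interest_column(chain, content_weights):
    """Sum the label weights TF over every content item in one chain."""
    column = defaultdict(int)
    for content_id in chain:
        for label, tf in content_weights.get(content_id, {}).items():
            column[label] += tf
    return dict(column)


# Hypothetical relationship data and per-content label weights.
pairs = [("U", "C1"), ("U", "C2"), ("B", "C3")]
content_weights = {
    "C1": {"basketball": 7, "NBA": 1},
    "C2": {"basketball": 2},
    "C3": {"chess": 3},
}
chains = build_relationship_chains(pairs)
column_u = label_interest_column(chains["U"], content_weights)
```

For user U the two content items contribute 7 and 2 to "basketball", so the merged column holds the summed weight for each label.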
A user may have many kinds of associated content on a social network, for example photos, diaries, and discussions; all of it can be processed with the two steps above and merged into the user's interest column by a unified method. In addition, separate user interest data can be built for content of a particular category to meet the needs of category-specific applications.
Furthermore, since user interests are not fixed, the concept of time is also introduced. For example, let $WT_i$ denote the time weight of user $U$ on content $C_i$ (that is, the time score of the establishment of the association); the longer ago the association was established, the smaller this value. The relationship chain of user $U$ above is then expressed as $U \to \{\langle C_i, WT_i\rangle, \langle C_j, WT_j\rangle, \ldots, \langle C_k, WT_k\rangle\}$. Merging the labels of user $U$ over all content then gives the interest label column of user $U$: $U \to \{\langle T_i, W_i\rangle, \langle T_j, W_j\rangle, \ldots, \langle T_k, W_k\rangle\}$, where $W_i$ is a weight that combines term frequency and the time factor. In this way the interests of social network users can be mined as fully as possible, and the interests obtained are direct, fine-grained, and able to change over time.
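The text does not say how the time score is computed or how it is combined with term frequency. The sketch below is therefore only one possible reading: it assumes an exponential decay with a hypothetical half-life for $WT$, and assumes $W_i = \sum TF_i \cdot WT$ over the content items in the chain. Both choices, and all names, are illustrative assumptions.

```python
from collections import defaultdict


def time_weight(age_days, half_life_days=30.0):
    """Hypothetical time score: 1.0 for a new association, halved
    every half_life_days, so older associations score lower."""
    return 0.5 ** (age_days / half_life_days)


def weighted_interest_column(chain, content_weights):
    """chain is a list of (content_id, WT) pairs; returns {label: W}."""
    column = defaultdict(float)
    for content_id, wt in chain:
        for label, tf in content_weights.get(content_id, {}).items():
            column[label] += tf * wt  # assumed way of mixing TF and time
    return dict(column)


content_weights = {"C1": {"basketball": 7, "NBA": 1}, "C2": {"basketball": 2}}
chain_u = [("C1", time_weight(0)), ("C2", time_weight(30))]
column_u = weighted_interest_column(chain_u, content_weights)
```

Under these assumptions the 30-day-old item C2 counts half as much as the fresh item C1, which is the qualitative behavior the text asks for.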
Once the label collection of step 202 is complete, only a small amount of maintenance is needed afterwards. Moreover, because no text segmentation method such as word segmentation is used, errors introduced by incorrect segmentation, and the extra work they cause, are effectively avoided.
Step 203: vectorize the label interest column.
To simplify subsequent processing and expression, the user's interest is expressed as a vector $V$ in label space. For example, let $V$ denote the interest vector of user $U$ and $S_i$ the coordinate of this vector on the dimension of label $T_i$; the user interest label column above can then be converted into the interest vector $V \to \{S_1, S_2, \ldots, S_i, \ldots, S_n\}$, where $n$ is the total number of labels. If user $U$ has label $T_i$, the value of $S_i$ is $W_i$; otherwise the value of $S_i$ is 0. All users' interests can thus be described and computed with a unified interest vector.
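A short sketch of the vectorization step follows. The fixed label ordering `all_labels` and the function names are assumptions. The cosine similarity is included only as one example of a computation that a shared vector form makes possible; the text does not prescribe it.

```python
import math


def to_interest_vector(column, all_labels):
    """Map a label interest column onto a fixed label order; S_i is W_i
    when the user has label T_i and 0 otherwise."""
    return [float(column.get(label, 0.0)) for label in all_labels]


def cosine_similarity(v1, v2):
    """Illustrative comparison of two interest vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    return dot / norm if norm else 0.0


all_labels = ["NBA", "basketball", "chess"]  # n = 3 labels in total
v_u = to_interest_vector({"basketball": 8.0, "NBA": 1.0}, all_labels)
v_b = to_interest_vector({"chess": 3.0}, all_labels)
similarity = cosine_similarity(v_u, v_b)
```

Because the two hypothetical users share no label, their vectors are orthogonal and the similarity is zero.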
In addition, since labels differ in importance, let $DF_i$ denote the number of occurrences of label $T_i$ across all users (each user is counted only once). The larger the $DF$ value, the less important the label and the weaker its discriminating power. Accordingly, in the interest vector above, the coordinate $S_i$ corresponding to label $T_i$ becomes $W_i / DF_i$ when user $U$ has label $T_i$.
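The $DF$ adjustment can be sketched as below. The function names and the sample columns are hypothetical; the counting rule (each user contributes at most once per label) and the division $W_i / DF_i$ follow the description.

```python
from collections import Counter


def user_frequency(columns):
    """DF: the number of users whose interest column contains each label."""
    df = Counter()
    for column in columns.values():
        df.update(column.keys())  # one count per user per label
    return df


def normalized_vector(column, all_labels, df):
    """S_i = W_i / DF_i when the user has label T_i, else 0."""
    return [column[label] / df[label] if label in column else 0.0
            for label in all_labels]


columns = {
    "U": {"basketball": 8.0, "NBA": 1.0},
    "B": {"basketball": 3.0},
}
df = user_frequency(columns)
all_labels = ["NBA", "basketball", "chess"]
v_u = normalized_vector(columns["U"], all_labels, df)
```

Here "basketball" is held by both users, so its coordinate for U is halved, while "NBA" is held by U alone and keeps its full weight.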
Step 204: abstract the label interest vector. That is: the attribute of a label is an abstract category, and an abstract category is a set of categories; according to the attribute mapping between categories and labels, the label interest vector is divided into low-level abstract categories and high-level abstract categories; the labels in the label interest vector are assigned to the corresponding categories; the labels and their weights within each abstract category are merged to obtain a low-level category interest vector and a high-level category interest vector; and, according to the specific needs of the application scenario, corresponding interest information is provided to the social network user based on the low-level category interest vector, the high-level category interest vector, and the labels, thereby mining the interests of the social network user.
User interests expressed as vectors are fine-grained, accurate, and changeable, but such a detailed form of expression does not suit some applications; in particular, it does not let a person grasp a user's general interest preferences at a glance. For this reason, the concept of a label attribute is introduced in the definition of a label. A label attribute represents the abstract category to which the label belongs and is a more abstract concept than the label. Two levels of abstraction are provided: the low-level abstraction has 135 categories in total, and the higher-level abstraction has 16 broad categories. Let $CL_k$ denote a low-level abstract category and $CH_j$ denote a high-level abstract category. For every label $T_i$ that has attributes, the relationships $T_i \to CL_k$ and $T_i \to CH_j$ hold.
According to the attribute mapping between categories and labels, the coordinates of all labels in the interest vector $V$ are converted as follows: if a label $T_i$ has category $CL_k$ or $CH_j$, its coordinate $S_i$ is assigned to that category; if the label has more than one category attribute, its coordinate is assigned to each of those categories in turn. Taking the low-level categories as an example, a new series of relationships is obtained: $V_{CL} \to \{\langle CL_i, S_i\rangle, \langle CL_j, S_i\rangle, \langle CL_j, S_j\rangle, \ldots, \langle CL_k, S_k\rangle\}$. Note that a label can belong to several different categories at once, and each category contains a large number of different labels. Merging the weights under the same category converts the interest vector in label space into an interest vector in the low-level category space. The high-level category interest vector is generated in the same way.
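The category abstraction can be sketched as follows. The function name, the dictionary representation of the vector, and every category name other than the Go example ("non-ball sports", "sports") are hypothetical and serve only to illustrate the assign-then-merge procedure.

```python
from collections import defaultdict


def abstract_vector(label_scores, label_to_categories):
    """Assign each label coordinate S_i to every category the label has,
    then merge the scores that land in the same category."""
    category_scores = defaultdict(float)
    for label, score in label_scores.items():
        # Labels without attributes map to no category and are skipped.
        for category in label_to_categories.get(label, ()):
            category_scores[category] += score
    return dict(category_scores)


label_scores = {"Go": 2.0, "basketball": 4.0, "NBA": 1.0, "weekend": 3.0}
low_map = {
    "Go": ["non-ball sports"],
    "basketball": ["ball sports"],
    "NBA": ["ball sports", "sports leagues"],  # one label, two categories
}
high_map = {"Go": ["sports"], "basketball": ["sports"], "NBA": ["sports"]}

low_vector = abstract_vector(label_scores, low_map)
high_vector = abstract_vector(label_scores, high_map)
```

The label "NBA" contributes its full coordinate to both of its low-level categories, and "weekend", which has no attribute, does not appear in either category vector.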
At this point, user interest vectors at three levels of granularity are available: high-level category, low-level category, and label. They can be selected according to the specific needs of the application scenario.
By associating categories with labels, step 204 abstracts user interests well and makes them widely applicable. The process and results of interest mining are transparent to the user and can uncover points of interest that users themselves are not aware of; they do not rely on crowd data and can uncover the interests of minority users. In addition, by controlling the data source, the technical solution can flexibly mine user interests in various vertical domains, which suits specialized applications.
The labels used in the method are collected by combining manual and technical means, gathering various entity nouns from Chinese and English (the same approach can also be applied to other languages). Collection mainly considers the following factors: uniqueness, representativeness, and timeliness. Technical collection meets the need for large-scale collection, and manual review ensures the correctness of the labels.
New entity nouns can be added to the label library periodically or at any time, which ensures that new events are recognized. In the label library, a label can be given attributes at two levels. For example, the label "Go" has the two-level attribute pair "non-ball sports" and "sports", representing a lower-level interest category and a higher-level interest category respectively. A label can have several attributes, each corresponding to a different interest category. Label attributes are assigned manually, which ensures precision. Not every label has a clear category, so not every label needs attributes, which reduces the manual workload.
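One way to represent such a label library is sketched below. The `Label` class, the `category_map` helper, and all entries except the Go example are hypothetical; the sketch only shows that a label may carry zero, one, or several two-level attribute pairs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    name: str
    # Each attribute is a (low_level_category, high_level_category) pair.
    attributes: tuple = ()


def category_map(library, level):
    """Return {label name: categories} at level 0 (low) or 1 (high),
    leaving out labels that have no attributes."""
    return {
        label.name: sorted({pair[level] for pair in label.attributes})
        for label in library
        if label.attributes
    }


library = [
    Label("Go", (("non-ball sports", "sports"),)),
    # Hypothetical label with two attribute pairs.
    Label("NBA", (("ball sports", "sports"), ("sports leagues", "sports"))),
    Label("weekend"),  # no clear category, so no attributes
]
low = category_map(library, 0)
high = category_map(library, 1)
```

The two derived mappings are the label-to-category relations $T_i \to CL_k$ and $T_i \to CH_j$ in dictionary form.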
Building on Fig. 2, data cleaning is added to ensure the accuracy of the data. Fig. 3 is the third flowchart of the label-based social network user interest mining method proposed by the invention.
Step 201': clean the collected data.
The collected data are cleaned and advertising content is filtered out. In addition, for long text, only the first 500 words are taken as the object of analysis. Because the labels are actively matched against the content, undesirable content is in effect filtered out automatically.
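A minimal sketch of the cleaning step is given below. How advertising content is recognized is not described, so the keyword list `AD_MARKERS` is purely a placeholder assumption, as is the choice to truncate by characters (a natural unit for Chinese text) rather than by whitespace-separated words.

```python
MAX_LENGTH = 500  # assumed to be measured in characters

# Hypothetical markers; the text does not specify how ads are detected.
AD_MARKERS = ("buy now", "limited offer")


def clean(texts):
    """Drop items that look like advertising and truncate long text."""
    cleaned = []
    for text in texts:
        lowered = text.lower()
        if any(marker in lowered for marker in AD_MARKERS):
            continue  # filtered out as advertising content
        cleaned.append(text[:MAX_LENGTH])
    return cleaned


items = ["Buy now! Cheap shoes", "x" * 600, "A short diary entry"]
result = clean(items)
```

No separate filter for undesirable content appears here because, as described above, text that matches no label simply contributes nothing to the interest column.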
The application also proposes a label-based social network user interest mining device. Fig. 4 shows the first block diagram of the label-based social network user interest mining device proposed by the present invention. The device comprises:
a data collection module 41, configured to collect the data of the user on the social network;
a label interest column generation unit 42, configured to generate a label interest column from the data, the label interest column being the set of all labels in the data and their corresponding weights;
an interest mining unit 43, configured to recommend user interest information according to the label interest column, thereby realizing social network user interest mining.
In one embodiment of the invention, the system further comprises a data cleaning unit 41' for cleaning all the collected data of the user on the social network. Fig. 5 shows the second block diagram of the label-based social network user interest mining device proposed by the present invention. The cleaning performed by the data cleaning unit 41' comprises: filtering out advertising content, taking only the first 500 words of long text as the object of analysis, and filtering out objectionable information by actively matching labels against the content.
Fig. 6 shows a block diagram of the interest mining unit 43 in the label-based social network user interest mining device proposed by the present invention. As can be seen, the interest mining unit 43 comprises:
a label interest vector generation module 431, configured to vectorize the label interest column to obtain a label interest vector;
a label interest vector abstraction module 432, configured to abstract the label interest vector and obtain an abstraction result;
an interest recommendation module 433, configured to recommend user interest information according to the abstraction result.
The label interest vector abstraction module 432 abstracts the label interest vector into abstract categories according to the attributes of the labels. An abstract category is a set of categories together with the attribute mapping between labels and those categories, and abstract categories are divided into low-level abstract categories and high-level abstract categories. The labels in the label interest vector are assigned to their corresponding categories, and the labels and their weights within each abstract category are merged to obtain a low-category interest vector and a high-category interest vector. The interest recommendation module 433 provides corresponding interest information to the social network user based on the low-category interest vector, the high-category interest vector, and the labels, according to the specific needs of the application scenario, thereby realizing social network user interest mining.
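The abstraction performed by module 432 can be illustrated with a short sketch. It assumes that "merging" the weights within a category means summing them, which the text does not state explicitly; the function name `abstract_interest` and the dictionary layout are likewise hypothetical.

```python
def abstract_interest(label_weights, label_attributes):
    """Roll label weights up into low-level and high-level category weights.

    label_weights:    {label: weight}
    label_attributes: {label: [(low_category, high_category), ...]}
    Labels without attributes contribute to neither category vector.
    """
    low, high = {}, {}
    for label, weight in label_weights.items():
        for low_category, high_category in label_attributes.get(label, []):
            # Assumption: weights inside one category are merged by summation.
            low[low_category] = low.get(low_category, 0.0) + weight
            high[high_category] = high.get(high_category, 0.0) + weight
    return low, high
```

With this layout, a label that carries several attributes contributes its weight to each of its categories, which matches the statement that one label can correspond to several interest categories.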
In one embodiment of the invention, the data collected by the data collection module comprise: textual data generated by the user on the social network, and association relationship data between the user and textual content on the social network. The label interest column generation unit obtains all labels contained in the textual data generated by the user on the social network and the weight corresponding to each label; converts the association relationship data between the user and the textual content into user-relationship chain form; and merges the user's labels over all content to obtain the user's label interest column.
In one embodiment of the invention, the textual data generated by the user on the social network and collected by the data collection module are divided into a title and content. A string matching algorithm is used to obtain all labels contained in the textual data, and the number of occurrences of each label is taken as the weight of that content on that label.
In one embodiment of the invention, when the title and the content both contain the same label, the weight of that label is the sum of the weight obtained from the title and the weight obtained from the content.
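The label extraction described in the two embodiments above can be sketched as follows. The patent only says that a string matching algorithm is used; this illustration substitutes Python's built-in `str.count`, which counts non-overlapping, case-sensitive occurrences, and the function name `content_label_weights` is hypothetical.

```python
def content_label_weights(title, body, labels):
    """Return {label: weight} for one piece of content.

    The weight of a label is its occurrence count in the title plus its
    occurrence count in the body; labels that never occur are omitted.
    """
    weights = {}
    for label in labels:
        occurrences = title.count(label) + body.count(label)
        if occurrences > 0:
            weights[label] = occurrences
    return weights
```

For a large label library, a multi-pattern matcher would avoid scanning the text once per label, but the resulting weights would be the same.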
In one embodiment of the invention, the textual data generated by the user on the social network and collected by the data collection module are expressed as $\{\langle T_i, TF_i\rangle, \langle T_j, TF_j\rangle, \ldots, \langle T_k, TF_k\rangle\}$, where $T_i$ denotes a label and $TF_i$ denotes the weight of label $T_i$ in the content. The association relationship data between the user and the textual content on the social network, as collected by the data collection module, are expressed as $U \rightarrow \{C_1, C_2, C_3, \ldots\}$, where $U$ denotes a user and $C_i$ denotes a piece of content that has an association relationship with user $U$. The label interest column obtained by the label interest column generation unit is expressed as $U \rightarrow \{\langle T_i, \sum TF_i\rangle, \langle T_j, \sum TF_j\rangle, \ldots, \langle T_k, \sum TF_k\rangle\}$.
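The merge over the user-relationship chain amounts to summing each label's weight across all content associated with the user. A minimal sketch, with the hypothetical names `label_interest_column`, `relation_chain`, and `content_weights`:

```python
from collections import defaultdict


def label_interest_column(relation_chain, content_weights):
    """Sum per-content label weights over every content item linked to a user.

    relation_chain:  content identifiers associated with the user (C_1, C_2, ...)
    content_weights: {content_id: {label: TF}}
    """
    totals = defaultdict(int)
    for content_id in relation_chain:
        for label, tf in content_weights.get(content_id, {}).items():
            totals[label] += tf
    return dict(totals)
```

Content that is not in the user's relationship chain is ignored, so the same `content_weights` table can be shared by all users.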
In one embodiment of the invention, the association relationship data between the user and the textual content on the social network, as collected by the data collection module, further comprise a time weight $WT_i$, where $WT_i$ denotes the time score at which the association relationship between user $U$ and content $C_i$ was established. The user-relationship chain of the association relationship data is then expressed as $U \rightarrow \{\langle C_i, WT_i\rangle, \langle C_j, WT_j\rangle, \ldots, \langle C_k, WT_k\rangle\}$, and the label interest column is expressed as $U \rightarrow \{\langle T_i, W_i\rangle, \langle T_j, W_j\rangle, \ldots, \langle T_k, W_k\rangle\}$, where $W$ denotes a weight that combines the term frequency and the time factor.
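This passage states only that $W$ combines term frequency and the time factor, without giving the formula. The sketch below assumes one plausible combination, namely multiplying each content's term frequency by that content's time weight and summing over the chain; this product-sum rule and the name `time_weighted_interest_column` are assumptions made for illustration, not the patent's stated method.

```python
def time_weighted_interest_column(relation_chain, content_weights):
    """Combine term frequency and time weight into one weight per label.

    relation_chain:  list of (content_id, WT) pairs
    content_weights: {content_id: {label: TF}}
    Assumed rule: W = sum over the user's content of TF * WT.
    """
    totals = {}
    for content_id, time_weight in relation_chain:
        for label, tf in content_weights.get(content_id, {}).items():
            totals[label] = totals.get(label, 0.0) + tf * time_weight
    return totals
```

Under this assumption, a label mentioned in recently associated content (a higher time score) outweighs the same number of mentions in older content.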
In one embodiment of the invention, the label interest vector obtained by the label interest vector generation module is expressed as $V \rightarrow \{S_1, S_2, \ldots, S_i, \ldots, S_n\}$, where the vector $V$ represents the user's interests and $S_i$ is the coordinate of the vector on the dimension of label $T_i$. If user $U$ has label $T_i$, the value of $S_i$ is $W_i$; otherwise, the value of $S_i$ is 0. $n$ denotes the total number of labels.
In one embodiment of the invention, the number of occurrences of label $T_i$ in user $U$ is $DF_i$. If user $U$ has label $T_i$, the value of $S_i$ is $W_i / DF_i$; otherwise, the value of $S_i$ is 0.
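Both vectorization variants can be expressed in one small function. This is a sketch with hypothetical names; `occurrence_counts` stands for the $DF_i$ values, which are taken as given input because the text does not spell out how they are computed.

```python
def interest_vector(all_labels, label_weights, occurrence_counts=None):
    """Build the coordinates S_1..S_n over a fixed, ordered list of all labels.

    S_i is W_i if the user has label T_i (or W_i / DF_i when occurrence
    counts are supplied), and 0 otherwise.
    """
    vector = []
    for label in all_labels:
        weight = label_weights.get(label)
        if weight is None:
            vector.append(0.0)
        elif occurrence_counts is None:
            vector.append(float(weight))
        else:
            vector.append(weight / occurrence_counts[label])
    return vector
```

Because every user's vector is built over the same ordered label list, vectors of different users have the same length $n$ and can be compared coordinate by coordinate.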
Fig. 7 shows a block diagram of an application case of the label-based social network user interest mining device proposed by the present invention. The system can be applied to the Kaixin ("Happy Net") community platform. The system can automatically mine the user's interests from all kinds of content containing text, such as the information streams in which the user participates, the components the user has added, and the name robot mechanism, and can generate a low-category interest vector and a high-category interest vector. Based on the low-category interest vector, the high-category interest vector, and the labels, the system provides corresponding interest information to the social network user, thereby realizing social network user interest mining.
In current Internet applications, the most important resource is the user. The analysis of user data has always been a research focus, and user interest is the top priority within it. Accurately obtaining user interest data directly benefits many Internet services. For example, user interest can directly serve precisely targeted advertising and raise ad conversion rates; it can be applied to all kinds of recommendation systems and products to improve click-through rates; and it can be applied to personalized search and other services to improve user satisfaction. The solution can cover the full range of user interest mining methods and application scenarios.
The embodiments described above further explain the purpose, technical solutions, and beneficial effects of the present invention. It should be understood that the above are only specific embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (30)

1. A label-based social network user interest mining method, characterized by comprising:
collecting data of a user on a social network;
generating a label interest column from the data, the label interest column being the set of all labels in the data and their corresponding weights;
recommending user interest information according to the label interest column, so as to realize social network user interest mining.
2. The label-based social network user interest mining method according to claim 1, characterized in that recommending user interest information according to the label interest column comprises:
vectorizing the label interest column to obtain a label interest vector; abstracting the label interest vector to obtain an abstraction result; and recommending user interest information according to the abstraction result.
3. The label-based social network user interest mining method according to claim 2, characterized in that abstracting the label interest vector comprises:
abstracting the label interest vector into abstract categories according to the attributes of the labels, an abstract category being a set of categories together with the attribute mapping between labels and those categories, and the abstract categories being divided into low-level abstract categories and high-level abstract categories; assigning the labels in the label interest vector to their corresponding categories; merging the labels and their weights within each abstract category to obtain a low-category interest vector and a high-category interest vector; and, according to the specific needs of the application scenario, providing corresponding interest information to the social network user based on the low-category interest vector, the high-category interest vector, and the labels, thereby realizing social network user interest mining.
4. The label-based social network user interest mining method according to claim 1, characterized in that the data comprise: textual data generated by the user on the social network, and association relationship data between the user and textual content on the social network.
5. The label-based social network user interest mining method according to claim 4, characterized in that generating the label interest column from the data comprises:
obtaining all labels contained in the textual data generated by the user on the social network and the weight corresponding to each label; converting the association relationship data between the user and the textual content on the social network into user-relationship chain form; and merging the user's labels over all content to obtain the user's label interest column.
6. The label-based social network user interest mining method according to any one of claims 1 to 5, characterized in that the method further comprises: cleaning all the collected data of the user on the social network.
7. The label-based social network user interest mining method according to claim 6, characterized in that the cleaning comprises: filtering out advertising content, taking only the first 500 words of long text as the object of analysis, and filtering out objectionable information by actively matching labels against the content.
8. The label-based social network user interest mining method according to any one of claims 4 to 5, characterized in that the textual data generated by the user on the social network are divided into a title and content; a string matching algorithm is used to obtain all labels contained in the textual data generated by the user on the social network, and the number of occurrences of each label is taken as the weight of that content on that label.
9. The label-based social network user interest mining method according to claim 8, characterized in that, when the title and the content both contain the same label, the weight of that label is the sum of the weight obtained from the title and the weight obtained from the content.
10. The label-based social network user interest mining method according to any one of claims 4 to 5, characterized in that the textual data generated by the user on the social network are expressed as $\{\langle T_i, TF_i\rangle, \langle T_j, TF_j\rangle, \ldots, \langle T_k, TF_k\rangle\}$, where $T_i$ denotes a label and $TF_i$ denotes the weight of label $T_i$ in the content.
11. The label-based social network user interest mining method according to any one of claims 4 to 5, characterized in that the user-relationship chain of the association relationship data between the user and the textual content on the social network is expressed as $U \rightarrow \{C_1, C_2, C_3, \ldots\}$, where $U$ denotes a user and $C_i$ denotes a piece of content that has an association relationship with user $U$.
12. The label-based social network user interest mining method according to any one of claims 1 to 5, characterized in that the label interest column is expressed as $U \rightarrow \{\langle T_i, \sum TF_i\rangle, \langle T_j, \sum TF_j\rangle, \ldots, \langle T_k, \sum TF_k\rangle\}$.
13. The label-based social network user interest mining method according to any one of claims 4 to 5, characterized in that the association relationship data between the user and the textual content on the social network further comprise a time weight $WT_i$, where $WT_i$ denotes the time score at which the association relationship between user $U$ and content $C_i$ was established; the user-relationship chain of the association relationship data is then expressed as $U \rightarrow \{\langle C_i, WT_i\rangle, \langle C_j, WT_j\rangle, \ldots, \langle C_k, WT_k\rangle\}$; and the label interest column is expressed as $U \rightarrow \{\langle T_i, W_i\rangle, \langle T_j, W_j\rangle, \ldots, \langle T_k, W_k\rangle\}$, where $W$ denotes a weight that combines the term frequency and the time factor.
14. The label-based social network user interest mining method according to claim 2, characterized in that the label interest vector is expressed as $V \rightarrow \{S_1, S_2, \ldots, S_i, \ldots, S_n\}$, where the vector $V$ represents the user's interests and $S_i$ is the coordinate of the vector on the dimension of label $T_i$; if user $U$ has label $T_i$, the value of $S_i$ is $W_i$; otherwise, the value of $S_i$ is 0; and $n$ denotes the total number of labels.
15. The label-based social network user interest mining method according to claim 14, characterized in that the number of occurrences of label $T_i$ in user $U$ is $DF_i$; if user $U$ has label $T_i$, the value of $S_i$ is $W_i / DF_i$; otherwise, the value of $S_i$ is 0.
16. A label-based social network user interest mining device, characterized by comprising:
a data collection module, configured to collect data of a user on a social network;
a label interest column generation unit, configured to generate a label interest column from the data, the label interest column being the set of all labels in the data and their corresponding weights;
an interest mining unit, configured to recommend user interest information according to the label interest column, so as to realize social network user interest mining.
17. The label-based social network user interest mining device according to claim 16, characterized in that the interest mining unit comprises:
a label interest vector generation module, configured to vectorize the label interest column to obtain a label interest vector;
a label interest vector abstraction module, configured to abstract the label interest vector and obtain an abstraction result;
an interest recommendation module, configured to recommend user interest information according to the abstraction result.
18. The label-based social network user interest mining device according to claim 17, characterized in that the label interest vector abstraction module abstracts the label interest vector into abstract categories according to the attributes of the labels, an abstract category being a set of categories together with the attribute mapping between labels and those categories, and the abstract categories being divided into low-level abstract categories and high-level abstract categories; assigns the labels in the label interest vector to their corresponding categories; and merges the labels and their weights within each abstract category to obtain a low-category interest vector and a high-category interest vector; and the interest recommendation module provides corresponding interest information to the social network user based on the low-category interest vector, the high-category interest vector, and the labels, according to the specific needs of the application scenario, thereby realizing social network user interest mining.
19. The label-based social network user interest mining device according to claim 16, characterized in that the data collected by the data collection module comprise: textual data generated by the user on the social network, and association relationship data between the user and textual content on the social network.
20. The label-based social network user interest mining device according to claim 19, characterized in that the label interest column generation unit obtains all labels contained in the textual data generated by the user on the social network and the weight corresponding to each label; converts the association relationship data between the user and the textual content on the social network into user-relationship chain form; and merges the user's labels over all content to obtain the user's label interest column.
21. The label-based social network user interest mining device according to any one of claims 16 to 20, characterized in that the device further comprises a data cleaning unit for cleaning all the collected data of the user on the social network.
22. The label-based social network user interest mining device according to claim 21, characterized in that the cleaning performed by the data cleaning unit comprises: filtering out advertising content, taking only the first 500 words of long text as the object of analysis, and filtering out objectionable information by actively matching labels against the content.
23. The label-based social network user interest mining device according to any one of claims 19 to 20, characterized in that the textual data generated by the user on the social network and collected by the data collection module are divided into a title and content; a string matching algorithm is used to obtain all labels contained in the textual data generated by the user on the social network, and the number of occurrences of each label is taken as the weight of that content on that label.
24. The label-based social network user interest mining device according to claim 23, characterized in that, when the title and the content both contain the same label, the weight of that label is the sum of the weight obtained from the title and the weight obtained from the content.
25. The label-based social network user interest mining device according to any one of claims 19 to 20, characterized in that the textual data generated by the user on the social network and collected by the data collection module are expressed as $\{\langle T_i, TF_i\rangle, \langle T_j, TF_j\rangle, \ldots, \langle T_k, TF_k\rangle\}$, where $T_i$ denotes a label and $TF_i$ denotes the weight of label $T_i$ in the content.
26. The label-based social network user interest mining method according to any one of claims 19 to 20, characterized in that the association relationship data between the user and the textual content on the social network, as collected by the data collection module, are expressed as $U \rightarrow \{C_1, C_2, C_3, \ldots\}$, where $U$ denotes a user and $C_i$ denotes a piece of content that has an association relationship with user $U$.
27. The label-based social network user interest mining device according to any one of claims 16 to 20, characterized in that the label interest column obtained by the label interest column generation unit is expressed as $U \rightarrow \{\langle T_i, \sum TF_i\rangle, \langle T_j, \sum TF_j\rangle, \ldots, \langle T_k, \sum TF_k\rangle\}$.
28. The label-based social network user interest mining device according to any one of claims 19 to 20, characterized in that the association relationship data between the user and the textual content on the social network, as collected by the data collection module, further comprise a time weight $WT_i$, where $WT_i$ denotes the time score at which the association relationship between user $U$ and content $C_i$ was established; the user-relationship chain of the association relationship data is then expressed as $U \rightarrow \{\langle C_i, WT_i\rangle, \langle C_j, WT_j\rangle, \ldots, \langle C_k, WT_k\rangle\}$; and the label interest column is expressed as $U \rightarrow \{\langle T_i, W_i\rangle, \langle T_j, W_j\rangle, \ldots, \langle T_k, W_k\rangle\}$, where $W$ denotes a weight that combines the term frequency and the time factor.
29. The label-based social network user interest mining device according to claim 17, characterized in that the label interest vector obtained by the label interest vector generation module is expressed as $V \rightarrow \{S_1, S_2, \ldots, S_i, \ldots, S_n\}$, where the vector $V$ represents the user's interests and $S_i$ is the coordinate of the vector on the dimension of label $T_i$; if user $U$ has label $T_i$, the value of $S_i$ is $W_i$; otherwise, the value of $S_i$ is 0; and $n$ denotes the total number of labels.
30. The label-based social network user interest mining device according to claim 29, characterized in that the number of occurrences of label $T_i$ in user $U$ is $DF_i$; if user $U$ has label $T_i$, the value of $S_i$ is $W_i / DF_i$; otherwise, the value of $S_i$ is 0.
CN2012102495828A 2012-07-18 2012-07-18 Label-based social network user interest mining method and device Pending CN102867016A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012102495828A CN102867016A (en) 2012-07-18 2012-07-18 Label-based social network user interest mining method and device

Publications (1)

Publication Number Publication Date
CN102867016A true CN102867016A (en) 2013-01-09

Family

ID=47445885

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012102495828A Pending CN102867016A (en) 2012-07-18 2012-07-18 Label-based social network user interest mining method and device

Country Status (1)

Country Link
CN (1) CN102867016A (en)

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103353920A (en) * 2013-05-31 2013-10-16 北京百度网讯科技有限公司 Method and device for recommending games based on SNS
CN103631949A (en) * 2013-12-11 2014-03-12 中国科学院计算技术研究所 Data acquisition method and system for social network
CN103870541A (en) * 2014-02-24 2014-06-18 微梦创科网络科技(中国)有限公司 Social network user interest mining method and system
CN104063476A (en) * 2014-06-30 2014-09-24 北京奇虎科技有限公司 Social network-based content recommending method and system
CN104915354A (en) * 2014-03-12 2015-09-16 深圳市腾讯计算机系统有限公司 Multimedia file pushing method and device
CN104915359A (en) * 2014-03-14 2015-09-16 华为技术有限公司 Theme label recommending method and device
CN105138572A (en) * 2015-07-27 2015-12-09 百度在线网络技术(北京)有限公司 Method and device for obtaining correlation weight of user tag
WO2016000555A1 (en) * 2014-06-30 2016-01-07 北京奇虎科技有限公司 Methods and systems for recommending social network-based content and news
CN106874314A (en) * 2015-12-14 2017-06-20 腾讯科技(深圳)有限公司 The method and apparatus of information recommendation
CN106960033A (en) * 2017-03-22 2017-07-18 广州优视网络科技有限公司 A kind of method and apparatus that label is marked to information flow
CN107451216A (en) * 2017-07-17 2017-12-08 广州特道信息科技有限公司 The granularity attribute recognition methods of label and device
CN108038732A (en) * 2017-12-25 2018-05-15 北京比利信息技术有限公司 A kind of brand advertising method for running and device
CN108256119A (en) * 2018-02-14 2018-07-06 北京方正阿帕比技术有限公司 A kind of construction method of resource recommendation model and the resource recommendation method based on the model
CN110737822A (en) * 2018-07-03 2020-01-31 百度在线网络技术(北京)有限公司 User interest mining method, device, equipment and storage medium
CN110956188A (en) * 2018-09-26 2020-04-03 北京融信数联科技有限公司 Population behavior track digital coding method based on mobile communication signaling data
CN111597220A (en) * 2019-02-21 2020-08-28 北京沃东天骏信息技术有限公司 Data mining method and device
CN111737588A (en) * 2020-08-24 2020-10-02 南京国睿信维软件有限公司 User portrait knowledge similarity calculation method
CN114169418A (en) * 2021-11-30 2022-03-11 北京百度网讯科技有限公司 Label recommendation model training method and device, and label obtaining method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101751448A (en) * 2009-07-22 2010-06-23 中国科学院自动化研究所 Commendation method of personalized resource information based on scene information
WO2011139477A2 (en) * 2010-05-05 2011-11-10 Yahoo! Inc. Selecting content based on interest tags that are included in an interest cloud
CN102402594A (en) * 2011-11-04 2012-04-04 电子科技大学 Rich media individualized recommending method
CN102541921A (en) * 2010-12-24 2012-07-04 华东师范大学 Control method and device for recommending resources through tag extension

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
杨丹等: "基于Web2.0 的社会性标签推荐系统", 《重庆工学院学报(自然科学)》 *
王世云等: "一种基于网络书签的个性化信息推荐方法", 《计算机系统应用》 *
郭伟光等: "一种社会化标注系统资源个性化推荐方法", 《计算机工程与应用》 *

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103353920A (en) * 2013-05-31 2013-10-16 北京百度网讯科技有限公司 Method and device for recommending games based on SNS
CN103353920B (en) * 2013-05-31 2017-05-17 北京百度网讯科技有限公司 Method and device for recommending games based on SNS
CN103631949B (en) * 2013-12-11 2016-01-27 中国科学院计算技术研究所 A kind of social network data acquisition method and system
CN103631949A (en) * 2013-12-11 2014-03-12 中国科学院计算技术研究所 Data acquisition method and system for social network
CN103870541A (en) * 2014-02-24 2014-06-18 微梦创科网络科技(中国)有限公司 Social network user interest mining method and system
CN103870541B (en) * 2014-02-24 2017-05-31 微梦创科网络科技(中国)有限公司 Social network user interest digging method and system
CN104915354A (en) * 2014-03-12 2015-09-16 深圳市腾讯计算机系统有限公司 Multimedia file pushing method and device
CN104915354B (en) * 2014-03-12 2020-01-10 深圳市腾讯计算机系统有限公司 Multimedia file pushing method and device
CN104915359A (en) * 2014-03-14 2015-09-16 华为技术有限公司 Theme label recommending method and device
CN104915359B (en) * 2014-03-14 2019-05-28 华为技术有限公司 Theme label recommended method and device
WO2016000555A1 (en) * 2014-06-30 2016-01-07 北京奇虎科技有限公司 Methods and systems for recommending social network-based content and news
CN104063476A (en) * 2014-06-30 2014-09-24 北京奇虎科技有限公司 Social network-based content recommending method and system
CN105138572A (en) * 2015-07-27 2015-12-09 百度在线网络技术(北京)有限公司 Method and device for obtaining correlation weight of user tag
CN105138572B (en) * 2015-07-27 2019-12-10 百度在线网络技术(北京)有限公司 Method and device for acquiring relevance weight of user tag
CN106874314A (en) * 2015-12-14 2017-06-20 腾讯科技(深圳)有限公司 The method and apparatus of information recommendation
CN106960033A (en) * 2017-03-22 2017-07-18 广州优视网络科技有限公司 A kind of method and apparatus that label is marked to information flow
CN106960033B (en) * 2017-03-22 2021-09-14 阿里巴巴(中国)有限公司 Method and device for labeling information stream
CN107451216A (en) * 2017-07-17 2017-12-08 广州特道信息科技有限公司 The granularity attribute recognition methods of label and device
CN108038732A (en) * 2017-12-25 2018-05-15 北京比利信息技术有限公司 A kind of brand advertising method for running and device
CN108256119A (en) * 2018-02-14 2018-07-06 北京方正阿帕比技术有限公司 A kind of construction method of resource recommendation model and the resource recommendation method based on the model
CN108256119B (en) * 2018-02-14 2021-12-28 北京方正阿帕比技术有限公司 Resource recommendation model construction method and resource recommendation method based on model
CN110737822A (en) * 2018-07-03 2020-01-31 百度在线网络技术(北京)有限公司 User interest mining method, device, equipment and storage medium
CN110956188A (en) * 2018-09-26 2020-04-03 北京融信数联科技有限公司 Population behavior track digital coding method based on mobile communication signaling data
CN111597220A (en) * 2019-02-21 2020-08-28 北京沃东天骏信息技术有限公司 Data mining method and device
CN111597220B (en) * 2019-02-21 2024-03-05 北京沃东天骏信息技术有限公司 Data mining method and device
CN111737588A (en) * 2020-08-24 2020-10-02 南京国睿信维软件有限公司 User portrait knowledge similarity calculation method
CN114169418A (en) * 2021-11-30 2022-03-11 北京百度网讯科技有限公司 Label recommendation model training method and device, and label obtaining method and device
CN114169418B (en) * 2021-11-30 2023-12-01 北京百度网讯科技有限公司 Label recommendation model training method and device and label acquisition method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C05 Deemed withdrawal (patent law before 1993)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20130109