CN103761246A - Link network based user domain identifying method and device - Google Patents
- Publication number
- CN103761246A (application number CN201310705515.7A)
- Authority
- CN
- China
- Prior art keywords
- user
- field
- membership
- sorted
- degree
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention provides a link-network-based user domain identification method and device, belonging to the fields of data mining and complex networks. The device comprises a data collection and preprocessing module, a domain prototype user set construction module, and a user domain calculation module. The method comprises: step one, manually collecting initial seed users; step two, collecting the users followed by the seed users; step three, constructing a link network and calculating the membership of each followed user in each domain; step four, sorting the users by membership; step five, constructing a domain prototype user set for each domain; step six, collecting the users followed by a user to be classified; step seven, calculating the membership of the user to be classified in each domain; step eight, sorting the domains by membership; step nine, adding domain labels. The method and device are applicable to multiple social network platforms, can overcome the shortcomings of short texts, and are particularly suitable for user modeling, personalized information search, recommendation, and similar fields.
Description
Technical field
The invention belongs to the fields of data mining and complex networks. Specifically, it relates to user domain identification based on a link network and proposes a link-network-based user domain identification method and device. The identification results can be used for user group segmentation, online marketing, and similar applications.
Background art
With the rapid growth of social media platforms such as Sina Weibo, more and more users have become accustomed to using such platforms to publish, share, and search for information and resources. These users come from different domains such as science and technology, finance and economics, sports, and media. Many users pay more attention to news and messages from their own domain and more readily influence the spread of issues or events belonging to that domain. Identifying in advance the domains a user follows and assigning domain labels to each user therefore makes it easier to classify, manage, and analyze network users. It plays an important role in pushing accurate information to users, improving the user experience of the platform, identifying the key users of each domain, and increasing the speed at which relevant organizations perceive domain events. Traditional network user domain identification methods first build a feature word set (a domain dictionary) for each domain from sample data, and then classify a user's domain of interest by matching the user's self-authored text against the domain dictionaries. This approach has problems that are hard to overcome. First, the complexity of Chinese word segmentation makes it difficult to guarantee the authority of the domain dictionary. Second, as real-time social networks develop, language use tends toward simplification and diversification and new words constantly appear, such as "give power", "fruit powder", and "refreshing horse" (literal renderings of Chinese internet slang), which makes dictionary updates harder. In addition, on a real-time short-text platform such as a microblog, users can post at any time, and the content touches on the country, society, daily life, and even household trivia, so it is difficult to determine from text alone which domain a user follows most. For these reasons, content-based user domain identification has low precision. Methods that select a classifier and classify on certain attributes face the problem of how to choose the attributes and the classifier, and existing classifiers often perform poorly on multi-class classification problems.
Summary of the invention
The prior art adapts poorly to the microblog setting: text-based domain identification faces the difficulty of keeping the domain dictionary authoritative and up to date, and classification methods often perform poorly on multi-class problems such as domain identification. To address these problems, the invention makes full use of the network structure of social media such as microblogs, applies a link-network-based technique to identify the domains of large numbers of users, and proposes a link-network-based user domain identification method and device.
The link-network-based user domain identification device provided by the invention comprises three modules: a data collection and preprocessing module, a domain prototype user set construction module, and a user domain calculation module.
The data collection and preprocessing module collects the initial seed users and crawls their followed-user lists. The domain prototype user set construction module uses the users followed by the initial seed users of each domain to build the prototype users of each domain. The user domain calculation module calculates the membership of a user to be classified in each domain, then sorts and selects domains by membership.
The data collection and preprocessing module comprises a manual sample collection module and a seed user follow list acquisition module that runs a crawler or requests an API (Application Programming Interface). The manual sample collection module obtains and stores the initial seed user ids of each domain; the seed user follow list acquisition module obtains the followed-user list of each initial seed user.
The manual sample collection module passes the initial seed user ids of each domain to the seed user follow list acquisition module, which uses those ids to obtain the users followed by the initial seed users of each domain.
The domain prototype user set construction module comprises a seed followee membership calculation module, a per-domain user ranking module, and an expanded domain prototype user set acquisition module. The seed followee membership calculation module calculates and stores the domain memberships of the users followed by the initial seed users of each domain; the per-domain user ranking module sorts those followed users by membership within each domain; the expanded domain prototype user set acquisition module obtains and stores the expanded domain prototype user sets.
Given the users followed by the initial seed users of each domain, the seed followee membership calculation module calculates, for each followed user, that user's membership in every domain and passes the memberships to the per-domain user ranking module. The per-domain user ranking module sorts the followed users in descending order of membership within each domain and passes the sorted memberships to the expanded domain prototype user set acquisition module. For each domain, the expanded domain prototype user set acquisition module selects the top K users with the highest membership in that domain and merges them with the domain's initial seed users to form the domain's prototype user set, that is, the expanded domain prototype user set, where K is a positive integer.
The user domain calculation module comprises a follow list acquisition module for the user to be classified (crawler or API request), a membership calculation module for the user to be classified, a per-user domain ranking module, and a top-A domain labeling module. The follow list acquisition module obtains the followed-user list of the user to be classified, and the top-A domain labeling module stores the domain identification result of the user to be classified, where A is a positive integer.
The follow list acquisition module obtains the users followed by the user to be classified and passes them to the membership calculation module. The membership calculation module calculates the membership of the user to be classified in each domain from the expanded domain prototype user sets and the users followed by the user to be classified, and passes these memberships to the per-user domain ranking module. The per-user domain ranking module sorts the domains in descending order of membership and passes the sorted memberships to the top-A domain labeling module, which finally yields the interest domain labels of the user to be classified.
The link-network-based user domain identification method comprises the following steps:
Step 1: The manual sample collection module collects the initial seed users of each domain on the network platform. The collected information is the initial seed user id and the user's domain label for each domain.
Step 2: The seed user follow list acquisition module collects the followed-user lists of the initial seed users of each domain. It operates as follows: write a crawler that, using the initial seed user ids collected by the manual sample collection module, visits the follow list page of each seed user and obtains the ids of all users that the seed user follows; or use the follow-list interface provided by the API to request, by initial seed user id, the ids of all users that the seed user follows.
Step 3: Build the link network of all seed users. Each user is a node, where the users comprise the initial seed users and the users they follow. The follow relation between two users is represented by a directed edge whose arrow points to the followed user. These directed edges, based on the follow relation, form the link network among all initial seed users and the users they follow.
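The network construction in step 3 can be sketched as follows. This is a minimal illustration rather than the patented implementation; the function name `build_link_network` and the user ids are hypothetical, and the input is assumed to be a mapping from each seed user id to the list of user ids that seed follows.

```python
def build_link_network(follow_lists):
    """Return (nodes, edges) of the directed link network.

    follow_lists maps a seed user id to the list of user ids that seed follows.
    Each edge is a pair (follower, followed), so it points at the followed user.
    """
    nodes = set()
    edges = set()
    for seed, followed_users in follow_lists.items():
        nodes.add(seed)
        for followed in followed_users:
            nodes.add(followed)
            edges.add((seed, followed))
    return nodes, edges


# Hypothetical ids: seed users 1-3 each follow three users.
follow_lists = {1: [4, 5, 6], 2: [6, 7, 8], 3: [8, 9, 10]}
nodes, edges = build_link_network(follow_lists)
```

Storing edges as a set of pairs keeps the sketch independent of any graph library; a follow relation that appears twice in the input is recorded only once.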
The seed followee membership calculation module then uses a probability model to calculate, for each user followed by the initial seed users, the membership of that user in each domain.
Specifically, let f denote any user followed by an initial seed user, let i denote the i-th domain, i = 1, 2, ..., N, where N is the number of domains, and let S_i denote the set of initial seed users of the i-th domain. M(f, S_i) is the membership of followed user f in the i-th domain; the larger M(f, S_i) is, the more likely f belongs to the i-th domain. Suppose followed user f is followed by at least one initial seed user in S_i, and let n(f, S_i) denote the number of times f is followed by users in the set S_i. The membership M(f, S_i) of followed user f is then calculated with the following formula:
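The formula itself is not present in this text. One form consistent with the definitions above, offered here only as an assumption and not as the patent's own expression, normalizes the follow count for domain i by the follow counts over all N domains:

```latex
M(f, S_i) = \frac{n(f, S_i)}{\sum_{j=1}^{N} n(f, S_j)}
```

Under this assumed form, M(f, S_i) lies between 0 and 1 and the memberships of f over all domains sum to 1, which fits the description of a probability model. The denominator is nonzero because f is followed by at least one seed user.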
Step 4: For each domain, the per-domain user ranking module sorts the users followed by all initial seed users in descending order of their membership in that domain.
Step 5: The expanded domain prototype user set acquisition module expands the number of users in each domain and builds the new, expanded domain prototype user sets.
The concrete method is: for the i-th domain, choose from the users followed by the domain's initial seed users the top K users with the highest membership in that domain, and merge these K users with the domain's initial seed users to form the expanded domain prototype user set P_i of the i-th domain, i = 1, 2, ..., N. K is the number of most representative users needed for each domain and can take different values depending on the situation.
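Steps 4 and 5 for a single domain can be sketched as below. The function name `expand_prototype_set` and the membership values are hypothetical; the sketch assumes the memberships for the domain have already been computed and breaks ties by input order, which the text does not specify.

```python
def expand_prototype_set(seeds, memberships, k):
    """Merge a domain's seed users with its K highest-membership followed users.

    seeds: set of initial seed user ids of the domain (S_i).
    memberships: dict mapping followed user id -> membership in this domain.
    k: number of representative users to add (K).
    """
    # Step 4: descending sort by membership (Python's sort is stable, so
    # users with equal membership keep their input order).
    ranked = sorted(memberships, key=lambda user: memberships[user], reverse=True)
    # Step 5: the top K followed users plus the seeds form P_i.
    return set(seeds) | set(ranked[:k])


# Hypothetical membership values for one domain.
seeds = {1}
memberships = {4: 1.0, 5: 1.0, 6: 0.5, 7: 0.0}
prototype_set = expand_prototype_set(seeds, memberships, k=2)
print(sorted(prototype_set))  # [1, 4, 5]
```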
Step 6: The follow list acquisition module for the user to be classified obtains that user's followed-user list; its operation is the same as in step 2.
Step 7: For each user to be classified, the membership calculation module calculates the user's membership in each domain, based on the number of links between the user to be classified and the prototype users of each domain.
Specifically, given a user to be classified u, the membership of u in the i-th domain is calculated with the following formula:
where u denotes the user to be classified, N is the number of domains, i denotes the i-th domain, P_i denotes the expanded domain prototype user set of the i-th domain, and n(u, P_i) denotes the number of users followed by u that appear in the set P_i.
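A minimal sketch of step 7 follows. The count n(u, P_i) is taken directly from the definition above; the normalization by the sum of counts over all domains is an assumption, since the formula is not reproduced in this text. The function name, the domain names, and the handling of a user who follows no prototype user are likewise hypothetical.

```python
def classify_memberships(followed, prototype_sets):
    """Return {domain: membership} for one user to be classified.

    followed: set of user ids that user u follows.
    prototype_sets: dict mapping domain name -> expanded prototype set P_i.
    """
    # n(u, P_i): how many of u's followed users appear in P_i.
    counts = {d: len(followed & p) for d, p in prototype_sets.items()}
    total = sum(counts.values())
    if total == 0:
        # u follows no prototype user at all (assumed handling: all zeros).
        return {d: 0.0 for d in counts}
    # Assumed normalization: n(u, P_i) divided by the sum over all domains.
    return {d: c / total for d, c in counts.items()}


prototype_sets = {"tech": {1, 4, 5}, "finance": {2, 6, 7}, "media": {3, 9, 10}}
m = classify_memberships({4, 5, 7, 42}, prototype_sets)
```

In this example u follows two prototype users of "tech" and one of "finance", so the assumed memberships are 2/3, 1/3, and 0.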
Step 8: For each user to be classified, the per-user domain ranking module sorts all N domains in descending order of the user's membership.
Step 9: The top-A domain labeling module selects the A domains in which the user to be classified has the highest membership as that user's interest domain labels.
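Steps 8 and 9 amount to a descending sort followed by taking the first A entries. The sketch below assumes the memberships of one user are available as a dictionary; the function name `top_a_labels` and the values are hypothetical.

```python
def top_a_labels(memberships, a):
    """Return the A domains with the highest membership, best first."""
    ranked = sorted(memberships.items(), key=lambda item: item[1], reverse=True)
    return [domain for domain, _ in ranked[:a]]


labels = top_a_labels({"tech": 0.6, "finance": 0.3, "media": 0.1}, a=2)
print(labels)  # ['tech', 'finance']
```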
The advantages and positive effects of the invention are:
(1) The link-network-based user domain identification method does not consider the user's text content but the structure of the user's social network, so it can overcome the poor classification accuracy caused by short text, relevance issues, and semantic variability.
(2) The method is highly extensible: it can be applied not only to microblog platforms but also to other network platforms where follow or friend relations exist between users.
(3) The method uses follow relations to build the user network and expands the number of seed users in each domain. This greatly increases the probability that seed users are followed by a user to be classified, and the amount of expansion can be adjusted to the data volume of the concrete application scenario, which keeps the method flexible.
(4) The method overcomes the shortcomings of short text and is particularly suitable for user modeling, personalized information search and recommendation, and similar fields.
(5) The link-network-based user domain identification device has a clear structure: the identification process is divided into three main modules with low coupling between them, so changing the concrete method inside one module does not affect how the other modules realize their functions.
Brief description of the drawings
Fig. 1 is the overall structure diagram of the link-network-based user domain identification device of the invention;
Fig. 2 is the structure diagram of each module of the link-network-based user domain identification device of the invention;
Fig. 3 is the flow chart of the link-network-based user domain identification method of the invention.
Detailed description of the embodiments
The invention is further described below with reference to the drawings and specific embodiments.
The invention proposes a link-network-based user domain identification method and device. Starting from a small number of seed users per domain, the method obtains the link network of the seed users and, combined with a probability model, uses this link network to automatically extract the prototype users of each domain. Then, based on the prototype users of each domain and again combined with a probability model, it determines the domains each user is most likely to be interested in. The technique rests on a common-sense assumption: on a social media platform such as Sina Weibo, a user with a particular interest in a domain will follow many users who are also interested in that domain, especially well-known, more influential users; and if the user is an important figure in a domain, such as science and technology, he or she will in turn be followed by users interested in that domain. Starting from this basic assumption, the invention makes full use of the link network specific to social platforms to build a user domain classification model. Using the link network overcomes, to some extent, the processing difficulties caused by the brevity and variability of microblog text and its many modal particles. The invention forms a network from the follow relations between users, generates the seed user set of each domain from a number of labeled users, calculates the domain memberships of other users from the seed user sets, and determines the domains a user is interested in from the membership values. Using the follow network structure not only overcomes the problems of short text and heavy semantic noise faced when relying on microblog text alone, but also allows the accuracy of domain identification to be improved by adjusting the size of each domain's seed set to increase the number of users involved in the calculation.
The link-network-based user domain identification device provided by the invention, as shown in Fig. 1, comprises three modules: data collection and preprocessing module 1, domain prototype user set construction module 2, and user domain calculation module 3.
Data collection and preprocessing module 1 collects the initial seed users and crawls their followed-user lists. Domain prototype user set construction module 2 uses the users followed by the initial seed users of each domain to build the prototype users of each domain. User domain calculation module 3 calculates the membership of a user to be classified in each domain, then sorts and selects domains by membership.
As shown in Fig. 2, data collection and preprocessing module 1 comprises manual sample collection module 101 and seed user follow list acquisition module 102 (crawler or API request). Manual sample collection module 101 obtains and stores the initial seed user ids of each domain and passes them to seed user follow list acquisition module 102, which uses them to obtain the users followed by the initial seed users of each domain.
As shown in Fig. 2, domain prototype user set construction module 2 comprises seed followee membership calculation module 201, per-domain user ranking module 202, and expanded domain prototype user set acquisition module 203.
Seed followee membership calculation module 201 takes the users followed by the initial seed users of each domain and, for each followed user, calculates that user's membership in every domain and passes the memberships to per-domain user ranking module 202. Per-domain user ranking module 202 sorts the followed users in descending order of membership within each domain and passes the sorted memberships to expanded domain prototype user set acquisition module 203. Expanded domain prototype user set acquisition module 203 obtains and stores the expanded domain prototype user sets: for each domain, it selects the top K users with the highest membership in that domain and merges them with the domain's initial seed users to form the domain's prototype user set, that is, the expanded domain prototype user set, where K is a positive integer.
Seed followee membership calculation module 201 calculates the membership M(f, S_i) of a followed user in the i-th domain according to the following formula:
where f denotes a user, specifically a user followed by some seed user; i denotes the i-th domain, i = 1, 2, ..., N; N is the number of domains; S_i denotes the set of initial seed users of the i-th domain; and n(f, S_i) denotes the number of times followed user f is followed by users in the set S_i.
As shown in Fig. 2, user domain calculation module 3 comprises follow list acquisition module 301 for the user to be classified (crawler or API request), membership calculation module 302 for the user to be classified, per-user domain ranking module 303, and top-A domain labeling module 304. Module 301 obtains the users followed by the user to be classified and passes them to module 302. Module 302 calculates the membership of the user to be classified in each domain from the expanded domain prototype user sets and the users followed by the user to be classified, and passes these memberships to module 303. Module 303 sorts the domains in descending order of the user's membership and passes the sorted memberships to module 304. Module 304 takes the top A domains as the interest domain labels of the user to be classified and stores the identification result. A is a positive integer set according to actual usage.
For each user to be classified u, membership calculation module 302 first calculates the number n(u, P_i) of users followed by u that appear in each expanded domain prototype user set P_i, i = 1, 2, ..., N, and then calculates the membership M(u, P_i) of u in the i-th domain:
where u denotes the user to be classified, N is the number of domains, i denotes the i-th domain, P_i denotes the expanded domain prototype user set of the i-th domain, and n(u, P_i) denotes the number of users followed by u that appear in the set P_i.
Data collection and preprocessing module 1, domain prototype user set construction module 2, and user domain calculation module 3 correspond to different stages of the user domain identification process. Module 1 provides the necessary data for module 2, and module 2 provides the necessary data for module 3. Specifically, the stored output of module 1, "the followed-user list of each seed user", is the input to seed followee membership calculation module 201 in module 2 and supports the functions of module 2. Module 2 determines the prototype users of each domain from the follow relation data supplied by module 1; its stored output, "the expanded domain prototype user sets", is the input to membership calculation module 302 in module 3 and supports the functions of module 3.
User domain calculation module 3 uses the data provided by data collection and preprocessing module 1 and domain prototype user set construction module 2 to finally realize user domain identification.
The link-network-based user domain identification method provided by the invention is described below with reference to Fig. 3 and an embodiment.
Step 1: Manual sample collection module 101 manually collects the initial seed users of each domain on a network platform. The collected information is the initial seed user id and the user's domain label for each domain.
In the embodiment of the invention, manual sample collection module 101 in data collection and preprocessing module 1 collects the initial seed users of different domains from the domain celebrity rankings maintained by the Sina Weibo platform. The domains are science and technology, finance and economics, sports, culture, fashion, media, education, and entertainment. The collected information is the initial seed user id and the user's domain label for each domain.
For example, to collect the initial seed users of the sports domain: open the Sina Weibo celebrity ranking page http://data.weibo.com/top/influence/famous and select the sports label from the domain labels. The page shows the top 100 users by influence in that domain, and these 100 users can serve as the seed users of the domain. Visit the personal homepage of each of these 100 users in turn and click the "follow" link under the profile picture; the Sina Weibo id of the seed user can be found in the address bar of the page that opens. For example, if the address bar shows
http://weibo.com/1197161814/follow?from=profile&wvr=5&loc=tagfollow, then "1197161814" is the seed user id to collect. All collected seed users are saved to a text file; the main fields saved are the user id and the user's domain label, for example: 1197161814, finance and economics.
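The manual id lookup described above could be automated along the following lines. This is only a sketch: the function name `seed_id_from_follow_url` is hypothetical, and it assumes the address has the form quoted above, with the numeric user id as the first path segment.

```python
from urllib.parse import urlparse


def seed_id_from_follow_url(url):
    """Return the numeric user id from a follow-page URL, or None if absent."""
    # Keep only non-empty path segments, e.g. ["1197161814", "follow"].
    path_parts = [part for part in urlparse(url).path.split("/") if part]
    if path_parts and path_parts[0].isdigit():
        return path_parts[0]
    return None


url = "http://weibo.com/1197161814/follow?from=profile&wvr=5&loc=tagfollow"
print(seed_id_from_follow_url(url))  # 1197161814
```

The id is kept as a string so that it can be written to the text file unchanged next to the domain label.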
In the embodiment of the invention, 100 seed user ids were collected manually for each domain.
Step 2: Collect the followed-user lists of the initial seed users of each domain, using seed user follow list acquisition module 102, which operates as follows:
Write a crawler that, using the initial seed user ids of each domain collected by manual sample collection module 101, visits the follow list page of each initial seed user and obtains the ids of all users that the seed user follows; or use the follow-list interface provided by the Sina API to request, by initial seed user id, the ids of all users that the seed user follows. In Fig. 3, the arrow from the microblog platform directly to seed user follow list acquisition module 102 represents the crawler obtaining data from the microblog platform.
The embodiment of the invention uses the Sina Weibo API to collect the followed-user lists of the initial seed users of each domain, specifically the friendships/friends/ids interface among the follow-list interfaces that the Sina Weibo API provides. In the program, given the user id as the parameter uid, an HTTP GET request to
https://api.weibo.com/2/friendships/friends.json returns the user's follow list. The request requires permission to use the follow interfaces of the Sina API to be obtained in advance; how to obtain it is described in the Sina Weibo developer documentation. The list of followed user ids returned for a user contains all users that the user follows, and each follow relation corresponds to one directed edge in the link network from the user's node to the followed user's node.
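Constructing the GET request can be sketched as follows. Only the endpoint and the uid parameter come from the text; the parameter name `access_token` is an assumption standing in for the permission that must be obtained in advance, and the function name is hypothetical. The sketch only builds the URL and sends nothing over the network.

```python
from urllib.parse import urlencode

# Endpoint as quoted in the text.
API_URL = "https://api.weibo.com/2/friendships/friends.json"


def build_follow_list_request(uid, access_token):
    """Return the GET URL for one seed user's follow list.

    access_token is an assumed parameter name for the authorization that
    the text says must be obtained beforehand.
    """
    query = urlencode({"uid": uid, "access_token": access_token})
    return f"{API_URL}?{query}"


request_url = build_follow_list_request(1197161814, "YOUR_TOKEN")
print(request_url)
```

The actual parameter names, paging, and rate limits should be checked against the Sina Weibo developer documentation before use.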
Step 3: Build the link network of all seed users. Each user is a node, where the users comprise the seed users and the users followed by the initial seed users. The follow relation between two users is represented by a directed edge whose arrow points to the followed user. Seed followee membership calculation module 201 then uses a probability model to calculate, for each user followed by the initial seed users, the membership of that user in each domain.
Specifically, let f denote a user followed by an initial seed user and let i denote the i-th domain, i = 1, 2, ..., N, where N is the number of domains; in the embodiment of the invention N is 8. Let S_i denote the set of initial seed users of the i-th domain. M(f, S_i) is the membership of followed user f in the i-th domain; the larger M(f, S_i) is, the more likely f belongs to the i-th domain. Suppose followed user f is followed by at least one initial seed user in the seed user set S_i of the i-th domain, and let n(f, S_i) denote the number of times f is followed by users in the set S_i. The membership of followed user f in the i-th domain is then calculated with the following formula:
The data in Table 1 are used below to illustrate how the domain membership of the initial seed users' followed users is computed.
Table 1: Initial seed users of each domain and their followed users
Domain | Seed user | Followed users |
Science and technology | 1 | 4,5,6 |
Finance and economics | 2 | 6,7,8 |
Media | 3 | 8,9,10 |
The domain memberships of all followed users 4-10 of the seed users are given in Table 2.
Table 2: Domain memberships of the seed users' followed users
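The memberships of users 4-10 can be recomputed from Table 1. The sketch below assumes that the membership is the normalized follow count, M(f, S_i) = n(f, S_i) / Σ_j n(f, S_j); this form is an assumption that matches the symbol definitions above and the ranking in Table 3. The function names are hypothetical.

```python
from collections import Counter

# Table 1: domain -> {seed user: users followed by that seed user}
SEED_FOLLOWS = {
    "Science and technology": {1: [4, 5, 6]},
    "Finance and economics": {2: [6, 7, 8]},
    "Media": {3: [8, 9, 10]},
}


def follow_counts(seed_follows):
    """n(f, S_i): how many seed users of domain i follow user f."""
    return {
        domain: Counter(f for followed in seeds.values() for f in followed)
        for domain, seeds in seed_follows.items()
    }


def seed_followee_membership(seed_follows):
    """Assumed form: M(f, S_i) = n(f, S_i) / sum_j n(f, S_j)."""
    counts = follow_counts(seed_follows)
    users = sorted({f for c in counts.values() for f in c})
    membership = {}
    for f in users:
        # total >= 1 because f is followed by at least one seed user
        total = sum(c[f] for c in counts.values())
        membership[f] = {d: counts[d][f] / total for d in counts}
    return membership


membership = seed_followee_membership(SEED_FOLLOWS)
for f, row in membership.items():
    print(f, row)
```

Under this assumption, user 6 is followed once from the science and technology seeds and once from the finance and economics seeds, so its membership is 0.5 in each of those two domains and 0 in media.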
Step 4: For each domain i, i = 1, 2, ..., N, module 202, which sorts users by membership within each domain, sorts the domain membership values of all users followed by the initial seed users in descending order.
Table 3: Followed users of the seed users, ranked by membership in each domain
Domain | Users sorted by domain membership, from high to low |
Science and technology | 4,5,6,7,8,9,10 |
Finance and economics | 7,6,8,4,5,9,10 |
Media | 9,10,8,4,5,6,7 |
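The ordering in Table 3 can be reproduced with a short sort. This is only a sketch: the membership values are assumed to be the normalized follow counts derived from Table 1, and breaking ties by ascending user ID is an assumption that happens to match Table 3, since the text does not state a tie rule. `rank_users` is a hypothetical helper.

```python
DOMAINS = ["Science and technology", "Finance and economics", "Media"]

# Assumed memberships of followed users 4-10, one value per domain in
# DOMAINS order (normalized follow counts derived from Table 1).
MEMBERSHIP = {
    4: [1.0, 0.0, 0.0],
    5: [1.0, 0.0, 0.0],
    6: [0.5, 0.5, 0.0],
    7: [0.0, 1.0, 0.0],
    8: [0.0, 0.5, 0.5],
    9: [0.0, 0.0, 1.0],
    10: [0.0, 0.0, 1.0],
}


def rank_users(membership, domain_index):
    """Sort users by descending membership; ties go to the smaller ID."""
    return sorted(membership, key=lambda f: (-membership[f][domain_index], f))


ranking = {d: rank_users(MEMBERSHIP, i) for i, d in enumerate(DOMAINS)}
for domain, users in ranking.items():
    print(domain, users)
```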
Step 5: Use module 203, which obtains the expanded domain prototype user set, to enlarge the number of users in each domain and build a new, expanded domain prototype user set. The expanded domain prototype user set of a domain is the set formed by the domain's initial seed users together with the followed users that have the highest membership in that domain.
Specifically, for the i-th domain, the K users with the highest membership in this domain are chosen from the users followed by the domain's initial seed users. These K users are merged with the domain's initial seed users to form the expanded domain prototype user set P_i of the i-th domain, i = 1, 2, ..., N. K is a positive integer giving the number of most representative users required for each domain; it can take different values depending on the situation, and K is 2 in the embodiment of the present invention.
Table 4: Expanded domain prototype user set of each domain
Domain | Domain prototype users |
Science and technology | 1,4,5 |
Finance and economics | 2,7,6 |
Media | 3,9,10 |
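The expansion from Table 3 to Table 4 with K = 2 can be sketched as below. The helper name `expand_prototypes` is hypothetical, and skipping users that are already seed users is an added assumption to avoid duplicates; it does not change the result on this example.

```python
K = 2

# Seed users per domain (Table 1).
SEEDS = {
    "Science and technology": [1],
    "Finance and economics": [2],
    "Media": [3],
}

# Followed users ranked by membership in each domain (Table 3).
RANKED = {
    "Science and technology": [4, 5, 6, 7, 8, 9, 10],
    "Finance and economics": [7, 6, 8, 4, 5, 9, 10],
    "Media": [9, 10, 8, 4, 5, 6, 7],
}


def expand_prototypes(seeds, ranked, k):
    """Merge each domain's seed users with its top-k ranked followed users."""
    prototypes = {}
    for domain, seed_users in seeds.items():
        # Assumption: a user who is already a seed user is not counted twice.
        top_k = [u for u in ranked[domain] if u not in seed_users][:k]
        prototypes[domain] = list(seed_users) + top_k
    return prototypes


prototypes = expand_prototypes(SEEDS, RANKED, K)
for domain, users in prototypes.items():
    print(domain, users)
```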
Step 6: Obtain the followed-user information of the users to be classified. In the user domain calculation module 3, module 301 (run crawler/request API to obtain the follow lists of the users to be classified) collects the followed-user list of each user to be classified, following the same procedure as in Step 2. A user to be classified is the target of domain classification, and the users that he or she follows are the basis for domain identification.
Table 5: Users to be classified and their followed users
User to be classified | Followed users |
11 | 2,6,14 |
12 | 3,10,15 |
13 | 1,4,5,6 |
Step 7: For each user to be classified, use the expanded domain prototype user sets of all domains obtained in Step 5 to compute the membership of this user with respect to each domain.
Module 302, which computes the domain membership of users to be classified, performs this computation as follows. Given a user u to be classified, let n(u, P_i) denote the number of users followed by u that appear in the domain prototype user set P_i, i = 1, 2, ..., 8. The membership of u in the i-th domain is then computed as:
$$M(u, P_i) = \frac{n(u, P_i)}{\sum_{j=1}^{N} n(u, P_j)}$$
where u is the user to be classified, N is the number of domains, i denotes the i-th domain, P_i is the expanded domain prototype user set of the i-th domain, and n(u, P_i) is the number of users followed by u that appear in P_i.
Table 6: Computed membership of each user to be classified in each domain
User to be classified | Science and technology | Finance and economics | Media |
11 | 0 = 0/(0+2+0) | 1 = 2/(0+2+0) | 0 = 0/(0+2+0) |
12 | 0 = 0/(0+0+2) | 0 = 0/(0+0+2) | 1 = 2/(0+0+2) |
13 | 0.75 = 3/(3+1+0) | 0.25 = 1/(3+1+0) | 0 = 0/(3+1+0) |
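The values in Table 6 follow from Tables 4 and 5 by counting, for each user to be classified, how many followed users fall into each prototype set and normalizing by the total. A minimal sketch, where `user_membership` is a hypothetical name and returning all zeros for a user who follows no prototype user is an assumption (the text does not cover that case):

```python
# Expanded domain prototype user sets (Table 4).
PROTOTYPES = {
    "Science and technology": {1, 4, 5},
    "Finance and economics": {2, 7, 6},
    "Media": {3, 9, 10},
}

# Users to be classified and the users they follow (Table 5).
FOLLOWS = {
    11: [2, 6, 14],
    12: [3, 10, 15],
    13: [1, 4, 5, 6],
}


def user_membership(followed, prototypes):
    """M(u, P_i) = n(u, P_i) / sum_j n(u, P_j)."""
    counts = {d: len(set(followed) & p) for d, p in prototypes.items()}
    total = sum(counts.values())
    if total == 0:
        # Assumption: no overlap with any prototype set gives membership 0.
        return {d: 0.0 for d in counts}
    return {d: c / total for d, c in counts.items()}


result = {u: user_membership(f, PROTOTYPES) for u, f in FOLLOWS.items()}
for user, row in result.items():
    print(user, row)
```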
Step 8: For each user to be classified, module 303, which sorts domains by membership for each user, sorts all N domains in descending order of this user's domain membership.
That is, for each user u to be classified, module 303 sorts all 8 domains in descending order of the membership of u in each domain.
Table 7: Domains ranked by membership for each user to be classified
User | Domain ranking |
11 | Finance and economics, science and technology, media |
12 | Media, finance and economics, science and technology |
13 | Science and technology, finance and economics, media |
Step 9: Select the A domains with the highest membership as the interest domain labels of the user u to be classified.
Module 304, which takes the top A domains as user domain labels, assigns the A domains in which the user's membership is highest as the interest domain labels of u. In this embodiment, A is 1.
Table 8: Interest domain identification result for each user to be classified
User | Interest domain |
11 | Finance and economics |
12 | Media |
13 | Science and technology |
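Steps 8 and 9 amount to a descending sort followed by taking the first A entries. The sketch below uses the memberships from Table 6 and A = 1; `top_domains` is a hypothetical helper, and because the text gives no tie rule, domains with equal membership simply keep their input order here, which may differ from the tie order shown in Table 7.

```python
A = 1

# Memberships of the users to be classified (Table 6).
MEMBERSHIP = {
    11: {"Science and technology": 0.0, "Finance and economics": 1.0, "Media": 0.0},
    12: {"Science and technology": 0.0, "Finance and economics": 0.0, "Media": 1.0},
    13: {"Science and technology": 0.75, "Finance and economics": 0.25, "Media": 0.0},
}


def top_domains(row, a):
    """Return the a domains with the highest membership.

    sorted() is stable, so tied domains keep their input order
    (an assumption; the source does not define tie-breaking).
    """
    ranked = sorted(row, key=lambda d: -row[d])
    return ranked[:a]


labels = {user: top_domains(row, A) for user, row in MEMBERSHIP.items()}
for user, domains in labels.items():
    print(user, domains)
```

A larger A would attach several interest domain labels to each user.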
The present invention proposes a user domain identification method based on a link network, and a corresponding device. The method makes full use of the link information in users' follow networks and, based on a probability model, substantially expands the small initial seed user sets. The method does not consider the text content of users but focuses on the structure of their social network, which to some extent overcomes the poor classification accuracy caused by short texts, weak correlation, and semantic variability. The method is also highly extensible. First, a method based on a link network can be applied not only to microblog platforms but also to other platforms on which users have follow or friend relations, such as Renren, Kaixin, and Youku; the main difference between platforms lies in the source of the seed users and the way they are collected. Second, the number of domains N can be ten or even more; the value of N depends mainly on the specific application and should make the collection of seed users as convenient as possible. In addition, the method is easy to extend to a dynamic forwarding network, or to a network formed by merging a dynamic forwarding network with a follow link network. In such a network each node still represents a user, but an edge between nodes represents either a follow relation or a forwarding relation.
Claims (9)
1. A user domain identification device based on a link network, characterized in that it comprises three modules: a data collection and preprocessing module, a domain prototype user set construction module, and a user domain calculation module;
wherein the data collection and preprocessing module collects the initial seed users and crawls the followed-user lists of the initial seed users; the domain prototype user set construction module uses the users followed by the initial seed users of each domain to build prototype users for each domain; and the user domain calculation module computes, sorts, and selects the memberships of a user to be classified in each domain;
the data collection and preprocessing module comprises a manual sample collection module and a module that runs a crawler or requests an API (Application Programming Interface) to obtain the seed users' follow lists;
the manual sample collection module obtains and stores the initial seed user IDs of each domain and passes them to the module that runs a crawler or requests an API to obtain the seed users' follow lists; that module obtains, from the initial seed user IDs of each domain, the users followed by the initial seed users of each domain;
the domain prototype user set construction module comprises: a module for computing the domain membership of the seed users' followed users, a module for sorting users by membership within each domain, and a module for obtaining the expanded domain prototype user set;
the module for computing the domain membership of the seed users' followed users computes, for each user followed by the obtained initial seed users of each domain, the membership of this followed user in each domain, and passes the memberships to the module for sorting users by membership within each domain; the module for sorting users by membership within each domain sorts the domain memberships of the followed users in descending order and passes the sorted memberships to the module for obtaining the expanded domain prototype user set; the module for obtaining the expanded domain prototype user set, for each domain, selects the K users with the highest membership in this domain and merges them with the domain's initial seed users to form the prototype user set of this domain, namely the expanded domain prototype user set of this domain; wherein K is a positive integer;
the user domain calculation module comprises a module that runs a crawler or requests an API to obtain the follow lists of users to be classified, a module for computing the domain membership of users to be classified, a module for sorting domains by membership for each user, and a module for taking the top A domains as user domain labels; wherein A is a positive integer;
the module that runs a crawler or requests an API to obtain the follow lists of users to be classified obtains the users followed by a user to be classified and passes them to the module for computing the domain membership of users to be classified; the module for computing the domain membership of users to be classified computes the membership of the user to be classified in each domain from the expanded domain prototype user sets and the users followed by the user to be classified, and passes these memberships to the module for sorting domains by membership for each user; the module for sorting domains by membership for each user sorts the domain memberships of this user in descending order and passes the sorted memberships to the module for taking the top A domains as user domain labels, which finally yields the interest domain labels of the user to be classified.
2. The user domain identification device based on a link network according to claim 1, characterized in that the module for computing the domain membership of the seed users' followed users computes the membership M(f, S_i) of a user followed by the initial seed users in the i-th domain according to the following formula:
$$M(f, S_i) = \frac{n(f, S_i)}{\sum_{j=1}^{N} n(f, S_j)}$$
wherein f denotes a user followed by an initial seed user, N is the number of domains, i denotes the i-th domain, S_i denotes the initial seed user set of the i-th domain, and n(f, S_i) denotes the number of times the followed user f is followed by users in the set S_i.
3. The user domain identification device based on a link network according to claim 1, characterized in that the module for computing the domain membership of users to be classified computes the membership M(u, P_i) of a user u to be classified in the i-th domain according to the following formula:
$$M(u, P_i) = \frac{n(u, P_i)}{\sum_{j=1}^{N} n(u, P_j)}$$
wherein u denotes the user to be classified, N is the number of domains, i denotes the i-th domain, P_i denotes the expanded domain prototype user set of the i-th domain, and n(u, P_i) denotes the number of users followed by u that appear in the user set P_i.
4. A user domain identification method applying the user domain identification device based on a link network according to claim 1, characterized in that it comprises the following steps:
Step 1: the manual sample collection module collects the initial seed users of each domain on the network platform; the information collected is the initial seed user IDs and the user domain labels of each domain;
Step 2: the module that runs a crawler or requests an API to obtain the seed users' follow lists collects the followed-user lists of the initial seed users of each domain;
Step 3: build the link network of the seed users of each domain, specifically: each user is a node, where the users comprise the initial seed users and the users followed by the initial seed users, and the follow relation between any two users is represented by a directed edge whose arrow points to the followed user;
the module for computing the domain membership of the seed users' followed users computes, based on a probability model, the membership of each user followed by the initial seed users of each domain with respect to each domain;
Step 4: for each domain, the module for sorting users by membership within each domain sorts the domain membership values of all users followed by the initial seed users in descending order;
Step 5: the module for obtaining the expanded domain prototype user set enlarges the number of users in each domain and builds a new, expanded domain prototype user set;
Step 6: the module that runs a crawler or requests an API to obtain the follow lists of users to be classified obtains the followed-user list of each user to be classified;
Step 7: for each user to be classified, the module for computing the domain membership of users to be classified computes this user's domain membership, specifically: the membership of this user with respect to each domain is computed from the number of links established between this user and the prototype users of each domain;
Step 8: for each user to be classified, the module for sorting domains by membership for each user sorts all N domains in descending order of this user's domain membership;
Step 9: the module for taking the top A domains as user domain labels selects the A domains in which the membership of the user to be classified is highest as the interest domain labels of this user.
5. The user domain identification method according to claim 4, characterized in that the operating procedure of the module that runs a crawler or requests an API to obtain the seed users' follow lists in step 2 is: write a crawler program that, using the initial seed user IDs of each domain collected by the manual sample collection module, visits each seed user's original follow list and obtains the IDs of all followed users; or use the follow-retrieval interface provided by the API to request, for each initial seed user ID of each domain, the IDs of all users that the user follows.
6. The user domain identification method according to claim 4, characterized in that, in building the link network of the seed users of each domain described in step 3, specifically: f denotes a user, namely a user followed by some initial seed user; i denotes the i-th domain, i = 1, 2, ..., N, where N is the number of domains; S_i denotes the initial seed user set of the i-th domain; M(f, S_i) denotes the membership of the followed user f in the i-th domain; and n(f, S_i) denotes the number of times the followed user f is followed by users in the set S_i; the membership M(f, S_i) of the followed user f is then computed with the following formula:
$$M(f, S_i) = \frac{n(f, S_i)}{\sum_{j=1}^{N} n(f, S_j)}$$
7. The user domain identification method according to claim 4, characterized in that the new, expanded domain prototype user set described in step 5 is built as follows: for the i-th domain, the K users with the highest membership in this domain are chosen from the users followed by this domain's initial seed users, and these K users are merged with the domain's initial seed users to form the expanded domain prototype user set P_i of the i-th domain, i = 1, ..., N, where K is the number of most representative users required for each domain.
8. The user domain identification method according to claim 4, characterized in that the operating procedure of the module that runs a crawler or requests an API to obtain the follow lists of users to be classified in step 6 is: write a crawler program that, using the IDs of the users to be classified collected by the manual sample collection module, visits each such user's original follow list and obtains the IDs of all followed users; or use the follow-retrieval interface provided by the API to request, for each ID of a user to be classified, the IDs of all users that the user follows.
9. The user domain identification method according to claim 4, characterized in that the domain membership of a user to be classified described in step 7 is computed as follows: given a user u to be classified, the membership of u in the i-th domain is computed with the following formula:
$$M(u, P_i) = \frac{n(u, P_i)}{\sum_{j=1}^{N} n(u, P_j)}$$
wherein N is the number of domains, i denotes the i-th domain, P_i denotes the expanded domain prototype user set of the i-th domain, and n(u, P_i) denotes the number of users followed by u that appear in the user set P_i.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310705515.7A CN103761246B (en) | 2013-12-19 | 2013-12-19 | Link network based user domain identifying method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310705515.7A CN103761246B (en) | 2013-12-19 | 2013-12-19 | Link network based user domain identifying method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103761246A true CN103761246A (en) | 2014-04-30 |
CN103761246B CN103761246B (en) | 2017-02-08 |
Family
ID=50528485
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310705515.7A Active CN103761246B (en) | 2013-12-19 | 2013-12-19 | Link network based user domain identifying method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103761246B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104573034A (en) * | 2015-01-15 | 2015-04-29 | 中国联合网络通信集团有限公司 | CDR call ticket based user group division method and system |
CN104572932A (en) * | 2014-12-29 | 2015-04-29 | 微梦创科网络科技(中国)有限公司 | Method and device for determining interest label |
CN106021257A (en) * | 2015-12-31 | 2016-10-12 | 广州华多网络科技有限公司 | Method, device, and system for crawler to capture data supporting online programming |
CN106611350A (en) * | 2015-10-26 | 2017-05-03 | 阿里巴巴集团控股有限公司 | Method and device for mining potential user source |
CN116842145A (en) * | 2023-04-20 | 2023-10-03 | 海信集团控股股份有限公司 | Domain identification method and device based on city question-answering system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101655868B (en) * | 2009-09-03 | 2012-08-22 | 中国人民解放军信息工程大学 | Network data mining method, network data transmitting method and equipment |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104572932A (en) * | 2014-12-29 | 2015-04-29 | 微梦创科网络科技(中国)有限公司 | Method and device for determining interest label |
CN104572932B (en) * | 2014-12-29 | 2017-11-24 | 微梦创科网络科技(中国)有限公司 | A kind of determination method and device of interest tags |
CN104573034A (en) * | 2015-01-15 | 2015-04-29 | 中国联合网络通信集团有限公司 | CDR call ticket based user group division method and system |
CN104573034B (en) * | 2015-01-15 | 2018-03-23 | 中国联合网络通信集团有限公司 | User group's division method and system based on CDR tickets |
CN106611350A (en) * | 2015-10-26 | 2017-05-03 | 阿里巴巴集团控股有限公司 | Method and device for mining potential user source |
CN106611350B (en) * | 2015-10-26 | 2020-06-05 | 阿里巴巴集团控股有限公司 | Method and device for mining potential user source |
CN106021257A (en) * | 2015-12-31 | 2016-10-12 | 广州华多网络科技有限公司 | Method, device, and system for crawler to capture data supporting online programming |
CN106021257B (en) * | 2015-12-31 | 2019-10-18 | 广州华多网络科技有限公司 | A kind of crawler capturing data method, apparatus and system for supporting online programming |
CN116842145A (en) * | 2023-04-20 | 2023-10-03 | 海信集团控股股份有限公司 | Domain identification method and device based on city question-answering system |
CN116842145B (en) * | 2023-04-20 | 2024-02-27 | 海信集团控股股份有限公司 | Domain identification method and device based on city question-answering system |
Also Published As
Publication number | Publication date |
---|---|
CN103761246B (en) | 2017-02-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102279851B (en) | Intelligent navigation method, device and system | |
CN104933622A (en) | Microblog popularity degree prediction method based on user and microblog theme and microblog popularity degree prediction system based on user and microblog theme | |
CN105335491B (en) | Behavior is clicked come to the method and system of user's Recommended Books based on user | |
CN105718579A (en) | Information push method based on internet-surfing log mining and user activity recognition | |
CN104484431B (en) | A kind of multi-source Personalize News webpage recommending method based on domain body | |
CN104077417A (en) | Figure tag recommendation method and system in social network | |
CN106250513A (en) | A kind of event personalization sorting technique based on event modeling and system | |
CN105426514A (en) | Personalized mobile APP recommendation method | |
CN103324666A (en) | Topic tracing method and device based on micro-blog data | |
CN103218400B (en) | Based on link and network community user group's division methods of content of text | |
CN101593200A (en) | Chinese Web page classification method based on the keyword frequency analysis | |
CN104965931A (en) | Big data based public opinion analysis method | |
CN103544188A (en) | Method and device for pushing mobile internet content based on user preference | |
CN104008203A (en) | User interest discovering method with ontology situation blended in | |
CN103226578A (en) | Method for identifying websites and finely classifying web pages in medical field | |
CN105138577A (en) | Big data based event evolution analysis method | |
CN104268230B (en) | A kind of Chinese micro-blog viewpoint detection method based on heterogeneous figure random walk | |
CN101393555A (en) | Rubbish blog detecting method | |
CN110134845A (en) | Project public sentiment monitoring method, device, computer equipment and storage medium | |
CN103761246A (en) | Link network based user domain identifying method and device | |
CN111191099B (en) | User activity type identification method based on social media | |
CN104899229A (en) | Swarm intelligence based behavior clustering system | |
CN103412930A (en) | Method for identifying attributes of internet users | |
CN104933475A (en) | Network forwarding behavior prediction method and apparatus | |
CN102567392A (en) | Control method for interest subject excavation based on time window |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |