CN103761246A - Link network based user domain identifying method and device - Google Patents

Publication number
CN103761246A
Authority
CN
China
Prior art keywords
user
field
membership
sorted
degree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310705515.7A
Other languages
Chinese (zh)
Other versions
CN103761246B (en
Inventor
刘春阳
程工
张旭
庞琳
王卿
吴俊杰
王亚琼
李红
韩小汀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Computer Network and Information Security Management Center
Original Assignee
National Computer Network and Information Security Management Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Computer Network and Information Security Management Center filed Critical National Computer Network and Information Security Management Center
Priority to CN201310705515.7A priority Critical patent/CN103761246B/en
Publication of CN103761246A publication Critical patent/CN103761246A/en
Application granted granted Critical
Publication of CN103761246B publication Critical patent/CN103761246B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a link network based user domain identification method and device, belonging to the fields of data mining and complex networks. The device comprises a data collection and preprocessing module, a domain prototype user set construction module, and a user domain calculation module. The method includes: step one, manually collecting initial seed users; step two, collecting the users followed by the seed users; step three, constructing a link network and calculating the membership of each followed user in each domain; step four, ranking the users by membership; step five, constructing a domain prototype user set for each domain; step six, collecting the users followed by a user to be classified; step seven, calculating the membership of the user to be classified in each domain; step eight, ranking the domains by membership; step nine, adding domain labels. The method and device are applicable to multiple social network platforms, can overcome the drawbacks of short texts, and are particularly suitable for user modeling, personalized information search, recommendation, and similar applications.

Description

A link network based user domain identification method and device
Technical field
The invention belongs to the fields of data mining and complex networks. Specifically, it relates to user domain identification based on a link network and proposes a link network based user domain identification method and device. The identification results can be used in applications such as user group segmentation and network marketing.
Background
With the rapid growth of social media platforms (such as Sina Weibo), more and more users have become accustomed to using such platforms to publicize, share, and search for information and resources. These users come from different domains such as science and technology, finance, sports, and media. Many users pay more attention to news and messages from their own domain and more readily influence the spread of issues or events belonging to that domain. Identifying in advance the domains a user follows and assigning domain labels to each user therefore makes it easier to manage and analyze network users by category. In particular, it plays an important role in delivering accurately targeted information to users, improving the platform's user experience, identifying the key users of each domain, and speeding up the detection of domain events by the relevant organizations.
Traditional methods for identifying the domain of a network user first build a feature word set (domain dictionary) for each domain from sample data, then classify the user's domains of interest by matching the text the user has written against the domain dictionaries. This approach has several problems that are hard to overcome. First, the complexity of Chinese word segmentation makes it hard to guarantee the authority of a domain dictionary. Second, as real-time social networks develop, language use becomes simpler and more varied and new words keep appearing, such as "geili" (awesome), "guofen" (Apple fans), and "shenma" (what), which makes the dictionary harder to keep up to date. Third, on a real-time short-text platform such as a microblog, users can post at any time on every aspect of national affairs, society, daily life, and even household trivia, so it is hard to determine from text alone which domain a user follows most. For these reasons, content-based user domain identification has low accuracy. Methods that choose a classifier and classify on selected attributes face the problem of how to choose the attributes and the classifier, and existing classifiers often perform poorly on multi-class problems.
Summary of the invention
The prior art adapts poorly to microblogging: text-based domain identification struggles to keep the domain dictionary authoritative and up to date, and classification methods often perform poorly on multi-class problems such as domain identification. To address these problems, the invention makes full use of the network structure of social media such as microblogs, applies a link network based technique to identify the domains of a large number of users, and proposes a link network based user domain identification method and device.
The link network based user domain identification device provided by the invention comprises three modules: a data collection and preprocessing module, a domain prototype user set construction module, and a user domain calculation module.
The data collection and preprocessing module collects the initial seed users and crawls the list of users each initial seed user follows. The domain prototype user set construction module uses the users followed by each domain's initial seed users to build the prototype users of each domain. The user domain calculation module calculates the membership of a user to be classified in each domain, ranks the domains, and selects among them.
The data collection and preprocessing module comprises a manual sample collection module and a crawler/API-request module for obtaining seed users' follow lists (API: Application Programming Interface). The manual sample collection module obtains and stores the initial seed user ids of each domain; the seed follow-list module obtains the list of users followed by each initial seed user.
The manual sample collection module passes the initial seed user ids of each domain to the seed follow-list module, which uses those ids to obtain the users followed by each domain's initial seed users.
The domain prototype user set construction module comprises a seed-followee domain membership calculation module, a per-domain user ranking module, and an expanded domain prototype user set module. The seed-followee domain membership calculation module calculates and stores the domain memberships of the users followed by each domain's initial seed users; the per-domain user ranking module sorts those followed users by domain membership; and the expanded domain prototype user set module obtains and stores the expanded domain prototype user sets.
For each user followed by an initial seed user, the seed-followee domain membership calculation module calculates that user's membership in every domain and passes the memberships to the per-domain user ranking module. The per-domain user ranking module sorts the followed users' memberships in descending order within each domain and passes the sorted memberships to the expanded domain prototype user set module. For each domain, the expanded domain prototype user set module selects the K followed users with the highest membership in that domain and merges them with the domain's initial seed users to form the domain's prototype user set, that is, the expanded domain prototype user set, where K is a positive integer.
The user domain calculation module comprises a crawler/API-request module for obtaining the follow list of a user to be classified, a domain membership calculation module for users to be classified, a per-user domain ranking module, and a top-A domain labeling module. The follow-list module obtains the list of users followed by the user to be classified, and the top-A domain labeling module stores the domain identification result for that user, where A is a positive integer.
Given a user to be classified, the follow-list module obtains the users that user follows and passes them to the membership calculation module. The membership calculation module uses the expanded domain prototype user sets and the user's followed users to calculate the user's membership in every domain and passes the memberships to the per-user domain ranking module. The per-user domain ranking module sorts the memberships in descending order and passes the sorted memberships to the top-A domain labeling module, which finally yields the domain-of-interest labels of the user to be classified.
A link network based user domain identification method comprises the following steps:
Step 1: The manual sample collection module collects the initial seed users of each domain on the network platform. The information collected is each initial seed user's id and domain label.
Step 2: The seed follow-list module collects the list of users followed by each domain's initial seed users. It operates in one of two ways: write a crawler that, using the initial seed user ids gathered by the manual sample collection module, visits each seed user's follow-list page and obtains the ids of all users the seed user follows; or use the follow-list interface provided by the platform API to request, by initial seed user id, the ids of all users that user follows.
Step 3: Build the link network of all seed users. Each user is a node, where users include the initial seed users and the users they follow. The follow relationship between two users is represented by a directed edge whose arrow points to the user being followed. These directed edges form the link network between all initial seed users and the users they follow.
The seed-followee domain membership calculation module then uses a probability model to calculate, for each user followed by an initial seed user, the membership of that user in each domain.
Specifically, let f denote any user followed by an initial seed user, let i = 1, 2, ..., N index the domains, where N is the number of domains, and let S_i denote the set of initial seed users of domain i. M(f, S_i) denotes the membership of followed user f in domain i; the larger M(f, S_i) is, the more likely f belongs to domain i. Suppose a followed user f is followed by at least one initial seed user in S_i, and let n(f, S_i) denote the number of times f is followed by users in S_i.
The domain membership M(f, S_i) of followed user f is then calculated as:
$$M(f, S_i) = \frac{n(f, S_i)}{\sum_{j=1}^{N} n(f, S_j)}, \quad i = 1, \ldots, N$$
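The membership formula for M(f, S_i) can be illustrated with a minimal Python sketch. The function name, the dictionary layout, and the sample ids are assumptions made for illustration, and the sketch assumes each seed user carries exactly one domain label; none of these details are specified in the patent.

```python
from collections import Counter

def seed_followee_membership(seed_follows, seed_domain):
    """Compute M(f, S_i) for every user f followed by at least one seed user.

    seed_follows: {seed_id: [ids of users the seed follows]}  (hypothetical layout)
    seed_domain:  {seed_id: domain label}                     (one label per seed, assumed)
    """
    counts = {}  # f -> Counter mapping domain i to n(f, S_i)
    for seed, followees in seed_follows.items():
        domain = seed_domain[seed]
        for f in set(followees):  # count each seed at most once per followee
            counts.setdefault(f, Counter())[domain] += 1
    membership = {}
    for f, per_domain in counts.items():
        total = sum(per_domain.values())  # sum of n(f, S_j) over all domains j
        membership[f] = {d: n / total for d, n in per_domain.items()}
    return membership

# Toy data: two sports seeds and one finance seed.
seed_follows = {"s1": ["a", "b"], "s2": ["a"], "s3": ["a", "c"]}
seed_domain = {"s1": "sports", "s2": "sports", "s3": "finance"}
membership = seed_followee_membership(seed_follows, seed_domain)
```

In this toy data, user "a" is followed by two sports seeds and one finance seed, so its sports membership is 2/3 and its finance membership is 1/3. Domains in which a user has no seed followers are simply omitted, which corresponds to a membership of 0.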
Step 4: For each domain, the per-domain user ranking module sorts the users followed by the initial seed users in descending order of their membership in that domain.
Step 5: The expanded domain prototype user set module enlarges the number of users in each domain and builds the new, expanded domain prototype user sets.
Specifically, for domain i, the K users with the highest membership in that domain are chosen from among the users followed by the domain's initial seed users. These K users are merged with the domain's initial seed users to form the expanded domain prototype user set P_i of domain i, i = 1, 2, ..., N. K is the number of most representative users needed per domain and can be set to different values as appropriate.
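Steps 4 and 5 can be sketched as follows. This is an illustrative reading, not the patent's implementation: the function name, the input layout, and the tie-breaking rule (ties broken by user id) are assumptions, since the text does not say how equal memberships are ordered.

```python
def build_prototype_sets(membership, seeds_by_domain, k):
    """Form the expanded prototype set P_i for each domain.

    membership:      {followed_user: {domain: M(f, S_i)}}  (hypothetical layout)
    seeds_by_domain: {domain: [initial seed user ids]}
    k:               number of top-ranked followed users to add per domain
    """
    prototypes = {}
    for domain, seeds in seeds_by_domain.items():
        # Step 4: rank followed users by membership in this domain, descending.
        candidates = [f for f in membership if membership[f].get(domain, 0.0) > 0.0]
        ranked = sorted(candidates, key=lambda f: (-membership[f][domain], f))
        # Step 5: merge the top K with the domain's initial seed users.
        prototypes[domain] = set(seeds) | set(ranked[:k])
    return prototypes

membership = {
    "a": {"sports": 0.67, "finance": 0.33},
    "b": {"sports": 1.0},
    "c": {"finance": 1.0},
}
seeds_by_domain = {"sports": ["s1", "s2"], "finance": ["s3"]}
prototypes = build_prototype_sets(membership, seeds_by_domain, k=1)
```

With k=1, the sports set gains "b" and the finance set gains "c", in addition to the original seeds.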
Step 6: The follow-list module for users to be classified obtains the list of users followed by the user to be classified; it operates in the same way as in Step 2.
Step 7: For each user to be classified, the membership calculation module for users to be classified calculates the user's membership in each domain. Specifically, the membership is calculated from the number of links between the user to be classified and the prototype users of each domain.
Given a user u to be classified, the membership of u in domain i is calculated as:
$$M(u, P_i) = \frac{n(u, P_i)}{\sum_{j=1}^{N} n(u, P_j)}, \quad i = 1, \ldots, N$$
where u denotes the user to be classified, N is the number of domains, i indexes the domains, P_i denotes the expanded domain prototype user set of domain i, and n(u, P_i) denotes the number of users followed by u that appear in P_i.
Step 8: For each user to be classified, the per-user domain ranking module sorts all N domains in descending order of the user's membership.
Step 9: The top-A domain labeling module selects the A domains in which the user to be classified has the highest membership as that user's domain-of-interest labels.
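Steps 7 to 9 can be sketched in a few lines of Python. The function name and data layout are assumptions for illustration. Returning an empty list when the user follows no prototype user is also an assumption: the formula for M(u, P_i) is undefined in that case and the text does not say how it is handled.

```python
def top_domains(followees, prototypes, a):
    """Return the top-A (domain, membership) pairs for one user to be classified.

    followees:  ids of the users that the user to be classified follows
    prototypes: {domain: set of prototype user ids P_i}  (hypothetical layout)
    """
    followed = set(followees)
    # n(u, P_i): how many of the user's followees appear in each prototype set.
    counts = {d: len(followed & members) for d, members in prototypes.items()}
    total = sum(counts.values())
    if total == 0:
        return []  # assumed behavior: no label when no prototype user is followed
    membership = {d: n / total for d, n in counts.items()}
    # Step 8: sort domains by membership, descending (ties broken by name, assumed).
    ranked = sorted(membership.items(), key=lambda kv: (-kv[1], kv[0]))
    # Step 9: keep the first A domains as labels.
    return ranked[:a]

prototypes = {
    "sports": {"s1", "s2", "b"},
    "finance": {"s3", "c"},
    "tech": {"t1"},
}
labels = top_domains(["s1", "b", "c", "x"], prototypes, a=2)
```

Here the user follows two sports prototypes and one finance prototype, so the two labels are sports (2/3) and finance (1/3).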
The advantages and positive effects of the invention are:
(1) The link network based user domain identification method does not consider the text a user writes but relies on the structure of the user's social network. It can therefore overcome the poor classification accuracy caused by short texts, relevance issues, and semantic variability.
(2) The method is highly extensible: it can be applied not only to microblogging platforms but also to other network platforms on which users have follow or friend relationships.
(3) The method uses follow relationships to build the user network and expands the number of seed users in each domain. This greatly increases the probability that a user to be classified follows some seed user, and the size of the expansion can be adjusted to the amount of data in the specific application scenario, which keeps the method flexible.
(4) The method can overcome the drawbacks of short texts and is particularly suitable for user modeling, personalized information search, recommendation, and similar applications.
(5) The link network based user domain identification device has a clear structure. The identification process is divided into three main modules with low coupling between them, so changing the method used inside one module does not affect how the other modules perform their functions.
Description of the drawings
Fig. 1 is the overall structure diagram of the link network based user domain identification device of the invention;
Fig. 2 is the structure diagram of each module of the link network based user domain identification device of the invention;
Fig. 3 is the flow chart of the link network based user domain identification method of the invention.
Detailed description
The invention is described in further detail below with reference to the drawings and specific embodiments.
The invention proposes a link network based user domain identification method and device. Starting from a small number of seed users per domain, the method obtains the seed users' link network and, combined with a probability model, uses this link network to extract the prototype users of each domain automatically. Then, again combined with a probability model, it uses the prototype users of each domain to determine the domains each user is most likely interested in. The technique rests on a common-sense assumption: on a social media platform (such as Sina Weibo), a user with a particular interest in some domain will follow many users who are also interested in that domain, especially well-known and influential ones; and if the user is a prominent figure in some domain, such as science and technology, that user will in turn be followed by users interested in that domain. Starting from this basic assumption, the invention makes full use of the link network specific to social platforms to build a user domain classification model. Using the link network overcomes, to some extent, the processing difficulties caused by microblog text being short, variable, and full of modal particles. The invention uses the follow relationships between users to form a network, generates the seed user set of each domain from a number of labeled users, calculates other users' domain memberships from each domain's seed user set, and determines the domains a user is interested in from the membership values. Using the follow network structure not only avoids the problems of short text and heavy semantic noise faced when working from microblog text alone, but also allows the number of users in each domain's seed set to be adjusted to the scale of the computation, which improves the accuracy of domain identification.
As shown in Fig. 1, the link network based user domain identification device provided by the invention comprises three modules: a data collection and preprocessing module 1, a domain prototype user set construction module 2, and a user domain calculation module 3.
The data collection and preprocessing module 1 collects the initial seed users and crawls the list of users each initial seed user follows. The domain prototype user set construction module 2 uses the users followed by each domain's initial seed users to build the prototype users of each domain. The user domain calculation module 3 calculates the membership of a user to be classified in each domain, ranks the domains, and selects among them.
As shown in Fig. 2, the data collection and preprocessing module 1 comprises a manual sample collection module 101 and a crawler/API-request seed follow-list module 102. The manual sample collection module 101 obtains and stores the initial seed user ids of each domain and passes them to the seed follow-list module 102. The seed follow-list module 102 uses those ids to obtain the users followed by each domain's initial seed users.
As shown in Fig. 2, the domain prototype user set construction module 2 comprises a seed-followee domain membership calculation module 201, a per-domain user ranking module 202, and an expanded domain prototype user set module 203.
For each user followed by an initial seed user, the seed-followee domain membership calculation module 201 calculates that user's membership in every domain and passes the memberships to the per-domain user ranking module 202. The per-domain user ranking module 202 sorts the followed users' memberships in descending order within each domain and passes the sorted memberships to the expanded domain prototype user set module 203. The expanded domain prototype user set module 203 obtains and stores the expanded domain prototype user sets: for each domain, it selects the K followed users with the highest membership in that domain and merges them with the domain's initial seed users to form the domain's prototype user set, that is, the expanded domain prototype user set, where K is a positive integer.
The seed-followee domain membership calculation module 201 calculates the membership M(f, S_i) in domain i of a user followed by a seed user according to the following formula:
$$M(f, S_i) = \frac{n(f, S_i)}{\sum_{j=1}^{N} n(f, S_j)}, \quad i = 1, \ldots, N$$
where f denotes a user, specifically a user followed by some seed user, i = 1, 2, ..., N indexes the domains, N is the number of domains, S_i denotes the set of initial seed users of domain i, and n(f, S_i) denotes the number of times f is followed by users in S_i.
As shown in Fig. 2, the user domain calculation module 3 comprises a crawler/API-request follow-list module 301 for users to be classified, a domain membership calculation module 302 for users to be classified, a per-user domain ranking module 303, and a top-A domain labeling module 304. Given a user to be classified, the follow-list module 301 obtains the users that user follows and passes them to the membership calculation module 302. The membership calculation module 302 uses the expanded domain prototype user sets and the user's followed users to calculate the user's membership in every domain and passes the memberships to the per-user domain ranking module 303. The per-user domain ranking module 303 sorts the memberships in descending order and passes them, in sorted order, to the top-A domain labeling module 304. The top-A domain labeling module 304 takes the first A domains as the domain-of-interest labels of the user to be classified and stores the domain identification result. A is a positive integer set according to actual usage.
For each user u to be classified, the membership calculation module 302 first counts the number n(u, P_i), i = 1, 2, ..., N, of users followed by u that appear in each expanded domain prototype user set P_i. It then calculates the membership M(u, P_i) of u in domain i:
$$M(u, P_i) = \frac{n(u, P_i)}{\sum_{j=1}^{N} n(u, P_j)}, \quad i = 1, \ldots, N$$
where u denotes the user to be classified, N is the number of domains, i indexes the domains, P_i denotes the expanded domain prototype user set of domain i, and n(u, P_i) denotes the number of users followed by u that appear in P_i.
The data collection and preprocessing module 1, the domain prototype user set construction module 2, and the user domain calculation module 3 correspond to different stages of the user domain identification process. Module 1 provides the data that module 2 needs, and module 2 provides the data that module 3 needs. Specifically, the stored data produced by module 1, namely the list of users followed by each seed user, is fed as input to the seed-followee domain membership calculation module 201 in module 2 and supports the functions of module 2. From this follow-relationship data, module 2 determines the prototype users of each domain. The stored data it produces, namely the expanded domain prototype user sets, is fed as input to the membership calculation module 302 in module 3 and supports the functions of module 3.
The user domain calculation module 3 finally performs user domain identification using the data provided by the data collection and preprocessing module 1 and the domain prototype user set construction module 2.
The link network based user domain identification method provided by the invention is described below with reference to Fig. 3 and an embodiment.
Step 1: The manual sample collection module 101 manually collects the initial seed users of each domain on a network platform. The information collected is each initial seed user's id and domain label.
In this embodiment, the manual sample collection module 101 in the data collection and preprocessing module 1 collects the initial seed users of the different domains from the domain celebrity lists maintained by the Sina Weibo platform. The domains are: science and technology, finance, sports, culture, fashion, media, education, and entertainment. The information collected is each initial seed user's id and domain label.
For example, to collect the initial seed users of the sports domain: open the Sina Weibo celebrity list page http://data.weibo.com/top/influence/famous and select the sports label from the list of domain labels. The page shows the top 100 users of that domain by influence ranking, and these 100 users can serve as the seed users of the domain. Visit the personal homepage of each of these 100 users in turn and click the "Follow" link under the profile picture; the Sina Weibo id of the seed user can be found in the address bar of the page that opens. For example, if the address bar shows:
http://weibo.com/1197161814/follow?from=profile&wvr=5&loc=tagfollow, then "1197161814" is the seed user id to collect. All collected seed users are saved to a text file whose main fields are the user id and the user's domain label, for example: 1197161814, finance.
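Reading the seed user id out of the address-bar URL can be automated. The sketch below is an illustration only: the helper name is hypothetical, and it assumes the URL has the standard form `http://weibo.com/<id>/follow?...` with the numeric id as the first path segment, as in the example above.

```python
from urllib.parse import urlparse

def seed_id_from_follow_url(url):
    """Extract the numeric user id from a follow-page URL, or None if absent."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Expected path shape (assumed): /<numeric id>/follow
    if len(parts) >= 2 and parts[1] == "follow" and parts[0].isdigit():
        return parts[0]
    return None

url = "http://weibo.com/1197161814/follow?from=profile&wvr=5&loc=tagfollow"
seed_id = seed_id_from_follow_url(url)
# One line of the seed file: user id, then the domain label.
record = f"{seed_id}, finance"
```

The resulting record matches the "user id, domain label" line format described for the seed text file.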
In this embodiment, 100 seed user ids were collected manually for each domain.
Step 2: Collect the list of users followed by each domain's initial seed users. The crawler/API-request seed follow-list module 102 collects these lists, operating as follows:
Write a crawler that, using the initial seed user ids gathered by the manual sample collection module 101, visits each initial seed user's follow-list page and obtains the ids of all users the seed user follows; or use the follow-list interface provided by the Sina API to request, by initial seed user id, the ids of all users that user follows. In Fig. 3, the arrow from the microblog platform directly to the seed follow-list module 102 represents the crawler obtaining data from the microblog platform.
The follow lists of the initial seed users of each field are collected with the Sina Weibo API. This embodiment uses friendships/friends/ids, one of the follow-list interfaces provided by the Sina Weibo API, to obtain the follow list. In the program, the user id is passed as the parameter uid in an HTTP GET request to
https://api.weibo.com/2/friendships/friends.json, which returns the user's follow list. The request requires prior authorization to use the follow interfaces of the Sina API; see the Sina Weibo developer documentation for how to obtain it. The list of followed user ids returned for a user contains all users that this user follows, and each follow relation corresponds to one directed edge in the linked network, pointing from the user's node to the node of the followed user.
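The request and the resulting directed edges can be sketched as below. This is a minimal illustration, not the implementation of module 102: the endpoint and the `uid` parameter are taken from the text, while the `access_token` parameter name and the helper names `follow_list_request` and `follow_edges` are assumptions. No network call is made; the sketch only builds the request URL and converts an already returned follow list into edges.

```python
from urllib.parse import urlencode

# Endpoint quoted in the text.
API_URL = "https://api.weibo.com/2/friendships/friends.json"


def follow_list_request(uid, access_token):
    """Build the GET URL for one user's follow list.

    `uid` is the parameter named in the text; `access_token` is an assumed
    name for the credential that must be obtained beforehand.
    """
    return API_URL + "?" + urlencode({"uid": uid, "access_token": access_token})


def follow_edges(uid, followed_ids):
    """Turn one returned follow list into directed edges (user, followed user)."""
    return [(uid, followed) for followed in followed_ids]


print(follow_list_request(1197161814, "TOKEN"))
print(follow_edges(1, [4, 5, 6]))
```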
Step 3: Build the linked network of all seed users. Each user is a node, where the users comprise the seed users and the users followed by the initial seed users; the follow relation between two users is represented by a directed edge whose arrow points to the followed user. The seed-user followed-user field membership calculation module 201 then calculates, based on a probability model, the field degree of membership of each user followed by the initial seed users of each field with respect to every field.
Specifically, let f denote a user followed by an initial seed user, and let i denote the i-th field, i = 1, 2, ..., N, where N is the number of fields; in this embodiment N is 8. Let S_i denote the set of initial seed users of the i-th field and M(f, S_i) the field degree of membership of the followed user f in the i-th field; the larger M(f, S_i) is, the more likely f belongs to the i-th field. Suppose a followed user f is followed by at least one initial user in the seed user set S_i of the i-th field, and let n(f, S_i) denote the number of times f is followed by users in the set S_i. The degree of membership of f in the i-th field is then calculated with the following formula:
$$M(f, S_i) = \frac{n(f, S_i)}{\sum_{j=1}^{8} n(f, S_j)}, \quad i = 1, \ldots, 8.$$
The calculation of the field degree of membership of the users followed by the initial seed users is illustrated below with the data in Table 1.
Table 1: Initial seed users of each field and the users they follow
| Field | Seed user | Followed users |
| --- | --- | --- |
| Science and technology | 1 | 4, 5, 6 |
| Finance and economics | 2 | 6, 7, 8 |
| Media | 3 | 8, 9, 10 |
The field degree of membership of each of the users 4-10 followed by the seed users is given in Table 2.
Table 2: Field degree of membership of the users followed by the seed users
[Table 2 is provided as images in the original publication (Figures BDA0000441915300000082 and BDA0000441915300000091).]
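The calculation behind Table 2 can be reproduced from the Table 1 data with a few lines of Python. This is a minimal sketch of the membership formula, not the patented implementation; the names `seed_follows`, `follow_counts` and `memberships` are hypothetical, and only the three example fields of Table 1 are used instead of the 8 fields of the embodiment.

```python
from collections import Counter

# Table 1: field -> {seed user: users that the seed user follows}
seed_follows = {
    "Science and technology": {1: [4, 5, 6]},
    "Finance and economics": {2: [6, 7, 8]},
    "Media": {3: [8, 9, 10]},
}


def follow_counts(seed_follows):
    """n(f, S_i): number of seed users of field i that follow user f."""
    counts = {}
    for field, seeds in seed_follows.items():
        counts[field] = Counter(
            f for followed in seeds.values() for f in set(followed)
        )
    return counts


def memberships(counts):
    """M(f, S_i) = n(f, S_i) / (sum of n(f, S_j) over all fields j)."""
    users = sorted(set().union(*counts.values()))
    result = {}
    for f in users:
        total = sum(c[f] for c in counts.values())
        result[f] = {field: c[f] / total for field, c in counts.items()}
    return result


table2 = memberships(follow_counts(seed_follows))
for f, row in table2.items():
    print(f, row)
```

For instance, user 6 is followed by one science and technology seed and one finance and economics seed, so its membership is 0.5 in each of these two fields and 0 in media.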
Step 4: For each field i, i = 1, 2, ..., N, the per-field membership-based user sorting module 202 sorts the field degree of membership values of all users followed by the initial seed users in descending order.
Table 3: Users followed by the seed users, ranked by field degree of membership in each field
| Field | Followed users, sorted by field degree of membership from high to low |
| --- | --- |
| Science and technology | 4, 5, 6, 7, 8, 9, 10 |
| Finance and economics | 7, 6, 8, 4, 5, 9, 10 |
| Media | 9, 10, 8, 4, 5, 6, 7 |
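The per-field ranking of step 4 can be sketched as follows. The membership values are recomputed from Table 1 with the step 3 formula; the text does not say how ties are broken, so breaking ties by ascending user id is an assumption, chosen because it reproduces the order shown in Table 3. The names `FIELDS`, `membership` and `rank_users` are hypothetical.

```python
FIELDS = ["Science and technology", "Finance and economics", "Media"]

# user -> membership in (science and technology, finance and economics, media),
# recomputed from Table 1 with M(f, S_i) = n(f, S_i) / sum_j n(f, S_j).
membership = {
    4: (1.0, 0.0, 0.0),
    5: (1.0, 0.0, 0.0),
    6: (0.5, 0.5, 0.0),
    7: (0.0, 1.0, 0.0),
    8: (0.0, 0.5, 0.5),
    9: (0.0, 0.0, 1.0),
    10: (0.0, 0.0, 1.0),
}


def rank_users(membership, field_index):
    """Sort users by membership in one field, highest first.

    Ties are broken by ascending user id (an assumption, see above).
    """
    return sorted(membership, key=lambda f: (-membership[f][field_index], f))


ranking = {field: rank_users(membership, i) for i, field in enumerate(FIELDS)}
for field, users in ranking.items():
    print(field, users)
```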
Step 5: Use the expanded field prototype user set acquisition module 203 to enlarge the number of users of each field and build the new, expanded field prototype user set. The expanded field prototype user set of a field consists of the initial seed users of the field together with the followed users that have the highest field degree of membership in that field.
Concretely, for the i-th field, select from the users followed by the initial seed users of this field the top K users with the highest field degree of membership in this field, and merge the K selected users with the initial seed users of the field to form the expanded field prototype user set P_i of the i-th field, i = 1, 2, ..., N. K is a positive integer that represents the number of most representative users needed for each field; it can take different values depending on the situation, and K is 2 in this embodiment.
Table 4: Expanded field prototype user set of each field
| Field | Field prototype users |
| --- | --- |
| Science and technology | 1, 4, 5 |
| Finance and economics | 2, 7, 6 |
| Media | 3, 9, 10 |
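The expansion of step 5 amounts to a top-K selection followed by a merge. The sketch below uses the seed users of Table 1 and the rankings of Table 3; the names `seeds`, `ranked` and `expand` are hypothetical, and skipping a selected user who is already a seed is an assumption that the text does not discuss.

```python
K = 2  # number of most representative followed users added per field

# Initial seed users per field (Table 1).
seeds = {
    "Science and technology": [1],
    "Finance and economics": [2],
    "Media": [3],
}

# Followed users per field, sorted by membership from high to low (Table 3).
ranked = {
    "Science and technology": [4, 5, 6, 7, 8, 9, 10],
    "Finance and economics": [7, 6, 8, 4, 5, 9, 10],
    "Media": [9, 10, 8, 4, 5, 6, 7],
}


def expand(seeds, ranked, k):
    """Merge each field's seed users with its top-k ranked followed users."""
    prototypes = {}
    for field, field_seeds in seeds.items():
        merged = list(field_seeds)
        for user in ranked[field][:k]:
            if user not in merged:  # assumption: do not add a seed twice
                merged.append(user)
        prototypes[field] = merged
    return prototypes


prototypes = expand(seeds, ranked, K)
for field, users in prototypes.items():
    print(field, users)
```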
Step 6: Obtain the followed-user information of the users to be sorted. The crawler/API to-be-sorted-user follow-list acquisition module 301 in the user field computing module 3 collects the follow list of each user to be sorted; its operation is the same as in step 2. A user to be sorted is a target user whose field is to be identified, and the users that a user to be sorted follows are the basis for field identification.
Table 5: Users to be sorted and the users they follow
| User to be sorted | Followed users |
| --- | --- |
| 11 | 2, 6, 14 |
| 12 | 3, 10, 15 |
| 13 | 1, 4, 5, 6 |
Step 7: For each user to be sorted, calculate the user's field degree of membership with respect to every field, using the expanded field prototype user sets of the fields obtained in step 5.
The to-be-sorted-user field membership calculation module 302 calculates the field degree of membership of a user to be sorted as follows: given a user u to be sorted, let n(u, P_i) denote the number of users followed by u that appear in the field prototype user set P_i, i = 1, 2, ..., 8. The field degree of membership of u in the i-th field is then calculated with the following formula:
$$M(u, P_i) = \frac{n(u, P_i)}{\sum_{j=1}^{8} n(u, P_j)}, \quad i = 1, \ldots, 8.$$
Here u denotes the user to be sorted, N is the number of fields, i denotes the i-th field, P_i denotes the expanded field prototype user set of the i-th field, and n(u, P_i) denotes the number of users followed by u that appear in the set P_i.
Table 6: Field degree of membership of each user to be sorted in every field
| User to be sorted | Science and technology | Finance and economics | Media |
| --- | --- | --- | --- |
| 11 | 0 = 0/(0+2+0) | 1 = 2/(0+2+0) | 0 = 0/(0+2+0) |
| 12 | 0 = 0/(0+0+2) | 0 = 0/(0+0+2) | 1 = 2/(0+0+2) |
| 13 | 0.75 = 3/(3+1+0) | 0.25 = 1/(3+1+0) | 0 = 0/(3+1+0) |
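The values of Table 6 follow directly from Tables 4 and 5 and can be checked with the sketch below. It is an illustration under stated assumptions, not the implementation of module 302: the names `prototypes`, `follows` and `field_membership` are hypothetical, and returning 0 for every field when a user follows no prototype user at all is an assumption, since the text does not define that case.

```python
# Expanded field prototype user sets (Table 4).
prototypes = {
    "Science and technology": {1, 4, 5},
    "Finance and economics": {2, 6, 7},
    "Media": {3, 9, 10},
}

# Users to be sorted and the users they follow (Table 5).
follows = {
    11: [2, 6, 14],
    12: [3, 10, 15],
    13: [1, 4, 5, 6],
}


def field_membership(followed, prototypes):
    """M(u, P_i) = n(u, P_i) / (sum of n(u, P_j) over all fields j)."""
    counts = {
        field: len(set(followed) & members)
        for field, members in prototypes.items()
    }
    total = sum(counts.values())
    if total == 0:
        # Assumption: no followed user is a prototype user of any field.
        return {field: 0.0 for field in counts}
    return {field: n / total for field, n in counts.items()}


table6 = {u: field_membership(f, prototypes) for u, f in follows.items()}
for u, row in table6.items():
    print(u, row)
```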
Step 8: For each user to be sorted, the per-user membership-based field sorting module 303 sorts all N fields in descending order of this user's field degree of membership.
That is, for each user u to be sorted, module 303 sorts all 8 fields in descending order of the field degree of membership of u in each field.
Table 7: Fields ranked by field degree of membership for each user to be sorted
| User | Field ranking |
| --- | --- |
| 11 | Finance and economics, science and technology, media |
| 12 | Media, finance and economics, science and technology |
| 13 | Science and technology, finance and economics, media |
Step 9: Select the top A fields with the highest field degree of membership as the interest field labels of the user u to be sorted.
The top-A field label selection module 304 takes the A fields in which the user to be sorted has the highest field degree of membership as the interest field labels of u. In this embodiment A is 1.
Table 8: Interest field identification result for each user to be sorted
| User | Interest field |
| --- | --- |
| 11 | Finance and economics |
| 12 | Media |
| 13 | Science and technology |
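Steps 8 and 9 together reduce to sorting each user's fields by membership and keeping the first A. The sketch below starts from the Table 6 values; the names `table6` and `top_fields` are hypothetical, and the order among tied fields (here the fields with membership 0) is left to Python's stable sort, which is an assumption because the text does not specify a tie-breaking rule.

```python
A = 1  # number of interest field labels kept per user

# Field degree of membership of each user to be sorted (Table 6).
table6 = {
    11: {"Science and technology": 0.0, "Finance and economics": 1.0, "Media": 0.0},
    12: {"Science and technology": 0.0, "Finance and economics": 0.0, "Media": 1.0},
    13: {"Science and technology": 0.75, "Finance and economics": 0.25, "Media": 0.0},
}


def top_fields(row, a):
    """Sort fields by membership, highest first, and keep the first `a`."""
    ranked = sorted(row, key=lambda field: -row[field])
    return ranked[:a]


labels = {u: top_fields(row, A) for u, row in table6.items()}
for u, fields in labels.items():
    print(u, fields)
```

With A = 1 this yields the labels of Table 8; a larger A would assign several interest fields to each user.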
The present invention proposes a linked-network-based user field identification method and a corresponding device. The method makes full use of the link information in users' follow networks and, based on a probability model, substantially expands the small initial set of seed users. The proposed method does not consider the text content of users; it considers only the structure of users' social relations, and thereby overcomes to some extent the poor classification accuracy caused by short texts, correlation and semantic variability. The method is highly extensible. First, a linked-network-based method can be applied not only to microblog platforms but also to other platforms on which follow or friend relations exist between users, such as Renren, Kaixin and Youku; the main difference between platforms lies in the source of the seed users and the way they are collected. Second, the number of fields N can be ten or even more; the value of N depends mainly on the specific application and should make the collection of seed users as convenient as possible. In addition, the proposed method is easily extended to a dynamic forwarding network, or to a network formed by merging a dynamic forwarding network with the follow network; each node in such a network still represents a user, but the edges between nodes represent follow relations or forwarding relations.

Claims (9)

1. A user field identification device based on a linked network, characterized in that it comprises three modules: a data collection and preprocessing module, a field prototype user set construction module, and a user field computing module;
wherein the data collection and preprocessing module collects initial seed users and crawls the follow lists of the initial seed users; the field prototype user set construction module uses the users followed by the initial seed users of each field to build the prototype users of each field; and the user field computing module calculates, sorts and selects the field degrees of membership of a user to be sorted with respect to every field;
the data collection and preprocessing module comprises a manual sample collection module and a crawler/API (Application Programming Interface) seed-user follow-list acquisition module;
the manual sample collection module obtains and stores the initial seed user ids of each field and passes the initial seed user ids to the crawler/API seed-user follow-list acquisition module; the crawler/API seed-user follow-list acquisition module obtains, according to the initial seed user ids of each field, the users followed by the initial seed users of each field;
the field prototype user set construction module comprises: a seed-user followed-user field membership calculation module, a per-field membership-based user sorting module, and an expanded field prototype user set acquisition module;
the seed-user followed-user field membership calculation module calculates, for each user followed by the initial seed users of each field, the field degree of membership of this followed user with respect to every field, and passes the field degrees of membership to the per-field membership-based user sorting module; the per-field membership-based user sorting module sorts the field degrees of membership of the followed users in descending order and passes the sorted field degrees of membership to the expanded field prototype user set acquisition module; the expanded field prototype user set acquisition module, for each field, selects the top K users with the highest field degree of membership in this field and merges them with the initial seed users of this field to form the prototype user set of the field, that is, the expanded field prototype user set of the field; wherein K is a positive integer;
the user field computing module comprises a crawler/API to-be-sorted-user follow-list acquisition module, a to-be-sorted-user field membership calculation module, a per-user membership-based field sorting module, and a top-A field label selection module; wherein A is a positive integer;
the crawler/API to-be-sorted-user follow-list acquisition module obtains the users followed by a user to be sorted and passes them to the to-be-sorted-user field membership calculation module; the to-be-sorted-user field membership calculation module calculates the field degree of membership of the user to be sorted with respect to every field according to the expanded field prototype user sets and the users followed by the user to be sorted, and passes the field degrees of membership of this user to the per-user membership-based field sorting module; the per-user membership-based field sorting module sorts the field degrees of membership of this user in descending order and passes the sorted field degrees of membership to the top-A field label selection module, which finally yields the interest field labels of the user to be sorted.
2. The user field identification device based on a linked network according to claim 1, characterized in that the seed-user followed-user field membership calculation module calculates the field degree of membership M(f, S_i) of a user followed by an initial seed user in the i-th field according to the following formula:
$$M(f, S_i) = \frac{n(f, S_i)}{\sum_{j=1}^{N} n(f, S_j)}, \quad i = 1, \ldots, N$$
wherein f denotes a user followed by an initial seed user, N is the number of fields, i denotes the i-th field, S_i denotes the set of initial seed users of the i-th field, and n(f, S_i) denotes the number of times f is followed by users in the set S_i.
3. The user field identification device based on a linked network according to claim 1, characterized in that the to-be-sorted-user field membership calculation module calculates the field degree of membership M(u, P_i) of a user u to be sorted in the i-th field according to the following formula:
$$M(u, P_i) = \frac{n(u, P_i)}{\sum_{j=1}^{N} n(u, P_j)}, \quad i = 1, \ldots, N$$
wherein u denotes the user to be sorted, N is the number of fields, i denotes the i-th field, P_i denotes the expanded field prototype user set of the i-th field, and n(u, P_i) denotes the number of users followed by u that appear in the set P_i.
4. A user field identification method using the user field identification device based on a linked network according to claim 1, characterized in that it comprises the following steps:
Step 1: the manual sample collection module collects the initial seed users of each field on the network platform; the information collected is the initial seed user id and the user field label for each field;
Step 2: the crawler/API seed-user follow-list acquisition module collects the follow lists of the initial seed users of each field;
Step 3: the linked network of all seed users is built as follows: each user is a node, where the users comprise the initial seed users and the users followed by the initial seed users, and the follow relation between two users is represented by a directed edge whose arrow points to the followed user;
the seed-user followed-user field membership calculation module calculates, based on a probability model, the field degree of membership of each user followed by the initial seed users of each field with respect to every field;
Step 4: for each field, the per-field membership-based user sorting module sorts the field degree of membership values of all users followed by the initial seed users in descending order;
Step 5: the expanded field prototype user set acquisition module enlarges the number of users of each field and builds the new, expanded field prototype user set;
Step 6: the crawler/API to-be-sorted-user follow-list acquisition module obtains the follow list of each user to be sorted;
Step 7: for each user to be sorted, the to-be-sorted-user field membership calculation module calculates the field degree of membership of this user, specifically: the field degree of membership of the user to be sorted with respect to every field is calculated according to the number of links established between this user and the prototype users of each field;
Step 8: for each user to be sorted, the per-user membership-based field sorting module sorts all N fields in descending order of this user's field degree of membership;
Step 9: the top-A field label selection module selects the top A fields with the highest field degree of membership of the user to be sorted as the interest field labels of this user.
5. The user field identification method according to claim 4, characterized in that the crawler/API seed-user follow-list acquisition module described in step 2 operates as follows: write a crawler that, using the initial seed user ids of each field gathered by the manual sample collection module, opens the follow-list page of each seed user and obtains the ids of all followed users; alternatively, use the follow-list interface provided by the API to request, for each initial seed user id of each field, the ids of all users that this user follows.
6. The user field identification method according to claim 4, characterized in that, in building the linked network of all seed users described in step 3, specifically: f denotes a user, namely a user followed by some initial seed user; i denotes the i-th field, i = 1, 2, ..., N, where N is the number of fields; S_i denotes the set of initial seed users of the i-th field; M(f, S_i) denotes the field degree of membership of the followed user f in the i-th field; and n(f, S_i) denotes the number of times f is followed by users in the set S_i; the field degree of membership M(f, S_i) of the followed user f is then calculated with the following formula:
$$M(f, S_i) = \frac{n(f, S_i)}{\sum_{j=1}^{N} n(f, S_j)}, \quad i = 1, \ldots, N.$$
7. The user field identification method according to claim 4, characterized in that the new, expanded field prototype user set described in step 5 is built as follows: for the i-th field, select from the users followed by the initial seed users of this field the top K users with the highest field degree of membership in this field, and merge the K selected users with the initial seed users of the field to form the expanded field prototype user set P_i of the i-th field, i = 1, ..., N, where K represents the number of most representative users needed for each field.
8. The user field identification method according to claim 4, characterized in that the crawler/API to-be-sorted-user follow-list acquisition module described in step 6 operates as follows: write a crawler that, using the ids of the users to be sorted of each field gathered by the manual sample collection module, opens the follow-list page of each user to be sorted and obtains the ids of all followed users; alternatively, use the follow-list interface provided by the API to request, for each id of a user to be sorted of each field, the ids of all users that this user follows.
9. The user field identification method according to claim 4, characterized in that the field degree of membership of a user to be sorted described in step 7 is calculated as follows: given a user u to be sorted, the field degree of membership of u in the i-th field is calculated with the following formula:
$$M(u, P_i) = \frac{n(u, P_i)}{\sum_{j=1}^{N} n(u, P_j)}, \quad i = 1, \ldots, N$$
wherein N is the number of fields, i denotes the i-th field, P_i denotes the expanded field prototype user set of the i-th field, and n(u, P_i) denotes the number of users followed by u that appear in the set P_i.
CN201310705515.7A 2013-12-19 2013-12-19 Link network based user domain identifying method and device Active CN103761246B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310705515.7A CN103761246B (en) 2013-12-19 2013-12-19 Link network based user domain identifying method and device

Publications (2)

Publication Number Publication Date
CN103761246A true CN103761246A (en) 2014-04-30
CN103761246B CN103761246B (en) 2017-02-08

Family

ID=50528485

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310705515.7A Active CN103761246B (en) 2013-12-19 2013-12-19 Link network based user domain identifying method and device

Country Status (1)

Country Link
CN (1) CN103761246B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104573034A (en) * 2015-01-15 2015-04-29 中国联合网络通信集团有限公司 CDR call ticket based user group division method and system
CN104572932A (en) * 2014-12-29 2015-04-29 微梦创科网络科技(中国)有限公司 Method and device for determining interest label
CN106021257A (en) * 2015-12-31 2016-10-12 广州华多网络科技有限公司 Method, device, and system for crawler to capture data supporting online programming
CN106611350A (en) * 2015-10-26 2017-05-03 阿里巴巴集团控股有限公司 Method and device for mining potential user source
CN116842145A (en) * 2023-04-20 2023-10-03 海信集团控股股份有限公司 Domain identification method and device based on city question-answering system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101655868B (en) * 2009-09-03 2012-08-22 中国人民解放军信息工程大学 Network data mining method, network data transmitting method and equipment

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104572932A (en) * 2014-12-29 2015-04-29 微梦创科网络科技(中国)有限公司 Method and device for determining interest label
CN104572932B (en) * 2014-12-29 2017-11-24 微梦创科网络科技(中国)有限公司 A kind of determination method and device of interest tags
CN104573034A (en) * 2015-01-15 2015-04-29 中国联合网络通信集团有限公司 CDR call ticket based user group division method and system
CN104573034B (en) * 2015-01-15 2018-03-23 中国联合网络通信集团有限公司 User group's division method and system based on CDR tickets
CN106611350A (en) * 2015-10-26 2017-05-03 阿里巴巴集团控股有限公司 Method and device for mining potential user source
CN106611350B (en) * 2015-10-26 2020-06-05 阿里巴巴集团控股有限公司 Method and device for mining potential user source
CN106021257A (en) * 2015-12-31 2016-10-12 广州华多网络科技有限公司 Method, device, and system for crawler to capture data supporting online programming
CN106021257B (en) * 2015-12-31 2019-10-18 广州华多网络科技有限公司 A kind of crawler capturing data method, apparatus and system for supporting online programming
CN116842145A (en) * 2023-04-20 2023-10-03 海信集团控股股份有限公司 Domain identification method and device based on city question-answering system
CN116842145B (en) * 2023-04-20 2024-02-27 海信集团控股股份有限公司 Domain identification method and device based on city question-answering system

Also Published As

Publication number Publication date
CN103761246B (en) 2017-02-08

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant