CN108596444A - Method and device for large-scale social network user sampling based on a diversification strategy - Google Patents

Method and device for large-scale social network user sampling based on a diversification strategy

Info

Publication number
CN108596444A
Authority
CN
China
Prior art keywords
user
properties
representative
sampling
large scale
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810284916.2A
Other languages
Chinese (zh)
Other versions
CN108596444B (en
Inventor
桑维
唐杰
刘德兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN201810284916.2A priority Critical patent/CN108596444B/en
Publication of CN108596444A publication Critical patent/CN108596444A/en
Application granted granted Critical
Publication of CN108596444B publication Critical patent/CN108596444B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01Social networking

Abstract

The invention discloses a method and device for large-scale social network user sampling based on a diversification strategy. The method includes the following steps: extracting several user representatives by means of a utility function; dividing the user representatives into multiple attribute groups according to the attributes of each user representative, so as to obtain a model of the representative degree of an attribute group; obtaining the maximum value of the utility function, so as to select representative users from the multiple attribute groups; and, according to the representative users, selecting the group with the worst representative degree by diversification-strategy sampling. The method effectively reduces the data scale of the network so that the data becomes easier to process. It also helps remove unrepresentative users so that study can concentrate on the more valuable user groups in the network, which effectively improves sampling accuracy while remaining efficient in terms of time complexity.

Description

Method and device for large-scale social network user sampling based on a diversification strategy
Technical field
The present invention relates to the field of computer technology and network information technology, and in particular to a method and device for large-scale social network user sampling based on a diversification strategy.
Background art
At present, how to find a subset of users in a large-scale network that statistically represents the whole network is an extremely important problem in social network analysis. It has many applications, such as recommending academic information on the WeChat public platform and recommending friends in social networks. Sampling the users of a large-scale network is, in theory, an NP (Non-deterministic Polynomial) problem. Existing research has proposed some strategies for selecting representative users, but has not given distinct concrete formulations for the different sampling strategies.
In the related art, statistical stratified sampling considers the importance of users, so that the distribution of each attribute in the sample is as consistent as possible with that of the whole population; Grounded Theory is a sampling strategy that emphasizes diversity; and the related art has also studied a strategy similar to a political election, in which representatives are elected by all users. However, the related art contains almost no research specific to the user sampling problem in social networks, considers only the similarity between nodes, and can hardly explain in which respects the selected representative users are representative.
Summary of the invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art.
To this end, one object of the present invention is to provide a method for large-scale social network user sampling based on a diversification strategy. The method effectively improves sampling accuracy while remaining efficient in terms of time complexity.
Another object of the present invention is to provide a device for large-scale social network user sampling based on a diversification strategy.
To achieve the above objects, an embodiment of one aspect of the present invention proposes a method for large-scale social network user sampling based on a diversification strategy, including the following steps: extracting several user representatives by means of a utility function; dividing the user representatives into multiple attribute groups according to the attributes of each user representative, so as to obtain a model of the representative degree of an attribute group; obtaining the maximum value of the utility function, so as to select representative users from the multiple attribute groups; and, according to the representative users, selecting the group with the worst representative degree by diversification-strategy sampling.
In the method of the embodiment of the present invention, the utility function also takes the diversity of attributes into account, and it is possible to see which attribute groups each selected user contributes to and how large the contribution is. The data scale of the network is thereby effectively reduced so that the data becomes easier to process. The method also helps remove unrepresentative users so that study can concentrate on the more valuable user groups in the network, which effectively improves sampling accuracy while remaining efficient in terms of time complexity.
In addition, the method for large-scale social network user sampling based on a diversification strategy according to the above embodiment of the present invention may also have the following additional technical features:
Further, in one embodiment of the present invention, the utility function is:
where G = (V, E) denotes the social network, V is the set of |V| = N users, E ⊆ V × V is the set of |E| = M user relationships, X ∈ R^{n×d} is the attribute matrix, T is a user subset, λ_l is a positive integer related to the size of the attribute group (V_l, a_{j_l}) with default value |V_l|-1, P(T, l) is the representative degree of user subset T for the attribute group (V_l, a_{j_l}), V_l is the set of all users of attribute l, and a_{j_l} is an attribute.
Further, in one embodiment of the present invention, the model of the representative degree of an attribute group is:
where R(T, v_i, a_{j_l}) is the representative degree of user subset T for user v_i on a specific attribute a_{j_l}, with values in [0, 1]. By default, R(T, v_i, a_{j_l}) is 1 when a node in T has an edge connected to v_i, and 0 otherwise.
Further, in one embodiment of the present invention, if P(T, l) > 0 for all 1 ≤ l ≤ t, then all attribute groups are represented, and each attribute group is given a relatively balanced P(T, l), so as to prevent any attribute group from being over-represented or under-represented.
To achieve the above objects, an embodiment of another aspect of the present invention proposes a device for large-scale social network user sampling based on a diversification strategy, including: an extraction module, configured to extract several user representatives by means of a utility function; a grouping module, configured to divide the user representatives into multiple attribute groups according to the attributes of each user representative, so as to obtain a model of the representative degree of an attribute group; an acquisition module, configured to obtain the maximum value of the utility function, so as to select representative users from the multiple attribute groups; and a processing module, configured to select, according to the representative users, the group with the worst representative degree by diversification-strategy sampling.
In the device of the embodiment of the present invention, the utility function also takes the diversity of attributes into account, and it is possible to see which attribute groups each selected user contributes to and how large the contribution is. The data scale of the network is thereby effectively reduced so that the data becomes easier to process. The device also helps remove unrepresentative users so that study can concentrate on the more valuable user groups in the network, which effectively improves sampling accuracy while remaining efficient in terms of time complexity.
In addition, the device for large-scale social network user sampling based on a diversification strategy according to the above embodiment of the present invention may also have the following additional technical features:
Further, in one embodiment of the present invention, the utility function is:
where G = (V, E) denotes the social network, V is the set of |V| = N users, E ⊆ V × V is the set of |E| = M user relationships, X ∈ R^{n×d} is the attribute matrix, T is a user subset, λ_l is a positive integer related to the size of the attribute group (V_l, a_{j_l}) with default value |V_l|-1, P(T, l) is the representative degree of user subset T for the attribute group (V_l, a_{j_l}), V_l is the set of all users of attribute l, and a_{j_l} is an attribute.
Further, in one embodiment of the present invention, the model of the representative degree of an attribute group is:
where λ_l has default value |V_l|-1, P(T, l) is the representative degree of user subset T for the attribute group (V_l, a_{j_l}), V_l is the set of all users of attribute l, and a_{j_l} is an attribute.
Further, in one embodiment of the present invention, if P(T, l) > 0 for all 1 ≤ l ≤ t, then all attribute groups are represented, and each attribute group is given a relatively balanced P(T, l), so as to prevent any attribute group from being over-represented or under-represented.
Additional aspects and advantages of the present invention will be set forth in part in the following description, and in part will become apparent from the description or be learned through practice of the invention.
Description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken together with the accompanying drawings, in which:
Fig. 1 is a flowchart of the method for large-scale social network user sampling based on a diversification strategy according to one embodiment of the present invention;
Fig. 2 is a structural schematic diagram of the device for large-scale social network user sampling based on a diversification strategy according to one embodiment of the present invention.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary and are intended to explain the present invention; they should not be construed as limiting it.
The method and device for large-scale social network user sampling based on a diversification strategy proposed according to embodiments of the present invention are described below with reference to the drawings. The method is described first.
Fig. 1 is a flowchart of the method for large-scale social network user sampling based on a diversification strategy according to one embodiment of the present invention.
As shown in Fig. 1, the method for large-scale social network user sampling based on a diversification strategy includes the following steps:
In step S101, several user representatives are extracted by means of a utility function.
It can be understood that, for the sampling problem, strategies such as simple random sampling, sampling based on graph traversal, and sampling based on random walks offer useful reference for large-scale user sampling. The extraction of user representatives proposed in the embodiment of the present invention takes social utility into account, and a utility function is proposed to evaluate the selected user representatives.
In one embodiment of the present invention, the utility function is:
where G = (V, E) denotes the social network, V is the set of |V| = N users, E ⊆ V × V is the set of |E| = M user relationships, X ∈ R^{n×d} is the attribute matrix, T is a user subset, λ_l is a positive integer related to the size of the attribute group (V_l, a_{j_l}) with default value |V_l|-1, P(T, l) is the representative degree of user subset T for the attribute group (V_l, a_{j_l}), V_l is the set of all users of attribute l, and a_{j_l} is an attribute.
Specifically, the main purpose of the embodiment of the present invention is to extract user representatives using the utility function. The social network can be written as G = (V, E), where V is the set of |V| = N users and E ⊆ V × V is the set of |E| = M user relationships. In addition, define the attribute set A = {a_j}, j = 1, ..., d, where d is the number of attributes. The embodiment then obtains an attribute matrix X ∈ R^{n×d} in which each row X_i = [x_ik], k = 1, ..., d, corresponds to the attributes of user v_i ∈ V, and x_ik denotes the value of user v_i on attribute a_k.
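The notation above can be made concrete with a small Python sketch. The toy data, the helper names, and the rule that forms an attribute group from users sharing one attribute value are assumptions for illustration only.

```python
# Hypothetical toy network with N = 4 users and M = 3 relationships.
V = [0, 1, 2, 3]
E = {(0, 1), (1, 2), (2, 3)}

# Attribute matrix X with one row per user and d = 2 columns;
# X[i][k] plays the role of x_ik, the value of user v_i on attribute a_k.
X = [
    [1, 0],
    [1, 1],
    [0, 1],
    [0, 0],
]

def attribute_value(X, i, k):
    """Return x_ik, the value of user v_i on attribute a_k."""
    return X[i][k]

def attribute_group(X, k, value):
    """Users whose value on attribute a_k equals `value`.
    This grouping rule is an assumption, not taken from the description."""
    return [i for i, row in enumerate(X) if row[k] == value]

group = attribute_group(X, 0, 1)  # users sharing value 1 on attribute a_0
```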
The embodiment of the present invention can define a function to express the representative degree of a user subset T. Specifically, given any user subset T ⊆ V and a user v_i, define the function R(T, v_i, a_j) as the degree to which T represents v_i on a specific attribute a_j, with values in [0, 1]. When R(T, v_i, a_j) = 1, user v_i is perfectly represented by user subset T on attribute a_j. In particular, when T = ∅, R(T, v_i, a_j) = 0 for any v_i and a_j. The definition of the representative degree function R is very flexible, and other information can conveniently be incorporated in the same way. For example, when the embodiment considers information in the network, a simple and direct approach is to treat each user as an attribute and to define a neighborhood for every node v_i.
Based on the above definitions, the embodiment of the present invention proposes selecting user representatives with the utility function Q(G, X, 𝒢, T), where the utility function is:
where λ_l is a positive integer related to the size of the attribute group (V_l, a_{j_l}).
In step s 102, several user representatives are divided into according to the attribute of each user representative of several user representatives more A set of properties, to obtain the model that set of properties represents degree.
In one embodiment of the invention, set of properties represents the model of degree
Wherein, R (T, vi,ajl) it is in some specific object ajlUpper user's subset T is to user viRepresentative degree, value model It encloses for [0,1], default definition has a line to be connected to v when T interior jointsi, then R (T, vi,ajl) value be 1, otherwise value be 0.
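The default rule for R can be sketched in Python as follows. The sketch treats relationships as undirected and takes the group-level degree P to be the sum of R over the group's members; both choices, along with the function names, are assumptions rather than details confirmed by the description.

```python
def build_adjacency(edges):
    """Build an undirected neighbor map from (u, v) pairs (assumed undirected)."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency

def R(T, v_i, adjacency):
    """Default representative degree: 1.0 if some node in T has an edge to v_i,
    otherwise 0.0. The attribute argument is omitted because the default rule
    does not depend on it. An empty T always yields 0.0."""
    return 1.0 if adjacency.get(v_i, set()) & set(T) else 0.0

def P(T, group, adjacency):
    """Assumed group-level degree: the sum of R over the users in the group."""
    return sum(R(T, v, adjacency) for v in group)

edges = [(0, 1), (1, 2), (2, 3)]
adjacency = build_adjacency(edges)
degree = P([1], [0, 2, 3], adjacency)  # users 0 and 2 neighbor user 1, user 3 does not
```

Note that under this literal reading a user in T is not counted as representing itself unless it also has a neighbor in T.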
It can be understood that, when users are selected, they are divided into different groups according to their different attributes, and a model is given for the different groups:
If, for attribute a_{j_l}, all users in V_l are perfectly represented by subset T, the attribute group (V_l, a_{j_l}) is said to be perfectly represented by T.
Specifically, for an attribute group (V_l, a_{j_l}) (1 ≤ l ≤ t), the representative degree of user subset T for the attribute group (V_l, a_{j_l}) is defined as:
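A minimal sketch of such a definition, assuming the group-level degree is the sum of the per-user degree R over the members of the group (the exact form is an assumption and is not taken from the text):

```latex
P(T, l) = \sum_{v_i \in V_l} R(T, v_i, a_{j_l}), \qquad 1 \le l \le t
```

Under this reading, perfect representation of a group corresponds to P(T, l) = |V_l|, since every member then contributes R = 1.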
In step S103, the maximum value of the utility function is obtained, so as to select representative users from the multiple attribute groups.
It can be understood that the embodiment of the present invention solves for the maximum of the utility function Q(G, X, 𝒢, T). The maximum indicates that the selection of representative users is optimal, and the problem can be converted into:
Specifically, the inputs are: (1) a social network G = (V, E), where V is the set of all users in the network and E is the edge set denoting the user relationships in the network; (2) the user attribute value matrix X ∈ R^{n×d}; (3) the set of attribute groups 𝒢; (4) the number k of representative users and the representative user set T; and (5) the utility function Q, which quantifies how well the selected set of representative users represents all attributes. The original problem can then be converted into an optimization problem:
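The optimization problem can be sketched as follows, assuming the selection is constrained to exactly k representatives and writing the set of attribute groups as \mathcal{G} to distinguish it from the network G (both notational choices are assumptions):

```latex
T^{*} = \arg\max_{T \subseteq V,\; |T| = k} \; Q(G, X, \mathcal{G}, T)
```

Here T^{*} denotes the selected set of representative users; whether the constraint is |T| = k or |T| \le k is not stated in the description.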
For the problem stated above of finding representative users in a large-scale social network, the embodiment of the present invention constructs a representative user selection model. The model mainly applies diversification-strategy sampling, whose purpose is to diversify the selected representatives.
In step S104, according to the representative users, the group with the worst representative degree is selected by diversification-strategy sampling.
Further, in one embodiment of the present invention, if P(T, l) > 0 for all 1 ≤ l ≤ t, then all attribute groups are represented, and each attribute group is given a relatively balanced P(T, l), so as to prevent any attribute group from being over-represented or under-represented.
It can be understood that, assuming P(T, l) > 0 for all 1 ≤ l ≤ t, that is, all attribute groups are represented to some extent, the "diversification" emphasized here means giving each attribute group a relatively balanced P(T, l), so as to avoid some attribute groups being over-represented or under-represented.
Specifically, the embodiment of the present invention uses diversification-strategy sampling to select the group with the worst representativeness. The user representative model chosen by the embodiment is mainly diversification-strategy sampling, whose purpose is to diversify the selected representatives. When k is small, diversification means that the selected representatives come from as many attribute groups as possible. This case is not discussed in the embodiment, because in general the number of representatives chosen is larger than the number of attribute groups, so selecting one representative from each attribute group is enough to guarantee that all attribute groups are covered. Assuming P(T, l) > 0 for all 1 ≤ l ≤ t, that is, all attribute groups are represented to some extent, the "diversification" emphasized here means giving each attribute group a relatively balanced P(T, l), so as to avoid some attribute groups being over-represented or under-represented.
The size of an attribute group can be critical. For attribute groups with more members, the representative degree is generally relatively larger. The goal of the embodiment of the present invention is to avoid this as far as possible, so that the values of P(T, l) are relatively balanced for any l. At the same time, an extreme case must be considered: all attribute groups have the same size, and the goal is to make all P(T, l) as equal as possible, which P(T, l) = 0 for every l would satisfy. In that case the selected set of representative users is almost the empty set, because P(T, l) = 0 for every l. This situation must be avoided when representative users are actually selected, and the embodiment still requires every attribute group to have a certain representative degree. For diversification-strategy sampling, the utility function is therefore given as follows:
where λ_l is a positive integer related to the size of the attribute group (V_l, a_{j_l}), and one can generally take λ_l = |V_l|-1. The attribute group with the smallest value of λ_l · P(T, l) is called the "group with the worst representative degree". The performance of the utility function depends on how well this worst group is represented.
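The role of the worst group can be illustrated with a short Python sketch. It assumes the utility is the minimum of λ_l · P(T, l) over the attribute groups and reads the default weight as the reciprocal of the group size, which is one possible reading of the default given above. Both are assumptions, and the function name and toy numbers are hypothetical.

```python
def worst_group_utility(P_values, group_sizes):
    """Return (Q, worst_group) where Q = min over groups of lambda_l * P(T, l).
    Assumption: lambda_l = 1 / |V_l|, so each score is the represented fraction."""
    scores = {l: P_values[l] / group_sizes[l] for l in P_values}
    worst = min(scores, key=scores.get)
    return scores[worst], worst

# Hypothetical representative degrees P(T, l) and group sizes |V_l|.
P_values = {"a": 3.0, "b": 1.0, "c": 2.0}
group_sizes = {"a": 4, "b": 5, "c": 2}

Q, worst = worst_group_utility(P_values, group_sizes)
# Weighted scores are a: 0.75, b: 0.2, c: 1.0, so group "b" is the worst group.
```

Because the utility equals the score of the worst group, adding representatives for an already well-covered group leaves it unchanged, which is what pushes the selection toward balance.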
In addition, the difficulty addressed by the embodiment of the present invention is to find an efficient algorithm for the huge and complex data of the internet, one that can handle large-scale data efficiently. The significance of the embodiment is that the designed sampling algorithm guarantees a certain precision while also accommodating the dynamic nature of the data.
Sampling is an effective way to reduce data scale; its basic idea is to replace the original data with a small-scale sample. In actual analysis, the embodiment of the present invention need only attend to the selected sample: to some extent, the selected nodes contribute more to the network and are more worth studying than the unselected nodes. Based on the idea of sampling, the embodiment studies how to select user representatives effectively from a large-scale network. Specifically, given a large-scale social network, a sampling algorithm is needed that finds a certain number of users who represent as many of the users in the network as possible. Further, the sampling algorithm is expected to guarantee a certain efficiency as well as a certain precision, so as to cope with the dynamic nature of the data.
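One way such a sampling algorithm could look is a greedy selection that repeatedly helps the currently worst-represented group. The description does not spell out its algorithm, so the following Python sketch is an assumption throughout: the greedy rule, the edge-based coverage test, the 1/|V_l| weighting, and all names are hypothetical.

```python
def greedy_diversified_sample(adjacency, groups, k):
    """Greedily pick k users. Each step finds the group with the lowest
    represented fraction and adds the user that raises that fraction most,
    breaking ties by total coverage and then by smallest user id."""
    def covered(T, v):
        # Default rule: v is represented if some node in T has an edge to v.
        return bool(adjacency.get(v, set()) & T)

    def score(T, members):
        # Represented fraction of a group, i.e. lambda_l * P(T, l) with lambda_l = 1/|V_l|.
        return sum(covered(T, v) for v in members) / len(members)

    T = set()
    candidates = set(adjacency)
    for _ in range(min(k, len(candidates))):
        worst = min(groups, key=lambda l: score(T, groups[l]))
        best = max(
            sorted(candidates - T),
            key=lambda u: (
                score(T | {u}, groups[worst]),
                sum(score(T | {u}, m) for m in groups.values()),
            ),
        )
        T.add(best)
    return T

# Toy path network 0 - 1 - 2 - 3 - 4 - 5 with two attribute groups.
adjacency = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
groups = {"x": [0, 1, 2], "y": [3, 4, 5]}
selected = greedy_diversified_sample(adjacency, groups, k=2)
```

On this toy input the first pick serves group "x" and the second serves group "y", so the two representatives end up spread across both groups rather than concentrated in one.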
Further, for the user sampling problem in large-scale social networks, the embodiment of the present invention gives a general model. The basic idea is to propose a diversification-strategy sampling model for the user sampling problem, so that the extracted samples come from as many attribute groups as possible, and to design an effective algorithm. Experiments show that the method is clearly better than other algorithms: it has higher accuracy and is also efficient in terms of time complexity.
In summary, the basic goal of the embodiment of the present invention is to design a diversification-strategy sampling model. The user sampling algorithm for large-scale networks effectively reduces data scale, facilitates subsequent research and analysis, and can be widely used in data preprocessing. The description first discusses extracting user representatives from a large-scale social network, then defines the representative degree of an attribute group and the discovery of representative users and explains both definitions, next designs the diversification-strategy sampling algorithm, and finally applies the newly proposed algorithm to real data in experiments. The experimental results show that the newly proposed algorithm is clearly better than previous baseline algorithms.
In the method for large-scale social network user sampling based on a diversification strategy proposed according to the embodiment of the present invention, the utility function also takes the diversity of attributes into account, and it is possible to see which attribute groups each selected user contributes to and how large the contribution is. The data scale of the network is thereby effectively reduced so that the data becomes easier to process. The method also helps remove unrepresentative users so that study can concentrate on the more valuable user groups in the network, which effectively improves sampling accuracy while remaining efficient in terms of time complexity.
The device for large-scale social network user sampling based on a diversification strategy proposed according to the embodiment of the present invention is described next with reference to the drawings.
Fig. 2 is a structural schematic diagram of the device for large-scale social network user sampling based on a diversification strategy according to one embodiment of the present invention.
As shown in Fig. 2, the device 10 for large-scale social network user sampling based on a diversification strategy includes: an extraction module 100, a grouping module 200, an acquisition module 300, and a processing module 400.
The extraction module 100 is configured to extract several user representatives by means of a utility function. The grouping module 200 is configured to divide the user representatives into multiple attribute groups according to the attributes of each user representative, so as to obtain a model of the representative degree of an attribute group. The acquisition module 300 is configured to obtain the maximum value of the utility function, so as to select representative users from the multiple attribute groups. The processing module 400 is configured to select, according to the representative users, the group with the worst representative degree by diversification-strategy sampling. The device 10 of the embodiment of the present invention effectively reduces the data scale of the network so that the data becomes easier to process. It also helps remove unrepresentative users so that study can concentrate on the more valuable user groups in the network, which effectively improves sampling accuracy while remaining efficient in terms of time complexity.
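The division of labor among the four modules can be pictured as a simple pipeline. The Python sketch below is only an assumed wiring: the class name, the callable signatures, and the toy stand-in functions are hypothetical and are not the device's actual implementation.

```python
class SamplingDevice:
    """Hypothetical wiring of the four modules as pluggable callables."""

    def __init__(self, extract, group, acquire, process):
        self.extract = extract  # stands in for extraction module 100
        self.group = group      # stands in for grouping module 200
        self.acquire = acquire  # stands in for acquisition module 300
        self.process = process  # stands in for processing module 400

    def run(self, network, attributes, k):
        candidates = self.extract(network, attributes)
        groups = self.group(candidates, attributes)
        representatives = self.acquire(network, groups, k)
        worst = self.process(network, groups, representatives)
        return representatives, worst

# Toy stand-ins, chosen only so that the pipeline runs end to end.
device = SamplingDevice(
    extract=lambda network, attributes: sorted(network),
    group=lambda users, attributes: {
        a: [u for u in users if attributes[u] == a]
        for a in sorted(set(attributes.values()))
    },
    acquire=lambda network, groups, k: [m[0] for m in groups.values()][:k],
    process=lambda network, groups, reps: min(
        groups, key=lambda a: sum(u in reps for u in groups[a]) / len(groups[a])
    ),
)

network = {0: {1}, 1: {0, 2}, 2: {1}}
attributes = {0: "student", 1: "student", 2: "teacher"}
reps, worst = device.run(network, attributes, k=2)
```

Passing the modules in as callables keeps each stage replaceable, for example swapping the toy acquisition step for a real maximization of the utility function.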
Further, in one embodiment of the present invention, the utility function is:
where G = (V, E) denotes the social network, V is the set of |V| = N users, E ⊆ V × V is the set of |E| = M user relationships, X ∈ R^{n×d} is the attribute matrix, T is a user subset, λ_l is a positive integer related to the size of the attribute group (V_l, a_{j_l}) with default value |V_l|-1, P(T, l) is the representative degree of user subset T for the attribute group (V_l, a_{j_l}), V_l is the set of all users of attribute l, and a_{j_l} is an attribute.
Further, in one embodiment of the present invention, the model of the representative degree of an attribute group is:
where R(T, v_i, a_{j_l}) is the representative degree of user subset T for user v_i on a specific attribute a_{j_l}, with values in [0, 1]. By default, R(T, v_i, a_{j_l}) is 1 when a node in T has an edge connected to v_i, and 0 otherwise.
Further, in one embodiment of the present invention, if P(T, l) > 0 for all 1 ≤ l ≤ t, then all attribute groups are represented, and each attribute group is given a relatively balanced P(T, l), so as to prevent any attribute group from being over-represented or under-represented.
It should be noted that the foregoing explanation of the method embodiment also applies to the device for large-scale social network user sampling based on a diversification strategy of this embodiment, and is not repeated here.
In the device for large-scale social network user sampling based on a diversification strategy proposed according to the embodiment of the present invention, the utility function also takes the diversity of attributes into account, and it is possible to see which attribute groups each selected user contributes to and how large the contribution is. The data scale of the network is thereby effectively reduced so that the data becomes easier to process. The device also helps remove unrepresentative users so that study can concentrate on the more valuable user groups in the network, which effectively improves sampling accuracy while remaining efficient in terms of time complexity.
In the description of the present invention, it is to be understood that, term "center", " longitudinal direction ", " transverse direction ", " length ", " width ", " thickness ", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom" "inner", "outside", " up time The orientation or positional relationship of the instructions such as needle ", " counterclockwise ", " axial direction ", " radial direction ", " circumferential direction " be orientation based on ... shown in the drawings or Position relationship is merely for convenience of description of the present invention and simplification of the description, and does not indicate or imply the indicated device or element must There must be specific orientation, with specific azimuth configuration and operation, therefore be not considered as limiting the invention.
In addition, term " first ", " second " are used for description purposes only, it is not understood to indicate or imply relative importance Or implicitly indicate the quantity of indicated technical characteristic.Define " first " as a result, the feature of " second " can be expressed or Implicitly include at least one this feature.In the description of the present invention, the meaning of " plurality " is at least two, such as two, three It is a etc., unless otherwise specifically defined.
In the present invention unless specifically defined or limited otherwise, term " installation ", " connected ", " connection ", " fixation " etc. Term shall be understood in a broad sense, for example, it may be being fixedly connected, may be a detachable connection, or integral;Can be that machinery connects It connects, can also be electrical connection;It can be directly connected, can also can be indirectly connected through an intermediary in two elements The interaction relationship of the connection in portion or two elements, unless otherwise restricted clearly.For those of ordinary skill in the art For, the specific meanings of the above terms in the present invention can be understood according to specific conditions.
In the present invention, unless otherwise expressly specified and limited, a first feature being "on" or "under" a second feature may mean that the first and second features are in direct contact, or that they are in indirect contact through an intermediary. Moreover, a first feature being "on", "above", or "over" a second feature may mean that the first feature is directly above or obliquely above the second feature, or may simply mean that the first feature is at a higher level than the second feature. A first feature being "under", "below", or "beneath" a second feature may mean that the first feature is directly below or obliquely below the second feature, or may simply mean that the first feature is at a lower level than the second feature.
In the description of this specification, a description referring to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not conflict with one another, those skilled in the art may combine the different embodiments or examples described in this specification, as well as the features of those different embodiments or examples.
Although embodiments of the present invention have been shown and described above, it should be understood that the above embodiments are exemplary and shall not be construed as limiting the present invention. Those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present invention.

Claims (8)

1. A method for sampling users of a large-scale social network based on a diversification strategy, characterized by comprising the following steps:
extracting a plurality of user representatives by means of a utility function;
dividing the plurality of user representatives into a plurality of attribute groups according to the attribute of each user representative, so as to obtain a model of the representative degree of an attribute group;
obtaining the maximum value of the utility function, so as to select representative users from the plurality of attribute groups; and
sampling, according to the representative users, with a diversification strategy that selects the attribute group with the worst representative degree.
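The steps of claim 1 can be illustrated with a small greedy sketch in Python. It is not the claimed procedure itself: the claims do not state how the selection loop is organized, so the greedy loop, the adjacency-set graph layout, the normalization of coverage by group size, and the names `covered_count` and `diversified_sample` are all assumptions made for illustration. The sketch treats a user as represented when at least one selected user is connected to it, and at each step it serves the attribute group that is currently worst represented.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # undirected adjacency sets (illustrative layout)


def covered_count(graph: Graph, subset: Set[str], group: Set[str]) -> int:
    """Count users in `group` that have at least one neighbor in `subset`."""
    return sum(1 for user in group if graph.get(user, set()) & subset)


def diversified_sample(graph: Graph, groups: Dict[str, Set[str]], k: int) -> Set[str]:
    """Greedily pick up to k users, always serving the worst-covered group."""
    selected: Set[str] = set()
    for _ in range(k):
        candidates = sorted(set(graph) - selected)
        if not candidates:
            break
        # Fraction of each non-empty group currently represented by `selected`.
        coverage = {
            name: covered_count(graph, selected, members) / len(members)
            for name, members in groups.items()
            if members
        }
        if not coverage:
            break
        # Worst-represented group; ties go to the first group in dict order.
        worst = min(coverage, key=coverage.get)
        # Candidate whose addition covers the most users of the worst group.
        best = max(
            candidates,
            key=lambda u: covered_count(graph, selected | {u}, groups[worst]),
        )
        selected.add(best)
    return selected
```

Sorting the candidates only makes tie-breaking deterministic; a practical implementation would likely avoid rescanning every candidate on each round.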
2. The method for sampling users of a large-scale social network based on a diversification strategy according to claim 1, characterized in that the utility function is:
wherein G = (V, E) denotes the social network, V denotes the node set containing |V| = N users, E denotes the edge set containing |E| = M user relationships, X ∈ R^(n×d) is the attribute matrix, T is a user subset, λ_l is a positive integer related to the size of the attribute group (V_l, a_jl), the default value of λ_l is |V_l|-1, P(T, l) is the representative degree of the user subset T for the attribute group (V_l, a_jl), V_l is the set of all users with attribute l, and a_jl is an attribute.
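Because the formula of the utility function is not reproduced in this text, the following Python sketch only illustrates one plausible reading of the symbols defined in claim 2: a weighted sum of the per-group representative degrees P(T, l) with weights λ_l. The weighted-sum form and the function name `utility` are assumptions, and the weights are passed in explicitly rather than derived from a default.

```python
from typing import Sequence


def utility(group_degrees: Sequence[float], weights: Sequence[float]) -> float:
    """Assumed form: Q(T) = sum over l of lambda_l * P(T, l).

    group_degrees[l] plays the role of P(T, l) and weights[l] the role of
    lambda_l. The weighted-sum form is an illustrative assumption.
    """
    if len(group_degrees) != len(weights):
        raise ValueError("one weight is required per attribute group")
    return sum(w * p for w, p in zip(weights, group_degrees))
```

For example, with two attribute groups whose representative degrees are 2.0 and 1.0 and whose weights are 0.5 and 1.0, this assumed utility evaluates to 0.5 * 2.0 + 1.0 * 1.0.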
3. The method for sampling users of a large-scale social network based on a diversification strategy according to claim 2, characterized in that the model of the representative degree of an attribute group is:
wherein R(T, v_i, a_jl) is the representative degree of the user subset T for user v_i on a specific attribute a_jl, with values in the range [0, 1]; by default, R(T, v_i, a_jl) takes the value 1 when a node in T has an edge connected to v_i, and 0 otherwise.
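The default rule for R(T, v_i, a_jl) stated in claim 3 is concrete enough to sketch in Python. The function `representative_degree` follows that rule directly. The function `group_representative_degree` is an assumption: since the model formula is not reproduced in this text, it simply sums R over the users of one attribute group as a stand-in for P(T, l). The undirected adjacency-set graph layout is also illustrative.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # undirected adjacency sets (illustrative layout)


def representative_degree(graph: Graph, subset: Set[str], user: str) -> float:
    """Default R(T, v_i, a_jl): 1.0 if some node in T has an edge to the user."""
    neighbors = graph.get(user, set())
    return 1.0 if neighbors & subset else 0.0


def group_representative_degree(graph: Graph, subset: Set[str], group: Set[str]) -> float:
    """Assumed stand-in for P(T, l): sum of R over all users in the group V_l."""
    return float(sum(representative_degree(graph, subset, user) for user in group))
```

Under this strict reading, a user who belongs to T but has no neighbor in T is not counted as represented; whether selected users should count as representing themselves is left open here.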
4. The method for sampling users of a large-scale social network based on a diversification strategy according to claim 3, characterized in that, if P(T, l) > 0 for 1 ≤ l ≤ t, then all attribute groups are represented, and each attribute group has a relatively balanced P(T, l), so as to avoid any attribute group being over-represented or under-represented.
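The condition in claim 4 can be checked with a short Python sketch. The test P(T, l) > 0 for every group comes from the claim. The claim does not define how "relatively balanced" is measured, so the ratio below (smallest share of represented users divided by the largest share) and the names `all_groups_represented` and `balance_ratio` are hypothetical choices for illustration.

```python
from typing import Sequence


def all_groups_represented(group_degrees: Sequence[float]) -> bool:
    """True when P(T, l) > 0 holds for every attribute group l."""
    return all(p > 0 for p in group_degrees)


def balance_ratio(group_degrees: Sequence[float], group_sizes: Sequence[int]) -> float:
    """Hypothetical balance measure in [0, 1]; 1.0 means equal shares.

    Each share is P(T, l) divided by the (positive) size of group l.
    """
    if len(group_degrees) != len(group_sizes):
        raise ValueError("one size is required per attribute group")
    shares = [p / n for p, n in zip(group_degrees, group_sizes)]
    highest = max(shares, default=0.0)
    if highest == 0:
        return 0.0  # nothing is represented at all
    return min(shares) / highest
```

A ratio close to 1.0 indicates that no group is represented much more heavily than another, which is the situation claim 4 aims for.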
5. A device for sampling users of a large-scale social network based on a diversification strategy, characterized by comprising:
an extraction module, configured to extract a plurality of user representatives by means of a utility function;
a grouping module, configured to divide the plurality of user representatives into a plurality of attribute groups according to the attribute of each user representative, so as to obtain a model of the representative degree of an attribute group;
an acquisition module, configured to obtain the maximum value of the utility function, so as to select representative users from the plurality of attribute groups; and
a processing module, configured to sample, according to the representative users, with a diversification strategy that selects the attribute group with the worst representative degree.
6. The device for sampling users of a large-scale social network based on a diversification strategy according to claim 5, characterized in that the utility function is:
wherein G = (V, E) denotes the social network, V denotes the node set containing |V| = N users, E denotes the edge set containing |E| = M user relationships, X ∈ R^(n×d) is the attribute matrix, T is a user subset, λ_l is a positive integer related to the size of the attribute group (V_l, a_jl), the default value of λ_l is |V_l|-1, P(T, l) is the representative degree of the user subset T for the attribute group (V_l, a_jl), V_l is the set of all users with attribute l, and a_jl is an attribute.
7. The device for sampling users of a large-scale social network based on a diversification strategy according to claim 6, characterized in that the model of the representative degree of an attribute group is:
wherein R(T, v_i, a_jl) is the representative degree of the user subset T for user v_i on a specific attribute a_jl, with values in the range [0, 1]; by default, R(T, v_i, a_jl) takes the value 1 when a node in T has an edge connected to v_i, and 0 otherwise.
8. The device for sampling users of a large-scale social network based on a diversification strategy according to claim 7, characterized in that, if P(T, l) > 0 for 1 ≤ l ≤ t, then all attribute groups are represented, and each attribute group has a relatively balanced P(T, l), so as to avoid any attribute group being over-represented or under-represented.
CN201810284916.2A 2018-04-02 2018-04-02 Method and device for sampling large-scale social network users based on diversified strategies Active CN108596444B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810284916.2A CN108596444B (en) 2018-04-02 2018-04-02 Method and device for sampling large-scale social network users based on diversified strategies

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810284916.2A CN108596444B (en) 2018-04-02 2018-04-02 Method and device for sampling large-scale social network users based on diversified strategies

Publications (2)

Publication Number Publication Date
CN108596444A true CN108596444A (en) 2018-09-28
CN108596444B CN108596444B (en) 2021-06-29

Family

ID=63625174

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810284916.2A Active CN108596444B (en) 2018-04-02 2018-04-02 Method and device for sampling large-scale social network users based on diversified strategies

Country Status (1)

Country Link
CN (1) CN108596444B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102831219A (en) * 2012-08-22 2012-12-19 浙江大学 Coverable clustering algorithm applying to community discovery
US20140180977A1 (en) * 2012-12-21 2014-06-26 Nec Laboratories America, Inc. Computationally Efficient Whole Tissue Classifier for Histology Slides
CN105976207A (en) * 2016-05-11 2016-09-28 山东大学 Information search result generation method and system based on multi-attribute dynamic weight distribution
CN106372072A (en) * 2015-07-20 2017-02-01 北京大学 Location-based recognition method for user relations in mobile social network
US20170124652A1 (en) * 2015-11-02 2017-05-04 Andrew Macleod Beaven Portfolio optimization by means of delta ratio quantified estimation error
CN106875278A (en) * 2017-01-19 2017-06-20 浙江工商大学 Social network user portrait method based on random forest

Also Published As

Publication number Publication date
CN108596444B (en) 2021-06-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant