Detailed Description
To help those skilled in the art better understand the technical solutions in one or more embodiments of the present disclosure, those technical solutions are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present disclosure. All other embodiments that a person of ordinary skill in the art can derive from one or more embodiments of the disclosure without creative effort shall fall within the scope of protection of the disclosure.
At least one embodiment of the present specification provides a method for mining a social circle of a user. The social circle may be, for example, the user's circle of relatives and friends, colleague circle, or classmate circle. This embodiment does not limit the scenario in which the mined social circle is applied; for example, it may be applied to credit evaluation of the user, or to commodity recommendation or friend recommendation for the user.
Fig. 1 illustrates a social circle mining method provided by at least one embodiment of the present specification, which may include:
in step 100, a personal relationship network is constructed.
In this step, the personal relationship network is a network formed by establishing association relationships between users. Association relationships between users can take many forms, and how they are established may differ between companies and service scenarios. For example, an association relationship may be established when one user transfers money to another, sends a red packet, or adds the other as a friend; likewise, if user A tops up the mobile phone account of user B, an association relationship between user A and user B can be established.
When constructing the personal relationship network, the users joined by an association relationship should, as far as possible, actually know each other. How to judge that two users know each other can vary with the service scenario; for example, among users with a transfer relationship, two users whose number of transfers exceeds a certain threshold can be considered to know each other.
In addition, the personal relationship network constructed in this step includes a plurality of users that have association relationships with one another. Each user serves as a network node in the personal relationship network, and a connecting edge between two network nodes indicates that the corresponding users have an association relationship.
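As a non-limiting illustration, the construction of the personal relationship network from transfer records may be sketched in Python as follows. The function name build_relationship_network, the (payer, payee) record format, and the threshold value of 3 are assumptions made for this sketch; the embodiment only requires that the number of transfers exceed "a certain threshold".

```python
from collections import Counter, defaultdict


def build_relationship_network(transfers, min_transfers=3):
    """Build an undirected adjacency map from (payer, payee) transfer records.

    A connecting edge is kept only when the pair has at least `min_transfers`
    transfers between them (an assumed threshold), used here as a proxy for
    "the two users know each other".
    """
    pair_counts = Counter()
    for payer, payee in transfers:
        if payer != payee:
            # Direction is ignored: A->B and B->A count toward the same pair.
            pair_counts[frozenset((payer, payee))] += 1

    network = defaultdict(set)
    for pair, count in pair_counts.items():
        if count >= min_transfers:
            u, v = tuple(pair)
            network[u].add(v)
            network[v].add(u)
    return dict(network)


# Illustrative records: A and B transfer three times, B and C three times,
# A and C only once.
transfers = [
    ("A", "B"), ("B", "A"), ("A", "B"),
    ("A", "C"),
    ("B", "C"), ("C", "B"), ("B", "C"),
]
network = build_relationship_network(transfers, min_transfers=3)
```

With these records, users A and C are not joined by a connecting edge, because a single transfer does not reach the assumed threshold.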
The personal relationship network is equivalent to a database of association relationships. When the social circle of a certain user is to be mined, as long as the user is a network node in the personal relationship network, the social circle can be mined based on that network. For example, assuming the personal relationship network constructed in this step includes user A, user B, and user C, when the social circle of user A is to be mined as a basis for credit evaluation of user A, it may be mined based on the personal relationship network; similarly, the social circle of user B may also be mined based on the same network.
Assuming that the social circle of user C, referred to as the target user, is currently to be mined, steps 102 to 108 below are performed so that the social circle of user C is mined based on the personal relationship network.
In addition, the personal relationship network of this step serves as the basis for social circle mining and can be continuously updated to make it more complete. For example, an incorrect user association in the network may be corrected, or a newly found user association may be added to it.
In step 102, a local network to which the target user belongs is extracted from the personal relationship network.
In this step, the target user is one of the network nodes of the personal relationship network.
The personal relationship network constructed in step 100 may be very large, for example containing billions of nodes and billions of edges. Performing community division directly on the whole network would consume substantial resources and might also give unsatisfactory results. Therefore, in this step a local network to which the target user belongs is extracted from the personal relationship network. The local network contains the target user and is a part of the personal relationship network, and every network node in it is directly or indirectly associated with the network node corresponding to the target user.
For example, the local network may be an N-degree local network of the target user, where N is a natural number. An edge node in the N-degree local network is connected to the starting node through N consecutive connecting edges, the starting node being the network node corresponding to the target user.
The local networks for N=1 and N=2 are described below as examples; cases where N is greater than 2 follow by analogy.
Fig. 2 illustrates a one-degree local network of the network node 21 corresponding to the target user. To extract it, the network node 21 is taken as the starting point and each neighbor node directly connected to the starting point is obtained, for example node 22, node 23, node 24, and so on; these are the one-degree neighbor nodes of the starting point. Each one-degree neighbor node has a connecting edge to the starting point; for example, connecting edge L1 exists between node 22 and node 21, and connecting edge L2 exists between node 23 and node 21. Connecting edges may also exist between one-degree neighbor nodes; for example, connecting edge L3 exists between node 22 and node 23. The network shown in Fig. 2, composed of the one-degree neighbor nodes and the starting point, may be used as the one-degree local network of the target user.
Fig. 3 illustrates a two-degree local network of the network node 21 corresponding to the target user. Each one-degree neighbor node is taken as a starting point, and each neighbor node directly connected to it is obtained as a two-degree neighbor node of the target user. For example, node 25 is directly connected to the one-degree neighbor node 26 and is therefore a two-degree neighbor node; node 27 is directly connected to the one-degree neighbor node 23 and is also a two-degree neighbor node. The network formed by the network node 21 together with its one-degree and two-degree neighbor nodes may be referred to as the two-degree local network of the target user.
When N is a natural number greater than 2, the N-degree local network is extracted in a similar way to the one-degree and two-degree local networks. For example, when N is 3, the neighbor nodes directly connected to each two-degree neighbor node are obtained on the basis of the two-degree local network; these are the three-degree neighbor nodes. Adding the three-degree neighbor nodes and the connecting edges between nodes to the two-degree local network yields the three-degree local network. Local networks for other values of N are extracted similarly and are not described in detail.
In general, the N-degree local network for N greater than 1 may be generated as follows. Let N equal i, where i is a natural number greater than 1. After the (i-1)-degree local network of the target user is obtained, the method further includes: taking each (i-1)-degree neighbor node as a starting point, and obtaining each neighbor node directly connected to it as an i-degree neighbor node of the target user; and taking the network formed by the (i-1)-degree local network and the i-degree neighbor nodes as the i-degree local network of the target user.
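As a non-limiting illustration, the degree-by-degree extraction described above may be sketched in Python as a breadth-first expansion. The function name extract_local_network and the adjacency-map representation are assumptions of this sketch. The example network reuses the node numbers and connecting edges that the description gives for Fig. 2 and Fig. 3; node 28 is a hypothetical extra node added only to show that a three-degree neighbor is left out of the two-degree local network.

```python
def extract_local_network(network, target, n):
    """Return (local_network, degree) for the N-degree local network of `target`.

    `network` maps each node to the set of its directly connected neighbors.
    `degree` records, for every kept node, the degree at which it was reached.
    """
    if target not in network:
        raise KeyError(f"{target!r} is not a node of the network")

    degree = {target: 0}
    frontier = [target]
    for i in range(1, n + 1):
        next_frontier = []
        # Each (i-1)-degree neighbor is taken as a starting point ...
        for node in frontier:
            for neighbor in network.get(node, ()):
                # ... and every not-yet-seen direct neighbor is an i-degree neighbor.
                if neighbor not in degree:
                    degree[neighbor] = i
                    next_frontier.append(neighbor)
        frontier = next_frontier

    # Keep only the connecting edges whose two endpoints are both in the local network.
    local = {
        node: {nb for nb in network.get(node, ()) if nb in degree}
        for node in degree
    }
    return local, degree


network = {
    21: {22, 23, 24, 26},
    22: {21, 23},
    23: {21, 22, 27},
    24: {21},
    26: {21, 25},
    25: {26},
    27: {23, 28},
    28: {27},  # hypothetical three-degree neighbor, not taken from the figures
}
one_degree, _ = extract_local_network(network, 21, 1)
two_degree, degrees = extract_local_network(network, 21, 2)
```

Here one_degree contains nodes 21, 22, 23, 24, and 26, while two_degree additionally contains nodes 25 and 27 but not node 28.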
In step 104, community division is performed on the local network to obtain at least one community network.
Each local network is in effect the network formed by one user's contacts and the connections among them. A user gets to know different people at different stages of life, and a pattern can be observed: people met at the same stage often know one another, for example, people met in middle school are classmates of the same class and know each other; people met at different stages rarely know one another, for example, middle school classmates and work colleagues are mostly strangers to each other. In terms of network structure, this pattern means that contacts from the same stage often form a community, while contacts from different stages belong to different communities.
In this step, a community discovery algorithm may be used to identify at least one community included in the local network, where each community may correspond to one user relationship group; for example, the users of a community may be relatives and friends, colleagues, or classmates. A community discovery (community detection) algorithm finds community structure in a network and can also be regarded as a clustering algorithm. Various community discovery algorithms exist, for example the Louvain clustering algorithm, the Fast Unfolding algorithm, and the like.
As a matter of everyday experience, some contacts carry over between adjacent life stages. For example, if B, a senior high school classmate of user A, is admitted to the same college as A, then B is in both A's senior high school classmate community and A's college classmate community. In the network structure, this appears as one node belonging to multiple communities. Therefore, for the at least one identified community, an overlapping community discovery algorithm may further be used to identify overlapping network nodes that appear in multiple communities and to assign each overlapping network node to all of the communities to which it belongs, so as to obtain at least one community network.
For example, referring to Fig. 4, node 41 is located in both community S1 and community S2 and may be referred to as an overlapping network node. Such overlapping network nodes may be identified using an overlapping community discovery algorithm and assigned to each of the communities to which they belong. For example, the resulting community network S1 includes node 41, and the community network S2 also includes node 41.
Various overlapping community discovery algorithms also exist, such as the COPRA algorithm (a community discovery algorithm based on label propagation, proposed by Gregory in 2010). One overlapping community discovery algorithm proceeds as follows:
1) For each network node u in the local network G, a connected graph partitioning algorithm may be used to perform network segmentation on the local network G to form a plurality of sub-networks. For each sub-network egonet_i, a new node u_i is created and connected to all nodes in egonet_i, and the original node u is deleted.
2) Network segmentation is performed on the newly generated network using the connected graph partitioning algorithm to form a plurality of connected subgraphs. Connected graph partitioning divides a graph G into connected subgraphs such that any two nodes i and j are in the same subgraph if and only if a path exists between i and j.
3) Each node u_i in each connected subgraph is mapped back to the original node u from which it was created in step 1). Each connected subgraph is then a community network.
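The three steps above can be read in more than one way. The following Python sketch follows one reading, which is assumed here rather than stated by the embodiment: in step 1) the sub-networks egonet_i of a node u are taken to be the connected pieces formed by the neighbors of u once u itself is set aside, and a connecting edge between u and v in G becomes an edge between the copy of u that faces v and the copy of v that faces u. The function names are hypothetical, and the toy graph only mimics the situation of node 41 in Fig. 4.

```python
def connected_components(nodes, adjacency):
    """Split `nodes` into connected components, using only edges inside `nodes`."""
    nodes = set(nodes)
    seen = set()
    components = []
    for start in sorted(nodes):
        if start in seen:
            continue
        component = set()
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for nb in adjacency.get(node, ()):
                if nb in nodes and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(component)
    return components


def overlapping_communities(graph):
    """Return a list of node sets; a node may appear in several of them."""
    # Step 1: replace every node u by one copy (u, i) per connected piece of
    # its neighborhood. Nodes without neighbors get no copy and drop out.
    copy_facing = {}  # (u, v) -> the copy of u that is connected to neighbor v
    for u in graph:
        for i, piece in enumerate(connected_components(graph[u], graph)):
            for v in piece:
                copy_facing[(u, v)] = (u, i)

    new_graph = {}
    for (u, v), copy_u in copy_facing.items():
        new_graph.setdefault(copy_u, set()).add(copy_facing[(v, u)])

    # Step 2: connected graph partitioning of the newly generated network.
    pieces = connected_components(new_graph, new_graph)

    # Step 3: map every copy (u, i) back to its original node u.
    return [{u for (u, _) in piece} for piece in pieces]


# Toy graph: node "41" belongs to the triangle {a, b, 41} and the triangle {c, d, 41}.
graph = {
    "41": {"a", "b", "c", "d"},
    "a": {"b", "41"},
    "b": {"a", "41"},
    "c": {"d", "41"},
    "d": {"c", "41"},
}
communities = overlapping_communities(graph)
```

On this toy graph the sketch yields two community networks, and node "41" appears in both of them, which corresponds to the overlapping network node described for Fig. 4.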
After identifying each community network, the type determination of the community network can be continued.
In this example, the type may be determined using a type judgment model whose input is the community aggregation feature of a community network and whose output is the type of social circle to which the community network belongs.
The following explains how to train the type judgment model, which is a multi-class classification model. For example, the types of social circle to which a community network may belong include a family circle, a classmate circle, and a colleague circle, so identifying the type of social circle to which a community network belongs is a multi-class classification problem. The type judgment model may use, for example, logistic regression or a random forest; this example uses a random forest model.
First, the samples and sample features needed for model training are determined; model training is then performed using them to obtain the type judgment model. The following therefore mainly describes how to obtain the samples and how to obtain the sample features.
1) Sample generation:
Sample generation means obtaining labeled social circles, such as a classmate circle or a friend circle.
For example, samples may be obtained by manual labeling through questionnaires: the relationships between a plurality of users are collected by questionnaire, and a relationship may be, for example, relatives and friends, or classmates.
As another example, samples may be obtained based on an assumption: if the user and most of the users in a certain social circle are colleagues, then that social circle is the user's colleague circle, and so on. Under this assumption, only relationship data at the granularity of user pairs (relatives and friends, colleagues, or classmates) needs to be collected, from which samples at the granularity of social circles can be generated. Pair-level relationship type data can come from many different sources. Taking Alipay as an example, many relative-and-friend relationships exist in the close payment service, many classmate relationships exist in the campus card top-up service, and so on.
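As a non-limiting illustration, the assumption-based sample generation may be sketched in Python as follows. The function name label_circle, the pair_relations lookup keyed by unordered user pairs, and the reading of "most" as a share strictly greater than 0.5 are all assumptions of this sketch.

```python
from collections import Counter


def label_circle(user, circle_members, pair_relations, min_share=0.5):
    """Label a social circle with the relation the user has with most members.

    `pair_relations` maps frozenset({user_1, user_2}) to a relation type such
    as "colleague". Returns None when no relation type covers more than
    `min_share` of the other members, so no sample is generated.
    """
    others = [member for member in circle_members if member != user]
    counts = Counter()
    for member in others:
        relation = pair_relations.get(frozenset((user, member)))
        if relation is not None:
            counts[relation] += 1
    if not counts:
        return None
    relation, count = counts.most_common(1)[0]
    return relation if count / len(others) > min_share else None


# Hypothetical pair-level relationship data for user A.
pair_relations = {
    frozenset(("A", "B")): "colleague",
    frozenset(("A", "C")): "colleague",
    frozenset(("A", "D")): "classmate",
}
label = label_circle("A", {"A", "B", "C", "D"}, pair_relations)
unlabeled = label_circle("A", {"A", "E", "F"}, pair_relations)
```

In this example two of the three other members are colleagues of user A, so the circle is labeled as a colleague circle; the second circle has no known pair-level relations and yields no sample.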
2) Sample characteristics:
After the social circle samples are obtained, the sample features can be obtained as follows:
First, at least one basic feature is determined to serve as the basis for identifying the social circle type.
For example, a user's basic features may include age, gender, surname, school, address, place of household registration, etc. Each of these may be referred to as a basic feature, and each user in a social circle has a value for it, such as the user's age.
Second, for each basic feature, feature aggregation is performed on that basic feature across the users in the community network to obtain the user aggregation features corresponding to the basic feature.
That is, to determine the type of a social circle, group features at the granularity of the social circle are extracted.
For example, a basic feature of the users in the community network may be aggregated to obtain user aggregation features. Taking age as an example, the ages of the users can be aggregated in ways that include, but are not limited to, the mode, the variance, and the information entropy.
Mode: the mode is the value that occurs most often in a set of data and is a measure of its central tendency.
Variance: the variance measures the degree of dispersion of a random variable or a set of data.
Information entropy: information entropy measures the degree of disorder of a system. In general, which symbol a source will emit is uncertain, and this uncertainty can be measured by the probabilities of the symbols: a symbol with high probability occurs often and carries little uncertainty, and vice versa.
Taking age as an example, the ages of all N users in a social circle form a set:
$$S = \{age_1, age_2, \ldots, age_N\}$$
The mode of age (Mode) is the age that occurs most often in the set S.
The variance of age (Variance) is the average of the squared differences between each age value and the average age value:
$$\mathrm{Variance} = \frac{1}{N}\sum_{k=1}^{N}\left(age_k - \overline{age}\right)^2, \qquad \overline{age} = \frac{1}{N}\sum_{k=1}^{N} age_k$$
The information entropy of age (Entropy) measures how disordered the ages are and is calculated as follows:
$$\mathrm{Entropy} = -\sum_{i} p_i \log p_i$$
where $p_i$ is the proportion of the elements of S taken up by the i-th distinct age value.
Each of the mode of age, the variance of age, and the information entropy of age may be referred to as a "user aggregation feature" corresponding to the basic feature "age". A basic feature may have one or more corresponding user aggregation features; in the example above, the basic feature "age" has three: the mode of age, the variance of age, and the information entropy of age. Similarly, each other basic feature may also correspond to at least one user aggregation feature.
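As a non-limiting illustration, the three user aggregation features of the basic feature "age" may be computed in Python as follows. The function names are hypothetical. The sketch assumes the population variance (division by the number of users) and the natural logarithm for the entropy, since the description fixes neither choice, and the ages are illustrative values only.

```python
import math
from collections import Counter


def mode(values):
    """Value that occurs most often (the first one encountered wins a tie)."""
    return Counter(values).most_common(1)[0][0]


def variance(values):
    """Average squared difference from the mean (population variance)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def entropy(values):
    """Information entropy of the value distribution, using natural logarithms."""
    total = len(values)
    return -sum(
        (count / total) * math.log(count / total)
        for count in Counter(values).values()
    )


ages = [30, 30, 32, 28]  # illustrative ages of the users in one social circle
age_features = {
    "mode": mode(ages),
    "variance": variance(ages),
    "entropy": entropy(ages),
}
```

For these ages the mode is 30, the variance is 2.0, and the entropy equals 1.5 times the natural logarithm of 2, because the proportions of the distinct ages are 0.5, 0.25, and 0.25.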
Third, the set of user aggregation features corresponding to all the basic features is used as the community aggregation feature. For example, if the basic features include "age" and "surname", the set consisting of the mode of age, the variance of age, the entropy of age, the mode of surname, the variance of surname, and the entropy of surname may be referred to as the community aggregation feature.
The community aggregation feature serves as the sample feature of a social circle and as the model input for social circle type recognition, and the model is trained accordingly.
The type judgment model can be obtained by pre-training, and the trained model can then be applied in the social circle mining process for the target user: after each community network is identified, the pre-trained type judgment model is used to identify the type of social circle to which it belongs.
In step 106, for each community network, the basic features of the users in the community network are aggregated to obtain the community aggregation feature corresponding to the community network.
For example, at least one basic feature may be determined to serve as the basis for identifying the social circle type. For each basic feature, feature aggregation is performed on that basic feature across the users in the community network to obtain the user aggregation features corresponding to the basic feature, of which there is at least one per basic feature. The set of user aggregation features corresponding to all the basic features is then used as the community aggregation feature.
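As a non-limiting illustration, step 106 may be sketched in Python as follows. The function names, the dictionary representation of users, and the feature naming scheme such as "age_mode" are assumptions of this sketch. As a further simplifying assumption, the variance is computed only for numeric basic features, because the description does not say how a variance would be taken over non-numeric values such as surnames. The user records are illustrative values only.

```python
import math
from collections import Counter


def aggregate_feature(values):
    """Aggregate one basic feature over all users of a community network."""
    counts = Counter(values)
    total = len(values)
    features = {
        "mode": counts.most_common(1)[0][0],
        "entropy": -sum((c / total) * math.log(c / total) for c in counts.values()),
    }
    # Assumption of this sketch: variance only for numeric basic features.
    if all(isinstance(v, (int, float)) for v in values):
        mean = sum(values) / total
        features["variance"] = sum((v - mean) ** 2 for v in values) / total
    return features


def community_aggregation_feature(users, basic_features):
    """Collect the user aggregation features of every basic feature."""
    result = {}
    for name in basic_features:
        values = [user[name] for user in users]
        for stat, value in aggregate_feature(values).items():
            result[f"{name}_{stat}"] = value
    return result


community = [
    {"age": 30, "surname": "Li"},
    {"age": 30, "surname": "Li"},
    {"age": 32, "surname": "Wang"},
    {"age": 28, "surname": "Li"},
]
feature = community_aggregation_feature(community, ["age", "surname"])
```

The resulting dictionary holds one entry per user aggregation feature, for example "age_variance" and "surname_entropy", and can be flattened into a fixed-order vector before it is passed to a model.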
In step 108, the community aggregation feature is input, as an input parameter, into the pre-trained type judgment model to obtain the social circle type to which the community network belongs.
For example, assuming that three community networks are identified in the local network of the target user, the type of each community network can be identified by the type judgment model; the type may be, for example, a circle of relatives and friends or a classmate circle.
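As a non-limiting illustration, the train-then-predict flow of the type judgment model may be sketched in Python as follows. To keep the sketch self-contained, a nearest-centroid classifier is used as a stand-in; it is not the random forest or logistic regression named in the description, and only shows how community aggregation features and labeled samples fit together. The function names, the two-dimensional feature vectors (age variance, surname entropy), and all numeric values are invented for this sketch.

```python
import math


def train_centroids(samples):
    """Return label -> mean feature vector from (feature_vector, label) samples."""
    sums, counts = {}, {}
    for vector, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(vector)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], vector)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids, vector):
    """Return the label whose centroid is closest to `vector`."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vector))


# Invented toy samples: (age variance, surname entropy) -> social circle type.
samples = [
    ([150.0, 0.3], "relatives_and_friends"),
    ([180.0, 0.5], "relatives_and_friends"),
    ([1.0, 2.0], "classmates"),
    ([2.0, 2.4], "classmates"),
]
model = train_centroids(samples)
first_type = predict(model, [1.2, 2.1])
second_type = predict(model, [160.0, 0.2])
```

In practice the features would typically be scaled before any distance is computed, and the stand-in would be replaced by the multi-class model actually chosen for the embodiment.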
The social circle mining method combines network structure features with techniques such as community discovery and group feature mining, and identifies the user's social circles of various types, such as relatives and friends, according to the group features of each community. This makes the mining of social relationships deeper and more specific, and the mining of the user's social circles more accurate.
Corresponding to the above social circle mining method, fig. 5 is a schematic structural diagram of a social circle mining device provided in at least one embodiment of the present specification, where the social circle mining device is used for mining a social circle of a target user in a personal relationship network, and the target user is one of network nodes in the personal relationship network; the apparatus may include: a network extraction module 51, a community division module 52, a feature obtaining module 53 and a type discrimination module 54.
A network extraction module 51, configured to extract a local network to which a target user belongs from a personal relationship network;
the community division module 52 is configured to perform community division on the local network to obtain at least one community network;
the feature obtaining module 53 is configured to, for each community network, aggregate basic features of each user in the community network to obtain a community aggregation feature corresponding to the community network;
and a type discrimination module 54, configured to input the community aggregation feature, as an input parameter, into a pre-trained type judgment model to obtain the social circle type to which the community network belongs.
In an example, the network extracting module 51 is specifically configured to extract an N-degree local network of the target user from the personal relationship network, where N is a natural number; and the edge nodes in the N-degree local network are connected with the starting point node through N continuous connecting edges, and the starting point node is the network node corresponding to the target user.
In one example, the community division module 52 is specifically configured to: identify at least one community included in the local network using a community discovery algorithm; for the at least one identified community, identify overlapping network nodes that appear in multiple communities using an overlapping community discovery algorithm; and assign each overlapping network node to all of the communities to which it belongs, to obtain at least one community network.
In one example, the feature obtaining module 53 is specifically configured to: determine at least one basic feature to serve as the basis for identifying the social circle type; for each basic feature, perform feature aggregation on that basic feature across the users in the community network to obtain the user aggregation features corresponding to the basic feature, of which there is at least one per basic feature; and use the set of user aggregation features corresponding to all the basic features as the community aggregation feature.
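As a non-limiting illustration, the cooperation of the four modules may be sketched in Python as a class that chains four callables. The class name, the method name mine, and the trivial stand-in callables in the example are hypothetical; any concrete implementation of the modules 51 to 54 could be plugged in instead.

```python
class SocialCircleMiningDevice:
    """Chains the four modules: extraction, division, feature obtaining, discrimination."""

    def __init__(self, extract_network, divide_communities, obtain_features, discriminate_type):
        self.extract_network = extract_network        # role of module 51
        self.divide_communities = divide_communities  # role of module 52
        self.obtain_features = obtain_features        # role of module 53
        self.discriminate_type = discriminate_type    # role of module 54

    def mine(self, network, target_user):
        """Return a list of (community_network, social_circle_type) pairs."""
        local_network = self.extract_network(network, target_user)
        results = []
        for community in self.divide_communities(local_network):
            feature = self.obtain_features(community)
            results.append((community, self.discriminate_type(feature)))
        return results


# Trivial stand-ins, used only to show how the modules are wired together.
device = SocialCircleMiningDevice(
    extract_network=lambda network, target: network,
    divide_communities=lambda local: [set(local)],
    obtain_features=lambda community: [len(community)],
    discriminate_type=lambda feature: "classmates" if feature[0] > 2 else "relatives_and_friends",
)
circles = device.mine({"A": {"B"}, "B": {"A"}}, "A")
```

Keeping the modules as interchangeable callables reflects the statement that the division into modules is functional and that their functionality may be implemented together or separately.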
The apparatuses or modules illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided into various modules by function, each described separately. Of course, when implementing one or more embodiments of the present description, the functions of the modules may be implemented in one or more pieces of software and/or hardware.
The execution order of the steps in the flows shown in the above-described figures is not limited to the order in the flow charts. Furthermore, each step may be implemented in software, hardware, or a combination thereof; for example, a person skilled in the art may implement it as software code, that is, as computer-executable instructions capable of implementing the logical function of the step. When implemented in software, the executable instructions may be stored in a memory and executed by a processor in the device.
For example, one or more embodiments of the present disclosure also provide a social circle mining apparatus corresponding to the above method. The apparatus may include a processor, a memory, and computer instructions stored on the memory and executable on the processor, the processor being operable to perform the following steps by executing the instructions:
extracting a local network to which a target user belongs from a personal relationship network; the target user is one of the network nodes of the personal relationship network;
carrying out community division on the local network to obtain at least one community network;
for each community network, aggregating the basic characteristics of each user in the community network to obtain community aggregation characteristics corresponding to the community network;
and inputting the community aggregation feature, as an input parameter, into a pre-trained type judgment model to obtain the social circle type to which the community network belongs.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, one or more embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
One or more embodiments of the present description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. One or more embodiments of the specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in the present specification are described in a progressive manner; for parts that are the same or similar among the embodiments, reference may be made from one embodiment to another, and each embodiment focuses on its differences from the others. In particular, since the apparatus embodiment is substantially similar to the method embodiment, its description is relatively brief, and for relevant points reference may be made to the corresponding description of the method embodiment.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The above description is only for the purpose of illustrating the preferred embodiments of the one or more embodiments of the present disclosure, and is not intended to limit the scope of the one or more embodiments of the present disclosure, and any modifications, equivalent substitutions, improvements, etc. made within the spirit and principle of the one or more embodiments of the present disclosure should be included in the scope of the one or more embodiments of the present disclosure.