CN114548582B

CN114548582B - Dynamic social network community evolution prediction method, system, storage medium and device

Info

Publication number: CN114548582B
Application number: CN202210192777.7A
Authority: CN
Inventors: 丁静怡; 成若晖; 宋健; 曹小卫; 焦李成; 吴建设
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2022-02-28
Filing date: 2022-02-28
Publication date: 2024-05-31
Anticipated expiration: 2042-02-28
Also published as: CN114548582A

Abstract

The invention discloses a dynamic social network community evolution prediction method, system, storage medium and device. Starting from the generality of the time window division mechanism, an optimization-based time window division model is established, in which the time frame adaptively adjusts the size and the number of time windows according to the specific network. Experimental verification on real networks shows that the method improves the quality of network community tracking and maintains prediction accuracy even when the training set is reduced.

Description

Dynamic social network community evolution prediction method, system, storage medium and device

Technical Field

The invention belongs to the technical field of social networks, and particularly relates to a dynamic social network community evolution prediction method, a system, a storage medium and equipment.

Background

Social networks are relatively stable systems of relationships formed by interactions between individuals or organizations. A social network is typically represented as a graph, with nodes representing individuals or organizations and edges representing social interactions such as partnerships and friendships. Over time, social networks evolve into dynamic networks driven by user activity. For example, a new user, represented by a node, joins the network, and an old user stops being active or leaves the network. At the same time, edges may be created or disappear due to interactions between users. Today, almost everyone is part of several dynamic social networks, such as Facebook and Twitter. Dynamic social networks contain many interesting patterns and much knowledge, which makes the field of Social Network Analysis (SNA) increasingly popular.

A community is a group of closely connected nodes in the network, and community structure is an important feature of a dynamic social network. As a dynamic social network develops, different communities experience different events, namely birth, death, growth, shrinkage, merging and splitting. Predicting key community events is an important part of current SNA research. Methods for predicting community evolution have practical value in public safety, public health, marketing and other areas. In public safety, for example, observing community evolution helps identify individuals or groups that support or are prone to crime.

Existing community evolution prediction methods generally divide a dynamic network into several time windows of fixed size, whose contents are referred to as snapshots. The dynamic graph is thus represented as a sequence of static snapshots of different time periods. The communities in each snapshot are then identified independently using a community detection algorithm. Next, a tracking algorithm compares all pairs of communities in neighboring snapshots to match them and determine the key events. Finally, a community evolution sequence is built from the key events identified by the community tracking algorithm, several features are extracted from each community, and a model is established to predict the next event the community is likely to experience.

Brodka, in the published paper "Community evolution prediction in dynamic associated networks" (IEEE, 2015), divides the dataset into disjoint and overlapping time frames. Communities are then tracked using the GED community tracking method. A classifier is trained using the event types and community sizes of the three preceding states of a community, together with its current size, as features, and finally predicts the next key event.

Dakiche, in the published paper "Sensitive analysis of timeframe type and size impact on community evolution prediction", partitions the network manually according to network activity. The resulting network snapshots have different sizes and overlap rates: a larger time frame is set during periods of less network activity, and a smaller time frame with a higher overlap rate during periods of more network activity. However, the manual method is not easy to generalize and has no strict theoretical support. Later, Dakiche et al. devised a method for finding the time frame sizes of different networks, in which the snapshot size is estimated from the number of nodes that appear, disappear, and remain between two consecutive independent snapshots. However, the overlap rate between snapshots is fixed, so this partitioning approach is still imperfect.

Dakiche et al. also proposed a new tailored network splitting framework for event prediction (TNSEP). The framework counts the number of edges at different times, and the start and end times of snapshots of different sizes are calculated by assigning each snapshot a fixed percentage of the edge count. Although this yields a non-fixed sequence of time slot sizes based on the distribution of network edges, TNSEP still assumes that its parameters are set in advance and that the overlap rate between time slots is fixed. The overlap rate should also vary with the network itself.

During periods of network inactivity, a large time frame should be set, and the overlap rate must not be so high that computing resources are wasted. When the network is active, small time frames with high overlap should be used to capture more details of the network evolution and avoid missing too much information. The sizes of the time frames and the overlap rates between them should not be fixed, but should vary with changes in the network. Such a network partitioning mechanism, adapted to the network itself, is what is needed in the art.

Disclosure of Invention

In view of the defects in the prior art, the invention provides a dynamic social network community evolution prediction method, system, storage medium and device. Starting from the generality of the time window division mechanism, an optimization-based time window division model is established, in which the time frame adaptively adjusts the size and the number of time windows according to the specific network. Experimental verification on several real networks shows that the quality of network community tracking is improved and that prediction accuracy is maintained even when the training set is reduced.

The invention adopts the following technical scheme:

A dynamic social network community evolution prediction method comprises the following steps:

S1, obtaining an initial population based on a designed gene coding strategy, randomly selecting two parent chromosomes with the same coding length from the initial population, generating offspring by applying a two-point crossover operator to the parent chromosomes, and repairing the offspring so that they satisfy the set constraint conditions; performing a mutation operation on the generated offspring using a mutation operator, and again repairing the offspring based on the constraint conditions; based on the fitness function, screening the offspring using a fast non-dominated sorting operator and an individual crowding distance operator to obtain an optimal division scheme of the adaptive time windows;

S2, dividing the dynamic social network into time windows according to the optimal division scheme obtained in step S1 to obtain the network snapshot under each window, and obtaining the overlapping communities in each network snapshot using the clique percolation method;

S3, for the overlapping community sets obtained in step S2, calculating the inclusion degree of every pair of communities in adjacent snapshots, in chronological order of the snapshot timestamps, using the GED community tracking method, to obtain a community evolution sequence for each community, and identifying evolution events from the community evolution sequence to obtain the key events experienced by each sequence;

S4, calculating several topological features of all communities in the community evolution sequences obtained in step S3; for communities whose evolution sequence spans at least 5 network snapshots, extracting 5 historical community features, splicing them with the 4 evolution events experienced to form a one-dimensional feature, and taking the evolution event at the next moment of the community as the label of the feature; training a RandomForest classifier on the labeled feature data, and making predictions using the trained RandomForest classifier.

Specifically, in step S1, the start times of the time windows are used as genes to form one chromosome, the end times of the time windows form another chromosome, and the two chromosomes together form a complete adaptive time window division scheme; in population initialization, the number of time windows ranges from 5 to the number of timestamps in the original data, and 5 random individuals are generated for each length, so that the number of windows is also adaptive; the method comprises the following steps:

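a dynamic social network is partitioned into $T = \{T^{start}, T^{end} \mid T^{start}, T^{end} \in \{t_1, \ldots, t_m\}\}$. Each set of adaptive timestamps is an individual containing two chromosomes, representing the start and end times of all windows, respectively; the constraints for generating the initial individual are as follows:

$$T^{start}_1 = t_1, \qquad T^{end}_\tau = t_m, \qquad T^{start}_i < T^{start}_{i+1} \le T^{end}_i, \qquad T^{end}_i < T^{end}_{i+1}$$

wherein $T^{start}_1 = t_1$ defines that the adaptive time frame starts from the start of the dynamic network; $T^{end}_\tau = t_m$ defines that the end time of the adaptive time frame is the end time of the dynamic social network; and $T^{start}_i < T^{start}_{i+1} \le T^{end}_i$ and $T^{end}_i < T^{end}_{i+1}$ stipulate that the latter time period does not completely contain the former time period, that no part of the network is missing between the former time period and the latter time period, and that adjacent time windows may overlap by any length or not overlap at all.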

Specifically, in step S1, the fitness function is as follows:

F2＝δ²(Fl)

wherein F1 is the error between the expected value of the fluctuation and the set value; τ is the number of time windows; δ is the set value, between 0 and 1; fl_s is the network fluctuation between two adjacent consecutive independent snapshots; Fl is the set of fluctuations; F2 is the standard deviation of the fluctuation set Fl; S_i is the number of nodes contained in the snapshot of time window T_i; S_{i+1} is the number of nodes contained in the snapshot of time window T_{i+1}; N_a is the number of nodes that appear between two consecutive independent snapshots; N_d is the number of nodes that disappear between two consecutive independent snapshots; and N_r is the number of nodes that remain between two consecutive independent snapshots.

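Specifically, in step S3, the inclusion degree $I(C^{i}_{T_k}, C^{j}_{T_{k+1}})$ of every pair of communities between adjacent snapshots is calculated as follows:

$$I(C^{i}_{T_k}, C^{j}_{T_{k+1}}) = \frac{|C^{i}_{T_k} \cap C^{j}_{T_{k+1}}|}{|C^{i}_{T_k}|} \cdot \frac{\sum_{x \in C^{i}_{T_k} \cap C^{j}_{T_{k+1}}} NI(x)}{\sum_{x \in C^{i}_{T_k}} NI(x)}$$

wherein the first part is the proportion of the nodes common to the two communities relative to the first community; NI(x) in the second part is a node indicator of node x calculated in the snapshot graph of time window T_k; $C^{i}_{T_k}$ is the community labeled i in the snapshot of time window T_k; and $C^{j}_{T_{k+1}}$ is the community labeled j in the snapshot of time window T_{k+1}.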

Specifically, in step S3, the key events include formation, survival, merging, splitting, dissolution, growth and shrinkage.

Specifically, in step S4, various topological features of all communities are as follows:

wherein x₁…x₇ are the topological features of a community in the snapshot corresponding to time window T_k, the event is the one that happens to the community at the next moment, and the community feature is obtained by splicing the topological features and the event.

Specifically, in step S4, the data format of the one-dimensional feature is:

wherein the spliced entries are the community features of the community at moments T_{k-4}, T_{k-3}, …, T_k in the community evolution sequence.

In a second aspect, an embodiment of the present invention provides a dynamic social network community evolution prediction system, which is characterized in that the system includes:

a division module, which obtains an initial population based on a designed gene coding strategy, randomly selects two parent chromosomes with the same coding length from the initial population, generates offspring by applying a two-point crossover operator to the parent chromosomes, and repairs the offspring so that they satisfy the set constraint conditions; performs a mutation operation on the generated offspring using a mutation operator, and again repairs the offspring based on the constraint conditions; and, based on the fitness function, screens the offspring using a fast non-dominated sorting operator and an individual crowding distance operator to obtain an optimal division scheme of the adaptive time windows;

a community module, which divides the dynamic social network into time windows according to the optimal division scheme obtained by the division module to obtain the network snapshot under each window, and obtains the overlapping communities in each network snapshot using the clique percolation method;

a tracking module, which, for the overlapping community sets obtained by the community module, calculates the inclusion degree of every pair of communities in adjacent snapshots, in chronological order of the snapshot timestamps, using the GED community tracking method, to obtain a community evolution sequence for each community, and identifies evolution events from the community evolution sequence to obtain the key events experienced by each sequence;

a prediction module, which calculates several topological features of all communities in the community evolution sequences obtained by the tracking module; for communities whose evolution sequence spans at least 5 network snapshots, extracts 5 historical community features, splices them with the 4 evolution events experienced to form a one-dimensional feature, and takes the evolution event at the next moment of the community as the label of the feature; trains a RandomForest classifier on the labeled feature data, and makes predictions using the trained RandomForest classifier.

In a third aspect, a computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the dynamic social network community evolution prediction method described above when the computer program is executed.

In a fourth aspect, an embodiment of the present invention provides a computer readable storage medium, including a computer program, where the computer program when executed by a processor implements the steps of the dynamic social network community evolution prediction method described above.

Compared with the prior art, the invention has at least the following beneficial effects:

In the dynamic social network community evolution prediction method of the invention, an optimization-based time window division model is established from the generality of the time window division mechanism, and the time frame adaptively adjusts the size and the number of time windows according to the specific network; the size and the overlap rate of the time frames are chosen according to the network activity in different periods; information loss during community evolution tracking is reduced, which further improves the quality of the community evolution sequences produced by the tracking algorithm; and, compared with current mainstream prediction models, the community evolution prediction model achieves better prediction accuracy.

Furthermore, the invention uses a gene coding strategy to optimize the time frame division problem. Two chromosomes work together, one representing the window start times and the other the window end times, so that every possible time window division scheme can be represented and the completeness of the initial population is ensured. Combined with the constraint function of the invention, the chromosomes conform to the requirements of research in the dynamic social network field.

Furthermore, the invention provides a new fitness function designed specifically for dynamic social networks. In most studies, a network is arbitrarily divided into a series of time frames with a fixed size and overlap rate, usually measured in months or years, but that size and overlap rate are not necessarily suitable for the network at hand. Different dynamic networks have their own characteristics and structures, and the same network also shows different activity states in different periods. As the first step of dynamic network analysis, the choice of time frame size and overlap rate is important: if too large a time window is used while the network is changing quickly, information and events can be lost, and if fine, dense time windows are used while the network is inactive, resources are wasted. Both directly affect the subsequent community tracking result and the final prediction accuracy. The fitness function designed by the invention estimates the activity state of the dynamic social network from the numbers of nodes that appear, disappear, and remain between adjacent snapshots, and the multi-objective fitness function ensures that each time window follows the current activity state of the network while keeping the time windows consistent with one another.

Furthermore, the invention uses a more reasonable method for calculating the inclusion degree between communities. In most studies, the similarity of a pair of communities considers only the proportion of common nodes. The GED method, however, considers both the quantity and the quality of the community members. The quantity is reflected by the first part of the formula, and the quality by the second part, that is, how much the important members contribute. This keeps a balance between a large community made up mostly of less important members and a small community with only a few but important members.

Furthermore, the invention can identify 7 kinds of community evolution events. Because the inclusion degree is calculated from both the number and the quality of the nodes shared between communities, the identified events are more accurate, and the resulting community evolution sequences are of higher quality.

Furthermore, the invention considers several kinds of community topological features. A graph network structure has features at the level of the whole graph, at the level of substructures (communities), and at the level of nodes. Most current research considers only community-level features and neglects feature richness. The method includes not only common community-level features such as community size and density, but also node-level features such as the average clustering coefficient and the average degree, which greatly increases the richness of the features.

Furthermore, the invention uses the community evolution event at the next moment as the label and combines several historical community features, which increases the number and diversity of the features and supports the robustness of the algorithm. The evolution events that occur along the community evolution sequence are also spliced in as features, which further extends the community feature.

In summary, the invention uses evolutionary learning to obtain the time window division scheme, with non-fixed window sizes and overlap rates, that best suits the network data, thereby improving community tracking quality and event prediction accuracy.

The technical scheme of the invention is further described in detail through the drawings and the embodiments.

Drawings

FIG. 1 is a flowchart of the present invention, wherein (a) is an exemplary diagram of dividing time windows, community discovery, evolution tracking for raw data, and (b) is a schematic diagram of each step;

FIG. 2 is a schematic diagram of a conventional time frame and an adaptive time frame, wherein (a) is a disjoint time frame, (b) is an overlapping time frame, and (c) is an adaptive time frame;

FIG. 3 is a schematic diagram of the crossover operators according to the present invention, wherein (a) is an example of parent chromosomes, each group of chromosomes representing a time window division scheme, (b) is an example of two-point crossover on the start-time chromosome, (c) is an example of partially mapped crossover and collision detection on the end-time chromosome, and (d) is an example of repairing offspring to satisfy the constraints;

FIG. 4 is a schematic diagram illustrating community evolution event identification according to the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art without making any inventive effort, are intended to be within the scope of the present invention.

In the description of the present invention, it will be understood that the terms "comprises" and "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It is also to be understood that the terminology used in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should be further understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations. For example, "A and/or B" may represent: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.

It should be understood that although the terms first, second, third, etc. may be used to describe the preset ranges, etc. in the embodiments of the present invention, these preset ranges should not be limited to these terms. These terms are only used to distinguish one preset range from another. For example, a first preset range may also be referred to as a second preset range, and similarly, a second preset range may also be referred to as a first preset range without departing from the scope of embodiments of the present invention.

Depending on the context, the word "if" as used herein may be interpreted as "when" or "once" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if determined" or "if (stated condition or event) is detected" may be interpreted as "when determined" or "in response to determination" or "when (stated condition or event) is detected" or "in response to detection of (stated condition or event)", depending on the context.

Various structural schematic diagrams according to the disclosed embodiments of the present invention are shown in the accompanying drawings. The figures are not drawn to scale, wherein certain details are exaggerated for clarity of presentation and may have been omitted. The various regions, shapes of layers and relative sizes and positional relationships between them shown in the drawings are merely exemplary, may in practice deviate due to manufacturing tolerances or technical limitations, and one skilled in the art may additionally design regions/layers having different shapes, sizes, relative positions as actually required.

The invention provides a dynamic social network community evolution prediction method, which is used for adaptively adjusting the size and the number of each time window and the overlapping rate between windows according to a specific network. The self-adaptive time frame provided by the invention can improve the quality of community tracking in the dynamic social network and greatly improve the accuracy of community evolution prediction.

Referring to fig. 2, (a) shows disjoint time frames, in which all windows have the same size and do not overlap each other. This setting is simple, since only the size of the time frame needs to be set. However, because of the excessive gaps between snapshots, much information is missed, and only information of limited value is captured for a rapidly changing dynamic network. (b) shows overlapping time frames, which consist of time windows of the same size with a fixed overlap rate. Overlapping time frames allow community tracking algorithms to find more valuable events, but waste resources when the network changes slowly. (c) shows adaptive time frames, which are not limited to a fixed number or size of windows, match the activity level of the network better, and can improve the quality of community tracking and event prediction.

Referring to fig. 1, the method for predicting the evolution of a dynamic social network community of the present invention includes the following steps:

S1, self-adaptive time window division;

S101, generating an initial population based on a designed gene coding strategy;

The gene coding strategy takes the start times and end times of all snapshots of a time window division scheme as chromosomes. A dynamic social network is divided into $T = \{T^{start}, T^{end} \mid T^{start}, T^{end} \in \{t_1, \ldots, t_m\}\}$. Each set of adaptive timestamps is designed as an individual containing two chromosomes, representing the start and end times of all windows, respectively. According to the requirements of the dynamic network analysis task, the constraint conditions of the initial individual are designed as follows:

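$$T^{start}_1 = t_1, \qquad T^{end}_\tau = t_m, \qquad T^{start}_i < T^{start}_{i+1} \le T^{end}_i, \qquad T^{end}_i < T^{end}_{i+1}$$

wherein $T^{start}_1 = t_1$ defines that the adaptive time frame starts from the start of the dynamic network, and $T^{end}_\tau = t_m$ defines that the end time of the adaptive time frame is the end time of the dynamic social network. The remaining formulas stipulate that the next time period does not completely contain the previous time period, that no part of the network is missing between the previous time period and the next time period, and that adjacent time windows may overlap by any length or not overlap at all.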

According to the above constraints, the initial population covers window counts from 5 up to n, the number of timestamps in the original data; for each window count, 5 random chromosomes that satisfy the constraint conditions are generated, each representing a different time window division scheme;
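The initialization described above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: timestamps are represented by their indices 0..m-1, the names `random_individual`, `is_valid` and `init_population` are hypothetical, and the constraints are assumed to mean strictly increasing starts and ends, non-empty windows, and each window starting no later than the end of its predecessor. Under that assumption the window count can reach at most m-1.

```python
import random


def random_individual(m, tau, rng):
    """Draw one (starts, ends) pair of chromosomes for tau windows over m timestamps."""
    if not 1 <= tau <= m - 1:
        raise ValueError("need 1 <= tau <= m - 1 under the assumed constraints")
    # Start chromosome: first window starts at timestamp 0, starts strictly increase.
    starts = [0] + sorted(rng.sample(range(1, m - 1), tau - 1))
    # End chromosome: last window ends at timestamp m - 1; fill backwards so that
    # starts[i + 1] <= ends[i] < ends[i + 1] (no gap, later window not containing earlier).
    ends = [0] * tau
    ends[-1] = m - 1
    for i in range(tau - 2, -1, -1):
        ends[i] = rng.randint(starts[i + 1], ends[i + 1] - 1)
    return starts, ends


def is_valid(starts, ends, m):
    """Check the assumed constraints on a window division scheme."""
    tau = len(starts)
    if len(ends) != tau or starts[0] != 0 or ends[-1] != m - 1:
        return False
    if any(starts[i] >= ends[i] for i in range(tau)):
        return False
    return all(
        starts[i] < starts[i + 1] <= ends[i] < ends[i + 1] for i in range(tau - 1)
    )


def init_population(m, min_windows=5, per_length=5, seed=0):
    """Generate per_length random individuals for every feasible window count."""
    rng = random.Random(seed)
    population = []
    for tau in range(min_windows, m):  # window counts min_windows .. m - 1
        for _ in range(per_length):
            population.append(random_individual(m, tau, rng))
    return population
```

For a network with 12 timestamps, `init_population(12)` yields 5 schemes for each window count from 5 to 11.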

S102, randomly selecting two parent chromosomes with the same coding length from a mating pool;

Referring to fig. 3, offspring of the selected schemes are generated using a two-point crossover operator. After chromosome crossover, genes that violate the constraints may appear, resulting in gaps in the time frame or inverted windows. The invention adjusts the offspring based on the constraint conditions: genes in a chromosome that do not meet the constraint conditions are re-randomized within the range that does meet them.
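A rough sketch of crossover followed by constraint repair is given below. It is a simplification under stated assumptions: the same two-point crossover is applied to both chromosomes (FIG. 3 describes a partially mapped crossover for the end-time chromosome, which is not reproduced here), timestamps are indices 0..m-1, and the constraint interpretation and all function names are hypothetical.

```python
import random


def two_point_crossover(a, b, rng):
    """Swap the segment between two random cut points of equal-length chromosomes."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def repair(starts, ends, m, rng):
    """Re-randomize only the genes that break the assumed constraints."""
    tau = len(starts)
    starts, ends = list(starts), list(ends)
    starts[0] = 0
    for i in range(1, tau):
        # Keep starts strictly increasing and leave room for the remaining starts.
        lo, hi = starts[i - 1] + 1, m - 2 - (tau - 1 - i)
        if not lo <= starts[i] <= hi:
            starts[i] = rng.randint(lo, hi)
    ends[-1] = m - 1
    for i in range(tau - 2, -1, -1):
        # No gap (ends[i] >= starts[i + 1]) and strictly increasing ends.
        lo, hi = starts[i + 1], ends[i + 1] - 1
        if not lo <= ends[i] <= hi:
            ends[i] = rng.randint(lo, hi)
    return starts, ends


def is_valid(starts, ends, m):
    """Check the assumed constraints on a window division scheme."""
    tau = len(starts)
    if starts[0] != 0 or ends[-1] != m - 1:
        return False
    if any(starts[i] >= ends[i] for i in range(tau)):
        return False
    return all(
        starts[i] < starts[i + 1] <= ends[i] < ends[i + 1] for i in range(tau - 1)
    )


def crossover(parent1, parent2, m, rng):
    """Cross start and end chromosomes separately, then repair both children."""
    s1, s2 = two_point_crossover(parent1[0], parent2[0], rng)
    e1, e2 = two_point_crossover(parent1[1], parent2[1], rng)
    return repair(s1, e1, m, rng), repair(s2, e2, m, rng)
```

A mutation step could reuse `repair` in the same way: perturb one gene at random and then repair the individual, so that mutated offspring stay within the feasible range.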

S103, performing mutation operation on the generated offspring within a range conforming to constraint conditions by using a mutation operator;

S104, screening offspring using a fast non-dominated sorting operator and an individual crowding distance operator, based on a designed fitness function;

the fitness function is as follows:

F2＝δ²(Fl)

Each snapshot in a time frame contains a set of nodes. The present invention uses the numbers of nodes that appear (N_a), disappear (N_d), and remain (N_r) between two consecutive independent snapshots to calculate the fluctuation of the network (fl_s). The smaller fl_s is, the more static the network tends to be, which makes it difficult to capture critical events such as merging, splitting, shrinking, and growing during community evolution. Conversely, the larger fl_s is, the faster the network changes, which may cause some communities to appear to develop discontinuously. The aim is to aggregate periods of low activity into the same time frame and to divide periods of high interaction into more time frames, so that the analysis neither wastes resources during low-activity periods nor ignores informative activity. The adaptive time frames of the present invention are therefore not fixed in size, and they may be disjoint or overlapping.

Here F1 represents the error between the expected value of the fluctuation and the set value δ, where δ is set between 0 and 1, typically 0.3, to ensure that adjacent snapshots share some, but not an extreme amount of, interaction. F2 keeps the activity of the time windows consistent across different periods by calculating the standard deviation of the fluctuation set Fl.
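The two objectives can be sketched as follows. The exact formulas for fl_s and F1 are not recoverable from the text, so this sketch assumes fl_s = (N_a + N_d) / (N_a + N_d + N_r) and F1 = |mean(Fl) - δ|; both are illustrative assumptions, as are the function names. F2 is computed as the standard deviation of Fl, as the description states; using the variance instead would give the same ranking for that objective.

```python
from statistics import mean, pstdev


def fluctuation(prev_nodes, next_nodes):
    """Assumed fluctuation between two consecutive snapshots (sets of node ids)."""
    n_a = len(next_nodes - prev_nodes)  # nodes that appear
    n_d = len(prev_nodes - next_nodes)  # nodes that disappear
    n_r = len(prev_nodes & next_nodes)  # nodes that remain
    total = n_a + n_d + n_r
    return (n_a + n_d) / total if total else 0.0


def fitness(node_sets, delta=0.3):
    """Return (F1, F2) for a division scheme; both objectives are minimized."""
    fl = [fluctuation(a, b) for a, b in zip(node_sets, node_sets[1:])]
    if not fl:
        raise ValueError("at least two snapshots are required")
    f1 = abs(mean(fl) - delta)  # assumed form: distance of mean fluctuation from delta
    f2 = pstdev(fl)             # spread of the fluctuation across window pairs
    return f1, f2
```

For example, three snapshots that each replace one of four nodes have a fluctuation of 0.4 between every pair, so F1 is 0.1 for δ = 0.3 and F2 is 0.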

S105, returning to the step S102 when the iteration times do not meet the set value.
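The screening in step S104 uses the operators familiar from NSGA-II. A compact sketch of fast non-dominated sorting and crowding-distance selection over (F1, F2) pairs might look like the following; it assumes both objectives are minimized, and the function names are illustrative rather than taken from the patent.

```python
def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fast_non_dominated_sort(objs):
    """Return a list of fronts, each a list of indices into objs."""
    n = len(objs)
    dominated = [[] for _ in range(n)]  # solutions that p dominates
    counts = [0] * n                    # number of solutions dominating p
    fronts = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(objs[p], objs[q]):
                dominated[p].append(q)
            elif dominates(objs[q], objs[p]):
                counts[p] += 1
        if counts[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        nxt = []
        for p in fronts[i]:
            for q in dominated[p]:
                counts[q] -= 1
                if counts[q] == 0:
                    nxt.append(q)
        fronts.append(nxt)
        i += 1
    return fronts[:-1]


def crowding_distance(objs, front):
    """Crowding distance of each index in one front; boundary points get infinity."""
    dist = {i: 0.0 for i in front}
    for k in range(len(objs[0])):
        ordered = sorted(front, key=lambda i: objs[i][k])
        lo, hi = objs[ordered[0]][k], objs[ordered[-1]][k]
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        if hi == lo:
            continue
        for a, b, c in zip(ordered, ordered[1:], ordered[2:]):
            dist[b] += (objs[c][k] - objs[a][k]) / (hi - lo)
    return dist


def select(objs, size):
    """Pick `size` indices, front by front, breaking ties by crowding distance."""
    chosen = []
    for front in fast_non_dominated_sort(objs):
        if len(chosen) + len(front) <= size:
            chosen.extend(front)
        else:
            d = crowding_distance(objs, front)
            ranked = sorted(front, key=lambda i: -d[i])
            chosen.extend(ranked[: size - len(chosen)])
            break
    return chosen
```

Preferring larger crowding distances keeps the surviving division schemes spread along the trade-off between F1 and F2 instead of clustering around one compromise.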

S2, community discovery;

S201, aggregating static networks, namely snapshots, of the dynamic social network in each window according to the obtained self-adaptive window dividing scheme;
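As an illustration of step S201, the sketch below aggregates timestamped edges into one undirected static graph per window. It assumes edges are given as `(u, v, t)` tuples with `t` on the same scale as the window boundaries and that windows are closed intervals; the function name and data layout are hypothetical.

```python
from collections import defaultdict


def build_snapshots(edges, starts, ends):
    """Return one adjacency dict (node -> set of neighbours) per time window."""
    snapshots = []
    for start, end in zip(starts, ends):
        adj = defaultdict(set)
        for u, v, t in edges:
            # An edge belongs to every window whose interval covers its timestamp,
            # so overlapping windows share the edges in their overlap.
            if start <= t <= end and u != v:
                adj[u].add(v)
                adj[v].add(u)
        snapshots.append(dict(adj))
    return snapshots
```

With `starts=[0, 1, 3]` and `ends=[1, 3, 4]`, an edge at time 1 appears in both the first and the second snapshot.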

S202, a CPM algorithm is used for finding communities in each snapshot.

The clique percolation method (CPM) is used to find overlapping communities. A clique is a set of vertices in which every two vertices are connected, that is, a complete subgraph. Nodes within a community are closely connected and the edge density is high, so cliques form easily. Edges inside a community are therefore more likely to form a large complete subgraph, while edges between communities are less likely to do so. Communities are discovered by finding cliques in the network. A k-clique is a complete subgraph of k nodes in the network; if two k-cliques share k-1 nodes, the two k-cliques are adjacent. The set of all k-cliques that can reach one another through adjacent k-cliques forms a k-clique community.
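The definition above can be illustrated with a small, dependency-free sketch. It enumerates k-cliques by brute force, which is only practical for small snapshots, and merges adjacent cliques with a union-find structure; the names `k_cliques` and `clique_percolation` are illustrative. For real networks an optimized implementation such as the `k_clique_communities` function in NetworkX would be a more realistic choice.

```python
from itertools import combinations


def k_cliques(adj, k):
    """Enumerate all k-node complete subgraphs of an adjacency dict (brute force)."""
    nodes = sorted(adj)
    return [
        frozenset(c)
        for c in combinations(nodes, k)
        if all(v in adj[u] for u, v in combinations(c, 2))
    ]


def clique_percolation(adj, k=3):
    """Return k-clique communities as sets of nodes; communities may overlap."""
    cliques = k_cliques(adj, k)
    parent = list(range(len(cliques)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Two k-cliques are adjacent when they share exactly k - 1 nodes.
    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return list(groups.values())
```

A node that belongs to two cliques that are not adjacent ends up in two communities, which is how the method produces overlapping communities.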

S3, GED community tracking and matching;

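S301, calculating the mutual inclusion degrees $I(C^{i}_{T_k}, C^{j}_{T_{k+1}})$ and $I(C^{j}_{T_{k+1}}, C^{i}_{T_k})$ of every pair of communities between adjacent snapshots; the inclusion degree evaluation function takes into account not only the number of shared nodes, but also the social status of the shared nodes;

The calculation formula is as follows:

$$I(C^{i}_{T_k}, C^{j}_{T_{k+1}}) = \frac{|C^{i}_{T_k} \cap C^{j}_{T_{k+1}}|}{|C^{i}_{T_k}|} \cdot \frac{\sum_{x \in C^{i}_{T_k} \cap C^{j}_{T_{k+1}}} NI(x)}{\sum_{x \in C^{i}_{T_k}} NI(x)}$$

where NI(x) is a node indicator (e.g., degree centrality, betweenness, PageRank, etc.) that is used to evaluate the importance of nodes in the community.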
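The inclusion degree used by the GED method can be sketched as the product of a quantity term and a quality term. In this illustration the node indicator is supplied by the caller as a dictionary (for instance node degrees in the first community's snapshot); the function name and argument layout are assumptions.

```python
def inclusion(first, second, node_indicator):
    """Inclusion of community `first` in community `second`.

    node_indicator maps each node of `first` to its importance score NI(x),
    computed in the snapshot that `first` belongs to.
    """
    first, second = set(first), set(second)
    if not first:
        return 0.0
    common = first & second
    total = sum(node_indicator[x] for x in first)
    quantity = len(common) / len(first)  # share of members that are common
    quality = sum(node_indicator[x] for x in common) / total if total else 0.0
    return quantity * quality
```

The measure is asymmetric, so both `inclusion(a, b, ni_a)` and `inclusion(b, a, ni_b)` are computed for each pair of communities in adjacent snapshots before events are identified.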

S302, identifying a key event.

Referring to fig. 4, the GED method supports the discovery of seven key events: forming, surviving, merging, splitting, dissolving, growing and shrinking. These events serve as labels for training the community evolution prediction model.

S4, community evolution prediction.

S401, calculating various topological features of all communities in each evolution sequence;

An evolution sequence is created for each community according to the tracking information of the GED method. Each sequence comprises the community instances and the events that occur between each instance and the next time frame.

S402, inputting the features of the community evolution sequences containing the 5 most recent historical instances into a RandomForest classifier for learning, and taking the event occurring at the next moment of the community as the prediction label;

The topological features of all communities in the community evolution sequence are calculated; they include density, cohesion, node centrality and the like, see Table 1.

TABLE 1 Community topology features for use with the present invention

The structure consisting of the features of the community instances and their events is the final form of the data used to train and test the classifier of the present invention, as follows:

The features of the community evolution sequences containing the 5 most recent historical instances are input into the RandomForest classifier for learning, and the event occurring at the next moment of the community is taken as the prediction label, in the following form:

S403, inputting the evolution sequence of the event to be predicted into a trained classifier to obtain an event prediction result.
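The assembly of one training sample in step S4 can be sketched as follows. The exact data layout is not reproduced in this text, so the sketch assumes that `events[i]` is the event following instance `i`, and that events are encoded as integer indices into a hypothetical `EVENTS` list. The feature values are placeholders.

```python
EVENTS = ["forming", "surviving", "merging", "splitting",
          "dissolving", "growing", "contracting"]


def build_sample(features, events, history=5):
    """Build (x, y) from one community evolution sequence.

    features: per-snapshot feature lists, in time order.
    events:   events[i] is the event that follows instance i.
    Returns None when the sequence is shorter than `history`.
    """
    if len(features) < history or len(events) != len(features):
        return None
    recent = features[-history:]
    past_events = events[-history:-1]  # the 4 events linking the 5 instances
    x = [value for instance in recent for value in instance]
    x += [EVENTS.index(e) for e in past_events]
    y = EVENTS.index(events[-1])       # event at the next moment is the label
    return x, y
```

The resulting `x` vectors and `y` labels have the shape expected by a random forest implementation such as scikit-learn's `RandomForestClassifier`, which is left out here to keep the sketch dependency-free.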

In still another embodiment of the present invention, a dynamic social network community evolution prediction system is provided, which can be used to implement the dynamic social network community evolution prediction method described above, and specifically, the dynamic social network community evolution prediction system includes a partitioning module, a community module, a tracking module, and a prediction module.

The dividing module obtains an initialized population based on a designed gene coding strategy, randomly selects two parent chromosomes with the same coding length from the initialized population, generates offspring by applying a two-point crossover operator to the parent chromosomes, and constrains the offspring with the set constraint conditions; a mutation operation is performed on the generated offspring using a mutation operator, and the offspring are again constrained based on the constraint conditions; based on the fitness function, the offspring are screened using a fast non-dominated sorting operator and an individual crowding distance operator to obtain an optimal division scheme of the self-adaptive time window;

In yet another embodiment of the present invention, a terminal device is provided, the terminal device including a processor and a memory, the memory being used for storing a computer program that includes program instructions, and the processor being used for executing the program instructions stored in the computer storage medium. The processor may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. It is the computing and control core of the terminal and is adapted to implement one or more instructions, in particular to load and execute one or more instructions to implement a corresponding method flow or a corresponding function. The processor of the embodiment of the invention can be used to carry out the dynamic social network community evolution prediction method, which comprises the following steps:

Obtaining an initialized population based on a designed gene coding strategy, randomly selecting two parent chromosomes with the same coding length from the initialized population, generating offspring by applying a two-point crossover operator to the parent chromosomes, and constraining the offspring with the set constraint conditions; performing a mutation operation on the generated offspring using a mutation operator, and again constraining the offspring based on the constraint conditions; based on the fitness function, screening the offspring using a fast non-dominated sorting operator and an individual crowding distance operator to obtain an optimal division scheme of the self-adaptive time window; dividing the dynamic social network into time windows according to the optimal division scheme to obtain the network snapshot of each window, and obtaining the overlapping communities in each network snapshot using the clique percolation method; using the GED community tracking method, calculating the inclusion degree of every two communities in adjacent snapshots over the obtained sets of overlapping communities, in order of snapshot timestamp from earliest to latest, to obtain the community evolution sequence of each community, and identifying evolution events from the community evolution sequence to obtain the key events experienced by each sequence; calculating the topological features of all communities in the community evolution sequence; for communities in the community evolution sequence that span at least 5 network snapshots, extracting the features of 5 historical community instances, concatenating them with the 4 evolution events experienced to form a one-dimensional feature, and taking the evolution event at the next moment of the community as the feature label; performing machine learning on the labeled feature data using a RandomForest classifier, and making predictions using the trained RandomForest classifier.

In a further embodiment of the present invention, a storage medium is also provided, in particular a computer-readable storage medium, which is a memory device in a terminal device for storing programs and data. It will be appreciated that the computer-readable storage medium here may include both a built-in storage medium of the terminal device and an extended storage medium supported by the terminal device. The computer-readable storage medium provides a storage space that stores the operating system of the terminal. Also stored in this storage space are one or more instructions, which may be one or more computer programs (including program code), adapted to be loaded and executed by the processor. The computer-readable storage medium may be a high-speed RAM or a non-volatile memory, such as at least one magnetic disk memory.

One or more instructions stored in a computer-readable storage medium may be loaded and executed by a processor to implement the respective steps of the method for dynamic social network community evolution prediction in the above embodiments; one or more instructions in a computer-readable storage medium are loaded by a processor and perform the steps of:

Obtaining an initialized population based on a designed gene coding strategy, randomly selecting two parent chromosomes with the same coding length from the initialized population, generating offspring by applying a two-point crossover operator to the parent chromosomes, and constraining the offspring with the set constraint conditions; performing a mutation operation on the generated offspring using a mutation operator, and again constraining the offspring based on the constraint conditions; based on the fitness function, screening the offspring using a fast non-dominated sorting operator and an individual crowding distance operator to obtain an optimal division scheme of the self-adaptive time window; dividing the dynamic social network into time windows according to the optimal division scheme to obtain the network snapshot of each window, and obtaining the overlapping communities in each network snapshot using the clique percolation method; using the GED community tracking method, calculating the inclusion degree of every two communities in adjacent snapshots over the obtained sets of overlapping communities, in order of snapshot timestamp from earliest to latest, to obtain the community evolution sequence of each community, and identifying evolution events from the community evolution sequence to obtain the key events experienced by each sequence; calculating the topological features of all communities in the community evolution sequence; for communities in the community evolution sequence that span at least 5 network snapshots, extracting the features of 5 historical community instances, concatenating them with the 4 evolution events experienced to form a one-dimensional feature, and taking the evolution event at the next moment of the community as the feature label; performing machine learning on the labeled feature data using a RandomForest classifier, and making predictions using the trained RandomForest classifier.

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The effects of the present invention are further described below in connection with simulation experiments.

1. Simulation conditions:

The hardware platform of the simulation experiment of the invention is: an Intel(R) Core(TM) i7-7700HQ CPU with a main frequency of 2.8 GHz, 16 GB of memory, and an NVIDIA 1060 GPU.

The software platform of the simulation experiment of the invention is: the Windows operating system and Python 3.6.

2. Simulation content and result analysis:

The simulation experiment applies the present invention and four prior-art methods to several data sets, and compares the quality of the community sequences obtained after community tracking and the final prediction performance for community events.

The simulation experiment uses four real data sets: the DBLP data set, the Autonomous System (AS) data set, the Yelp data set and the AS-Caida data set.

The DBLP dataset contains statistics on articles co-authored by researchers in the computer science field. For each published paper, the DBLP dataset contains the title, authors, year, publication venue, paper index identifier, and reference identifiers. The invention selects the co-authorship graph for 1995-2015. Each node represents an author and each edge represents a paper co-authored by two authors.

The AS dataset contains daily communication networks derived from BGP logs. The invention selects the data from December 30, 1998 to February 27, 1999 and builds a social network from the daily communication records. Each identifier is treated as a node and each relationship between two identifiers is treated as an edge.

The AS-Caida dataset is similar to the AS dataset, but AS-Caida contains provider-to-customer and provider-to-provider information. The present invention uses the provider information from January 2007 to December 2009 to build a dynamic social network.

The Yelp dataset is a collection of merchant, review, and user data. Using the user object data, the present invention creates an edge between each user and his or her friends for the period from July 2010 to July 2014.

In the simulation experiments, the four prior-art techniques are:

Bordka et al., in their paper "Predicting Community Evolution in Social Networks" (IEEE, 2015), propose a method for predicting community evolution events that divides a dataset into traditional disjoint and overlapping snapshots, tracks communities with the GED community tracking method, and finally trains a classifier and uses it for prediction.

Ilhan et al., in their paper "Predicting Community Evolution Based on Time Series Modeling" (IEEE, 2015), apply the ARIMA technique to predict the community features of the next time period, and then use the predicted features to predict the community evolution event at the next moment.

Dakiche et al., in their paper "Community Evolution Prediction in Dynamic Social Networks Using Community Features' Change Rates" (ASONAM, 2019), propose a method that uses the rate of change of community features, rather than their absolute values, to predict the critical events of a community.

Tajeuna et al., in their paper "Modeling and Predicting Community Structure Changes in Time-Evolving Social Networks" (TKDE, 2019), devise a way to find the time window size that best fits the network. The time window size is estimated from the number of nodes that appear, disappear, and remain between two consecutive independent snapshots.

First, the present invention investigates three types of time frames for each dataset, each time frame being of a different size. The invention uses CPM algorithm to detect community structure of each time period, and uses GED method to determine evolution event. After obtaining community sequences of the community evolving over time, two general standards (APCC, APNP) were used to evaluate the quality of community sequences obtained under three different time frameworks.

Because the communities found are different under different time frames, the invention compares the average scores of all community evolution sequences under different time frames.

First, the similarity between communities is evaluated with the widely used Pearson correlation coefficient, defined as follows:

where v_i and v_j are the transition probability vectors corresponding to the two communities being compared. Each vector reflects the proportion of nodes that its community shares with each community found throughout the time sequence. The invention uses the Average Pearson Correlation Coefficient (APCC) to calculate the global similarity of the community evolution sequence S_C as follows:

Another criterion measures whether the nodes of the original community drift along the community evolution sequence, i.e., whether the nodes of the original community remain in the subsequent communities. The Average Proportion of Node Persistence (APNP) in the community evolution sequence S_C is expressed as:

where the node set of a community denotes all nodes contained in that community. These values were normalized in the experiments.
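The Pearson-based similarity used for the APCC criterion can be sketched in Python. The Pearson coefficient is standard; the averaging rule is not reproduced in this text, so the hypothetical function `apcc` assumes the mean is taken over consecutive communities of a sequence, each represented by its transition probability vector.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    dev_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    dev_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    # A constant vector has no defined correlation; report 0.0 in that case.
    return cov / (dev_u * dev_v) if dev_u and dev_v else 0.0


def apcc(vectors):
    """Assumed APCC: mean Pearson coefficient over consecutive vectors."""
    pairs = list(zip(vectors, vectors[1:]))
    if not pairs:
        return 0.0
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)
```

A value close to 1 indicates that successive communities in a sequence share nodes with the rest of the network in similar proportions.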

"Self-Adaptive" in Table 2 represents the Adaptive time window partitioning method proposed by the present invention, "disjoint" represents the conventional disjoint time window partitioning method, and "overlapping" the conventional overlapping time window partitioning method.

TABLE 2 Performance evaluation Table for tracking Community evolution sequence by the present invention and existing Window partitioning method

As can be seen from Table 2, on AS and AS-Caida the quality of the community evolution tracked under the disjoint time window division scheme is slightly better than that under the overlapping scheme, while the self-adaptive time window division method yields the best community sequence quality on all data sets (AS, AS-Caida, DBLP and Facebook).

The invention further carries out experiments on community evolution events, treating community evolution prediction as a supervised learning task in which the history of a community's evolution is used to predict its future.

The prediction results are evaluated with the F-Measure, which is the harmonic mean of precision and recall:

F-Measure＝2 × Precision × Recall / (Precision + Recall)
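As a small worked illustration, the per-event F-Measure can be computed from true and predicted event labels as below. The function name `f_measure` and the sample labels are hypothetical and are not results from the experiments.

```python
def f_measure(true_labels, predicted_labels, event):
    """Harmonic mean of precision and recall for one event class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == event and p == event for t, p in pairs)
    fp = sum(t != event and p == event for t, p in pairs)
    fn = sum(t == event and p != event for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Computing the score per event class makes it possible to compare methods on individual events such as dissolving or contracting.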

TABLE 3 predictive Performance evaluation Table of the present invention and existing Community evolution prediction methods

Comparing the results of the different prediction models in Table 3 with those of the adaptive time-window prediction model of the invention shows that the proposed adaptive time window dividing method greatly improves prediction accuracy and outperforms the existing traditional dividing methods in almost all aspects. The results for the AS and AS-Caida data sets show that the self-adaptive time frame predicts dissolving and contracting events more accurately. On these data sets, however, it does not differ much from the other algorithms that use traditional overlapping time frames, possibly because the activity of the data itself is more balanced. For the DBLP and Facebook data sets, prediction with the adaptive time frame achieves the best results for all events, with a significant improvement.

The simulation experiment shows that the self-adaptive time window dividing method provided by the invention reduces misjudged and missed events in the community tracking process, and that the detected community evolution sequences are of higher quality. The method and the system noticeably improve community tracking and community event prediction on dynamic social network data. By analyzing network activity, the invention provides an effective self-adaptive time window dividing method.

In summary, according to the dynamic social network community evolution prediction method, system, storage medium and device, the optimal time window size and the optimal overlapping rate are selected according to the activities of the networks in different periods by utilizing the optimization idea. The method reduces information loss in the process of tracking community evolution in different periods, thereby ensuring the quality of the community evolution sequence tracked by a tracking algorithm. A new community evolution prediction model is established, and compared with the current mainstream prediction model, the community evolution prediction model can achieve better prediction precision in a real network.

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The above is only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited by this, and any modification made on the basis of the technical scheme according to the technical idea of the present invention falls within the protection scope of the claims of the present invention.

Claims

1. The dynamic social network community evolution prediction method is characterized by comprising the following steps of:

S1, obtaining an initialized population based on a designed gene coding strategy, randomly selecting two parent chromosomes with the same coding length from the initialized population, generating offspring by applying a two-point crossover operator to the parent chromosomes, and constraining the offspring with the set constraint conditions; performing a mutation operation on the generated offspring using a mutation operator, and again constraining the offspring based on the constraint conditions; based on the fitness function, screening the offspring using a fast non-dominated sorting operator and an individual crowding distance operator to obtain an optimal division scheme of the self-adaptive time window, wherein the fitness function is as follows:

F2＝δ²(Fl)

wherein F1 is the error between the expected value of the fluctuation and the set value, τ is the number of time windows, δ is between 0 and 1, Fl_s is the network fluctuation between two adjacent consecutive independent snapshots, Fl is the fluctuation set, F2 is the standard deviation of the fluctuation set Fl, G_Ti is the snapshot of time window T_i, S_i is the number of nodes contained in that snapshot, G_Ti+1 is the snapshot of time window T_i+1, S_i+1 is the number of nodes contained in that snapshot, N_a is the number of nodes appearing between two consecutive independent snapshots, N_d is the number of nodes disappearing between two consecutive independent snapshots, and N_r is the number of nodes remaining between two consecutive independent snapshots;

S2, dividing the dynamic social network into time windows according to the optimal division scheme obtained in step S1 to obtain the network snapshot of each window, and obtaining the overlapping communities in each network snapshot using the clique percolation method;

S4, calculating the topological features of all communities in the community evolution sequence obtained in step S3; for communities in the community evolution sequence that span at least 5 network snapshots, extracting the features of 5 historical community instances, concatenating them with the 4 evolution events experienced to form a one-dimensional feature, and taking the evolution event at the next moment of the community as the feature label; and performing machine learning on the labeled feature data using a Random Forest classifier, and making predictions using the trained Random Forest classifier.

2. The method for predicting evolution of a dynamic social network community according to claim 1, wherein in step S1, the gene encoding strategy uses the start times of the time windows as genes to form one chromosome and the end times of the time windows as genes to form another chromosome, the two chromosomes together forming a complete adaptive time window division scheme; population initialization selects coding lengths ranging from 5 up to the number of timestamps in the original data and generates 5 random individuals for each length, so that the number of windows is also adaptive; specifically:

A dynamic social network is divided into T, T＝{T^start, T^end | T^start, T^end ∈ {t₁, …, t_m}}; each set of adaptive timestamps is an individual containing two chromosomes, which represent the start times and the end times of all windows, respectively; the constraints for generating an initial individual are as follows:

T₁^start＝t₁

wherein T₁^start＝t₁ specifies that the adaptive time frame starts from the beginning of the dynamic network; the constraint on the last window defines the end time of the adaptive time period as the end time of the dynamic social network; and the remaining constraints stipulate that a later time period does not completely contain the former time period, that no part of the network is left uncovered between the former and the later time period, and that adjacent time windows may overlap by any length or not overlap at all.
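The encoding and its constraints can be illustrated with a Python sketch. Only the first constraint formula survives in this text, so the checks below are an assumed reading of the verbal description: timestamps are the integers 1..m, both chromosomes increase strictly, and the next window starts no later than one step after the previous window ends. The function names are hypothetical, and rejection sampling stands in for whatever initialization procedure the method actually uses.

```python
import random


def is_valid(starts, ends, m):
    """Check an individual (start and end chromosomes) over timestamps 1..m."""
    if not starts or len(starts) != len(ends):
        return False
    if starts[0] != 1 or ends[-1] != m:  # cover the network from t_1 to t_m
        return False
    if any(s > e for s, e in zip(starts, ends)):
        return False
    for i in range(len(starts) - 1):
        # Both bounds advance, so neither window contains the other.
        if not (starts[i] < starts[i + 1] and ends[i] < ends[i + 1]):
            return False
        # No timestamp is skipped between consecutive windows.
        if starts[i + 1] > ends[i] + 1:
            return False
    return True


def random_individual(m, n_windows, rng=random):
    """Draw chromosomes until the assumed constraints hold."""
    while True:
        starts = [1] + sorted(rng.sample(range(2, m + 1), n_windows - 1))
        ends = sorted(rng.sample(range(1, m), n_windows - 1)) + [m]
        if is_valid(starts, ends, m):
            return starts, ends
```

For example, `is_valid([1, 4, 7], [5, 8, 10], 10)` accepts three overlapping windows, while `is_valid([1, 6], [4, 10], 10)` rejects a scheme that skips timestamp 5.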

3. The method for predicting evolution of a dynamic social network community according to claim 1, wherein in step S3, the mutual inclusion degree of every two communities between adjacent snapshots is calculated as follows:

wherein the first part is the proportion of the nodes shared by the two communities relative to the first community; NI(x) in the second part is a node index calculated for node x in the graph; and the two communities are the community labeled i in the snapshot of time window T_k and the community labeled j in the snapshot of time window T_k+1.

4. The method of claim 1, wherein in step S3, the key events include formation, survival, merging, splitting, dissolution, growth, and contraction.

5. The method for predicting evolution of a dynamic social network community according to claim 1, wherein in step S4, the various topological features of all communities are as follows:

6. The method for predicting evolution of a dynamic social network community according to claim 1, wherein in step S4, the data form of the one-dimensional feature is:

7. A dynamic social network community evolution prediction system, comprising:

The division module obtains an initialized population based on a designed gene coding strategy, randomly selects two parent chromosomes with the same coding length from the initialized population, generates offspring by applying a two-point crossover operator to the parent chromosomes, and constrains the offspring with the set constraint conditions; a mutation operation is performed on the generated offspring using a mutation operator, and the offspring are again constrained based on the constraint conditions; based on the fitness function, the offspring are screened using a fast non-dominated sorting operator and an individual crowding distance operator to obtain an optimal division scheme of the self-adaptive time window, wherein the fitness function is as follows:

F2＝δ²(Fl)

wherein F1 is the error between the expected value of the fluctuation and the set value, τ is the number of time windows, δ is between 0 and 1, Fl_s is the network fluctuation between two adjacent consecutive independent snapshots, Fl is the fluctuation set, F2 is the standard deviation of the fluctuation set Fl, G_Ti is the snapshot of time window T_i, S_i is the number of nodes contained in that snapshot, G_Ti+1 is the snapshot of time window T_i+1, S_i+1 is the number of nodes contained in that snapshot, N_a is the number of nodes appearing between two consecutive independent snapshots, N_d is the number of nodes disappearing between two consecutive independent snapshots, and N_r is the number of nodes remaining between two consecutive independent snapshots;

The prediction module is used for calculating the topological features of all communities in the community evolution sequence obtained by the tracking module; for communities in the community evolution sequence that span at least 5 network snapshots, extracting the features of 5 historical community instances, concatenating them with the 4 evolution events experienced to form a one-dimensional feature, and taking the evolution event at the next moment of the community as the feature label; and performing machine learning on the labeled feature data using a Random Forest classifier, and making predictions using the trained Random Forest classifier.

8. A computer readable storage medium storing one or more programs, wherein the one or more programs comprise instructions, which when executed by a computing device, cause the computing device to perform any of the methods of claims 1-6.

9. A computing device, comprising:

One or more processors, memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing any of the methods of claims 1-6.