CN110611582A

CN110611582A - Opportunistic social network effective data transmission method based on node socialization

Info

Publication number: CN110611582A
Application number: CN201910347872.8A
Authority: CN
Inventors: 吴嘉; 严晔琴; 陈志刚; 刘佳琦
Original assignee: Central South University
Current assignee: Central South University
Priority date: 2019-04-28
Filing date: 2019-04-28
Publication date: 2019-12-24

Abstract

The invention provides an effective data transmission method for opportunistic social networks based on node socialization (ETNS). Nodes in the network are divided into a number of communities, and each community is then reduced by removing low-efficiency nodes according to the attributes expected of an optimal relay node. Node availability is measured using four notions: sending trust, receiving trust, residual cache, and activity index. In an opportunistic social network, a node that satisfies all of these characteristics at once is more likely to be the optimal relay node. Transmitting data packets through the reduced, efficient communities helps maintain the continuity, stability, and efficiency of the data transmission process. Simulation results show that the packet delivery rate of ETNS is 13% higher than that of the Epidemic algorithm, and that ETNS has lower transmission delay and routing overhead.

Description

Opportunistic social network effective data transmission method based on node socialization

Technical Field

The invention belongs to the technical field of computers, and particularly relates to an opportunistic social network effective data transmission method based on node socialization.

Background

With the popularization of networks and the development of social informatization, information dissemination based on various online social platforms has become an extremely important means. Many social platforms, such as Facebook, Instagram, and Twitter, have sufficient capabilities to support billions of users participating in the information transfer process. Through the network platform, people can attract more people to pay attention by sharing interesting things in life. When users communicate and surf the internet through mobile devices, they can publish photos or videos anytime and anywhere.

In social networks, it is important that the network have the ability to communicate at high speed. Through the evaluation of human communication activities and their interest preferences, historical information of data exchange activities may be recorded and analyzed. With the development of online communication platforms, personal commodity recommendation becomes effective. However, the process of retrieving large amounts of structured data from human activity is very complex, requiring significant storage and computational resources. This makes some conventional wireless sensor network approaches unsuitable.

Faced with this problem, it becomes necessary to establish a suitable environment in the wireless network that ensures stable data transmission. Opportunistic networks are an architecture well suited to wireless communication research. Their defining characteristic is that nodes must find an "opportunity" in order to transmit information: communication services are provided through node movement and cooperation between nodes. Opportunistic networking is increasingly applied in social networking scenarios, because people's movements cause intermittent connections between the wireless devices they carry. In an online social network, "opportunities" may be provided by reliable neighbors that have sufficient resources and cache space to hold what we want to share, such as pictures and videos, or that have similar interests and can share our experiences. The "store" and "carry" states also apply to online social networks, since nodes need to wait for appropriate neighbors to appear. The "forward" state may represent an efficient data transfer process. In the study of social networks, an "opportunity" means the possibility of deciding whether useful information can be propagated. When node communication is established in a social network, only reliable neighbors can take part in the selection of the optimal relay node.

Although community-based propagation strategies for opportunistic networks have emerged, how to partition effective communities remains an open issue. This is not because existing community partitioning methods are infeasible, but because they do not consider whether every node in a community can meet the transmission requirements. Most community division processes consider the interests and social relationships of nodes in real scenes, yet not every node in a community is suitable for propagation. An unsuitable node is an inefficient node: it consumes a large amount of resources for transmission while its transmission performance is poor, so a method for reducing this cost is needed. Social network communication that transmits large amounts of data simultaneously may result in excessive energy consumption, low transmission rates, and transmission delays. A community reduction method is therefore needed to improve the performance of community-based algorithms.

Disclosure of Invention

The invention provides an opportunistic social network effective data transmission method based on node socialization, which aims to divide nodes in a network into a plurality of different communities by using a clustering method and simultaneously provides a group reduction method based on optimal relay node attributes, so that message transmission among a source node, the communities and a target node is more efficient.

An opportunistic social network effective data transmission method based on node socialization comprises the following steps:

constructing an undirected graph of network nodes, and calculating a clustering coefficient of each node according to the undirected graph;

selecting a node with the largest clustering coefficient from a node set to be subjected to community clustering division as a clustering center, performing community clustering division on neighbor nodes of the selected clustering center, and transmitting data to be transmitted by using nodes in the divided communities;

judging whether the similarity between the neighbor nodes in the circular area where the clustering center is located and the clustering center is higher than the average value of the similarity between all the neighbor nodes in the circular area where the clustering center is located and the clustering center, if so, dividing the corresponding neighbor nodes into the community where the clustering center is located, otherwise, waiting for the next clustering division;

if no node in the circular area of the current clustering center has a similarity greater than the average similarity, or once the number of nodes in the current community reaches the set Minpts, the current clustering round ends, and the node with the largest clustering coefficient in the current node set is selected as the next clustering center;

finishing node clustering after all nodes finish community division or when the maximum value of clustering times is reached;

the initial value of a node set to be subjected to community clustering division comprises all nodes in a network, each node is deleted from the node set after being divided into communities, and each clustering center is used as an initial node in one community;

the neighbor nodes of a clustering center are the nodes contained in the circular area that is centered on the clustering center and whose radius is the set value Eps.

The set value Eps and the set Minpts are empirical values, determined by experimental tuning in each scenario.
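The partitioning steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: the function name `partition_communities` and its inputs are hypothetical, the clustering coefficients (`coeff`), the per-node Eps neighborhoods (`neighbors_within_eps`), and the `similarity` function are assumed to be supplied by the caller, and candidates are admitted in order of decreasing similarity until Minpts is reached.

```python
from statistics import mean


def partition_communities(nodes, coeff, neighbors_within_eps, similarity,
                          min_pts, max_rounds):
    """Greedy community partition following the steps described above.

    nodes: iterable of node ids (assumed sortable)
    coeff: dict node -> clustering coefficient (assumed precomputed)
    neighbors_within_eps: dict node -> set of nodes inside its Eps circle
    similarity: callable (center, node) -> float (assumed supplied)
    min_pts: community size at which the current round stops (Minpts)
    max_rounds: maximum number of clustering rounds
    """
    remaining = set(nodes)  # nodes not yet assigned to any community
    communities = []
    rounds = 0
    while remaining and rounds < max_rounds:
        rounds += 1
        # The unassigned node with the largest clustering coefficient becomes
        # the cluster center and the first member of the new community.
        center = max(sorted(remaining), key=lambda v: coeff[v])
        community = [center]
        remaining.discard(center)
        candidates = sorted(neighbors_within_eps[center] & remaining)
        if candidates:
            sims = {v: similarity(center, v) for v in candidates}
            avg = mean(sims.values())
            # Admit candidates whose similarity exceeds the average,
            # most similar first, until the community reaches Minpts.
            for v in sorted(candidates, key=lambda v: -sims[v]):
                if len(community) >= min_pts:
                    break
                if sims[v] > avg:
                    community.append(v)
                    remaining.discard(v)
        communities.append(community)
    # Nodes still in `remaining` wait for a later clustering pass.
    return communities, remaining
```

Because a neighbor must be strictly above the average similarity, a center with a single candidate neighbor forms a one-node community in this sketch; whether that matches the intended behavior depends on how the average is meant to be taken.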

The nodes in the network are divided into a plurality of different communities by using a clustering method, so that data transmission is more effective;

further, the clustering coefficient of the nodes is calculated according to the following formula:

wherein C_i denotes the clustering coefficient of node v_i; k_i is the degree of node v_i in the undirected graph; E_i denotes the actual number of connecting edges among the k_i neighbor nodes of v_i in the undirected graph; and T_i denotes the maximum number of connections that could be formed among the k_i neighbor nodes of v_i.
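As an illustration of these definitions, the following minimal sketch computes a clustering coefficient as the ratio C_i = E_i / T_i. Two points are assumptions rather than statements of the source: that the coefficient is exactly this ratio, and that T_i equals k_i(k_i − 1)/2, the number of distinct pairs among k_i neighbors. The function name and the adjacency-dictionary representation are hypothetical.

```python
from itertools import combinations


def clustering_coefficient(adj, v):
    """Return E_i / T_i for node v in an undirected graph.

    adj: dict node -> set of neighbor nodes (assumed symmetric).
    """
    neighbors = adj[v]
    k = len(neighbors)  # degree k_i
    if k < 2:
        return 0.0  # assumption: no neighbor pair exists, so use 0
    t = k * (k - 1) / 2  # T_i: maximum possible edges among the neighbors
    # E_i: edges that actually exist between pairs of neighbors
    e = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return e / t
```

For a node with three neighbors of which only one pair is connected, the sketch yields 1/3.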

Further, in the node community clustering and dividing process, if overlapping communities exist, the nodes are divided into communities with higher modularity values;

wherein the modularity value of a community is Q_c(X_n);

Q_c(X_n) denotes the modularity value of community X_n. The remaining quantities in the formula are: the number of nodes inside community X_n; the number of connecting edges when community X_n contains x nodes; d_x, the degree of a node when community X_n contains x nodes; and the total number of connecting edges among the nodes currently in community X_n. The index n in X_n numbers the communities; its initial value is 0, and its maximum is the maximum number of clustering rounds.

Further, the similarity between nodes is calculated according to the following formula:

wherein S(x_i, x_j) denotes the similarity between node v_i and node v_j; x_ik and x_jk denote the shortest paths from nodes v_i and v_j, respectively, to node k in the undirected graph; and m denotes the number of nodes in the undirected graph corresponding to the network nodes.

Further, a comprehensive attribute characteristic value κ_i is calculated for each node v_i and compared with a set threshold κ; if κ_i is less than κ, the corresponding node v_i is not used as a relay node to transmit data:

N_R(j,i) = N_Rh(j,i) + N_Rm(j,i), N_S(j,i) = N_Sh(j,i) + N_Sm(j,i),

a_ij is the element in row i and column j of the matrix A, where A is the characteristic evaluation matrix of the nodes,

ω denotes the weighting coefficient of the node reputation value in the calculation, with value range [0,1]; the initial value of n is 0, and its maximum is the maximum number of clustering rounds;

a normalization coefficient is defined for the information entropy so that the entropy value of each attribute stays positive; when ω is 0, the local reputation value is the average of the ratings given by the nodes that took part in the transaction. If the network contains many malicious nodes and they collude, setting ω to 0 makes the trust value fairer. Experiments show that when malicious nodes exceed 40% of the total number of nodes, setting ω to 0 avoids collusion cheating, whereas when malicious nodes are fewer than 40% of the total number of nodes, ω = 1.6 gives the best effect.

Vr_i denotes the set of nodes that have transacted with the node as receivers, and Vs_i denotes the set of nodes that have transacted with the node as senders. LR_ij denotes the reputation evaluation that node v_i, acting as receiver, gives to sender node v_j; LS_ij denotes the reputation evaluation that node v_i, acting as sender, gives to receiver node v_j. LR_ij and LS_ij take values in [0,1], and their initial values are both 0.5;

N_Rh(i, j) and N_Rm(i, j) denote the numbers of honest and malicious transactions, respectively, that node v_i has had with v_j while acting as receiver;

N_Sh(i, j) and N_Sm(i, j) denote the numbers of honest and malicious transactions, respectively, that node v_i has had with v_j while acting as sender; these counts are obtained from statistics of the historical transmission records;

within a time interval t, the amount of data received by node v_i is denoted r_i(t); B_S r_i(t) denotes the cache occupied during data reception, which is linear in the amount of data collected r_i(t); B_S is the cache occupied by a node to collect one unit of data, and B_T is the cache consumed to send data; J_n,k(t) denotes the number of channels assigned to a node, with Σ_{k∈κ} J_n,k(t) ≤ 1;

The activity level of a node is measured by converting a daily timestamp T into a mapping time. The second of the mapping time and the second of the daily time, the latter denoted τ, both take values in [0,86400) and represent each second of the day.

The current timestamp of the node is defined as T, and the time zone in which the node is located is denoted N_zone; Max_τ is the maximum value of τ; v denotes the average speed of all nodes in the history.

This is a community reduction method: it screens out the inefficient nodes in a community structure that do not meet the transmission conditions, and thereby improves the efficiency of community-based data transmission strategies.
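The screening rule itself (a node whose comprehensive characteristic value κ_i is below the threshold κ is not used as a relay) can be illustrated with a short sketch. The function name `reduce_community` and the dictionary of precomputed κ_i values are hypothetical; how κ_i is computed from the evaluation matrix is not shown here.

```python
def reduce_community(community, kappa_values, kappa_threshold):
    """Split a community into usable relays and screened-out nodes.

    community: list of node ids
    kappa_values: dict node -> comprehensive characteristic value (assumed given)
    kappa_threshold: the set threshold; nodes strictly below it are screened out
    """
    relays = [v for v in community if kappa_values[v] >= kappa_threshold]
    screened = [v for v in community if kappa_values[v] < kappa_threshold]
    return relays, screened
```

Screened-out nodes stay in the network; in this sketch they are simply excluded from the relay candidates of their community.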

Further, the maximum value of the clustering times is the number of important nodes;

the important node is a node with a nomination value larger than the average nomination value of all nodes;

the nomination value of a node is derived from the interaction relationships among users, which are obtained from the users' behavior logs; the nomination value of a node is increased by 1 each time the node successfully forwards data, and the initial nomination value of every node is 1;

the nomination value of node v_i after the s-th successful data transmission in the history record is

wherein the number of adjacent nodes of node v_i is pi − 1; the adjacency matrix of node v_i has size pi × pi, with the nodes arranged in order along its row and column headers; if a connecting edge exists between two nodes, the element of the adjacency matrix corresponding to those two nodes takes the value 1, and otherwise it takes the value 0; one further element defined with the formula takes the value 0.

The nomination value is determined by the times of successful information transmission of the nodes in the current history record, and each node is endowed with a nomination in the initial stage. After each successful data forwarding, the nomination value of one node contains the original nomination and the nomination of other nodes connected with the node. After data forwarding is successfully performed each time, the nomination value needs to be updated correspondingly. The nomination value is used for measuring the importance of the node, the node with the importance higher than the average value is considered as an important node, and the number of the important nodes determines the clustering frequency.
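The selection of important nodes described above can be sketched as follows, assuming the nomination values have already been accumulated and are passed in as a dictionary. The function name `important_nodes` is hypothetical.

```python
def important_nodes(nomination):
    """Return the nodes whose nomination value exceeds the average.

    nomination: dict node -> nomination value (assumed already accumulated)
    """
    average = sum(nomination.values()) / len(nomination)
    return [v for v, value in nomination.items() if value > average]
```

The maximum number of clustering rounds is then the length of the returned list, for example `max_rounds = len(important_nodes(nomination))`.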

Further, suppose that node v_i and node v_j meet while v_i is storing data to be forwarded;

if node v_j is the target node of the data transmission, node v_i transmits the message to node v_j and deletes the message from its send queue;

if node v_j is not the target node of the data transmission, node v_i transmits in one of the following two ways:

step 7.1: intra-community transmission;

if the target node is in the community of the current node v_i and v_j is also in that community, node v_i transmits the data to node v_j; otherwise it does not transmit;

step 7.2: inter-community transmission;

if the target node is not in the community of node v_i, node v_i sends node v_j a request frame asking whether the target node is in the community to which node v_j belongs; after receiving the request frame, node v_j checks its current community, confirms whether the target node is in it, and sends a response frame to node v_i; if the target node is in the community of v_j, node v_i sends the data to node v_j; otherwise it does not send.
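The forwarding decision in steps 7.1 and 7.2 can be sketched as a single predicate. This is an illustrative sketch only: the function name `should_forward` and the `community_of` lookup table are hypothetical, and the request/response frame exchange of step 7.2 is abstracted into a direct lookup of the encountered node's community.

```python
def should_forward(carrier, encountered, target, community_of):
    """Decide whether `carrier` hands its message to `encountered`.

    community_of: dict node -> community id (assumed known to the sketch;
    in the method itself this is learned via request and response frames).
    """
    if encountered == target:
        return True  # direct delivery to the target node
    if community_of[target] == community_of[carrier]:
        # Step 7.1: intra-community transmission. Forward only if the
        # encountered node is in the same community as carrier and target.
        return community_of[encountered] == community_of[carrier]
    # Step 7.2: inter-community transmission. Forward only if the
    # encountered node's community contains the target.
    return community_of[encountered] == community_of[target]
```

After a direct delivery, the carrier would also delete the message from its send queue; queue handling is left out of the sketch.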

Most community-based routing algorithms take node attributes and social relationships into account but ignore the energy consumed by inefficient nodes, which accounts for a large proportion of the routing cost. To improve the network propagation strategy, the invention provides an effective propagation strategy based on node socialization, in which nodes in the network are divided into a number of communities. The scheme also includes a community reduction method that removes inefficient nodes according to the attributes of the optimal relay node. The invention introduces the notions of sending trust, receiving trust, residual cache, and activity index to measure the availability of a node. In an opportunistic social network, a node that satisfies all of these characteristics at once is more likely to be the optimal relay node. The availability of a node is determined by weighing these characteristics together according to the entropy value of each one, which effectively reduces the number of inefficient nodes in a community. Transmitting data packets through the reduced, efficient communities helps maintain the continuity, stability, and efficiency of the data transmission process. Simulation results show that the packet delivery rate of ETNS is 13% higher than that of the Epidemic algorithm, and that ETNS has lower transmission delay and routing overhead.

Advantageous effects

The invention provides an effective data transmission method for opportunistic social networks based on node socialization, which comprises three stages. In the first stage, the interaction relationships among users are obtained from the users' behavior logs, and communities are divided by a clustering method according to node similarity. In the second stage, an attribute quantization strategy for user node characteristics is constructed from the attribute characteristics that a relay node must satisfy for successful message transmission in an opportunistic social network, and this strategy serves as the basis for identifying inefficient nodes in a community. Finally, effective data transmission is carried out over the reduced, efficient communities. Based on the user behavior records of online social networks and the association relationships of heterogeneous nodes, the method sets out the characteristic conditions that a node must satisfy for successful data transmission and uses them to reduce the community structure, thereby reducing the number of inefficient nodes in a community;

the experiments are modeled on The ONE (Opportunistic Network Environment) simulation platform, using the MapReduce and RDD computing frameworks and real social network data sets; the experimental results show that, compared with the FCNS, ESR, and EWDCR algorithms and the traditional Epidemic algorithm, the method achieves a better transmission success rate and routing overhead.

Drawings

FIG. 1 is a schematic flow diagram of the process of the present invention;

FIG. 2 is a diagram illustrating a clustering-based community partitioning process proposed by the present invention;

FIG. 3 is a process flow diagram of the clustering-based community partitioning process proposed by the present invention;

FIG. 4 shows quartile plots of the transmission success rate for the data transmission strategies of five methods (ETNS, FCNS, ESR, EWDCR, and Epidemic) in Example 1, where (a), (b), (c), and (d) are the plots obtained when the Infocom5, Infocom6, Cambridge, and Intel data sets, respectively, are selected for simulation;

FIG. 5 shows comparisons of the transmission success rate for the data transmission strategies of the same five methods in Example 1, where (a), (b), (c), and (d) are the comparisons obtained when the Infocom5, Infocom6, Cambridge, and Intel data sets, respectively, are selected for simulation;

FIG. 6 shows comparisons of the end-to-end delay for the data transmission strategies of the same five methods in Example 1, where (a), (b), (c), and (d) are the comparisons obtained when the Infocom5, Infocom6, Cambridge, and Intel data sets, respectively, are selected for simulation;

FIG. 7 shows comparisons of the routing overhead for the data transmission strategies of the same five methods in Example 1, where (a), (b), (c), and (d) are the comparisons obtained when the Infocom5, Infocom6, Cambridge, and Intel data sets, respectively, are selected for simulation.

Detailed Description

The invention will be further described with reference to the following figures and examples.

The opportunistic social network effective data transmission method based on node socialization provided by the invention is illustrated in FIGS. 1-3 and is implemented through the following steps:

step 1: constructing an undirected graph of network nodes, and calculating a clustering coefficient of each node according to the undirected graph;

calculating the number of important nodes in the network to obtain the number of clustering rounds;

nodes that actively participate in information transmission and data forwarding are regarded as important nodes, and the importance of a node is measured through a nomination mechanism, as follows: the nomination value of node v_i is increased by 1 each time the node successfully forwards data. The nomination accumulation process is as follows:

wherein the nomination value of node v_i after the s-th successful data transmission starts from an initial nomination value of 1, and the adjacency term is the element in row i and column j of the adjacency matrix of node v_i. To allow the nodes in the network to be compared, the accumulated nomination values are normalized and expressed as:

if the normalization process is performed after each iteration, the nomination value accumulation process can be expressed as:

step 2: carrying out community division on the nodes;

firstly, selecting a node with the largest clustering coefficient in a node set as a clustering center, then comparing the similarity of surrounding nodes and the clustering center with the average similarity, dividing nodes higher than the average level into the community, not dividing nodes lower than the average level into the current community, and waiting for the next division process. And if the overlapped communities exist, dividing the nodes into communities with higher modularity values. The specific calculation method is as follows:

wherein k_i is the degree of node v_i; E_i denotes the actual number of connecting edges among the k_i neighbor nodes of v_i; T_i denotes the maximum number of connections that could be formed among those k_i neighbor nodes; and C_i denotes the clustering coefficient of node v_i. In equation (5), G is assumed to be an undirected graph with m nodes, where G = {x₁, x₂, ..., x_m}; x_ik and x_jk denote the shortest paths from nodes v_i and v_j, respectively, to node k; and S(x_i, x_j) denotes the similarity between node v_i and node v_j. In equation (6), Q_c(X_n) denotes the modularity value of community X_n; the remaining quantities are the number of nodes inside community X_n; the number of connecting edges when community X_n contains x nodes; d_x, the degree of a node when community X_n contains x nodes; and the total number of connecting edges among the nodes currently in community X_n. The index n in X_n numbers the communities; its initial value is 0, and its maximum is the maximum number of clustering rounds.

Step 3: based on the transmission attributes that arise in user interaction, infer the attribute characteristics that a node must satisfy during transmission, and measure these characteristics to obtain the evaluation standard for inefficient nodes. The specific steps are as follows:

step 3.1: each node performs a local reputation evaluation of other nodes based on historical transaction records and social relationships. Since social relationships are difficult to quantify, the integrity assessment is based only on historical transaction records. An evaluation mechanism is defined for each transaction: after each transaction the node gives a good or bad evaluation, and the local reputation value is quantified from this evaluation information. The calculation model is as follows:

wherein LR_ij denotes the reputation evaluation that node v_i, acting as receiver, gives to sender node v_j, and LS_ij denotes the reputation evaluation that node v_i, acting as sender, gives to receiver node v_j. LR_ij and LS_ij take values in [0,1], and their initial values are both 0.5. A node whose LR_ij or LS_ij value is greater than 0.5 is considered trustworthy, and the closer the value is to 1 the more trustworthy it is; otherwise the node is considered untrustworthy. N_Rh(i, j) and N_Rm(i, j) denote the numbers of honest and malicious transactions, respectively, that node v_i has had with v_j while acting as receiver. N_Sh(i, j) and N_Sm(i, j) denote the numbers of honest and malicious transactions, respectively, that node v_i has had with v_j while acting as sender; these counts are obtained from statistics of the historical transmission records. A penalty factor N_pun is set so that the reputation value falls faster than it rises, which reflects the punishment of malicious transactions.

The local reputation evaluation gives an intuitive measure of how honest a particular node is. In a trust mechanism, however, the global reputation value of a node as a sender (or receiver) should be evaluated by all nodes that have transacted with it. The global reputation value of a node as a sender is defined here as GR_i, and its global reputation value as a receiver is defined as GS_i. When a node is evaluated, the opinion of a more honest node carries more weight than that of a less honest one. Similarly, if a node has had many stable connections and data transmissions with the evaluated node, its evaluation is more reliable. The global reputation values GR_i and GS_i of a node should therefore combine the local reputation values LR_ij and LS_ij given by all nodes connected to it with the number of transactions and the reputation of the evaluating nodes. This is expressed as a weighted average:

wherein N_R(j,i) = N_Rh(j,i) + N_Rm(j,i) and N_S(j,i) = N_Sh(j,i) + N_Sm(j,i); ω denotes the weighting coefficient of the node reputation value in the calculation; Vr_i denotes the set of nodes that have transacted with the node as receivers, and Vs_i denotes the set of nodes that have transacted with the node as senders. LR_ij denotes the reputation evaluation that node v_i, acting as receiver, gives to sender node v_j, and LS_ij denotes the reputation evaluation that node v_i, acting as sender, gives to receiver node v_j. The weighting function is 1 − exp(−N(j,i)/5): the more often a node has connected with the evaluated node, the more its evaluation counts, with the weight rising along a negative-exponential curve.
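To illustrate how the weighting function 1 − exp(−N(j,i)/5) behaves, the following short sketch evaluates it for a few transaction counts. Only the weighting function is shown; the full weighted average for GR_i and GS_i is not reproduced here. The function name `interaction_weight` and the `scale` parameter name are hypothetical, with the default 5 taken from the expression above.

```python
import math


def interaction_weight(n_transactions, scale=5.0):
    """Weight given to an evaluator that has transacted n times with the node."""
    return 1.0 - math.exp(-n_transactions / scale)


# The weight starts at 0 for no transactions and approaches 1 as the
# number of transactions grows.
weights = [round(interaction_weight(n), 3) for n in (0, 1, 5, 20)]
```

The weight saturates quickly: an evaluator with 5 transactions already receives about 63% of the maximum weight, so additional transactions beyond a few dozen change the average very little.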

If a node has gone through n rounds of transactions (for example, n time slices), its global reputation value as a sender can be taken to be based on its global reputation value as a receiver in the previous round. This reduces the number of iterative computations of equation (8) and saves communication and computation overhead:

step 3.2: the residual cache of the nodes in the network is an important factor to be considered when information transmission and data forwarding are carried out;

within a time interval t, the amount of data received by node v_i is denoted r_i(t). The cache occupied during data reception is linear in the amount of data collected r_i(t) and is denoted B_S r_i(t), where B_S is the cache occupied by a node to collect one unit of data. If the sink node allocates a channel to node v_i, node v_i consumes a cache of B_T to send the data. The total cache of node v_i in time slot t can thus be expressed as:

since the amount of data received satisfies r_i(t) ≤ r_max and the number of allocated channels satisfies Σ_{k∈κ} J_n,k(t) ≤ 1, the upper limit of the cache of any node in one time slot is B_max = B_S r_max + B_T. The remaining cache of the node at the current time t is:
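The cache accounting of step 3.2 can be sketched as follows. The sketch rests on two assumptions that the text does not state explicitly: that the cache used in a slot is B_S·r_i(t) plus B_T for an allocated channel (consistent with the stated upper limit B_max = B_S·r_max + B_T), and that the remaining cache is the upper limit minus the cache used. All function and parameter names are hypothetical.

```python
def cache_in_slot(r, channels, b_s, b_t):
    """Cache used in one slot: B_S * r_i(t) + B_T * (channels assigned).

    channels: list of 0/1 flags J_n,k(t); at most one may be set.
    """
    assigned = sum(channels)
    if assigned > 1:
        raise ValueError("at most one channel may be assigned per slot")
    return b_s * r + b_t * assigned


def cache_upper_bound(r_max, b_s, b_t):
    """Upper limit B_max = B_S * r_max + B_T stated in the text."""
    return b_s * r_max + b_t


def remaining_cache(r, channels, r_max, b_s, b_t):
    # Assumption: remaining cache = upper limit minus cache used in the slot.
    return cache_upper_bound(r_max, b_s, b_t) - cache_in_slot(r, channels, b_s, b_t)
```

With B_S = 2, B_T = 5, r_max = 20, a node that received 10 units of data and was allocated one channel uses 25 units of cache out of an upper limit of 45 in this sketch.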

step 3.3: defining an activity index. To convert the daily time into the mapping time, the second of the mapping time and the second of the daily time, the latter denoted τ, are defined; both take values in [0,86400) and represent each second of the day. The difference between the two is that τ can only be a positive integer, whereas the second of the mapping time may be a decimal. By analyzing the data set, the average number of messages transmitted per second, M^*, and the number of messages transmitted in each second, M_τ, can be obtained. M_τ changes continuously with τ, whereas M^* remains constant. M_τ during working hours is clearly higher than M_τ during the night-time rest period, so it can describe the persistent changes related to node activity. The mapping function is defined as follows:

for the original timestamp T of the data set, a corresponding mapping timestamp is defined. To convert the original timestamp into the mapping timestamp, T must first be converted into the second τ_* of its corresponding mapping time by the following formula:

where mod(a, b) returns the remainder of a divided by b. The time zone in which the node is located is denoted N_zone, and Max_τ is the maximum value of τ. Because the time zone depends on the location of the node, the value N_zone × N_seconds should be added to each T in the calculation. Combining equations (12) and (13) gives the mapping time:
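The first conversion step, from a raw timestamp to a second of the local day, might look like the sketch below. It is an illustration under assumptions, since the formula itself is not reproduced in the text: N_seconds is assumed to be the number of seconds per hour (3600), the modulus is assumed to be 86400 to match the stated range [0,86400), and the function name `second_of_day` is hypothetical. The subsequent mapping to the (possibly fractional) mapping time is not shown.

```python
SECONDS_PER_HOUR = 3600   # assumed meaning of N_seconds
SECONDS_PER_DAY = 86400   # assumed modulus, matching the range [0, 86400)


def second_of_day(timestamp, n_zone):
    """Shift a timestamp by the node's time zone and reduce it to one day.

    timestamp: original timestamp T in seconds
    n_zone: time zone offset N_zone in hours (may be negative)
    """
    return (timestamp + n_zone * SECONDS_PER_HOUR) % SECONDS_PER_DAY
```

Python's `%` operator returns a non-negative result for a positive modulus, so negative time zone offsets still map into [0,86400).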

in a social network, the movement of nodes creates communication opportunities. Whether a node is in an active period is determined from the total distance it covers over the mapping period, as expressed by equation (15), where v denotes the average speed of the node in the history:

Step 4: Assign a weight to each attribute feature according to the information entropy function. For each community in the network, the various social attributes of the nodes are analyzed and their influence on information transmission is measured, so that the inefficient nodes in the community can be removed. Here, we use the concept of information entropy for weight assignment. Information entropy is a quantity that describes the degree of disorder of information: the larger the entropy value, the higher the degree of disorder and the lower the utility of the corresponding information. The information entropy is defined as:

where E(F_i) denotes the entropy of each feature F_i and p(x_i) denotes the selection function of F_i. This function takes different forms in different scenarios, so it can be determined once the scenario is fixed.

According to the properties required of attribute feature weights in the application scenario of this scheme, the function E must be symmetric, monotonic, continuous and additive. When information entropy is used for weight analysis, changing the order of the feature values does not change the weights that correspond to them. Moreover, when the number of features is fixed, the function is continuous in its variables and changes monotonically with the importance of the evaluated feature. Based on these principles, the function is constructed as:

where E(x_1, ..., x_u) is the information entropy function used in the method, x_i denotes each attribute feature of a node, and u is the number of attribute features.

Step 5: Analyze the global trust value of a node as a sender, its global trust value as a receiver, the node cache, and the frequency of node movement over a period of time, and establish an evaluation matrix A of these features:

The data matrix is normalized to obtain the calculation matrix Y, where max x_ij, min x_ij and the mean of x_ij respectively denote the maximum, minimum and average values of the elements in the j-th column of the data matrix A.

According to formula (19), the entropy value corresponding to each feature index is calculated. The negative sign is taken here to ensure that the entropy value is positive. The normalization coefficient is defined as:

For each feature index, the relative weight can be found as:

Step 6: For all nodes in the community, node filtering is performed according to the comprehensive feature index of each node. The attribute features of a node are combined by weighting to obtain its comprehensive feature:

To measure whether a node has transmission capability, a threshold κ is set to determine whether the node meets the transmission conditions required of a relay node, and inefficient nodes are then deleted. The core of the method is the observation that not all nodes in a community can meet the transmission conditions, together with a community reduction method that comprehensively considers the conditions required for transmission. By reducing the nodes in a community, the nodes that do not meet the transmission requirements are filtered out and deleted. After the small number of nodes that do not meet the forwarding conditions are removed, the remaining nodes in the community are closely connected and have high transmission capability.
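Steps 4 to 6 can be illustrated with the following minimal sketch. It does not reproduce the formulas of this description; it assumes the commonly used entropy weight method (min-max normalization, column-wise proportions, normalization coefficient 1/ln n, weights proportional to 1 minus entropy). All function names and the sample values are hypothetical.

```python
import math


def min_max_normalize(matrix):
    """Scale each column of a row-major feature matrix to [0, 1]."""
    scaled_cols = []
    for col in zip(*matrix):
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant column carries no information and is mapped to zeros.
        scaled_cols.append([(x - lo) / span if span else 0.0 for x in col])
    return [list(row) for row in zip(*scaled_cols)]


def entropy_weights(normalized):
    """Assumed entropy weight method: one weight per feature column."""
    n = len(normalized)              # number of nodes, must exceed 1
    k = 1.0 / math.log(n)            # assumed normalization coefficient
    divergences = []
    for col in zip(*normalized):
        total = sum(col)
        entropy = 1.0                # no spread: maximal entropy
        if total > 0:
            probs = [x / total for x in col if x > 0]
            entropy = -k * sum(p * math.log(p) for p in probs)
        divergences.append(1.0 - entropy)
    total_div = sum(divergences)
    if total_div == 0:
        return [1.0 / len(divergences)] * len(divergences)
    return [d / total_div for d in divergences]


def composite_scores(normalized, weights):
    """Weighted combination of the attribute features of each node."""
    return [sum(w * x for w, x in zip(weights, row)) for row in normalized]


def efficient_nodes(node_ids, scores, kappa):
    """Keep nodes whose composite score is not below the threshold kappa."""
    return [nid for nid, s in zip(node_ids, scores) if s >= kappa]


# Hypothetical rows: send trust, receive trust, remaining cache, activity.
features = [[0.9, 0.8, 50, 5], [0.2, 0.3, 10, 5], [0.7, 0.9, 40, 5]]
normalized = min_max_normalize(features)
weights = entropy_weights(normalized)
scores = composite_scores(normalized, weights)
kept = efficient_nodes(["a", "b", "c"], scores, kappa=0.5)
```

In this hypothetical data the activity column is constant, so it receives zero weight, and node "b", which has the lowest value of every other feature, falls below the threshold and is removed.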

Step 7: Through the above steps, several communities with close social relations are obtained in the network. The nodes in these communities have a high degree of trust, high activity and sufficient cache space. Effective transmission is performed by passing data between communities. Suppose node v_i and node v_j meet, and v_i stores data to be forwarded. If node v_j is the target node, node v_i transmits the information to node v_j and deletes this message from its send queue. If the encountered node v_j is not the target node, the transmission method of node v_i falls into one of the following two types:

Step 7.1: Intra-community transmission. If the target node is within the community of the current node v_i and v_j is also in that community, node v_i transmits the data to v_j; otherwise it does not transmit.

Step 7.2: Inter-community transmission. If the target node is not in the community to which node v_i belongs, node v_i sends node v_j a request frame asking whether the target node is in the community to which node v_j belongs. After receiving the request frame, node v_j checks its current community, confirms whether the target node is in it, and sends a response frame to node v_i. If the target node is in the community of v_j, node v_i sends the data to node v_j; otherwise it does not send.
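The forwarding rules of Step 7 can be sketched as a single decision function. This is an illustrative simplification under stated assumptions: each node belongs to exactly one community, the request/response frame exchange of Step 7.2 is modeled as a direct lookup, and the function name, return labels and `community_of` mapping are hypothetical.

```python
def forwarding_decision(carrier, encountered, target, community_of):
    """Decide what the carrier v_i does when it meets node v_j.

    community_of maps a node id to its community id (hypothetical structure).
    Returns one of "deliver", "forward_intra", "forward_inter", "hold".
    """
    if encountered == target:
        # v_j is the target: deliver and drop the message from the queue.
        return "deliver"

    target_community = community_of[target]
    encountered_in_target = community_of[encountered] == target_community

    if community_of[carrier] == target_community:
        # Step 7.1: the target is in the carrier's own community, so the
        # message only moves to a node of that same community.
        return "forward_intra" if encountered_in_target else "hold"

    # Step 7.2: the target is elsewhere. The request/response exchange is
    # reduced here to asking whether v_j shares the target's community.
    return "forward_inter" if encountered_in_target else "hold"


# Hypothetical community assignment used only for illustration.
communities = {"a": 1, "b": 1, "c": 2, "d": 2, "t": 2}
```

Both branches reduce to the same membership test; they differ in how the carrier learns the answer, which in Step 7.2 requires the request and response frames.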

Example 1:

In this example, we use data sets from CRAWDAD that record the social network of people moving with iMote devices. The original four data sets are social data provided by the University of Cambridge. We extracted key fields about user behavior records and user attribute information, including 4546 photos, 2662 photo publisher nodes, 40808 user nodes, and 618491 edges. The four data sets employed are the Infocom5, Infocom6, Cambridge, and Intel data sets.

The implementation is carried out on The ONE simulation tool. With HDFS (a distributed file system) as the data storage layer, a computational programming model is built as the data computation layer using the MapReduce and RDD computing frameworks, so that data are processed in parallel efficiently and quickly. The model and the algorithm are built to solve for the initial nodes with maximized influence, and different comparative experiments are designed to analyze the selection effect and quality of the initial nodes, thereby verifying the correctness of the theoretical analysis.

In this embodiment, an ETNS algorithm based on node socialization information is designed. Comparative experiments are designed to compare the propagation effect of ETNS with that of the FCNS, ESR, EWDCR and Epidemic models, verifying the effectiveness of the model and the algorithm as a data transmission strategy.

Simulation results show that the ETNS algorithm performs well in the community division process. Among the four data sets used in the experiment, the algorithm performs best on the Infocom6 data set: it is the only data set for which the division yields the actual 6 clusters, making it the closest to the real human grouping. The community division results on the Infocom5 and Cambridge data sets are also good, but slightly worse than on Infocom6. On the remaining data set, performance is poorer than on the other three, either because the number of experimental copies in the data set is small or because the nodes are randomly distributed and do not follow the characteristics of human clustering. Overall, the simulation results show that the ETNS algorithm is feasible and effective for community division on real data sets.

In FIG. 4, we use a box plot (quartile plot) to analyze the experimental results. The plot has five markers: the minimum, the first quartile, the median, the third quartile, and the maximum. The quartiles indicate the center, the concentration and the range of the distribution. In FIG. 4, ETNS has a higher center, a smaller spread and a more concentrated range than the other algorithms.
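The five markers of such a plot can be computed as in the following sketch. The function name is hypothetical, and the quartiles use the "inclusive" interpolation of Python's standard library, which is one of several common conventions and not necessarily the one used to draw FIG. 4.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, first quartile, median, third quartile, maximum)."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)
```

A narrower gap between the first and third quartiles corresponds to the more concentrated range described above.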

FIG. 5 compares the transmission success rates of the five data transmission methods in Example 1: ETNS, FCNS, ESR, EWDCR and Epidemic. When the simulation time is less than one day, the advantage of the ETNS algorithm is not obvious, and its performance is similar to that of the other four algorithms. As the simulation time increases, the transmission success rate of ETNS is consistently the highest among these algorithms, because successful data transmission is achieved by filtering for the active nodes in the community. In the ETNS algorithm, the nodes in a network are divided into several communities, and each pair of highly similar nodes within a community may communicate frequently. At the same time, the ETNS algorithm provides a node reduction strategy based on multiple attributes, so that a large number of unsuitable and inefficient nodes can be removed, and the high availability of the community nodes yields the highest delivery rate. The ESR algorithm is also a community-based routing algorithm, but its node reduction method does not consider social attributes, so ETNS performs better. In the FCNS and EWDCR algorithms, the similarity measure does not take into account the trustworthiness of the nodes or the available buffer space, which may result in the selected relay node being unavailable. In addition, the Epidemic algorithm produces a large number of message copies that impair data transmission efficiency, and thus the Epidemic method has a relatively low transmission success rate compared with the other four algorithms.

FIG. 6 compares the end-to-end delay of the five data transmission methods in Example 1: ETNS, FCNS, ESR, EWDCR and Epidemic. ETNS has the lowest average end-to-end delay of the five algorithms. Because ETNS analyzes the comprehensive features of the nodes and provides a community reduction strategy, the inefficient nodes that hinder the transmission process can be removed, which reduces the average end-to-end delay. In contrast, the Epidemic algorithm places no requirement on the next-hop node, and messages are transmitted blindly, resulting in a drastic increase in routing and forwarding delays. The ESR algorithm effectively limits the number of copies, so its transmission delay is lower than that of the Epidemic algorithm. In addition, the FCNS algorithm analyzes the transmission preferences before data transmission, and in the EWDCR algorithm data is passed through neighbors and related nodes. Thus, the average end-to-end delay of the FCNS and EWDCR algorithms is lower than that of conventional routing algorithms. Of these five algorithms, ETNS has the best average end-to-end delay.

FIG. 7 is a comparison of the routing overhead in the data transmission strategy performed by 5 different methods, ETNS, FCNS, ESR, EWDCR and Epidemic, in example 1. The average overhead of the ETNS algorithm is kept to a minimum level at all times because it employs a community-aware strategy that takes into account the comprehensive nature of the transmission. In the ETNS algorithm, nodes are divided into several closely related groups, and the probability of successful transmission between nodes is high. Therefore, the ETNS routing scheme occupies less time and resources, and the cost is greatly reduced on average. The ESR algorithm only considers the effect of nodes on the information flow, ignoring the current availability of the next hop node, resulting in latency overhead. In epidemic algorithms, redundant message replication requires a lot of time and resources, which is a major cause of huge routing overhead. In the FCNS and EWDCR algorithms, the similarity between nodes can effectively reduce the routing overhead, but the routing overhead can still be optimized because the resource consumption caused by some unavailable nodes can be reduced. In summary, ETNS has the lowest routing overhead among the five algorithms.

The above experiments show that the method, based on user behavior records and the complex social relationships of users, comprehensively considers the advantages of communities in the transmission process and provides a way of screening out the inefficient nodes in a community, so that community-based transmission strategies become more efficient. The experiments show that the proposed method achieves higher data transmission efficiency and lower routing overhead.

Claims

1. An opportunistic social network effective data transmission method based on node socialization is characterized by comprising the following steps:

2. The method of claim 1, wherein the clustering coefficients of the nodes are calculated according to the following formula:

3. The method according to claim 1, wherein in the node community clustering division process, if there are overlapping communities, the nodes are divided into communities with higher modularity values;

wherein the modularity value of the community is Q_c(X_n)，

4. The method of claim 1, wherein the similarity between nodes is calculated according to the following formula:

5. The method of claim 1, wherein the comprehensive attribute feature value k_i of each node v_i is computed and compared with a set threshold κ; if k_i is less than κ, the corresponding node v_i does not transmit data as a relay node:

τ_*＝mod(T_i+N_zone*N_seconds，Max_τ)

N_R(j,i)＝N_Rh(j,i)+N_Rm(j,i),N_S(j,i)＝N_Sh(j,i)+N_Sm(j,i),

N_Rh(i, j) and N_Rm(i, j) respectively represent the number of honest and malicious transactions that occur between node v_i, acting as a receiver node, and node v_j;

6. The method according to any one of claims 1-5, wherein the maximum value of the clustering times is the number of significant nodes;

Wherein, the node v_iThe number of adjacent nodes of (a) is pi-1;representing a node v_iIn the ith row of the adjacency matrixElement of column j, node v_iThe size of the adjacent matrix is pi x pi, the nodes are sequentially arranged at the row head and the column head of the adjacent matrix, if a connecting edge exists between the two nodes, the element of the corresponding two nodes in the adjacent matrix takes the value of 1, otherwise, the element takes the value of 0,the value of (d) is 0.

7. The method of claim 6, wherein it is assumed that when node v_i and node v_j meet, v_i stores data to be forwarded;

step 7.1: intra-community transmission;

step 7.2: inter-community transmission;