CN111814065A

CN111814065A - Information propagation path analysis method and device, computer equipment and storage medium

Info

Publication number: CN111814065A
Application number: CN202010592379.5A
Authority: CN
Inventors: 曹合心; 蔡健
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2020-06-24
Filing date: 2020-06-24
Publication date: 2020-10-23
Anticipated expiration: 2040-06-24
Also published as: WO2021258998A1; CN111814065B

Abstract

The application relates to big data, and provides an information propagation path analysis method, which comprises the following steps: acquiring the propagation record information of the network; dividing the network into at least one community network according to the propagation record information, and determining candidate sampling nodes and initial propagation nodes in each community network; calculating the node composition proportion of a connecting node, a key node and a common node in the candidate sampling node through the node type identification of the candidate sampling node; determining a sampling strategy based on the calculation of the node composition proportion and a preset node incidence relation; the node association relation comprises that the connection node is associated with a key node, and the key node is associated with a common node; sampling the initial propagation node and the candidate sampling node according to a sampling strategy; and carrying out visual presentation on the collected initial propagation nodes and the candidate sampling nodes to generate an information propagation path diagram. In addition, the application also relates to a block chain technology, and the propagation record information can be stored in the block chain. The method and the device improve the accuracy of information propagation path analysis.

Description

Information propagation path analysis method and device, computer equipment and storage medium

Technical Field

The present application relates to big data, and in particular, to an information propagation path analysis method and apparatus, a computer device, and a storage medium.

Background

In big data scenarios, people or computers can form a network through complex connections; the people or computers can be regarded as nodes in the network, and data, information and the like can propagate through the network. With the development of internet technology, it is often necessary to analyze how information propagates in a network. For example, in a marketing network, analyzing the propagation path of a product makes it possible to spread product information more widely at lower cost. Virus transmission during an epidemic, fraudulent links and the like can also be analyzed as networks.

A network contains Key Opinion Leaders ("KOLs"), which hold more, and more accurate, information, are accepted or trusted by more related groups, can propagate information to more nodes in the network, and have a greater influence on those nodes. Analysis of an information propagation path in a network usually samples nodes around the KOLs and visually presents a sub-network that reflects the information propagation trend. However, conventional network propagation path analysis usually either samples only predefined KOLs manually, so that the sampled node type is single and the sampled nodes must be adjusted by hand for each scenario; or samples all nodes randomly without distinction, so that key propagation paths are easily missed when the proportion of KOLs among the nodes is low. Conventional network propagation path analysis is therefore low in accuracy.

Disclosure of Invention

An object of the embodiments of the present application is to provide an information propagation path analysis method, an information propagation path analysis device, a computer device, and a storage medium, so as to solve the problem of low accuracy of information propagation path analysis.

In order to solve the above technical problem, an embodiment of the present application provides an information propagation path analysis method, which adopts the following technical solutions:

acquiring the propagation record information of the network;

dividing the network into at least one community network according to the propagation record information, and determining candidate sampling nodes and initial propagation nodes in each community network;

calculating the node composition proportion of a connecting node, a key node and a common node in the candidate sampling node according to the node type identification of the candidate sampling node;

determining a sampling strategy based on the calculated node composition proportion and a preset node incidence relation; the node incidence relation comprises that a connecting node is associated with a key node, and the key node is associated with a common node;

sampling the initial propagation node and the candidate sampling node according to the sampling strategy;

and carrying out visual presentation on the collected initial propagation nodes and the candidate sampling nodes to generate an information propagation path diagram.

In order to solve the above technical problem, an embodiment of the present application further provides an information propagation path analysis device, including:

the information acquisition module is used for acquiring the propagation record information of the network;

the network dividing module is used for dividing the network into at least one community network according to the propagation record information and determining candidate sampling nodes and initial propagation nodes in each community network;

the proportion calculation module is used for calculating the node composition proportion of a connecting node, a key node and a common node in the candidate sampling nodes according to the node type identification of the candidate sampling nodes;

the strategy determining module is used for determining a sampling strategy based on the calculated node composition proportion and a preset node incidence relation; the node incidence relation comprises that a connecting node is associated with a key node, and the key node is associated with a common node;

the node sampling module is used for sampling the initial propagation node and the candidate sampling node according to the sampling strategy;

and the path generation module is used for visually presenting the acquired initial propagation nodes and the acquired candidate sampling nodes to generate an information propagation path diagram.

In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which includes a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the information propagation path analysis method when executing the computer program.

In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps of the information propagation path analysis method described above.

Compared with the prior art, the embodiments of the present application mainly have the following beneficial effects: the network is divided into at least one community network according to the propagation record information, each community network propagating information independently, and candidate sampling nodes and initial propagation nodes are determined in each community network; the node composition proportion of the three node types among the candidate sampling nodes, namely connecting nodes, key nodes and common nodes, is calculated according to the node type identification of the candidate sampling nodes. When initial propagation nodes or candidate sampling nodes are collected, sampling follows a sampling strategy determined jointly by the node composition proportion and a preset node incidence relation, where the node incidence relation comprises that a connecting node is associated with a key node and a key node is associated with a common node; this ensures balanced sampling of the various node types and improves the accuracy of node sampling. The information propagation path graph, which presents the propagation path of the information in the network, is composed of the collected candidate sampling nodes and initial propagation nodes, so the accuracy of information propagation path analysis is ensured.

Drawings

In order to more clearly illustrate the solution of the present application, the drawings needed for describing the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.

FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;

FIG. 2 is a flow diagram for one embodiment of an information propagation path analysis method according to the present application;

FIG. 3 is a schematic diagram of a partitioned community network in one embodiment;

FIG. 4 is a flow diagram for one embodiment of step 204 of FIG. 2;

FIG. 5 is a flowchart of one embodiment of step 2042 of FIG. 4;

FIG. 6 is a schematic diagram of an information propagation path graph generated in one embodiment;

FIG. 7 is a schematic configuration diagram of an embodiment of an information propagation path analyzing apparatus according to the present application;

FIG. 8 is a schematic block diagram of one embodiment of a computer device according to the present application.

Detailed Description

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.

In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.

As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links, or fiber optic cables.

The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.

The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop computers, desktop computers, and the like.

The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.

It should be noted that the information propagation path analysis method provided in the embodiments of the present application is generally executed by a server, and accordingly, the information propagation path analysis device is generally disposed in the server.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

With continuing reference to FIG. 2, a flow diagram of one embodiment of an information propagation path analysis method in accordance with the present application is shown. The information propagation path analysis method comprises the following steps:

Step 201, acquiring the propagation record information of the network.

In this embodiment, an electronic device (for example, the server shown in FIG. 1) on which the information propagation path analysis method runs may acquire the propagation record information of the network through a wired or wireless connection. It should be noted that the wireless connection may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (ultra-wideband) connection, and other wireless connections now known or developed in the future.

The propagation record information may be information for recording a propagation condition of certain information among nodes in the network. The propagation record information may include node identification, node type identification, propagation relationships between nodes, and propagation times. The propagation record information may further include other attribute information of the node, for example, when a person is used as a node in the network, the propagation record information may further include information such as the sex and age of the person. The propagation relationship may record whether information propagation occurs between nodes and a propagation direction when the information propagation occurs.

Specifically, the propagation record server stores a portion of the propagation record information, such as the node identification. The propagation record server monitors and records information propagation in the network to obtain the propagation record information relating to information propagation. The propagation record server then aggregates the propagation record information and sends it to the server that executes the information propagation path analysis.

The server for performing the information propagation path analysis and the propagation record server may be the same server or different servers.

Table 1 and Table 2 show propagation record information in one embodiment. Referring to Table 1, a customer in a marketing campaign is used as a node in the network, the customer ID (identity document) is used as the node identification, gender and age are attribute information of the node, and a node type identification is also included. Table 2 records the propagation relationships and propagation times between nodes.

Node identification (customer ID)	Sex	Age	Node type identification
Id1	male	23	Key node
Id2	female	25	Common node

TABLE 1

Propagation node	Propagated node	Propagation time
Id1	Id2	2020-02-04 12:00:00

TABLE 2

In one embodiment, the propagation record information may be stored in a database from which the server retrieves the propagation record information. It is emphasized that, in order to further ensure the privacy and security of the propagation record information, the propagation record information may also be stored in a node of a block chain.

When people are the nodes in the network and information is propagated with people as carriers, the propagation record information can be compiled manually and then uploaded to the server.

The block chain referred by the application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.

Step 202, dividing the network into at least one community network according to the propagation record information, and determining candidate sampling nodes and initial propagation nodes in each community network.

The community networks may be sub-networks of the network, and each community network propagates information independently of the others. The initial propagation node may be the node in a community network that performs a propagation operation earliest, and the nodes in the community network other than the initial propagation node may be candidate sampling nodes.

Specifically, the server divides the network into at least one community network according to the propagation record information. Nodes in the same community network can be connected through propagation relations to form a closed sub-network; nodes in different community networks cannot be linked by a propagation relation.

According to the propagation times in the propagation record information, the server finds the node with the earliest propagation time in each community network, takes that node as the initial propagation node of the community network, and takes the nodes that are not the initial propagation node as candidate sampling nodes.

In addition, the initial propagation node is not the origin of information propagation in the entire network. The origin node of the information propagation is the "producer" or "publisher" of the information, and the origin node propagates the information to the initial propagation nodes. The statement above that community networks propagate information independently of each other holds when the origin node is not considered.

For example, mobile phone manufacturer A publishes a marketing microblog post for a mobile phone on its official microblog account "A mobile phone"; "A mobile phone" is the origin node in the whole network. "A mobile phone" propagates the post to a plurality of initial propagation nodes, and the initial propagation nodes continue to propagate it. In the present application, the origin node "A mobile phone" is not analyzed.

Step 203, calculating the node composition proportion of the connecting node, the key node and the common node in the candidate sampling nodes according to the node type identification of the candidate sampling nodes.

The node type identifier is used for representing the propagation characteristics of the node, and the candidate sampling nodes can be divided into connection nodes, key nodes and common nodes according to the propagation characteristics of the node. The node composition ratio may be a ratio of the connection node, the key node, and the common node in the candidate sampling node.

Specifically, the server reads the node type identifiers of the candidate sampling nodes from the propagation record information; the node type identifiers represent the propagation characteristics of the nodes during information propagation. According to the node type identifiers, the candidate sampling nodes are divided into connecting nodes (connectors), key nodes (KOLs, i.e. Key Opinion Leaders) and common nodes (normal).

The key nodes are the key opinion leaders in the network and enable information to spread widely in it. A key node can be regarded as propagating the information to a plurality of common nodes, and the common nodes it reaches are a measure of the key node's propagation capacity. A connecting node may act as an intermediary for information propagation, passing the information on to key nodes and thereby causing the information to spread widely in the network.

The server counts the respective node numbers of the connecting node, the key node and the common node in the candidate sampling nodes according to the node type identification, and adds the node numbers of the three types of candidate sampling nodes to obtain the total number of the candidate sampling nodes, so that the proportion of the connecting node, the key node and the common node in the candidate sampling nodes, namely the node composition proportion, is obtained through calculation.
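The proportion calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the type labels "connector", "key" and "common" and the function name node_composition are hypothetical.

```python
from collections import Counter

# Hypothetical labels standing in for the node type identifications.
NODE_TYPES = ("connector", "key", "common")

def node_composition(type_by_node):
    """Return the share of each node type among the candidate sampling nodes."""
    counts = Counter(type_by_node.values())
    total = sum(counts[t] for t in NODE_TYPES)  # total number of candidates
    if total == 0:
        return {t: 0.0 for t in NODE_TYPES}
    return {t: counts[t] / total for t in NODE_TYPES}

# Illustrative candidates only; initial propagation nodes are excluded upstream.
candidates = {"Id2": "common", "Id3": "key", "Id4": "connector", "Id5": "common"}
print(node_composition(candidates))
```

With these four illustrative candidates, connectors and key nodes each account for a quarter of the candidates and common nodes for half.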

In an embodiment, before step 203, the method may further include: determining the node propagation quantity of each candidate sampling node according to the propagation record information; determining the node type of each candidate sampling node according to the node propagation number; and adding node type identifications corresponding to respective node types to the candidate sampling nodes.

Wherein the node propagation number may be the number of nodes to which the candidate sampling node propagates the information.

Specifically, the server counts the number of times of information propagation performed by each candidate sampling node from the propagation record information, so as to obtain the node propagation number of each candidate sampling node.

The server acquires a preset propagation number threshold, determines the candidate sampling nodes whose node propagation number is greater than the threshold as key nodes, and adds the key-node type identifier to them. Among the nodes that are not key nodes, the server then finds the nodes that propagate information to key nodes, determines them as connecting nodes, and adds the connecting-node type identifier. Finally, the server determines the candidate sampling nodes that are neither key nodes nor connecting nodes as common nodes and adds the common-node type identifier.
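The threshold-based typing above can be sketched as follows. This is an illustration under assumptions: the node propagation number is taken as the number of distinct nodes a candidate propagated to, and the function name, type labels and sample data are hypothetical.

```python
from collections import defaultdict

def classify_nodes(edges, candidates, threshold):
    """Label each candidate as 'key', 'connector' or 'common'.

    edges: iterable of (propagating node, propagated node) pairs.
    threshold: preset propagation number threshold (illustrative value below).
    """
    targets = defaultdict(set)
    for src, dst in edges:
        targets[src].add(dst)
    # Key nodes: propagation number strictly greater than the threshold.
    key = {n for n in candidates if len(targets[n]) > threshold}
    # Connecting nodes: non-key nodes that propagated to at least one key node.
    connectors = {n for n in candidates if n not in key and targets[n] & key}
    return {
        n: "key" if n in key else "connector" if n in connectors else "common"
        for n in candidates
    }

edges = [("c", "k"), ("k", "a"), ("k", "b"), ("k", "d"), ("a", "b")]
types = classify_nodes(edges, {"a", "b", "c", "d", "k"}, threshold=2)
print(types["k"], types["c"], types["a"])  # key connector common
```

Key nodes are resolved first because the connecting-node test depends on knowing which nodes are key nodes.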

In one embodiment, the node type identification may also be added by the propagation record server. And the propagation record server counts the node propagation quantity of each node before uploading the propagation record information, and adds a node type identifier to each node according to a propagation quantity threshold value. After the community network is divided, the server sets the node type identification of the initial propagation node as invalid, the nodes do not participate in the calculation of the node composition proportion any more, and the node composition proportion is calculated only according to the node type identification of the candidate sampling node; the server can also add the identification of the initial propagation node to the initial propagation node after dividing the community network.

In the embodiment, the node propagation number of the candidate sampling node is obtained from the propagation record information, the node propagation number reflects the capability of the node propagation information, the node type of the candidate sampling node can be accurately determined according to the node propagation number, and the accuracy of the node composition proportion calculation is ensured.

Step 204, determining a sampling strategy based on the calculated node composition proportion and a preset node incidence relation; the node association relation comprises that the connecting node is associated with a key node, and the key node is associated with a common node.

The sampling strategy is used for instructing the server to carry out node sampling. Node associations may represent a tendency or trend of information to propagate among different types of nodes, including but not limited to: the connecting nodes are associated with key nodes, and the key nodes are associated with common nodes.

Specifically, the initial propagation node is a starting point of information propagation in the community network, and the server starts sampling from the initial propagation node. The initial propagation node may set a full acquisition or may set a partial acquisition.

The node association relationship comprises: the initial propagation node is associated with a connection node, a key node and a common node; the connecting node is associated with the key node; the key node is associated with the common node; the common node is associated with the common node. A node of one type has a tendency or trend to propagate information to other types of nodes with which it is associated.

The initial propagation nodes are collected in the first sampling round. According to the node incidence relation, it can be determined that connecting nodes, key nodes and common nodes are collected in the second round; key nodes and common nodes are collected in the third round; and common nodes are collected in the fourth round.

In each sampling round, the server may use a relative ratio of node composition ratios of the connection node, the key node, and the common node in the candidate sampling nodes as a sampling ratio of the connection node, the key node, and the common node.

The server takes the node types collected in each sampling round, together with the corresponding sampling ratios, as the sampling strategy.
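One possible reading of the round-by-round strategy is sketched below. It assumes that "relative ratio" means renormalizing the composition proportions over the node types collected in a given round; that interpretation, the type labels and the function name are assumptions rather than details confirmed by the text.

```python
# Node types collected per round, following the rounds described above.
ROUND_TYPES = [
    ("initial",),
    ("connector", "key", "common"),
    ("key", "common"),
    ("common",),
]

def sampling_strategy(composition):
    """Return one {node type: sampling ratio} mapping per sampling round."""
    strategy = []
    for types in ROUND_TYPES:
        if types == ("initial",):
            # Round 1 collects only initial propagation nodes.
            strategy.append({"initial": 1.0})
            continue
        total = sum(composition[t] for t in types)
        # Assumed rule: renormalize the proportions over this round's types.
        strategy.append(
            {t: (composition[t] / total if total else 0.0) for t in types}
        )
    return strategy

plan = sampling_strategy({"connector": 0.25, "key": 0.25, "common": 0.5})
for round_no, ratios in enumerate(plan, start=1):
    print(round_no, ratios)
```

Under this reading, the third round samples key nodes and common nodes in a 1:2 ratio for the illustrative proportions shown.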

Step 205, sampling the initial propagation node and the candidate sampling node according to the sampling strategy.

Specifically, the server randomly samples the initial propagation nodes and the candidate sampling nodes in each community network according to a sampling strategy.

In random sampling, if a node is not acquired, the propagation path associated with the node is discarded, i.e., other nodes extending from the node are not considered in the sampling.
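The pruning rule above (a node that is not collected cuts off everything that extends from it) can be illustrated with a simplified sketch. For brevity it uses a single hypothetical keep probability instead of per-type sampling ratios, and the function name and the children mapping are assumptions.

```python
import random

def sample_paths(children, initial, keep_prob, rng):
    """Sample outward from the initial propagation node, round by round.

    children: {node: [nodes it propagated to]} (hypothetical structure).
    A node that is not collected is never expanded, so the nodes that
    extend from it are not considered in later rounds.
    """
    sampled = [initial]
    frontier = [initial]
    seen = {initial}
    while frontier:
        next_frontier = []
        for node in frontier:
            for child in children.get(node, []):
                if child in seen:
                    continue
                seen.add(child)
                if rng.random() < keep_prob:
                    sampled.append(child)
                    next_frontier.append(child)  # only collected nodes expand
        frontier = next_frontier
    return sampled

children = {"i": ["a", "b"], "a": ["c"], "b": ["d"]}
print(sample_paths(children, "i", 0.5, random.Random(1)))
```

As a simplification, a node rejected once is not reconsidered even if another collected node also propagated to it.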

Step 206, carrying out visual presentation on the collected initial propagation nodes and the candidate sampling nodes to generate an information propagation path diagram.

The information propagation path graph is a graph which shows the propagation path and the propagation trend of information in the network through partial nodes in the network.

Specifically, the server creates a blank initial path graph, and marks the acquired initial propagation nodes and candidate sampling nodes in the initial path graph. The initial propagation node may be marked in the center of the initial path graph, and then the connection node, the key node, and the common node in the candidate sampling nodes are marked in order according to the propagation record information. And finally, the server connects the initial propagation node with various candidate sampling nodes to obtain an information propagation path diagram. The information propagation path graph can also comprise node identifications of the initial propagation node and the candidate sampling node.
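As one hedged illustration of turning the collected nodes into a drawable graph, the sketch below emits Graphviz DOT text. The choice of DOT, the function name to_dot and the record layout are assumptions; the application does not name a rendering tool.

```python
def to_dot(records, sampled, initial):
    """Build DOT text for the propagation edges between collected nodes.

    records: (propagating node, propagated node, propagation time) tuples.
    sampled: collected candidate sampling nodes.
    initial: collected initial propagation node, drawn with a double circle.
    """
    keep = set(sampled) | {initial}
    lines = ["digraph propagation {"]
    lines.append(f'  "{initial}" [shape=doublecircle];')
    for src, dst, _time in records:
        # Keep an edge only when both of its endpoints were collected.
        if src in keep and dst in keep:
            lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)

records = [
    ("Id1", "Id2", "2020-02-04 12:00:00"),
    ("Id2", "Id3", "2020-02-04 13:00:00"),
]
dot_text = to_dot(records, sampled=["Id2"], initial="Id1")
print(dot_text)
```

The node identifiers serve as labels, matching the note that the path graph may include node identifications.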

In this embodiment, the network is divided into at least one community network according to the propagation record information, each community network propagating information independently, and candidate sampling nodes and initial propagation nodes are determined in each community network; the node composition proportion of the three node types among the candidate sampling nodes, namely connecting nodes, key nodes and common nodes, is calculated according to the node type identification of the candidate sampling nodes. When initial propagation nodes or candidate sampling nodes are collected, sampling follows a sampling strategy determined jointly by the node composition proportion and a preset node incidence relation, where the node incidence relation comprises that a connecting node is associated with a key node and a key node is associated with a common node; this ensures balanced sampling of the various node types and improves the accuracy of node sampling. The information propagation path graph, which presents the propagation path of the information in the network, is composed of the collected candidate sampling nodes and initial propagation nodes, so the accuracy of information propagation path analysis is ensured.

Further, the step 202 may include: initializing the node identification of each node in the propagation record information into a community label of each node; for each node in the network, determining a minimum community label from community labels corresponding to the node and adjacent nodes of the node; updating the community label of the node to the determined minimum community label so as to update the community label of each node in an iterative manner; when the community label of each node is not changed any more, dividing the network into at least one community network according to the community label of each node; and according to the propagation time in the propagation record information, determining a node with the earliest propagation time in each community network as an initial propagation node of each community network, and determining a node without the earliest propagation time in each community network as a candidate sampling node.

The node identifier uniquely identifies a node and may be a character string composed of letters, numbers, special symbols, and the like. The community label of a node identifies the community network to which the node belongs. Two nodes are adjacent to each other when a propagation relation exists between them. The propagation time may be the time at which a propagation operation occurred. The minimum community label may be the smallest of the community labels of a node and its adjacent nodes.

Specifically, the propagation record information includes node identifiers of the nodes, the node identifiers of the nodes are different from each other, and the server may initialize the node identifiers of the nodes to the community tags of the nodes. The nodes belong to the community network corresponding to the community label.

In one embodiment, the server randomly generates community tags for the nodes, the randomly generated community tags being different from each other.

After initialization, the social networks with the number equal to that of the nodes exist in the network, and iterative updating is needed to be carried out on the social tags so as to merge the social networks. For each node of the network, the community tags corresponding to the node and the adjacent nodes thereof are compared, so that the minimum community tag in the node and the adjacent nodes thereof is determined.

When the node identification is a number, selecting the node identification with the minimum value as a minimum community label in a numerical value comparison mode; when the node identifier is a character string, a single character or a character string is subjected to relatively large and small operations according to the dictionary order, the size of an ASCII (American Standard Code for Information exchange) Code value may be used as a Standard for character comparison, and the node identifier having the smallest ASCII Code value may be selected as the smallest community tag.

In each round of iterative updating, each node updates its own community tag to the determined minimum community tag. After each round is finished, the minimum community tag is determined again from the community tags of the node and its adjacent nodes, and the next round of iterative updating is then performed.

When the community tags of all nodes no longer change during iterative updating, the network is divided into at least one community network according to the community tags of the nodes. All nodes in the same community network have the same community tag, and nodes with different community tags are assigned to different community networks.

According to the propagation time in the propagation record information, the server finds the node with the earliest propagation time in each community network, determines that node as the initial propagation node of its community network, and sets the remaining nodes in the community network as candidate sampling nodes. Each community network has one initial propagation node.
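The selection of initial propagation nodes can be sketched as follows. This is only an illustration: the function name, the `labels` mapping (node to final community tag) and the `first_time` mapping (node to its earliest propagation time, here a plain integer timestamp) are assumptions, not structures defined in the text.

```python
def split_initial_and_candidates(labels, first_time):
    """Group nodes by community tag, then pick the earliest node per community.

    labels:     hypothetical dict, node -> community tag after iteration
    first_time: hypothetical dict, node -> earliest propagation time
    Returns a dict: community tag -> (initial node, list of candidate nodes).
    """
    communities = {}
    for node, tag in labels.items():
        communities.setdefault(tag, []).append(node)

    result = {}
    for tag, members in communities.items():
        # The node with the earliest propagation time starts the propagation.
        initial = min(members, key=lambda n: first_time[n])
        candidates = [n for n in members if n != initial]
        result[tag] = (initial, candidates)
    return result


labels = {"a": "a", "b": "a", "c": "c"}
first_time = {"a": 5, "b": 2, "c": 9}
split = split_initial_and_candidates(labels, first_time)
```

In this toy input, node b propagated earlier than node a, so b becomes the initial propagation node of the community tagged a even though it did not supply the tag.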

For example, FIG. 3 is a diagram illustrating the division of a network into community networks in one embodiment. For each node in the network on the left of FIG. 3, the node identifier is set as the node's own community tag, so the community tag of the node identified as id0 is id0, the community tag of the node identified as id1 is id1, and so on.

Before each round of iterative updating, the minimum community tag of each node is determined from the community tags of the node and its adjacent nodes. For node id0, among the community tags id0, id1 and id3, the character string id0 has the smallest ASCII code value, so id0 is selected as the minimum community tag. Similarly, for nodes id1 and id3 the minimum community tag is also id0, and for node id2 the minimum community tag is id1. In the first round of iterative updating, the community tags of nodes id0, id1 and id3 are therefore updated to id0, and the community tag of node id2 is updated to id1.

In the second round, the minimum community tags of nodes id0, id1 and id3 do not change, so the community tags of these three nodes remain unchanged, while the community tag of node id2 changes to id0. The community tags of nodes id0, id1, id2 and id3 are now all the same.

Nodes id0, id1, id2 and id3 have no information propagation with other nodes in the network, such as node id4, so the community tags of those other nodes cannot affect nodes id0, id1, id2 and id3. The community tags of nodes id0, id1, id2 and id3 therefore no longer change, and when the iteration finishes they are all id0.

Similarly, after iteration the community tag of nodes id4, id5, id6 and id7 is id4, and the community tag of nodes id8, id9 and id10 is id8. According to the community tags, the network can be divided into the three community networks shown on the right of FIG. 3.
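The iterative update walked through above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: integer identifiers 0 to 10 stand in for id0 to id10 so that comparison is numerical, and the edge list is hypothetical, chosen only to be consistent with the adjacency described for nodes id0 to id3; it is not taken from FIG. 3.

```python
def propagate_min_tags(nodes, edges):
    """Iterate until no community tag changes; return node -> community tag."""
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        # A propagation relation makes the two nodes adjacent to each other.
        neighbors[a].add(b)
        neighbors[b].add(a)

    tags = {n: n for n in nodes}  # initialize each tag to the node identifier
    while True:
        # Every node takes the minimum tag among itself and its adjacent
        # nodes, based on the tags from the previous round.
        updated = {
            n: min([tags[n]] + [tags[m] for m in neighbors[n]])
            for n in nodes
        }
        if updated == tags:  # no tag changed: iteration is finished
            return tags
        tags = updated


nodes = list(range(11))
# Hypothetical propagation relations forming three separate groups.
edges = [(0, 1), (0, 3), (1, 2), (4, 5), (5, 6), (6, 7), (8, 9), (9, 10)]
tags = propagate_min_tags(nodes, edges)
```

With these assumed edges, node 2 first takes tag 1 and only reaches tag 0 in the second round, mirroring the behavior described for node id2.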

In one embodiment, the network may be divided into multiple community networks by the connectedComponents() function in the GraphX module of the Spark framework, with the propagation record information in Tables 1 and 2 as the input of the function. Spark is a fast, general-purpose computing engine designed for large-scale data processing; GraphX is the Spark component for graphs and graph-parallel computation; the connectedComponents() function, i.e., the connected components algorithm, is used to discover the community networks in the network.

In this embodiment, a community tag is added to every node, and the minimum community tag is determined from the community tags of each node and its adjacent nodes. The community tag of the node is updated to the determined minimum community tag, so that the community tags of all nodes are updated iteratively. During iterative updating, the propagation relations among nodes in the same community network cause their community tags to converge to the same value, so that when iterative updating finishes the network can be accurately divided into community networks according to the community tags. The initial propagation nodes and candidate sampling nodes in each community network can then be accurately located according to the propagation time, which ensures the accuracy of node sampling.

Further, as shown in FIG. 4, step 204 may include:

Step 2041: setting the sampling ratio of the initial propagation nodes in the first round of sampling according to a preset sampling ratio.

The sampling ratio may be the ratio of the number of nodes of a given type sampled in a sampling round to the total number of nodes sampled in that round. The preset sampling ratio may be a preset ratio of the initial propagation nodes collected in the first round of sampling to all initial propagation nodes.

Specifically, the server starts sampling from the initial propagation nodes, which are collected in the first round of sampling. The server may collect all initial propagation nodes, or may read the preset sampling ratio and set the sampling ratio of the initial propagation nodes in the first round of sampling to the preset sampling ratio.

In order to view the information propagation path from a global perspective, all initial propagation nodes are preferably collected, i.e., the sampling ratio of the initial propagation nodes in the first round of sampling is preferably set to 1.

Step 2042: determining, based on the calculated node composition ratio and the preset node association relation, the sampling ratios of the connection nodes, key nodes and common nodes in each round of sampling after the first round.

In each round of sampling after the first round, the server collects candidate sampling nodes, and the types of candidate sampling nodes collected may differ from round to round. The server determines the types of candidate sampling nodes involved in each round after the first round according to the node association relation.

Based on the node association relation, it can be determined that the second round of sampling collects the connection nodes, key nodes and common nodes associated with the initial propagation nodes; the third round collects the key nodes associated with the connection nodes collected in the second round and the common nodes associated with the key nodes collected in the second round; and the fourth round collects the common nodes associated with the key nodes collected in the third round.

The sampling ratios of the connection nodes, key nodes and common nodes in each round after the first round are then calculated according to the types of candidate sampling nodes involved in that round and the node composition ratios of the connection nodes, key nodes and common nodes.

For example, assume that among the candidate sampling nodes the node composition ratio of connection nodes is 5%, that of key nodes is 5%, and that of common nodes is 90%. In the second round of sampling, connection nodes, key nodes and common nodes are collected in the ratio 1:1:18 (5%:5%:90%); in the third round of sampling, key nodes and common nodes are collected in the ratio 1:18 (5%:90%).
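The arithmetic of this example can be sketched as below: for each round, the composition ratios of the node types involved in that round are renormalized so that they sum to 1. The names `ROUND_TYPES` and `round_sampling_ratios` are illustrative assumptions; exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

# Node types collected in each round after the first, following the node
# association relation described in the text.
ROUND_TYPES = {
    2: ("connection", "key", "common"),
    3: ("key", "common"),
    4: ("common",),
}


def round_sampling_ratios(composition):
    """Renormalize the composition ratios over the types used in each round."""
    ratios = {}
    for rnd, kinds in ROUND_TYPES.items():
        total = sum(composition[k] for k in kinds)
        ratios[rnd] = {k: composition[k] / total for k in kinds}
    return ratios


composition = {
    "connection": Fraction(5, 100),
    "key": Fraction(5, 100),
    "common": Fraction(90, 100),
}
ratios = round_sampling_ratios(composition)
```

For the 5%:5%:90% composition, the third round yields 1/19 for key nodes and 18/19 for common nodes, which is the 1:18 ratio stated above.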

It should be noted that the association relations between connection nodes and common nodes, and between common nodes and common nodes, can be ignored in actual sampling, because connection nodes and common nodes directly propagate to only a small number of common nodes. In addition, the four rounds of sampling correspond to four rounds of information propagation; after four rounds the information propagation is weak, so further analysis can be omitted.

Step 2043: determining the sampling ratios of the nodes in each sampling round as the sampling strategy.

Specifically, the server takes the node types to be collected in the four rounds of sampling, together with the corresponding sampling ratios, as the sampling strategy. The sampling strategy instructs the server to collect the initial propagation nodes and the various types of candidate sampling nodes from each community network.

In this embodiment, the sampling ratio of the initial propagation nodes in the first round of sampling is determined according to the preset sampling ratio, and the sampling ratios of the connection nodes, key nodes and common nodes among the candidate sampling nodes in each round after the first round are determined according to the node composition ratio and the node association relation, thereby obtaining the sampling strategy. Because the sampling strategy combines the node association relation with the node composition ratio, balanced sampling of the various node types is ensured.

Further, as shown in FIG. 5, step 2042 may include:

Step 20421: initializing, based on the calculated node composition ratio and the preset node association relation, the sampling ratios of the connection nodes, key nodes and common nodes in each round of sampling after the first round.

Specifically, the server determines, according to the preset node association relation, the types of candidate sampling nodes to be collected in each round of sampling after the first round, and initializes the sampling ratios for each round.

During initialization, the relative ratio of the node composition ratios of the connection nodes, key nodes and common nodes among the candidate sampling nodes is used directly as the sampling ratios of the connection nodes, key nodes and common nodes.

Step 20422: comparing the node composition ratios with a preset ratio threshold.

The ratio threshold may be a preset threshold on the node composition ratio, and is used to determine whether the sampling ratios of the connection nodes, key nodes and common nodes need to be adjusted.

Specifically, the server obtains the preset ratio threshold and compares the node composition ratios of the connection nodes, key nodes and common nodes with the ratio threshold respectively. A node composition ratio smaller than the ratio threshold indicates that the corresponding type of candidate sampling node accounts for only a small share, so important candidate sampling nodes of that type may be missed in random sampling, and the related propagation paths would be missed as well.

Step 20423: when there is a node composition ratio smaller than the ratio threshold, adjusting the initialized sampling ratios according to a preset adjustment value.

The preset adjustment value may be a preset value by which a sampling ratio is adjusted, and the sampling ratio may be adjusted by adding or subtracting this value.

Specifically, when there is a node composition ratio smaller than the ratio threshold, the server needs to adjust the initialized sampling ratios. During adjustment, the server obtains the preset adjustment value and adjusts the initialized sampling ratios according to it to obtain the final sampling ratios.

For example, assume that the node composition ratio of connection nodes : key nodes : common nodes is 9% : 1% : 90% and the ratio threshold is 3%. The node composition ratio of the key nodes is smaller than the ratio threshold, which indicates that there are few key nodes. If the node composition ratio were used directly as the sampling ratio, the sampling ratio of connection nodes : key nodes : common nodes in the second round of sampling would be 9% : 1% : 90%. If a key node associated with a large number of common nodes were missed in random sampling, an important propagation path would be lost. Therefore, the sampling ratio of the key nodes may be increased according to the preset adjustment value; for example, if the preset adjustment value is 2%, the sampling ratio of the key nodes is adjusted to 3%.

Increasing the sampling ratio of the key nodes requires reducing the sampling ratio of the connection nodes or the common nodes; the sampling ratio of the common nodes is preferably reduced, because common nodes generally make up the largest share. The sampling ratio of connection nodes : key nodes : common nodes then becomes 9% : 3% : 88%. Similarly, if the node composition ratio of the key nodes is 2%, the sampling ratio of the key nodes can be adjusted to 4% in the second round of sampling; if the node composition ratio of the key nodes is 0.5%, the sampling ratio of the key nodes can be raised twice, to 4.5%, in the second round of sampling. Further examples are not listed here.
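The adjustment in these examples can be sketched as follows. This is one possible reading, not a definitive implementation: it assumes the adjustment value is added repeatedly until the ratio reaches the threshold (which reproduces the 3%, 4% and 4.5% figures above) and that the same amount is taken from the type with the largest ratio, normally the common nodes. The function name `adjust_ratios` is hypothetical.

```python
from fractions import Fraction


def adjust_ratios(ratios, threshold, step):
    """Raise every ratio below `threshold` in increments of `step`.

    Assumption: the increase is offset by reducing the type that currently
    has the largest ratio among the types that were not below the threshold.
    """
    adjusted = dict(ratios)
    low = [k for k, v in ratios.items() if v < threshold]
    donors = [k for k in ratios if k not in low]
    for kind in low:
        donor = max(donors, key=lambda k: adjusted[k])
        while adjusted[kind] < threshold:
            adjusted[kind] += step
            adjusted[donor] -= step
    return adjusted


P = Fraction(1, 100)  # one percent, kept exact
first = adjust_ratios(
    {"connection": 9 * P, "key": 1 * P, "common": 90 * P},
    threshold=3 * P, step=2 * P,
)
second = adjust_ratios(
    {"connection": 9 * P, "key": Fraction(1, 200), "common": Fraction(181, 200)},
    threshold=3 * P, step=2 * P,
)
```

The first call gives 9% : 3% : 88%; the second raises the key nodes from 0.5% to 4.5% in two steps, with the common nodes reduced by the same 4 percentage points.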

In this embodiment, after the sampling ratios of the connection nodes, key nodes and common nodes are initialized, the node composition ratios are compared with the preset ratio threshold. When a node composition ratio is smaller than the ratio threshold, the initialized sampling ratios are adjusted according to the preset adjustment value so that nodes of that type are collected as far as possible, which improves the balance of sampling.

Further, step 205 may include: in the first round of sampling, sampling the initial propagation nodes according to the sampling strategy; in each round of sampling after the first round, querying, according to the propagation record information, the candidate sampling nodes to which the nodes collected in the previous round of sampling propagated; and sampling the queried candidate sampling nodes according to the sampling strategy.

Specifically, the server performs multiple rounds of sampling. In the first round of sampling, the initial propagation nodes are collected according to the sampling strategy.

In each round of sampling after the first round, the candidate sampling nodes to which the nodes collected in the previous round propagated are queried according to the propagation record information.

After the candidate sampling nodes are queried, the number of queried candidate sampling nodes is counted. The number of each type of candidate sampling node to be sampled in this round is then calculated according to the sampling ratios in the sampling strategy, and the queried candidate sampling nodes are sampled according to the calculated numbers.

In the second round of sampling, the candidate sampling nodes associated with the collected initial propagation nodes are first queried and counted to obtain the number of nodes available for sampling in the second round. The numbers of connection nodes, key nodes and common nodes to be sampled in the second round are calculated according to the second-round sampling ratios recorded in the sampling strategy, and connection nodes, key nodes and common nodes are then randomly collected according to these numbers.

In the third round of sampling, key nodes and common nodes are randomly collected, according to the sampling strategy, from the candidate sampling nodes associated with the connection nodes and key nodes collected in the second round.

In the fourth round of sampling, common nodes are randomly collected, according to the sampling strategy, from the candidate sampling nodes associated with the key nodes collected in the third round.
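A single sampling round after the first can be sketched as below. Several details are assumptions rather than facts from the text: propagation records are modeled as (source, target) pairs, `node_type` maps each candidate to its type, and the hypothetical parameter `round_size` is the total number of nodes to draw in the round, from which per-type quotas are derived using the sampling ratios.

```python
import random


def sample_round(previous, records, node_type, ratios, round_size, rng):
    """Sample candidates reached from the nodes collected in the previous round."""
    prev = set(previous)
    # Query: candidates that a previously collected node propagated to.
    candidates = sorted(
        {dst for src, dst in records if src in prev and dst not in prev}
    )

    pools = {}
    for node in candidates:
        pools.setdefault(node_type[node], []).append(node)

    sampled = []
    for kind, ratio in ratios.items():
        pool = pools.get(kind, [])
        # Per-type quota, capped by how many nodes of that type are available.
        quota = min(len(pool), round(ratio * round_size))
        sampled.extend(rng.sample(pool, quota))
    return sampled


records = [("s", "c1"), ("s", "k1"), ("s", "n1"), ("s", "n2"), ("x", "n3")]
node_type = {"c1": "connection", "k1": "key",
             "n1": "common", "n2": "common", "n3": "common"}
ratios = {"connection": 0.25, "key": 0.25, "common": 0.5}
picked = sample_round(["s"], records, node_type, ratios,
                      round_size=4, rng=random.Random(0))
```

Because every sampled node is reached from a node collected in the previous round, node n3, which was propagated to only by the uncollected node x, can never be picked in this round.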

In this embodiment, the initial propagation nodes are sampled according to the sampling strategy. In each round of sampling after the first round, the candidate sampling nodes to which the nodes collected in the previous round propagated are queried according to the propagation record information, and the queried candidate sampling nodes are then sampled according to the sampling strategy. The collected nodes are therefore all associated with nodes collected in the previous round, which ensures the continuity of the information propagation path.

Further, step 206 may include: adding the collected initial propagation nodes and candidate sampling nodes to an initial path graph according to the propagation record information; setting, in the initial path graph, the display modes of the initial propagation nodes and of the connection nodes, key nodes and common nodes among the candidate sampling nodes; and connecting the initial propagation nodes, connection nodes, key nodes and common nodes whose display modes have been set, to obtain the information propagation path graph.

Specifically, the server creates a blank initial path graph and adds the collected initial propagation nodes and candidate sampling nodes to it. The server may arrange the initial propagation nodes and candidate sampling nodes according to the propagation record information so that nodes having a direct propagation relation are placed close to each other in the initial path graph.

In order to highlight the information propagation path more intuitively, the server sets different display modes for different types of nodes. The display modes include color display and shape display. When color display is adopted, the initial propagation nodes, connection nodes, key nodes and common nodes can be set to different colors; for example, the initial propagation nodes may be represented by red dots and the connection nodes by blue dots. When shape display is adopted, the initial propagation nodes, connection nodes, key nodes and common nodes can be set to different shapes; for example, the initial propagation nodes may be represented by circles and the connection nodes by triangles.

The server then connects the initial propagation nodes, connection nodes, key nodes and common nodes in the initial path graph according to their propagation relations, thereby obtaining the information propagation path graph.
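The three steps (add nodes, set display modes, connect nodes) can be sketched by emitting Graphviz DOT text. DOT is used here purely as an illustrative output format and is a substitution, not the rendering method of the embodiment. The shapes and colors for initial propagation nodes and connection nodes follow the examples in the text; those for key nodes and common nodes are assumptions, as are the names `STYLE` and `to_dot`.

```python
# Display mode per node type: (shape, color).
STYLE = {
    "initial": ("circle", "red"),
    "connection": ("triangle", "blue"),
    "key": ("box", "green"),        # assumed display mode
    "common": ("diamond", "gray"),  # assumed display mode
}


def to_dot(node_type, records):
    """Build DOT text for the collected nodes and their propagation relations.

    node_type: hypothetical dict, collected node -> node type
    records:   hypothetical list of (source, target) propagation pairs
    """
    lines = ["digraph propagation {"]
    for node in sorted(node_type):
        shape, color = STYLE[node_type[node]]
        lines.append(f'  "{node}" [shape={shape}, color={color}];')
    for src, dst in records:
        # Only connect nodes that were actually collected.
        if src in node_type and dst in node_type:
            lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


node_type = {"a": "initial", "b": "connection", "c": "key", "d": "common"}
records = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "z")]
dot = to_dot(node_type, records)
print(dot)
```

The edge from d to z is dropped because z was not collected, so the rendered graph contains only sampled nodes.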

The server can store the information propagation path graph in a database, send it to a specified terminal for display, or upload it to a block chain for storage.

In one embodiment, the information propagation path graph can be generated with Gephi, an open-source, JVM-based complex network analysis tool mainly used for interactive visualization and exploration of various networks and complex systems, dynamic graphs and hierarchical graphs. The server inputs the propagation record information between the collected initial propagation nodes and candidate sampling nodes into Gephi, and Gephi generates the information propagation path graph.

In this embodiment, the collected initial propagation nodes and candidate sampling nodes are added to the initial path graph according to the propagation record information, and the initial propagation nodes and the connection nodes, key nodes and common nodes among the candidate sampling nodes are given different display modes to distinguish them, so that the generated information propagation path graph displays the information propagation path accurately and clearly.

FIG. 6 is an information propagation path graph generated in one embodiment. Referring to FIG. 6, the center of the graph is the origin node, i.e., the publisher of the information; the solid circles are initial propagation nodes; the hollow circles are connection nodes; the solid squares are key nodes; and the hollow squares are common nodes. Through the collected nodes, FIG. 6 shows the propagation trend and propagation path of a piece of information in the network.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and which, when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk or a Read-Only Memory (ROM), or may be a Random Access Memory (RAM).

It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

With further reference to fig. 7, as an implementation of the method shown in fig. 2, the present application provides an embodiment of an information propagation path analyzing apparatus, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be applied to various electronic devices.

As shown in FIG. 7, the information propagation path analyzing apparatus 300 according to the present embodiment includes an information obtaining module 301, a network dividing module 302, a proportion calculation module 303, a policy determination module 304, a node sampling module 305, and a path generating module 306, wherein:

The information obtaining module 301 is configured to obtain propagation record information of a network. It is emphasized that, in order to further ensure the privacy and security of the propagation record information, the propagation record information may also be stored in a node of a block chain.

The network dividing module 302 is configured to divide the network into at least one community network according to the propagation record information, and determine candidate sampling nodes and initial propagation nodes in each community network.

The proportion calculation module 303 is configured to calculate the node composition ratios of connection nodes, key nodes and common nodes among the candidate sampling nodes according to the node type identifiers of the candidate sampling nodes.

The policy determination module 304 is configured to determine a sampling strategy based on the calculated node composition ratios and a preset node association relation; the node association relation includes that connection nodes are associated with key nodes and that key nodes are associated with common nodes.

The node sampling module 305 is configured to sample the initial propagation nodes and the candidate sampling nodes according to the sampling strategy.

The path generating module 306 is configured to visually present the collected initial propagation nodes and candidate sampling nodes and generate an information propagation path graph.

In some optional implementations of this embodiment, the network dividing module 302 includes a tag adding submodule, a minimum determining submodule, a tag updating submodule, a network dividing submodule and a node determining submodule, wherein:

The tag adding submodule is configured to initialize the community tag of each node to the node identifier of that node in the propagation record information.

The minimum determining submodule is configured to determine, for each node in the network, the minimum community tag from the community tags of the node and its adjacent nodes.

The tag updating submodule is configured to update the community tag of the node to the determined minimum community tag, so as to iteratively update the community tag of each node.

The network dividing submodule is configured to divide the network into at least one community network according to the community tags of the nodes when the community tags of the nodes no longer change.

The node determining submodule is configured to determine, according to the propagation time in the propagation record information, the node with the earliest propagation time in each community network as the initial propagation node of that community network, and to determine the remaining nodes in each community network as candidate sampling nodes.

In some optional implementations of the present embodiment, the information propagation path analyzing apparatus 300 further includes a quantity determining module, a type determining module and an identifier adding module, wherein:

The quantity determining module is configured to determine the node propagation quantity of each candidate sampling node according to the propagation record information.

The type determining module is configured to determine the node type of each candidate sampling node according to its node propagation quantity.

The identifier adding module is configured to add, to each candidate sampling node, the node type identifier corresponding to its node type.

In some optional implementations of this embodiment, the policy determination module 304 includes a first-round setting submodule, a ratio determining submodule and a strategy determining submodule, wherein:

The first-round setting submodule is configured to set the sampling ratio of the initial propagation nodes in the first round of sampling according to the preset sampling ratio.

The ratio determining submodule is configured to determine, based on the calculated node composition ratios and the preset node association relation, the sampling ratios of the connection nodes, key nodes and common nodes in each round of sampling after the first round.

The strategy determining submodule is configured to determine the determined sampling ratios of the nodes in each sampling round as the sampling strategy.

In some optional implementations of this embodiment, the ratio determining submodule includes a ratio initialization unit, a ratio comparison unit and a ratio adjustment unit, wherein:

The ratio initialization unit is configured to initialize, based on the calculated node composition ratios and the preset node association relation, the sampling ratios of the connection nodes, key nodes and common nodes in each round of sampling after the first round.

The ratio comparison unit is configured to compare the node composition ratios with a preset ratio threshold.

The ratio adjustment unit is configured to adjust the initialized sampling ratios according to a preset adjustment value when there is a node composition ratio smaller than the ratio threshold.

In some optional implementations of this embodiment, the node sampling module 305 includes an initial sampling submodule, a node query submodule and a node sampling submodule, wherein:

The initial sampling submodule is configured to sample the initial propagation nodes according to the sampling strategy in the first round of sampling.

The node query submodule is configured to query, in each round of sampling after the first round and according to the propagation record information, the candidate sampling nodes to which the nodes collected in the previous round of sampling propagated.

The node sampling submodule is configured to sample the queried candidate sampling nodes according to the sampling strategy.

In some optional implementations of this embodiment, the path generating module 306 includes a node adding submodule, a display setting submodule and a node connection submodule, wherein:

The node adding submodule is configured to add the collected initial propagation nodes and candidate sampling nodes to the initial path graph according to the propagation record information.

The display setting submodule is configured to set, in the initial path graph, the display modes of the initial propagation nodes and of the connection nodes, key nodes and common nodes among the candidate sampling nodes.

The node connection submodule is configured to connect the initial propagation nodes, connection nodes, key nodes and common nodes whose display modes have been set, to obtain the information propagation path graph.

In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 8, fig. 8 is a block diagram of a basic structure of a computer device according to the present embodiment.

The computer device 4 comprises a memory 41, a processor 42 and a network interface 43 communicatively connected to each other via a system bus. It is noted that only a computer device 4 having components 41-43 is shown, but it is understood that not all of the shown components are required, and that more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.

The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.

The memory 41 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or a memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a flash Card (FlashCard), and the like, which are provided on the computer device 4. Of course, the memory 41 may also include both internal and external storage devices of the computer device 4. In this embodiment, the memory 41 is generally used for storing an operating system installed in the computer device 4 and various types of application software, such as program codes of the information propagation path analysis method. Further, the memory 41 may also be used to temporarily store various types of data that have been output or are to be output.

The processor 42 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute the program code stored in the memory 41 or process data, for example, execute the program code of the information propagation path analysis method.

The network interface 43 may comprise a wireless network interface or a wired network interface, and the network interface 43 is generally used for establishing communication connection between the computer device 4 and other electronic devices.

The computer device provided in this embodiment may perform the steps of the information propagation path analysis method described above. Here, the steps of the information propagation path analysis method may be the steps in the information propagation path analysis methods of the respective embodiments described above.

The present application further provides another embodiment, namely a computer-readable storage medium storing an information propagation path analysis program, the program being executable by at least one processor to cause the at least one processor to perform the steps of the information propagation path analysis method described above.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.

It is to be understood that the above-described embodiments are merely some, rather than all, of the embodiments of the present application, and that the accompanying drawings illustrate preferred embodiments of the present application without limiting its scope. The present application may be embodied in many different forms; these embodiments are provided so that the disclosure of the present application will be understood thoroughly and completely. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or substitute equivalents for some of their technical features. All equivalent structures made by using the contents of the specification and the drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise fall within the protection scope of the present application.

Claims

1. An information propagation path analysis method, characterized by comprising the steps of:

acquiring the propagation record information of the network;

2. The information propagation path analysis method according to claim 1, wherein the step of dividing the network into at least one community network according to the propagation record information and determining candidate sampling nodes and initial propagation nodes in each community network specifically comprises:

initializing the node identification of each node in the propagation record information into a community label of each node;

for each node in the network, determining a minimum community label from the community labels corresponding to the node and to the adjacent nodes of the node;

updating the community label of the node to the determined minimum community label so as to iteratively update the community label of each node;

when the community label of each node is not changed any more, dividing the network into at least one community network according to the community label of each node;

and according to the propagation time in the propagation record information, determining the node with the earliest propagation time in each community network as the initial propagation node of that community network, and determining the nodes in each community network other than the node with the earliest propagation time as candidate sampling nodes.
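
As a non-limiting illustration, the steps of claim 2 can be sketched as follows. The record format `(source, target, time)`, the function names, and the tie-breaking rule are assumptions made for this sketch and are not taken from the application; node identifiers are assumed to be mutually comparable, and a node's propagation time is assumed to be the earliest time at which it appears as a source.

```python
from collections import defaultdict
import math


def detect_communities(records):
    """Group nodes by repeatedly adopting the smallest label among neighbors.

    `records` is an assumed format: an iterable of (source, target, time).
    """
    neighbors = defaultdict(set)
    for source, target, _ in records:
        neighbors[source].add(target)
        neighbors[target].add(source)

    # Each node starts with its own identifier as its community label.
    labels = {node: node for node in neighbors}

    # Iterate until no community label changes any more.
    changed = True
    while changed:
        changed = False
        for node, adjacent in neighbors.items():
            smallest = min([labels[node]] + [labels[n] for n in adjacent])
            if smallest != labels[node]:
                labels[node] = smallest
                changed = True

    communities = defaultdict(set)
    for node, label in labels.items():
        communities[label].add(node)
    return dict(communities)


def split_initial_and_candidates(records, communities):
    """Return {label: (initial_node, candidate_nodes)} for each community."""
    # Assumption: a node's propagation time is the earliest time it is a source.
    first_time = {}
    for source, _, time in records:
        first_time[source] = min(time, first_time.get(source, math.inf))

    result = {}
    for label, members in communities.items():
        # Ties are broken by node identifier (an assumption of this sketch).
        initial = min(members, key=lambda n: (first_time.get(n, math.inf), n))
        result[label] = (initial, members - {initial})
    return result
```

Because labels can only decrease and the set of labels is finite, the loop terminates; on an undirected view of the records it ends with one label per connected group of nodes.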

3. The information propagation path analysis method according to claim 1, wherein before the step of calculating a node composition ratio of a connection node, a key node, and a common node in the candidate sampling nodes by using the node type identifiers of the candidate sampling nodes, the method further comprises:

determining the node propagation quantity of each candidate sampling node according to the propagation record information;

determining the node type of each candidate sampling node according to the node propagation number;

and adding node type identifications corresponding to respective node types to the candidate sampling nodes.
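
As a non-limiting illustration, the steps of claim 3, together with the node composition proportion used in claim 1, can be sketched as follows. The thresholds `key_threshold` and `connection_threshold`, the assumption that a larger propagation number indicates a connection node rather than a key node, and the record format `(source, target, time)` are hypothetical choices made for this sketch only.

```python
from collections import Counter


def count_propagations(records, candidates):
    """Count how many times each candidate node propagated the information."""
    counts = Counter(src for src, _, _ in records if src in candidates)
    return {node: counts.get(node, 0) for node in candidates}


def classify(counts, key_threshold=2, connection_threshold=5):
    """Attach a node type identifier to each node (thresholds are assumed)."""
    types = {}
    for node, number in counts.items():
        if number >= connection_threshold:
            types[node] = "connection"
        elif number >= key_threshold:
            types[node] = "key"
        else:
            types[node] = "common"
    return types


def composition_ratio(types):
    """Share of connection, key and common nodes among the candidates."""
    kinds = ("connection", "key", "common")
    total = len(types)
    if total == 0:
        return {kind: 0.0 for kind in kinds}
    tally = Counter(types.values())
    return {kind: tally.get(kind, 0) / total for kind in kinds}
```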

4. The information propagation path analysis method according to claim 1, wherein the step of determining a sampling strategy based on the calculated node composition ratio and a preset node association relation specifically includes:

setting the sampling proportion of the initial propagation node in the first round of sampling according to a preset sampling proportion;

determining the sampling proportions of the connection node, the key node and the common node in each round of sampling after the first round, based on the calculated node composition proportion and the preset node association relation;

and determining the sampling proportion of the nodes in each sampling round as a sampling strategy.

5. The information propagation path analysis method according to claim 4, wherein the step of determining the sampling proportions of the connection node, the key node and the common node in each round of sampling after the first round, based on the calculated node composition proportion and the preset node association relation, specifically includes:

initializing the sampling proportions of the connection node, the key node and the common node in each round of sampling after the first round, based on the calculated node composition proportion and the preset node association relation;

comparing the node composition ratio with a preset ratio threshold;

and when the node composition proportion smaller than the proportion threshold exists, adjusting the sampling proportion obtained by initialization according to a preset adjustment value.
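
As a non-limiting illustration, the steps of claims 4 and 5 can be sketched as follows. The claims do not state how the sampling proportions are initialized or in which direction they are adjusted; this sketch assumes, purely for illustration, that each sampling proportion starts equal to the corresponding node composition proportion and that the preset adjustment value is added when a node type falls below the threshold. All parameter names and default values are hypothetical.

```python
def build_sampling_strategy(ratios, initial_ratio=1.0, threshold=0.1, adjustment=0.2):
    """Return the sampling proportions for the first and for later rounds.

    `ratios` maps node type -> node composition proportion.
    """
    # Assumed initialization: sampling proportion = composition proportion.
    later_rounds = dict(ratios)

    for node_type, share in ratios.items():
        # Assumed adjustment: raise the proportion of under-represented types,
        # capped at 1.0 so that it stays a valid proportion.
        if share < threshold:
            later_rounds[node_type] = min(1.0, later_rounds[node_type] + adjustment)

    return {
        "first_round": {"initial": initial_ratio},
        "later_rounds": later_rounds,
    }
```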

6. The information propagation path analysis method according to claim 4, wherein the step of sampling the initial propagation node and the candidate sampling node according to the sampling policy specifically includes:

in the first round of sampling, sampling the initial propagation node according to the sampling strategy;

in each round of sampling after the first round, querying, according to the propagation record information, the candidate sampling nodes to which the nodes collected in the previous round of sampling propagated the information;

and sampling the queried candidate sampling nodes according to the sampling strategy.
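
As a non-limiting illustration, the round-by-round sampling of claim 6 can be sketched as follows. The strategy dictionary layout, the record format `(source, target, time)`, the rounding rule, the `max_rounds` limit and the fixed random seed are assumptions made for this sketch.

```python
import math
import random


def take(nodes, ratio, rng):
    """Randomly keep a share `ratio` of `nodes` (rounded up, assumed rule)."""
    ordered = sorted(nodes)
    k = min(len(ordered), math.ceil(ratio * len(ordered)))
    return set(rng.sample(ordered, k))


def sample_rounds(records, initial_nodes, node_types, strategy, max_rounds=10, seed=0):
    """Sample initial nodes first, then follow propagation round by round."""
    rng = random.Random(seed)

    # Index the propagation records: source -> nodes it propagated to.
    targets = {}
    for source, target, _ in records:
        targets.setdefault(source, set()).add(target)

    # First round: sample the initial propagation nodes.
    frontier = take(initial_nodes, strategy["first_round"]["initial"], rng)
    sampled = set(frontier)

    for _ in range(max_rounds):
        # Query the candidates reached from the previous round's nodes.
        reached = set()
        for node in frontier:
            reached |= targets.get(node, set())
        reached = {n for n in reached if n in node_types and n not in sampled}

        # Sample each node type with its own proportion.
        frontier = set()
        for node_type, ratio in strategy["later_rounds"].items():
            group = [n for n in reached if node_types[n] == node_type]
            frontier |= take(group, ratio, rng)

        if not frontier:
            break
        sampled |= frontier
    return sampled
```

Sampling stops as soon as a round collects no new node, so the result only contains nodes that are reachable from a sampled initial propagation node through sampled intermediate nodes.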

7. The information propagation path analysis method according to claim 1, wherein the step of visually presenting the collected initial propagation nodes and candidate sampling nodes to generate the information propagation path diagram specifically includes:

adding the collected initial propagation nodes and candidate sampling nodes into an initial path graph according to the propagation record information;

setting display modes of a connection node, a key node and a common node in an initial propagation node and a candidate sampling node in the initial path graph;

and connecting the set initial propagation node, the connection node, the key node and the common node to obtain an information propagation path diagram.
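
As a non-limiting illustration, the steps of claim 7 can be sketched by emitting a description of the path diagram in the Graphviz DOT language. The display modes in `STYLE`, the function name and the record format `(source, target, time)` are hypothetical; the application does not prescribe a particular rendering tool.

```python
# Assumed display modes, one per kind of node.
STYLE = {
    "initial": "shape=doublecircle, color=red",
    "connection": "shape=box, color=orange",
    "key": "shape=circle, color=blue",
    "common": "shape=circle, color=gray",
}


def to_dot(records, sampled, initial_nodes, node_types):
    """Build DOT text for the sampled part of the propagation network."""
    lines = ["digraph propagation {"]

    # Add every collected node with the display mode of its kind.
    for node in sorted(sampled):
        kind = "initial" if node in initial_nodes else node_types[node]
        lines.append(f'  "{node}" [{STYLE[kind]}];')

    # Connect collected nodes along the recorded propagations, in time order.
    for source, target, _ in sorted(records, key=lambda r: r[2]):
        if source in sampled and target in sampled:
            lines.append(f'  "{source}" -> "{target}";')

    lines.append("}")
    return "\n".join(lines)
```

The returned text can be written to a file and rendered with any tool that accepts DOT input.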

8. An information propagation path analysis device, comprising:

9. A computer device comprising a memory in which a computer program is stored and a processor which, when executing the computer program, implements the steps of the information propagation path analysis method according to any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that a computer program is stored thereon, which, when being executed by a processor, carries out the steps of the information propagation path analysis method according to any one of claims 1 to 7.