WO2021258998A1

WO2021258998A1 - Information propagation path analysis method and apparatus, and computer device and storage medium

Info

Publication number: WO2021258998A1
Application number: PCT/CN2021/096857
Authority: WO
Inventors: 曹合心; 蔡健
Original assignee: 平安科技（深圳）有限公司
Priority date: 2020-06-24
Filing date: 2021-05-28
Publication date: 2021-12-30
Also published as: CN111814065A; CN111814065B

Abstract

An information propagation path analysis method, comprising: acquiring propagation record information of a network (201); dividing the network into at least one community network according to the propagation record information, and determining candidate sampling nodes and an initial propagation node in each community network (202); calculating the node composition ratio of connection nodes, key nodes and ordinary nodes in the candidate sampling nodes by means of node type identifiers of the candidate sampling nodes (203); determining a sampling policy on the basis of the calculated node composition ratio and a preset node association relationship, wherein the node association relationship comprises the connection nodes being associated with the key nodes and the key nodes being associated with the ordinary nodes (204); sampling the initial propagation nodes and the candidate sampling nodes according to the sampling policy (205); and performing visual presentation on collected initial propagation nodes and candidate sampling nodes to generate an information propagation path map (206). In addition, the propagation record information can be stored in a blockchain. By means of the method, the accuracy of information propagation path analysis is improved.

Description

Information propagation path analysis method, device, computer equipment and storage medium

This application claims priority to Chinese patent application No. 202010592379.5, filed with the Chinese Patent Office on June 24, 2020 and entitled "Information propagation path analysis method, device, computer equipment and storage medium", the entire content of which is incorporated herein by reference.

Technical field

This application relates to the field of big data, and in particular to an information propagation path analysis method, device, computer equipment and storage medium.

Background technique

In big data, people or computers can form a network through complex connections. People or computers can be regarded as nodes in the network, and data and information can be transmitted through the network. With the development of Internet technology, it is often necessary to analyze how information spreads in a network. For example, in a marketing network, analyzing the path along which a product spreads makes it possible to spread product information more widely at a lower cost. The spread of a virus during an epidemic and the spread of fraudulent links can likewise be analyzed by means of a network.

A network contains an important type of node, the KOL (Key Opinion Leader). A KOL holds more, and more accurate, information and is accepted or trusted by more of the relevant groups. A KOL can spread information to more nodes in the network and has a greater influence on those nodes. The analysis of information dissemination paths in a network usually samples KOL nodes and presents, in a visual way, the sub-networks that reflect the trend of information dissemination. However, the inventor realizes that traditional network propagation path analysis techniques usually either sample only manually pre-defined KOLs, in which case the type of sampling node is single and the sampling nodes need to be manually adjusted for different scenarios, or perform undifferentiated random sampling over all nodes, in which case critical propagation paths are easily missed when the proportion of KOLs among the nodes is relatively low. It can be seen that the accuracy of traditional network propagation path analysis techniques is low.

Summary of the invention

The purpose of the embodiments of the present application is to propose an information propagation path analysis method, device, computer equipment, and storage medium, so as to solve the problem of low accuracy of information propagation path analysis.

In order to solve the above technical problems, an embodiment of the present application provides an information propagation path analysis method, which adopts the following technical solutions:

Obtaining the dissemination record information of the network;

Dividing the network into at least one social network according to the dissemination record information, and determining candidate sampling nodes and initial dissemination nodes in each social network;

Calculating the node composition ratio of the connecting node, the key node and the common node in the candidate sampling node through the node type identification of the candidate sampling node;

The sampling strategy is determined based on the calculated node composition ratio and the preset node association relationship; the node association relationship includes connecting nodes associated with key nodes, and key nodes associated with ordinary nodes;

Sampling the initial propagation node and the candidate sampling node according to the sampling strategy;

Visually present the collected initial propagation nodes and candidate sampling nodes to generate an information propagation path diagram.

In order to solve the above technical problems, an embodiment of the present application also provides an information propagation path analysis device, including:

The information acquisition module is used to acquire the dissemination record information of the network;

A network division module, configured to divide the network into at least one social network according to the dissemination record information, and determine candidate sampling nodes and initial dissemination nodes in each social network;

A ratio calculation module, configured to calculate the node composition ratio of connecting nodes, key nodes, and ordinary nodes in the candidate sampling node through the node type identification of the candidate sampling node;

The strategy determination module is used to determine the sampling strategy based on the calculated node composition ratio and the preset node association relationship; the node association relationship includes connecting nodes associated with key nodes, and key nodes associated with ordinary nodes;

A node sampling module, configured to sample the initial propagation node and the candidate sampling node according to the sampling strategy;

The path generation module is used to visually present the collected initial propagation nodes and candidate sampling nodes to generate an information propagation path graph.

In order to solve the foregoing technical problems, an embodiment of the present application further provides a computer device, including a memory and a processor. The memory stores computer-readable instructions. When the processor executes the computer-readable instructions, the following steps are implemented:

Obtaining the dissemination record information of the network;

In order to solve the foregoing technical problems, embodiments of the present application also provide a computer-readable storage medium, which stores computer-readable instructions, and when the computer-readable instructions are executed by a processor, the following steps are implemented:

Obtaining the dissemination record information of the network;

Compared with the prior art, the embodiments of this application mainly have the following beneficial effects. The network is first divided into at least one social network according to the dissemination record information, where the social networks disseminate information independently of each other, and the candidate sampling nodes and the initial propagation node in each social network are determined. According to the node type identifiers of the candidate sampling nodes, the node composition ratio of the three node types (connecting nodes, key nodes and ordinary nodes) among the candidate sampling nodes is calculated. When collecting initial propagation nodes or candidate sampling nodes, sampling is performed according to the sampling strategy, which is determined jointly by the node composition ratio and the preset node association relationship; the node association relationship includes connecting nodes being associated with key nodes and key nodes being associated with ordinary nodes. This ensures that all types of nodes are sampled in a balanced manner and improves the accuracy of node sampling. The information dissemination path graph, which shows the dissemination path of information in the network, is composed of the collected candidate sampling nodes and initial propagation nodes, which ensures the accuracy of the information propagation path analysis.

Description of the drawings

In order to explain the solution in this application more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the application. Those of ordinary skill in the art can obtain other drawings based on these drawings without creative work.

FIG. 1 is an exemplary system architecture diagram to which the present application can be applied;

FIG. 2 is a flowchart of an embodiment of an information propagation path analysis method according to the present application;

FIG. 3 is a schematic diagram of dividing a social network in an embodiment;

FIG. 4 is a flowchart of a specific implementation of step 204 in FIG. 2;

FIG. 5 is a flowchart of a specific implementation of step 2042 in FIG. 4;

FIG. 6 is a schematic diagram of an information propagation path graph generated in an embodiment;

FIG. 7 is a schematic structural diagram of an embodiment of an information propagation path analysis device according to the present application;

FIG. 8 is a schematic structural diagram of an embodiment of a computer device according to the present application.

Detailed description

Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field of the application. The terms used in the specification of the application are only for the purpose of describing specific embodiments and are not intended to limit the application. The terms "including" and "having", and any variations thereof, in the description and claims of the application and in the above description of the drawings are intended to cover non-exclusive inclusions. The terms "first", "second", etc. in the specification and claims of this application or in the above-mentioned drawings are used to distinguish different objects, rather than to describe a specific sequence.

Reference to "embodiments" herein means that a specific feature, structure, or characteristic described in conjunction with the embodiments may be included in at least one embodiment of the present application. The appearance of the phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment that is mutually exclusive with other embodiments. Those skilled in the art understand, both explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.

In order to enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments of the present application will be described clearly and completely in conjunction with the accompanying drawings.

As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used to provide a medium for communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.

The user can use the terminal devices 101, 102, and 103 to interact with the server 105 through the network 104 to receive or send messages and so on. Various communication client applications, such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, and social platform software, can be installed on the terminal devices 101, 102, and 103.

The terminal devices 101, 102, 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablets, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and so on.

The server 105 may be a server that provides various services, for example, a background server that provides support for pages displayed on the terminal devices 101, 102, and 103.

It should be noted that the information propagation path analysis method provided by the embodiments of the present application is generally executed by a server, and accordingly, the information propagation path analysis device is generally set in the server.

It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. According to implementation needs, there can be any number of terminal devices, networks, and servers.

Continuing to refer to FIG. 2, there is shown a flowchart of an embodiment of an information propagation path analysis method according to the present application. The information propagation path analysis method includes the following steps:

Step 201: Obtain network propagation record information.

In this embodiment, the electronic device (for example, the server shown in FIG. 1) on which the information propagation path analysis method runs can obtain the propagation record information of the network through a wired connection or a wireless connection. It should be noted that the wireless connection may include, but is not limited to, 3G/4G, WiFi, Bluetooth, WiMAX, Zigbee, UWB (ultra wideband), and other wireless connection methods currently known or developed in the future.

Among them, the dissemination record information may be information that records the dissemination of a certain kind of information among nodes in the network. The propagation record information may include node identification, node type identification, propagation relationship between nodes, and propagation time. The dissemination record information may also include other attribute information of the node. For example, when a person is a node in the network, the dissemination record information may also include information such as the person's gender and age. The propagation relationship can record whether information dissemination occurs between nodes, and the direction of propagation when information dissemination occurs.

Specifically, the propagation record server stores a part of the propagation record information, such as the node identifier. The dissemination record server monitors and records the information dissemination in the network, and obtains the dissemination record information of the information dissemination. The dissemination record server aggregates dissemination record information and sends it to the server for performing information dissemination path analysis.

The server that performs information propagation path analysis and the propagation record server can be the same server or different servers.

Table 1 and Table 2 show the dissemination record information in an embodiment. Referring to Table 1, the customers in a marketing activity are used as the nodes in the network, the customer ID (identity document, i.e. an identification number) is used as the node identifier, gender and age are attribute information of the node, and the table also includes the node type identifier. Table 2 records the propagation relationships between nodes and the propagation times.

Node ID (Customer ID)	Gender	Age	Node type identifier
Id1	Male	23	Key node
Id2	Female	25	Ordinary node

Table 1

Propagating node	Propagated-to node	Propagation time
Id1	Id2	2020-02-04 12:00:00

Table 2
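As an illustration only, the two tables could be held in memory as simple records. The sketch below is not part of the application; the class names `NodeRecord` and `PropagationRecord`, the field names, and the lowercase type labels are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class NodeRecord:
    """One row of Table 1 (field names are hypothetical)."""
    node_id: str    # node identifier, here the customer ID
    gender: str     # node attribute
    age: int        # node attribute
    node_type: str  # node type identifier: "key", "connecting" or "ordinary"


@dataclass(frozen=True)
class PropagationRecord:
    """One row of Table 2 (field names are hypothetical)."""
    source_id: str  # propagating node
    target_id: str  # propagated-to node
    time: datetime  # propagation time


# The sample rows shown in Table 1 and Table 2.
nodes = [
    NodeRecord("Id1", "male", 23, "key"),
    NodeRecord("Id2", "female", 25, "ordinary"),
]
records = [
    PropagationRecord("Id1", "Id2", datetime(2020, 2, 4, 12, 0, 0)),
]
```

The direction of propagation is carried by the order of `source_id` and `target_id`, which corresponds to the propagation relationship described above.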

In one embodiment, the propagation record information may be stored in a database, and the server obtains the propagation record information from the database. It should be emphasized that, in order to further ensure the privacy and security of the above-mentioned propagation record information, the above-mentioned propagation record information may also be stored in a node of a blockchain.

When people are used as nodes in the network and people are used as carriers to realize information dissemination, the dissemination record information can be counted manually and then uploaded to the server.

The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, where each data block contains a batch of network transaction information that is used to verify the validity of the information (anti-counterfeiting) and to generate the next block. The blockchain can include an underlying blockchain platform, a platform product service layer, and an application service layer.

Step 202: Divide the network into at least one social network according to the dissemination record information, and determine candidate sampling nodes and initial dissemination nodes in each social network.

A social network can be a sub-network of the network, and the social networks disseminate information independently of each other. The initial dissemination node can be the node that performs the earliest dissemination operation in a social network, and the nodes other than the initial dissemination node in the social network are used as candidate sampling nodes.

Specifically, the server divides the network into at least one social network based on the dissemination record information. Nodes in the same social network can be connected through dissemination relationships to form a closed sub-network; nodes in different social networks cannot be connected to each other through dissemination relationships.

According to the propagation time in the propagation record information, the server finds the node with the earliest propagation time in each social network and uses it as the initial propagation node of that social network; the nodes in each social network that are not the initial propagation node are used as candidate sampling nodes.

In addition, the initial dissemination node is not the origin of information dissemination in the entire network. The origin node of information dissemination is the "producer" or "publisher" of the information, and the origin node propagates the information to the initial dissemination node. The above description of social networks disseminating information independently of each other is established without considering the origin node.

For example, if mobile phone manufacturer A publishes a marketing Weibo post on its official Weibo account "A mobile phone", then "A mobile phone" is the origin node in the entire network. "A mobile phone" spreads the post to multiple initial dissemination nodes, and the initial dissemination nodes continue to disseminate the post. In this application, the origin node "A mobile phone" is not analyzed or processed.

Step 203: Calculate the node composition ratio of the connecting node, the key node and the ordinary node in the candidate sampling node through the node type identification of the candidate sampling node.

The node type identifier is used to characterize the propagation characteristics of a node, and the candidate sampling nodes can be divided into connecting nodes, key nodes, and ordinary nodes according to those characteristics. The node composition ratio can be the proportion of connecting nodes, key nodes and ordinary nodes among the candidate sampling nodes.

Specifically, the server reads the node type identifier of each candidate sampling node from the propagation record information; the node type identifier represents the propagation characteristics of the node when information is propagated. According to the node type identifier, candidate sampling nodes are divided into connecting nodes (connectors), key nodes (key opinion leaders, KOLs) and ordinary nodes (normal nodes).

Among them, key nodes are key opinion leaders in the network, which can cause widespread dissemination of information in the network. It can be considered that key nodes spread information to multiple ordinary nodes, and ordinary nodes can measure the dissemination ability of key nodes. Connecting nodes can be used as an intermediary for information dissemination, disseminating information to key nodes, thereby causing the widespread dissemination of information in the network.

The server counts the number of connecting nodes, key nodes and ordinary nodes among the candidate sampling nodes according to the node type identifiers, and adds the three counts to obtain the total number of candidate sampling nodes. It then calculates the proportion of connecting nodes, key nodes and ordinary nodes among the candidate sampling nodes, that is, the node composition ratio.
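The counting described above can be sketched in a few lines. This is a minimal illustration, assuming the node type identifiers are available as a mapping from node ID to one of three hypothetical labels (`"connecting"`, `"key"`, `"ordinary"`); the function name `composition_ratio` is also an assumption.

```python
from collections import Counter

NODE_TYPES = ("connecting", "key", "ordinary")  # hypothetical type labels


def composition_ratio(candidate_types):
    """Return the share of each node type among the candidate sampling nodes.

    candidate_types maps node ID -> node type label.
    """
    counts = Counter(candidate_types.values())
    total = sum(counts[t] for t in NODE_TYPES)
    if total == 0:
        raise ValueError("no candidate sampling nodes")
    return {t: counts[t] / total for t in NODE_TYPES}


example = {"n1": "connecting", "n2": "key", "n3": "ordinary", "n4": "ordinary"}
ratio = composition_ratio(example)
```

For the four example nodes, one quarter are connecting nodes, one quarter are key nodes and one half are ordinary nodes.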

In one embodiment, before step 203, the method may further include: determining the number of node propagations of each candidate sampling node according to the propagation record information; determining the node type of each candidate sampling node from the number of node propagations; and adding, to each candidate sampling node, the node type identifier corresponding to its node type.

The number of node propagations may be the number of nodes to which a candidate sampling node propagates information.

Specifically, the server counts the number of times each candidate sampling node performs information dissemination from the dissemination record information, so as to obtain the number of node disseminations of each candidate sampling node.

The server obtains a preset propagation quantity threshold, determines each candidate sampling node whose number of node propagations is greater than the threshold as a key node, and adds the key-node type identifier to it. Then, among the non-key nodes, the server finds the nodes that propagate information to a key node, determines them as connecting nodes, and adds the connecting-node type identifier. Finally, each candidate sampling node that is neither a key node nor a connecting node is determined as an ordinary node, and the ordinary-node type identifier is added.
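The three-way classification can be sketched as follows. This is an illustrative reading rather than the application's implementation: it counts distinct recipients as the number of node propagations, and the function name `classify_nodes`, the record format (source, target) and the type labels are assumptions.

```python
def classify_nodes(candidates, records, threshold):
    """Label each candidate sampling node as "key", "connecting" or "ordinary".

    candidates: set of node IDs; records: iterable of (source, target) pairs;
    threshold: the preset propagation quantity threshold.
    """
    # Collect the distinct nodes each candidate propagated information to.
    targets = {node: set() for node in candidates}
    for source, target in records:
        if source in targets:
            targets[source].add(target)

    # Key nodes: propagation count strictly greater than the threshold.
    key = {n for n, t in targets.items() if len(t) > threshold}
    # Connecting nodes: non-key nodes that propagated to at least one key node.
    connecting = {n for n, t in targets.items() if n not in key and t & key}

    return {
        n: "key" if n in key else "connecting" if n in connecting else "ordinary"
        for n in candidates
    }


example_records = [("c", "k"), ("k", "o1"), ("k", "o2"), ("k", "o3")]
types = classify_nodes({"c", "k", "o1", "o2", "o3"}, example_records, threshold=2)
```

In the example, `k` reaches three nodes and exceeds the threshold of 2, `c` propagates to `k` and so becomes a connecting node, and the remaining nodes are ordinary.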

In an embodiment, the node type identifier may also be added by the propagation record server. Before uploading the propagation record information, the propagation record server counts the number of node propagations of each node and adds a node type identifier to each node according to the propagation quantity threshold. After dividing the social networks, the server sets the node type identifiers of the initial propagation nodes as invalid, so that these nodes no longer participate in the calculation of the node composition ratio, which is calculated only from the node type identifiers of the candidate sampling nodes. The server may also add an initial-propagation-node identifier to each initial propagation node after dividing the social networks.

In this embodiment, the number of node propagations of each candidate sampling node is obtained from the propagation record information. The number of node propagations reflects the ability of a node to propagate information, so the node type of each candidate sampling node can be accurately determined from it, which ensures the accuracy of the node composition ratio calculation.

Step 204: Determine a sampling strategy based on the calculated node composition ratio and the preset node association relationship; the node association relationship includes connecting nodes associated with key nodes, and key nodes associated with ordinary nodes.

Among them, the sampling strategy is used to instruct the server to sample nodes. The node association relationship may indicate the tendency or trend of information spread among different types of nodes. The node association relationship includes but is not limited to: connecting nodes are associated with key nodes, and key nodes are associated with ordinary nodes.

Specifically, the initial dissemination node is the starting point of information dissemination in a social network, and the server starts sampling from the initial dissemination node. The server can be configured to collect all or only some of the initial dissemination nodes.

The node association relationship includes: the initial propagation node is associated with connecting nodes, key nodes and ordinary nodes; connecting nodes are associated with key nodes; key nodes are associated with ordinary nodes; and ordinary nodes are associated with ordinary nodes. A node of one type has a tendency to spread information to nodes of the types with which it is associated.

The initial propagation node is collected during the first round of sampling. According to the node association relationship, connecting nodes, key nodes, and ordinary nodes can be collected during the second round of sampling; key nodes and ordinary nodes are collected during the third round; and ordinary nodes are collected during the fourth round.

In each round of sampling, the server can use the relative ratio of the node composition ratio of the connection node, the key node and the ordinary node among the candidate sampling nodes as the sampling ratio of the connection node, the key node and the ordinary node.

The server regards the types of nodes collected and the corresponding sampling ratio in each round of sampling as the sampling strategy.
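One way to read the preceding paragraphs is that the node types collected in each round follow from the node association relationship, and that the sampling ratio within a round is the node composition ratio renormalized over the types collected in that round. The sketch below implements that reading; the renormalization, the names `ASSOCIATION`, `round_types` and `build_sampling_strategy`, and the type labels are assumptions, not details confirmed by the application.

```python
# Node association relationship: which types a node of a given type tends to reach.
ASSOCIATION = {
    "initial": ("connecting", "key", "ordinary"),
    "connecting": ("key",),
    "key": ("ordinary",),
    "ordinary": ("ordinary",),
}
ORDER = ("connecting", "key", "ordinary")


def round_types():
    """Derive the node types collected in each round from ASSOCIATION."""
    rounds = [("initial",)]
    while True:
        reachable = {t for src in rounds[-1] for t in ASSOCIATION[src]}
        nxt = tuple(t for t in ORDER if t in reachable)
        if nxt == rounds[-1]:  # no new combination of types: stop
            return rounds
        rounds.append(nxt)


def build_sampling_strategy(ratio):
    """Pair each round's node types with relative sampling ratios."""
    strategy = []
    for types in round_types():
        if types == ("initial",):
            strategy.append({"initial": 1.0})
            continue
        total = sum(ratio[t] for t in types)
        strategy.append({t: (ratio[t] / total if total else 0.0) for t in types})
    return strategy


strategy = build_sampling_strategy({"connecting": 0.1, "key": 0.3, "ordinary": 0.6})
```

With this association table, `round_types()` yields four rounds that match the round-by-round description above: the initial node, then all three types, then key and ordinary nodes, then ordinary nodes only.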

Step 205: Sample the initial propagation node and candidate sampling nodes according to the sampling strategy.

Specifically, the server randomly samples the initial propagation nodes and candidate sampling nodes in each social network according to the sampling strategy.

When performing random sampling, if a node is not collected, the propagation path related to the node is discarded, that is, other nodes extending from the node are no longer considered in the sampling.
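The pruning rule above can be sketched as a breadth-first walk from the initial propagation node in which a rejected node is never expanded. This is a simplified illustration: it uses a per-type keep probability (`keep_prob`, a hypothetical parameter) in place of the full round-based strategy, and the function name `sample_paths` is an assumption.

```python
import random
from collections import deque


def sample_paths(initial, children, node_type, keep_prob, rng=None):
    """Randomly sample nodes reachable from the initial propagation node.

    children: node ID -> list of nodes it propagated to;
    node_type: node ID -> type label; keep_prob: type label -> probability.
    """
    rng = rng or random.Random()
    sampled = {initial}
    decided = {initial}  # each node is accepted or rejected only once
    queue = deque([initial])
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child in decided:
                continue
            decided.add(child)
            if rng.random() < keep_prob[node_type[child]]:
                sampled.add(child)
                queue.append(child)
            # A rejected child is not queued, so the nodes that extend
            # from it are never considered along this path.
    return sampled


children = {"i": ["c", "o1"], "c": ["k"], "k": ["o2"]}
node_type = {"c": "connecting", "k": "key", "o1": "ordinary", "o2": "ordinary"}
no_connectors = sample_paths(
    "i", children, node_type, {"connecting": 0.0, "key": 1.0, "ordinary": 1.0}
)
```

In the example, the connecting node `c` is never kept, so the key node `k` and the ordinary node `o2` behind it are dropped as well, even though their own keep probability is 1.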

Step 206: Visually present the collected initial propagation nodes and candidate sampling nodes to generate an information propagation path graph.

Among them, the information dissemination path diagram is a diagram that shows the path and dissemination trend of information in the network through some nodes in the network.

Specifically, the server creates a blank initial path graph and marks the collected initial propagation nodes and candidate sampling nodes in it. The initial propagation node can be marked in the center of the initial path graph, and then the connecting nodes, key nodes and ordinary nodes among the candidate sampling nodes can be marked in order according to the propagation record information. Finally, the server connects the initial propagation node with the various candidate sampling nodes to obtain the information propagation path graph. The information propagation path graph may also include the node identifiers of the initial propagation node and the candidate sampling nodes.
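The application does not name a rendering tool. As one possible illustration, the collected nodes and the propagation relationships between them could be written out as Graphviz DOT text and rendered by any DOT-compatible viewer. The function name `to_dot` and the choice of node shapes are assumptions made for this sketch.

```python
def to_dot(sampled, records, initial):
    """Render the collected nodes and their propagation edges as DOT text.

    sampled: set of collected node IDs; records: iterable of (source, target);
    initial: set of initial propagation node IDs.
    """
    lines = ["digraph propagation {"]
    for node in sorted(sampled):
        # Initial propagation nodes are drawn with a distinct shape.
        shape = "doublecircle" if node in initial else "ellipse"
        lines.append(f'  "{node}" [shape={shape}];')
    for source, target in records:
        # Keep only edges whose two endpoints were both collected.
        if source in sampled and target in sampled:
            lines.append(f'  "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


dot = to_dot({"i", "o1"}, [("i", "c"), ("i", "o1"), ("c", "k")], {"i"})
```

Edges that touch an uncollected node are omitted, so the resulting graph contains only the sampled part of the propagation path.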

In this embodiment, the network is first divided into at least one social network based on the dissemination record information, where the social networks disseminate information independently of each other, and the candidate sampling nodes and the initial dissemination node in each social network are determined. According to the node type identifiers of the candidate sampling nodes, the node composition ratio of the three node types (connecting nodes, key nodes and ordinary nodes) among the candidate sampling nodes is calculated. When collecting initial propagation nodes or candidate sampling nodes, sampling is performed according to the sampling strategy, which is determined jointly by the node composition ratio and the preset node association relationship; the node association relationship includes connecting nodes being associated with key nodes and key nodes being associated with ordinary nodes. This ensures that all types of nodes are sampled in a balanced manner and improves the accuracy of node sampling. The information dissemination path graph, which shows the dissemination path of information in the network, is composed of the collected candidate sampling nodes and initial dissemination nodes, which ensures the accuracy of the information propagation path analysis.

Further, the above step 202 may include: initializing the node identifier of each node in the dissemination record information as the community label of that node; for each node in the network, determining the minimum community label among the community labels of the node and its neighboring nodes; updating the community label of the node to the determined minimum community label, so as to iteratively update the community label of each node; when the community label of each node no longer changes, dividing the network into at least one social network according to the community label of each node; and, according to the propagation time in the dissemination record information, determining the node with the earliest propagation time in each social network as the initial dissemination node of that social network, and determining the nodes in each social network that do not have the earliest propagation time as candidate sampling nodes.

Among them, the node identifier may be the identifier of the node, and the node identifier may be a combination of letters, numbers, and special symbols. The community label of the node can identify the social network to which the node belongs. When two nodes have a propagation relationship, the two nodes are adjacent to each other. The propagation time can be the time when the propagation operation takes place. The minimum community label may be the smallest community label among the community labels of a node and its neighboring nodes.

Specifically, the propagation record information includes the node identifier of each node, and the node identifiers are all different, so the server may initialize the node identifier of each node as the community label of that node. A node belongs to the social network corresponding to its community label.

In one embodiment, the server randomly generates the community label of each node, and the randomly generated community labels are different from each other.

After initialization, there are as many social networks as there are nodes in the network, and the community labels need to be updated iteratively to merge the social networks. For each node of the network, the community labels of the node itself and of its neighboring nodes are compared to determine the minimum community label among them.

When the node identifiers are numbers, the node identifier with the smallest value is selected as the minimum community label by numerical comparison. When the node identifiers are strings, they are compared character by character in lexicographic order, using the ASCII (American Standard Code for Information Interchange) code value as the standard for character comparison, and the node identifier with the smallest ASCII code value is selected as the minimum community label.
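The two comparison modes can be illustrated with a small helper. Python compares strings by code point, which coincides with ASCII order for ASCII-only identifiers. The function name `min_community_label` and the rule that labels must not mix types are assumptions made for this sketch.

```python
def min_community_label(labels):
    """Return the smallest label; labels must be all integers or all strings."""
    labels = list(labels)
    if not labels:
        raise ValueError("at least one label is required")
    kinds = {type(label) for label in labels}
    if kinds not in ({int}, {str}):
        raise TypeError("labels must be all integers or all strings")
    # Integers compare by value; strings compare character by character.
    return min(labels)


numeric_min = min_community_label([10, 2, 7])
string_min = min_community_label(["id2", "id10", "id3"])
```

Note that character-by-character comparison ranks `"id10"` before `"id2"`, because the character `1` precedes `2`. This does not affect the resulting division, since any consistent ordering leads to one shared label per connected group of nodes.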

In each round of iterative update, each node updates its own community label to the determined minimum community label. After each round of iterative update is completed, the minimum community label is determined again from the community labels of the node itself and its neighboring nodes, and then the next round of iterative update is performed.

When the community labels of all nodes no longer change in the iterative update, the network is divided into at least one social network according to the community labels of the nodes. In the same social network, all nodes have the same community label, and nodes with different community labels are divided into different social networks.

According to the propagation time in the propagation record information, the server finds the node with the earliest propagation time in each social network, determines the found node as the initial propagation node of the social network where the node is located, and sets the remaining nodes in that social network as candidate sampling nodes. Every social network has an initial propagation node.
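The selection of initial propagation nodes described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function name, the two input mappings, and the sample propagation times are assumptions, and ties between equally early nodes are resolved arbitrarily (first encountered).

```python
from collections import defaultdict


def split_initial_and_candidates(labels, propagation_time):
    """Pick the earliest node of each social network as its initial node.

    labels: node identifier -> community label (hypothetical input shape)
    propagation_time: node identifier -> comparable propagation time
    Returns (initial, candidates), both keyed by community label.
    """
    members = defaultdict(list)
    for node, label in labels.items():
        members[label].append(node)

    initial, candidates = {}, {}
    for label, nodes in members.items():
        earliest = min(nodes, key=lambda n: propagation_time[n])
        initial[label] = earliest
        candidates[label] = [n for n in nodes if n != earliest]
    return initial, candidates


# Hypothetical labels and propagation times, for illustration only.
labels = {"id0": "id0", "id1": "id0", "id2": "id0", "id3": "id0",
          "id4": "id4", "id5": "id4"}
times = {"id0": 5, "id1": 1, "id2": 7, "id3": 6, "id4": 2, "id5": 3}
initial, candidates = split_initial_and_candidates(labels, times)
```

Note that the initial propagation node of a social network need not be the node whose identifier became the community label; in the sample data, the network labelled id0 starts from node id1.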

For example, FIG. 3 is a schematic diagram of dividing a network into social networks in an embodiment. For each node in the network on the left in FIG. 3, the community label of the node is set to its own node identifier, so that the community label of the node whose node identifier is id0 is id0, the community label of the node whose node identifier is id1 is id1, and so on.

For each node, before each round of iterative update, the minimum community label is determined from the community labels of the node itself and its neighboring nodes. For node id0, among the community labels id0, id1 and id3, the string id0 has the smallest ASCII code value, so id0 is selected as the minimum community label; similarly, for nodes id1 and id3, the minimum community label is also id0; for node id2, the minimum community label is id1. In the first round of iterative update, the community labels of nodes id0, id1, and id3 are therefore updated to id0, and the community label of node id2 is updated to id1.

In the second round of iterative update, the minimum community labels of nodes id0, id1 and id3 no longer change, so the community labels of these three nodes do not change. The community label of node id2 becomes id0, so that the community labels of nodes id0, id1, id2, and id3 are all the same.

There is no information propagation between nodes id0, id1, id2, and id3 and the other nodes in the network, such as node id4, so the community labels of nodes such as node id4 cannot affect nodes id0, id1, id2, and id3. At this point, the community labels of nodes id0, id1, id2, and id3 no longer change, and when the iteration is completed their community labels are all id0.

In the same way, the community label of nodes id4, id5, id6, and id7 is id4 when the iteration is completed, and the community label of nodes id8, id9, and id10 is id8 when the iteration is completed. According to the community labels, the network can be divided into the three social networks on the right in FIG. 3.
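The iterative update walked through above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the function name is hypothetical, the edges among id0 to id3 are inferred from the neighbor sets named in the example, and the single edge between id4 and id5 is assumed only so that a second social network appears. All nodes are updated synchronously in each round, as in the example.

```python
def propagate_min_labels(nodes, edges):
    """Iteratively replace each label with the minimum label among a node
    and its neighbors until no label changes.

    Returns (labels, rounds), where rounds counts the updates that
    changed at least one label.
    """
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Initialization: every node's community label is its own identifier.
    labels = {n: n for n in nodes}
    rounds = 0
    while True:
        updated = {
            n: min([labels[n]] + [labels[m] for m in neighbors[n]])
            for n in nodes
        }
        if updated == labels:  # labels no longer change: stop iterating
            return labels, rounds
        labels = updated
        rounds += 1


nodes = ["id0", "id1", "id2", "id3", "id4", "id5"]
# id0-id1, id0-id3, id1-id2 are inferred from the text; id4-id5 is assumed.
edges = [("id0", "id1"), ("id0", "id3"), ("id1", "id2"), ("id4", "id5")]
labels, rounds = propagate_min_labels(nodes, edges)
```

With these inputs, node id2 takes the label id1 in the first round and id0 in the second, matching the description, and nodes sharing a final label form one social network.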

In one embodiment, the network can be split into multiple social networks through the ConnectedComponent() function in the GraphX module of the Spark framework, where the input of the function is the propagation record information in Table 1 and Table 2. Spark is a fast, general-purpose computing engine designed for large-scale data processing; GraphX is the Spark component for graphs and graph computation; and the ConnectedComponent() function implements a connected-components algorithm, which is used to discover the social networks in the network.

In this embodiment, a community label is added to each node, and the minimum community label is determined from the community labels of the node and its neighboring nodes; the community label of the node is updated to the determined minimum community label, so as to iteratively update the community label of each node. During the iterative update, the nodes in the same social network exchange labels with each other, so that their community labels converge to the same value. At the end of the iterative update, the network can therefore be accurately divided into social networks according to the community labels, and the initial propagation nodes and candidate sampling nodes in each social network can be accurately located according to the propagation time, thereby ensuring the accuracy of node sampling.

Further, as shown in FIG. 4, the foregoing step 204 may include:

Step 2041: Set the sampling ratio of the initial propagation nodes in the first round of sampling according to a preset sampling ratio.

Here, the sampling ratio may be, for each round of sampling, the ratio of the number of sampled nodes of a given type to the total number of nodes sampled in that round. The preset sampling ratio may be a preset ratio of the initial propagation nodes collected in the first round of sampling to all the initial propagation nodes.

Specifically, the server starts sampling from the initial propagation nodes and collects them in the first round of sampling. The server can collect all the initial propagation nodes, or read the preset sampling ratio and set the sampling ratio of the initial propagation nodes in the first round of sampling to the preset sampling ratio.

In order to observe the information propagation path from a global perspective, it is preferable to collect all the initial propagation nodes, that is, to set the sampling ratio of the initial propagation nodes in the first round of sampling to 1.

Step 2042: Based on the calculated node composition ratio and the preset node association relationship, determine the sampling ratio of the connecting node, the key node and the ordinary node in each round of sampling after the first round of sampling.

In each round of sampling after the first round, the server collects candidate sampling nodes, and the types of candidate sampling nodes collected may differ from round to round. The server determines, according to the node association relationship, the types of candidate sampling nodes involved in each round of sampling after the first round.

Based on the node association relationship, the connection nodes, key nodes and ordinary nodes associated with the initial propagation nodes are collected in the second round of sampling; in the third round of sampling, the key nodes associated with the connection nodes collected in the second round and the ordinary nodes associated with the key nodes collected in the second round are collected; in the fourth round of sampling, the ordinary nodes associated with the key nodes collected in the third round are collected.

Then, according to the types of candidate sampling nodes involved in each round of sampling after the first round and the node composition ratio of connection nodes, key nodes and ordinary nodes, the server calculates the sampling ratio of connection nodes, key nodes and ordinary nodes in each of those rounds.

For example, suppose that among the candidate sampling nodes, the node composition ratio of connection nodes is 5%, the node composition ratio of key nodes is 5%, and the node composition ratio of ordinary nodes is 90%. In the second round of sampling, the connection nodes, key nodes, and ordinary nodes are collected at a ratio of 1:1:18 (5%:5%:90%); in the third round of sampling, the key nodes and ordinary nodes are collected at a ratio of 1:18 (5%:90%).
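The ratio arithmetic in this example can be sketched as below. The sketch assumes integer percentages and uses hypothetical names (`ROUND_TYPES`, `relative_ratio`); the node types per round follow the second, third and fourth rounds described above.

```python
from functools import reduce
from math import gcd

# Node types involved in each round after the first, per the node
# association relationship described above.
ROUND_TYPES = {
    2: ("connection", "key", "ordinary"),
    3: ("key", "ordinary"),
    4: ("ordinary",),
}


def relative_ratio(composition, types):
    """Reduce the composition percentages of the given node types to the
    smallest equivalent integer ratio (assumes integer percentages)."""
    values = [composition[t] for t in types]
    divisor = reduce(gcd, values)
    return tuple(v // divisor for v in values)


composition = {"connection": 5, "key": 5, "ordinary": 90}
strategy = {rnd: relative_ratio(composition, types)
            for rnd, types in ROUND_TYPES.items()}
```

For the 5%:5%:90% composition, the sketch yields 1:1:18 for the second round and 1:18 for the third round, as in the example.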

It should be noted that, because few ordinary nodes are reached directly from connection nodes or from other ordinary nodes, the association relationships from connection nodes to ordinary nodes and from ordinary nodes to ordinary nodes can be ignored in actual sampling. In addition, the four rounds of sampling correspond to four rounds of information propagation; after four rounds, the propagation of the information is already weak and no further analysis is needed.

Step 2043: Take the sampling ratios determined for the nodes in each round of sampling as the sampling strategy.

Specifically, the server uses the types of nodes that need to be collected in the four rounds of sampling and the corresponding sampling ratio as a sampling strategy. The sampling strategy instructs the server to collect initial propagation nodes and various candidate sampling nodes from various social networks.

In this embodiment, the sampling ratio of the initial propagation nodes in the first round of sampling is determined according to the preset sampling ratio; then, according to the node composition ratio and the node association relationship, the sampling ratios of the connection nodes, key nodes, and ordinary nodes among the candidate sampling nodes are determined for each round of sampling after the first round, thereby obtaining the sampling strategy. Because the sampling strategy integrates the node association relationship and the node composition ratio, the various types of nodes can be sampled in a balanced manner.

Further, as shown in FIG. 5, the foregoing step 2042 may include:

Step 20421: Based on the calculated node composition ratio and the preset node association relationship, initialize the sampling ratio of the connection node, the key node and the ordinary node in each round of sampling after the first round of sampling.

Specifically, the server determines, according to the preset node association relationship, the types of candidate sampling nodes that need to be collected in each round of sampling after the first round, and initializes the sampling ratio for each of those rounds.

In the initialization, the relative ratio of the node composition ratio of the connecting node, the key node and the ordinary node in the candidate sampling node is directly used as the sampling ratio of the connecting node, the key node and the ordinary node.

Step 20422: Compare the node composition ratio with a preset ratio threshold.

Wherein, the ratio threshold may be a preset node composition ratio threshold, which is used to determine whether it is necessary to adjust the sampling ratio of connected nodes, key nodes, and ordinary nodes.

Specifically, the server obtains a preset ratio threshold, and compares the node composition ratios of the connected nodes, key nodes, and ordinary nodes with the ratio thresholds. When there is a node composition ratio that is less than the ratio threshold, it indicates that a certain type of candidate sampling node accounts for a small proportion. In random sampling, some important candidate sampling nodes may be missed, thereby missing the relevant propagation path.

Step 20423: When there is a node composition ratio that is less than the ratio threshold, adjust the sampling ratio obtained by initialization according to the preset adjustment value.

Wherein, the preset adjustment value may be a preset sampling ratio adjustment value, and the sampling ratio may be adjusted through addition and subtraction.

Specifically, when there is a node composition ratio that is less than the ratio threshold, the server needs to adjust the sampling ratio obtained by initialization. During adjustment, the server obtains the preset adjustment value, and adjusts the sampling ratio obtained by initialization according to the preset adjustment value to obtain the final sampling ratio.

For example, suppose the node composition ratio is connected node: key node: ordinary node=9%:1%:90%, the ratio threshold is 3%, and the node composition ratio of key nodes is less than the ratio threshold, indicating that the number of key nodes is small. If the node composition ratio is directly used as the sampling ratio, the sampling ratio in the second round of sampling is connected nodes: key nodes: ordinary nodes = 9%: 1%: 90%. If there is a key node associated with a large number of ordinary nodes, and the key node is missed in random sampling, the more important propagation path will be lost. Therefore, the sampling ratio of the key nodes can be increased according to the preset adjustment value. For example, if the preset adjustment value is 2%, the sampling ratio of the key nodes is adjusted to 3%.

When the sampling ratio of the key nodes is increased, the sampling ratios of the connection nodes and ordinary nodes are reduced correspondingly; the sampling ratio of the ordinary nodes can be reduced first, because ordinary nodes usually account for the largest number. In this case, the sampling ratio becomes connection nodes: key nodes: ordinary nodes = 9%: 3%: 88%. Similarly, if the node composition ratio of key nodes is 2%, the sampling ratio of key nodes can be adjusted to 4% in the second round of sampling; if the node composition ratio of key nodes is 0.5%, the sampling ratio of key nodes can be adjusted upward twice in the second round of sampling, to 4.5%. Further examples are omitted here.
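The adjustment in these examples can be sketched as follows. This is one possible reading rather than the definitive procedure: the sketch assumes that the preset adjustment value is added repeatedly until the ratio is no longer below the threshold (consistent with 1% becoming 3%, 2% becoming 4%, and 0.5% becoming 4.5%), and that the whole reduction is taken from the ordinary nodes. The function name, the `donor` parameter, and the connection-node values in the second and third cases are hypothetical.

```python
from decimal import Decimal


def adjust_sampling_ratios(composition, threshold, step, donor="ordinary"):
    """Raise any ratio below the threshold by the adjustment value,
    repeating until it reaches the threshold, and subtract the same
    amount from the donor type (assumed to be the ordinary nodes)."""
    if step <= 0:
        raise ValueError("the adjustment value must be positive")
    ratios = dict(composition)  # initial sampling ratios = composition
    for node_type in composition:
        if node_type == donor:
            continue
        while ratios[node_type] < threshold:
            ratios[node_type] += step
            ratios[donor] -= step
    return ratios


D = Decimal  # percentages are kept as decimals to avoid rounding error
threshold, step = D("3"), D("2")
case_a = adjust_sampling_ratios(
    {"connection": D("9"), "key": D("1"), "ordinary": D("90")}, threshold, step)
case_b = adjust_sampling_ratios(
    {"connection": D("8"), "key": D("2"), "ordinary": D("90")}, threshold, step)
case_c = adjust_sampling_ratios(
    {"connection": D("9.5"), "key": D("0.5"), "ordinary": D("90")}, threshold, step)
```

The first case reproduces the 9%: 3%: 88% result above; the other two reproduce the adjusted key-node ratios of 4% and 4.5%.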

In this embodiment, after the sampling ratios of connection nodes, key nodes, and ordinary nodes are initialized, the node composition ratio is compared with a preset ratio threshold. When there is a node composition ratio less than the ratio threshold, the sampling ratio obtained by initialization is adjusted according to the preset adjustment value, so that nodes of that type are collected as far as possible and the balance of sampling is improved.

Further, the above step 205 may include: in the first round of sampling, sampling the initial propagation nodes according to the sampling strategy; in each round of sampling after the first round, querying, according to the propagation record information, the candidate sampling nodes to which the nodes collected in the previous round of sampling propagated; and sampling the queried candidate sampling nodes according to the sampling strategy.

Specifically, the server needs to perform multiple rounds of sampling, and in the first round of sampling, the initial propagation nodes are collected according to the sampling strategy.

In each round of sampling after the first round, the server first queries, according to the propagation record information, the candidate sampling nodes to which the nodes collected in the previous round of sampling propagated.

After querying the candidate sampling nodes, the server counts the number of queried candidate sampling nodes; then, according to the sampling ratio in the sampling strategy, it calculates the number of candidate sampling nodes of each type to be sampled in that round, and samples the queried candidate sampling nodes according to the calculated numbers.

When performing the second round of sampling, the server first queries the candidate sampling nodes associated with the collected initial propagation nodes and counts them to obtain the number of nodes available for the second round of sampling. According to the sampling ratio of the second round of sampling recorded in the sampling strategy, the server calculates the number of connection nodes, key nodes and ordinary nodes to be sampled in the second round, and then randomly collects connection nodes, key nodes, and ordinary nodes according to these numbers.

In the third round of sampling, among the candidate sampling nodes associated with the connection nodes and key nodes collected in the second round of sampling, key nodes and ordinary nodes are randomly collected according to the sampling strategy.

In the fourth round of sampling, among the candidate sampling nodes associated with the key nodes collected in the third round of sampling, ordinary nodes are randomly collected according to the sampling strategy.

In this embodiment, the initial propagation nodes are first sampled according to the sampling strategy; in each round of sampling after the first round, the candidate sampling nodes to which the nodes collected in the previous round propagated are queried according to the propagation record information, and the queried candidate sampling nodes are then sampled according to the sampling strategy. This ensures that the collected nodes are all related to the nodes collected in the previous round of sampling, and thus ensures the continuity of the information propagation path.
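The multi-round procedure above can be sketched as follows. The sketch is illustrative only and rests on several assumptions that are not stated in the text: all names and the sample data are hypothetical; the per-type sampling number is taken as the ceiling of (number of queried candidates × that type's share among the types present), capped at the number available; and all initial propagation nodes are collected in the first round.

```python
import random

# Associations followed from one round to the next. Connection-to-ordinary
# and ordinary-to-ordinary associations are ignored, as described above.
ALLOWED = {
    "initial": {"connection", "key", "ordinary"},
    "connection": {"key"},
    "key": {"ordinary"},
    "ordinary": set(),
}


def sample_rounds(initial_nodes, propagated_to, node_type, weights, rng,
                  max_rounds=4):
    """Return a list of node lists, one per sampling round."""
    rounds = [list(initial_nodes)]  # round 1: all initial propagation nodes
    seen = set(initial_nodes)
    while len(rounds) < max_rounds:
        # Query the candidates that the previous round's nodes propagated to.
        by_type = {}
        for parent in rounds[-1]:
            parent_type = node_type.get(parent, "initial")
            for child in propagated_to.get(parent, []):
                child_type = node_type[child]
                if child in seen or child_type not in ALLOWED[parent_type]:
                    continue
                by_type.setdefault(child_type, set()).add(child)
        if not by_type:
            break
        total = sum(len(found) for found in by_type.values())
        weight_sum = sum(weights[t] for t in by_type)
        picked = []
        for t, found in sorted(by_type.items()):
            quota = -(-total * weights[t] // weight_sum)  # ceiling division
            picked += rng.sample(sorted(found), min(quota, len(found)))
        seen.update(picked)
        rounds.append(picked)
    return rounds


# Hypothetical propagation records: node -> nodes it propagated to.
propagated_to = {"s": ["c1", "k1", "o1", "o2"], "c1": ["k2", "o3"],
                 "k1": ["o4", "o5"], "k2": ["o6"]}
node_type = {"c1": "connection", "k1": "key", "k2": "key",
             "o1": "ordinary", "o2": "ordinary", "o3": "ordinary",
             "o4": "ordinary", "o5": "ordinary", "o6": "ordinary"}
weights = {"connection": 5, "key": 5, "ordinary": 90}  # integer percentages
rounds = sample_rounds(["s"], propagated_to, node_type, weights,
                       random.Random(0))
```

Because every node collected after the first round is drawn from the nodes reached by the previous round, each sampled node stays connected to an earlier one, which is the continuity property noted above. In the sample data, node o3 is never collected because it is reached only from a connection node.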

Further, the above step 206 may include: adding the collected initial propagation nodes and candidate sampling nodes to an initial path graph according to the propagation record information; setting, in the initial path graph, the display modes of the initial propagation nodes and of the connection nodes, key nodes and ordinary nodes among the candidate sampling nodes; and connecting the initial propagation nodes, connection nodes, key nodes and ordinary nodes whose display modes have been set, to obtain the information propagation path graph.

Specifically, the server creates a blank initial path graph, and adds the collected initial propagation nodes and candidate sampling nodes to the initial path graph. The server may arrange the initial propagation nodes and candidate sampling nodes according to the propagation record information, so that nodes with direct propagation relationships have a closer distribution in the initial path graph.

In order to highlight the information dissemination path more intuitively and clearly, the server sets different types of nodes to different display modes. Display methods include color display and shape display. When using color display, the initial propagation node, connection node, key node and common node can be set to different colors. For example, the initial propagation node can be represented by red dots, and the connecting nodes can be represented by blue dots. When using shape display, the initial propagation node, connection node, key node and common node can be set to different shapes. For example, the initial propagation node can be represented by a circle, and the connecting node can be represented by a triangle.

The server connects the initial propagation nodes, connection nodes, key nodes and common nodes that have been set up in the initial path graph to obtain the information propagation path graph.

The server can store the information propagation path graph in a database, send the information propagation path graph to a designated terminal for display, or upload the information propagation path graph to a blockchain for storage.

In one embodiment, Gephi (JVM-based complex network analysis software, mainly used for interactive visualization and exploration of various networks, complex systems, and dynamic and hierarchical graphs) can be used to generate the information propagation path graph. The server inputs the propagation record information between the collected initial propagation nodes and candidate sampling nodes into Gephi, and Gephi generates the information propagation path graph.

In this embodiment, the collected initial propagation nodes and candidate sampling nodes are added to the initial path graph according to the propagation record information, and the initial propagation nodes and the connection nodes, key nodes, and ordinary nodes among the candidate sampling nodes are set to different display modes so that they can be distinguished. This ensures that the generated information propagation path graph shows the information propagation path more accurately and clearly.

Fig. 6 is an information propagation path diagram generated in an embodiment. Specifically, referring to Figure 6, the center of the figure is the origin node of the information publisher; the solid circles in the figure are initial propagation nodes; the open circles are connected nodes; the solid squares are key nodes; and the open squares are ordinary nodes. Figure 6 shows the trend and path of the spread of certain information in the network through the collected nodes.
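As a non-limiting sketch of how collected nodes might be assembled into such a graph, the following Python example attaches a display mode to each node and keeps only the propagation records whose two endpoints were both collected. The marker conventions follow the description of Fig. 6; the function name, the dictionary layout, and the sample records are assumptions, and the actual rendering (for example in Gephi) is outside the scope of the sketch.

```python
# Display modes per node type, following the markers described for Fig. 6.
DISPLAY = {
    "initial": {"shape": "circle", "filled": True},
    "connection": {"shape": "circle", "filled": False},
    "key": {"shape": "square", "filled": True},
    "ordinary": {"shape": "square", "filled": False},
}


def build_path_graph(sampled_nodes, node_type, records):
    """Build a styled node table and an edge list from collected nodes.

    records: iterable of (source, target) propagation pairs
    (a hypothetical simplification of the propagation record information).
    """
    nodes = {
        n: {"type": node_type[n], **DISPLAY[node_type[n]]}
        for n in sampled_nodes
    }
    # Keep only propagation records between nodes that were both collected.
    edges = [(src, dst) for src, dst in records
             if src in nodes and dst in nodes]
    return {"nodes": nodes, "edges": edges}


# Hypothetical sample data, for illustration only.
node_type = {"s": "initial", "c1": "connection", "k1": "key", "o1": "ordinary"}
records = [("s", "c1"), ("c1", "k1"), ("k1", "o1"), ("k1", "o9")]
graph = build_path_graph(["s", "c1", "k1", "o1"], node_type, records)
```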

The information propagation path analysis method in this application can be applied to the field of big data, and can process massive structured data; in addition, this application also relates to fraud detection in financial technology.

A person of ordinary skill in the art can understand that all or part of the processes in the methods of the foregoing embodiments can be implemented by instructing relevant hardware through computer-readable instructions, which can be stored in a computer-readable storage medium. When the computer-readable instructions are executed, they may include the processes of the above-mentioned method embodiments. The aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (Read-Only Memory, ROM), or may be a random access memory (Random Access Memory, RAM), etc.

It should be understood that although the various steps in the flowcharts of the drawings are displayed in sequence as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they can be executed in other orders. Moreover, at least part of the steps in the flowcharts of the drawings may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily executed at the same time, but can be executed at different times, and they are not necessarily performed sequentially, but may be performed in turn or alternately with other steps or with at least a part of the sub-steps or stages of other steps.

With further reference to FIG. 7, as an implementation of the method shown in FIG. 2, this application provides an embodiment of an information propagation path analysis device. The device embodiment corresponds to the method embodiment shown in FIG. 2, and the device can be applied to various electronic devices.

As shown in FIG. 7, the information propagation path analysis device 300 in this embodiment includes: an information acquisition module 301, a network division module 302, a ratio calculation module 303, a strategy determination module 304, a node sampling module 305, and a path generation module 306, wherein:

The information acquisition module 301 is used to acquire network propagation record information. It should be emphasized that, in order to further ensure the privacy and security of the above-mentioned propagation record information, the above-mentioned propagation record information may also be stored in a node of a blockchain.

The network dividing module 302 is configured to divide the network into at least one social network according to the dissemination record information, and determine candidate sampling nodes and initial dissemination nodes in each social network.

The ratio calculation module 303 is used to calculate the node composition ratio of the connection node, the key node and the ordinary node in the candidate sampling node through the node type identification of the candidate sampling node.

The strategy determination module 304 is configured to determine a sampling strategy based on the calculated node composition ratio and the preset node association relationship; the node association relationship includes connecting nodes associated with key nodes, and key nodes associated with ordinary nodes.

The node sampling module 305 is configured to sample the initial propagation node and candidate sampling nodes according to the sampling strategy.

The path generation module 306 is used to visually present the collected initial propagation nodes and candidate sampling nodes to generate an information propagation path graph.
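The cooperation of modules 301 to 306 can be sketched as a simple pipeline. This is a structural illustration only: the class name, the field names, and the arguments passed between stages are assumptions, and each stage is supplied as a callable so that any concrete implementation can be substituted.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class PropagationPathAnalyzer:
    """Chains six stages corresponding to modules 301 to 306."""
    acquire: Callable[[], Any]                 # information acquisition 301
    divide: Callable[[Any], Any]               # network division 302
    compute_ratio: Callable[[Any], Any]        # ratio calculation 303
    determine_strategy: Callable[[Any], Any]   # strategy determination 304
    sample: Callable[[Any, Any], Any]          # node sampling 305
    render: Callable[[Any, Any], Any]          # path generation 306

    def run(self):
        records = self.acquire()
        networks = self.divide(records)
        ratio = self.compute_ratio(networks)
        strategy = self.determine_strategy(ratio)
        sampled = self.sample(networks, strategy)
        return self.render(sampled, records)


# Stub stages that only record the order in which they are called.
calls = []


def stage(name, result):
    def _run(*args):
        calls.append(name)
        return result
    return _run


analyzer = PropagationPathAnalyzer(
    acquire=stage("301", "records"),
    divide=stage("302", "networks"),
    compute_ratio=stage("303", "ratio"),
    determine_strategy=stage("304", "strategy"),
    sample=stage("305", "sampled"),
    render=stage("306", "path graph"),
)
result = analyzer.run()
```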

In some optional implementations of this embodiment, the aforementioned network division module 302 includes: a label adding submodule, a minimum determination submodule, a label update submodule, a network division submodule, and a node determination submodule, wherein:

The label adding submodule is used to initialize the node identification of each node in the propagation record information to the community label of each node.

The minimum determination sub-module is used to determine the minimum community label from the community labels corresponding to the node and the neighboring nodes of the node for each node in the network.

The label update submodule is used to update the community label of the node to the determined minimum community label to iteratively update the community label of each node.

The network division sub-module is used to divide the network into at least one social network according to the community label of each node when the community label of each node no longer changes.

The node determination sub-module is used to determine, according to the propagation time in the propagation record information, the node with the earliest propagation time in each social network as the initial propagation node of that social network, and to determine the nodes in each social network that do not have the earliest propagation time as candidate sampling nodes.

In some optional implementation manners of this embodiment, the above-mentioned information propagation path analysis device 300 further includes: a quantity determination module, a type determination module, and an identity addition module, wherein:

The quantity determining module is used to determine the node propagation quantity of each candidate sampling node according to the propagation record information.

The type determination module is used to determine the node type of each candidate sampling node according to its node propagation quantity.

The identifier adding module is used to add a node type identifier corresponding to the respective node type to each candidate sampling node.

In some optional implementation manners of this embodiment, the above-mentioned strategy determination module 304 includes: a first-round setting sub-module, a ratio determination sub-module, and a strategy determination sub-module, wherein:

The first-round setting sub-module is used to set the sampling ratio of the initial propagation nodes in the first round of sampling according to the preset sampling ratio.

The ratio determination sub-module is used to determine the sampling ratios of connection nodes, key nodes and ordinary nodes in each round of sampling after the first round of sampling, based on the calculated node composition ratio and the preset node association relationship.

The strategy determination sub-module is used to determine the determined sampling ratio of the nodes in each round of sampling as the sampling strategy.

In some optional implementation manners of this embodiment, the aforementioned ratio determination sub-module includes: a ratio initialization unit, a ratio comparison unit, and a ratio adjustment unit, wherein:

The ratio initialization unit is used to initialize the sampling ratios of the connection nodes, key nodes and ordinary nodes in each round of sampling after the first round of sampling, based on the calculated node composition ratio and the preset node association relationship.

The ratio comparison unit is used to compare the node composition ratio with a preset ratio threshold.

The ratio adjustment unit is configured to adjust the sampling ratio obtained by initialization according to the preset adjustment value when there is a node composition ratio smaller than the ratio threshold.

In some optional implementation manners of this embodiment, the aforementioned node sampling module 305 includes: an initial sampling submodule, a node query submodule, and a node sampling submodule, where:

The initial sampling sub-module is used to sample the initial propagation nodes according to the sampling strategy in the first round of sampling.

The node query sub-module is used to query, according to the propagation record information, the candidate sampling nodes to which the nodes collected in the previous round of sampling propagated, in each round of sampling after the first round.

The node sampling sub-module is used to sample the queried candidate sampling nodes according to the sampling strategy.

In some optional implementation manners of this embodiment, the aforementioned path generation module 306 includes: a node adding submodule, a display setting submodule, and a node connection submodule, where:

The node adding sub-module is used to add the collected initial propagation nodes and candidate sampling nodes to the initial path graph according to the propagation record information.

The display setting sub-module is used to set the display mode of connection nodes, key nodes and common nodes in the initial propagation node and candidate sampling nodes in the initial path graph.

The node connection sub-module is used to connect the set up initial propagation node, connecting node, key node and common node to obtain the information propagation path graph.

In order to solve the above technical problems, the embodiments of the present application also provide computer equipment. Please refer to FIG. 8 for details. FIG. 8 is a block diagram of the basic structure of the computer device in this embodiment.

The computer device 4 includes a memory 41, a processor 42, and a network interface 43 that are communicatively connected to one another via a system bus. It should be pointed out that the figure only shows the computer device 4 with the components 41-43, but it should be understood that it is not required to implement all the shown components, and more or fewer components may be implemented instead. Those skilled in the art can understand that the computer device here is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA), a digital signal processor (Digital Signal Processor, DSP), an embedded device, etc.

The computer device may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server. The computer device can interact with the user through a keyboard, a mouse, a remote control, a touch panel, or a voice control device.

The memory 41 includes at least one type of computer-readable storage medium. The computer-readable storage medium may be non-volatile or volatile. The computer-readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or a memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash card (Flash Card) equipped on the computer device 4. Of course, the memory 41 may also include both the internal storage unit of the computer device 4 and its external storage device. In this embodiment, the memory 41 is generally used to store an operating system and various application software installed in the computer device 4, such as computer-readable instructions for an information propagation path analysis method. In addition, the memory 41 can also be used to temporarily store various types of data that have been output or will be output.

In some embodiments, the processor 42 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips. The processor 42 is generally used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to run computer-readable instructions or process data stored in the memory 41, for example, run the computer-readable instructions of the information propagation path analysis method.

The network interface 43 may include a wireless network interface or a wired network interface, and the network interface 43 is generally used to establish a communication connection between the computer device 4 and other electronic devices.

The computer device provided in this embodiment can execute the steps of the information propagation path analysis method described above. Here, the steps of the information propagation path analysis method may be the steps in the information propagation path analysis method of each of the foregoing embodiments.

In this embodiment, the network is first divided into at least one social network based on the dissemination record information, where the social networks disseminate information independently of one another, and the candidate sampling nodes and the initial dissemination node in each social network are determined. According to the node type identifiers of the candidate sampling nodes, the node composition ratio of the three node types (connection nodes, key nodes and ordinary nodes) among the candidate sampling nodes is calculated. The initial propagation nodes and the candidate sampling nodes are then sampled according to the sampling strategy, which is determined jointly by the node composition ratio and the preset node association relationship; the node association relationship includes connection nodes being associated with key nodes, and key nodes being associated with ordinary nodes. This ensures that all types of nodes are sampled in a balanced manner and improves the accuracy of node sampling. The information dissemination path graph, which is composed of the collected candidate sampling nodes and initial dissemination nodes, shows the dissemination path of information in the network, thereby ensuring the accuracy of the analysis of the information propagation path.
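As an illustrative sketch only, and not the implementation of this application, the node composition ratio described above could be computed from the node type identifiers as follows. The identifier strings "connection", "key" and "ordinary" and the function name node_composition_ratio are assumptions introduced for this example.

```python
from collections import Counter

# Hypothetical node type identifiers; the application does not fix their spelling.
NODE_TYPES = ("connection", "key", "ordinary")


def node_composition_ratio(type_by_node):
    """Return the share of each node type among the candidate sampling nodes.

    type_by_node maps a candidate sampling node to its node type identifier.
    """
    counts = Counter(type_by_node.values())
    total = sum(counts[t] for t in NODE_TYPES)
    if total == 0:
        # No candidate sampling nodes: every share is zero.
        return {t: 0.0 for t in NODE_TYPES}
    return {t: counts[t] / total for t in NODE_TYPES}


ratio = node_composition_ratio(
    {"a": "connection", "b": "key", "c": "ordinary", "d": "ordinary"}
)
print(ratio)  # {'connection': 0.25, 'key': 0.25, 'ordinary': 0.5}
```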

This application also provides another implementation, namely a computer-readable storage medium that stores computer-readable instructions for information propagation path analysis. The computer-readable instructions may be executed by at least one processor, so that the at least one processor executes the steps of the information propagation path analysis method described above.

Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform. Of course, they can also be implemented by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions to enable a terminal device (which can be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to execute the method described in each embodiment of the present application.

Obviously, the above-described embodiments are only some of the embodiments of this application, rather than all of them. The drawings show preferred embodiments of this application, but do not limit its patent scope. This application can be implemented in many different forms; these embodiments are provided so that the disclosure of this application will be understood more thoroughly and comprehensively. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing specific embodiments, or make equivalent replacements for some of the technical features. All equivalent structures made by using the contents of the description and drawings of this application, whether used directly or indirectly in other related technical fields, are likewise within the scope of patent protection of this application.

Claims

An information propagation path analysis method, comprising the following steps:

Acquiring dissemination record information of a network;

Dividing the network into at least one social network according to the dissemination record information, and determining candidate sampling nodes and initial dissemination nodes in each social network;

Calculating the node composition ratio of connection nodes, key nodes and ordinary nodes among the candidate sampling nodes through the node type identifiers of the candidate sampling nodes;

Determining a sampling strategy based on the calculated node composition ratio and a preset node association relationship, wherein the node association relationship includes connection nodes being associated with key nodes, and key nodes being associated with ordinary nodes;

Sampling the initial propagation nodes and the candidate sampling nodes according to the sampling strategy;

Visually presenting the collected initial propagation nodes and candidate sampling nodes to generate an information propagation path graph.
The information dissemination path analysis method according to claim 1, wherein the step of dividing the network into at least one social network according to the dissemination record information, and determining candidate sampling nodes and initial dissemination nodes in each social network specifically comprises:

Initializing the community label of each node to the node identifier of that node in the dissemination record information;

For each node in the network, determining the smallest community label among the community labels of the node and of its neighboring nodes;

Updating the community label of the node to the determined smallest community label, so as to iteratively update the community label of each node;

When the community label of each node no longer changes, dividing the network into at least one social network according to the community label of each node;

According to the dissemination time in the dissemination record information, determining the node with the earliest dissemination time in each social network as the initial dissemination node of that social network, and determining the nodes in each social network that do not have the earliest dissemination time as candidate sampling nodes.
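The label update recited in the steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: propagation records are treated as undirected edges when looking up neighboring nodes, node identifiers are mutually comparable values, each node's dissemination time is supplied in a dictionary, and the function names are hypothetical.

```python
def divide_into_communities(nodes, edges):
    """Propagate the smallest community label until no label changes."""
    # Each node starts with its own identifier as its community label.
    label = {n: n for n in nodes}
    neighbors = {n: set() for n in nodes}
    for src, dst in edges:  # assumption: neighbors are looked up in both directions
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    changed = True
    while changed:
        changed = False
        for n in nodes:
            smallest = min([label[n]] + [label[m] for m in neighbors[n]])
            if smallest != label[n]:
                label[n] = smallest
                changed = True

    # Nodes that ended up with the same label form one community.
    communities = {}
    for n, lab in label.items():
        communities.setdefault(lab, set()).add(n)
    return communities


def split_initial_and_candidates(community, dissemination_time):
    """Pick the earliest node as the initial node; the rest are candidates."""
    # Assumption: ties on the earliest time are broken by node identifier order.
    initial = min(sorted(community), key=lambda n: dissemination_time[n])
    return initial, set(community) - {initial}


communities = divide_into_communities([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)])
print(communities)  # {1: {1, 2, 3}, 4: {4, 5}}
print(split_initial_and_candidates({1, 2, 3}, {1: 10, 2: 5, 3: 7}))  # (2, {1, 3})
```

With this update rule every node in a connected group converges to the smallest identifier in that group, so the resulting communities coincide with the connected components of the propagation records.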
The information dissemination path analysis method according to claim 1, wherein, before the step of calculating the node composition ratio of connection nodes, key nodes and ordinary nodes among the candidate sampling nodes through the node type identifiers of the candidate sampling nodes, the method further comprises:

Determining the number of node propagations of each candidate sampling node according to the propagation record information;

Determining the node type of each candidate sampling node according to the number of node propagations;

A node type identifier corresponding to the respective node type is added to each candidate sampling node.
The information propagation path analysis method according to claim 1, wherein the step of determining the sampling strategy based on the calculated node composition ratio and the preset node association relationship specifically comprises:

Setting the sampling ratio of the initial propagation node in the first round of sampling according to a preset sampling ratio;

Based on the calculated node composition ratio and the preset node association relationship, determine the sampling ratio of the connection node, the key node and the common node in each round of sampling after the first round of sampling;

Determine the sampling ratio of nodes in each round of sampling as the sampling strategy.
The information propagation path analysis method according to claim 4, wherein the step of determining, based on the calculated node composition ratio and the preset node association relationship, the sampling ratio of the connection node, the key node and the ordinary node in each round of sampling after the first round of sampling specifically comprises:

Based on the calculated node composition ratio and the preset node association relationship, initialize the sampling ratio of the connection node, the key node, and the ordinary node in each round of sampling after the first round of sampling;

Comparing the composition ratio of the nodes with a preset ratio threshold;

When there is a node composition ratio smaller than the ratio threshold, the sampling ratio obtained by initialization is adjusted according to the preset adjustment value.
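One possible reading of the threshold adjustment recited above is sketched below. The claim does not state how the sampling ratios are initialized or in which direction they are adjusted, so this example assumes that each type's sampling ratio starts at its composition ratio and is raised by the adjustment value when that type is under-represented. The default values and the function name build_sampling_strategy are hypothetical.

```python
def build_sampling_strategy(composition, first_round_ratio=1.0,
                            ratio_threshold=0.1, adjustment=0.05):
    """Derive per-round sampling ratios from the node composition ratio.

    composition maps a node type to its share of the candidate sampling nodes.
    """
    later_rounds = {}
    for node_type, share in composition.items():
        # Assumption: initialize the sampling ratio to the composition ratio.
        ratio = share
        if share < ratio_threshold:
            # Assumption: raise the ratio of under-represented types, capped at 1.
            ratio = min(1.0, ratio + adjustment)
        later_rounds[node_type] = ratio
    return {"first_round": first_round_ratio, "later_rounds": later_rounds}


strategy = build_sampling_strategy(
    {"connection": 0.05, "key": 0.25, "ordinary": 0.7}
)
print(strategy["later_rounds"])
```

In this example only the connection nodes fall below the threshold, so only their sampling ratio is adjusted; the other two ratios keep their initialized values.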
The information propagation path analysis method according to claim 4, wherein the step of sampling the initial propagation node and the candidate sampling node according to the sampling strategy specifically comprises:

During the first round of sampling, sampling the initial propagation node according to the sampling strategy;

During each round of sampling after the first round, querying, according to the propagation record information, the candidate sampling nodes to which the nodes collected in the previous round of sampling propagated;

According to the sampling strategy, sampling the queried candidate sampling nodes.
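The round-by-round sampling recited above can be sketched as follows. This is a hedged illustration only: the strategy dictionary layout, the propagated_to lookup table, the rounding rule, the seeded random generator and the max_rounds cap are all assumptions introduced for the example.

```python
import math
import random


def sample_rounds(initial_nodes, propagated_to, node_type, strategy,
                  rng=None, max_rounds=10):
    """Sample initial nodes first, then follow propagation records round by round."""
    rng = rng or random.Random(0)  # seeded so the sketch is repeatable

    def take(pool, ratio):
        pool = sorted(pool)
        # Assumption: round the sample size up, never beyond the pool size.
        k = min(len(pool), math.ceil(len(pool) * ratio))
        return set(rng.sample(pool, k))

    # First round: sample the initial propagation nodes.
    collected = take(initial_nodes, strategy["first_round"])
    frontier = set(collected)

    for _ in range(max_rounds):
        # Query the nodes that the previously collected nodes propagated to.
        reached = set()
        for n in frontier:
            reached.update(propagated_to.get(n, ()))
        reached -= collected
        if not reached:
            break
        # Later rounds: sample each node type at its own ratio.
        frontier = set()
        for type_name, ratio in strategy["later_rounds"].items():
            pool = [n for n in reached if node_type.get(n) == type_name]
            frontier |= take(pool, ratio)
        if not frontier:
            break
        collected |= frontier
    return collected


strategy = {"first_round": 1.0,
            "later_rounds": {"connection": 1.0, "key": 1.0, "ordinary": 1.0}}
propagated_to = {"a": ["b", "c"], "b": ["d"]}
node_type = {"b": "key", "c": "ordinary", "d": "ordinary"}
print(sorted(sample_rounds({"a"}, propagated_to, node_type, strategy)))
# ['a', 'b', 'c', 'd']
```

With every ratio set to 1.0 the sketch collects every node reachable from the initial node; lower ratios stop following a branch once none of its nodes is sampled.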
The information dissemination path analysis method according to claim 1, wherein the step of visually presenting the collected initial dissemination nodes and candidate sampling nodes to generate an information dissemination path graph specifically comprises:

Add the collected initial propagation nodes and candidate sampling nodes to the initial path graph according to the propagation record information;

Setting the display modes of connection nodes, key nodes and common nodes in the initial propagation node and candidate sampling nodes in the initial path graph;

Connecting the initial propagation node, the connection nodes, the key nodes and the ordinary nodes whose display modes have been set, to obtain the information propagation path graph.
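As a rough sketch of the steps above, the collected nodes could be rendered as a directed graph in the Graphviz DOT language, with one display mode per node type. The shapes and colors, the "initial" type name, the (source, target) record layout and the function name to_dot are assumptions for illustration; the claim does not prescribe any particular display mode or output format.

```python
# Hypothetical display modes, one per node type.
DISPLAY_MODE = {
    "initial": "shape=doublecircle, color=red",
    "connection": "shape=box, color=orange",
    "key": "shape=ellipse, color=blue",
    "ordinary": "shape=point, color=gray",
}


def to_dot(collected, records, node_type):
    """Render the collected nodes and their propagation records as DOT text."""
    lines = ["digraph propagation {"]
    # Add each collected node with the display mode of its type.
    for n in sorted(collected):
        lines.append(f'  "{n}" [{DISPLAY_MODE[node_type[n]]}];')
    # Connect nodes only when both ends of a record were collected.
    for src, dst in records:
        if src in collected and dst in collected:
            lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


dot = to_dot(
    collected={"a", "b", "c"},
    records=[("a", "b"), ("b", "c"), ("a", "z")],
    node_type={"a": "initial", "b": "key", "c": "ordinary"},
)
print(dot)
```

The record from "a" to "z" is dropped because "z" was not collected, so the resulting graph shows only the sampled part of the propagation path.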
An information propagation path analysis device, including:

The information acquisition module is used to acquire the dissemination record information of the network;

A network division module, configured to divide the network into at least one social network according to the dissemination record information, and determine candidate sampling nodes and initial dissemination nodes in each social network;

A ratio calculation module, configured to calculate the node composition ratio of connecting nodes, key nodes, and ordinary nodes in the candidate sampling node through the node type identification of the candidate sampling node;

The strategy determination module is used to determine the sampling strategy based on the calculated node composition ratio and the preset node association relationship; the node association relationship includes connecting nodes associated with key nodes, and key nodes associated with ordinary nodes;

A node sampling module, configured to sample the initial propagation node and the candidate sampling node according to the sampling strategy;

The path generation module is used to visually present the collected initial propagation nodes and candidate sampling nodes to generate an information propagation path graph.
A computer device includes a memory and a processor, wherein computer readable instructions are stored in the memory, and the processor implements the following steps when executing the computer readable instructions:

Acquiring dissemination record information of a network;

Dividing the network into at least one social network according to the dissemination record information, and determining candidate sampling nodes and initial dissemination nodes in each social network;

Calculating the node composition ratio of connection nodes, key nodes and ordinary nodes among the candidate sampling nodes through the node type identifiers of the candidate sampling nodes;

Determining a sampling strategy based on the calculated node composition ratio and a preset node association relationship, wherein the node association relationship includes connection nodes being associated with key nodes, and key nodes being associated with ordinary nodes;

Sampling the initial propagation nodes and the candidate sampling nodes according to the sampling strategy;

Visually presenting the collected initial propagation nodes and candidate sampling nodes to generate an information propagation path graph.
The computer device according to claim 9, wherein the step of dividing the network into at least one social network according to the dissemination record information, and determining candidate sampling nodes and initial dissemination nodes in each social network specifically comprises:

Initializing the community label of each node to the node identifier of that node in the dissemination record information;

For each node in the network, determining the smallest community label among the community labels of the node and of its neighboring nodes;

Updating the community label of the node to the determined smallest community label, so as to iteratively update the community label of each node;

When the community label of each node no longer changes, dividing the network into at least one social network according to the community label of each node;

According to the dissemination time in the dissemination record information, determining the node with the earliest dissemination time in each social network as the initial dissemination node of that social network, and determining the nodes in each social network that do not have the earliest dissemination time as candidate sampling nodes.
The computer device according to claim 9, wherein, before the step of calculating the node composition ratio of connection nodes, key nodes and ordinary nodes among the candidate sampling nodes through the node type identifiers of the candidate sampling nodes, the processor also implements the following steps when executing the computer-readable instructions:

Determining the number of node propagations of each candidate sampling node according to the propagation record information;

Determining the node type of each candidate sampling node according to the number of node propagations;

A node type identifier corresponding to the respective node type is added to each candidate sampling node.
The computer device according to claim 9, wherein the step of determining the sampling strategy based on the calculated node composition ratio and the preset node association relationship specifically comprises:

Setting the sampling ratio of the initial propagation node in the first round of sampling according to a preset sampling ratio;

Based on the calculated node composition ratio and the preset node association relationship, determine the sampling ratio of the connection node, the key node and the common node in each round of sampling after the first round of sampling;

Determine the sampling ratio of nodes in each round of sampling as the sampling strategy.
The computer device according to claim 12, wherein the step of determining, based on the calculated node composition ratio and the preset node association relationship, the sampling ratio of the connection node, the key node and the ordinary node in each round of sampling after the first round of sampling specifically comprises:

Based on the calculated node composition ratio and the preset node association relationship, initialize the sampling ratio of the connection node, the key node, and the ordinary node in each round of sampling after the first round of sampling;

Comparing the composition ratio of the nodes with a preset ratio threshold;

When there is a node composition ratio smaller than the ratio threshold, the sampling ratio obtained by initialization is adjusted according to the preset adjustment value.
The computer device according to claim 12, wherein the step of sampling the initial propagation node and the candidate sampling node according to the sampling strategy specifically comprises:

During the first round of sampling, sampling the initial propagation node according to the sampling strategy;

During each round of sampling after the first round, querying, according to the propagation record information, the candidate sampling nodes to which the nodes collected in the previous round of sampling propagated;

According to the sampling strategy, sampling the queried candidate sampling nodes.
A computer-readable storage medium on which computer-readable instructions are stored; wherein, when the computer-readable instructions are executed by a processor, the following steps are implemented:

Acquiring dissemination record information of a network;

Dividing the network into at least one social network according to the dissemination record information, and determining candidate sampling nodes and initial dissemination nodes in each social network;

Calculating the node composition ratio of connection nodes, key nodes and ordinary nodes among the candidate sampling nodes through the node type identifiers of the candidate sampling nodes;

Determining a sampling strategy based on the calculated node composition ratio and a preset node association relationship, wherein the node association relationship includes connection nodes being associated with key nodes, and key nodes being associated with ordinary nodes;

Sampling the initial propagation nodes and the candidate sampling nodes according to the sampling strategy;

Visually presenting the collected initial propagation nodes and candidate sampling nodes to generate an information propagation path graph.
The computer-readable storage medium according to claim 15, wherein the step of dividing the network into at least one social network according to the dissemination record information, and determining candidate sampling nodes and initial dissemination nodes in each social network specifically comprises:

Initializing the community label of each node to the node identifier of that node in the dissemination record information;

For each node in the network, determining the smallest community label among the community labels of the node and of its neighboring nodes;

Updating the community label of the node to the determined smallest community label, so as to iteratively update the community label of each node;

When the community label of each node no longer changes, dividing the network into at least one social network according to the community label of each node;

According to the dissemination time in the dissemination record information, determining the node with the earliest dissemination time in each social network as the initial dissemination node of that social network, and determining the nodes in each social network that do not have the earliest dissemination time as candidate sampling nodes.
The computer-readable storage medium according to claim 15, wherein, before the step of calculating the node composition ratio of connection nodes, key nodes and ordinary nodes among the candidate sampling nodes through the node type identifiers of the candidate sampling nodes, the following steps are also implemented when the computer-readable instructions are executed by the processor:

Determining the number of node propagations of each candidate sampling node according to the propagation record information;

Determining the node type of each candidate sampling node according to the number of node propagations;

A node type identifier corresponding to the respective node type is added to each candidate sampling node.
The computer-readable storage medium according to claim 15, wherein the step of determining a sampling strategy based on the calculated node composition ratio and a preset node association relationship specifically comprises:

Setting the sampling ratio of the initial propagation node in the first round of sampling according to a preset sampling ratio;

Based on the calculated node composition ratio and the preset node association relationship, determine the sampling ratio of the connection node, the key node and the common node in each round of sampling after the first round of sampling;

Determine the sampling ratio of nodes in each round of sampling as the sampling strategy.
The computer-readable storage medium according to claim 18, wherein the step of determining, based on the calculated node composition ratio and the preset node association relationship, the sampling ratio of the connection node, the key node and the ordinary node in each round of sampling after the first round of sampling specifically comprises:

Based on the calculated node composition ratio and the preset node association relationship, initialize the sampling ratio of the connection node, the key node, and the ordinary node in each round of sampling after the first round of sampling;

Comparing the composition ratio of the nodes with a preset ratio threshold;

When there is a node composition ratio smaller than the ratio threshold, the sampling ratio obtained by initialization is adjusted according to the preset adjustment value.
The computer-readable storage medium according to claim 18, wherein the step of sampling the initial propagation node and the candidate sampling node according to the sampling strategy specifically comprises:

During the first round of sampling, sampling the initial propagation node according to the sampling strategy;

During each round of sampling after the first round, querying, according to the propagation record information, the candidate sampling nodes to which the nodes collected in the previous round of sampling propagated;

According to the sampling strategy, sampling the queried candidate sampling nodes.