WO2021258998A1 - Information propagation path analysis method and apparatus, and computer device and storage medium - Google Patents


Info

Publication number
WO2021258998A1
WO2021258998A1 PCT/CN2021/096857 CN2021096857W WO2021258998A1 WO 2021258998 A1 WO2021258998 A1 WO 2021258998A1 CN 2021096857 W CN2021096857 W CN 2021096857W WO 2021258998 A1 WO2021258998 A1 WO 2021258998A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
sampling
nodes
propagation
candidate
Application number
PCT/CN2021/096857
Other languages
French (fr)
Chinese (zh)
Inventor
曹合心
蔡健
Original Assignee
平安科技(深圳)有限公司
Application filed by 平安科技(深圳)有限公司

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Definitions

  • This application relates to big data, and in particular to an information propagation path analysis method, device, computer equipment and storage medium.
  • People or computers can form a network through complex connections. People or computers can be regarded as nodes in the network, and data and information can be transmitted in the network. With the development of Internet technology, it is often necessary to analyze how information spreads on the network. For example, in a marketing network, analyzing the path along which a product spreads allows product information to reach a wider range of the network at a lower cost. The spread of a virus during an epidemic, or of fraudulent links, can likewise be analyzed through the network.
  • KOL Key Opinion Leader
  • A KOL has more, and more accurate, information, and is accepted or trusted by the relevant groups.
  • A KOL can spread information to more nodes in the network and has a greater influence on those nodes.
  • The analysis of information dissemination paths in a network usually samples the KOL nodes and visually presents the sub-network that reflects the trend of information dissemination.
  • Traditional network propagation path analysis techniques usually sample only pre-defined KOLs, and do so manually.
  • The type of sampling node is therefore single, and the sampling nodes need to be adjusted manually for different scenarios; alternatively, all nodes are sampled randomly without differentiation, and when the proportion of KOLs among the nodes is relatively low it is easy to miss critical propagation paths. The accuracy of traditional network propagation path analysis techniques is therefore low.
  • the purpose of the embodiments of the present application is to propose an information propagation path analysis method, device, computer equipment, and storage medium, so as to solve the problem of low accuracy of information propagation path analysis.
  • an embodiment of the present application provides an information propagation path analysis method, which adopts the following technical solutions:
  • the sampling strategy is determined based on the calculated node composition ratio and the preset node association relationship; the node association relationship includes connecting nodes associated with key nodes, and key nodes associated with ordinary nodes;
  • an embodiment of the present application also provides an information propagation path analysis device, including:
  • the information acquisition module is used to acquire the dissemination record information of the network
  • a network division module configured to divide the network into at least one social network according to the dissemination record information, and determine candidate sampling nodes and initial dissemination nodes in each social network;
  • a ratio calculation module configured to calculate the node composition ratio of connecting nodes, key nodes, and ordinary nodes in the candidate sampling node through the node type identification of the candidate sampling node;
  • the strategy determination module is used to determine the sampling strategy based on the calculated node composition ratio and the preset node association relationship; the node association relationship includes connecting nodes associated with key nodes, and key nodes associated with ordinary nodes;
  • a node sampling module configured to sample the initial propagation node and the candidate sampling node according to the sampling strategy
  • the path generation module is used to visually present the collected initial propagation nodes and candidate sampling nodes to generate an information propagation path graph.
  • an embodiment of the present application further provides a computer device, including a memory and a processor.
  • the memory stores computer-readable instructions.
  • When the processor executes the computer-readable instructions, the following steps are implemented:
  • the sampling strategy is determined based on the calculated node composition ratio and the preset node association relationship; the node association relationship includes connecting nodes associated with key nodes, and key nodes associated with ordinary nodes;
  • embodiments of the present application also provide a computer-readable storage medium, which stores computer-readable instructions, and when the computer-readable instructions are executed by a processor, the following steps are implemented:
  • the sampling strategy is determined based on the calculated node composition ratio and the preset node association relationship; the node association relationship includes connecting nodes associated with key nodes, and key nodes associated with ordinary nodes;
  • The embodiments of this application mainly have the following beneficial effects: the network is first divided into at least one social network according to the dissemination record information, each social network disseminating information independently of the others, and the candidate sampling nodes and the initial propagation node in each social network are determined; then, according to the node type identification of the candidate sampling nodes, the node composition ratio of the three types of nodes (connecting nodes, key nodes and ordinary nodes) among the candidate sampling nodes is calculated.
  • sampling is performed according to the sampling strategy.
  • the sampling strategy is comprehensively determined by the composition ratio of the nodes and the preset node association relationship.
  • the node association relationship includes connecting nodes to key nodes and key nodes to ordinary nodes.
  • The information dissemination path graph is used to show the dissemination path of information in the network, and is composed of the collected candidate sampling nodes and initial dissemination nodes, which ensures the accuracy of the analysis of the information propagation path.
  • Figure 1 is an exemplary system architecture diagram to which the present application can be applied;
  • Fig. 2 is a flowchart of an embodiment of an information propagation path analysis method according to the present application
  • FIG. 3 is a schematic diagram of dividing a social network in an embodiment
  • FIG. 4 is a flowchart of a specific implementation of step 204 in FIG. 2;
  • FIG. 5 is a flowchart of a specific implementation manner of step 2042 in FIG. 4;
  • Fig. 6 is a schematic diagram of an information propagation path diagram generated in an embodiment
  • Fig. 7 is a schematic structural diagram of an embodiment of an information propagation path analysis device according to the present application.
  • Fig. 8 is a schematic structural diagram of an embodiment of a computer device according to the present application.
  • the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105.
  • the network 104 is used to provide a medium for communication links between the terminal devices 101, 102, 103 and the server 105.
  • the network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, and so on.
  • the user can use the terminal devices 101, 102, and 103 to interact with the server 105 through the network 104 to receive or send messages and so on.
  • Various communication client applications such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, and social platform software, can be installed on the terminal devices 101, 102, and 103.
  • The terminal devices 101, 102, 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smart phones, tablets, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers and desktop computers.
  • the server 105 may be a server that provides various services, for example, a background server that provides support for pages displayed on the terminal devices 101, 102, and 103.
  • the information propagation path analysis method provided by the embodiments of the present application is generally executed by a server, and accordingly, the information propagation path analysis device is generally set in the server.
  • terminal devices, networks, and servers in FIG. 1 are merely illustrative. According to implementation needs, there can be any number of terminal devices, networks, and servers.
  • the information propagation path analysis method includes the following steps:
  • Step 201 Obtain network propagation record information.
  • The electronic device on which the information propagation path analysis method runs (for example, the server shown in FIG. 1) can obtain the propagation record information of the network through a wired connection or a wireless connection.
  • The above wireless connection methods can include, but are not limited to, 3G/4G, WiFi, Bluetooth, WiMAX, Zigbee and UWB (ultra wideband) connections, and other wireless connection methods that are currently known or developed in the future.
  • the dissemination record information may be information that records the dissemination of a certain kind of information among nodes in the network.
  • the propagation record information may include node identification, node type identification, propagation relationship between nodes, and propagation time.
  • the dissemination record information may also include other attribute information of the node. For example, when a person is a node in the network, the dissemination record information may also include information such as the person's gender and age.
  • the propagation relationship can record whether information dissemination occurs between nodes, and the direction of propagation when information dissemination occurs.
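  • As an illustration, the fields listed above could be held in two small record types. This is only a minimal sketch: the class and field names (`NodeRecord`, `PropagationRecord`, `source`, `target`, `time`) and the use of a numeric timestamp are assumptions made for the example and do not come from the application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodeRecord:
    node_id: str                     # node identification
    node_type: Optional[str] = None  # node type identification, if already assigned


@dataclass(frozen=True)
class PropagationRecord:
    source: str  # node that propagates the information
    target: str  # node that receives it (source -> target gives the direction)
    time: float  # propagation time; a numeric timestamp is assumed here


records = [
    PropagationRecord("id0", "id1", 1.0),
    PropagationRecord("id1", "id2", 2.0),
]

# Two nodes have a propagation relationship if any record links them.
has_relation = any({r.source, r.target} == {"id0", "id1"} for r in records)
```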
  • the propagation record server stores a part of the propagation record information, such as the node identifier.
  • the dissemination record server monitors and records the information dissemination in the network, and obtains the dissemination record information of the information dissemination.
  • the dissemination record server aggregates dissemination record information and sends it to the server for performing information dissemination path analysis.
  • the server that performs information propagation path analysis and the propagation record server can be the same server or different servers.
  • Table 1 and Table 2 are the dissemination record information in an embodiment. Referring to Table 1, the customers in a marketing activity are the nodes of the network; the customer ID (identity document, i.e. identification number) serves as the node identifier, gender and age are attribute information of the node, and the node type identification is also included. Table 2 records the propagation relationship and propagation time between nodes.
  • Table 1

    Node ID (customer ID) | Gender | Age | Node type identification
    Id1                   | male   | 23  | Key node
    Id2                   | female | 25  | Normal node
  • the propagation record information may be stored in a database, and the server obtains the propagation record information from the database. It should be emphasized that, in order to further ensure the privacy and security of the above-mentioned propagation record information, the above-mentioned propagation record information may also be stored in a node of a blockchain.
  • the dissemination record information can be counted manually and then uploaded to the server.
  • the blockchain referred to in this application is a new application mode of computer technology such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • A blockchain is essentially a decentralized database: a series of data blocks linked to one another by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
  • Step 202 Divide the network into at least one social network according to the dissemination record information, and determine candidate sampling nodes and initial dissemination nodes in each social network.
  • the social network can be a sub-network of the network, and each social network disseminates information independently of each other.
  • the initial dissemination node can be the node that produces the earliest dissemination operation in the social network, and nodes other than the initial dissemination node in the social network are used as candidate sampling nodes.
  • The server divides the network into at least one social network based on the dissemination record information. Nodes in the same social network can be connected through dissemination relationships to form a closed sub-network; nodes of different social networks cannot be linked through dissemination relationships.
  • The server finds the node with the earliest propagation time in each social network and uses it as the initial propagation node of that social network; the nodes in each social network that are not the initial propagation node are used as candidate sampling nodes.
  • the initial dissemination node is not the origin of information dissemination in the entire network.
  • The origin node of information dissemination is the "producer" or "publisher" of the information, and the origin node propagates the information to the initial dissemination node.
  • Step 203 Calculate the node composition ratio of the connecting node, the key node and the ordinary node in the candidate sampling node through the node type identification of the candidate sampling node.
  • the node type identifier is used to characterize the propagation characteristics of the node
  • The candidate sampling nodes can be divided into connecting nodes, key nodes, and ordinary nodes according to the propagation characteristics of the nodes.
  • The node composition ratio can be the proportion of connecting nodes, key nodes and ordinary nodes in the candidate sampling nodes.
  • the server reads the node type identification of the candidate sampling node from the propagation record information, and the node type identification represents the propagation characteristics of the node when the information is propagated.
  • Candidate sampling nodes are divided into connecting nodes (connectors), key nodes (key opinion leaders, KOLs) and ordinary nodes (normal nodes) according to the node type identification.
  • key nodes are key opinion leaders in the network, which can cause widespread dissemination of information in the network. It can be considered that key nodes spread information to multiple ordinary nodes, and ordinary nodes can measure the dissemination ability of key nodes. Connecting nodes can be used as an intermediary for information dissemination, disseminating information to key nodes, thereby causing the widespread dissemination of information in the network.
  • The server counts the numbers of connecting nodes, key nodes and ordinary nodes among the candidate sampling nodes according to the node type identification, and adds the three counts to obtain the total number of candidate sampling nodes. It then calculates the proportions of connecting nodes, key nodes and ordinary nodes among the candidate sampling nodes, that is, the node composition ratio.
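  • The counting described above can be sketched as follows. The function name `node_composition_ratio` and the type labels `"connecting"`, `"key"` and `"ordinary"` are hypothetical names chosen for the example; the input is assumed to map each candidate sampling node to its node type identification.

```python
from collections import Counter


def node_composition_ratio(node_types):
    """Return the share of each node type among the candidate sampling nodes."""
    counts = Counter(node_types.values())
    total = sum(counts.values())  # total number of candidate sampling nodes
    if total == 0:
        return {}
    return {t: counts[t] / total for t in ("connecting", "key", "ordinary")}


candidate_types = {
    "id1": "key",
    "id2": "ordinary",
    "id3": "ordinary",
    "id4": "connecting",
}
ratio = node_composition_ratio(candidate_types)
```

  • With the four example nodes, one is connecting, one is key and two are ordinary, so the shares are 0.25, 0.25 and 0.5.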
  • Step 203 may further include: determining the number of node propagations of each candidate sampling node according to the propagation record information; determining the node type of each candidate sampling node from its number of node propagations; and adding to each candidate sampling node the node type identifier corresponding to its node type.
  • the number of node propagation may be the number of nodes to which candidate sampling nodes will propagate information.
  • the server counts the number of times each candidate sampling node performs information dissemination from the dissemination record information, so as to obtain the number of node disseminations of each candidate sampling node.
  • The server obtains a preset propagation quantity threshold, determines each candidate sampling node whose number of node propagations is greater than the threshold as a key node, and adds the node type identification of a key node to it; then, among the non-key nodes, it finds the nodes that propagate information to a key node, determines the found nodes as connecting nodes, and adds the node type identification of a connecting node; finally, each candidate sampling node that is neither a key node nor a connecting node is determined as an ordinary node, and the node type identification of an ordinary node is added.
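  • The three-stage classification can be sketched as below. This is an illustration under stated assumptions: the number of node propagations is taken to be the number of distinct nodes a candidate propagates to, and the function name `classify_nodes`, the `(source, target)` edge format and the type labels are hypothetical.

```python
from collections import defaultdict


def classify_nodes(edges, candidates, threshold):
    """edges: iterable of (source, target) propagation pairs."""
    targets = defaultdict(set)
    for src, dst in edges:
        targets[src].add(dst)
    # Key nodes: number of node propagations greater than the threshold.
    key = {n for n in candidates if len(targets[n]) > threshold}
    # Connecting nodes: non-key nodes that propagate to at least one key node.
    connecting = {n for n in candidates if n not in key and targets[n] & key}
    types = {}
    for n in candidates:
        if n in key:
            types[n] = "key"
        elif n in connecting:
            types[n] = "connecting"
        else:
            types[n] = "ordinary"  # neither key nor connecting
    return types


edges = [("c", "k"), ("k", "a"), ("k", "b"), ("k", "d"), ("a", "b")]
types = classify_nodes(edges, ["c", "k", "a", "b", "d"], threshold=2)
```

  • In the example, node `k` propagates to three nodes and becomes a key node, node `c` propagates to `k` and becomes a connecting node, and the remaining nodes are ordinary.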
  • the node type identification may also be added by the propagation record server.
  • The propagation record server counts the number of node propagations of each node before uploading the propagation record information, and adds a node type identifier to each node according to the propagation quantity threshold. After dividing the social networks, the server sets the node type identification of the initial propagation nodes as invalid, so that these nodes no longer participate in the calculation of the node composition ratio and the ratio is calculated only from the node type identification of the candidate sampling nodes; alternatively, after dividing the social networks, the server can add an initial-propagation-node identifier to the initial propagation nodes.
  • The number of node propagations of each candidate sampling node is obtained from the propagation record information. Because the number of node propagations reflects the ability of a node to propagate information, the node type of each candidate sampling node can be determined accurately, which ensures the accuracy of the node composition ratio calculation.
  • Step 204 Determine a sampling strategy based on the calculated node composition ratio and the preset node association relationship; the node association relationship includes connecting nodes associated with key nodes, and key nodes associated with ordinary nodes.
  • the sampling strategy is used to instruct the server to sample nodes.
  • the node association relationship may indicate the tendency or trend of information spread among different types of nodes.
  • the node association relationship includes but is not limited to: connecting nodes are associated with key nodes, and key nodes are associated with ordinary nodes.
  • the initial dissemination node is the starting point of information dissemination in the social network, and the server starts sampling from the initial dissemination node.
  • The initial propagation nodes can be collected in whole or in part.
  • The node association relationship includes: the initial propagation node is associated with the connecting node, the key node and the ordinary node; the connecting node is associated with the key node; the key node is associated with the ordinary node; the ordinary node is associated with the ordinary node.
  • A node of one type tends to spread information to the nodes of the other types with which it is associated.
  • The initial propagation node is collected during the first round of sampling; according to the node association relationship, connecting nodes, key nodes and ordinary nodes can be collected during the second round of sampling, key nodes and ordinary nodes during the third round, and ordinary nodes during the fourth round.
  • the server can use the relative ratio of the node composition ratio of the connection node, the key node and the ordinary node among the candidate sampling nodes as the sampling ratio of the connection node, the key node and the ordinary node.
  • the server regards the types of nodes collected and the corresponding sampling ratio in each round of sampling as the sampling strategy.
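  • One possible way to represent such a sampling strategy is a list with one entry per round, each entry mapping the node types collected in that round to a sampling ratio. The sketch below assumes that "relative ratio" means renormalizing the node composition ratio over the types collected in the round; this reading, the function name `build_strategy`, the `initial_ratio` parameter and the example composition are assumptions, not details confirmed by the application.

```python
# Node types collected per round, following the rounds described above.
ROUND_TYPES = [
    ("initial",),
    ("connecting", "key", "ordinary"),
    ("key", "ordinary"),
    ("ordinary",),
]


def build_strategy(composition, initial_ratio=1.0):
    """Return, for each round, the sampling ratio of every collected node type."""
    strategy = []
    for types in ROUND_TYPES:
        if types == ("initial",):
            strategy.append({"initial": initial_ratio})
            continue
        # Renormalize the composition over the types present in this round.
        total = sum(composition[t] for t in types)
        strategy.append(
            {t: (composition[t] / total if total else 0.0) for t in types}
        )
    return strategy


# Illustrative composition only; not taken from the application.
strategy = build_strategy({"connecting": 0.1, "key": 0.2, "ordinary": 0.7})
```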
  • Step 205 sampling the initial propagation node and candidate sampling nodes according to the sampling strategy.
  • the server randomly samples the initial propagation nodes and candidate sampling nodes in each social network according to the sampling strategy.
  • Step 206 Visually present the collected initial propagation nodes and candidate sampling nodes to generate an information propagation path graph.
  • the information dissemination path diagram is a diagram that shows the path and dissemination trend of information in the network through some nodes in the network.
  • the server creates a blank initial path graph, and marks the collected initial propagation nodes and candidate sampling nodes in the initial path graph.
  • The initial propagation node can be marked in the center of the initial path graph, and the connecting nodes, key nodes and ordinary nodes among the candidate sampling nodes can then be marked in order according to the propagation record information.
  • the server connects the initial propagation node with various candidate sampling nodes to obtain an information propagation path diagram.
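  • Before any drawing takes place, the data behind the path graph can be assembled as a node list and an edge list. The sketch below keeps only the propagation relationships whose two endpoints were both collected; the function name `build_path_graph` and the dictionary layout are hypothetical, and the rendering step itself is left out.

```python
def build_path_graph(edges, sampled):
    """Keep only propagation edges whose two endpoints were both collected."""
    sampled = set(sampled)
    kept = [(src, dst) for src, dst in edges if src in sampled and dst in sampled]
    return {"nodes": sorted(sampled), "edges": kept}


edges = [("id1", "id0"), ("id0", "id3"), ("id1", "id2")]
graph = build_path_graph(edges, sampled=["id1", "id0", "id3"])
```

  • Here node `id2` was not collected, so the edge from `id1` to `id2` is dropped from the graph data.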
  • the information dissemination path graph may also include the node identifiers of the initial dissemination node and candidate sampling nodes.
  • The network is first divided into at least one social network based on the dissemination record information, each social network disseminating information independently of the others, and the candidate sampling nodes and the initial dissemination node in each social network are determined; then, according to the node type identification of the candidate sampling nodes, the node composition ratio of the three types of nodes (connecting nodes, key nodes and ordinary nodes) among the candidate sampling nodes is calculated.
  • sampling strategy is comprehensively determined by the composition ratio of the nodes and the preset node association relationship.
  • the node association relationship includes connecting nodes to key nodes and key nodes to ordinary nodes.
  • The information dissemination path graph is used to show the dissemination path of information in the network, and is composed of the collected candidate sampling nodes and initial dissemination nodes, which ensures the accuracy of the analysis of the information propagation path.
  • The above step 202 may include: initializing the node identification of each node in the dissemination record information as the community label of that node; for each node in the network, determining the minimum community label among the community labels of the node and its neighboring nodes; updating the community label of the node to the determined minimum community label, so as to iteratively update the community label of each node; when the community label of each node no longer changes, dividing the network into at least one social network according to the community labels of the nodes; and, according to the propagation time in the dissemination record information, determining the node with the earliest propagation time in each social network as the initial dissemination node of that social network, and determining the nodes that do not have the earliest propagation time as candidate sampling nodes.
  • the node identifier may be the identifier of the node, and the node identifier may be a combination of letters, numbers, and special symbols.
  • the community label of the node can identify the social network to which the node belongs. When two nodes have a propagation relationship, the two nodes are adjacent to each other.
  • the propagation time can be the time when the propagation operation takes place.
  • the minimum community label may be the smallest community label among the community labels of a node and its neighboring nodes.
  • the propagation record information includes the node identifier of each node, and the node identifier of each node is different, and the server may initialize the node identifier of each node to the community label of each node.
  • The node belongs to the social network corresponding to its community label.
  • the server randomly generates the community label of each node, and the randomly generated community labels are different from each other.
  • the community label needs to be updated iteratively to merge the social network. For each node of the network, the community label corresponding to the node itself and its neighboring nodes is compared to determine the minimum community label of the node and its neighboring nodes.
  • When the node ID is a number, the node ID with the smallest value is selected as the minimum community label by numerical comparison; when the node ID is a string, the characters or strings are compared in lexicographic order.
  • The ASCII (American Standard Code for Information Interchange) code value is used as the criterion for character comparison, and the node identifier with the smallest ASCII code value is selected as the minimum community label.
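  • The two comparison modes can be illustrated with Python's built-in `min`, which compares strings character by character using code point values (these coincide with ASCII code values for ASCII characters). The label values are examples only.

```python
# String node IDs: compared character by character, in lexicographic order.
smallest_label = min(["id1", "id0", "id3"])  # "id0"

# Caveat: lexicographic order is not numeric order.
lexicographic_min = min(["id10", "id2"])  # "id10", because "1" < "2"

# Numeric node IDs: compared by value.
numeric_min = min([10, 2])  # 2
```

  • The caveat does not affect the division itself, since any consistent ordering of the labels leads to the same grouping of nodes; it only changes which label ends up naming a social network.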
  • In each round of iterative update, each node updates its own community label to the determined minimum community label. After each round is completed, the minimum community label is determined again from the community labels of the node itself and its neighboring nodes, and the next round of iterative update is performed.
  • the network is divided into at least one social network according to the community labels of the nodes.
  • Within one social network all nodes have the same community label, and nodes with different community labels are divided into different social networks.
  • The server finds the node with the earliest propagation time in each social network and determines it as the initial propagation node of the social network where it is located; the remaining nodes in that social network are set as candidate sampling nodes. Every social network has an initial propagation node.
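  • The iterative minimum-label update and the selection of the initial propagation node can be sketched in plain Python as follows. The function names, the undirected treatment of propagation relationships when finding neighbors, the `first_time` mapping (the earliest propagation time of each node) and the example edges and times are all assumptions made for illustration.

```python
def divide_social_networks(nodes, edges):
    """Give every node the smallest node ID reachable through propagation relations."""
    neighbors = {n: set() for n in nodes}
    for a, b in edges:  # adjacency ignores the propagation direction
        neighbors[a].add(b)
        neighbors[b].add(a)
    label = {n: n for n in nodes}  # initialize each label to the node's own ID
    changed = True
    while changed:
        # Synchronous round: every node reads the labels of the previous round.
        new_label = {
            n: min([label[n]] + [label[m] for m in neighbors[n]]) for n in nodes
        }
        changed = new_label != label
        label = new_label
    return label


def split_initial_and_candidates(label, first_time):
    """Return {community label: (initial node, sorted candidate sampling nodes)}."""
    groups = {}
    for node, community in label.items():
        groups.setdefault(community, []).append(node)
    result = {}
    for community, members in groups.items():
        initial = min(members, key=lambda n: first_time[n])
        result[community] = (initial, sorted(m for m in members if m != initial))
    return result


nodes = ["id0", "id1", "id2", "id3", "id4", "id5"]
edges = [("id0", "id1"), ("id0", "id3"), ("id1", "id2"), ("id4", "id5")]
labels = divide_social_networks(nodes, edges)

first_time = {"id0": 5, "id1": 1, "id2": 7, "id3": 6, "id4": 2, "id5": 3}
networks = split_initial_and_candidates(labels, first_time)
```

  • With these example edges, nodes `id0` to `id3` end with the label `id0` and nodes `id4` and `id5` end with the label `id4`, giving two social networks.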
  • Figure 3 is a schematic diagram of dividing a social network in an embodiment. For each node in the network on the left in Figure 3, set the node ID of the node to its own community label, then the community label of the node whose node ID is id0 is id0, and the community label of the node whose node ID is id1 is id1, and so on.
  • For each node, before each round of iterative update, the minimum community label is determined from the community labels of the node itself and its neighboring nodes. For node id0, among the community labels id0, id1 and id3, the string id0 has the smallest ASCII code value, so id0 is selected as the minimum community label; similarly, for nodes id1 and id3 the minimum community label is also id0; for node id2 the minimum community label is id1. In the first round of iterative update, the community labels of nodes id0, id1 and id3 are therefore updated to id0, and the community label of node id2 is updated to id1.
  • In the following rounds, the minimum community labels of nodes id0, id1 and id3 no longer change, so the community labels of these three nodes do not change either.
  • The community label of node id2 becomes id0, so the community labels of nodes id0, id1, id2 and id3 are all the same.
  • There is no information dissemination between nodes id0, id1, id2 and id3 and the other nodes in the network, such as node id4, so the community labels of nodes such as id4 cannot affect nodes id0, id1, id2 and id3.
  • The community labels of nodes id0, id1, id2 and id3 will no longer change, and they are all id0 when the iteration is completed.
  • the network can be divided into the three social networks on the right in Figure 3.
  • the network can be split into multiple social networks through the ConnectedComponent() function in the graphx module under the spark framework, and the input of the function is the propagation record information in Table 1 and Table 2.
  • Spark is a fast and universal computing engine designed for large-scale data processing;
  • GraphX is a component used for graphs and graph calculations in Spark;
  • ConnectedComponent() function is a connected component algorithm, which is used for the discovery of social networks in the network.
  • A community label is added to each node, and the minimum community label is determined from the community labels of the node and its neighboring nodes; the community label of the node is updated to the determined minimum community label, so that the community label of each node is updated iteratively. During the iterative update, the nodes in the same social network exchange labels, so their community labels tend to become the same. At the end of the iterative update the network can therefore be accurately divided into social networks according to the community labels, and the initial propagation nodes and candidate sampling nodes in each social network can be accurately located according to the propagation time, which ensures the accuracy of node sampling.
  • the foregoing step 204 may include:
  • Step 2041: Set the sampling ratio of the initial propagation node in the first round of sampling according to the preset sampling ratio.
  • the sampling ratio can be, for each round of sampling, the ratio of the number of sampled nodes of a given type to the total number of nodes sampled in that round.
  • the preset sampling ratio may be the preset ratio of the initial propagation nodes collected in the first round of sampling to all the initial propagation nodes.
  • the server starts sampling from the initial propagation nodes, and collects the initial propagation nodes in the first round of sampling.
  • the server can collect all initial propagation nodes, or read the preset sampling ratio and set the sampling ratio of the initial propagation node in the first round of sampling to the preset sampling ratio.
  • Step 2042 Based on the calculated node composition ratio and the preset node association relationship, determine the sampling ratio of the connecting node, the key node and the ordinary node in each round of sampling after the first round of sampling.
  • the server collects candidate sampling nodes, and the types of candidate sampling nodes collected in each round of sampling may be different.
  • the server determines the types of candidate sampling nodes involved in each round of sampling after the first round according to the node association relationship.
  • in the second round of sampling, the connection nodes, key nodes and ordinary nodes associated with the initial propagation node can be collected; in the third round of sampling, the key nodes associated with the connection nodes collected in the second round and the ordinary nodes associated with the key nodes collected in the second round can be collected; in the fourth round of sampling, the ordinary nodes associated with the key nodes collected in the third round can be collected.
  • the node composition ratio of connecting nodes is 5%
  • the node composition ratio of key nodes is 5%
  • the node composition ratio of ordinary nodes is 90%.
  • in the second round of sampling, the connection nodes, key nodes, and ordinary nodes are collected at a ratio of 1:1:18 (5%:5%:90%); in the third round of sampling, the key nodes and ordinary nodes are collected at a ratio of 1:18 (5%:90%).
  • Step 2043 Determine the determined sampling ratio of nodes in each round of sampling as a sampling strategy.
  • the server uses the types of nodes that need to be collected in the four rounds of sampling and the corresponding sampling ratio as a sampling strategy.
  • the sampling strategy instructs the server to collect initial propagation nodes and various candidate sampling nodes from various social networks.
  • the sampling ratio of the initial propagation node in the first round of sampling is determined according to the preset sampling ratio; the sampling ratios of the connection nodes, key nodes, and ordinary nodes among the candidate sampling nodes in each round after the first are then determined according to the node composition ratio and the node association relationship, which yields the sampling strategy; the determination of the sampling strategy integrates the node association relationship and the node composition ratio to ensure that the various kinds of nodes are sampled in a balanced manner.
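Steps 2041 to 2043 can be illustrated with a small Python sketch. It assumes the four-round scheme and the 5%:5%:90% composition from the example above; the names `ROUND_TYPES` and `build_sampling_strategy`, and the choice to normalise the ratios within each round, are illustrative assumptions rather than details given in the text.

```python
# Node types that can be reached in each round after the first, following the
# association chain connection -> key -> ordinary described above.
ROUND_TYPES = {
    2: ("connection", "key", "ordinary"),
    3: ("key", "ordinary"),
    4: ("ordinary",),
}


def build_sampling_strategy(composition, initial_ratio=1.0):
    """composition maps node type to its share in percent, e.g. {"key": 5}."""
    # Round 1 only samples initial propagation nodes at the preset ratio.
    strategy = {1: {"initial": initial_ratio}}
    for round_no, types in ROUND_TYPES.items():
        # Relative composition ratio of the types collected in this round.
        total = sum(composition[t] for t in types)
        strategy[round_no] = {t: composition[t] / total for t in types}
    return strategy


strategy = build_sampling_strategy({"connection": 5, "key": 5, "ordinary": 90})
# strategy[2] keeps the 1:1:18 relation, strategy[3] the 1:18 relation.
```

Storing one ratio table per round keeps the strategy a plain lookup structure, so the sampling step only needs the round number and the node type.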
  • the foregoing step 2042 may include:
  • Step 20421 Based on the calculated node composition ratio and the preset node association relationship, initialize the sampling ratio of the connection node, the key node and the ordinary node in each round of sampling after the first round of sampling.
  • the server determines the types of candidate sampling nodes that need to be collected in each round of sampling after the first round according to the preset node association relationship, and initializes the sampling ratio for each of those rounds.
  • the relative node composition ratios of the connection nodes, key nodes and ordinary nodes among the candidate sampling nodes are used directly as the sampling ratios of the connection nodes, key nodes and ordinary nodes.
  • Step 20422 Compare the node composition ratio with a preset ratio threshold.
  • the ratio threshold may be a preset node composition ratio threshold, which is used to determine whether it is necessary to adjust the sampling ratio of connected nodes, key nodes, and ordinary nodes.
  • the server obtains a preset ratio threshold, and compares the node composition ratios of the connected nodes, key nodes, and ordinary nodes with the ratio thresholds. When there is a node composition ratio that is less than the ratio threshold, it indicates that a certain type of candidate sampling node accounts for a small proportion. In random sampling, some important candidate sampling nodes may be missed, thereby missing the relevant propagation path.
  • Step 20423 When there is a node composition ratio that is less than the ratio threshold, adjust the sampling ratio obtained by initialization according to the preset adjustment value.
  • the preset adjustment value may be a preset sampling ratio adjustment value, and the sampling ratio may be adjusted through addition and subtraction.
  • the server needs to adjust the sampling ratio obtained by initialization.
  • the server obtains the preset adjustment value, and adjusts the sampling ratio obtained by initialization according to the preset adjustment value to obtain the final sampling ratio.
  • the ratio threshold is 3%, and the node composition ratio of key nodes is less than the ratio threshold, indicating that the number of key nodes is small.
  • the node composition ratio is compared with a preset ratio threshold; when a node composition ratio is less than the ratio threshold, the sampling ratio obtained by initialization is adjusted according to the preset adjustment value so that nodes of that type are collected as far as possible, which improves the balance of sampling.
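The threshold check of steps 20421 to 20423 might look like the following Python sketch. The text only says that the ratio is adjusted by addition and subtraction of a preset value, so the concrete rule used here (add the adjustment to the under-represented type and subtract it from the most common type) is an assumption, as are the function name `adjust_sampling_ratio` and the sample numbers other than the 3% threshold.

```python
def adjust_sampling_ratio(initial_ratio, composition, threshold, adjustment):
    """All arguments are in percentage points; returns a new ratio dict."""
    adjusted = dict(initial_ratio)
    donor = max(composition, key=composition.get)  # most common node type
    for node_type, share in composition.items():
        if node_type != donor and share < threshold:
            adjusted[node_type] += adjustment  # boost the rare type
            adjusted[donor] -= adjustment      # keep the total unchanged
    return adjusted


# Key nodes make up 2% of the candidates, below the 3% threshold.
composition = {"connection": 5, "key": 2, "ordinary": 93}
adjusted = adjust_sampling_ratio(composition, composition, threshold=3, adjustment=3)
```

With these hypothetical numbers the key-node share rises from 2 to 5 percentage points and the ordinary-node share drops from 93 to 90, so the total stays at 100.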
  • step 205 may include: in the first round of sampling, sampling the initial propagation node according to the sampling strategy; in each round of sampling after the first, querying, according to the propagation record information, the candidate sampling nodes to which the nodes collected in the previous round propagated; and sampling the queried candidate sampling nodes according to the sampling strategy.
  • the server needs to perform multiple rounds of sampling, and in the first round of sampling, the initial propagation node is collected according to the sampling strategy.
  • after querying the candidate sampling nodes, the server counts the queried candidate sampling nodes; then, according to the sampling ratio in the sampling strategy, it calculates the number of each type of candidate sampling node to sample in that round, and samples the queried candidate sampling nodes according to the calculated numbers.
  • when performing the second round of sampling, the server first queries the candidate sampling nodes associated with the collected initial propagation nodes and counts them to obtain the number of candidate nodes available to the second round of sampling. According to the sampling ratio of the second round recorded in the sampling strategy, it calculates the number of connection nodes, key nodes and ordinary nodes to sample in the second round, and then randomly collects connection nodes, key nodes, and ordinary nodes according to those numbers.
  • the key nodes and common nodes are randomly collected according to the sampling strategy.
  • the initial propagation node is first sampled according to the sampling strategy; in each round of sampling after the first, the candidate sampling nodes propagated to by the nodes collected in the previous round are queried according to the propagation record information, and the queried candidate sampling nodes are then sampled according to the sampling strategy, which ensures that the collected nodes are all related to the nodes collected in the previous round and that the information propagation path remains continuous.
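The round-by-round sampling of step 205 can be sketched as follows. This is a simplified illustration, not the patented procedure: the record layout `(sender, receiver)`, the function name `sample_rounds`, and the toy data are hypothetical, at least one initial propagation node is assumed, and each round simply filters the nodes reached from the previous round by type instead of tracking which type of node reached them.

```python
import random


def sample_rounds(records, node_type, initial_nodes, strategy, seed=0):
    """Return a list with the nodes collected in round 1, round 2, ..."""
    rng = random.Random(seed)
    targets = {}  # sender -> set of nodes it propagated to
    for sender, receiver in records:
        targets.setdefault(sender, set()).add(receiver)

    # Round 1: sample the initial propagation nodes.
    quota = max(1, round(len(initial_nodes) * strategy[1]["initial"]))
    rounds = [rng.sample(sorted(initial_nodes), quota)]
    seen = set(rounds[0])

    for round_no in sorted(r for r in strategy if r > 1):
        # Query the candidates reached from the nodes of the previous round.
        reached = set()
        for node in rounds[-1]:
            reached |= targets.get(node, set())
        reached -= seen

        picked = []
        for wanted_type, ratio in strategy[round_no].items():
            pool = sorted(n for n in reached if node_type[n] == wanted_type)
            # Sampling number = queried count x sampling ratio, capped by pool.
            quota = min(len(pool), round(len(reached) * ratio))
            picked += rng.sample(pool, quota)
        rounds.append(picked)
        seen |= set(picked)
    return rounds


records = [("a", "c1"), ("a", "k1"), ("a", "o1"), ("a", "o2"),
           ("c1", "k2"), ("k1", "o3"), ("k2", "o4")]
node_type = {"c1": "connection", "k1": "key", "k2": "key",
             "o1": "ordinary", "o2": "ordinary", "o3": "ordinary", "o4": "ordinary"}
strategy = {1: {"initial": 1.0},
            2: {"connection": 0.25, "key": 0.25, "ordinary": 0.5},
            3: {"key": 0.5, "ordinary": 0.5},
            4: {"ordinary": 1.0}}
rounds = sample_rounds(records, node_type, ["a"], strategy)
```

Because every round starts from the nodes collected in the round before it, each sampled node is connected to an earlier sampled node, which is the continuity property the text relies on.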
  • step 206 may include: adding the collected initial propagation nodes and candidate sampling nodes to the initial path graph according to the propagation record information; setting, in the initial path graph, the display mode of the initial propagation nodes and of the connection nodes, key nodes and ordinary nodes among the candidate sampling nodes; and connecting the initial propagation nodes, connection nodes, key nodes and ordinary nodes whose display modes have been set to obtain the information propagation path graph.
  • the server creates a blank initial path graph, and adds the collected initial propagation nodes and candidate sampling nodes to the initial path graph.
  • the server may arrange the initial propagation nodes and candidate sampling nodes according to the propagation record information, so that nodes with direct propagation relationships have a closer distribution in the initial path graph.
  • Display methods include color display and shape display.
  • the initial propagation node, connection node, key node and common node can be set to different colors.
  • the initial propagation node can be represented by red dots
  • the connecting nodes can be represented by blue dots.
  • in shape display, the initial propagation node, connection node, key node and common node can be set to different shapes.
  • the initial propagation node can be represented by a circle
  • the connecting node can be represented by a triangle.
  • the server connects the initial propagation nodes, connection nodes, key nodes and common nodes that have been set up in the initial path graph to obtain the information propagation path graph.
  • the server can store the information transmission path diagram in the database, or send the information transmission path diagram to a designated terminal for display, or upload the information transmission path diagram to the blockchain for storage.
  • Gephi is a JVM-based complex network analysis software, mainly used for interactive visualization and exploration of various networks and complex systems, including dynamic and hierarchical graphs.
  • the server inputs the collected initial propagation nodes and candidate sampling nodes, together with the propagation record information between them, into Gephi, and Gephi generates the information propagation path graph.
  • the collected initial propagation nodes and candidate sampling nodes are added to the initial path graph, and the initial propagation nodes and the connection nodes, key nodes, and ordinary nodes among the candidate sampling nodes are each given a different display mode so that they can be told apart, which ensures that the generated information propagation path graph shows the information propagation path more accurately and clearly.
  • Fig. 6 is an information propagation path diagram generated in an embodiment. Referring to Fig. 6, the center of the figure is the origin node of the information publisher; the solid circles are initial propagation nodes; the open circles are connection nodes; the solid squares are key nodes; and the open squares are ordinary nodes. Fig. 6 shows the trend and path of the spread of certain information in the network through the collected nodes.
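The display-mode assignment of step 206 can be illustrated by emitting a graph description in which each node type gets its own shape. The sketch below writes Graphviz DOT text instead of driving Gephi, purely as a stand-in for illustration; the `STYLE` table mirrors the legend of Fig. 6, while the function name `to_dot` and the sample nodes are hypothetical.

```python
# Display mode per node type, following the legend of Fig. 6.
STYLE = {
    "initial": "shape=circle, style=filled",   # solid circle
    "connection": "shape=circle",              # open circle
    "key": "shape=box, style=filled",          # solid square
    "ordinary": "shape=box",                   # open square
}


def to_dot(collected, node_type, records):
    """Return DOT text for the collected nodes and the records between them."""
    lines = ["digraph propagation {"]
    for node in sorted(collected):
        lines.append(f'  "{node}" [{STYLE[node_type[node]]}];')
    for sender, receiver in records:
        # Only connect nodes that were actually sampled.
        if sender in collected and receiver in collected:
            lines.append(f'  "{sender}" -> "{receiver}";')
    lines.append("}")
    return "\n".join(lines)


dot = to_dot(
    collected={"a", "k1"},
    node_type={"a": "initial", "k1": "key", "o1": "ordinary"},
    records=[("a", "k1"), ("a", "o1")],
)
```

The resulting text can be rendered by any DOT-capable tool; records that point at nodes outside the sample are dropped so that the drawn graph contains only collected nodes.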
  • the information propagation path analysis method in this application can be applied to the field of big data, and can process massive structured data; in addition, this application also relates to fraud detection in financial technology.
  • the aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (Read-Only Memory, ROM), or may be a random access memory (Random Access Memory, RAM), etc.
  • this application provides an embodiment of an information propagation path analysis device.
  • the device embodiment corresponds to the method embodiment shown in FIG. Specifically, it can be applied to various electronic devices.
  • the information propagation path analysis device 300 in this embodiment includes: an information acquisition module 301, a network division module 302, a ratio calculation module 303, a strategy determination module 304, a node sampling module 305, and a path generation module 306, wherein:
  • the information acquisition module 301 is used to acquire network propagation record information. It should be emphasized that, in order to further ensure the privacy and security of the above-mentioned propagation record information, the above-mentioned propagation record information may also be stored in a node of a blockchain.
  • the network dividing module 302 is configured to divide the network into at least one social network according to the dissemination record information, and determine candidate sampling nodes and initial dissemination nodes in each social network.
  • the ratio calculation module 303 is used to calculate the node composition ratio of the connection node, the key node and the ordinary node in the candidate sampling node through the node type identification of the candidate sampling node.
  • the strategy determination module 304 is configured to determine a sampling strategy based on the calculated node composition ratio and the preset node association relationship; the node association relationship includes connecting nodes associated with key nodes, and key nodes associated with ordinary nodes.
  • the node sampling module 305 is configured to sample the initial propagation node and candidate sampling nodes according to the sampling strategy.
  • the path generation module 306 is used to visually present the collected initial propagation nodes and candidate sampling nodes to generate an information propagation path graph.
  • the network is first divided into at least one social network based on the propagation record information, the social networks disseminating information independently of each other, and the candidate sampling nodes and the initial propagation nodes in each social network are determined; the node composition ratio of the connection nodes, key nodes and ordinary nodes among the candidate sampling nodes is calculated according to the node type identifiers of the candidate sampling nodes.
  • the sampling strategy is determined jointly from the node composition ratio and the preset node association relationship.
  • the node association relationship includes connection nodes being associated with key nodes and key nodes being associated with ordinary nodes.
  • the information propagation path graph shows the propagation path of information in the network and is composed of the collected candidate sampling nodes and initial propagation nodes, which ensures the accuracy of the analysis of the information propagation path.
  • the aforementioned network division module 302 includes: a label adding submodule, a minimum determination submodule, a label update submodule, a network division submodule, and a node determination submodule, wherein:
  • the label adding submodule is used to initialize the node identification of each node in the propagation record information to the community label of each node.
  • the minimum determination sub-module is used to determine the minimum community label from the community labels corresponding to the node and the neighboring nodes of the node for each node in the network.
  • the label update submodule is used to update the community label of the node to the determined minimum community label to iteratively update the community label of each node.
  • the network division sub-module is used to divide the network into at least one social network according to the community label of each node when the community label of each node no longer changes.
  • the node determination sub-module is used to determine, according to the propagation time in the propagation record information, the node with the earliest propagation time in each social network as the initial propagation node of that social network, and to determine the nodes in each social network that do not have the earliest propagation time as candidate sampling nodes.
  • a community label is added to each node, the minimum community label is determined from the community labels of the node and its neighboring nodes, and the community label of the node is updated to that minimum, so that the community label of each node is updated iteratively; during the iterative update, nodes in the same social network exchange labels with each other so that their community labels converge to the same value; when the iterative update ends, the network can therefore be accurately divided into social networks according to the community labels, and the initial propagation nodes and candidate sampling nodes in each social network can be accurately located according to the propagation time, thereby ensuring the accuracy of node sampling.
  • the above-mentioned information propagation path analysis device 300 further includes: a quantity determination module, a type determination module, and an identifier adding module, wherein:
  • the quantity determining module is used to determine the node propagation quantity of each candidate sampling node according to the propagation record information.
  • the type determination module is used to determine the node type of each candidate sampling node according to its node propagation quantity.
  • the identifier adding module is used to add a node type identifier corresponding to the respective node type to each candidate sampling node.
  • the node propagation quantity of each candidate sampling node is obtained from the propagation record information.
  • the node propagation quantity reflects the ability of the node to propagate information.
  • from the node propagation quantity, the node type of each candidate sampling node can be accurately determined, which ensures that the node composition ratio is calculated accurately.
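How node propagation quantities could feed the type determination and the composition ratio is sketched below in Python. This section does not state the cut-off values that separate connection, key and ordinary nodes, so the `rules` list and its numbers are purely hypothetical, as are the function names and the `(sender, receiver)` record layout.

```python
from collections import Counter


def count_propagations(records):
    """Return {node: number of records in which the node is the sender}."""
    counts = Counter()
    for sender, receiver in records:
        counts[sender] += 1
        counts[receiver] += 0  # make sure pure receivers are listed too
    return counts


def classify(counts, rules, default="ordinary"):
    """rules: (node_type, minimum_count) pairs, checked in the given order."""
    return {
        node: next((t for t, minimum in rules if n >= minimum), default)
        for node, n in counts.items()
    }


def composition_ratio(types):
    """Return the share of each node type among the classified nodes."""
    totals = Counter(types.values())
    return {node_type: count / len(types) for node_type, count in totals.items()}


records = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "e")]
counts = count_propagations(records)
# Hypothetical cut-offs: 3 or more propagations -> key, at least 1 -> connection.
types = classify(counts, rules=[("key", 3), ("connection", 1)])
ratios = composition_ratio(types)
```

Passing the cut-offs in as data keeps the classification rule separate from the counting, so a different definition of the node types only changes the `rules` argument.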
  • the above-mentioned strategy determination module 304 includes: a first-round setting sub-module, a ratio determination sub-module, and a strategy determination sub-module, wherein:
  • the first-round setting sub-module is used to set the sampling ratio of the initial propagation node in the first round of sampling according to the preset sampling ratio.
  • the ratio determination sub-module is used to determine the sampling ratios of connection nodes, key nodes and ordinary nodes in each round of sampling after the first round based on the calculated node composition ratio and the preset node association relationship.
  • the strategy determination sub-module is used to determine the determined sampling ratio of the nodes in each round of sampling as the sampling strategy.
  • the sampling ratio of the initial propagation node in the first round of sampling is determined according to the preset sampling ratio; the sampling ratios of the connection nodes, key nodes, and ordinary nodes among the candidate sampling nodes in each round after the first are then determined according to the node composition ratio and the node association relationship, which yields the sampling strategy; the determination of the sampling strategy integrates the node association relationship and the node composition ratio to ensure that the various kinds of nodes are sampled in a balanced manner.
  • the aforementioned ratio determination sub-module includes: a ratio initial unit, a ratio comparison unit, and a ratio adjustment unit, wherein:
  • the ratio initial unit is used to initialize the sampling ratio of the connection node, the key node and the common node in each round of sampling after the first round of sampling based on the calculated node composition ratio and the preset node association relationship.
  • the ratio comparison unit is used to compare the node composition ratio with a preset ratio threshold.
  • the ratio adjustment unit is configured to adjust the sampling ratio obtained by initialization according to the preset adjustment value when there is a node composition ratio smaller than the ratio threshold.
  • the node composition ratio is compared with a preset ratio threshold; when a node composition ratio is less than the ratio threshold, the sampling ratio obtained by initialization is adjusted according to the preset adjustment value so that nodes of that type are collected as far as possible, which improves the balance of sampling.
  • the aforementioned node sampling module 305 includes: an initial sampling submodule, a node query submodule, and a node sampling submodule, where:
  • the initial sampling sub-module is used to sample the initial propagation node according to the sampling strategy in the first round of sampling.
  • the node query sub-module is used to query, according to the propagation record information, the candidate sampling nodes to which the nodes collected in the previous round of sampling propagated, in each round of sampling after the first.
  • the node sampling sub-module is used to sample the queried candidate sampling nodes according to the sampling strategy.
  • the initial propagation node is first sampled according to the sampling strategy; in each round of sampling after the first, the candidate sampling nodes propagated to by the nodes collected in the previous round are queried according to the propagation record information, and the queried candidate sampling nodes are then sampled according to the sampling strategy, which ensures that the collected nodes are all related to the nodes collected in the previous round and that the information propagation path remains continuous.
  • the aforementioned path generation module 306 includes: a node adding submodule, a display setting submodule, and a node connection submodule, where:
  • the node adding sub-module is used to add the collected initial propagation nodes and candidate sampling nodes to the initial path graph according to the propagation record information.
  • the display setting sub-module is used to set, in the initial path graph, the display mode of the initial propagation nodes and of the connection nodes, key nodes and ordinary nodes among the candidate sampling nodes.
  • the node connection sub-module is used to connect the set up initial propagation node, connecting node, key node and common node to obtain the information propagation path graph.
  • the collected initial propagation nodes and candidate sampling nodes are added to the initial path graph, and the initial propagation nodes and the connection nodes, key nodes, and ordinary nodes among the candidate sampling nodes are each given a different display mode so that they can be told apart, which ensures that the generated information propagation path graph shows the information propagation path more accurately and clearly.
  • FIG. 8 is a block diagram of the basic structure of the computer device in this embodiment.
  • the computer device 4 includes a memory 41, a processor 42, and a network interface 43 that are mutually communicatively connected via a system bus. It should be pointed out that the figure only shows the computer device 4 with the components 41-43, but it should be understood that it is not required to implement all the shown components, and more or fewer components may be implemented instead. Among them, those skilled in the art can understand that the computer device here is a device that can automatically perform numerical calculation and/or information processing according to pre-set or stored instructions.
  • Its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, etc.
  • the computer device may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • the computer device can interact with the user through a keyboard, a mouse, a remote control, a touch panel, or a voice control device.
  • the memory 41 includes at least one type of computer-readable storage medium.
  • the computer-readable storage medium may be non-volatile or volatile.
  • the computer-readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc.
  • the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or a memory of the computer device 4.
  • the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash card (Flash Card) equipped on the computer device 4.
  • the memory 41 may also include both the internal storage unit of the computer device 4 and its external storage device.
  • the memory 41 is generally used to store an operating system and various application software installed in the computer device 4, such as computer-readable instructions for an information propagation path analysis method.
  • the memory 41 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 42 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips.
  • the processor 42 is generally used to control the overall operation of the computer device 4.
  • the processor 42 is configured to run computer-readable instructions or process data stored in the memory 41, for example, run the computer-readable instructions of the information propagation path analysis method.
  • the network interface 43 may include a wireless network interface or a wired network interface, and the network interface 43 is generally used to establish a communication connection between the computer device 4 and other electronic devices.
  • the computer device provided in this embodiment can execute the steps of the information propagation path analysis method described above.
  • the steps of the information propagation path analysis method may be the steps in the information propagation path analysis method of each of the foregoing embodiments.
  • the network is first divided into at least one social network based on the propagation record information, the social networks disseminating information independently of each other, and the candidate sampling nodes and the initial propagation nodes in each social network are determined; the node composition ratio of the connection nodes, key nodes and ordinary nodes among the candidate sampling nodes is calculated according to the node type identifiers of the candidate sampling nodes.
  • the sampling strategy is determined jointly from the node composition ratio and the preset node association relationship.
  • the node association relationship includes connection nodes being associated with key nodes and key nodes being associated with ordinary nodes.
  • the information propagation path graph shows the propagation path of information in the network and is composed of the collected candidate sampling nodes and initial propagation nodes, which ensures the accuracy of the analysis of the information propagation path.
  • This application also provides another implementation manner, namely a computer-readable storage medium that stores computer-readable instructions for information propagation path analysis.
  • the computer-readable instructions may be executed by at least one processor, so that the at least one processor executes the steps of the information propagation path analysis method described above.
  • the network is first divided into at least one social network based on the propagation record information, the social networks disseminating information independently of each other, and the candidate sampling nodes and the initial propagation nodes in each social network are determined; the node composition ratio of the connection nodes, key nodes and ordinary nodes among the candidate sampling nodes is calculated according to the node type identifiers of the candidate sampling nodes.
  • the sampling strategy is determined jointly from the node composition ratio and the preset node association relationship.
  • the node association relationship includes connection nodes being associated with key nodes and key nodes being associated with ordinary nodes.
  • the information propagation path graph shows the propagation path of information in the network and is composed of the collected candidate sampling nodes and initial propagation nodes, which ensures the accuracy of the analysis of the information propagation path.
  • the method of the above embodiments can be implemented by means of software plus the necessary general hardware platform. Of course, it can also be implemented by hardware, but in many cases the former is better.
  • the technical solution of this application, in essence or in the part that contributes to the existing technology, can be embodied in the form of a software product; the computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions to enable a terminal device (which can be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to execute the method described in each embodiment of the present application.

Abstract

An information propagation path analysis method, comprising: acquiring propagation record information of a network (201); dividing the network into at least one community network according to the propagation record information, and determining candidate sampling nodes and an initial propagation node in each community network (202); calculating the node composition ratio of connection nodes, key nodes and ordinary nodes in the candidate sampling nodes by means of node type identifiers of the candidate sampling nodes (203); determining a sampling policy on the basis of the calculated node composition ratio and a preset node association relationship, wherein the node association relationship comprises the connection nodes being associated with the key nodes and the key nodes being associated with the ordinary nodes (204); sampling the initial propagation nodes and the candidate sampling nodes according to the sampling policy (205); and performing visual presentation on collected initial propagation nodes and candidate sampling nodes to generate an information propagation path map (206). In addition, the propagation record information can be stored in a blockchain. By means of the method, the accuracy of information propagation path analysis is improved.

Description

信息传播路径分析方法、装置、计算机设备及存储介质Information propagation path analysis method, device, computer equipment and storage medium
本申请要求于2020年06月24日提交中国专利局、申请号为202010592379.5,发明名称为“信息传播路径分析方法、装置、计算机设备及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。This application claims the priority of a Chinese patent application filed with the Chinese Patent Office on June 24, 2020, the application number is 202010592379.5, and the invention title is "information propagation path analysis method, device, computer equipment and storage medium". The entire content of the application is approved The reference is incorporated in this application.
Technical field
This application relates to big data, and in particular to an information propagation path analysis method, apparatus, computer device and storage medium.
Background
In big data, people or computers can form a network through complex connections. People or computers can be regarded as nodes in the network, and data and information can be transmitted over the network. With the development of Internet technology, it is often necessary to analyze how information propagates in a network. For example, in a marketing network, analyzing the propagation path of a product makes it possible to spread product information more widely at lower cost. Virus transmission during an epidemic, fraud chains and the like can also be analyzed as networks.
One important kind of node in a network is the KOL (Key Opinion Leader). A KOL has more, and more accurate, information and is accepted or trusted by more of the relevant groups; a KOL can propagate information to more nodes in the network and has a greater influence on those nodes. Analysis of information propagation paths in a network usually samples KOL nodes and visually presents a sub-network that reflects the propagation trend. However, the inventors realized that traditional network propagation path analysis techniques usually take one of two approaches. Either predefined KOLs are sampled manually, in which case only one type of node is sampled and the sampled nodes must be adjusted by hand for different scenarios; or all nodes are sampled at random without distinction, in which case critical propagation paths are easily missed when KOLs make up a small share of the nodes. The accuracy of traditional network propagation path analysis is therefore low.
Summary of the invention
The purpose of the embodiments of the present application is to propose an information propagation path analysis method, apparatus, computer device and storage medium, so as to solve the problem of low accuracy in information propagation path analysis.
In order to solve the above technical problem, an embodiment of the present application provides an information propagation path analysis method, which adopts the following technical solution:
acquiring propagation record information of a network;
dividing the network into at least one community network according to the propagation record information, and determining candidate sampling nodes and an initial propagation node in each community network;
calculating, by means of node type identifiers of the candidate sampling nodes, the node composition ratio of connection nodes, key nodes and ordinary nodes among the candidate sampling nodes;
determining a sampling strategy based on the calculated node composition ratio and a preset node association relationship, wherein the node association relationship includes connection nodes being associated with key nodes and key nodes being associated with ordinary nodes;
sampling the initial propagation nodes and the candidate sampling nodes according to the sampling strategy;
visually presenting the collected initial propagation nodes and candidate sampling nodes to generate an information propagation path graph.
In order to solve the above technical problem, an embodiment of the present application further provides an information propagation path analysis apparatus, including:
an information acquisition module, configured to acquire propagation record information of a network;
a network division module, configured to divide the network into at least one community network according to the propagation record information, and to determine candidate sampling nodes and an initial propagation node in each community network;
a ratio calculation module, configured to calculate, by means of node type identifiers of the candidate sampling nodes, the node composition ratio of connection nodes, key nodes and ordinary nodes among the candidate sampling nodes;
a strategy determination module, configured to determine a sampling strategy based on the calculated node composition ratio and a preset node association relationship, wherein the node association relationship includes connection nodes being associated with key nodes and key nodes being associated with ordinary nodes;
a node sampling module, configured to sample the initial propagation nodes and the candidate sampling nodes according to the sampling strategy;
a path generation module, configured to visually present the collected initial propagation nodes and candidate sampling nodes to generate an information propagation path graph.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device, including a memory and a processor, wherein the memory stores computer-readable instructions, and the processor implements the following steps when executing the computer-readable instructions:
acquiring propagation record information of a network;
dividing the network into at least one community network according to the propagation record information, and determining candidate sampling nodes and an initial propagation node in each community network;
calculating, by means of node type identifiers of the candidate sampling nodes, the node composition ratio of connection nodes, key nodes and ordinary nodes among the candidate sampling nodes;
determining a sampling strategy based on the calculated node composition ratio and a preset node association relationship, wherein the node association relationship includes connection nodes being associated with key nodes and key nodes being associated with ordinary nodes;
sampling the initial propagation nodes and the candidate sampling nodes according to the sampling strategy;
visually presenting the collected initial propagation nodes and candidate sampling nodes to generate an information propagation path graph.
In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium storing computer-readable instructions, wherein the following steps are implemented when the computer-readable instructions are executed by a processor:
acquiring propagation record information of a network;
dividing the network into at least one community network according to the propagation record information, and determining candidate sampling nodes and an initial propagation node in each community network;
calculating, by means of node type identifiers of the candidate sampling nodes, the node composition ratio of connection nodes, key nodes and ordinary nodes among the candidate sampling nodes;
determining a sampling strategy based on the calculated node composition ratio and a preset node association relationship, wherein the node association relationship includes connection nodes being associated with key nodes and key nodes being associated with ordinary nodes;
sampling the initial propagation nodes and the candidate sampling nodes according to the sampling strategy;
visually presenting the collected initial propagation nodes and candidate sampling nodes to generate an information propagation path graph.
Compared with the prior art, the embodiments of the present application mainly have the following beneficial effects. The network is first divided into at least one community network according to the propagation record information, each community network propagating information independently of the others, and the candidate sampling nodes and the initial propagation node in each community network are determined. The node composition ratio of the three node types among the candidate sampling nodes (connection nodes, key nodes and ordinary nodes) is then calculated from the node type identifiers of the candidate sampling nodes. Initial propagation nodes and candidate sampling nodes are collected according to a sampling strategy that is determined jointly by the node composition ratio and a preset node association relationship, where the node association relationship includes connection nodes being associated with key nodes and key nodes being associated with ordinary nodes. This ensures that all types of nodes are sampled in a balanced manner and improves the accuracy of node sampling. The information propagation path graph, which presents the propagation path of information in the network, is composed of the collected candidate sampling nodes and initial propagation nodes, which ensures the accuracy of the information propagation path analysis.
Description of the drawings
In order to explain the solution of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show some embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a diagram of an exemplary system architecture to which the present application can be applied;
Fig. 2 is a flowchart of an embodiment of the information propagation path analysis method according to the present application;
Fig. 3 is a schematic diagram of dividing a network into community networks in an embodiment;
Fig. 4 is a flowchart of a specific implementation of step 204 in Fig. 2;
Fig. 5 is a flowchart of a specific implementation of step 2042 in Fig. 4;
Fig. 6 is a schematic diagram of an information propagation path graph generated in an embodiment;
Fig. 7 is a schematic structural diagram of an embodiment of the information propagation path analysis apparatus according to the present application;
Fig. 8 is a schematic structural diagram of an embodiment of the computer device according to the present application.
Detailed description
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of the present application. The terms used in the specification are intended only to describe specific embodiments and are not intended to limit the present application. The terms "including" and "having", and any variations of them, in the specification, the claims and the above description of the drawings are intended to cover non-exclusive inclusion. The terms "first", "second" and the like in the specification, the claims or the above drawings are used to distinguish different objects, not to describe a specific order.
A reference herein to an "embodiment" means that a specific feature, structure or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive with other embodiments. Those skilled in the art understand, both explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
In order to enable those skilled in the art to better understand the solution of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links or fiber optic cables.
A user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 in order to receive or send messages and the like. Various communication client applications can be installed on the terminal devices 101, 102, 103, such as web browser applications, shopping applications, search applications, instant messaging tools, email clients and social platform software.
The terminal devices 101, 102, 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers and so on.
The server 105 may be a server that provides various services, for example a background server that supports the pages displayed on the terminal devices 101, 102, 103.
It should be noted that the information propagation path analysis method provided by the embodiments of the present application is generally executed by the server; accordingly, the information propagation path analysis apparatus is generally arranged in the server.
It should be understood that the numbers of terminal devices, networks and servers in Fig. 1 are merely illustrative. There may be any number of terminal devices, networks and servers, depending on implementation needs.
With continued reference to Fig. 2, a flowchart of an embodiment of the information propagation path analysis method according to the present application is shown. The information propagation path analysis method includes the following steps:
Step 201: acquire propagation record information of a network.
In this embodiment, the electronic device on which the information propagation path analysis method runs (for example, the server shown in Fig. 1) can acquire the propagation record information of the network through a wired or wireless connection. It should be pointed out that the wireless connection may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (ultra wideband) connection, and other wireless connection methods now known or developed in the future.
The propagation record information may be information that records how a certain piece of information propagates among the nodes of the network. It may include node identifiers, node type identifiers, the propagation relationships between nodes, and propagation times. It may also include other attribute information of the nodes; for example, when people are the nodes of the network, the propagation record information may also include a person's gender, age and similar information. A propagation relationship records whether information was propagated between two nodes and, if so, in which direction.
Specifically, a propagation record server stores part of the propagation record information, such as the node identifiers. The propagation record server monitors and records information propagation in the network to obtain the propagation-related part of the propagation record information. It then aggregates the propagation record information and sends it to the server that performs the information propagation path analysis.
The server that performs the information propagation path analysis and the propagation record server may be the same server or different servers.
Table 1 and Table 2 show the propagation record information in one embodiment. Referring to Table 1, the customers in a marketing campaign are the nodes of the network, the customer ID (identity document) is the node identifier, and gender and age are attribute information of the node; the table also contains the node type identifier. Table 2 records the propagation relationships between nodes and the propagation times.
Node identifier (customer ID) | Gender | Age | Node type identifier
Id1 | male | 23 | Key node
Id2 | female | 25 | Ordinary node
Table 1
Propagating node | Receiving node | Propagation time
Id1 | Id2 | 2020-02-04 12:00:00
Table 2
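As an illustration only, the two tables could be held in memory as simple records. The sketch below is one possible representation, not the data model of the application; the class names, the field names and the type labels "connector", "key" and "ordinary" are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class NodeRecord:
    """One row of Table 1 (field names are illustrative assumptions)."""
    node_id: str      # node identifier, e.g. a customer ID
    gender: str       # attribute information of the node
    age: int          # attribute information of the node
    node_type: str    # node type identifier: "connector", "key" or "ordinary"


@dataclass(frozen=True)
class PropagationRecord:
    """One row of Table 2: `source` propagated the information to `target`."""
    source: str
    target: str
    time: datetime


# The example rows from Table 1 and Table 2.
nodes = [
    NodeRecord("Id1", "male", 23, "key"),
    NodeRecord("Id2", "female", 25, "ordinary"),
]
edges = [
    PropagationRecord("Id1", "Id2", datetime(2020, 2, 4, 12, 0, 0)),
]
```

Keeping the direction in `source` and `target` preserves the propagation direction that the propagation relationship is said to record.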
In one embodiment, the propagation record information may be stored in a database, from which the server obtains it. It should be emphasized that, to further ensure the privacy and security of the propagation record information, it may also be stored in a node of a blockchain.
When people are the nodes of the network and also the carriers of information propagation, the propagation record information can be compiled manually and then uploaded to the server.
The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another by cryptographic methods. Each data block contains the information of a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer and so on.
Step 202: divide the network into at least one community network according to the propagation record information, and determine candidate sampling nodes and an initial propagation node in each community network.
A community network may be a sub-network of the network; each community network propagates information independently of the others. The initial propagation node may be the node in a community network that performed the earliest propagation operation, and the nodes of the community network other than the initial propagation node serve as candidate sampling nodes.
Specifically, the server divides the network into at least one community network according to the propagation record information. Nodes in the same community network can be linked to one another through propagation relationships and form a closed sub-network; nodes in different community networks cannot be linked through propagation relationships.
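One way to read the division described above is as finding the connected components of the graph formed by the propagation relationships. The sketch below uses a union-find structure for this; the choice of algorithm and the function name `split_into_communities` are assumptions made for illustration, since the text does not prescribe how the division is computed.

```python
from collections import defaultdict


def split_into_communities(edges):
    """Group nodes so that nodes linked, directly or indirectly, by a
    propagation relationship fall into the same community network."""
    parent = {}

    def find(x):
        # Return the representative of x's group, creating it on first use.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Propagation direction is ignored here: any link joins two groups.
    for source, target in edges:
        root_s, root_t = find(source), find(target)
        if root_s != root_t:
            parent[root_s] = root_t

    groups = defaultdict(set)
    for node in parent:
        groups[find(node)].add(node)
    return list(groups.values())


# Two groups that never exchange information form two community networks.
example_edges = [("a", "b"), ("b", "c"), ("x", "y")]
communities = split_into_communities(example_edges)
```

In this example the nodes a, b and c end up in one community network and x and y in another, because no propagation relationship links the two groups.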
Based on the propagation times in the propagation record information, the server searches each community network for the node with the earliest propagation time and takes that node as the initial propagation node of the community network. The remaining nodes of each community network, which are not initial propagation nodes, are taken as candidate sampling nodes.
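A minimal sketch of this lookup follows, assuming that propagation records are available as (source, target, time) tuples and that each community network has a single earliest record. The function name and tuple layout are hypothetical, and ties between equally early records are not handled.

```python
from datetime import datetime


def find_initial_node(community, records):
    """Return (initial_node, candidate_nodes) for one community network.

    `records` is an iterable of (source, target, time) tuples. The source
    of the earliest record inside the community is taken as the initial
    propagation node; every other member becomes a candidate sampling node.
    """
    inside = [r for r in records if r[0] in community]
    earliest = min(inside, key=lambda r: r[2])
    initial = earliest[0]
    return initial, set(community) - {initial}


records = [
    ("Id1", "Id2", datetime(2020, 2, 4, 12, 0)),
    ("Id2", "Id3", datetime(2020, 2, 4, 13, 30)),
]
initial, candidates = find_initial_node({"Id1", "Id2", "Id3"}, records)
```

Here Id1 has the earliest propagation time, so it becomes the initial propagation node and Id2 and Id3 become candidate sampling nodes.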
In addition, the initial propagation node is not the origin of information propagation in the network as a whole. The origin node is the "producer" or "publisher" of the information, and it propagates the information to the initial propagation nodes. The statement above that community networks propagate information independently of one another holds when the origin node is left out of consideration.
For example, if mobile phone manufacturer A publishes a marketing post for a mobile phone on its official Weibo account "A Mobile", then "A Mobile" is the origin node of the whole network. "A Mobile" propagates the post to several initial propagation nodes, which continue to propagate it. In this application, the origin node "A Mobile" is not analyzed.
Step 203: calculate, by means of the node type identifiers of the candidate sampling nodes, the node composition ratio of connection nodes, key nodes and ordinary nodes among the candidate sampling nodes.
The node type identifier characterizes the propagation characteristics of a node. According to these characteristics, the candidate sampling nodes can be divided into connection nodes, key nodes and ordinary nodes. The node composition ratio may be the proportions of connection nodes, key nodes and ordinary nodes among the candidate sampling nodes.
Specifically, the server reads the node type identifiers of the candidate sampling nodes from the propagation record information; a node type identifier characterizes how the node behaves when information is propagated. According to their node type identifiers, the candidate sampling nodes are divided into connection nodes (connector), key nodes (Key Opinion Leader, KOL) and ordinary nodes (normal).
A key node is a key opinion leader in the network and can cause information to spread widely; a key node can be regarded as propagating information to many ordinary nodes, and the ordinary nodes provide a measure of the key node's propagation ability. A connection node can act as an intermediary in information propagation, passing information to key nodes and thereby triggering the wide spread of the information in the network.
Using the node type identifiers, the server counts the connection nodes, key nodes and ordinary nodes among the candidate sampling nodes and adds the three counts to obtain the total number of candidate sampling nodes. From these it calculates the proportions of connection nodes, key nodes and ordinary nodes among the candidate sampling nodes, that is, the node composition ratio.
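The counting described above can be sketched as follows. The mapping from node identifier to type label, and the labels "connector", "key" and "ordinary" themselves, are illustrative assumptions.

```python
from collections import Counter

NODE_TYPES = ("connector", "key", "ordinary")


def composition_ratio(node_types):
    """Return the share of each node type among the candidate sampling nodes.

    `node_types` maps a candidate node identifier to its type label.
    """
    counts = Counter(node_types.values())
    total = sum(counts[t] for t in NODE_TYPES)
    if total == 0:
        raise ValueError("no candidate sampling nodes to count")
    return {t: counts[t] / total for t in NODE_TYPES}


candidate_types = {
    "n1": "connector",
    "n2": "key",
    "n3": "ordinary",
    "n4": "ordinary",
}
ratio = composition_ratio(candidate_types)
```

With one connection node, one key node and two ordinary nodes, the shares come out as 0.25, 0.25 and 0.5.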
In one embodiment, before step 203 the method may further include: determining the node propagation count of each candidate sampling node according to the propagation record information; determining the node type of each candidate sampling node from its node propagation count; and adding to each candidate sampling node the node type identifier corresponding to its node type.
The node propagation count may be the number of nodes to which a candidate sampling node propagated the information.
Specifically, the server counts from the propagation record information how many times each candidate sampling node propagated information, which gives the node propagation count of each candidate sampling node.
The server obtains a preset propagation count threshold, determines the candidate sampling nodes whose node propagation count is greater than the threshold to be key nodes, and adds the key node type identifier to them. Among the remaining nodes, it then finds those that propagated information to a key node, determines them to be connection nodes, and adds the connection node type identifier. Finally, the candidate sampling nodes that are neither key nodes nor connection nodes are determined to be ordinary nodes, and the ordinary node type identifier is added to them.
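The three-stage labeling can be sketched as below. This is an illustrative reading, not the implementation of the application: the node propagation count is taken here as the number of distinct receiving nodes, and the function name, the label strings and the threshold value in the example are assumptions.

```python
from collections import defaultdict


def classify_candidates(candidates, edges, threshold):
    """Label each candidate node as "key", "connector" or "ordinary".

    `edges` is an iterable of (source, target) propagation relationships.
    """
    targets = defaultdict(set)
    for source, target in edges:
        targets[source].add(target)

    # Stage 1: key nodes propagated to more than `threshold` nodes.
    key_nodes = {n for n in candidates if len(targets[n]) > threshold}

    labels = {}
    for node in candidates:
        if node in key_nodes:
            labels[node] = "key"
        elif targets[node] & key_nodes:
            # Stage 2: a non-key node that propagated to a key node.
            labels[node] = "connector"
        else:
            # Stage 3: everything else.
            labels[node] = "ordinary"
    return labels


example_edges = [("c", "k"), ("k", "o1"), ("k", "o2"), ("k", "o3")]
labels = classify_candidates({"c", "k", "o1", "o2", "o3"}, example_edges, threshold=2)
```

In the example, k propagated to three nodes and exceeds the threshold of 2, so it is a key node; c propagated to k and becomes a connection node; the rest are ordinary nodes.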
In one embodiment, the node type identifiers may instead be added by the propagation record server. Before uploading the propagation record information, the propagation record server counts the node propagation count of each node and adds a node type identifier to each node according to the propagation count threshold. After dividing the network into community networks, the server marks the node type identifiers of the initial propagation nodes as invalid, so that these nodes no longer take part in the calculation of the node composition ratio, which is then computed only from the node type identifiers of the candidate sampling nodes. After the division, the server may also add an initial propagation node identifier to the initial propagation nodes.
In this embodiment, the node propagation counts of the candidate sampling nodes are obtained from the propagation record information. The node propagation count reflects a node's ability to propagate information, so the node types of the candidate sampling nodes can be determined accurately from it, which ensures the accuracy of the node composition ratio calculation.
Step 204: determine a sampling strategy based on the calculated node composition ratio and a preset node association relationship; the node association relationship includes connection nodes being associated with key nodes and key nodes being associated with ordinary nodes.
The sampling strategy instructs the server how to sample nodes. The node association relationship expresses the tendency of information to propagate between nodes of different types; it includes, but is not limited to, connection nodes being associated with key nodes and key nodes being associated with ordinary nodes.
Specifically, the initial propagation node is the starting point of information propagation in a community network, and the server starts sampling from the initial propagation nodes. The initial propagation nodes can be set to be collected in full or in part.
The node association relationship includes: initial propagation nodes are associated with connection nodes, key nodes and ordinary nodes; connection nodes are associated with key nodes; key nodes are associated with ordinary nodes; ordinary nodes are associated with ordinary nodes. A node of one type tends to propagate information to nodes of the types with which it is associated.
The first round of sampling collects initial propagation nodes. From the node association relationship it follows that the second round collects connection nodes, key nodes and ordinary nodes, the third round collects key nodes and ordinary nodes, and the fourth round collects ordinary nodes.
In each round of sampling, the server may use the relative proportions of the node composition ratios of the connection nodes, key nodes and ordinary nodes among the candidate sampling nodes as the sampling ratios of the node types collected in that round.
The server takes the node types collected in each round of sampling, together with the corresponding sampling ratios, as the sampling strategy.
Step 205: sample the initial propagation nodes and the candidate sampling nodes according to the sampling strategy.
Specifically, the server randomly samples the initial propagation nodes and candidate sampling nodes in each community network according to the sampling strategy.
During random sampling, if a node is not collected, the propagation paths through that node are discarded; that is, the nodes reached from that node are no longer considered in the sampling.
Step 206: visually present the collected initial propagation nodes and candidate sampling nodes to generate an information propagation path graph.
The information propagation path graph is a graph that uses a subset of the nodes in the network to show the propagation paths and propagation trend of information in the network.
Specifically, the server creates a blank initial path graph and marks the collected initial propagation nodes and candidate sampling nodes in it. The initial propagation nodes may be placed at the center of the initial path graph, and the connecting nodes, key nodes, and ordinary nodes among the candidate sampling nodes are then placed in order according to the propagation record information. Finally, the server connects the initial propagation nodes with the candidate sampling nodes of each type to obtain the information propagation path graph. The information propagation path graph may also include the node identifiers of the initial propagation nodes and candidate sampling nodes.
In this embodiment, the network is first divided, according to the propagation record information, into at least one community network, each of which propagates information independently of the others, and the candidate sampling nodes and the initial propagation node of each community network are determined. The node composition ratios of the three node types among the candidate sampling nodes (connecting nodes, key nodes, and ordinary nodes) are calculated from the node type identifiers of the candidate sampling nodes. Initial propagation nodes and candidate sampling nodes are collected according to the sampling strategy, which is determined jointly by the node composition ratios and the preset node association relationships, where the node association relationships include connecting nodes being associated with key nodes and key nodes being associated with ordinary nodes. This ensures that nodes of every type are sampled in a balanced manner and improves the accuracy of node sampling. The information propagation path graph, which presents the propagation paths of information in the network, is built from the collected candidate sampling nodes and initial propagation nodes, which in turn ensures the accuracy of the information propagation path analysis.
Further, step 202 may include: initializing the node identifier of each node in the propagation record information as that node's community label; for each node in the network, determining the minimum community label among the community labels of the node and its adjacent nodes; updating the node's community label to the determined minimum community label, so that the community label of every node is updated iteratively; when the community label of every node no longer changes, dividing the network into at least one community network according to the community labels; and, according to the propagation times in the propagation record information, determining the node with the earliest propagation time in each community network as the initial propagation node of that community network, and determining the remaining nodes in each community network as candidate sampling nodes.
A node identifier identifies a node and may be a string combining letters, digits, special symbols, and the like. A node's community label identifies the community network to which the node belongs. When a propagation relationship exists between two nodes, the two nodes are adjacent to each other. The propagation time may be the time at which a propagation operation occurred. The minimum community label may be the smallest of the community labels of a node and its adjacent nodes.
Specifically, the propagation record information includes the node identifier of each node, and the node identifiers are all distinct. The server may initialize the node identifier of each node as that node's community label. A node belongs to the community network corresponding to its community label.
In one embodiment, the server randomly generates a community label for each node, and the randomly generated community labels are mutually distinct.
After initialization, the network contains as many community networks as there are nodes, so the community labels must be updated iteratively to merge the community networks. For each node in the network, the community labels of the node itself and of its adjacent nodes are compared to determine the minimum community label among them.
When the node identifiers are numbers, the identifier with the smallest numeric value is selected as the minimum community label. When the node identifiers are strings, they are compared character by character in lexicographic order, for example using ASCII (American Standard Code for Information Interchange) code values as the basis for character comparison, and the identifier that compares smallest is selected as the minimum community label.
In each round of iterative updating, every node updates its own community label to the determined minimum community label. After each round completes, the minimum community label is determined again from the community labels of the node itself and its adjacent nodes, and the next round of iterative updating is performed.
When the community labels of all nodes no longer change during iterative updating, the network is divided into at least one community network according to the community labels. Within one community network all nodes have the same community label, and nodes with different community labels are assigned to different community networks.
According to the propagation times in the propagation record information, the server finds the node with the earliest propagation time in each community network, determines that node to be the initial propagation node of its community network, and sets the remaining nodes in the community network as candidate sampling nodes. Every community network has one initial propagation node.
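The selection of initial propagation nodes described above can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: the function name `split_initial_and_candidates`, the dictionary inputs, the integer timestamps, and the tie-break on the node identifier are all assumptions introduced for the example.

```python
from collections import defaultdict


def split_initial_and_candidates(labels, first_time):
    """Pick, per community, the node with the earliest propagation time.

    labels:     node id -> community label (hypothetical input format)
    first_time: node id -> earliest propagation time of that node
    """
    members = defaultdict(list)
    for node, label in labels.items():
        members[label].append(node)

    initial, candidates = {}, {}
    for label, nodes in members.items():
        # Assumption: ties on time are broken by node id so the result is stable.
        seed = min(nodes, key=lambda n: (first_time[n], n))
        initial[label] = seed
        candidates[label] = sorted(n for n in nodes if n != seed)
    return initial, candidates


# Hypothetical data: two communities with made-up timestamps.
labels = {"id0": "id0", "id1": "id0", "id2": "id0", "id4": "id4", "id5": "id4"}
first_time = {"id0": 10, "id1": 12, "id2": 15, "id4": 9, "id5": 11}
initial, candidates = split_initial_and_candidates(labels, first_time)
```

Each community yields exactly one initial propagation node, and every other member becomes a candidate sampling node, matching the rule stated above.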
As an example, FIG. 3 is a schematic diagram of dividing a network into community networks in one embodiment. For each node in the network on the left of FIG. 3, the node identifier is set as the node's own community label: the node with identifier id0 has community label id0, the node with identifier id1 has community label id1, and so on.
Before each round of iterative updating, every node determines the minimum community label among the community labels of itself and its adjacent nodes. For node id0, among the community labels id0, id1, and id3, the string id0 has the smallest ASCII code value, so id0 is selected as the minimum community label. Likewise, the minimum community label for nodes id1 and id3 is also id0, whereas for node id2 the minimum community label is id1. In the first round of iterative updating, therefore, the community labels of nodes id0, id1, and id3 are updated to id0, and the community label of node id2 is updated to id1.
In the second round of iterative updating, the minimum community labels of nodes id0, id1, and id3 no longer change, so the community labels of these three nodes do not change either. The community label of node id2 becomes id0, after which nodes id0, id1, id2, and id3 all have the same community label.
No information has propagated between nodes id0, id1, id2, and id3 and the other nodes in the network, such as node id4, so the community labels of node id4 and the other nodes cannot affect nodes id0, id1, id2, and id3. At this point the community labels of nodes id0, id1, id2, and id3 no longer change and the iteration is complete, with all four community labels equal to id0.
Similarly, when the iteration completes, nodes id4, id5, id6, and id7 have community label id4, and nodes id8, id9, and id10 have community label id8. According to the community labels, the network can be divided into the three community networks on the right of FIG. 3.
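The iterative update walked through above can be sketched as a synchronous minimum-label propagation. The sketch rests on several assumptions: the edge list is hypothetical (FIG. 3 is not reproduced here) and is merely chosen to be consistent with the adjacency stated in the example; the names `propagate_min_labels` and `label_key` are illustrative; and labels are compared by their numeric suffix, because a plain character-by-character comparison would rank the string "id10" below "id8" and would not reproduce the label id8 given for the third community.

```python
def label_key(label):
    # Assumption: identifiers have the form "id<number>"; compare by the number.
    return int(label[2:])


def propagate_min_labels(nodes, edges):
    """Synchronously replace each label by the minimum over the node and its neighbors."""
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    labels = {n: n for n in nodes}  # each node id starts as its own community label
    while True:
        updated = {
            n: min([labels[n], *(labels[m] for m in neighbors[n])], key=label_key)
            for n in nodes
        }
        if updated == labels:  # no label changed: iteration is complete
            return labels
        labels = updated


nodes = [f"id{i}" for i in range(11)]
# Hypothetical edges, consistent with the adjacency described in the example.
edges = [("id0", "id1"), ("id0", "id3"), ("id1", "id2"),
         ("id4", "id5"), ("id5", "id6"), ("id6", "id7"),
         ("id8", "id9"), ("id9", "id10")]
labels = propagate_min_labels(nodes, edges)

communities = {}
for node, label in labels.items():
    communities.setdefault(label, []).append(node)
```

Because every round reads only the labels of the previous round, node id2 first takes the label id1 and reaches id0 one round later, as in the example.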
In one embodiment, the network may be split into multiple community networks using the ConnectedComponent() function of the GraphX module in the Spark framework, with the propagation record information in Table 1 and Table 2 as the input of the function. Spark is a fast, general-purpose computing engine designed for large-scale data processing; GraphX is the Spark component for graphs and graph computation; and the ConnectedComponent() function is a connected-components algorithm used to discover the community networks in a network.
In this embodiment, a community label is added to each node, the minimum community label is determined among the community labels of the node and its adjacent nodes, and the node's community label is updated to the determined minimum community label, so that the community label of every node is updated iteratively. During iterative updating, the propagation relationships between nodes in the same community network cause their community labels to converge to the same value. When iterative updating ends, the network can therefore be divided accurately into community networks according to the community labels, and the initial propagation node and candidate sampling nodes of each community network can be located accurately according to the propagation times, which ensures the accuracy of node sampling.
Further, as shown in FIG. 4, step 204 may include:
Step 2041: set the sampling ratio of the initial propagation nodes in the first round of sampling according to a preset sampling ratio.
A sampling ratio may be, for each round of sampling, the proportion of the number of sampled nodes of a given type to the total number of nodes sampled in that round. The preset sampling ratio may be a preset proportion of the initial propagation nodes collected in the first round of sampling to all initial propagation nodes.
Specifically, the server starts sampling from the initial propagation nodes, which are collected in the first round of sampling. The server may collect all initial propagation nodes, or it may read the preset sampling ratio and set the sampling ratio of the initial propagation nodes in the first round to that value.
To observe the information propagation paths from a global perspective, all initial propagation nodes are preferably collected; that is, the sampling ratio of the initial propagation nodes in the first round of sampling is preferably set to 1.
Step 2042: based on the calculated node composition ratios and the preset node association relationships, determine the sampling ratios of the connecting nodes, key nodes, and ordinary nodes in each round of sampling after the first round.
In each round of sampling after the first round, the server collects candidate sampling nodes, and the types of candidate sampling nodes collected may differ from round to round. The server determines, according to the node association relationships, the types of candidate sampling nodes involved in each round after the first round.
Based on the node association relationships, it can be determined that the second round of sampling collects the connecting nodes, key nodes, and ordinary nodes associated with the initial propagation nodes; the third round collects the key nodes associated with the connecting nodes collected in the second round, and the ordinary nodes associated with the key nodes collected in the second round; and the fourth round collects the ordinary nodes associated with the key nodes collected in the third round.
After the first round, the sampling ratios of the connecting nodes, key nodes, and ordinary nodes in each round are calculated from the types of candidate sampling nodes involved in that round and from the node composition ratios of the connecting nodes, key nodes, and ordinary nodes.
As an example, suppose that among the candidate sampling nodes the node composition ratio of connecting nodes is 5%, that of key nodes is 5%, and that of ordinary nodes is 90%. In the second round of sampling, connecting nodes, key nodes, and ordinary nodes are collected in the ratio 1:1:18 (5%:5%:90%); in the third round, key nodes and ordinary nodes are collected in the ratio 1:18 (5%:90%).
It is noted that, because connecting nodes and ordinary nodes propagate directly to relatively few ordinary nodes, the association relationships "connecting node associated with ordinary node" and "ordinary node associated with ordinary node" may be ignored in actual sampling. In addition, the four rounds of sampling correspond to four rounds of information propagation; after four rounds, the propagation of the information is already weak and need not be analyzed.
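The per-round ratios in the example above can be reproduced with a short sketch. The table `ROUND_TYPES` encodes which node types each round collects, as stated in the text; the function name `build_strategy`, the dictionary layout, and the choice to normalize the ratios within each round so that they sum to 1 are assumptions made for illustration.

```python
# Node types collected in rounds 2 to 4, following the association relationships above.
ROUND_TYPES = {
    2: ("connecting", "key", "ordinary"),
    3: ("key", "ordinary"),
    4: ("ordinary",),
}


def build_strategy(composition, initial_ratio=1.0):
    """Return {round: {node type: sampling ratio}}.

    composition:   node type -> node composition ratio among candidate sampling nodes
    initial_ratio: preset sampling ratio of initial propagation nodes in round 1
    """
    strategy = {1: {"initial": initial_ratio}}
    for rnd, types in ROUND_TYPES.items():
        total = sum(composition[t] for t in types)
        # Assumption: keep the relative proportions and rescale them to sum to 1.
        strategy[rnd] = {t: composition[t] / total for t in types}
    return strategy


strategy = build_strategy({"connecting": 0.05, "key": 0.05, "ordinary": 0.90})
```

With the 5%:5%:90% composition, round 2 keeps the proportions 1:1:18 and round 3 gives key and ordinary nodes the proportions 1:18, that is, 1/19 and 18/19 after rescaling.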
Step 2043: take the sampling ratios of the nodes determined for each round of sampling as the sampling strategy.
Specifically, the server takes the types of nodes to be collected in the four rounds of sampling, together with the corresponding sampling ratios, as the sampling strategy. The sampling strategy instructs the server to collect initial propagation nodes and candidate sampling nodes of each type from each community network.
In this embodiment, the sampling ratio of the initial propagation nodes in the first round of sampling is determined from the preset sampling ratio. The sampling ratios of the connecting nodes, key nodes, and ordinary nodes among the candidate sampling nodes in each round after the first are then determined from the node composition ratios and the node association relationships, which yields the sampling strategy. Because the sampling strategy combines the node association relationships with the node composition ratios, nodes of every type can be sampled in a balanced manner.
Further, as shown in FIG. 5, step 2042 may include:
Step 20421: based on the calculated node composition ratios and the preset node association relationships, initialize the sampling ratios of the connecting nodes, key nodes, and ordinary nodes in each round of sampling after the first round.
Specifically, the server determines, according to the preset node association relationships, the types of candidate sampling nodes to be collected in each round of sampling after the first round, and initializes the sampling ratios for each round.
During initialization, the relative proportions of the node composition ratios of the connecting nodes, key nodes, and ordinary nodes among the candidate sampling nodes are used directly as the sampling ratios of the connecting nodes, key nodes, and ordinary nodes.
Step 20422: compare the node composition ratios with a preset ratio threshold.
The ratio threshold may be a preset threshold on the node composition ratio, used to decide whether the sampling ratios of the connecting nodes, key nodes, and ordinary nodes need to be adjusted.
Specifically, the server obtains the preset ratio threshold and compares the node composition ratios of the connecting nodes, key nodes, and ordinary nodes with it one by one. A node composition ratio below the ratio threshold indicates that candidate sampling nodes of that type make up only a small share, so random sampling may miss some important candidate sampling nodes and, with them, the related propagation paths.
Step 20423: when a node composition ratio is below the ratio threshold, adjust the initialized sampling ratios according to a preset adjustment value.
The preset adjustment value may be a preset amount by which a sampling ratio is adjusted; the sampling ratio may be adjusted by adding or subtracting this value.
Specifically, when a node composition ratio is below the ratio threshold, the server needs to adjust the initialized sampling ratios. To do so, the server obtains the preset adjustment value and adjusts the initialized sampling ratios accordingly to obtain the final sampling ratios.
As an example, suppose the node composition ratios are connecting nodes : key nodes : ordinary nodes = 9% : 1% : 90% and the ratio threshold is 3%. The node composition ratio of key nodes is below the ratio threshold, indicating that key nodes are few. If the node composition ratios were used directly as the sampling ratios, the sampling ratios in the second round would be connecting nodes : key nodes : ordinary nodes = 9% : 1% : 90%. If a key node is associated with a large number of ordinary nodes and is missed in random sampling, an important propagation path is lost. The sampling ratio of key nodes can therefore be increased by the preset adjustment value; for example, with a preset adjustment value of 2%, the sampling ratio of key nodes is adjusted to 3%.
The increase is offset by reducing the sampling ratios of the connecting nodes and ordinary nodes, and the sampling ratio of ordinary nodes may be reduced first because ordinary nodes are usually the most numerous. The sampling ratios then become connecting nodes : key nodes : ordinary nodes = 9% : 3% : 88%. Similarly, if the node composition ratio of key nodes is 2%, the sampling ratio of key nodes may be adjusted to 4% in the second round of sampling; if it is 0.5%, the sampling ratio of key nodes may be raised twice in the second round, to 4.5%. Further examples are omitted here.
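The adjustment rule in the example above can be sketched as follows. The reading that the preset adjustment value is applied repeatedly until the ratio reaches the threshold is inferred from the 0.5% case (raised twice to 4.5%) and is an assumption, as are the function name `adjust_ratios` and the choice of ordinary nodes as the only type whose ratio is reduced. Values are expressed in percentage points so that the arithmetic in the example is exact.

```python
def adjust_ratios(composition, threshold=3.0, step=2.0, donor="ordinary"):
    """Raise under-represented node types and take the difference from the donor type.

    composition: node type -> node composition ratio, in percentage points
    threshold:   preset ratio threshold, in percentage points
    step:        preset adjustment value, in percentage points
    donor:       node type whose sampling ratio is reduced first (assumption)
    """
    ratios = dict(composition)  # initialization: sampling ratio = composition ratio
    for node_type in composition:
        if node_type == donor:
            continue
        # Assumption: apply the adjustment value until the threshold is reached.
        while ratios[node_type] < threshold:
            ratios[node_type] += step
            ratios[donor] -= step
    return ratios


case_1 = adjust_ratios({"connecting": 9.0, "key": 1.0, "ordinary": 90.0})
case_2 = adjust_ratios({"connecting": 9.0, "key": 2.0, "ordinary": 89.0})
case_3 = adjust_ratios({"connecting": 9.0, "key": 0.5, "ordinary": 90.5})
```

The first case reproduces the 9% : 3% : 88% result, and the other two reproduce the key-node ratios of 4% and 4.5%; the connecting and ordinary shares in those two cases are made-up inputs.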
In this embodiment, after the sampling ratios of the connecting nodes, key nodes, and ordinary nodes are initialized, the node composition ratios are compared with the preset ratio threshold. When a node composition ratio is below the ratio threshold, the initialized sampling ratios are adjusted according to the preset adjustment value so that as many nodes of that type as possible are collected, which makes the sampling more balanced.
Further, step 205 may include: in the first round of sampling, sampling the initial propagation nodes according to the sampling strategy; in each round after the first, querying, according to the propagation record information, the candidate sampling nodes to which the nodes collected in the previous round propagated; and sampling the queried candidate sampling nodes according to the sampling strategy.
Specifically, the server performs multiple rounds of sampling. In the first round, it collects the initial propagation nodes according to the sampling strategy.
In each round after the first, the server first queries, according to the propagation record information, the candidate sampling nodes to which the nodes collected in the previous round propagated.
After the candidate sampling nodes are found, the server counts them, calculates from the sampling ratios in the sampling strategy the number of candidate sampling nodes of each type to sample in that round, and then samples the queried candidate sampling nodes according to the calculated numbers.
In the second round of sampling, the server first queries the candidate sampling nodes associated with the collected initial propagation nodes and counts them, obtaining the number of nodes reachable in the second round. From the second-round sampling ratios recorded in the sampling strategy, it calculates the numbers of connecting nodes, key nodes, and ordinary nodes to sample in the second round, and then randomly collects connecting nodes, key nodes, and ordinary nodes according to those numbers.
In the third round of sampling, key nodes and ordinary nodes are randomly collected, according to the sampling strategy, from the candidate sampling nodes associated with the connecting nodes and key nodes collected in the second round.
In the fourth round of sampling, ordinary nodes are randomly collected, according to the sampling strategy, from the candidate sampling nodes associated with the key nodes collected in the third round.
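The four sampling rounds can be sketched as a frontier expansion in which each round only considers nodes reached from the nodes collected in the previous round. The text does not state how many nodes are drawn per round, so the per-round `budget`, the rounding up with `math.ceil`, and the names `ASSOC` and `sample_rounds` are assumptions introduced for this example; the toy graph is likewise made up.

```python
import math
import random

# Child types that each parent type feeds, per the association relationships that
# are kept in actual sampling (connecting->ordinary and ordinary->ordinary ignored).
ASSOC = {
    "initial": {"connecting", "key", "ordinary"},
    "connecting": {"key"},
    "key": {"ordinary"},
    "ordinary": set(),
}


def sample_rounds(initial_nodes, propagated_to, node_type, ratios, budget, rng,
                  max_rounds=4):
    """Return (sampled nodes, sampled propagation edges) after up to four rounds."""
    frontier = list(initial_nodes)  # round 1: keep every initial propagation node
    seen = set(frontier)
    edges = []
    for _ in range(2, max_rounds + 1):
        pool = {}  # candidate node -> the sampled parent that propagated to it
        for parent in frontier:
            for child in propagated_to.get(parent, ()):
                if child not in seen and node_type[child] in ASSOC[node_type[parent]]:
                    pool.setdefault(child, parent)
        by_type = {}
        for child in sorted(pool):
            by_type.setdefault(node_type[child], []).append(child)
        total = sum(ratios[t] for t in by_type)
        frontier = []
        for t, members in by_type.items():
            # Hypothetical rule: split the round budget by the relative ratios.
            k = min(len(members), math.ceil(budget * ratios[t] / total))
            frontier.extend(rng.sample(members, k))
        seen.update(frontier)
        edges.extend((pool[c], c) for c in frontier)
        if not frontier:
            break
    return seen, edges


# Made-up toy data: "s" is an initial propagation node.
node_type = {"s": "initial", "c1": "connecting", "k1": "key", "k2": "key",
             "o1": "ordinary", "o2": "ordinary", "o3": "ordinary",
             "o4": "ordinary", "o5": "ordinary", "o6": "ordinary"}
propagated_to = {"s": ["c1", "k1", "o1", "o2"], "c1": ["k2", "o3"],
                 "k1": ["o4"], "k2": ["o5"], "o1": ["o6"]}
ratios = {"connecting": 5, "key": 5, "ordinary": 90}
sampled, sampled_edges = sample_rounds(["s"], propagated_to, node_type, ratios,
                                       budget=100, rng=random.Random(0))
```

A node that is not drawn never enters the next frontier, so the paths through it are dropped, and every sampled edge starts at a node collected in the previous round. In the toy data the budget is large enough that every eligible candidate is taken; nodes o3 and o6 are excluded only because their parents' association relationships are ignored.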
In this embodiment, the initial propagation nodes are first sampled according to the sampling strategy. In each round after the first, the candidate sampling nodes to which the nodes collected in the previous round propagated are queried according to the propagation record information and then sampled according to the sampling strategy. This ensures that every collected node is related to a node collected in the previous round, which preserves the continuity of the information propagation paths.
Further, step 206 may include: adding the collected initial propagation nodes and candidate sampling nodes to an initial path graph according to the propagation record information; setting the display mode of the initial propagation nodes and of the connecting nodes, key nodes, and ordinary nodes among the candidate sampling nodes in the initial path graph; and connecting the configured initial propagation nodes, connecting nodes, key nodes, and ordinary nodes to obtain the information propagation path graph.
Specifically, the server creates a blank initial path graph and adds the collected initial propagation nodes and candidate sampling nodes to it. The server may arrange the initial propagation nodes and candidate sampling nodes according to the propagation record information so that nodes with a direct propagation relationship are placed close to each other in the initial path graph.
To highlight the information propagation paths more intuitively and clearly, the server sets different display modes for nodes of different types. The display modes include display by color and display by shape. With display by color, the initial propagation nodes, connecting nodes, key nodes, and ordinary nodes may be set to different colors; for example, initial propagation nodes may be shown as red dots and connecting nodes as blue dots. With display by shape, the initial propagation nodes, connecting nodes, key nodes, and ordinary nodes may be set to different shapes; for example, initial propagation nodes may be shown as circles and connecting nodes as triangles.
The server connects the configured initial propagation nodes, connecting nodes, key nodes, and ordinary nodes in the initial path graph to obtain the information propagation path graph.
The server may store the information propagation path graph in a database, send it to a designated terminal for display, or upload it to a blockchain for storage.
In one embodiment, the information propagation path graph may be generated with Gephi, an open-source, JVM-based complex network analysis tool used mainly for interactive visualization and exploration of networks, complex systems, and dynamic and hierarchical graphs. The server inputs the propagation record information between the collected initial propagation nodes and candidate sampling nodes into Gephi, and Gephi generates the information propagation path graph.
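One possible way to hand the sampled nodes and propagation records to Gephi is to write a node table and an edge table as CSV text. The sketch below assumes the column names used by Gephi's spreadsheet import (`Id` and `Label` for nodes; `Source`, `Target`, and `Type` for edges); the function name `to_gephi_csv` and the extra `node_type` attribute column, which could drive the color or shape of each node type, are illustrative choices and are not taken from the embodiment.

```python
import csv
import io


def to_gephi_csv(node_type, edges):
    """Build node-table and edge-table CSV text from sampled nodes and edges.

    node_type: node id -> "initial" | "connecting" | "key" | "ordinary"
    edges:     iterable of (source id, target id) propagation records
    """
    nodes_buf, edges_buf = io.StringIO(), io.StringIO()

    node_writer = csv.writer(nodes_buf, lineterminator="\n")
    node_writer.writerow(["Id", "Label", "node_type"])
    for node in sorted(node_type):
        # The node id doubles as the label so identifiers appear in the graph.
        node_writer.writerow([node, node, node_type[node]])

    edge_writer = csv.writer(edges_buf, lineterminator="\n")
    edge_writer.writerow(["Source", "Target", "Type"])
    for source, target in edges:
        edge_writer.writerow([source, target, "Directed"])  # propagation has a direction

    return nodes_buf.getvalue(), edges_buf.getvalue()


# Made-up sample: one initial propagation node and two sampled candidates.
nodes_csv, edges_csv = to_gephi_csv(
    {"s": "initial", "k1": "key", "o4": "ordinary"},
    [("s", "k1"), ("k1", "o4")],
)
```

The two strings can be saved to files and imported as the node table and edge table; styling by the `node_type` column is then done in the visualization tool.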
In this embodiment, the collected initial propagation nodes and candidate sampling nodes are added to the initial path graph according to the propagation record information, and the initial propagation nodes and the connection nodes, key nodes, and ordinary nodes among the candidate sampling nodes are given distinct display modes so that they can be told apart. This ensures that the generated information propagation path graph shows the propagation path more accurately and clearly.
FIG. 6 is an information propagation path graph generated in one embodiment. Referring to FIG. 6, the center of the graph is the origin node of the information publisher; solid circles are initial propagation nodes; hollow circles are connection nodes; solid squares are key nodes; and hollow squares are ordinary nodes. Through the collected nodes, FIG. 6 shows the propagation trend and propagation path of a piece of information in the network.
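The node adding, display setting, and node connecting steps can be sketched as follows. This is an illustration only: the dictionary layout, the type names used as keys, and the function `build_path_graph` are assumptions, while the shape and fill values are taken from the description of FIG. 6.

```python
# Display modes following the description of FIG. 6; the key names are hypothetical.
DISPLAY_MODES = {
    "initial": {"shape": "circle", "filled": True},      # solid circle
    "connection": {"shape": "circle", "filled": False},  # hollow circle
    "key": {"shape": "square", "filled": True},          # solid square
    "ordinary": {"shape": "square", "filled": False},    # hollow square
}


def build_path_graph(node_types, records):
    """Attach a display mode to every collected node and connect the nodes.

    `node_types` maps each collected node to one of the DISPLAY_MODES keys.
    `records` is a list of (source, target) propagation records.
    """
    nodes = {node: DISPLAY_MODES[kind] for node, kind in node_types.items()}
    # Only records whose two endpoints were both collected become edges.
    edges = [(src, dst) for src, dst in records if src in nodes and dst in nodes]
    return {"nodes": nodes, "edges": edges}


graph = build_path_graph(
    {"a": "initial", "b": "connection", "c": "key"},
    [("a", "b"), ("b", "c"), ("c", "z")],
)
print(graph["edges"])
```

The record `("c", "z")` is dropped because `z` was not collected, so every edge in the graph joins two displayed nodes.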
The information propagation path analysis method in this application can be applied to the field of big data to process massive amounts of structured data; this application also relates to fraud detection in financial technology.
A person of ordinary skill in the art will understand that all or part of the processes in the methods of the foregoing embodiments can be implemented by computer-readable instructions that direct the relevant hardware. The computer-readable instructions can be stored in a computer-readable storage medium and, when executed, may include the processes of the foregoing method embodiments. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (ROM), or a random access memory (RAM).
It should be understood that although the steps in the flowcharts of the drawings are shown in sequence as indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict restriction on the order of execution, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times; nor are they necessarily executed one after another, but may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
With further reference to FIG. 7, as an implementation of the method shown in FIG. 2, this application provides an embodiment of an information propagation path analysis apparatus. The apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus can be applied to various electronic devices.
As shown in FIG. 7, the information propagation path analysis apparatus 300 of this embodiment includes an information acquisition module 301, a network division module 302, a ratio calculation module 303, a strategy determination module 304, a node sampling module 305, and a path generation module 306, wherein:
The information acquisition module 301 is configured to acquire the propagation record information of a network. It should be emphasized that, to further ensure the privacy and security of the propagation record information, it may also be stored in a node of a blockchain.
The network division module 302 is configured to divide the network into at least one social network according to the propagation record information, and to determine the candidate sampling nodes and the initial propagation node in each social network.
The ratio calculation module 303 is configured to calculate, from the node type identifiers of the candidate sampling nodes, the node composition ratio of connection nodes, key nodes, and ordinary nodes among the candidate sampling nodes.
The strategy determination module 304 is configured to determine a sampling strategy based on the calculated node composition ratio and a preset node association relationship; the node association relationship includes connection nodes being associated with key nodes and key nodes being associated with ordinary nodes.
The node sampling module 305 is configured to sample the initial propagation nodes and the candidate sampling nodes according to the sampling strategy.
The path generation module 306 is configured to visually present the collected initial propagation nodes and candidate sampling nodes to generate an information propagation path graph.
In this embodiment, the network is first divided into at least one social network according to the propagation record information, with each social network propagating information independently of the others, and the candidate sampling nodes and the initial propagation node of each social network are determined. The node composition ratio of the three node types among the candidate sampling nodes (connection nodes, key nodes, and ordinary nodes) is then calculated from the node type identifiers of the candidate sampling nodes. Initial propagation nodes and candidate sampling nodes are sampled according to a sampling strategy that is determined jointly by the node composition ratio and a preset node association relationship, in which connection nodes are associated with key nodes and key nodes are associated with ordinary nodes. This allows all node types to be sampled in a balanced way and improves the accuracy of node sampling. The information propagation path graph, which presents the path along which information propagates through the network, is built from the collected candidate sampling nodes and initial propagation nodes, which ensures the accuracy of the propagation path analysis.
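The way modules 301 to 306 hand data to one another can be sketched as a simple pipeline. The class name, field names, and call signatures below are hypothetical; this application specifies what each module does, not its programming interface, so the stubs in the usage example stand in for real implementations.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class PathAnalysisApparatus:
    """Hypothetical wiring of modules 301-306 as plain callables."""

    acquire_records: Callable[[], Any]                 # module 301
    divide_network: Callable[[Any], tuple]             # module 302
    compute_ratios: Callable[[Any], Any]               # module 303
    determine_strategy: Callable[[Any], Any]           # module 304
    sample_nodes: Callable[[Any, Any, Any, Any], Any]  # module 305
    generate_path_graph: Callable[[Any, Any], Any]     # module 306

    def run(self):
        records = self.acquire_records()
        initial, candidates = self.divide_network(records)
        ratios = self.compute_ratios(candidates)
        strategy = self.determine_strategy(ratios)
        sampled = self.sample_nodes(initial, candidates, strategy, records)
        return self.generate_path_graph(sampled, records)


# Stub callables that only demonstrate the data flow between modules.
apparatus = PathAnalysisApparatus(
    acquire_records=lambda: [("a", "b")],
    divide_network=lambda records: (["a"], ["b"]),
    compute_ratios=lambda candidates: {"ordinary": 1.0},
    determine_strategy=lambda ratios: {"initial": 1.0, **ratios},
    sample_nodes=lambda initial, candidates, strategy, records: initial + candidates,
    generate_path_graph=lambda sampled, records: {"nodes": sampled, "edges": records},
)
pipeline_result = apparatus.run()
print(pipeline_result)
```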
In some optional implementations of this embodiment, the network division module 302 includes a label adding submodule, a minimum determination submodule, a label update submodule, a network division submodule, and a node determination submodule, wherein:
The label adding submodule is configured to initialize the node identifier of each node in the propagation record information as that node's community label.
The minimum determination submodule is configured to determine, for each node in the network, the minimum community label among the community labels of the node and its neighboring nodes.
The label update submodule is configured to update the node's community label to the determined minimum community label, so that the community label of every node is updated iteratively.
The network division submodule is configured to divide the network into at least one social network according to the community label of each node once the community labels no longer change.
The node determination submodule is configured to determine, according to the propagation times in the propagation record information, the node with the earliest propagation time in each social network as that network's initial propagation node, and the nodes that do not have the earliest propagation time as candidate sampling nodes.
In this embodiment, a community label is added to each node, and the minimum community label is determined among the community labels of the node and its neighboring nodes. Each node's community label is then updated to that minimum, and the update is repeated for every node. During the iteration, the propagation relationships between nodes in the same social network drive their community labels toward the same value, so that when the iteration ends the network can be accurately divided into social networks by community label. The initial propagation node and the candidate sampling nodes of each social network can then be located accurately from the propagation times, which ensures the accuracy of node sampling.
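The minimum-label iteration and the selection of initial propagation nodes can be sketched as below. The sketch assumes that node identifiers are comparable (integers here), that each record is a `(sender, receiver, time)` tuple, that sender and receiver count as neighbors of each other, and that the sender of the earliest record in a social network is taken as its initial propagation node, with ties resolved by record order. The function names are hypothetical.

```python
def propagate_min_labels(records):
    """Iterate until every node carries the smallest label reachable from it."""
    neighbors = {}
    for src, dst, _time in records:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)

    # Each node's identifier is its initial community label.
    labels = {node: node for node in neighbors}
    changed = True
    while changed:
        changed = False
        for node in neighbors:
            smallest = min([labels[node]] + [labels[n] for n in neighbors[node]])
            if smallest != labels[node]:
                labels[node] = smallest
                changed = True
    return labels


def split_initial_and_candidates(records, labels):
    """Pick the earliest sender per community; all other nodes are candidates."""
    earliest = {}  # community label -> (time, node)
    for src, _dst, time in records:
        community = labels[src]
        if community not in earliest or time < earliest[community][0]:
            earliest[community] = (time, src)
    initial = {community: node for community, (_time, node) in earliest.items()}
    candidates = set(labels) - set(initial.values())
    return initial, candidates


records = [(5, 3, 10), (3, 7, 12), (9, 2, 4), (2, 4, 6)]
labels = propagate_min_labels(records)
initial, candidates = split_initial_and_candidates(records, labels)
print(labels, initial, candidates)
```

With these records, nodes 3, 5, and 7 end with label 3 and nodes 2, 4, and 9 end with label 2, giving two social networks whose initial propagation nodes are 5 and 9 respectively.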
In some optional implementations of this embodiment, the information propagation path analysis apparatus 300 further includes a quantity determination module, a type determination module, and an identifier adding module, wherein:
The quantity determination module is configured to determine the node propagation quantity of each candidate sampling node according to the propagation record information.
The type determination module is configured to determine the node type of each candidate sampling node from its node propagation quantity.
The identifier adding module is configured to add to each candidate sampling node a node type identifier corresponding to its node type.
In this embodiment, the node propagation quantity of each candidate sampling node is obtained from the propagation record information. Because the node propagation quantity reflects a node's ability to propagate information, the node type of each candidate sampling node can be determined accurately from it, which ensures that the node composition ratio is calculated accurately.
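A minimal sketch of counting propagations, assigning node type identifiers, and computing the node composition ratio follows. The threshold values and the assumption that a higher propagation quantity maps to connection, then key, then ordinary are illustrative only and are not stated in this passage; the function names are hypothetical.

```python
from collections import Counter

NODE_TYPES = ("connection", "key", "ordinary")


def count_propagations(records, candidates):
    """Count how many times each candidate node propagated the information."""
    counts = {node: 0 for node in candidates}
    for src, _dst in records:
        if src in counts:
            counts[src] += 1
    return counts


def classify(count, connection_min=10, key_min=3):
    """Map a propagation quantity to a node type identifier (assumed thresholds)."""
    if count >= connection_min:
        return "connection"
    if count >= key_min:
        return "key"
    return "ordinary"


def composition_ratio(type_ids):
    """Return the share of each node type among the candidate sampling nodes."""
    total = len(type_ids)
    if total == 0:
        return {kind: 0.0 for kind in NODE_TYPES}
    tally = Counter(type_ids.values())
    return {kind: tally.get(kind, 0) / total for kind in NODE_TYPES}


records = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d")]
counts = count_propagations(records, candidates={"a", "b", "c", "d"})
type_ids = {n: classify(c, connection_min=3, key_min=2) for n, c in counts.items()}
ratios = composition_ratio(type_ids)
print(type_ids, ratios)
```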
In some optional implementations of this embodiment, the strategy determination module 304 includes a first-round setting submodule, a ratio determination submodule, and a strategy determination submodule, wherein:
The first-round setting submodule is configured to set the sampling ratio of the initial propagation nodes in the first round of sampling according to a preset sampling ratio.
The ratio determination submodule is configured to determine, based on the calculated node composition ratio and the preset node association relationship, the sampling ratios of connection nodes, key nodes, and ordinary nodes in each round of sampling after the first round.
The strategy determination submodule is configured to take the determined sampling ratios of the nodes in each round of sampling as the sampling strategy.
In this embodiment, the sampling ratio of the initial propagation nodes in the first round of sampling is determined according to the preset sampling ratio. The sampling ratios of the connection nodes, key nodes, and ordinary nodes among the candidate sampling nodes in each round after the first are then determined according to the node composition ratio and the node association relationship, which yields the sampling strategy. Because the sampling strategy takes both the node association relationship and the node composition ratio into account, all node types can be sampled in a balanced way.
In some optional implementations of this embodiment, the ratio determination submodule includes a ratio initialization unit, a ratio comparison unit, and a ratio adjustment unit, wherein:
The ratio initialization unit is configured to initialize, based on the calculated node composition ratio and the preset node association relationship, the sampling ratios of connection nodes, key nodes, and ordinary nodes in each round of sampling after the first round.
The ratio comparison unit is configured to compare the node composition ratio with a preset ratio threshold.
The ratio adjustment unit is configured to adjust the initialized sampling ratio according to a preset adjustment value when any node composition ratio is smaller than the ratio threshold.
In this embodiment, after the sampling ratios of connection nodes, key nodes, and ordinary nodes are initialized, the node composition ratio is compared with the preset ratio threshold. When the composition ratio of a node type is below the threshold, the initialized sampling ratio is adjusted according to the preset adjustment value so that as many nodes of that type as possible are collected, which makes the sampling more balanced.
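The initialize, compare, and adjust steps might look like the following sketch. This passage does not give the initialization rule, so the sketch simply starts every node type at the same base ratio; that rule, the default values, and the function names are all assumptions, and the node association relationship is not modeled here.

```python
def init_sampling_ratios(composition, base_ratio=0.5):
    """Assumed initialization: every node type starts at the same base ratio."""
    return {kind: base_ratio for kind in composition}


def adjust_sampling_ratios(sampling, composition, threshold=0.1, adjustment=0.25):
    """Raise the sampling ratio of any node type whose composition ratio is scarce."""
    adjusted = dict(sampling)
    for kind, share in composition.items():
        if share < threshold:
            # Cap at 1.0 so a ratio never asks for more nodes than exist.
            adjusted[kind] = min(1.0, sampling[kind] + adjustment)
    return adjusted


composition = {"connection": 0.05, "key": 0.25, "ordinary": 0.7}
sampling = init_sampling_ratios(composition)
strategy = adjust_sampling_ratios(sampling, composition)
print(strategy)
```

Here only connection nodes fall below the threshold, so only their sampling ratio is raised; the other two types keep the initialized value.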
In some optional implementations of this embodiment, the node sampling module 305 includes an initial sampling submodule, a node query submodule, and a node sampling submodule, wherein:
The initial sampling submodule is configured to sample the initial propagation nodes according to the sampling strategy in the first round of sampling.
The node query submodule is configured to query, in each round after the first and according to the propagation record information, the candidate sampling nodes to which the nodes collected in the previous round propagated the information.
The node sampling submodule is configured to sample the queried candidate sampling nodes according to the sampling strategy.
In this embodiment, the initial propagation nodes are sampled first according to the sampling strategy. In each subsequent round, the candidate sampling nodes reached from the nodes collected in the previous round are queried from the propagation record information and then sampled according to the sampling strategy. Every collected node is therefore related to a node collected in the previous round, which ensures the continuity of the information propagation path.
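The round-by-round procedure can be sketched as follows. The strategy is represented as a dictionary of sampling ratios keyed by `initial`, `connection`, `key`, and `ordinary`; that layout, the rule of taking at least one node from any non-empty group, and the function name `sample_rounds` are assumptions made for illustration.

```python
import random


def sample_rounds(initial_nodes, successors, node_types, strategy, rounds, seed=0):
    """Sample initial nodes first, then only nodes reached from the previous round."""
    rng = random.Random(seed)

    def take(nodes, ratio):
        nodes = sorted(nodes)
        if not nodes:
            return []
        # Assumed rule: round to the nearest count, but take at least one node.
        k = min(len(nodes), max(1, round(len(nodes) * ratio)))
        return rng.sample(nodes, k)

    frontier = take(initial_nodes, strategy["initial"])
    sampled = list(frontier)
    seen = set(frontier)
    for _ in range(rounds - 1):
        # Query the candidates that the previous round's nodes propagated to.
        reached = {
            child
            for parent in frontier
            for child in successors.get(parent, ())
            if child not in seen
        }
        frontier = []
        for kind in ("connection", "key", "ordinary"):
            of_kind = [n for n in reached if node_types[n] == kind]
            frontier.extend(take(of_kind, strategy[kind]))
        if not frontier:
            break
        sampled.extend(frontier)
        seen.update(frontier)
    return sampled


successors = {"s": ["a", "b"], "a": ["c"], "b": ["d"], "c": ["e"]}
node_types = {"a": "connection", "b": "key", "c": "ordinary", "d": "ordinary", "e": "key"}
full = {"initial": 1.0, "connection": 1.0, "key": 1.0, "ordinary": 1.0}
result = sample_rounds(["s"], successors, node_types, full, rounds=3)
print(result)
```

Because each round draws only from nodes reached by the previous round, every collected node other than an initial propagation node has a collected predecessor, which is the continuity property described above.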
In some optional implementations of this embodiment, the path generation module 306 includes a node adding submodule, a display setting submodule, and a node connection submodule, wherein:
The node adding submodule is configured to add the collected initial propagation nodes and candidate sampling nodes to the initial path graph according to the propagation record information.
The display setting submodule is configured to set the display modes of the initial propagation nodes and of the connection nodes, key nodes, and ordinary nodes among the candidate sampling nodes in the initial path graph.
The node connection submodule is configured to connect the initial propagation nodes, connection nodes, key nodes, and ordinary nodes whose display modes have been set, to obtain the information propagation path graph.
In this embodiment, the collected initial propagation nodes and candidate sampling nodes are added to the initial path graph according to the propagation record information, and the initial propagation nodes and the connection nodes, key nodes, and ordinary nodes among the candidate sampling nodes are given distinct display modes so that they can be told apart. This ensures that the generated information propagation path graph shows the propagation path more accurately and clearly.
To solve the above technical problem, an embodiment of this application also provides a computer device. Refer to FIG. 8, which is a block diagram of the basic structure of the computer device of this embodiment.
The computer device 4 includes a memory 41, a processor 42, and a network interface 43 that are communicatively connected to one another through a system bus. Note that the figure shows only the computer device 4 with components 41-43; it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead. Those skilled in the art will understand that the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), and embedded devices.
The computer device may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. It can interact with a user through a keyboard, a mouse, a remote control, a touchpad, a voice-control device, or the like.
The memory 41 includes at least one type of computer-readable storage medium, which may be non-volatile or volatile. The computer-readable storage medium includes flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc, and the like. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as its hard disk or internal memory. In other embodiments, the memory 41 may be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the computer device 4. The memory 41 may also include both an internal storage unit of the computer device 4 and an external storage device. In this embodiment, the memory 41 is generally used to store the operating system and the application software installed on the computer device 4, such as the computer-readable instructions of the information propagation path analysis method. The memory 41 can also be used to temporarily store data that has been output or is to be output.
In some embodiments, the processor 42 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 42 is generally used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to run the computer-readable instructions stored in the memory 41 or to process data, for example to run the computer-readable instructions of the information propagation path analysis method.
The network interface 43 may include a wireless network interface or a wired network interface, and is generally used to establish a communication connection between the computer device 4 and other electronic devices.
The computer device provided in this embodiment can execute the steps of the information propagation path analysis method described above, which may be the steps of the method in any of the foregoing embodiments.
In this embodiment, the network is first divided into at least one social network according to the propagation record information, with each social network propagating information independently of the others, and the candidate sampling nodes and the initial propagation node of each social network are determined. The node composition ratio of the three node types among the candidate sampling nodes (connection nodes, key nodes, and ordinary nodes) is then calculated from the node type identifiers of the candidate sampling nodes. Initial propagation nodes and candidate sampling nodes are sampled according to a sampling strategy that is determined jointly by the node composition ratio and a preset node association relationship, in which connection nodes are associated with key nodes and key nodes are associated with ordinary nodes. This allows all node types to be sampled in a balanced way and improves the accuracy of node sampling. The information propagation path graph, which presents the path along which information propagates through the network, is built from the collected candidate sampling nodes and initial propagation nodes, which ensures the accuracy of the propagation path analysis.
This application also provides another embodiment, namely a computer-readable storage medium that stores computer-readable instructions for information propagation path analysis. The instructions can be executed by at least one processor to cause the at least one processor to perform the steps of the computer-readable instructions for information propagation path analysis described above.
In this embodiment, the network is first divided into at least one social network according to the propagation record information, with each social network propagating information independently of the others, and the candidate sampling nodes and the initial propagation node of each social network are determined. The node composition ratio of the three node types among the candidate sampling nodes (connection nodes, key nodes, and ordinary nodes) is then calculated from the node type identifiers of the candidate sampling nodes. Initial propagation nodes and candidate sampling nodes are sampled according to a sampling strategy that is determined jointly by the node composition ratio and a preset node association relationship, in which connection nodes are associated with key nodes and key nodes are associated with ordinary nodes. This allows all node types to be sampled in a balanced way and improves the accuracy of node sampling. The information propagation path graph, which presents the path along which information propagates through the network, is built from the collected candidate sampling nodes and initial propagation nodes, which ensures the accuracy of the propagation path analysis.
From the description of the above embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, or by hardware alone, although in many cases the former is the better implementation. On this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied as a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, magnetic disk, or optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, computer, server, air conditioner, network device, or the like) to execute the methods described in the embodiments of this application.
Clearly, the embodiments described above are only some, not all, of the embodiments of this application. The drawings show preferred embodiments of this application but do not limit its patent scope. This application can be implemented in many different forms; the embodiments are provided so that the disclosure of this application is understood more thoroughly and comprehensively. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing specific embodiments or replace some of their technical features with equivalents. Any equivalent structure made using the contents of the description and drawings of this application, and applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of this application.

Claims (20)

  1. An information propagation path analysis method, comprising the following steps:
    acquiring propagation record information of a network;
    dividing the network into at least one social network according to the propagation record information, and determining candidate sampling nodes and an initial propagation node in each social network;
    calculating, from node type identifiers of the candidate sampling nodes, a node composition ratio of connection nodes, key nodes, and ordinary nodes among the candidate sampling nodes;
    determining a sampling strategy based on the calculated node composition ratio and a preset node association relationship, the node association relationship including connection nodes being associated with key nodes and key nodes being associated with ordinary nodes;
    sampling the initial propagation nodes and the candidate sampling nodes according to the sampling strategy; and
    visually presenting the collected initial propagation nodes and candidate sampling nodes to generate an information propagation path graph.
  2. The information propagation path analysis method according to claim 1, wherein the step of dividing the network into at least one social network according to the propagation record information, and determining candidate sampling nodes and an initial propagation node in each social network, specifically comprises:
    initializing the node identifier of each node in the propagation record information as the community label of that node;
    for each node in the network, determining the minimum community label among the community labels of the node and the neighboring nodes of the node;
    updating the community label of the node to the determined minimum community label, so as to iteratively update the community label of each node;
    when the community label of each node no longer changes, dividing the network into at least one social network according to the community label of each node; and
    according to the propagation times in the propagation record information, determining the node with the earliest propagation time in each social network as the initial propagation node of that social network, and determining the nodes in each social network that do not have the earliest propagation time as candidate sampling nodes.
  3. The information propagation path analysis method according to claim 1, wherein, before the step of calculating the node composition ratio of connecting nodes, key nodes, and ordinary nodes among the candidate sampling nodes by means of the node type identifiers of the candidate sampling nodes, the method further comprises:
    determining the node propagation count of each candidate sampling node according to the propagation record information;
    determining the node type of each candidate sampling node according to its node propagation count; and
    adding, to each candidate sampling node, a node type identifier corresponding to its node type.
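As a rough illustration of claim 3 and of the composition ratio it feeds into, the following Python sketch counts how many times each candidate node propagated and tags it with a type. The claims do not state how counts map to types, so the thresholds `key_min` and `connecting_min`, the ordering (higher counts mean connecting nodes), and the function names are hypothetical.

```python
from collections import Counter

NODE_TYPES = ("connecting", "key", "ordinary")


def tag_node_types(records, candidates, key_min=3, connecting_min=10):
    """Tag each candidate node with a type based on its propagation count.

    records: iterable of (source, target, time) tuples (hypothetical format).
    key_min / connecting_min: hypothetical thresholds, not from the source.
    """
    out_count = Counter(src for src, _dst, _t in records)
    tags = {}
    for node in candidates:
        count = out_count[node]  # Counter returns 0 for nodes that never sent
        if count >= connecting_min:
            tags[node] = "connecting"
        elif count >= key_min:
            tags[node] = "key"
        else:
            tags[node] = "ordinary"
    return tags


def composition_ratio(tags):
    """Share of each node type among the tagged candidate nodes."""
    counts = Counter(tags.values())
    total = len(tags)
    return {kind: (counts[kind] / total if total else 0.0) for kind in NODE_TYPES}
```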
  4. The information propagation path analysis method according to claim 1, wherein the step of determining the sampling strategy based on the calculated node composition ratio and the preset node association relationship specifically comprises:
    setting the sampling ratio of the initial propagation nodes in the first round of sampling according to a preset sampling ratio;
    based on the calculated node composition ratio and the preset node association relationship, determining the sampling ratios of the connecting nodes, the key nodes, and the ordinary nodes in each round of sampling after the first round; and
    taking the determined per-round node sampling ratios as the sampling strategy.
  5. The information propagation path analysis method according to claim 4, wherein the step of determining, based on the calculated node composition ratio and the preset node association relationship, the sampling ratios of the connecting nodes, the key nodes, and the ordinary nodes in each round of sampling after the first round specifically comprises:
    based on the calculated node composition ratio and the preset node association relationship, initializing the sampling ratios of the connecting nodes, the key nodes, and the ordinary nodes in each round of sampling after the first round;
    comparing the node composition ratios with a preset ratio threshold; and
    when a node composition ratio is smaller than the ratio threshold, adjusting the initialized sampling ratios according to a preset adjustment value.
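The strategy construction of claims 4 and 5 could look roughly like the Python sketch below. The claims leave the initialization rule and the direction of the adjustment open. This sketch assumes every type starts at a common `base_ratio` and that a type whose share falls below the threshold is sampled more densely. All parameter names and default values are hypothetical.

```python
def build_strategy(composition, first_round_ratio=1.0, base_ratio=0.5,
                   ratio_threshold=0.1, adjustment=0.25):
    """Build per-round sampling ratios from the node composition ratio.

    composition: {node_type: share of that type among candidate nodes}.
    All numeric defaults are illustrative assumptions.
    """
    # Initialize the later-round sampling ratio of every node type.
    later_rounds = {node_type: base_ratio for node_type in composition}
    for node_type, share in composition.items():
        # Assumption: scarce types are raised so they are not lost by sampling.
        if share < ratio_threshold:
            later_rounds[node_type] = min(1.0, later_rounds[node_type] + adjustment)
    return {"first_round": first_round_ratio, "later_rounds": later_rounds}
```

For example, with a composition of 5% connecting, 25% key, and 70% ordinary nodes and the defaults above, only the connecting nodes fall below the 10% threshold, so their later-round ratio becomes 0.75 while the other two stay at 0.5.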
  6. The information propagation path analysis method according to claim 4, wherein the step of sampling the initial propagation nodes and the candidate sampling nodes according to the sampling strategy specifically comprises:
    in the first round of sampling, sampling the initial propagation nodes according to the sampling strategy;
    in each round of sampling after the first round, querying, according to the propagation record information, the candidate sampling nodes to which the nodes sampled in the previous round propagated; and
    sampling the queried candidate sampling nodes according to the sampling strategy.
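The round-by-round sampling of claim 6 can be sketched in Python as follows. This is an illustration under assumptions: the record format `(source, target, time)`, the strategy layout `{"first_round": r, "later_rounds": {type: r}}`, the fixed number of rounds, and rounding the sample size up are choices made here, not details from the source.

```python
import math
import random
from collections import defaultdict


def sample_rounds(records, initial_nodes, tags, strategy, rounds=3, seed=0):
    """Sample initial nodes first, then follow propagation round by round.

    tags: {node: node_type}; strategy: hypothetical dict with the keys
    "first_round" (ratio) and "later_rounds" ({node_type: ratio}).
    """
    rng = random.Random(seed)
    targets = defaultdict(set)
    for src, dst, _t in records:
        targets[src].add(dst)

    def pick(nodes, ratio):
        ordered = sorted(nodes)  # sort for reproducible draws
        return set(rng.sample(ordered, math.ceil(len(ordered) * ratio)))

    # First round: sample among the initial propagation nodes.
    sampled = pick(initial_nodes, strategy["first_round"])
    frontier = set(sampled)
    for _ in range(rounds - 1):
        # Candidate nodes reached from the nodes sampled in the previous round.
        reached = {d for n in frontier for d in targets.get(n, ())} - sampled
        if not reached:
            break
        frontier = set()
        for node_type, ratio in strategy["later_rounds"].items():
            same_type = [n for n in reached if tags.get(n) == node_type]
            frontier |= pick(same_type, ratio)
        sampled |= frontier
    return sampled
```

Only nodes reachable from the previous round's sample are considered in each later round, so the sampled set stays connected along recorded propagation edges.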
  7. The information propagation path analysis method according to claim 1, wherein the step of visually presenting the sampled initial propagation nodes and candidate sampling nodes to generate an information propagation path graph specifically comprises:
    adding the sampled initial propagation nodes and candidate sampling nodes to an initial path graph according to the propagation record information;
    setting the display modes of the initial propagation nodes in the initial path graph and of the connecting nodes, key nodes, and ordinary nodes among the candidate sampling nodes; and
    connecting the configured initial propagation nodes, connecting nodes, key nodes, and ordinary nodes to obtain the information propagation path graph.
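One way to picture the graph-generation steps of claim 7 is to emit Graphviz DOT text, as in the Python sketch below. The source does not name a rendering tool or any display attributes. The DOT output, the `STYLE` table, and the function name `to_dot` are assumptions chosen for illustration.

```python
# Hypothetical display settings per node kind (not taken from the source).
STYLE = {
    "initial": "shape=doublecircle, color=red",
    "connecting": "shape=box, color=orange",
    "key": "shape=ellipse, color=blue",
    "ordinary": "shape=point, color=gray",
}


def to_dot(records, sampled, initial_nodes, tags):
    """Render the sampled nodes and their propagation edges as DOT text.

    records: iterable of (source, target, time) tuples (hypothetical format).
    tags: {node: node_type} for candidate nodes.
    """
    lines = ["digraph propagation {"]
    # Add every sampled node with the display mode of its kind.
    for node in sorted(sampled):
        kind = "initial" if node in initial_nodes else tags.get(node, "ordinary")
        lines.append(f'  "{node}" [{STYLE[kind]}];')
    # Connect sampled nodes along recorded propagation, in time order.
    seen = set()
    for src, dst, _t in sorted(records, key=lambda r: r[2]):
        if src in sampled and dst in sampled and (src, dst) not in seen:
            seen.add((src, dst))
            lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)
```

The returned text can be saved to a file and rendered with any DOT-compatible viewer. Edges to nodes that were not sampled are omitted, which keeps the graph readable for large networks.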
  8. An information propagation path analysis apparatus, comprising:
    an information acquisition module, configured to acquire propagation record information of a network;
    a network division module, configured to divide the network into at least one community network according to the propagation record information, and to determine candidate sampling nodes and the initial propagation nodes in the respective community networks;
    a ratio calculation module, configured to calculate the node composition ratio of connecting nodes, key nodes, and ordinary nodes among the candidate sampling nodes by means of the node type identifiers of the candidate sampling nodes;
    a strategy determination module, configured to determine a sampling strategy based on the calculated node composition ratio and a preset node association relationship, wherein the node association relationship comprises connecting nodes being associated with key nodes, and key nodes being associated with ordinary nodes;
    a node sampling module, configured to sample the initial propagation nodes and the candidate sampling nodes according to the sampling strategy; and
    a path generation module, configured to visually present the sampled initial propagation nodes and candidate sampling nodes to generate an information propagation path graph.
  9. A computer device, comprising a memory and a processor, wherein the memory stores computer-readable instructions, and the processor implements the following steps when executing the computer-readable instructions:
    acquiring propagation record information of a network;
    dividing the network into at least one community network according to the propagation record information, and determining candidate sampling nodes and the initial propagation nodes in the respective community networks;
    calculating the node composition ratio of connecting nodes, key nodes, and ordinary nodes among the candidate sampling nodes by means of the node type identifiers of the candidate sampling nodes;
    determining a sampling strategy based on the calculated node composition ratio and a preset node association relationship, wherein the node association relationship comprises connecting nodes being associated with key nodes, and key nodes being associated with ordinary nodes;
    sampling the initial propagation nodes and the candidate sampling nodes according to the sampling strategy; and
    visually presenting the sampled initial propagation nodes and candidate sampling nodes to generate an information propagation path graph.
  10. The computer device according to claim 9, wherein the step of dividing the network into at least one community network according to the propagation record information, and determining candidate sampling nodes and the initial propagation nodes in the respective community networks, specifically comprises:
    initializing the node identifier of each node in the propagation record information as the community label of that node;
    for each node in the network, determining a minimum community label from among the community labels of the node and of the node's neighboring nodes;
    updating the community label of the node to the determined minimum community label, so as to iteratively update the community label of each node;
    when the community label of each node no longer changes, dividing the network into at least one community network according to the community label of each node; and
    according to the propagation times in the propagation record information, determining the node with the earliest propagation time in each community network as the initial propagation node of that community network, and determining the nodes in each community network that do not have the earliest propagation time as candidate sampling nodes.
  11. The computer device according to claim 9, wherein, before the step of calculating the node composition ratio of connecting nodes, key nodes, and ordinary nodes among the candidate sampling nodes by means of the node type identifiers of the candidate sampling nodes, the processor further implements the following steps when executing the computer-readable instructions:
    determining the node propagation count of each candidate sampling node according to the propagation record information;
    determining the node type of each candidate sampling node according to its node propagation count; and
    adding, to each candidate sampling node, a node type identifier corresponding to its node type.
  12. The computer device according to claim 9, wherein the step of determining the sampling strategy based on the calculated node composition ratio and the preset node association relationship specifically comprises:
    setting the sampling ratio of the initial propagation nodes in the first round of sampling according to a preset sampling ratio;
    based on the calculated node composition ratio and the preset node association relationship, determining the sampling ratios of the connecting nodes, the key nodes, and the ordinary nodes in each round of sampling after the first round; and
    taking the determined per-round node sampling ratios as the sampling strategy.
  13. The computer device according to claim 12, wherein the step of determining, based on the calculated node composition ratio and the preset node association relationship, the sampling ratios of the connecting nodes, the key nodes, and the ordinary nodes in each round of sampling after the first round specifically comprises:
    based on the calculated node composition ratio and the preset node association relationship, initializing the sampling ratios of the connecting nodes, the key nodes, and the ordinary nodes in each round of sampling after the first round;
    comparing the node composition ratios with a preset ratio threshold; and
    when a node composition ratio is smaller than the ratio threshold, adjusting the initialized sampling ratios according to a preset adjustment value.
  14. The computer device according to claim 12, wherein the step of sampling the initial propagation nodes and the candidate sampling nodes according to the sampling strategy specifically comprises:
    in the first round of sampling, sampling the initial propagation nodes according to the sampling strategy;
    in each round of sampling after the first round, querying, according to the propagation record information, the candidate sampling nodes to which the nodes sampled in the previous round propagated; and
    sampling the queried candidate sampling nodes according to the sampling strategy.
  15. A computer-readable storage medium storing computer-readable instructions, wherein the computer-readable instructions, when executed by a processor, implement the following steps:
    acquiring propagation record information of a network;
    dividing the network into at least one community network according to the propagation record information, and determining candidate sampling nodes and the initial propagation nodes in the respective community networks;
    calculating the node composition ratio of connecting nodes, key nodes, and ordinary nodes among the candidate sampling nodes by means of the node type identifiers of the candidate sampling nodes;
    determining a sampling strategy based on the calculated node composition ratio and a preset node association relationship, wherein the node association relationship comprises connecting nodes being associated with key nodes, and key nodes being associated with ordinary nodes;
    sampling the initial propagation nodes and the candidate sampling nodes according to the sampling strategy; and
    visually presenting the sampled initial propagation nodes and candidate sampling nodes to generate an information propagation path graph.
  16. The computer-readable storage medium according to claim 15, wherein the step of dividing the network into at least one community network according to the propagation record information, and determining candidate sampling nodes and the initial propagation nodes in the respective community networks, specifically comprises:
    initializing the node identifier of each node in the propagation record information as the community label of that node;
    for each node in the network, determining a minimum community label from among the community labels of the node and of the node's neighboring nodes;
    updating the community label of the node to the determined minimum community label, so as to iteratively update the community label of each node;
    when the community label of each node no longer changes, dividing the network into at least one community network according to the community label of each node; and
    according to the propagation times in the propagation record information, determining the node with the earliest propagation time in each community network as the initial propagation node of that community network, and determining the nodes in each community network that do not have the earliest propagation time as candidate sampling nodes.
  17. The computer-readable storage medium according to claim 15, wherein, before the step of calculating the node composition ratio of connecting nodes, key nodes, and ordinary nodes among the candidate sampling nodes by means of the node type identifiers of the candidate sampling nodes, the computer-readable instructions, when executed by the processor, further implement the following steps:
    determining the node propagation count of each candidate sampling node according to the propagation record information;
    determining the node type of each candidate sampling node according to its node propagation count; and
    adding, to each candidate sampling node, a node type identifier corresponding to its node type.
  18. The computer-readable storage medium according to claim 15, wherein the step of determining the sampling strategy based on the calculated node composition ratio and the preset node association relationship specifically comprises:
    setting the sampling ratio of the initial propagation nodes in the first round of sampling according to a preset sampling ratio;
    based on the calculated node composition ratio and the preset node association relationship, determining the sampling ratios of the connecting nodes, the key nodes, and the ordinary nodes in each round of sampling after the first round; and
    taking the determined per-round node sampling ratios as the sampling strategy.
  19. The computer-readable storage medium according to claim 18, wherein the step of determining, based on the calculated node composition ratio and the preset node association relationship, the sampling ratios of the connecting nodes, the key nodes, and the ordinary nodes in each round of sampling after the first round specifically comprises:
    based on the calculated node composition ratio and the preset node association relationship, initializing the sampling ratios of the connecting nodes, the key nodes, and the ordinary nodes in each round of sampling after the first round;
    comparing the node composition ratios with a preset ratio threshold; and
    when a node composition ratio is smaller than the ratio threshold, adjusting the initialized sampling ratios according to a preset adjustment value.
  20. The computer-readable storage medium according to claim 18, wherein the step of sampling the initial propagation nodes and the candidate sampling nodes according to the sampling strategy specifically comprises:
    in the first round of sampling, sampling the initial propagation nodes according to the sampling strategy;
    in each round of sampling after the first round, querying, according to the propagation record information, the candidate sampling nodes to which the nodes sampled in the previous round propagated; and
    sampling the queried candidate sampling nodes according to the sampling strategy.
PCT/CN2021/096857 2020-06-24 2021-05-28 Information propagation path analysis method and apparatus, and computer device and storage medium WO2021258998A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010592379.5 2020-06-24
CN202010592379.5A CN111814065B (en) 2020-06-24 2020-06-24 Information propagation path analysis method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2021258998A1 (en) 2021-12-30

Family

ID=72856507

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/096857 WO2021258998A1 (en) 2020-06-24 2021-05-28 Information propagation path analysis method and apparatus, and computer device and storage medium

Country Status (2)

Country Link
CN (1) CN111814065B (en)
WO (1) WO2021258998A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117811992A (en) * 2024-02-29 2024-04-02 山东海量信息技术研究院 Network bad information propagation inhibition method, device, equipment and storage medium

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111814065B (en) * 2020-06-24 2022-05-06 平安科技(深圳)有限公司 Information propagation path analysis method and device, computer equipment and storage medium
CN114205290B (en) * 2021-12-10 2022-11-25 中国电子科技集团公司第十五研究所 Data processing method and device for behavioral propagation
CN116882522B (en) * 2023-09-07 2023-11-28 湖南视觉伟业智能科技有限公司 Distributed space-time mining method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040199881A1 (en) * 2003-04-01 2004-10-07 Indradeep Ghosh Evaluating a validation vector for validating a network design
CN104092598A (en) * 2014-07-03 2014-10-08 厦门欣欣信息有限公司 Message propagation path extraction method and system
CN110730128A (en) * 2018-11-05 2020-01-24 哈尔滨安天科技集团股份有限公司 Information propagation path processing method and device, electronic equipment and storage medium
CN110955846A (en) * 2018-09-26 2020-04-03 北京国双科技有限公司 Propagation path diagram generation method and device
CN111814065A (en) * 2020-06-24 2020-10-23 平安科技(深圳)有限公司 Information propagation path analysis method and device, computer equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105991397B (en) * 2015-02-04 2020-03-03 阿里巴巴集团控股有限公司 Information dissemination method and device
CN108989105B (en) * 2018-07-16 2021-09-07 创新先进技术有限公司 Propagation path diagram generation method and device and server
CN110837608B (en) * 2019-11-07 2024-04-12 中科天玑数据科技股份有限公司 Public opinion topic propagation path analysis system and method based on multi-source data

Also Published As

Publication number Publication date
CN111814065A (en) 2020-10-23
CN111814065B (en) 2022-05-06

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21830287; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21830287; Country of ref document: EP; Kind code of ref document: A1)