WO2024108913A1 - Method, system, electronic device and storage medium for determining an information propagation source

Info

Publication number
WO2024108913A1
WO2024108913A1 PCT/CN2023/093272 CN2023093272W WO2024108913A1 WO 2024108913 A1 WO2024108913 A1 WO 2024108913A1 CN 2023093272 W CN2023093272 W CN 2023093272W WO 2024108913 A1 WO2024108913 A1 WO 2024108913A1
Authority
WO
WIPO (PCT)
Prior art keywords
infected
propagation
information
determining
decision
Prior art date
Application number
PCT/CN2023/093272
Other languages
English (en)
French (fr)
Inventor
胡奇夫
李茹杨
邓琪
赵雅倩
李仁刚
Original Assignee
浪潮(北京)电子信息产业有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浪潮(北京)电子信息产业有限公司
Publication of WO2024108913A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/40 Network security protocols

Definitions

  • the present application relates to the field of data analysis technology, and in particular to a method, system, electronic device and storage medium for determining an information propagation source.
  • tracing the source of transmission is an important means to block the spread.
  • all devices in a computer intranet form a group, and the external environment is composed of all factors that may introduce viruses into the intranet.
  • the device that introduces the virus from the external environment is the source of transmission.
  • after tracing back to the source of transmission, the potential loopholes can be analyzed to find the mechanism by which the virus entered the intranet from the external network, thereby preventing the virus from invading from the external network again.
  • a tracing algorithm based on structural information entropy is usually used to determine the source of transmission, but its tracing accuracy is low.
  • the purpose of this application is to provide a method, system, electronic device and storage medium for determining an information propagation source, which can improve the accuracy of tracing the information propagation source.
  • the present application provides a method for determining an information propagation source, the method for determining an information propagation source comprising:
  • the candidate set includes all devices that are in an infected state at a deadline t_e, and the devices in an infected state are devices that have received target information;
  • the i-th communication event is that device u_i transmits data to device v_i at time t_i;
  • determining the candidate set according to the propagation source tracing request includes:
  • constructing a candidate set based on the target information and the deadline t_e includes:
  • determining the starting time t_s at which no device in the candidate set is in an infected state includes:
  • the device scanning records are traced back from the deadline t_e as the starting point, and the starting time t_s at which no device in the candidate set is in an infected state is determined by using the device scanning records.
  • before using the intelligent agent to perform the decision operation on each communication event in chronological order, the method further includes:
  • the environment state is updated.
  • using the agent to perform decision operations on each communication event in chronological order to obtain the propagation trajectory of the target information includes:
  • the alternative trajectory with a total benefit greater than the preset benefit value is set as the propagation trajectory of the target information.
  • the total benefit of the alternative trajectory is calculated according to the total benefit calculation formula, including:
  • in the total benefit calculation formula, r_total represents the total benefit, K represents the total number of communication events, r_i describes the probability of a device being infected during the time period from t_{i-1} to t_i, r_end describes the probability that a device in the candidate set that has not been determined by the agent to be infected is infected by external factors during the time period from t_K to t_e, and r_penalty represents the misjudgment penalty value.
  • the first-category benefit value r_{2i-1} is positively correlated with the probability that a device in the candidate set is infected by the external infection source during the time period from t_{i-1} to t_i;
  • the second-category benefit value r_{2i} is positively correlated with the probability that the target information is propagated through the i-th communication event at time t_i;
  • the logarithm of the probability that a device in the candidate set that is not judged as infected by the agent is infected by external factors during the time period from t_K to t_e is used as the third-category benefit value r_end;
  • the number of devices that are not in the candidate set and are determined by the agent to be in an infected state is set as the number of misjudgments;
  • before using the intelligent agent to perform decision operations on each communication event in chronological order, the method further includes:
  • training the agent's decision network;
  • the decision network is obtained by combining the Structure2Vector model and the recurrent neural network;
  • the input required for training the decision network includes: communication logs, the agent's initial policy parameters, the number of iterations and the number of propagation trajectories generated by the agent during each iteration, and hyperparameters for calculating the benchmark return;
  • after training the agent's decision network, the method further includes:
  • the trained agent is used for policy reasoning so that the trained agent can be used to perform decision operations on each communication event in chronological order.
  • querying all communication events of all devices from the starting time t_s to the deadline t_e includes:
  • the present application also provides a system for determining an information propagation source, the system comprising:
  • a candidate set determination module, used to receive a propagation source tracing request and determine a candidate set according to the propagation source tracing request; wherein the candidate set includes all devices in an infected state at the deadline t_e, and the devices in an infected state are devices that have received the target information;
  • a time tracing module, used to determine the starting time t_s at which no device in the candidate set is in an infected state;
  • the propagation source determination module is used to determine the information propagation source according to the propagation trajectory.
  • the present application also provides a non-volatile readable storage medium on which a computer program is stored, and when the computer program is executed, the steps of the method for determining the information propagation source described above are implemented.
  • the present application also provides an electronic device, including a memory and a processor, wherein a computer program is stored in the memory, and when the processor calls the computer program in the memory, the steps of the method for determining the source of information propagation described above are implemented.
  • the present application provides a method for determining an information propagation source, comprising: receiving a propagation source tracing request, and determining a candidate set according to the propagation source tracing request; wherein the candidate set includes all devices in an infected state at a deadline t_e, and the devices in an infected state are devices that have received target information; determining a starting time t_s at which no device in the candidate set is in an infected state; querying all communication events of all devices from the starting time t_s to the deadline t_e, and using an intelligent agent to perform a decision operation on each communication event in chronological order to obtain a propagation trajectory of the target information; wherein the i-th communication event is that device u_i transmits data to device v_i at time t_i, and the decision operation corresponding to the i-th communication event includes: determining whether the devices in the candidate set are infected by an external infection source within the time period from t_{i-1} to t_i.
  • the present application determines a candidate set according to the received propagation source tracing request, determines the starting time t_s and the deadline t_e that bound the propagation trajectory, and queries all communication events of all devices from t_s to t_e.
  • the decision operation also includes determining whether v_i is infected by u_i and becomes infected at time t_i.
  • the above method can discretize the decision process in the continuous time domain into decisions at time periods and time points, and then obtain the propagation trajectory and information propagation source of the target information.
  • the above method combines the communication time between devices and the propagation characteristics of information to determine the information propagation source of the target information, which can improve the tracing accuracy of the information propagation source.
  • the present application also provides a system for determining an information transmission source, a non-volatile readable storage medium and an electronic device, which have the above-mentioned beneficial effects and will not be described in detail here.
  • FIG. 1 is a flow chart of a method for determining an information propagation source provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a device state at time t_e provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a device state in a candidate set provided in an embodiment of the present application.
  • FIG. 4 is a schematic diagram of the status of a device not in a candidate set provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of the structure of a system for determining an information propagation source provided in an embodiment of the present application.
  • FIG. 6 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present application.
  • FIG. 7 is a schematic diagram of the structure of a non-volatile readable storage medium provided in an embodiment of the present application.
  • Figure 1 is a flow chart of a method for determining an information propagation source provided in an embodiment of the present application.
  • S101: Receive a propagation source tracing request, and determine a candidate set according to the propagation source tracing request;
  • this embodiment can be applied to a data analysis platform connected to a data center, and the data center includes multiple devices.
  • the equipment administrator can scan the data center according to a preset period (such as 3 days). If the equipment administrator detects that there is a device that has received the target information, a propagation source tracing request can be sent to the data analysis platform. For example, the equipment administrator scans the data center for viruses every 3 days. No virus was found during the scan on 2022-11-04, and the virus was found during the scan on 2022-11-07, indicating that the virus entered the data center between 2022-11-04 and 2022-11-07. The communication records between machines during this period are taken to perform the information propagation source determination operation of this embodiment.
  • the propagation source tracing request can be parsed to obtain the target information and the deadline t_e, and then a candidate set can be constructed based on the target information and the deadline t_e.
  • the above-mentioned deadline t_e can also be the time when the propagation source tracing request is received.
  • the candidate set includes all devices that are in an infected state at the deadline t_e; devices in the infected state are devices that have received the target information, and devices that are not in the infected state are devices that have not received the target information.
  • this step can determine the target devices that have received the target information at the deadline t_e, and then construct a candidate set containing all target devices.
  • the above-mentioned target information may be a virus, rumor or pollutant (such as pictures or audio carrying bad information).
  • the content or type of the target information is not specifically limited here.
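  • As a minimal sketch of the candidate set construction in S101, assuming the platform already holds a per-device flag saying whether the target information had been received at t_e (the dictionary layout and device names below are hypothetical, not taken from the application):

```python
from typing import Dict, Set


def build_candidate_set(infected_at_deadline: Dict[str, bool]) -> Set[str]:
    """Return every device flagged as infected (target information received) at t_e."""
    return {device for device, infected in infected_at_deadline.items() if infected}


# Hypothetical device states observed at the deadline t_e.
states_at_t_e = {"dev-a": True, "dev-b": False, "dev-c": True}
candidate_set = build_candidate_set(states_at_t_e)
```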
  • S102: Determine the starting time t_s at which no device in the candidate set is in an infected state;
  • the starting time t_s can be queried under the following constraint: t_s is earlier than t_e, and at t_s no device in the candidate set is in an infected state.
  • this step can trace back the device scanning records from the deadline t_e as the starting point, and use the device scanning records to determine the starting time t_s at which no device in the candidate set is in an infected state.
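  • The backward trace over scan records can be sketched as follows. The record layout (a scan time paired with the set of devices found infected in that scan) and the function name are assumptions made for illustration; the dates reuse the 3-day scan example given earlier.

```python
from datetime import datetime
from typing import List, Optional, Set, Tuple

# Assumed record layout: (scan time, devices found infected in that scan).
ScanRecord = Tuple[datetime, Set[str]]


def find_start_time(scans: List[ScanRecord], candidates: Set[str],
                    t_e: datetime) -> Optional[datetime]:
    """Walk the scan records backwards from t_e and return the most recent
    scan time at which no candidate device was infected (None if no such scan)."""
    for scan_time, infected in sorted(scans, key=lambda rec: rec[0], reverse=True):
        if scan_time < t_e and not (infected & candidates):
            return scan_time
    return None


scans = [
    (datetime(2022, 11, 1), set()),
    (datetime(2022, 11, 4), set()),
    (datetime(2022, 11, 7), {"dev-a", "dev-c"}),
]
t_s = find_start_time(scans, {"dev-a", "dev-c"}, datetime(2022, 11, 7))
```

  • Returning the most recent clean scan keeps the window between t_s and t_e as short as the scan history allows, which limits the number of communication events the agent has to process.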
  • S103: Query all communication events of all devices from the starting time t_s to the deadline t_e, and use the intelligent agent to perform decision operations on each communication event in chronological order to obtain the propagation trajectory of the target information;
  • the communication logs of all devices in the candidate set can be read, and all communication events from the starting time t_s to the deadline t_e are queried according to the communication logs.
  • all devices include devices in the candidate set and devices not in the candidate set.
  • the candidate set refers to all devices that are in the infected state at the deadline t_e.
  • the so-called "candidate" means that the device may be a transmission source.
  • this step may utilize an intelligent agent to perform a decision operation on each communication event, so as to determine the propagation trajectory of the target information according to the result of the decision operation.
  • the intelligent agent may be an information trajectory analysis model.
  • the i-th communication event is that device u_i transmits data to device v_i at time t_i.
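  • A minimal sketch of the event query in S103, assuming each log entry is a (u, v, t) triple with a numeric timestamp. The `CommEvent` type and the choice of a half-open window (t_s, t_e] are assumptions for illustration; the application does not state how the boundaries are treated.

```python
from typing import List, NamedTuple


class CommEvent(NamedTuple):
    u: str    # sending device u_i
    v: str    # receiving device v_i
    t: float  # time t_i of the transmission


def query_events(log: List[CommEvent], t_s: float, t_e: float) -> List[CommEvent]:
    """Keep the events with t_s < t <= t_e and sort them chronologically."""
    return sorted((e for e in log if t_s < e.t <= t_e), key=lambda e: e.t)


log = [
    CommEvent("dev-b", "dev-c", 5.0),
    CommEvent("dev-a", "dev-b", 2.0),
    CommEvent("dev-c", "dev-a", 9.5),  # after the deadline, dropped
]
events = query_events(log, t_s=0.0, t_e=8.0)
```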
  • the agent can obtain the benefit value of each alternative trajectory, and then determine the propagation trajectory of the target information according to the benefit value.
  • the above alternative trajectory is determined according to the decision of the agent on all communication events. If the agent determines that device u in the candidate set is infected by an external infection source, then u is the propagation source; if the agent determines that the target information is propagated from u to v, then u and v are adjacent in the alternative trajectory.
  • S104: Determine the information propagation source according to the propagation trajectory.
  • the propagation trajectory describes the order in which the target information is transmitted between the external environment and each device.
  • at least one propagation trajectory can be determined.
  • the device that introduces the target information from the external environment is the information propagation source.
  • the first device in the propagation trajectory can be set as the information propagation source.
  • a network security audit operation can be performed on the information propagation source.
  • This embodiment determines a candidate set according to the received propagation source tracing request, determines the starting time t_s and the deadline t_e that bound the propagation trajectory, and queries all communication events of the devices in the candidate set from t_s to t_e.
  • the decision operation also includes determining whether v_i is infected by u_i and becomes infected at time t_i.
  • the above method can discretize the decision process in the continuous time domain into decisions at time periods and time points, and then obtain the propagation trajectory and information propagation source of the target information.
  • the above method combines the communication time between devices and the propagation characteristics of information to determine the information propagation source of the target information, which can improve the tracing accuracy of the information propagation source.
  • an environment state may be set so that the agent performs the decision operation according to the environment state. Furthermore, after the agent performs a decision operation on a communication event, the environment state may be updated.
  • the process of using an agent to perform a decision operation to obtain a propagation trajectory includes the following steps:
  • Step 1: Use the agent to perform decision operations on each communication event in chronological order;
  • Step 2: Determine the benefit value of each decision operation and calculate the total benefit of the alternative trajectory according to the total benefit calculation formula, in which:
  • r_total represents the total benefit;
  • K represents the total number of communication events;
  • r_end describes the probability that a device in the candidate set that has not been determined by the agent to be infected is infected by an external factor during the time period from t_K to t_e;
  • r_penalty represents the misjudgment penalty value.
  • all factors that can introduce viruses into the system are external factors.
  • a server renter may introduce a virus by uploading a file
  • a device administrator may introduce a virus by using a USB flash drive infected with a virus.
  • the alternative trajectory with the largest total benefit can be used as the propagation trajectory of the target information; alternatively, any alternative trajectory whose total benefit is greater than a preset benefit value can be used as the propagation trajectory of the target information.
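  • Both selection rules can be sketched in a few lines. The trajectory labels and benefit values below are made up for illustration, and `select_trajectories` is a hypothetical helper rather than a function named in the application.

```python
from typing import List, Optional, Tuple

# (hypothetical trajectory label, total benefit of that alternative trajectory)
ScoredTrajectory = Tuple[str, float]


def select_trajectories(scored: List[ScoredTrajectory],
                        threshold: Optional[float] = None) -> List[ScoredTrajectory]:
    """Without a threshold, keep only the trajectory with the largest total benefit;
    with a threshold, keep every trajectory whose total benefit exceeds it."""
    if threshold is None:
        return [max(scored, key=lambda item: item[1])]
    return [item for item in scored if item[1] > threshold]


scored = [("traj-1", -4.2), ("traj-2", -2.7), ("traj-3", -3.1)]
best = select_trajectories(scored)
above = select_trajectories(scored, threshold=-3.5)
```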
  • r_i can be determined in the following way: determine the first-category benefit value r_{2i-1} and the second-category benefit value r_{2i}; wherein the first-category benefit value r_{2i-1} is positively correlated with the probability that a device in the candidate set is infected by an external infection source in the time period from t_{i-1} to t_i, and the second-category benefit value r_{2i} is positively correlated with the probability that the target information is propagated through the i-th communication event at time t_i.
  • the above embodiment may also determine r_end in the following manner: taking the logarithm of the probability that a device in the candidate set that is in an uninfected state is infected by external factors during the time period from t_K to t_e as the third-category benefit value r_end.
  • the above embodiment may also determine r_penalty in the following manner: after the agent performs decision operations on all communication events, the number of devices that are not in the candidate set and are determined by the agent to be in an infected state is set as the number of misjudgments; and the misjudgment penalty value r_penalty is determined according to the number of misjudgments.
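  • One way the three kinds of benefit values might be combined is sketched below. The additive form, the summation of logarithms over several leftover devices, and the linear penalty per misjudged device are all assumptions of this sketch; the text only states what each term describes and that r_penalty is determined from the number of misjudgments.

```python
import math
from typing import List


def total_benefit(step_benefits: List[float], p_end: List[float],
                  misjudged: int, penalty_per_device: float = 1.0) -> float:
    """Combine per-decision benefits, r_end and r_penalty into one score.

    step_benefits: the benefit values r_1 .. r_2K of the 2K decision operations.
    p_end: for each candidate device never judged infected, the probability
           that it is infected by external factors between t_K and t_e.
    misjudged: number of non-candidate devices judged infected by the agent.
    """
    r_end = sum(math.log(p) for p in p_end)       # ASSUMPTION: logs are summed
    r_penalty = -penalty_per_device * misjudged   # ASSUMPTION: linear penalty
    return sum(step_benefits) + r_end + r_penalty  # ASSUMPTION: additive form


r_total = total_benefit([math.log(0.5), math.log(0.5)], p_end=[0.5], misjudged=1)
```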
  • the agent's decision network can also be trained; wherein the decision network is obtained by combining the Structure2Vector model and the recurrent neural network.
  • before training the decision network of the agent, the method also includes: inputting communication logs, initial policy parameters of the agent, the number of iterations and the number of propagation trajectories generated by the agent during each iteration, and hyperparameters for calculating the benchmark return into the algorithm for training the decision network.
  • the above embodiment can obtain policy parameters through pre-training, and set them as the initial policy parameters of the agent.
  • the trained agent can also be used for policy inference, so that the trained agent performs decision operations on each communication event in chronological order.
  • viruses (computer viruses, biological viruses), rumors, and pollutants enter the group from the external environment and then spread within the group.
  • the infection pathways of any individual in the group can be divided into: (1) infection through contact with the external environment; (2) infection through contact with other infected individuals in the group.
  • the source of transmission refers to the individual who is infected through contact with the external environment.
  • the source tracing algorithm based on structural information entropy determines the source of information propagation.
  • the above-mentioned source tracing algorithm based on structural information entropy has the following disadvantages: (1) It ignores the contact time between individuals. In actual scenarios, obtaining contact time usually does not require additional cost. For example, a computer virus spreads through shared files, and the time when two devices share files can be easily obtained from logs. (2) The propagation characteristics of the virus are not utilized. For example, in an intranet, the more services that device u opens for external network access, the greater the probability that device u will become a source of transmission. The more files shared by device u and device v, the greater the probability of mutual infection. These propagation characteristics can be obtained through expert experience or estimated from historical data. (3) Artificially designed tracing strategies have limitations.
  • this embodiment proposes a traceability solution based on reinforcement learning, which can be used as an effective supplement to the existing traceability algorithms.
  • the traceability algorithm in this proposal improves the accuracy of traceability from the following two aspects: (1) The traceability problem is modeled as a sequential decision problem, and reinforcement learning is used to automatically learn the traceability strategy to avoid the limitations of manual strategies.
  • based on the S2V (Structure2Vector) model, this proposal introduces the Evolve-S2V model as the state perception module of the intelligent agent.
  • Evolve-S2V can dynamically generate a graph neural network in each decision operation to perceive the state in different decision operations.
  • (2) The reward function in the sequential decision problem is designed using the time when the contact occurs and the propagation characteristics of the virus.
  • This embodiment models the traceability problem as a sequential decision problem, incorporates the time of contact and the transmission characteristics of the virus into the reward mechanism to improve the traceability accuracy, and discretizes the decision-making process in the continuous time domain into decisions at time periods and time points.
  • This embodiment proposes a learning process and reasoning process for a traceability strategy; models the state as a graph structure, and proposes the Evolve-S2V algorithm based on the S2V algorithm as the perception module of the intelligent agent to handle finite-step decisions and variable-length state graph sequences; this embodiment also proposes a specific implementation plan for traceability.
  • Figure 2 is a schematic diagram of the device status at time t_e provided by an embodiment of the present application.
  • Figure 2 shows all external factors that can introduce viruses, all devices in an infected state at time t_e (the candidate set), and devices that are not infected at time t_e.
  • This embodiment uses the communication records between devices over a period of time and the status of all devices at t_e (infected or uninfected) to infer the source of transmission.
  • this embodiment uses an intelligent agent trained by reinforcement learning to make decisions on the status of the devices. Specifically, two decisions are made before and after each communication between devices. For example, in a certain communication, device v receives data from device u at time t, and the time elapsed since the previous communication event between devices is Δt; the two decision operations are then as follows:
  • Decision operation 1: Determine the status of the devices in the candidate set that have not been determined to be infected.
  • Figure 3 is a schematic diagram of the device status in a candidate set provided by an embodiment of the present application.
  • Figure 3 shows all external factors that can introduce viruses, all devices in an infected state at t_e (the candidate set), and devices that are not infected at t_e.
  • the gray circles in Figure 3 represent devices that were determined to be infected by the agent before the current decision operation, and the remaining circles represent devices that have not been determined to be infected.
  • the agent only considers devices in the candidate set that have not been determined to be infected, and determines whether these devices are infected by external factors in the Δt time period. For devices determined to be infected by external factors, their corresponding nodes are changed into gray circles.
  • Decision operation 2: Determine the state of device v. If device v has not been determined to be infected, and device u has been determined to be infected in a previous decision operation, the agent determines whether device v is infected by device u and becomes infected.
  • Figure 4 is a schematic diagram of the state of devices not in the candidate set provided by the embodiment of the present application. Figure 4 shows all external factors that can introduce viruses, all devices in the infected state at t_e (the candidate set), and devices that are not infected at t_e.
  • the above two decision operations are performed until decisions are made on all communication events. After completing this process, the transmission source is obtained.
  • the above process can be repeated multiple times to improve the accuracy of tracing.
  • the decision-making process of the agent is driven by communication events. For each event, the agent makes two decisions. Therefore, for K communication events, there are 2K decision operations. When the agent performs each decision operation, it judges the state change of the relevant devices. Therefore, the entire decision-making process of the agent corresponds to a transmission link of the virus. The total benefit r_total of this process grows with the probability of this transmission link occurring in the environment.
  • r_penalty represents the penalty for the agent judging an uninfected node as an infected node, that is, the misjudgment penalty over the entire decision-making process.
  • the communication event (u_i, v_i, t_i) represents the communication between device u_i and device v_i at time t_i.
  • t_i represents the time when the i-th communication event occurs.
  • let t_s be the time when no device in the data center is infected.
  • the decision-making process of the agent is as follows:
  • Decision 1: determine whether the nodes in the candidate set are infected by external infection sources in (t_s, t_1);
  • Decision 2: determine whether v_1 is infected by u_1 at time t_1;
  • Decision 3: determine whether the nodes in the candidate set are infected by external infection sources in (t_1, t_2);
  • Decision 4: determine whether v_2 is infected by u_2 at time t_2;
  • ...
  • Decision 2K-1: determine whether the nodes in the candidate set are infected by external infection sources in (t_{K-1}, t_K);
  • Decision 2K: determine whether v_K is infected by u_K at time t_K.
  • after completing the 2K decisions, for the nodes in I that have not been judged infected, the agent assumes that these nodes are infected by external factors in (t_K, t_e).
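  • The 2K-decision sequence above can be sketched as a loop over the sorted events. The two policy callables stand in for the trained agent and are purely hypothetical placeholders; the bookkeeping (sources, transmission edges, final infected set) is one possible representation, not the one prescribed by the application.

```python
from typing import Callable, List, Set, Tuple

Event = Tuple[str, str, float]  # (u_i, v_i, t_i)


def run_decisions(events: List[Event], candidates: Set[str], t_s: float,
                  external_policy: Callable[[Set[str], float, float], Set[str]],
                  contact_policy: Callable[[str, str, float], bool]):
    """Run decisions 1..2K, then attribute leftover candidates to external factors."""
    infected: Set[str] = set()
    sources: List[str] = []   # devices judged infected by external factors
    edges: List[Event] = []   # transmissions u -> v accepted by the agent
    t_prev = t_s
    for u, v, t in events:
        # Decision 2i-1: which not-yet-infected candidates were infected
        # from outside during (t_prev, t)?
        pending = candidates - infected
        newly = external_policy(pending, t_prev, t) & pending
        sources.extend(sorted(newly))
        infected |= newly
        # Decision 2i: does the contact at time t infect v?
        if u in infected and v not in infected and contact_policy(u, v, t):
            infected.add(v)
            edges.append((u, v, t))
        t_prev = t
    # After 2K decisions: remaining candidates are assumed infected externally
    # in (t_K, t_e).
    leftover = candidates - infected
    sources.extend(sorted(leftover))
    infected |= leftover
    return sources, edges, infected


# Toy policies (hypothetical): "a" is infected from outside as soon as it is
# pending, and every eligible contact transmits.
events = [("a", "b", 1.0), ("b", "c", 2.0)]
candidates = {"a", "b"}
sources, edges, infected = run_decisions(
    events, candidates, 0.0,
    external_policy=lambda pending, t0, t1: {"a"} if "a" in pending else set(),
    contact_policy=lambda u, v, t: True,
)
```

  • In this toy run device "c" ends up judged infected although it is not in the candidate set, which is exactly the kind of misjudgment that r_penalty is meant to discourage.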
  • O represents the group, that is, the set of all individuals;
  • Z_0 represents all external (relative to the population) factors that can introduce the virus into O;
  • t_s represents the initial time, at which no individual is infected;
  • t_e represents the moment when the existence of an infected individual is discovered;
  • I represents the set of all infected individuals at time t_e, i.e., the candidate set;
  • λ_u represents the rate at which u is infected through Z_0;
  • T_u represents the time point when u is infected through Z_0;
  • p_{u,v} represents the probability that v will be infected when u contacts v if u is in an infected state;
  • R represents the set of real numbers.
  • the calculation of the rate λ_u is related to the specific application scenario.
  • the infection rate of a server is related to the ports opened on the server, the user groups it serves, the frequency of user data upload, etc.
  • An individual can be in an infected state or a non-infected state. Once an individual is infected, they will remain in an infected state.
  • the contact is directed, that is, (u, v, t) means that u and v have a contact at time t (that is, a communication event has occurred), and the virus can only spread from u to v.
  • an undirected contact can be expressed by the two directed contacts (u, v, t) and (v, u, t).
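  • The expansion of undirected contacts into directed ones is mechanical; a minimal sketch follows (the function name is an illustrative choice):

```python
from typing import List, Tuple

Contact = Tuple[str, str, float]  # (u, v, t)


def to_directed(undirected: List[Contact]) -> List[Contact]:
    """Replace each undirected contact {u, v} at time t by (u, v, t) and (v, u, t)."""
    directed: List[Contact] = []
    for u, v, t in undirected:
        directed.append((u, v, t))
        directed.append((v, u, t))
    return directed


directed = to_directed([("a", "b", 3.0)])
```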
  • This embodiment models the traceability problem as a sequential decision problem and uses reinforcement learning to automatically learn the traceability strategy.
  • in reinforcement learning, the agent observes the state of the environment, selects an action based on the information obtained from the observation, and applies it to the environment; this changes the state of the environment and yields a corresponding benefit. The agent repeats this process until it reaches the terminal state. The goal of the agent is to maximize the cumulative benefit obtained during the interaction with the environment.
  • the decision-making task of the intelligent agent is to determine when an individual is infected, which is a decision-making problem in the continuous time domain.
  • the communication record Log contains a total of K contacts. Sort Log by the order of contact occurrence, so that t_i ≤ t_j whenever i < j. As mentioned above, there are two decision operations for each contact (communication event).
  • the edge set E describes the contacts between nodes in the current decision operation;
  • W: E → R, where W(e) represents the weight of edge e, which describes the probability of virus transmission through e;
  • the initial state G_1 is defined as follows:
  • V_1 = V
  • I 2i-1 ⁇ u: u ⁇ I
  • the agent chooses an action A_{2i-1} according to the policy π. Applying A_{2i-1} to the environment: all individuals in A_{2i-1} are judged to be infected.
  • the benefit obtained from this operation is: the probability that the virus spreads from Z_0 to the individuals in A_{2i-1} and does not spread to the individuals in I_{2i-1} \ A_{2i-1} in the time period [t_{i-1}, t_i].
  • the individuals in A_{2i-1} are explained with an example below:
  • if I_{2i-1} = {1, 2}, device 1 and device 2 have not yet been infected; the action set of the agent consists of the subsets of I_{2i-1}, so the agent's action is one of the following 4 types:
  • ∅ means that neither device is infected
  • {1} means device 1 is infected
  • {2} means device 2 is infected
  • {1, 2} means that both device 1 and device 2 are infected
  • the individuals in A_{2i-1} represent infected devices.
  • I_{2i-1} \ A_{2i-1} represents the set difference, that is, the devices in I_{2i-1} that are not in A_{2i-1}.
  • the agent decides whether to change vi to an infected state. (indicates that ui is in a non-infected state or vi is in an infected state), then the action set otherwise.
  • the agent chooses actions according to the policy ⁇ Apply A 2i to the environment: All individuals in A 2i are judged to be infected. The benefits of this operation are:
  • the benefit obtained when the agent judges vi as being in an infected state is the logarithm of the probability that the virus successfully infects vi ; the benefit obtained when the agent judges vi as being in a non-infected state is the logarithm of the probability that the virus fails to successfully infect vi .
  • a communication event has two decisions, and i is the index of the communication event.
  • the state observed in the first decision is G 1 (ie, G 2i-1 ), and the state observed in the second decision is G 2 (ie, G 2i ).
  • the state observed in the first decision is G 3 (ie, G 2i-1 ), and the state observed in the second decision is G 4 (ie, G 2i ).
  • V_{2i+1} = V
  • l: O \ I → R^+ is a pre-given penalty function
  • l(u) represents the penalty for judging an uninfected individual u as infected.
  • this proposal transforms the decision-making problem in the continuous time domain into a decision-making problem in discrete time steps through decisions on time periods and decisions on time points, simplifying the calculation process.
  • the decision-making process of the agent is the process of generating a propagation trajectory.
  • Each propagation trajectory may have one or more propagation sources.
  • the process of generating a propagation trajectory based on the above decision-making process is as follows:
  • A propagation trajectory τ consists of one or more infection trees.
  • The root node of each infection tree is a propagation source, that is, a node that the agent decided was infected through Z_0 in some time period [t_{i-1}, t_i].
  • An edge (u, v) is in τ if and only if the agent determines that the virus spread from u to v through the contact (u, v, t).
  • When there are multiple propagation sources, a propagation trajectory contains multiple infection trees.
  • the total benefit obtained by the agent is related to the following two aspects:
  • The interaction process between the agent and the environment described in the previous section (in other words, the process of generating propagation trajectories) is called the function GenTrajectories, whose input is π_θ and whose output is the trajectory T consisting of one or more infection trees, and the reward r corresponding to T. Since π_θ is a stochastic policy, the output of each call to GenTrajectories may be different.
  • the strategy training process is as follows:
  • θ_0 represents the initial value of parameter θ
  • IterNum indicates the number of iterations
  • BatchSize indicates the number of times GenTrajectories is called in each iteration
  • GenTrajectories represents one decision-making process of the agent.
  • α ∈ (0, 1) represents the hyperparameter for calculating the benchmark return.
  • Step 2.1: For i = 1, ..., BatchSize: T_i, r_i = GenTrajectories(π_θ)
  • Step 2.2: Calculate the gradient: $d\theta = \frac{1}{B}\sum_{i=1}^{B}(r_i - b)\,\nabla_\theta \ln \pi_\theta(T_i)$
  • B represents the batch size BatchSize
  • b represents the benchmark return, see step 2.4;
  • π_θ(T_i) represents the probability of the policy network generating T_i.
  • Step 2.3: Update parameters: θ ← ADAM(θ, dθ);
  • Step 2.4 Update the benchmark return
  • the essence of the training process is that the intelligent agent continuously generates propagation trajectories. For propagation trajectories with high rewards, the probability of generating the propagation trajectory is increased; for propagation trajectories with low rewards, the probability of generating the propagation trajectory is reduced.
  • One or more pre-collected logs can be used to pre-train from a randomly initialized θ_0 to obtain a policy π_pretrain.
  • For a Log that later needs to be traced, adaptive training is performed using the parameters of π_pretrain as initial parameters to save training time.
  • PopulationSize represents the number of times sampled from π
  • Step 2: Count the number of times each individual in I is identified as a propagation source in the sampled trajectories T_1, ..., T_PopulationSize;
  • Step 3 Return the CandidateNum individuals identified as the transmission source the most times.
  • the agent generates multiple propagation trajectories, and one or more propagation sources can be inferred from each propagation trajectory.
  • the node that becomes the propagation source the most times is determined as the propagation source.
  • the perception and decision modules of the agent are described as follows:
  • the sequential decision problem in this proposal is a finite-step sequential decision problem, so this proposal uses different graph neural networks in different decision operations.
  • this proposal combines the S2V model and the recurrent neural network to propose the Evolve-S2V model.
  • the Evolve-S2V model can dynamically generate a graph neural network in each decision operation to extract information from the state.
  • the perception module can represent the environment (i.e., nodes, node states, edges) as a dense vector for use by the agent.
  • X(u) represents the feature of node u.
  • W(v, u) represents the probability that node v infects node u. A separate matrix transforms the edge weights in the l-th layer of the graph neural network in the i-th decision operation.
  • The parameters W_{i,l} of the l-th layer of the graph neural network in decision operation i are evolved from the parameters W_{i-1,l} of the graph neural network in the previous decision operation i-1.
  • the description of the decision module is as follows:
  • The agent selects a set A from the action set and judges the individuals in A to be infected. For ease of calculation, the agent in this proposal independently determines whether each candidate point is selected into A.
  • Let the set of candidate points in decision operation i be C (depending on the decision operation, C consists of the individuals in I not yet judged infected, or of an individual that has contact with an infected individual).
  • For each v ∈ C, the agent selects v into A with a probability given by the policy, and leaves v out of A with the complementary probability.
  • Depending on the decision operation, u is either the external factor Z_0 that can introduce the virus into O or an infected individual who contacts v.
  • This embodiment specifically solves three problems existing in existing traceability algorithms: (1) ignoring the contact time between individuals; (2) not utilizing the transmission characteristics of the virus; (3) the artificially designed traceability strategy has limitations.
  • this proposal improves the accuracy of traceability from the following two aspects: (1) Modeling the traceability problem as a sequential decision problem, and using reinforcement learning to automatically learn the traceability strategy.
  • this embodiment proposes the Evolve-S2V model as the state perception module of the intelligent agent based on the S2V model. Compared with the S2V model, Evolve-S2V can dynamically generate a graph neural network in each decision operation to perceive the state in different decision operations. (2) In the decision-making process, make full use of the time when the contact occurs and the transmission characteristics of the virus.
  • the reward function in the sequential decision problem is designed using the time when the contact occurs.
  • the algorithm proposed in this embodiment can not only identify the source of transmission, but also restore the transmission trajectory of the virus in the group. Therefore, the algorithm in this embodiment can be used as an auxiliary means in scenarios such as electronic forensics and network security auditing.
  • FIG5 is a schematic diagram of the structure of a system for determining an information transmission source provided in an embodiment of the present application
  • the system may include:
  • the candidate set determination module 501 is used to receive a propagation source tracing request and determine a candidate set according to the propagation source tracing request; wherein the candidate set includes all devices in an infected state at the deadline te , and the devices in an infected state are devices that have received target information;
  • a time tracing module 502 used to determine a starting time ts when all devices in the candidate set are not in an infected state
  • the transmission source determination module 504 is used to determine the information transmission source according to the transmission trajectory.
  • This embodiment determines a candidate set according to the received propagation source tracing request, determines the starting time ts and the ending time t e for determining the propagation trajectory, and queries all communication events of all devices from the starting time ts to the ending time t e .
  • The decision operation also includes determining whether v_i is infected by u_i and becomes infected at time t_i.
  • the above method can discretize the decision process in the continuous time domain into decisions at time periods and time points, and then obtain the propagation trajectory and information propagation source of the target information.
  • the above method combines the communication time between devices and the propagation characteristics of information to determine the information propagation source of the target information, which can improve the tracing accuracy of the information propagation source.
  • the process of determining the candidate set according to the propagation source tracing request by the candidate set determination module 501 includes: parsing the propagation source tracing request to obtain target information and deadline te ; and constructing the candidate set according to the target information and deadline te .
  • the process of constructing a candidate set according to the target information and the deadline te by the candidate set determination module 501 includes: determining the target devices that have received the target information at the deadline te ; and constructing a candidate set including all the target devices.
  • the process of the time tracing module 502 determining the starting time ts when no device in the candidate set is in an infected state includes: tracing back the device scanning records from the end time te as the starting point, and determining the starting time ts when no device in the candidate set is in an infected state by using the device scanning records.
  • the environment setting module is used to set the environment state before using the intelligent agent to perform state decision operations on each communication event in chronological order, so that the intelligent agent can perform decision operations according to the environment state.
  • the environment update module is used to update the environment state after the agent performs a decision operation on a communication event.
  • the decision module 503 uses the intelligent agent to perform decision operations on each communication event in turn according to the time sequence, and the process of obtaining the propagation trajectory of the target information includes: using the intelligent agent to perform decision operations on each communication event in turn according to the time sequence; determining the benefit value of each decision operation, and calculating the total benefit of the alternative trajectory according to the total benefit calculation formula; setting the alternative trajectory whose total benefit is greater than the preset benefit value as the propagation trajectory of the target information.
  • the process of the decision module 503 calculating the total benefit of the candidate trajectory according to the total benefit calculation formula includes: calculating the total benefit of the candidate trajectory according to the total benefit calculation formula;
  • a gain determination module used to determine a first-category gain value r 2i-1 and a second-category gain value r 2i ;
  • the first type of benefit value r 2i-1 is positively correlated with the probability that the device in the candidate set is infected by an external infection source during the time period ti -1 to ti
  • the second type of benefit value r 2i is positively correlated with the probability that the target information is propagated through the i-th communication event at time ti .
  • the benefit determination module is further used to use the logarithm of the probability that a device in the candidate set that is not determined by the intelligent agent to be in an infected state during the time period from t K to t e is infected by external factors as the third type of benefit value r end .
  • the penalty determination module is used to set the number of devices that are not in the candidate set and are determined by the agent to be in an infected state as the number of misjudgments after the agent performs decision operations on all communication events; and is also used to determine the misjudgment penalty value r penalty according to the number of misjudgments.
  • a training module used for training a decision network of the agent before using the agent to perform decision operations on each communication event in chronological order; wherein the decision network is obtained by combining the Structure2Vector model and the recurrent neural network;
  • the parameter setting module is used to input the communication log, the initial strategy parameters of the agent, the number of iterations and the number of propagation trajectories generated by the agent during each iteration, and the hyperparameters for calculating the benchmark return into the training algorithm of the agent before training the agent.
  • the strategy parameter setting module is used to obtain the strategy parameters obtained through pre-training and set the strategy parameters obtained through pre-training as the initial strategy parameters of the intelligent agent.
  • the reasoning module is used to perform strategy reasoning using the trained intelligent agent after training the intelligent agent, so as to perform decision operations on each communication event in sequence according to the time sequence using the trained intelligent agent.
  • the audit module is used to perform network security audit operations on the information dissemination source after determining the information dissemination source based on the dissemination trajectory.
  • the process of the decision module 503 querying all communication events of all devices from the starting time ts to the ending time te includes: reading the communication logs of all devices, and querying all communication events from the starting time ts to the ending time te according to the communication logs.
  • the present application also provides an electronic device, which may include a memory and a processor.
  • a computer program is stored in the memory.
  • When the processor calls the computer program in the memory, the steps provided in the above embodiment can be implemented.
  • the electronic device may also include various network interfaces, power supplies and other components.
  • FIG6 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present application. As shown in FIG6, the electronic device includes:
  • Communication interface 601 capable of exchanging information with other devices such as network devices;
  • the processor 602 is connected to the communication interface 601 to implement information exchange with other devices and is used to execute the information transmission source determination method provided by one or more technical solutions when running the computer program.
  • the computer program is stored in the memory 603.
  • Bus system 604: the various components in the electronic device are coupled together through the bus system 604. It can be understood that the bus system 604 is used to realize the connection and communication between these components.
  • the bus system 604 also includes a power bus, a control bus and a status signal bus. However, for the sake of clarity, various buses are marked as bus system 604 in FIG. 6.
  • the present application also provides a non-volatile readable storage medium, on which a computer program is stored, and when the computer program is executed, the steps provided in the above embodiment can be implemented.
  • the storage medium may include: semiconductor storage chip, USB flash drive, mobile hard disk, read-only memory (ROM), random access memory (RAM), magnetic disk or optical disk and other media that can store program code.
  • Figure 7 is a structural schematic diagram of a non-volatile readable storage medium provided in an embodiment of the present application.
  • the storage medium can be a non-volatile or non-transient storage chip, specifically including a decoding driver, a storage matrix, a read-write circuit, an address line, a data line, a chip select line and a read/write control line.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

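The present application discloses a method, system, electronic device and storage medium for determining an information propagation source, belonging to the technical field of data analysis. The method includes: receiving a propagation source tracing request and determining a candidate set according to the request, wherein all devices include the devices in the candidate set and the devices not in the candidate set; determining a starting time t_s at which none of the devices in the candidate set is in an infected state; querying all communication events of all devices from the starting time t_s to the deadline t_e, and using an agent to perform a decision operation on each communication event in chronological order to obtain the propagation trajectory of the target information; and determining the information propagation source according to the propagation trajectory. The present application can improve the tracing accuracy of the information propagation source.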

Description

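A method, system, electronic device and storage medium for determining an information propagation source
Cross-reference to related applications
This application claims priority to the Chinese patent application filed with the China Patent Office on November 24, 2022, with application number 202211482014.2 and titled "A method, system, electronic device and storage medium for determining an information propagation source", the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of data analysis technology, and in particular to a method, system, electronic device and storage medium for determining an information propagation source.
Background
The spread of viruses (computer viruses, biological viruses), rumors, pollutants and the like brings great uncertainty to the stable development of social and economic order. Tracing the propagation source (i.e., source tracing) is an important means of blocking the spread. For example, all devices in a computer intranet form a group, the external environment consists of all factors that may introduce a virus into the intranet, and a device that introduces the virus from the external environment is a propagation source. After the propagation source is traced, its potential vulnerabilities are analyzed to find the mechanism by which the virus entered the intranet from the external network, so that the virus can be blocked from invading again. In the related art, a tracing algorithm based on structural information entropy is usually used to determine the propagation source, but its tracing accuracy is low.
Summary
The purpose of this application is to provide a method, system, electronic device and storage medium for determining an information propagation source, which can improve the tracing accuracy of the information propagation source.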
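To solve the above technical problem, according to a first aspect, this application provides a method for determining an information propagation source, the method including:
receiving a propagation source tracing request, and determining a candidate set according to the propagation source tracing request; wherein the candidate set includes all devices in an infected state at the deadline t_e, and a device in an infected state is a device that has received the target information;
determining a starting time t_s at which none of the devices in the candidate set is in an infected state;
querying all communication events of all devices from the starting time t_s to the deadline t_e, and using an agent to perform a decision operation on each communication event in chronological order to obtain the propagation trajectory of the target information; wherein the i-th communication event is that device u_i transmitted data to device v_i at time t_i, and the decision operation corresponding to the i-th communication event includes: determining whether a device in the candidate set was infected by an external infection source during the period from t_{i-1} to t_i, with t_0 = t_s; if device u_i has been judged by the agent to be in an infected state before the current decision operation and device v_i has not yet been judged to be in an infected state, the decision operation corresponding to the i-th communication event further includes: determining whether device v_i was infected by u_i at time t_i;
determining the information propagation source according to the propagation trajectory.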
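Optionally, determining the candidate set according to the propagation source tracing request includes:
parsing the propagation source tracing request to obtain the target information and the deadline t_e;
constructing the candidate set according to the target information and the deadline t_e.
Optionally, constructing the candidate set according to the target information and the deadline t_e includes:
determining the target devices that have received the target information at the deadline t_e;
constructing a candidate set containing all the target devices.
Optionally, determining the starting time t_s at which none of the devices in the candidate set is in an infected state includes:
tracing back the device scanning records from the deadline t_e, and using the device scanning records to determine the starting time t_s at which none of the devices in the candidate set is in an infected state.
Optionally, before using the agent to perform state decision operations on each communication event in chronological order, the method further includes:
setting the environment state so that the agent performs the decision operations according to the environment state.
Optionally, the method further includes:
updating the environment state after the agent performs a decision operation on a communication event.
Optionally, using the agent to perform decision operations on each communication event in chronological order to obtain the propagation trajectory of the target information includes:
using the agent to perform decision operations on each communication event in chronological order;
determining the benefit value of each decision operation, and calculating the total benefit of a candidate trajectory according to the total benefit calculation formula;
setting a candidate trajectory whose total benefit is greater than a preset benefit value as the propagation trajectory of the target information.
Optionally, calculating the total benefit of a candidate trajectory according to the total benefit calculation formula includes:
calculating the total benefit of the candidate trajectory according to the total benefit calculation formula;
wherein the total benefit calculation formula is $r_{total} = \sum_{i=1}^{2K} r_i + r_{end} + r_{penalty}$, where r_total is the total benefit, K is the total number of communication events, r_i describes the probability that devices are infected during the period from t_{i-1} to t_i, r_end describes the probability that the devices in the candidate set not yet judged infected by the agent are infected through external factors during the period from t_K to t_e, and r_penalty is the misjudgment penalty value.
Optionally, the method further includes:
determining a first-type benefit value r_{2i-1} and a second-type benefit value r_{2i}; wherein r_{2i-1} is positively correlated with the probability that devices in the candidate set are infected by an external infection source during the period from t_{i-1} to t_i, and r_{2i} is positively correlated with the probability that the target information propagates through the i-th communication event at time t_i;
taking, as the third-type benefit value r_end, the logarithm of the probability that the devices in the candidate set not judged infected by the agent are infected through external factors during the period from t_K to t_e;
after the agent has performed decision operations on all communication events, setting the number of devices that are not in the candidate set but are judged by the agent to be in an infected state as the number of misjudgments;
determining the misjudgment penalty value r_penalty according to the number of misjudgments.
Optionally, before using the agent to perform decision operations on each communication event in chronological order, the method further includes:
training the decision network of the agent; wherein the decision network is obtained by combining the Structure2Vector model with a recurrent neural network; the inputs required for training the decision network include: the communication log, the initial policy parameters of the agent, the number of iterations, the number of propagation trajectories generated by the agent in each iteration, and the hyperparameter for calculating the benchmark return;
obtaining policy parameters through pre-training, and setting the pre-trained policy parameters as the initial policy parameters of the agent.
Optionally, after training the decision network of the agent, the method further includes:
performing policy inference with the trained agent, so that the trained agent performs decision operations on each communication event in chronological order.
Optionally, querying all communication events of all devices from the starting time t_s to the deadline t_e includes:
reading the communication logs of all devices, and querying from the communication logs all communication events from the starting time t_s to the deadline t_e.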
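According to a second aspect, this application also provides a system for determining an information propagation source, the system including:
a candidate set determination module, used to receive a propagation source tracing request and determine a candidate set according to the request; wherein the candidate set includes all devices in an infected state at the deadline t_e, and a device in an infected state is a device that has received the target information;
a time tracing module, used to determine a starting time t_s at which none of the devices in the candidate set is in an infected state;
a decision module, used to query all communication events of all devices from the starting time t_s to the deadline t_e, and to use an agent to perform a decision operation on each communication event in chronological order to obtain the propagation trajectory of the target information; wherein the i-th communication event is that device u_i transmitted data to device v_i at time t_i, and the decision operation corresponding to the i-th communication event includes: determining whether a device in the candidate set was infected by an external infection source during the period from t_{i-1} to t_i, with t_0 = t_s; if device u_i has been judged by the agent to be in an infected state before the current decision operation and device v_i has not yet been judged to be in an infected state, the decision operation further includes: determining whether device v_i was infected by u_i at time t_i;
a propagation source determination module, used to determine the information propagation source according to the propagation trajectory.
According to a third aspect, this application also provides a non-volatile readable storage medium on which a computer program is stored; when executed, the computer program implements the steps of the above method for determining an information propagation source.
According to a fourth aspect, this application also provides an electronic device including a memory and a processor; a computer program is stored in the memory, and when the processor calls the computer program in the memory, the steps of the above method for determining an information propagation source are implemented.
This application provides a method for determining an information propagation source, including: receiving a propagation source tracing request, and determining a candidate set according to the request, wherein the candidate set includes all devices in an infected state at the deadline t_e, and a device in an infected state is a device that has received the target information; determining a starting time t_s at which none of the devices in the candidate set is in an infected state; querying all communication events of all devices from t_s to t_e, and using an agent to perform a decision operation on each communication event in chronological order to obtain the propagation trajectory of the target information, wherein the i-th communication event is that device u_i transmitted data to device v_i at time t_i, and the corresponding decision operation includes determining whether a device in the candidate set was infected by an external infection source during the period from t_{i-1} to t_i, with t_0 = t_s; if device u_i has been judged by the agent to be in an infected state before the current decision operation and device v_i has not yet been judged to be in an infected state, the decision operation further includes determining whether device v_i was infected by u_i at time t_i; and determining the information propagation source according to the propagation trajectory.
This application determines a candidate set according to the received propagation source tracing request, determines the starting time t_s and the deadline t_e used for determining the propagation trajectory, and queries all communication events of all devices from t_s to t_e. The agent performs a decision operation on each communication event in turn. Let the i-th communication event be that device u_i transmitted data to device v_i at time t_i; for this event, the decision operation determines whether a device in the candidate set was infected by an external infection source during the period from t_{i-1} to t_i, where t_0 = t_s. In addition, if u_i has been judged by the agent to be in an infected state in a previous decision operation, the decision operation also includes determining whether v_i was infected by u_i and became infected at time t_i. In this way, the decision process in the continuous time domain is discretized into decisions over time periods and at time points, yielding the propagation trajectory and the information propagation source of the target information. By combining the communication times between devices with the propagation characteristics of the information, this approach can improve the tracing accuracy of the information propagation source. This application also provides a system for determining an information propagation source, a non-volatile readable storage medium and an electronic device, which have the same beneficial effects and are not described again here.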
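Brief description of the drawings
To describe the embodiments of this application more clearly, the drawings used in the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a method for determining an information propagation source provided in an embodiment of this application;
FIG. 2 is a schematic diagram of device states at time t_e provided in an embodiment of this application;
FIG. 3 is a schematic diagram of the states of devices in the candidate set provided in an embodiment of this application;
FIG. 4 is a schematic diagram of the states of devices not in the candidate set provided in an embodiment of this application;
FIG. 5 is a schematic structural diagram of a system for determining an information propagation source provided in an embodiment of this application;
FIG. 6 is a schematic structural diagram of an electronic device provided in an embodiment of this application;
FIG. 7 is a schematic structural diagram of a non-volatile readable storage medium provided in an embodiment of this application.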
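Detailed description
To make the purposes, technical solutions and advantages of the embodiments of this application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this application without creative effort fall within the scope of protection of this application.
Refer to FIG. 1, which is a flowchart of a method for determining an information propagation source provided in an embodiment of this application.
The specific steps may include:
S101: Receive a propagation source tracing request, and determine a candidate set according to the propagation source tracing request;
This embodiment can be applied to a data analysis platform connected to a data center that includes multiple devices. The device administrator can scan the data center at a preset period (for example, 3 days); if the administrator detects that a device has received the target information, a propagation source tracing request can be sent to the data analysis platform. For example, the administrator scans the data center for viruses every 3 days; no virus is found in the scan on 2022-11-04 and a virus is found in the scan on 2022-11-07, which indicates that the virus entered the data center between 2022-11-04 and 2022-11-07. The communication records between machines during this period are then used for the propagation source determination of this embodiment.
After the propagation source tracing request is received, it can be parsed to obtain the target information and the deadline t_e, and the candidate set is then constructed from the target information and t_e. The deadline t_e may also be the time at which the request is received. The candidate set includes all devices in an infected state at the deadline t_e; a device in an infected state is a device that has received the target information, and a device not in an infected state is one that has not. Specifically, this step can determine the target devices that have received the target information at the deadline t_e and construct a candidate set containing all of them.
The target information may be a virus, a rumor or a pollutant (such as a picture or audio carrying harmful information); the content and type of the target information are not specifically limited here.
S102: Determine a starting time t_s at which none of the devices in the candidate set is in an infected state;
After the deadline t_e is determined, the starting time t_s can be queried under the following constraint: t_s is earlier than t_e, and at t_s none of the devices in the candidate set is in an infected state. As a feasible implementation, this step can trace back the device scanning records from t_e and use them to determine the starting time t_s at which none of the devices in the candidate set is in an infected state.
S103: Query all communication events of all devices from the starting time t_s to the deadline t_e, and use an agent to perform a decision operation on each communication event in chronological order to obtain the propagation trajectory of the target information;
There are multiple communication events from t_s to t_e, and the i-th communication event occurs at time t_i. This step can read the communication logs of all devices and query from them all communication events from t_s to t_e. Here, "all devices" includes both the devices in the candidate set and the devices not in it; the candidate set consists of all devices in an infected state at the deadline t_e, and "candidate" means that the device may be a propagation source.
After the communication events are obtained, this step can use the agent to perform a decision operation on each communication event, so that the propagation trajectory of the target information is determined from the results of the decision operations. The agent may be an information trajectory analysis model.
The i-th communication event is that device u_i transmitted data to device v_i at time t_i, and the decision operation corresponding to the i-th communication event includes: determining whether a device in the candidate set was infected by an external infection source during the period from t_{i-1} to t_i, with t_0 = t_s; if device u_i has been judged by the agent to be in an infected state before the current decision operation and device v_i has not yet been judged to be in an infected state, the decision operation further includes determining whether device v_i was infected by u_i at time t_i, where t_i is the time at which the i-th communication event occurred. After performing these decision operations, the agent obtains the benefit value of each candidate trajectory and then determines the propagation trajectory of the target information from the benefit values. A candidate trajectory is determined by the agent's decisions on all communication events: if the agent judges that a device u in the candidate set was infected by an external infection source, u is a propagation source; if the agent judges that the target information propagated from u to v, u and v are adjacent in the candidate trajectory.
S104: Determine the information propagation source according to the propagation trajectory.
The propagation trajectory describes the order in which the target information passed between the external environment and the devices. At least one propagation trajectory can be determined through the operations of this embodiment. A device that introduces the target information from the external environment is an information propagation source, so in this step the first device in the propagation trajectory can be set as the information propagation source. Further, after the information propagation source is determined from the propagation trajectory, a network security audit can be performed on it.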
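Steps S102 and S103 can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the record layouts (`scans` as pairs of scan time and the set of devices found infected, `Contact` as a tuple of sender, receiver and time) and the function names are hypothetical and do not come from the application.

```python
from typing import List, NamedTuple, Optional, Set, Tuple


class Contact(NamedTuple):
    """One communication event: device u sent data to device v at time t."""
    u: str
    v: str
    t: float


def find_start_time(scans: List[Tuple[float, Set[str]]],
                    candidates: Set[str],
                    t_e: float) -> Optional[float]:
    """Latest scan time before t_e at which no candidate device was infected.

    `scans` is an assumed layout: (scan time, devices found infected).
    """
    clean = [t for t, infected in scans
             if t < t_e and not (infected & candidates)]
    return max(clean) if clean else None


def query_events(log: List[Contact], t_s: float, t_e: float) -> List[Contact]:
    """Contacts with t_s <= t <= t_e, sorted chronologically."""
    return sorted((c for c in log if t_s <= c.t <= t_e), key=lambda c: c.t)


# Hypothetical example data.
scans = [(1.0, set()), (4.0, set()), (7.0, {"a"})]
log = [Contact("a", "b", 5.0), Contact("b", "c", 4.5), Contact("c", "a", 9.0)]
t_s = find_start_time(scans, {"a", "b"}, t_e=7.0)
events = query_events(log, t_s, 7.0)
```

Taking the latest clean scan as t_s keeps the event window as short as possible; an earlier clean scan would also satisfy the constraint stated in S102.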
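This embodiment determines a candidate set according to the received propagation source tracing request, determines the starting time t_s and the deadline t_e used for determining the propagation trajectory, and queries all communication events of all devices from t_s to t_e. The agent performs a decision operation on each communication event in turn. Let the i-th communication event be that device u_i transmitted data to device v_i at time t_i; for this event, the decision operation determines whether a device in the candidate set was infected by an external infection source during the period from t_{i-1} to t_i, where t_0 = t_s. In addition, if u_i has been judged by the agent to be in an infected state in a previous decision operation, the decision operation also includes determining whether v_i was infected by u_i and became infected at time t_i. In this way, the decision process in the continuous time domain is discretized into decisions over time periods and at time points, yielding the propagation trajectory and the information propagation source of the target information. By combining the communication times between devices with the propagation characteristics of the information, this approach can improve the tracing accuracy of the information propagation source.
As a further description of the embodiment corresponding to FIG. 1, before the agent performs state decision operations on each communication event in chronological order, the environment state can be set so that the agent performs the decision operations according to it. Further, after the agent performs a decision operation on a communication event, the environment state can be updated.
As a further description of the embodiment corresponding to FIG. 1, the process of obtaining the propagation trajectory by having the agent perform decision operations includes the following steps:
Step 1: Use the agent to perform decision operations on each communication event in chronological order;
Step 2: Determine the benefit value of each decision operation, and calculate the total benefit of a candidate trajectory according to the total benefit calculation formula;
Specifically, the total benefit calculation formula is $r_{total} = \sum_{i=1}^{2K} r_i + r_{end} + r_{penalty}$, where r_total is the total benefit, K is the total number of communication events, and r_i describes the probability that devices are infected during the period from t_{i-1} to t_i, with t_0 = t_s. r_end describes the probability that the devices in the candidate set not yet judged infected by the agent are infected through external factors during the period from t_K to t_e, and r_penalty is the misjudgment penalty value. In the data center scenario, everything that can introduce a virus into the system is an external factor. For example, a server tenant may introduce a virus by uploading files, and a device administrator may introduce a virus by using an infected USB flash drive; both are external factors.
Step 3: Set a candidate trajectory whose total benefit is greater than a preset benefit value as the propagation trajectory of the target information.
This embodiment can take the candidate trajectory with the largest total benefit as the propagation trajectory of the target information, or take a candidate trajectory whose total benefit is greater than the preset benefit value.
As a feasible implementation, r_i can be determined as follows: determine a first-type benefit value r_{2i-1} and a second-type benefit value r_{2i}; r_{2i-1} is positively correlated with the probability that devices in the candidate set are infected by an external infection source during the period from t_{i-1} to t_i, and r_{2i} is positively correlated with the probability that the target information propagates through the i-th communication event at time t_i.
The above embodiment can also determine r_end as follows: take, as the third-type benefit value r_end, the logarithm of the probability that the devices in the candidate set still in a non-infected state are infected through external factors during the period from t_K to t_e.
The above embodiment can also determine r_penalty as follows: after the agent has performed decision operations on all communication events, set the number of devices that are not in the candidate set but are judged by the agent to be in an infected state as the number of misjudgments; then determine the misjudgment penalty value r_penalty according to the number of misjudgments.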
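The way the benefit terms combine can be sketched in Python as follows. This is an illustration under assumptions: the penalty is taken to be negative (the source later calls it a negative benefit) and is added to the other terms, and `penalty_fn`, `total_benefit` and `misjudgment_penalty` are hypothetical names.

```python
from typing import Callable, Iterable, Set


def misjudgment_penalty(judged_infected: Set[str],
                        candidates: Set[str],
                        penalty_fn: Callable[[str], float]) -> float:
    """Negative benefit for devices judged infected but outside the candidate set."""
    misjudged = judged_infected - candidates
    return -sum(penalty_fn(u) for u in misjudged)


def total_benefit(step_benefits: Iterable[float],
                  r_end: float,
                  r_penalty: float) -> float:
    """Sum of the per-decision benefits plus the terminal and penalty terms."""
    return sum(step_benefits) + r_end + r_penalty


# Hypothetical numbers: two misjudged devices, constant penalty of 2.0 each.
r_pen = misjudgment_penalty({"a", "x", "y"}, {"a", "b"}, lambda u: 2.0)
r_tot = total_benefit([-0.5, -1.0], r_end=-0.25, r_penalty=r_pen)
```

With a constant `penalty_fn` the penalty depends only on the number of misjudgments, which matches the description above; a per-device penalty function is a generalization.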
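As a further description of the embodiment corresponding to FIG. 1, before the agent performs decision operations on each communication event in chronological order, the decision network of the agent can be trained; the decision network is obtained by combining the Structure2Vector model with a recurrent neural network.
Further, before the decision network of the agent is trained, the following are input to the training algorithm: the communication log, the initial policy parameters of the agent, the number of iterations, the number of propagation trajectories generated by the agent in each iteration, and the hyperparameter for calculating the benchmark return.
To improve training efficiency, the above embodiment can obtain policy parameters through pre-training and set them as the initial policy parameters of the agent.
After the policy network of the agent has been trained, the trained agent can be used for policy inference, so that it performs decision operations on each communication event in chronological order.
The flow described in the above embodiments is illustrated below with a reinforcement-learning-based propagation source tracing scheme used in practice.
The spread of viruses (computer viruses, biological viruses), rumors and pollutants brings great uncertainty to the stable development of social and economic order. A virus (note: without loss of generality, this embodiment refers to the spread of viruses, rumors and pollutants collectively as virus spread) enters a group from the external environment and then spreads within the group. Any individual in the group can be infected in one of two ways: (1) through contact with the external environment; (2) through contact with other infected individuals in the group. A propagation source is an individual infected through contact with the external environment.
In the related art, the information propagation source is determined by a tracing algorithm based on structural information entropy, which has the following shortcomings. (1) It ignores the contact times between individuals. In real scenarios, obtaining contact times usually costs nothing extra; for example, if a computer virus spreads through shared files, the times at which two devices shared files are easily obtained from logs. (2) It does not use the propagation characteristics of the virus. For example, in an intranet, the more services device u exposes to the external network, the more likely u is to become a propagation source; the more files device u and device v share, the more likely they are to infect each other. These propagation characteristics can be estimated from expert experience or historical data. (3) Manually designed tracing strategies have limitations.
To address these shortcomings, this embodiment proposes a tracing scheme based on reinforcement learning, which can serve as an effective complement to existing tracing algorithms. Compared with existing tracing algorithms, the algorithm in this proposal improves tracing accuracy in two respects. (1) It models the tracing problem as a sequential decision problem and uses reinforcement learning to learn the tracing strategy automatically, avoiding the limitations of manual strategies. In particular, based on the S2V (Structure2Vector) model, this proposal introduces the Evolve-S2V model as the state perception module of the agent. Compared with the S2V model, Evolve-S2V can dynamically generate a graph neural network in each decision operation to perceive the state in that operation. (2) It makes full use of the contact times and the propagation characteristics of the virus in the decision process. In particular, the reward function of the sequential decision problem is designed from the contact times and the propagation characteristics of the virus.
This embodiment models the tracing problem as a sequential decision problem, incorporates the contact times and the propagation characteristics of the virus into the reward mechanism to improve tracing accuracy, and discretizes the decision process in the continuous time domain into decisions over time periods and at time points. This embodiment proposes a learning process and an inference process for the tracing strategy; it models the state as a graph structure and, based on the S2V algorithm, proposes the Evolve-S2V algorithm as the perception module of the agent in order to handle finite-step decisions and variable-length sequences of state graphs; it also proposes a specific implementation of tracing.
The decision process is briefly introduced below with a concrete scenario. In a data center, the device administrator discovers a virus at time t_e and then diagnoses all devices to find every infected device. The information available to the administrator at t_e is shown in FIG. 2, a schematic diagram of device states at time t_e provided in an embodiment of this application. FIG. 2 shows all external factors that can introduce the virus, all devices in an infected state at t_e (the candidate set), and the devices not infected at t_e.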
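Assume that once a device is infected it remains in an infected state until it is repaired by the device administrator. Therefore, the propagation source only needs to be sought among the devices in an infected state at t_e (the candidate set). This embodiment infers the propagation source from the communication records between devices over a period of time and from the states of all devices at t_e (infected or non-infected). To this end, an agent trained by reinforcement learning makes decisions about the states of the devices. Specifically, two decisions are made around each communication between devices. For example, if in some communication device v received data from device u at time t, and the time elapsed since the previous communication event between devices is Δt, the two decision operations are as follows:
Decision operation 1: judge the state of the devices in the candidate set that have not yet been judged to be in an infected state.
Refer to FIG. 3, a schematic diagram of the states of devices in the candidate set provided in an embodiment of this application. FIG. 3 shows all external factors that can introduce the virus, all devices in an infected state at t_e (the candidate set), and the devices not infected at t_e. Shaded circles in FIG. 3 represent devices judged by the agent to be in an infected state before the current decision operation, and unshaded circles represent devices not judged to be in an infected state. In this decision operation, the agent considers only the devices in the candidate set that have not yet been judged infected, and determines whether each of them was infected by external factors during the period Δt. For each device judged to have been infected by external factors, the corresponding node becomes a shaded circle.
Decision operation 2: judge the state of device v. If device v has not yet been judged to be in an infected state and device u has been judged to be in an infected state in a previous decision operation, the agent determines whether device v was infected by device u and became infected. Refer to FIG. 4, a schematic diagram of the states of devices not in the candidate set provided in an embodiment of this application. FIG. 4 shows all external factors that can introduce the virus, all devices in an infected state at t_e (the candidate set), and the devices not infected at t_e.
These two decision operations are performed for every communication event until decisions have been made for all communication events, at which point the propagation source is obtained. The process can be repeated multiple times to improve tracing accuracy.
The decision process of the agent is driven by communication events: for each event the agent makes two decisions, so K communication events yield 2K decision operations in total. In each decision operation the agent judges the state changes of the relevant devices, so the whole decision process of the agent corresponds to one propagation chain of the virus, and the benefit accumulated over the decisions is proportional to the probability that this propagation chain occurs in the environment. r_penalty is the penalty the agent receives for judging uninfected nodes to be infected; it is the penalty for misjudgments over the whole decision process. The communication event (u_i, v_i, t_i) means that device u_i communicated with device v_i at time t_i, where t_i is the time at which the i-th communication event occurred.
The whole decision process can be described as follows:
Let t_s be a time at which no device in the data center is infected;
In this embodiment, the decision flow of the agent is as follows:
On observing the first communication event (u_1, v_1, t_1), two decisions are made:
First decision: determine whether nodes in the candidate set were infected by an external infection source in (t_s, t_1);
Second decision: determine whether v_1 was infected by u_1 at time t_1.
On observing the second communication event (u_2, v_2, t_2), the third and fourth decisions are made.
Third decision: determine whether nodes in the candidate set were infected by an external infection source in (t_1, t_2);
Fourth decision: determine whether v_2 was infected by u_2 at time t_2.
And so on, until the last communication event
(u_K, v_K, t_K) is observed and the (2K-1)-th and 2K-th decisions are made.
(2K-1)-th decision: determine whether nodes in the candidate set were infected by an external infection source in (t_{K-1}, t_K);
2K-th decision: determine whether v_K was infected by u_K at time t_K.
After the 2K decisions are complete, the agent assumes by default that the nodes in I not yet judged infected were infected by external factors in (t_K, t_e).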
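The event-driven flow above can be sketched as a small Python loop. This is a minimal sketch, not the application's implementation: the two callbacks `decide_external` and `decide_contact` are hypothetical stand-ins for the trained policy, and the function name `run_decisions` is an assumption.

```python
from typing import Callable, Iterable, List, Set, Tuple

Event = Tuple[str, str, float]  # (u, v, t): u sent data to v at time t


def run_decisions(events: Iterable[Event],
                  candidates: Set[str],
                  t_s: float,
                  decide_external: Callable[[str, float, float], bool],
                  decide_contact: Callable[[str, str, float], bool]):
    """Run the 2K decisions over K chronologically sorted events.

    Returns (sources, edges): devices judged infected from outside, and
    (u, v) pairs along which the agent judged the information to spread.
    """
    judged: Set[str] = set()          # devices judged infected so far
    sources: List[str] = []
    edges: List[Tuple[str, str]] = []
    t_prev = t_s
    for u, v, t in events:
        # Decision 2i-1: external infection of undecided candidates in (t_prev, t).
        for d in sorted(candidates - judged):
            if decide_external(d, t_prev, t):
                judged.add(d)
                sources.append(d)
        # Decision 2i: only offered when u is judged infected and v is not.
        if u in judged and v not in judged and decide_contact(u, v, t):
            judged.add(v)
            edges.append((u, v))
        t_prev = t
    # After 2K decisions, remaining candidates default to external infection.
    for d in sorted(candidates - judged):
        judged.add(d)
        sources.append(d)
    return sources, edges
```

Sorting the undecided candidates only makes the sketch deterministic; the source does not prescribe an order within a single decision.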
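The symbols used in this embodiment are as follows:
O denotes the group, i.e., the set of all individuals;
Z_0 denotes all factors external to the group that can introduce the virus into O;
t_s denotes the initial time, at which no individual is infected;
t_e denotes the time at which infected individuals are discovered;
I denotes the set of all individuals infected at t_e, i.e., the candidate set;
λ_u denotes the rate at which u is infected through Z_0;
T_u denotes the time point at which u is infected through Z_0;
p_{u,v} denotes the probability that v is infected when u, being in an infected state, contacts v;
R denotes the set of real numbers.
The calculation of the rate λ_u depends on the specific application scenario. For example, in a data center, the rate at which a server is infected is related to the ports open on the server, the user group it serves, the frequency with which users upload data, and so on.
The assumptions of this embodiment are as follows:
(1) An individual is in either an infected state or a non-infected state. Once infected, an individual remains in the infected state.
(2) Contacts are directed: (u, v, t) means that u had a contact with v at time t (i.e., a communication event occurred), and the virus can only spread from u to v. An undirected contact can be expressed as the two directed contacts (u, v, t) and (v, u, t).
(3) Both λ_u and p_{u,v} can be estimated from expert experience and historical data.
(4) If individual u is in a non-infected state at time t, the probability Pr that u is infected through Z_0 within [t, t+Δt] is: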
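The formula for assumption (4) did not survive in this text. One common reading of an infection rate λ_u is an exponential waiting time, which gives Pr = 1 − exp(−λ_u·Δt). The sketch below uses that form purely as an assumption, and the function name is hypothetical.

```python
import math


def external_infection_prob(rate: float, dt: float) -> float:
    """Probability of infection through Z_0 within a window of length dt.

    Assumes an exponential waiting time with rate `rate` (lambda_u); this
    specific form is an assumption, not taken from the source text.
    """
    if rate < 0 or dt < 0:
        raise ValueError("rate and dt must be non-negative")
    return 1.0 - math.exp(-rate * dt)
```

Under this assumption the probability depends only on the window length Δt, not on the start time t, which is consistent with conditioning on u being uninfected at t.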
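The problem addressed by this embodiment is stated as follows:
At the initial time t_s no individual is in an infected state, and at time t_e infected individuals are discovered. Given the historical contact record Log = {(u, v, t) : u ∈ O, v ∈ O, t ∈ [t_s, t_e]} and given I, infer the propagation sources, i.e., the individuals in I that were infected through Z_0.
The problem is modeled as follows:
This embodiment models the tracing problem as a sequential decision problem and uses reinforcement learning to learn the tracing strategy automatically. In reinforcement learning, the agent observes the state of the environment, selects an action based on what it observes, applies the action to the environment, and thereby changes the environment state and obtains a corresponding benefit. The agent repeats this process until it reaches the terminal state. The goal of the agent is to maximize the cumulative benefit obtained while interacting with the environment. The interaction between the agent and the environment in the tracing problem is described below:
The decision-making task of the agent is to determine when each individual is infected, which is a decision problem in the continuous time domain.
For ease of calculation, this proposal transforms the decision problem in the continuous time domain into a decision problem over discrete time steps. The communication record Log contains K contacts in total. Sort Log by order of occurrence: Log = ((u_1, v_1, t_1), ..., (u_K, v_K, t_K)), where t_i < t_j for i < j. As described above, there are two decision operations for each contact (communication event).
This proposal uses a graph G = (V, E, W, X) to represent the environment state, where:
(1) the node set V = {Z_0} ∪ O stays unchanged across all decision operations;
(2) the edge set E describes the contacts between nodes in the current decision operation;
(3) W: E → R; W(e) is the weight of edge e, which describes the probability of virus transmission through e;
(4) X: O → {0, 1}; X(u) = 0 means that the agent judges u to be in a non-infected state, and X(u) = 1 means that the agent judges u to be in an infected state.
The initial state G_1 is defined as follows:
(1) V_1 = V;
(2) E_1 = {(Z_0, u) : u ∈ I, X_1(u) = 0};
(3) W_1(Z_0, u) = Pr{T_u ≤ t_1 | T_u > t_0}, where t_0 = t_s;
(4) X_1(u) = 0 for all u ∈ O.
For 1 ≤ i ≤ K, after observing G_{2i-1}, the action set of the agent is 2^{I_{2i-1}}, where I_{2i-1} = {u : u ∈ I, X_{2i-1}(u) = 0} consists of all individuals in I that have not yet been judged infected, and 2^{I_{2i-1}} denotes the power set of I_{2i-1}. The agent chooses an action A_{2i-1} ∈ 2^{I_{2i-1}} according to the policy π and applies A_{2i-1} to the environment: all individuals in A_{2i-1} are judged to be infected. The benefit obtained from this operation is the probability that, in the time period [t_{i-1}, t_i], the virus spreads from Z_0 to the individuals in A_{2i-1} and does not spread to the individuals in I_{2i-1} \ A_{2i-1}. The individuals in A_{2i-1} are explained below:
If I_{2i-1} = {1, 2}, device 1 and device 2 have not yet been judged infected; 2^{I_{2i-1}} is the action set of the agent, and the agent's action is one of the following 4 choices:
∅ means that neither device 1 nor device 2 is infected;
{1} means that device 1 is infected;
{2} means that device 2 is infected;
{1, 2} means that both device 1 and device 2 are infected;
The individuals in A_{2i-1} are the devices judged to be infected.
I_{2i-1} \ A_{2i-1} denotes the set difference of I_{2i-1} and A_{2i-1}, that is, the individuals in I_{2i-1} that are not in A_{2i-1}.
For ease of calculation, the logarithm of the probability is taken as the benefit:
$$r_{2i-1} = \sum_{u \in A_{2i-1}} \ln \Pr\{T_u \le t_i \mid T_u > t_{i-1}\} + \sum_{u \in I_{2i-1} \setminus A_{2i-1}} \ln \Pr\{T_u > t_i \mid T_u > t_{i-1}\}$$
The probabilities are defined by the assumptions stated above. If both A_{2i-1} and I_{2i-1} \ A_{2i-1} are empty, then r_{2i-1} = 0.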
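The benefit of an odd-numbered decision can be sketched in Python as a sum of log-probabilities. Two assumptions are made here and should be read as such: individuals are treated as independent, and the external infection probability over a window of length dt is taken to be 1 − exp(−λ_u·dt). The function name and argument layout are hypothetical.

```python
import math
from typing import Dict, Set


def external_step_benefit(chosen: Set[str],
                          undecided: Set[str],
                          rates: Dict[str, float],
                          dt: float) -> float:
    """Log-probability that exactly `chosen` (a subset of `undecided`)
    is infected through Z_0 within a window of length dt.

    Assumes independent individuals and exponential waiting times.
    """
    if not chosen <= undecided:
        raise ValueError("chosen must be a subset of undecided")
    benefit = 0.0
    for u in undecided:
        p = 1.0 - math.exp(-rates[u] * dt)   # Pr{infected in window}
        # math.log raises ValueError if p is 0 for a chosen device,
        # i.e. the chosen action is impossible under the model.
        benefit += math.log(p) if u in chosen else math.log(1.0 - p)
    return benefit
```

When both `chosen` and `undecided` are empty the loop body never runs and the benefit is 0, matching the stated special case.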
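After this operation, the environment state is updated to G_{2i}:
(1) V_{2i} = V;
(2) E_{2i} = {(u_i, v_i)};
(3) W_{2i}(u_i, v_i) = p_{u_i,v_i};
(4) X_{2i}(u) = 1 if u ∈ A_{2i-1}, and X_{2i}(u) = X_{2i-1}(u) otherwise.
After observing G_{2i}, the agent decides whether to change v_i to an infected state. If X_{2i}(u_i) = 0 or X_{2i}(v_i) = 1 (u_i is in a non-infected state or v_i is already in an infected state), the action set is {∅}; otherwise the action set is {∅, {v_i}}. The agent chooses an action A_{2i} according to the policy π and applies A_{2i} to the environment: all individuals in A_{2i} are judged to be infected. The benefit obtained from this operation is $r_{2i} = \ln p_{u_i,v_i}$ if $A_{2i} = \{v_i\}$, and $r_{2i} = \ln(1 - p_{u_i,v_i})$ if the agent could have chosen $\{v_i\}$ but chose ∅.
That is, when the agent judges v_i to be in an infected state, the benefit is the logarithm of the probability that the virus successfully infects v_i; when the agent judges v_i to be in a non-infected state, the benefit is the logarithm of the probability that the virus fails to infect v_i.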
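The benefit of an even-numbered decision can be sketched in Python as follows. The zero benefit returned when the agent has no real choice (u_i not judged infected, or v_i already judged infected) is an assumption, since the source formula is not reproduced in this text; the function name is hypothetical.

```python
import math


def contact_step_benefit(u_infected: bool,
                         v_infected: bool,
                         judge_v_infected: bool,
                         p_uv: float) -> float:
    """Benefit r_2i for the contact (u_i, v_i, t_i).

    p_uv is the probability that an infected u infects v on contact.
    """
    if (not u_infected) or v_infected:
        return 0.0                      # assumed: only the empty action exists
    if judge_v_infected:
        return math.log(p_uv)           # virus succeeded in infecting v_i
    return math.log(1.0 - p_uv)         # virus failed to infect v_i
```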
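Each communication event has two decisions, and i is the index of the communication event.
When i = 1, for the first communication event (u_1, v_1, t_1), two decisions are made: the state observed in the first decision is G_1 (i.e., G_{2i-1}), and the state observed in the second decision is G_2 (i.e., G_{2i}).
When i = 2, for the second communication event (u_2, v_2, t_2), two decisions are made: the state observed in the first decision is G_3 (i.e., G_{2i-1}), and the state observed in the second decision is G_4 (i.e., G_{2i}).
After this operation, the environment state is updated to G_{2i+1}:
(1) V_{2i+1} = V;
(2) E_{2i+1} = {(Z_0, u) : u ∈ I, X_{2i+1}(u) = 0};
(3) W_{2i+1}(Z_0, u) = Pr{T_u ≤ t_{i+1} | T_u > t_i};
(4) X_{2i+1}(u) = 1 if u ∈ A_{2i}, and X_{2i+1}(u) = X_{2i}(u) otherwise.
Increase i and repeat the above process until the agent completes the last decision (the 2K-th decision). All individuals in I still judged to be in a non-infected state form the set I′ = {u : u ∈ I, X_{2K+1}(u) = 0}. The agent obtains a benefit equal to the logarithm of the probability that the virus spreads from Z_0 to the individuals in I′ in the time period [t_K, t_e]: $r_{end} = \sum_{u \in I'} \ln \Pr\{T_u \le t_e \mid T_u > t_K\}$.
In addition, for the set of nodes in O \ I judged to be in an infected state, I″ = {u : u ∈ O \ I, X_{2K+1}(u) = 1}, the agent obtains a negative benefit: $r_{penalty} = -\sum_{u \in I''} l(u)$,
where l: O \ I → R^+ is a pre-given penalty function and l(u) is the penalty for judging the uninfected individual u to be infected.
In summary, the total benefit obtained by the agent is: $r_{total} = \sum_{i=1}^{2K} r_i + r_{end} + r_{penalty}$.
In the above decision process, over each time period [t_{i-1}, t_i] the agent determines whether nodes in I were infected through Z_0, and at each time point t_i the agent determines whether the virus spread through (u_i, v_i). Thus, through decisions over time periods and decisions at time points, this proposal transforms the decision problem in the continuous time domain into a decision problem over discrete time steps, which simplifies the calculation.
The decision process of the agent is the process of generating a propagation trajectory. Each propagation trajectory may have one or more propagation sources. A propagation trajectory is generated from the above decision process as follows:
(1) A propagation trajectory τ consists of one or more infection trees; the root node of each infection tree is a propagation source, that is, a node that the agent decided was infected through Z_0 in some time period [t_{i-1}, t_i].
(2) An edge (u, v) is in τ if and only if the agent determines that the virus spread from u to v through the contact (u, v, t).
(3) When there are multiple propagation sources, a propagation trajectory contains multiple infection trees.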
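The structure of a propagation trajectory as a forest of infection trees can be sketched in Python. The nested-dictionary representation and the function name `infection_trees` are illustrative choices, not part of the source; the sketch relies on each device being judged infected at most once, so the edges cannot form a cycle.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def infection_trees(edges: Iterable[Tuple[str, str]],
                    roots: List[str]) -> Dict[str, dict]:
    """Build one nested dict per propagation source (tree root).

    `edges` are the (u, v) pairs along which the agent judged the virus
    to spread; `roots` are the devices judged infected through Z_0.
    """
    children = defaultdict(list)
    for u, v in edges:
        children[u].append(v)

    def build(node: str) -> dict:
        return {child: build(child) for child in children[node]}

    return {root: build(root) for root in roots}


# Hypothetical trajectory with two propagation sources, "a" and "d".
forest = infection_trees([("a", "b"), ("b", "c"), ("d", "e")], ["a", "d"])
```

The propagation sources are then simply the keys of the returned dictionary.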
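The total benefit obtained by the agent is related to the following two aspects:
(1) the probability that the propagation trajectory it generates occurs in the environment;
(2) after the decisions are complete, how well the states it has judged for all individuals match the states of all individuals at time t_e.
Therefore, the higher the total benefit obtained by the agent, the more accurate the propagation trajectory it generates.
The training and inference of the tracing strategy in the agent are described as follows:
The learning of the agent's policy π is introduced below. For ease of description, π_θ is used to emphasize the parameter θ of the policy.
The interaction process between the agent and the environment described in the previous section (in other words, the process of generating propagation trajectories) is called the function GenTrajectories, whose input is π_θ and whose output is the trajectory T consisting of one or more infection trees, and the reward r corresponding to T. Since π_θ is a stochastic policy, the output of each call to GenTrajectories may be different.
The policy training process is as follows:
Input: θ_0, IterNum, BatchSize, α;
Parameter description:
θ_0 is the initial value of parameter θ;
IterNum is the number of iterations;
BatchSize is the number of times GenTrajectories is called in each iteration;
GenTrajectories represents one decision-making process of the agent.
α ∈ (0, 1) is the hyperparameter for calculating the benchmark return.
Step 1. θ ← θ_0, b ← 0
Step 2. For t = 1, …, IterNum
Step 2.1 For i = 1, …, BatchSize
T_i, r_i = GenTrajectories(π_θ)
Step 2.2 Calculate the gradient: $d\theta = \frac{1}{B}\sum_{i=1}^{B}(r_i - b)\,\nabla_\theta \ln \pi_\theta(T_i)$
In the above formula, B is the batch size BatchSize;
b is the benchmark return, see Step 2.4;
$\nabla_\theta \ln \pi_\theta(T_i)$ is the derivative of the function ln π_θ(T_i) with respect to the parameter θ,
where π_θ(T_i) is the probability that the policy network generates T_i.
Step 2.3 Update the parameters: θ ← ADAM(θ, dθ);
Step 2.4 Update the benchmark return
In essence, the training process is the agent continually generating propagation trajectories: for a trajectory with a high reward, the probability of generating it is increased; for a trajectory with a low reward, the probability of generating it is decreased.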
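Steps 2.2 and 2.4 can be sketched in plain Python. Several points are assumptions rather than statements from the source: the gradient is the usual policy-gradient estimate with a baseline, the baseline update is an exponential moving average weighted by α (the source formula for Step 2.4 is not reproduced in this text), and the gradients of ln π_θ(T_i) are passed in as plain lists because the policy network itself is outside the scope of the sketch. The function names are hypothetical.

```python
from typing import List, Sequence


def policy_gradient(rewards: Sequence[float],
                    grad_log_probs: Sequence[Sequence[float]],
                    baseline: float) -> List[float]:
    """Batch estimate (1/B) * sum_i (r_i - b) * grad ln pi(T_i)."""
    batch = len(rewards)
    dim = len(grad_log_probs[0])
    grad = [0.0] * dim
    for r, g in zip(rewards, grad_log_probs):
        for k in range(dim):
            grad[k] += (r - baseline) * g[k] / batch
    return grad


def update_baseline(baseline: float,
                    rewards: Sequence[float],
                    alpha: float) -> float:
    """Assumed moving-average update of the benchmark return b."""
    mean_reward = sum(rewards) / len(rewards)
    return alpha * baseline + (1.0 - alpha) * mean_reward


# Hypothetical batch of two trajectories with 2-dimensional parameters.
d_theta = policy_gradient([1.0, 3.0], [[1.0, 0.0], [0.0, 2.0]], baseline=2.0)
b_new = update_baseline(0.0, [1.0, 3.0], alpha=0.5)
```

A trajectory whose reward exceeds the baseline contributes its gradient with a positive weight, which is the mechanism behind "increase the probability of high-reward trajectories"; feeding `d_theta` to an optimizer such as Adam corresponds to Step 2.3.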
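In practice, one or more pre-collected Logs can be used to pre-train from a randomly initialized θ_0 to obtain a policy π_pretrain. For a Log that later needs to be traced, adaptive training is performed using the parameters of π_pretrain as initial parameters to save training time.
The policy inference process is as follows:
Input: π, PopulationSize, CandidateNum
Parameter description:
π is the trained policy
PopulationSize is the number of times sampled from π
CandidateNum is the length of the output
Output: the CandidateNum individuals most likely to be propagation sources;
Step 1. For i = 1, …, PopulationSize (that is, generate PopulationSize trajectories):
T_i, r_i = GenTrajectories(π_θ);
Step 2. Count the number of times each individual in I is identified as a propagation source in T_1, …, T_PopulationSize;
Step 3. Return the CandidateNum individuals identified as a propagation source the most times.
The agent generates multiple propagation trajectories, and one or more propagation sources can be inferred from each of them. The nodes that appear as a propagation source the most times are determined to be the propagation sources.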
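The counting in Steps 2 and 3 of the inference procedure can be sketched in Python with the standard library. The input layout (one list of propagation sources per sampled trajectory) and the function name `rank_sources` are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List


def rank_sources(sampled_sources: Iterable[Iterable[str]],
                 candidate_num: int) -> List[str]:
    """Return the candidate_num devices most often identified as a source.

    Each element of sampled_sources holds the propagation sources of one
    sampled trajectory; a device is counted once per trajectory.
    """
    counts: Counter = Counter()
    for sources in sampled_sources:
        counts.update(set(sources))
    return [device for device, _ in counts.most_common(candidate_num)]


# Hypothetical sources extracted from four sampled trajectories.
top = rank_sources([["a"], ["a", "b"], ["c", "a"], ["b"]], candidate_num=2)
```

Ties between devices with equal counts are broken by `Counter.most_common` in first-encountered order, which is an implementation detail of this sketch rather than something the source specifies.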
智能体的感知和决策模块说明如下:
感知模块:本提案中的序列决策问题是有限步的序列决策问题,故本提案在不同的决策操作中采用不同的图神经网络。此外,为处理变长的图状态序列,本提案将S2V模型和循环神经网络相结合提出了Evolve-S2V模型,和S2V模型相比,Evolve-S2V模型能在每个决策操作中动态地生成图神经网络以从状态中抽取信息。感知模块能够将环境(即:节点、节点状态、边)表示为稠密向量以供智能体使用。
记图神经网络的层数为L,L为超参数。若在决策操作i中观察到状态G=(V,E,W,X),第l层中个体u的隐向量计算如下:
其中是决策操作i中采用的图神经网络在第l层的参数,1≤l≤L。对任意个体u,
表示第i个决策操作中,图神经网络在第l层输出的节点u的隐向量。X(u)表示节点u的特征,X(u)=0表示智能体尚未判定u变成感染状态。X(u)=1表示在之前的决策操作中,智能体已判定u变成了感染状态。表示第i个决策操作中,在图神经网络的第l层中变换中心节点特征的向量。W(v,u)表示节点v感染节点u的概率。表示第i个决策操作中在图神经网络的第l层中变换边上权重的矩阵。表示第i个决策操作中,在图神经网络的第l层中变换邻居节点在第l-1层中隐向量的矩阵。表示第i个决策操作中,在图神经网络的第l层中变换的W(v,u)向量。
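The parameters W_{i,l} at layer l of the graph neural network in decision operation i evolve from the parameters W_{i-1,l} of the graph neural network in the previous decision operation i-1. Specifically, the proposal models this evolution with a GRU recurrent neural network: W_{i,l} = GRU(W_{i-1,l}, g_{i,l-1}).
Here, g_{i,l-1} is the representation of the graph at layer l-1 in decision operation i. W_{0,l} is initialized to 0.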
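For reference, a standard GRU cell applied to this recurrence would take W_{i-1,l} as the hidden state and g_{i,l-1} as the input. The expansion below is a sketch under that assumption; the gate parameters U, V and b are hypothetical names, and W_{i-1,l} is treated as a vector (for example, flattened or processed column by column) so that the element-wise products are well defined.

```latex
\begin{aligned}
z &= \sigma\left(U_z\, g_{i,l-1} + V_z\, W_{i-1,l} + b_z\right) \\
r &= \sigma\left(U_r\, g_{i,l-1} + V_r\, W_{i-1,l} + b_r\right) \\
\tilde{W} &= \tanh\left(U_h\, g_{i,l-1} + V_h\,\left(r \odot W_{i-1,l}\right) + b_h\right) \\
W_{i,l} &= (1 - z) \odot W_{i-1,l} + z \odot \tilde{W}
\end{aligned}
```

The update gate z controls how much of the previous decision operation's parameters is kept, which is what lets the network used in operation i differ from the one used in operation i-1.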
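The decision module is described below.
The agent selects a set A from the action set and judges the individuals in A to be infected. To simplify computation, the agent in this proposal decides independently for each candidate whether it is selected into A. Let C denote the candidate set in decision operation i (depending on the decision operation, C is either the set of uninfected individuals in I or an individual that has been in contact with an infected individual). For each v ∈ C, the agent selects v into A with a probability computed by the policy and leaves v out of A with the complementary probability. This probability depends on a node u which, depending on the decision operation, is either the external factor Z0 that can introduce the virus into O or the infected individual in contact with v.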
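Independent selection means the probability of a whole set A factorizes over the candidates, so its log-probability is a simple sum. The sketch below assumes the per-candidate probabilities are already available in a dictionary `prob`; how the policy network produces them is not shown, and the function name is hypothetical.

```python
import math
import random


def select_infected(candidates, prob, rng):
    """Independently include each candidate v in A with probability prob[v].

    Returns the chosen set A and ln of the probability of that exact choice,
    which is the quantity a policy-gradient update needs.
    """
    chosen, log_p = set(), 0.0
    for v in candidates:
        if rng.random() < prob[v]:
            chosen.add(v)
            log_p += math.log(prob[v])
        else:
            log_p += math.log(1.0 - prob[v])
    return chosen, log_p


rng = random.Random(0)
A, log_p = select_infected(["x", "y"], {"x": 0.5, "y": 0.5}, rng)
print(A, log_p)
```

With two candidates at probability 0.5 each, every possible set A has log-probability 2 · ln 0.5, regardless of which one is sampled.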
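This embodiment targets three problems in existing source-tracing algorithms: (1) they ignore the times at which individuals come into contact; (2) they do not exploit the propagation characteristics of the virus; (3) manually designed tracing strategies are limited. To address these problems, the proposal improves tracing accuracy in two ways. (1) It models source tracing as a sequential decision problem and uses reinforcement learning to learn the tracing policy automatically. In particular, this embodiment builds the Evolve-S2V model on top of the S2V model as the agent's state perception module. Compared with the S2V model, Evolve-S2V can dynamically generate a graph neural network in each decision operation to perceive the state in that operation. (2) It makes full use of contact times and the propagation characteristics of the virus during decision making. In particular, both are used to design the reward function of the sequential decision problem. The algorithm proposed in this embodiment not only identifies the propagation source but also reconstructs the trajectory along which the virus spread through the group. It can therefore serve as an auxiliary tool in scenarios such as electronic forensics and network security auditing.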
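Referring to FIG. 5, FIG. 5 is a schematic structural diagram of a system for determining an information propagation source provided by an embodiment of the present application.
The system may include:
a candidate set determination module 501, configured to receive a propagation source tracing request and determine a candidate set according to the request, where the candidate set includes all devices that are in an infected state at the cutoff time t_e, and a device in an infected state is a device that has received the target information;
a time tracing module 502, configured to determine a start time t_s at which none of the devices in the candidate set is in an infected state;
a decision module 503, configured to query all communication events of all devices from the start time t_s to the cutoff time t_e, and to use an agent to perform a decision operation on each communication event in chronological order to obtain the propagation trajectory of the target information, where the i-th communication event is that device u_i transmitted data to device v_i at time t_i, and the decision operation corresponding to the i-th communication event includes: judging whether the devices in the candidate set were infected by an external infection source during the period from t_{i-1} to t_i, with t_0 = t_s; if device u_i was judged by the agent to be infected before the current decision operation and device v_i has not yet been judged infected, the decision operation corresponding to the i-th communication event further includes: judging whether device v_i was infected by u_i at time t_i;
a propagation source determination module 504, configured to determine the information propagation source according to the propagation trajectory.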
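This embodiment determines the candidate set according to the received propagation source tracing request, determines the start time t_s and the cutoff time t_e used to determine the propagation trajectory, and queries all communication events of all devices from t_s to t_e. The agent then performs a decision operation on each communication event in turn. Let the i-th communication event be that device u_i transmitted data to device v_i at time t_i. For this event, the decision operation judges whether the devices in the candidate set were infected by an external infection source during the period from t_{i-1} to t_i, where t_0 = t_s. In addition, if u_i was judged infected by the agent in an earlier decision operation, the decision operation also judges whether v_i became infected by u_i at time t_i. In this way, the decision process over a continuous time domain is discretized into decisions on time intervals and time points, which yields the propagation trajectory of the target information and the information propagation source. Because this approach combines the communication times between devices with the propagation characteristics of the information, it can improve the accuracy of tracing the information propagation source.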
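The discretized decision flow can be sketched as a loop over time-ordered events. The two callables `decide_external` and `decide_contact` stand in for the agent's policy; their names and signatures, the `(u, v, t)` event layout, and the `"external"` marker are assumptions made for illustration.

```python
def trace(events, candidates, t_s, decide_external, decide_contact):
    """Process communication events (u, v, t) in chronological order.

    decide_external(pending, t_prev, t) returns the devices judged infected
    by an external source during the interval from t_prev to t.
    decide_contact(u, v, t) returns True if v is judged infected by u at t.
    """
    infected = {}  # device -> (infector, time of the deciding event)
    t_prev = t_s   # t_0 = t_s
    for u, v, t in sorted(events, key=lambda e: e[2]):
        # u must have been judged infected before this decision operation
        u_was_infected = u in infected
        # Interval decision: external infection between t_prev and t
        pending = [d for d in candidates if d not in infected]
        for d in decide_external(pending, t_prev, t):
            infected[d] = ("external", t)
        # Point decision: transmission through the event itself
        if u_was_infected and v not in infected and decide_contact(u, v, t):
            infected[v] = (u, t)
        t_prev = t
    sources = [d for d, (src, _) in infected.items() if src == "external"]
    return infected, sources


events = [("c", "a", 1.0), ("a", "b", 2.0), ("b", "c", 3.0)]
infected, sources = trace(
    events,
    candidates=["a", "b", "c"],
    t_s=0.0,
    decide_external=lambda pending, t_prev, t: {"a"} if t_prev == 0.0 else set(),
    decide_contact=lambda u, v, t: True,
)
print(sources)   # ['a']
print(infected)  # {'a': ('external', 1.0), 'b': ('a', 2.0), 'c': ('b', 3.0)}
```

Each event therefore triggers at most two decisions, one for the preceding interval and one for the event time, and the devices marked as externally infected are the roots of the resulting trajectory.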
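Further, the process by which the candidate set determination module 501 determines the candidate set according to the propagation source tracing request includes: parsing the propagation source tracing request to obtain the target information and the cutoff time t_e; and constructing the candidate set according to the target information and the cutoff time t_e.
Further, the process by which the candidate set determination module 501 constructs the candidate set according to the target information and the cutoff time t_e includes: determining the target devices that have received the target information by the cutoff time t_e; and constructing a candidate set containing all the target devices.
Further, the process by which the time tracing module 502 determines the start time t_s at which none of the devices in the candidate set is in an infected state includes: tracing back through the device scan records starting from the cutoff time t_e, and using the device scan records to determine the start time t_s at which none of the devices in the candidate set is in an infected state.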
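The backward search for t_s can be sketched as follows. The record layout is an assumption: `scans` maps a scan time to a dictionary of per-device results, where `False` means the device was scanned and found clean. The specification does not describe the actual format of the device scan records.

```python
def find_start_time(scans, candidates, t_e):
    """Walk back from t_e to the latest scan where every candidate is clean.

    Returns None when no such scan exists at or before t_e. A device missing
    from a scan is treated as unknown, so that scan does not qualify.
    """
    for t in sorted((t for t in scans if t <= t_e), reverse=True):
        if all(scans[t].get(d) is False for d in candidates):
            return t
    return None


scans = {
    1: {"a": False, "b": False},
    2: {"a": True, "b": False},
    3: {"a": True, "b": True},
}
print(find_start_time(scans, ["a", "b"], t_e=3))  # 1
```

Choosing the latest clean scan keeps the interval from t_s to t_e as short as possible, which limits the number of communication events the agent has to process.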
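Further, the system also includes:
an environment setting module, configured to set the environment state before the agent performs the decision operation on each communication event in chronological order, so that the agent performs the decision operation according to the environment state.
Further, the system also includes:
an environment update module, configured to update the environment state after the agent performs one decision operation on one communication event.
Further, the process by which the decision module 503 uses the agent to perform a decision operation on each communication event in chronological order to obtain the propagation trajectory of the target information includes: using the agent to perform the decision operation on each communication event in chronological order; determining the reward of each decision operation, and computing the total reward of a candidate trajectory according to the total reward formula; and setting a candidate trajectory whose total reward is greater than a preset reward value as the propagation trajectory of the target information.
Further, the process by which the decision module 503 computes the total reward of a candidate trajectory according to the total reward formula includes: computing the total reward of the candidate trajectory according to the total reward formula;
In the total reward formula, r_total denotes the total reward, K denotes the total number of communication events, r_i describes the probability that a device is infected during the period from t_{i-1} to t_i, where t_0 = t_s, r_end describes the probability that devices in the candidate set not yet judged infected by the agent are infected through external factors during the period from t_K to t_e, and r_penalty denotes the misjudgment penalty.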
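Further, the system also includes:
a reward determination module, configured to determine the first-type reward r_{2i-1} and the second-type reward r_{2i};
where the first-type reward r_{2i-1} is positively correlated with the probability that the devices in the candidate set are infected by an external infection source during the period from t_{i-1} to t_i, and the second-type reward r_{2i} is positively correlated with the probability that the target information propagates through the i-th communication event at time t_i.
The reward determination module is further configured to take, as the third-type reward r_end, the logarithm of the probability that devices in the candidate set not judged infected by the agent are infected through external factors during the period from t_K to t_e.
Further, the system also includes:
a penalty determination module, configured to set, after the agent has performed the decision operation on all communication events, the number of devices that are not in the candidate set but were judged by the agent to be infected as the misjudgment count, and further configured to determine the misjudgment penalty r_penalty according to the misjudgment count.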
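The reward components can be combined as in the sketch below. Several points are assumptions, since the formula itself is not reproduced in the text: the per-step rewards are simply summed, the penalty is subtracted, and the penalty grows linearly with the misjudgment count through a hypothetical `penalty_weight`. Only the use of a logarithm for r_end and the definition of the misjudgment count come from the description.

```python
import math


def total_reward(step_rewards, p_end, judged_infected, candidates,
                 penalty_weight=1.0):
    """Combine per-step rewards, the end reward and the misjudgment penalty.

    step_rewards: the values r_1 .. r_2K collected during the K decisions.
    p_end: probability that the not-yet-judged candidates are infected
           externally between t_K and t_e.
    """
    r_end = math.log(p_end)
    # Misjudgment count: judged infected by the agent but not in the candidate set
    misjudged = len(set(judged_infected) - set(candidates))
    r_penalty = penalty_weight * misjudged  # assumed linear penalty
    return sum(step_rewards) + r_end - r_penalty


print(total_reward([-0.5, -0.25], 1.0, {"a", "x"}, {"a", "b"}))  # -1.75
```

In the example, device x is judged infected although it is outside the candidate set, so one misjudgment is charged on top of the two step rewards.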
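Further, the system also includes:
a training module, configured to train the agent's decision network before the agent performs the decision operation on each communication event in chronological order, where the decision network is obtained by combining the Structure2Vector model with a recurrent neural network.
Further, the system also includes:
a parameter setting module, configured to input, before the agent is trained, the communication log, the agent's initial policy parameters, the number of iterations, the number of propagation trajectories generated by the agent in each iteration, and the hyperparameter used to compute the baseline reward into the agent's training algorithm.
Further, the system also includes:
a policy parameter setting module, configured to obtain policy parameters produced by pretraining and set them as the agent's initial policy parameters.
Further, the system also includes:
an inference module, configured to perform policy inference with the trained agent after training, so that the trained agent performs the decision operation on each communication event in chronological order.
Further, the system also includes:
an audit module, configured to perform a network security audit operation on the information propagation source after the information propagation source is determined according to the propagation trajectory.
Further, the process by which the decision module 503 queries all communication events of all devices from the start time t_s to the cutoff time t_e includes: reading the communication logs of all devices, and querying all communication events from the start time t_s to the cutoff time t_e according to the communication logs.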
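Because the system embodiments correspond to the method embodiments, refer to the description of the method embodiments for the system embodiments; details are not repeated here.
The present application further provides an electronic device, which may include a memory and a processor. The memory stores a computer program, and when the processor invokes the computer program in the memory, the steps provided by the above embodiments can be implemented. The electronic device may of course also include components such as various network interfaces and a power supply. FIG. 6 is a schematic structural diagram of an electronic device provided by an embodiment of the present application. As shown in FIG. 6, the electronic device includes:
a communication interface 601, capable of exchanging information with other devices such as network devices;
a processor 602, connected to the communication interface 601 to exchange information with other devices, and configured to execute, when running a computer program, the method for determining an information propagation source provided by one or more of the above technical solutions. The computer program is stored in a memory 603.
In practice, the components of the electronic device are coupled together through a bus system 604. It can be understood that the bus system 604 implements the connections and communication among these components. In addition to a data bus, the bus system 604 includes a power bus, a control bus, and a status signal bus. For clarity, all of these buses are labeled as the bus system 604 in FIG. 6.
The present application further provides a non-volatile readable storage medium storing a computer program that, when executed, can implement the steps provided by the above embodiments. The storage medium may include any medium capable of storing program code, such as a semiconductor memory chip, a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc. FIG. 7 is a schematic structural diagram of a non-volatile readable storage medium provided by an embodiment of the present application. The storage medium may be a non-volatile or non-transitory memory chip, and specifically includes a decoding driver, a storage matrix, a read/write circuit, address lines, data lines, a chip select line, and a read/write control line.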
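The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the other embodiments, and the parts that are the same or similar across embodiments can be cross-referenced. Because the system disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; refer to the method description for the relevant details. It should be noted that a person of ordinary skill in the art may make several improvements and modifications to the present application without departing from its principles, and these improvements and modifications also fall within the scope of protection of the claims of the present application.
It should also be noted that, in this specification, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.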

Claims (20)

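  1. A method for determining an information propagation source, characterized by comprising:
    receiving a propagation source tracing request, and determining a candidate set according to the propagation source tracing request, wherein the candidate set includes all devices that are in an infected state at a cutoff time t_e, and a device in the infected state is a device that has received target information;
    determining a start time t_s at which none of the devices in the candidate set is in the infected state;
    querying all communication events of all devices from the start time t_s to the cutoff time t_e, and using an agent to perform a decision operation on each of the communication events in chronological order to obtain a propagation trajectory of the target information, wherein the i-th communication event is that device u_i transmitted data to device v_i at time t_i, and the decision operation corresponding to the i-th communication event includes: judging whether the devices in the candidate set were infected by an external infection source during the period from t_{i-1} to t_i, with t_0 = t_s; if device u_i was judged by the agent to be infected before the current decision operation and device v_i has not yet been judged infected, the decision operation corresponding to the i-th communication event further includes: judging whether device v_i was infected by u_i at time t_i;
    determining the information propagation source according to the propagation trajectory.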
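  2. The method for determining an information propagation source according to claim 1, characterized in that determining the candidate set according to the propagation source tracing request comprises:
    parsing the propagation source tracing request to obtain the target information and the cutoff time t_e;
    constructing the candidate set according to the target information and the cutoff time t_e.
  3. The method for determining an information propagation source according to claim 2, characterized in that constructing the candidate set according to the target information and the cutoff time t_e comprises:
    determining the target devices that have received the target information by the cutoff time t_e;
    constructing the candidate set containing all the target devices.
  4. The method for determining an information propagation source according to claim 1, characterized in that determining the start time t_s at which none of the devices in the candidate set is in the infected state comprises:
    tracing back through device scan records starting from the cutoff time t_e, and using the device scan records to determine the start time t_s at which none of the devices in the candidate set is in the infected state.
  5. The method for determining an information propagation source according to claim 1, characterized in that, before using the agent to perform the decision operation on each of the communication events in chronological order, the method further comprises:
    setting an environment state, so that the agent performs the decision operation according to the environment state.
  6. The method for determining an information propagation source according to claim 5, characterized by further comprising:
    updating the environment state after the agent performs one decision operation on one of the communication events.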
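  7. The method for determining an information propagation source according to claim 1, characterized in that using the agent to perform the decision operation on each of the communication events in chronological order to obtain the propagation trajectory of the target information comprises:
    using the agent to perform the decision operation on each of the communication events in chronological order;
    determining the reward of each decision operation, and computing the total reward of a candidate trajectory according to a total reward formula;
    setting a candidate trajectory whose total reward is greater than a preset reward value as the propagation trajectory of the target information.
  8. The method for determining an information propagation source according to claim 7, characterized in that computing the total reward of the candidate trajectory according to the total reward formula comprises:
    computing the total reward of the candidate trajectory according to the total reward formula;
    wherein, in the total reward formula, r_total denotes the total reward, K denotes the total number of communication events, r_i describes the probability that a device is infected during the period from t_{i-1} to t_i, r_end describes the probability that devices in the candidate set not yet judged infected by the agent are infected through external factors during the period from t_K to t_e, and r_penalty denotes the misjudgment penalty.
  9. The method for determining an information propagation source according to claim 8, characterized by further comprising:
    determining a first-type reward r_{2i-1} and a second-type reward r_{2i}, wherein the first-type reward r_{2i-1} is positively correlated with the probability that the devices in the candidate set are infected by an external infection source during the period from t_{i-1} to t_i, and the second-type reward r_{2i} is positively correlated with the probability that the target information propagates through the i-th communication event at time t_i;
    taking, as a third-type reward r_end, the logarithm of the probability that devices in the candidate set not judged infected by the agent are infected through external factors during the period from t_K to t_e;
    after the agent has performed the decision operation on all the communication events, setting the number of devices that are not in the candidate set but were judged by the agent to be infected as a misjudgment count;
    determining the misjudgment penalty r_penalty according to the misjudgment count.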
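  10. The method for determining an information propagation source according to claim 1, characterized in that, before using the agent to perform the decision operation on each of the communication events in chronological order, the method further comprises:
    training a decision network of the agent, wherein the decision network is obtained by combining the Structure2Vector model with a recurrent neural network, and the inputs required to train the decision network include: a communication log, initial policy parameters of the agent, the number of iterations, the number of propagation trajectories generated by the agent in each iteration, and a hyperparameter used to compute the baseline reward;
    obtaining policy parameters produced by pretraining, and setting the pretrained policy parameters as the initial policy parameters of the agent.
  11. The method for determining an information propagation source according to claim 10, characterized in that, after training the decision network of the agent, the method further comprises:
    performing policy inference with the trained agent, so that the trained agent performs the decision operation on each of the communication events in chronological order.
  12. The method for determining an information propagation source according to claim 1, characterized in that querying all communication events of all devices from the start time t_s to the cutoff time t_e comprises:
    reading the communication logs of all devices, and querying all the communication events from the start time t_s to the cutoff time t_e according to the communication logs.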
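  13. A system for determining an information propagation source, characterized by comprising:
    a candidate set determination module, configured to receive a propagation source tracing request and determine a candidate set according to the propagation source tracing request, wherein the candidate set includes all devices that are in an infected state at a cutoff time t_e, and a device in the infected state is a device that has received target information;
    a time tracing module, configured to determine a start time t_s at which none of the devices in the candidate set is in the infected state;
    a decision module, configured to query all communication events of all devices from the start time t_s to the cutoff time t_e, and to use an agent to perform a decision operation on each of the communication events in chronological order to obtain a propagation trajectory of the target information, wherein the i-th communication event is that device u_i transmitted data to device v_i at time t_i, and the decision operation corresponding to the i-th communication event includes: judging whether the devices in the candidate set were infected by an external infection source during the period from t_{i-1} to t_i, with t_0 = t_s; if device u_i was judged by the agent to be infected before the current decision operation and device v_i has not yet been judged infected, the decision operation corresponding to the i-th communication event further includes: judging whether device v_i was infected by u_i at time t_i;
    a propagation source determination module, configured to determine the information propagation source according to the propagation trajectory.
  14. An electronic device, characterized by comprising a memory and a processor, wherein the memory stores a computer program, and the processor, when invoking the computer program in the memory, implements the steps of the method for determining an information propagation source according to any one of claims 1 to 12.
  15. A non-volatile readable storage medium, characterized in that the non-volatile readable storage medium stores computer-executable instructions which, when loaded and executed by a processor, implement the steps of the method for determining an information propagation source according to any one of claims 1 to 12.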
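  16. The method for determining an information propagation source according to claim 1, characterized in that there are multiple propagation trajectories, and determining the information propagation source according to the propagation trajectories comprises:
    inferring one or more propagation sources from each of the propagation trajectories;
    judging the device that appears as a propagation source most often to be the information propagation source.
  17. The method for determining an information propagation source according to claim 1, characterized in that determining the information propagation source according to the propagation trajectory comprises:
    setting the first device in the propagation trajectory as the information propagation source.
  18. The method for determining an information propagation source according to claim 1, characterized in that, after determining the information propagation source according to the propagation trajectory, the method further comprises:
    performing a network security audit operation on the information propagation source.
  19. The method for determining an information propagation source according to claim 7, characterized in that the total reward of the candidate trajectory is related to the probability that the candidate trajectory occurs in the environment.
  20. The method for determining an information propagation source according to claim 7, characterized in that the candidate trajectory is determined according to the agent's decisions on all the communication events.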
PCT/CN2023/093272 2022-11-24 2023-05-10 一种信息传播源的确定方法、系统、电子设备及存储介质 WO2024108913A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211482014.2A CN115842668A (zh) 2022-11-24 2022-11-24 一种信息传播源的确定方法、系统、电子设备及存储介质
CN202211482014.2 2022-11-24

Publications (1)

Publication Number Publication Date
WO2024108913A1 true WO2024108913A1 (zh) 2024-05-30

Family

ID=85575995

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/093272 WO2024108913A1 (zh) 2022-11-24 2023-05-10 一种信息传播源的确定方法、系统、电子设备及存储介质

Country Status (2)

Country Link
CN (1) CN115842668A (zh)
WO (1) WO2024108913A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115842668A (zh) * 2022-11-24 2023-03-24 浪潮(北京)电子信息产业有限公司 一种信息传播源的确定方法、系统、电子设备及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105915399A (zh) * 2016-06-27 2016-08-31 华侨大学 一种基于反向传播的网络风险源头追溯方法
CN111935192A (zh) * 2020-10-12 2020-11-13 腾讯科技(深圳)有限公司 网络攻击事件溯源处理方法、装置、设备和存储介质
CN113569142A (zh) * 2021-07-20 2021-10-29 西北工业大学 一种基于全阶邻居覆盖策略的网络谣言溯源方法
US20210409428A1 (en) * 2020-06-25 2021-12-30 VocaLink Limited Forensically Analysing and Determining a Network Associated with a Network Security Threat
CN115842668A (zh) * 2022-11-24 2023-03-24 浪潮(北京)电子信息产业有限公司 一种信息传播源的确定方法、系统、电子设备及存储介质

Also Published As

Publication number Publication date
CN115842668A (zh) 2023-03-24
