CN111181757B - Information security risk prediction method and device, computing equipment and storage medium - Google Patents


Info

Publication number
CN111181757B
Authority
CN
China
Prior art keywords
risk
value
operation data
basic information
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910684716.0A
Other languages
Chinese (zh)
Other versions
CN111181757A (en
Inventor
任飞
周明辉
刘跃波
方明
朱祁林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201910684716.0A
Publication of CN111181757A
Application granted
Publication of CN111181757B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/12 Discovery or management of network topologies
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • H04L41/147 Network analysis or design for predicting network behaviour
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides an information security risk prediction method and device, a computing device, and a storage medium, which relate to the technical field of data processing. The method comprises: acquiring at least one type of operation data of a terminal to be evaluated; analyzing each type of operation data and determining the risk value corresponding to each type; and determining a risk assessment value of the terminal to be evaluated based on a predetermined kernel density distribution and the risk values of the various types of operation data, where the kernel density distribution is generated in advance from the basic information of multiple users and represents the probability distribution of risk values based on the users' basic information. Because the prediction combines the users' basic information with kernel density estimation of the risk values, the accuracy of information security risk prediction is improved.

Description

Information security risk prediction method and device, computing equipment and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method and an apparatus for predicting an information security risk, a computing device, and a storage medium.
Background
With growing awareness of office security, most companies build intranet office systems to meet their office needs, and accounts usually correspond one-to-one with employees. However, when malicious actors steal an employee's account information and use it to log in to the company intranet, the company's information security is seriously threatened, so information security risk prediction is a problem of great concern in the industry.
In the prior art, information security risk prediction generally extracts features from a single dimension of a company's information and feeds them into a pre-trained prediction model, which outputs either a risk label or a no-risk label; the corresponding risk or no-risk prediction information is then derived from that label. Because this approach considers only the influence of one dimension on information security, the accuracy of the predicted security risk is low.
Disclosure of Invention
The embodiment of the application provides an information security risk prediction method and device, a computing device and a storage medium, which are used for solving the problem of poor accuracy of information security risk prediction in the prior art.
In a first aspect, an embodiment of the present application provides an information security risk prediction method, where the method includes:
acquiring at least one type of operation data of a terminal to be evaluated;
analyzing various operation data, and determining risk values respectively corresponding to the various operation data;
and determining a risk assessment value of the terminal to be evaluated based on a predetermined kernel density distribution and the risk values of the various types of operation data, wherein the kernel density distribution is generated in advance from the basic information of multiple users and represents the probability distribution of risk values based on the users' basic information.
Optionally, each type of user basic information corresponds to one kernel density distribution, and determining the risk assessment value of the terminal to be evaluated based on the predetermined kernel density distribution and the risk values of the various types of operation data includes:
executing for each type of operation data:
for each type of user basic information: determining the risk value interval, in the kernel density distribution of that user basic information, that contains the risk value of this type of operation data; determining, according to the determined risk value interval, the probability value of the risk value in that kernel density distribution; and taking the reciprocal of the probability value as the risk combination value obtained by combining the operation data with the user basic information;
carrying out weighted summation on risk combination values of various types of operation data corresponding to the same user basic information, and taking the obtained weighted summation result as a risk value corresponding to the user basic information;
and carrying out weighted summation on the risk values corresponding to the basic information of each user to obtain the risk evaluation value of the terminal to be evaluated.
Optionally, generating a kernel density distribution according to the user basic information includes:
for each type of user basic information: counting, with designated value intervals of the user basic information as the reference, the number of people corresponding to each designated value interval; and determining the kernel density distribution corresponding to that user basic information by applying a kernel density estimation algorithm to the designated value intervals and their head counts.
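The interval-count step above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes, following the Table 1 example later in the description, that the intervals are risk value intervals and the counts are the number of people in one basic-information group falling in each interval. It further assumes a Gaussian kernel centred on each interval midpoint and weighted by head count, with Silverman's rule of thumb for the bandwidth. The function name `kde_from_intervals` and the sample counts are hypothetical.

```python
import math
from typing import Callable, List, Optional, Tuple


def kde_from_intervals(
    intervals: List[Tuple[float, float]],
    counts: List[int],
    bandwidth: Optional[float] = None,
) -> Tuple[Callable[[float], float], float]:
    """Build a Gaussian kernel density estimate from per-interval head counts.

    Hypothetical helper: each interval is represented by its midpoint,
    weighted by the number of people counted in that interval.
    """
    if len(intervals) != len(counts) or sum(counts) <= 0:
        raise ValueError("intervals and counts must align and sum to > 0")
    mids = [(lo + hi) / 2 for lo, hi in intervals]
    total = sum(counts)
    weights = [c / total for c in counts]

    if bandwidth is None:
        # Assumption: Silverman's rule of thumb on the weighted midpoints.
        mean = sum(w * m for w, m in zip(weights, mids))
        std = math.sqrt(sum(w * (m - mean) ** 2 for w, m in zip(weights, mids)))
        bandwidth = 1.06 * std * total ** (-1 / 5)
        if bandwidth <= 0:
            # Degenerate case (everyone in one interval): use mean interval width.
            bandwidth = sum(hi - lo for lo, hi in intervals) / len(intervals)

    norm = bandwidth * math.sqrt(2 * math.pi)

    def density(x: float) -> float:
        # Weighted sum of Gaussian kernels centred on the interval midpoints.
        return sum(
            w * math.exp(-0.5 * ((x - m) / bandwidth) ** 2)
            for w, m in zip(weights, mids)
        ) / norm

    return density, bandwidth


# Hypothetical head counts per risk value interval [0, 10), ..., [90, 100).
intervals = [(10 * i, 10 * i + 10) for i in range(10)]
counts = [2, 5, 9, 14, 20, 18, 12, 8, 5, 3]
density, h = kde_from_intervals(intervals, counts)
print(f"bandwidth={h:.2f}, density(45)={density(45):.4f}")
```

Representing each interval by its midpoint keeps the estimate cheap, since only one kernel per interval is needed rather than one per person.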
Optionally, analyzing various types of operation data, and determining risk values corresponding to the various types of operation data respectively, includes:
if the operation data is network topology data, generating an access vector for each level, level by level according to the accessed topology, where each entry is the access duration of the terminal to be evaluated at the corresponding network node;
and computing the differences between the vectors of the same network level in the network topology data of different time periods, then performing a weighted summation with each level's number as its weight, to obtain the risk value of the network topology.
Optionally, analyzing various types of operation data, and determining risk values corresponding to the various types of operation data respectively, includes:
performing, for each type of operational data other than network topology data: inputting the operation data into corresponding risk value prediction models to obtain risk values corresponding to the operation data, wherein each risk value prediction model is obtained by training according to the following method:
obtaining a predicted value of historical operation data of each terminal;
determining a difference value between the predicted value and the true value;
and training a risk value prediction model by adopting a baseline learning method according to the difference value.
Optionally, the method further includes:
if the risk assessment value is greater than a preset threshold, comparing the secondary evaluation data of the terminal to be evaluated with the kernel density distribution of the secondary evaluation data of terminals of the same type, to determine the degree of outlierness of the terminal's secondary evaluation data;
and if the degree of outlierness is greater than a designated degree of outlierness, generating prompt information and sending it to a risk auditing terminal.
Optionally, the secondary evaluation data includes at least one type, and determining the degree of outlierness of the secondary evaluation data of the terminal to be evaluated, by comparing it with the kernel density distribution of the secondary evaluation data of terminals of the same type, includes:
determining, for each type of secondary evaluation data, its probability in the corresponding kernel density distribution;
and performing a weighted summation of the reciprocals of the determined probabilities to obtain the degree of outlierness.
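A minimal sketch of the optional verification steps above, assuming the probability of each type of secondary evaluation data has already been read off the same-type terminals' kernel density distributions (for example, as an interval probability). The names `outlier_degree` and `verify_risk`, the small epsilon guard, and returning the prompt text instead of actually sending it to a risk auditing terminal are all illustrative assumptions.

```python
from typing import Optional, Sequence


def outlier_degree(probabilities: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of reciprocal probabilities, one term per type of
    secondary evaluation data."""
    if len(probabilities) != len(weights):
        raise ValueError("one weight per type of secondary evaluation data")
    eps = 1e-12  # assumption: avoid division by zero for impossible values
    return sum(w / max(p, eps) for p, w in zip(probabilities, weights))


def verify_risk(
    risk_assessment: float,
    threshold: float,
    probabilities: Sequence[float],
    weights: Sequence[float],
    outlier_limit: float,
) -> Optional[str]:
    """Return prompt text for the risk auditing terminal, or None."""
    if risk_assessment <= threshold:
        return None  # low risk: no secondary evaluation is performed
    degree = outlier_degree(probabilities, weights)
    if degree > outlier_limit:
        return f"recheck: outlier degree {degree:.1f} exceeds {outlier_limit}"
    return None


# Hypothetical values: two types of secondary evaluation data.
print(verify_risk(70.0, 60.0, [0.5, 0.25], [1.0, 2.0], outlier_limit=5.0))
```

Taking reciprocals means a value that is rare among same-type terminals (low probability) contributes a large term, which is what makes the weighted sum behave as an outlierness score.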
In a second aspect, an embodiment of the present application provides an information security risk prediction apparatus, where the apparatus includes:
the acquisition module is used for acquiring at least one type of operation data of the terminal to be evaluated;
the data analysis module is used for analyzing various operation data and determining risk values corresponding to the various operation data respectively;
and the determining module is used for determining the risk assessment value of the terminal to be evaluated based on a predetermined kernel density distribution and the risk values of the various types of operation data, wherein the kernel density distribution is generated in advance from the basic information of multiple users and represents the probability distribution of risk values based on the users' basic information.
Optionally, each type of user basic information corresponds to one type of kernel density distribution, and the determining module is configured to execute, for each type of operation data:
for each type of user basic information: determining the risk value interval, in the kernel density distribution of that user basic information, that contains the risk value of this type of operation data; determining, according to the determined risk value interval, the probability value of the risk value in that kernel density distribution; and taking the reciprocal of the probability value as the risk combination value obtained by combining the operation data with the user basic information;
carrying out weighted summation on risk combination values of various types of operation data corresponding to the same user basic information, and taking the obtained weighted summation result as a risk value corresponding to the user basic information;
and carrying out weighted summation on the risk values corresponding to the basic information of each user to obtain the risk evaluation value of the terminal to be evaluated.
Optionally, the apparatus further comprises: the kernel density distribution generating module is used for generating kernel density distribution according to the user basic information, and executing for each kind of user basic information:
counting, with designated value intervals of the user basic information as the reference, the number of people corresponding to each designated value interval; and determining the kernel density distribution corresponding to the user basic information by applying a kernel density estimation algorithm to the designated value intervals and their head counts.
Optionally, the data analysis module is configured to:
if the operation data is network topology data, generate an access vector for each level, level by level according to the accessed topology, where each entry is the access duration of the terminal to be evaluated at the corresponding network node;
and compute the differences between the vectors of the same network level in the network topology data of different time periods, then perform a weighted summation with each level's number as its weight, to obtain the risk value of the network topology.
Optionally, the data analysis module is configured to perform, for each type of operation data except for the network topology data:
inputting the operation data into corresponding risk value prediction models to obtain risk values corresponding to the operation data, wherein each risk value prediction model is obtained by training according to the following method:
obtaining a predicted value of historical operation data of each terminal;
determining a difference value between the predicted value and the true value;
and training a risk value prediction model by adopting a baseline learning method according to the difference value.
Optionally, the apparatus further comprises an outlier determining module, which is used for: if the risk assessment value is greater than a preset threshold, comparing the secondary evaluation data of the terminal to be evaluated with the kernel density distribution of the secondary evaluation data of terminals of the same type, and determining the degree of outlierness of the terminal's secondary evaluation data;
and, if the degree of outlierness is greater than a designated degree of outlierness, generating prompt information and sending it to a risk auditing terminal.
Optionally, the secondary evaluation data includes at least one type, and, when determining the degree of outlierness by comparing the secondary evaluation data of the terminal to be evaluated with the kernel density distribution of the secondary evaluation data of terminals of the same type, the outlier determining module is configured for:
determining, for each type of secondary evaluation data, its probability in the corresponding kernel density distribution;
and performing a weighted summation of the reciprocals of the determined probabilities to obtain the degree of outlierness.
In a third aspect, an embodiment of the present invention further provides a computing device, including:
a memory and a processor;
a memory for storing program instructions;
and the processor is used for calling the program instructions stored in the memory and executing, according to the obtained program, the information security risk prediction method of any one of the first aspect.
In a fourth aspect, an embodiment of the present invention further provides a computer storage medium, where the computer storage medium stores computer-executable instructions, and the computer-executable instructions are configured to enable a computer to execute any information security risk prediction method in the embodiments of the present application.
The information security risk prediction method and device, computing equipment, and storage medium provided by the embodiments of the application work as follows: first, at least one type of operation data of a terminal to be evaluated is acquired; then each type of operation data is analyzed to determine its risk value; finally, a risk assessment value of the terminal is determined based on a predetermined kernel density distribution and the risk values of the various types of operation data, where the kernel density distribution is generated in advance from the basic information of multiple users and represents the probability distribution of risk values based on that basic information. By combining the users' basic information and predicting the information security risk value with kernel density estimation, the method predicts information security risk effectively and with higher accuracy.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of an information security risk prediction method according to an embodiment of the present application;
Fig. 2 is a network topology diagram provided in an embodiment of the present application;
Fig. 3 is a histogram of the basic information distribution provided by an embodiment of the present application;
Fig. 4 is a graph of the risk value distribution provided by an embodiment of the present application;
Fig. 5 is a schematic flowchart of a method for verifying a risk assessment value according to an embodiment of the present application;
Fig. 6 is a schematic diagram of an information security risk prediction method according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of an information security risk prediction apparatus according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of a computing device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention.
With the rapid development of the Internet, computerized office work has become increasingly common, and much everyday work is done on terminals such as computers. Computer-based office work improves efficiency and brings convenience to users, but it also means that leaks of network information can expose office information and a company's confidential information. Predicting information security risk has therefore become crucial, and the present application provides a method for doing so. The embodiments describe the prediction of office information security risk as an example, but the technical solution they provide can also be applied to predicting information security risks outside the office environment, such as website-affiliate-based security risk prediction.
The office information may include: office information of each enterprise, school office information, hospital office information, and the like.
Referring to fig. 1, a method for predicting information security risk is provided in an embodiment of the present application, including:
Step S101: acquire at least one type of operation data of the terminal to be evaluated.
It should be noted that the terminal to be evaluated generally refers to a terminal for which information security risk prediction is needed; for example, if the information security risks of all employees of company A are to be predicted, the terminals to be evaluated are the office computers of all employees of company A.
Different users leave different access information and generate different access traffic when visiting different websites on an office computer, and they also have different office habits, such as software usage, file usage, and registry usage habits. All of this information can serve as operation data for information security risk prediction.
Step S102: analyze each type of operation data and determine the risk value corresponding to each type.
Step S103: determine the risk assessment value of the terminal to be evaluated based on a predetermined kernel density distribution and the risk values of the various types of operation data, wherein the kernel density distribution is generated in advance from the basic information of multiple users and represents the probability distribution of risk values based on the users' basic information.
In step S102, each type of operation data is analyzed to obtain its corresponding risk value.
In one embodiment, if the operation data is network topology data, an access vector is generated for each level, level by level according to the accessed topology, where each entry is the access duration of the terminal to be evaluated at the corresponding network node. Differences are then computed between the vectors of the same network level in the network topology data of different time periods, and a weighted summation is performed with each level's number as its weight to obtain the risk value of the network topology.
Fig. 2 shows an example network topology containing user a and three servers accessed by user a: S1, S2, and S3. On a given day, user a accesses S1 for 600 s and S2 for 500 s, giving the first-level vector [600, 500]; S1 accesses S3 for 100 s, giving the second-level vector [100, 0]. On the second day, the first-level vector is [400, 200] and the second-level vector is [0, 100]. All levels are then flattened into a single vector, and the risk value is computed with weight 1 for first-level terms and weight 2 for second-level terms:
$$R = \lvert 600-400\rvert + 2\lvert 100-0\rvert + \lvert 500-200\rvert + 2\lvert 0-100\rvert = 200 + 200 + 300 + 200 = 900$$
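The level-weighted difference in the Fig. 2 example can be written as a short function. This sketch assumes the per-level difference is the sum of absolute element-wise differences (as in the worked example) and that both periods list the same nodes in the same order at each level; `topology_risk` is a hypothetical name.

```python
from typing import Sequence


def topology_risk(period1: Sequence[Sequence[float]],
                  period2: Sequence[Sequence[float]]) -> float:
    """Level-weighted L1 difference between two periods' access vectors.

    periodX[k] holds the access durations (seconds) for network level k + 1,
    so level k + 1 contributes with weight k + 1.
    """
    if len(period1) != len(period2):
        raise ValueError("both periods must cover the same levels")
    risk = 0.0
    for level, (v1, v2) in enumerate(zip(period1, period2), start=1):
        if len(v1) != len(v2):
            raise ValueError(f"level {level}: vectors differ in length")
        risk += level * sum(abs(a - b) for a, b in zip(v1, v2))
    return risk


day1 = [[600, 500], [100, 0]]  # level-1 and level-2 vectors on day one (Fig. 2)
day2 = [[400, 200], [0, 100]]  # the same levels on day two
print(topology_risk(day1, day2))  # 900.0
```

Weighting by level number means changes deeper in the topology (for example, S1 reaching S3) count more than equally sized changes at the first hop.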
In one embodiment, to identify the risk value of each type of operation data simply and accurately, the following can be performed for each type of operation data other than network topology data: input that operation data into the corresponding risk value prediction model to obtain its risk value. Each risk value prediction model can be trained as follows:
Step A1: obtain a predicted value for the historical operation data of each terminal.
Step A2: determine the difference between the predicted value and the true value.
Step A3: train the risk value prediction model with a baseline learning method according to the difference.
Specifically, network traffic operation data is input into the network traffic risk value prediction model to obtain the corresponding risk value. When training this model, take a time window of one day as an example; the window can be set according to actual needs (an hour, a week, a month, and so on) and is not specifically limited here. Assuming the current date is July 15, 2019, the network traffic data of each terminal before July 15, 2019 is obtained and input into the network traffic risk value prediction model to produce a predicted value for July 15, 2019. The difference between this predicted value and the true network traffic on July 15, 2019 is then calculated, and the model parameters are adjusted with the baseline learning method according to that difference.
By the method, the risk values corresponding to different types of operation data can be obtained, and the accuracy is high.
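The description does not define the "baseline learning method" further. As one hedged illustration of steps A1 to A3, the sketch below substitutes an exponentially weighted moving average (EWMA) baseline: it predicts the next period's value, compares it with the true value, reports the error in units of the baseline's running standard deviation as the risk value, and then adjusts its parameters using that error. The class name `EwmaBaseline`, the smoothing factor `alpha`, and the z-score risk definition are assumptions, not the patent's model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EwmaBaseline:
    """Hypothetical per-data-type risk value model (EWMA baseline)."""
    alpha: float = 0.3  # smoothing factor (assumed)
    mean: float = 0.0   # current baseline prediction
    var: float = 0.0    # exponentially weighted variance around the baseline
    seen: int = 0

    def predict(self) -> float:
        # Step A1: predicted value for the next period.
        return self.mean

    def update(self, observed: float) -> float:
        """Score one period's true value, then adjust the baseline."""
        if self.seen == 0:
            self.mean, self.seen = observed, 1
            return 0.0
        # Step A2: difference between predicted and true value.
        error = observed - self.predict()
        std = self.var ** 0.5
        # Risk value: error in running standard deviations
        # (0 during warm-up while the variance is still zero).
        risk = abs(error) / std if std > 0 else 0.0
        # Step A3: adjust the model parameters using the difference.
        self.mean += self.alpha * error
        self.var = (1 - self.alpha) * (self.var + self.alpha * error ** 2)
        self.seen += 1
        return risk


# Hypothetical daily network traffic (MB) for one terminal.
traffic: List[float] = [100, 102, 98, 101, 99, 100, 500]
model = EwmaBaseline()
risks = [model.update(x) for x in traffic]
print([round(r, 2) for r in risks])
```

An online baseline like this matches the day-by-day training described above: each new day's true value is first scored against the prediction and then folded into the model.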
In one embodiment, in step S103, the user basic information may include working age (length of service), gender, job position, and the like, and each type of user basic information corresponds to one kernel density distribution. Determining the risk assessment value of the terminal to be evaluated based on the predetermined kernel density distributions and the risk values of the various types of operation data comprises:
executing for each type of operation data:
for each type of user basic information: determine the risk value interval, in the kernel density distribution of that user basic information, that contains the risk value of this type of operation data; determine, according to the determined risk value interval, the probability value of the risk value in that kernel density distribution; and take the reciprocal of the probability value as the risk combination value obtained by combining the operation data with the user basic information.
Using interval probabilities reduces the amount of computation, and the risk value derived from the estimated probability of each risk value interval can be combined simply and effectively with the user's basic information.
In one embodiment, the kernel density distribution is generated from the user basic information as follows: for each type of user basic information, taking designated value intervals as the reference, count the number of people in each designated value interval; then determine the kernel density distribution corresponding to that user basic information by applying a kernel density estimation algorithm to the designated value intervals and their head counts.
Take as an example employees with a working age of less than two years as the basic information and the risk value of network traffic data as the operation data. The basic information distribution is shown in Table 1, which lists the risk value intervals for employees with a working age of less than two years and the number of such employees in each interval; the corresponding histogram is shown in Fig. 3, and the risk value distribution computed by the kernel density estimation algorithm is shown in Fig. 4. The user's risk value obtained in step S102 is then evaluated against this distribution, and the reciprocal of its probability is taken as the risk combination value of the operation data and the user basic information. For example, if analyzing the network traffic operation data of an employee with a working age of less than two years yields a risk value of 81, the risk value interval containing 81 is first identified, and the probability of the risk value 81 is then estimated accurately by kernel density estimation. If that probability is A, its reciprocal 1/A is taken as the risk combination value obtained by combining the working age information with the network traffic data.
TABLE 1
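(Table 1 is provided as an image in the original: risk value intervals for employees with a working age of less than two years, and the number of employees in each interval.)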
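The interval-probability step in the working-age example can be sketched as follows. It assumes a Gaussian kernel density built from interval midpoints weighted by head count, a bandwidth supplied by the caller, and contiguous interval edges. The probability of a risk value is taken as the density's mass over the interval that contains it, computed in closed form with the normal CDF (`math.erf`), and the risk combination value is the reciprocal. The function names, counts, and bandwidth of 8 are hypothetical.

```python
import math
from typing import Sequence


def _normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def interval_probability(value: float, edges: Sequence[float],
                         counts: Sequence[int], bandwidth: float) -> float:
    """Mass of a binned Gaussian KDE over the interval containing `value`."""
    if len(edges) != len(counts) + 1:
        raise ValueError("need len(counts) + 1 interval edges")
    # Locate the risk value interval [lo, hi) containing the value
    # (the last interval also includes its right edge).
    for lo, hi in zip(edges, edges[1:]):
        if lo <= value < hi or value == hi == edges[-1]:
            break
    else:
        raise ValueError("risk value outside the distribution's range")
    total = sum(counts)
    mids = [(a + b) / 2 for a, b in zip(edges, edges[1:])]
    # Each kernel contributes the share of its mass that lies in [lo, hi).
    return sum(
        (c / total) * (_normal_cdf((hi - m) / bandwidth) - _normal_cdf((lo - m) / bandwidth))
        for c, m in zip(counts, mids)
    )


def risk_combination_value(value: float, edges: Sequence[float],
                           counts: Sequence[int], bandwidth: float) -> float:
    """Reciprocal of the interval probability: rarer values score higher."""
    return 1.0 / interval_probability(value, edges, counts, bandwidth)


edges = list(range(0, 101, 10))              # intervals [0, 10), ..., [90, 100]
counts = [2, 5, 9, 14, 20, 18, 12, 8, 5, 3]  # hypothetical head counts
print(round(risk_combination_value(81, edges, counts, bandwidth=8.0), 2))
```

Because the probability is computed per interval rather than per exact value, all values in the same interval share one result, which is where the reduced computation comes from.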
The risk combination values of the various types of operation data corresponding to the same user basic information are then summed with weights, and the result is taken as the risk value for that user basic information. Finally, the risk values for all types of user basic information are summed with weights to obtain the risk assessment value of the terminal to be evaluated.
It should be noted that both the operation data and the user basic information can include multiple types. For example, suppose the user basic information has two types, working age and gender, and the operation data has two types, network traffic data and network topology data. Let the kernel density distribution for working age be C and that for gender be D, and let an employee's risk value from network traffic data be A and from network topology data be B. The risk assessment value is then obtained by the weighted summation shown in Table 2, where x, y, k, l, w, and s are the weights.
TABLE 2 (image): weighted-sum calculation of the risk assessment value from A, B, C, D and the weights x, y, k, l, w, s
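Table 2 is only available as an image, so the exact placement of each weight is not recoverable. The sketch below assumes that x and y weight the combination values of A and B under C, that k and l do the same under D, and that w and s weight the two resulting per-information risk values. With P_C(·) and P_D(·) denoting probability values, this gives R = w·(x/P_C(A) + y/P_C(B)) + s·(k/P_D(A) + l/P_D(B)). All names and numbers below are hypothetical, and the constant densities are stand-ins for real kernel density distributions.

```python
from typing import Callable, Dict

Density = Callable[[float], float]  # maps a risk value to its probability value


def risk_assessment_value(
    op_risks: Dict[str, float],               # risk value per operation-data type
    densities: Dict[str, Density],            # kernel density per basic-information type
    op_weights: Dict[str, Dict[str, float]],  # inner weights, per info type and op type
    info_weights: Dict[str, float],           # outer weight per basic-information type
    eps: float = 1e-12,
) -> float:
    total = 0.0
    for info, density in densities.items():
        # Inner sum: weighted risk combination values (1 / probability) of each operation-data type
        per_info = sum(
            op_weights[info][op] / max(density(risk), eps)
            for op, risk in op_risks.items()
        )
        # Outer sum: weight the risk value of this type of user basic information
        total += info_weights[info] * per_info
    return total


value = risk_assessment_value(
    {"traffic": 81.0, "topology": 40.0},                   # A, B
    {"work_age": lambda v: 0.5, "gender": lambda v: 0.25},  # stand-ins for C, D
    {"work_age": {"traffic": 0.6, "topology": 0.4},        # x, y
     "gender": {"traffic": 0.5, "topology": 0.5}},         # k, l
    {"work_age": 0.7, "gender": 0.3},                      # w, s
)
print(value)  # ≈ 2.6 = 0.7 * (1.2 + 0.8) + 0.3 * (2 + 2)
```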
The risk assessment value of the terminal to be evaluated is obtained in this way. Combining the user basic information with the kernel density estimation method improves the accuracy of information security risk prediction.
In one embodiment, the information security risk prediction method further includes verifying the risk assessment value. As shown in Fig. 5, the verification includes:
Step S501: if the risk assessment value is greater than a preset threshold, comparing the secondary evaluation data of the terminal to be evaluated with the kernel density distribution of the secondary evaluation data of terminals of the same type, to determine the degree of outlier of the secondary evaluation data of the terminal to be evaluated.
Step S502: if the degree of outlier is greater than a specified degree of outlier, generating prompt information and sending it to a risk auditing terminal.
It should be noted that a preset threshold, denoted T here, is set in advance. If the risk assessment value is greater than T, the risk assessment value needs to be verified; if it is less than T, the information security risk is judged to be low and no verification is performed. Terminals of the same type may be terminals with the same work content, in the same project group, in the same type of post, and so on. Because the operation data of terminals of the same type are broadly similar, the reliability of the risk assessment value can be determined by comparing the operation data of such terminals.
Take research and development (R&D) staff as an example. R&D staff generally have relatively high risk assessment values. Suppose the risk assessment value calculated by the above method for employee B, who holds an R&D post, is high, so that employee B is provisionally identified as posing an information security risk. The traffic data used at the R&D posts is then taken as secondary evaluation data and converted into a kernel density distribution, and from it the degree of outlier of employee B's traffic data relative to the other R&D employees is judged. The lower the degree of outlier, the smaller the difference between employee B and the other R&D employees, and the smaller employee B's information security risk. The higher the degree of outlier, the larger the difference and the larger employee B's information security risk; in that case prompt information is generated and sent to the risk auditing terminal for review. In this way the accuracy of employee B's risk assessment value is confirmed.
It should be noted that the secondary evaluation data includes at least one type. When the risk assessment value is verified, data other than traffic data may also be selected as secondary evaluation data, and different groups of employees may be verified with different secondary evaluation data; this can be configured according to the actual situation in a specific implementation.
In an embodiment, step S501, comparing the secondary evaluation data of the terminal to be evaluated with the kernel density distribution of the secondary evaluation data of terminals of the same type to determine the degree of outlier of the secondary evaluation data of the terminal to be evaluated, may include:
Step C1: determining, for each type of secondary evaluation data, its probability in the corresponding kernel density distribution.
Step C2: performing a weighted summation of the reciprocals of the determined probabilities to obtain the degree of outlier.
For example, suppose the degree of outlier obtained from the kernel density distribution of access traffic within the same post is L1, and the degree of outlier obtained from the login time within the same post is L2. L1 and L2 may be summed with weights to determine the reliability of the risk value, where L1 is the reciprocal of the user's probability value in the kernel density distribution of access traffic for the same post, and L2 is the reciprocal of the user's probability value in the kernel density distribution of login time for the same post.
By using the kernel density estimation algorithm to assess the reliability of the risk assessment value of the terminal to be evaluated, the method further improves the accuracy of information security risk prediction.
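Steps S501-S502 together with C1-C2 can be sketched as below. This is a hypothetical illustration, not the patent's code: it assumes each secondary evaluation data type (for example the hypothetical keys `access_traffic` and `login_hour`) has raw values from same-type terminals, a Gaussian kernel with a fixed per-type bandwidth, and hand-picked weights, threshold and specified degree of outlier.

```python
import math
from typing import Dict, List


def kde_density(samples: List[float], bandwidth: float, x: float) -> float:
    """Gaussian kernel density estimate at x built from same-type terminals' samples."""
    coef = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return coef * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)


def outlier_degree(
    own: Dict[str, float],          # this terminal's value per secondary data type
    peers: Dict[str, List[float]],  # same-type terminals' values per data type
    bandwidths: Dict[str, float],
    weights: Dict[str, float],
    eps: float = 1e-12,
) -> float:
    """C1: density of each type; C2: weighted sum of the reciprocals."""
    return sum(
        weights[k] / max(kde_density(peers[k], bandwidths[k], own[k]), eps)
        for k in own
    )


def needs_review(
    risk_assessment: float,
    threshold: float,
    own: Dict[str, float],
    peers: Dict[str, List[float]],
    bandwidths: Dict[str, float],
    weights: Dict[str, float],
    max_degree: float,
) -> bool:
    """S501-S502: only terminals above the threshold are compared with their peers."""
    if risk_assessment <= threshold:
        return False  # low risk: no secondary verification
    return outlier_degree(own, peers, bandwidths, weights) > max_degree
```

A `True` result corresponds to generating prompt information for the risk auditing terminal; how that message is delivered is outside this sketch.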
It should be noted that all weights used in the weighted summations described herein are obtained in advance by training with a machine learning method. The training data are labeled in advance by operators: each sample is a set of feature values, and the operator assigns it a score that characterizes the corresponding behavior. During training, the input is the risk values calculated from each type of operation data, and the output is the risk score labeled by the operator. Operators assign risk scores according to risk grade, and training the weights in this way improves the accuracy of information security risk prediction.
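The patent does not name the learning algorithm. As one possible choice, labeled plainly as an assumption, the weights can be fitted as a linear model by ordinary least squares: each row holds the risk values of one labeled sample and the target is the operator's score. The sketch solves the normal equations with Gauss-Jordan elimination so that it needs only the standard library; `fit_weights` and the data are hypothetical.

```python
from typing import List


def fit_weights(features: List[List[float]], scores: List[float]) -> List[float]:
    """Least-squares weights w minimising sum_i (features[i] . w - scores[i])^2.

    Assumes the feature columns are linearly independent (otherwise the
    normal-equation matrix is singular and a ZeroDivisionError is raised).
    """
    n = len(features[0])
    # Normal equations: (X^T X) w = X^T y
    a = [[sum(row[i] * row[j] for row in features) for j in range(n)] for i in range(n)]
    b = [sum(row[i] * y for row, y in zip(features, scores)) for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
                b[r] -= factor * b[col]
    return [b[i] / a[i][i] for i in range(n)]


# Hypothetical samples: [traffic risk value, topology risk value] -> operator score
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
y = [1.7, 1.3, 3.7, 3.3]
print(fit_weights(X, y))  # close to [0.3, 0.7]
```

In practice a library solver or a model with a non-negativity or ranking objective may suit operator-graded scores better; the linear fit only shows the input/output shape described above.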
In an embodiment, Fig. 6 is a schematic diagram of the information security risk prediction method provided by the present application. It uses only two types of operation data, employee network topology data and network traffic operation data, as an example; in practical applications the operation data are not limited to these two types. In this embodiment, data analysis is first performed separately on the employees' network topology data and network traffic operation data to obtain the corresponding risk values. In an integrated analysis module, the risk values of the two types of operation data are combined by weighted summation, using kernel density distributions determined from the employee basic information in the HR system, to obtain the risk assessment value. For data with a high risk assessment value, the degree of outlier is calculated in a secondary evaluation data module, and data with a high degree of outlier are sent to the risk auditing terminal to be judged again by professional risk assessment personnel.
By combining the user basic information with the operation data to evaluate employees' information security risk, and by performing secondary verification of the risk assessment value, the method further improves the accuracy of information security risk prediction.
An embodiment of the present application provides an information security risk prediction apparatus, as shown in fig. 7, the apparatus includes: an acquisition module 71, a data analysis module 72, and a determination module 73.
The acquisition module 71 is configured to acquire at least one type of operation data of the terminal to be evaluated.
The data analysis module 72 is configured to analyze the various types of operation data and determine the risk value corresponding to each type of operation data.
The determining module 73 is configured to determine a risk assessment value of the terminal to be evaluated based on a predetermined kernel density distribution and the risk values of the various types of operation data, where the kernel density distribution is generated in advance from the basic information of multiple users and represents the probability distribution of risk values based on the user basic information.
Optionally, each type of user basic information corresponds to a kernel density distribution, and the determining module 73 is configured to execute, for each type of operation data:
for each type of user basic information: determining the risk value interval in the kernel density distribution of that user basic information into which the risk value of this type of operation data falls; determining, according to the determined risk value interval, the probability value of the risk value in the kernel density distribution of the user basic information, and taking the reciprocal of the probability value as the risk combination value obtained by combining the operation data with the user basic information;
carrying out weighted summation on risk combination values of various types of operation data corresponding to the same user basic information, and taking the obtained weighted summation result as a risk value corresponding to the user basic information;
and carrying out weighted summation on the risk values corresponding to the basic information of each user to obtain the risk evaluation value of the terminal to be evaluated.
Optionally, the apparatus further comprises a kernel density distribution generating module, configured to generate kernel density distributions from the user basic information by performing, for each type of user basic information:
counting the number of people in each designated value interval of the user basic information values; and determining the kernel density distribution corresponding to the user basic information from the designated value intervals and the corresponding numbers of people using a kernel density estimation algorithm.
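The counting step can be sketched as follows, under assumptions: intervals are half-open `[low, high)` with the last one closed, and, because the patent does not say how the kernel bandwidth is chosen, Silverman's rule of thumb (1.06 · σ · n^(-1/5)) is used here as a swapped-in choice, computed from interval midpoints weighted by their head counts. Both function names are hypothetical.

```python
import math
from typing import List, Tuple


def count_per_interval(values: List[float], edges: List[float]) -> List[Tuple[float, float, int]]:
    """Count users per designated interval [edges[i], edges[i+1]); the last interval is closed."""
    counts = []
    for i in range(len(edges) - 1):
        low, high = edges[i], edges[i + 1]
        is_last = i == len(edges) - 2
        n = sum(1 for v in values if low <= v < high or (is_last and v == high))
        counts.append((low, high, n))
    return counts


def silverman_bandwidth(intervals: List[Tuple[float, float, int]]) -> float:
    """Rule-of-thumb bandwidth from count-weighted interval midpoints (needs a non-empty total)."""
    n = sum(c for _, _, c in intervals)
    mean = sum((lo + hi) / 2.0 * c for lo, hi, c in intervals) / n
    var = sum(c * ((lo + hi) / 2.0 - mean) ** 2 for lo, hi, c in intervals) / n
    return 1.06 * math.sqrt(var) * n ** (-0.2)


# Hypothetical risk values of users sharing one type of basic information
bins = count_per_interval([1, 5, 12, 15, 18, 95, 100], [0, 10, 20, 90, 100])
print(bins)  # [(0, 10, 2), (10, 20, 3), (20, 90, 0), (90, 100, 2)]
print(silverman_bandwidth(bins))
```

The resulting interval counts and bandwidth are what a kernel density estimator (Gaussian or otherwise) would then consume.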
Optionally, the data analysis module 72 is configured to:
if the operation data are network topology data, generating, level by level according to the accessed topology, an access vector for each level, where each value in an access vector is the access duration of the terminal to be evaluated at the corresponding network node;
and calculating the difference between the vectors of the same network level in network topology data from different time periods, then performing a weighted summation with the level number of each network level as its weight, to obtain the risk value of the network topology.
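A minimal sketch of the topology risk value, assuming each time period's access vectors are stored as a mapping from level number (starting at 1) to node durations, that a node missing from one period has duration 0, and that the "difference calculation" is the L1 distance. The patent does not specify the distance, so L1 is an assumption, as are the node names.

```python
from typing import Dict

# level number -> {network node id: access duration of the terminal at that node}
AccessVectors = Dict[int, Dict[str, float]]


def topology_risk_value(earlier: AccessVectors, later: AccessVectors) -> float:
    """Level-weighted sum of L1 differences between same-level access vectors."""
    risk = 0.0
    for level in set(earlier) | set(later):
        a = earlier.get(level, {})
        b = later.get(level, {})
        # Align both vectors on the union of nodes seen at this level (missing = 0 duration)
        nodes = set(a) | set(b)
        diff = sum(abs(a.get(n, 0.0) - b.get(n, 0.0)) for n in nodes)
        risk += level * diff  # the level number serves as the weight
    return risk


week1 = {1: {"gateway": 60.0}, 2: {"srv-a": 30.0, "srv-b": 0.0}}
week2 = {1: {"gateway": 50.0}, 2: {"srv-a": 10.0, "srv-c": 25.0}}
print(topology_risk_value(week1, week2))  # 1*10 + 2*(20 + 0 + 25) = 100.0
```

Weighting by level number makes changes deep in the topology count more than changes near the entry point.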
Optionally, the data analysis module 72 is configured to perform, for each type of operation data except for the network topology data:
inputting the operation data into corresponding risk value prediction models to obtain risk values corresponding to the operation data, wherein each risk value prediction model is obtained by training according to the following method:
obtaining a predicted value of historical operation data of each terminal;
determining a difference value between the predicted value and the true value;
and training a risk value prediction model by adopting a baseline learning method according to the difference value.
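The "baseline learning method" is not defined further in the text. One plausible reading, offered only as a hypothetical sketch, is: predict each value from an exponentially weighted moving average (EWMA) of the terminal's history, record the differences between true and predicted values, and learn their mean and spread as the baseline; a new observation's risk value is then how many standard deviations its difference lies from that baseline. The class name and `alpha` are assumptions.

```python
import math
from typing import List, Optional


class BaselineRiskModel:
    """Hypothetical baseline model: EWMA prediction plus residual z-score."""

    def __init__(self, alpha: float = 0.3) -> None:
        self.alpha = alpha                   # smoothing factor of the moving average
        self.level: Optional[float] = None   # predicted value for the next observation
        self.res_mean = 0.0
        self.res_std = 1.0

    def fit(self, history: List[float]) -> "BaselineRiskModel":
        """Learn the baseline from at least two historical observations."""
        level = history[0]
        residuals = []
        for value in history[1:]:
            residuals.append(value - level)  # true value minus predicted value
            level = self.alpha * value + (1 - self.alpha) * level
        self.level = level
        self.res_mean = sum(residuals) / len(residuals)
        var = sum((r - self.res_mean) ** 2 for r in residuals) / len(residuals)
        self.res_std = math.sqrt(var) or 1.0  # fall back to 1.0 for a perfectly flat history
        return self

    def risk_value(self, observed: float) -> float:
        """Distance of the new residual from the learned residual mean, in standard deviations."""
        if self.level is None:
            raise ValueError("call fit() first")
        residual = observed - self.level
        return abs(residual - self.res_mean) / self.res_std


model = BaselineRiskModel().fit([100, 102, 98, 101, 99, 100, 103, 97])
print(model.risk_value(100), model.risk_value(300))  # small vs. large
```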
Optionally, the apparatus further comprises an outlier determining module, configured to: if the risk assessment value is greater than a preset threshold, compare the secondary evaluation data of the terminal to be evaluated with the kernel density distribution of the secondary evaluation data of terminals of the same type, and determine the degree of outlier of the secondary evaluation data of the terminal to be evaluated;
and if the degree of outlier is greater than a specified degree of outlier, generate prompt information and send it to a risk auditing terminal.
Optionally, the secondary evaluation data includes at least one type, and when determining the degree of outlier of the secondary evaluation data of the terminal to be evaluated by comparing it with the kernel density distribution of the secondary evaluation data of terminals of the same type, the outlier determining module is configured to:
determine, for each type of secondary evaluation data, its probability in the corresponding kernel density distribution;
and perform a weighted summation of the reciprocals of the determined probabilities to obtain the degree of outlier.
Having described the information security risk prediction method and apparatus in the exemplary embodiments of the present application, a computing device of another exemplary embodiment of the present application is described next.
As will be appreciated by one skilled in the art, aspects of the present application may be embodied as a system, method or program product. Accordingly, various aspects of the present application may be embodied in the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, all of which may generally be referred to herein as a "circuit", "module" or "system".
In some possible implementations, a computing device according to the present application may include at least one processor, and at least one memory. Wherein the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the information security risk prediction method according to various exemplary embodiments of the present application described above in the present specification. For example, the processor may perform steps S101-S103 as shown in fig. 1.
The computing device 130 according to this embodiment of the present application is described below with reference to Fig. 8. The computing device 130 shown in Fig. 8 is only an example and should not impose any limitation on the functionality or scope of use of the embodiments of the present application.
As shown in Fig. 8, the computing device 130 is embodied in the form of a general-purpose computing apparatus. Components of the computing device 130 may include, but are not limited to: at least one processor 131, at least one memory 132, and a bus 133 that connects the various system components (including the memory 132 and the processor 131).
Bus 133 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a processor, or a local bus using any of a variety of bus architectures.
The memory 132 may include readable media in the form of volatile memory, such as random access memory (RAM) 1321 and/or cache memory 1322, and may further include read-only memory (ROM) 1323.
Memory 132 may also include a program/utility 1325 having a set (at least one) of program modules 1324, such program modules 1324 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The computing device 130 may also communicate with one or more external devices 134 (e.g., keyboard, pointing device, etc.), and/or with any device (e.g., router, modem, etc.) that enables the computing device 130 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 135. Also, computing device 130 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via network adapter 136. As shown, network adapter 136 communicates with other modules for computing device 130 over bus 133. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with computing device 130, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
In some possible embodiments, various aspects of the information security risk prediction method provided in the present application may also be implemented in the form of a program product including a computer program. When the program product runs on a computer device, the computer program causes the computer device to execute the steps of the information security risk prediction method according to the various exemplary embodiments of the present application described above in this specification; for example, the computer device may execute steps S101-S103 shown in Fig. 1.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product for information security risk prediction of an embodiment of the present application may employ a portable compact disc read-only memory (CD-ROM), include a computer program, and run on a computing device. However, the program product of the present application is not limited thereto; in this document, a readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
A readable signal medium may include a propagated data signal with a readable computer program embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer program embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer programs for carrying out operations of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as the "C" programming language or similar languages. The computer program may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device over any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., over the internet using an internet service provider).
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such division is merely exemplary and not mandatory. Indeed, according to embodiments of the application, the features and functions of two or more units described above may be embodied in one unit. Conversely, the features and functions of one unit described above may be further divided among and embodied by a plurality of units.
Further, while the operations of the methods of the present application are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having a computer-usable computer program embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (9)

1. An information security risk prediction method, the method comprising:
acquiring at least one type of operation data of a terminal to be evaluated;
analyzing various operation data, and determining risk values respectively corresponding to the various operation data;
determining a risk assessment value of the terminal to be evaluated based on a predetermined kernel density distribution and the risk values of the various types of operation data, wherein the kernel density distribution is generated in advance from the basic information of multiple users and represents the probability distribution of risk values based on the user basic information;
wherein generating the kernel density distribution from the basic information of multiple users comprises:
for each type of user basic information: counting the number of people in each designated value interval of the user basic information values; and determining the kernel density distribution corresponding to the user basic information from the designated value intervals and the corresponding numbers of people using a kernel density estimation algorithm.
2. The method according to claim 1, wherein each type of user basic information corresponds to a kernel density distribution, and determining the risk assessment value of the terminal to be evaluated based on the predetermined kernel density distribution and the risk values of the various types of operation data comprises:
executing for each type of operation data:
for each type of user basic information: determining the risk value interval in the kernel density distribution of that user basic information into which the risk value of this type of operation data falls; determining, according to the determined risk value interval, the probability value of the risk value in the kernel density distribution of the user basic information, and taking the reciprocal of the probability value as the risk combination value obtained by combining the operation data with the user basic information;
carrying out weighted summation on risk combination values of various types of operation data corresponding to the same user basic information, and taking the obtained weighted summation result as a risk value corresponding to the user basic information;
and carrying out weighted summation on the risk values corresponding to the basic information of each user to obtain the risk evaluation value of the terminal to be evaluated.
3. The method of claim 1, wherein analyzing each type of operation data to determine the risk value corresponding to each type of operation data comprises:
if the operation data are network topology data, generating, level by level according to the accessed topology, an access vector for each level, wherein each value in an access vector is the access duration of the terminal to be evaluated at the corresponding network node;
and performing difference calculation on vectors of the same network level in the network topology data of different time periods, and performing weighted summation by taking the level number corresponding to each network level as a weight to obtain a risk value of the network topology.
4. The method of claim 2, wherein analyzing the various types of operation data to determine the risk values corresponding to the various types of operation data comprises:
performing, for each type of operational data other than network topology data: inputting the operation data into corresponding risk value prediction models to obtain risk values corresponding to the operation data, wherein each risk value prediction model is obtained by training according to the following method:
obtaining a predicted value of historical operation data of each terminal;
determining a difference value between the predicted value and the true value;
and training a risk value prediction model by adopting a baseline learning method according to the difference value.
5. The method of claim 1, further comprising:
if the risk assessment value is greater than a preset threshold, comparing the secondary evaluation data of the terminal to be evaluated with the kernel density distribution of the secondary evaluation data of terminals of the same type to determine the degree of outlier of the secondary evaluation data of the terminal to be evaluated;
and if the degree of outlier is greater than a specified degree of outlier, generating prompt information and sending the prompt information to a risk auditing terminal.
6. The method according to claim 5, wherein the secondary evaluation data includes at least one type, and determining the degree of outlier of the secondary evaluation data of the terminal to be evaluated by comparing and analyzing the secondary evaluation data of the terminal to be evaluated with the kernel density distribution of the secondary evaluation data of the terminal of the same type comprises:
determining, for each type of secondary evaluation data, its probability in the corresponding kernel density distribution;
and performing a weighted summation of the reciprocals of the determined probabilities to obtain the degree of outlier.
7. An information security risk prediction apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring at least one type of operation data of the terminal to be evaluated;
the data analysis module is used for analyzing various operation data and determining risk values corresponding to the various operation data respectively;
the determining module is used for determining the risk assessment value of the terminal to be assessed based on a predetermined kernel density distribution and the risk values of various types of operation data, wherein the kernel density distribution is generated in advance according to the basic information of multiple users, and is used for representing the probability distribution of the risk values based on the basic information of the users;
wherein the apparatus further comprises a kernel density distribution generating module, configured to perform, for each type of user basic information of the multiple users: counting the number of people in each designated value interval of the user basic information values; and determining the kernel density distribution corresponding to the user basic information from the designated value intervals and the corresponding numbers of people using a kernel density estimation algorithm.
8. A computing device, comprising: a memory and a processor;
a memory for storing program instructions;
a processor for calling program instructions stored in said memory to execute the method of any one of claims 1 to 6 in accordance with the obtained program.
9. A computer storage medium storing computer-executable instructions for performing the method of any one of claims 1-6.
CN201910684716.0A 2019-07-26 2019-07-26 Information security risk prediction method and device, computing equipment and storage medium Active CN111181757B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910684716.0A CN111181757B (en) 2019-07-26 2019-07-26 Information security risk prediction method and device, computing equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111181757A CN111181757A (en) 2020-05-19
CN111181757B true CN111181757B (en) 2021-10-08

Family

ID=70622373

Country Status (1)

Country Link
CN (1) CN111181757B (en)

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US10009234B2 (en) * 2015-11-19 2018-06-26 International Business Machines Corporation Predictive modeling of risk for services in a computing environment
CN106127576A (en) * 2016-07-01 2016-11-16 武汉泰迪智慧科技有限公司 Bank risk assessment system based on user behavior
CN106992994B (en) * 2017-05-24 2020-07-03 腾讯科技(深圳)有限公司 Automatic monitoring method and system for cloud service

Also Published As

Publication number Publication date
CN111181757A (en) 2020-05-19

Similar Documents

Publication Publication Date Title
CN111181757B (en) Information security risk prediction method and device, computing equipment and storage medium
CN109726763B (en) Information asset identification method, device, equipment and medium
US10977030B2 (en) Predictive code clearance by a cognitive computing system
CN110390408B (en) Transaction object prediction method and device
CN111210335B (en) User risk identification method and device and electronic equipment
CN111145009A (en) Method and device for evaluating risk after user loan and electronic equipment
US10678821B2 (en) Evaluating theses using tree structures
US20200293970A1 (en) Minimizing Compliance Risk Using Machine Learning Techniques
CN111199469A (en) User payment model generation method and device and electronic equipment
CN112015562A (en) Resource allocation method and device based on transfer learning and electronic equipment
CN112508695A (en) Financial risk prediction method and device based on market risk and electronic equipment
CN116542781A (en) Task allocation method, device, computer equipment and storage medium
CN111210332A (en) Method and device for generating post-loan management strategy and electronic equipment
CN111179055A (en) Credit limit adjusting method and device and electronic equipment
CN114548118A (en) Service conversation detection method and system
CN114398465A (en) Exception handling method and device of Internet service platform and computer equipment
CN115936895A (en) Risk assessment method, device and equipment based on artificial intelligence and storage medium
US11922129B2 (en) Causal knowledge identification and extraction
CN111582649B (en) Risk assessment method and device based on user APP one-hot encoding and electronic equipment
CN113590310A (en) Resource allocation method and device based on rule touch rate scoring and electronic equipment
CN113568739A (en) User resource limit distribution method and device and electronic equipment
CN113516398A (en) Risk equipment identification method and device based on hierarchical sampling and electronic equipment
CN114091815A (en) Resource request processing method, device and system and electronic equipment
CN113052509A (en) Model evaluation method, model evaluation apparatus, electronic device, and storage medium
CN112950003A (en) User resource quota adjusting method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant