CN115174151B - Security policy autonomous forming method based on cloud edge architecture

Info

Publication number: CN115174151B
Application number: CN202210642750.3A
Authority: CN
Inventors: 高维; 王明月; 李方伟
Original assignee: Chongqing Yitong College
Current assignee: Chongqing Yitong College
Priority date: 2022-06-08
Filing date: 2022-06-08
Publication date: 2023-06-16
Anticipated expiration: 2042-06-08
Also published as: CN115174151A

Abstract

The invention discloses a method for autonomously forming security policies based on a cloud-edge architecture, belonging to the field of big data security. The method comprises three mechanisms: an edge security policy forming mechanism, a cloud security policy forming mechanism, and a cloud-edge cooperative security policy forming mechanism. The edge security policy forming mechanism comprises situation data augmentation and optimization of training model parameters, and can rapidly generate a security policy training model at each edge. The cloud security policy forming mechanism uses the strong computing and storage capacity of the cloud to train a decision model suited to ubiquitous network space, and comprises security situation information processing and security policy forming. The cloud-edge cooperative security policy forming mechanism comprises obtaining the cloud-trained model parameters and differentiated generation of edge security policy models, which can effectively reduce the training time of the edge models and improve their generalization capability.

Description

Security policy autonomous forming method based on cloud edge architecture

Technical Field

The invention belongs to the field of big data security, and particularly relates to a method for autonomously forming security policies under a cloud-edge architecture, which can be applied to ubiquitous network space to address the network security problems arising in new circumstances.

Background

With the rapid development of information technology, networks have penetrated every aspect of human life, providing ubiquitous information and services for individuals and society while also bringing frequent network security problems. Current network defense relies mainly on two approaches. First, in the software development stage, various specifications are introduced and users' rights to access network resources are strictly authenticated and controlled, which strengthens system security; examples include user identity authentication, password encryption, file access control, and firewalls. However, this static mode of defense can no longer cope with a changing, complex network environment. Second, network security incidents are handled by manual means such as security testing, penetration testing, and security event analysis. These means are slow and fall far behind the speed of attackers, and the decisions made by security management staff are limited when facing complex heterogeneous network environments and security situations.

At present, a great deal of research uses situation awareness to predict the future development and evolution of network security elements. Making further use of the security situation data obtained by situation awareness allows decisions to be made quickly and intelligently, and can become an important means of network security defense. In forming a policy, the risk level of the threat being handled, the timeliness of policy generation, the global scope of policy formation, and the resource cost of training the policy generation model should all be considered. Based on this analysis, three mechanisms are used. To respond quickly to security threats, situation data is analyzed in real time at the edge so that a decision can be made quickly; this is edge policy generation. The cloud center's capacity to analyze, process, and store big data is used to perform association analysis on large-scale security situation data across dimensions such as time, space, and semantics, forming an intelligent security policy that addresses the security risks of the whole ubiquitous network; this is the cloud policy forming mechanism. Finally, to further reduce the response time to high-risk security threats, spare the edge from learning from scratch, and strengthen the edge's ability to form security policies quickly, the policy forming model trained on the cloud is migrated to the edge; this is the cloud-edge cooperative security policy forming mechanism.

CN110460608A discloses a situation awareness method and system including association analysis. Data from different information sources is collected and preprocessed into a data stream with a uniform format; high-frequency item sets are extracted from the data stream to generate high-frequency association rules, which are fed into situation assessment for quantification. Different assessment systems are merged and the data elements are fuzzified to obtain situation values for individual devices and local networks, which are combined with the architecture components of the whole network to obtain situation values for the whole system. The situation values of the different layers are then fed into a neural network model for prediction, and the prediction result is displayed visually. The method assesses the whole system and each individual device, establishes associations between devices and layers, performs rule detection against different rules, and calculates risk values, so that it can predict the future state of the system and provide reference suggestions for users.

The above patent is a situation awareness method and system based on association analysis: it predicts the security state of the network space and presents the prediction visually, but it does not provide a capability to respond actively to network risks. To cope with the continually escalating security threats in network space, the present invention uses the situation data obtained by situation awareness to autonomously form security policies under a cloud-edge architecture based on artificial intelligence technology, thereby providing policy analysis for responding to network space risks.

Disclosure of Invention

The present invention aims to solve the above problems of the prior art by providing a method for autonomously forming security policies based on a cloud-edge architecture. The technical scheme of the invention is as follows:

A method for autonomously forming security policies based on a cloud-edge architecture comprises the following steps:

Step one: using a small-sample (few-shot) learning method, rapidly generate a security policy training model at each edge;

Step two: at the cloud, perform inference and learning on large-scale, multi-source situation data to train a decision model suited to ubiquitous network space; this step comprises security situation information processing and training of an on-cloud decision model based on reinforcement learning;

Step three: migrate the model trained on the cloud to each edge, and perform differentiated training of the model there.

Further, step one, in which a small-sample learning method is used to rapidly generate a security policy training model at each edge, specifically comprises:

Situation data expansion: the training data set is expanded by weak-label data conversion and training data synthesis. Optimizing training model parameters: to reduce the size of the data set needed for training, an already trained model is used, and new model parameters are trained by metric learning and meta-learning.

Further, expanding the training data set by weak-label data conversion and training data synthesis specifically comprises:

Weak-label data conversion: based on prior knowledge of the existing training data, feature analysis and feature extraction are performed on the data, and a training data generator with good generalization performance is designed. Training data synthesis: sample data closely resembling the training set is generated by a generative adversarial network (GAN).
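For reference, the training data synthesis step can be related to the standard GAN minimax objective. The patent does not state which GAN variant or loss is used, so the formula below is the commonly cited original formulation, given as an assumption. The symbols are not from the patent: G is the generator, D the discriminator, p_data the distribution of real situation samples, and p_z the noise distribution fed to the generator.

```latex
\min_{G}\,\max_{D}\; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

Under this objective the discriminator is trained to tell real samples from generated ones, while the generator is trained to produce samples the discriminator cannot distinguish from the training set.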

Further, optimizing the training model parameters specifically comprises:

Metric learning: the data set is divided into multiple tasks for training, and each task randomly samples a C-way K-shot set from the training set. During training, the tasks are fed in sequence to the embedding module, the metric module then assigns similarity scores, and finally the classification results are output. Meta-learning: meta-knowledge is learned from the tasks of already trained models, and this prior knowledge guides the new model to learn quickly from a small training set.
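The task construction described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the patent's implementation. The function name `sample_episode`, the `(features, label)` data layout, and the `n_query` parameter (held-out samples per class for scoring) are assumptions introduced for the example.

```python
import random
from collections import defaultdict


def sample_episode(dataset, c_way, k_shot, n_query=1, rng=None):
    """Sample one C-way K-shot task from a list of (features, label) pairs.

    Returns (support, query), each a list of (features, label) pairs.
    """
    rng = rng or random.Random()
    by_label = defaultdict(list)
    for features, label in dataset:
        by_label[label].append(features)

    # Only classes with enough samples for support plus query are eligible.
    eligible = [lab for lab, items in by_label.items()
                if len(items) >= k_shot + n_query]
    if len(eligible) < c_way:
        raise ValueError("not enough classes with k_shot + n_query samples")

    support, query = [], []
    for label in rng.sample(eligible, c_way):          # choose C categories
        picked = rng.sample(by_label[label], k_shot + n_query)
        support += [(x, label) for x in picked[:k_shot]]  # K samples per class
        query += [(x, label) for x in picked[k_shot:]]
    return support, query
```

Calling `sample_episode` repeatedly yields the multiple tasks that are then fed to the embedding and metric modules.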

Further, the security situation information processing in step two specifically comprises:

The internal regularities of the situation data are learned in supervised and unsupervised modes. The situation data is processed and abstracted through multiple layers of nonlinear information processing: data features are extracted layer by layer between the input layer and the output layer, the feature space of the data is transformed, and hierarchical feature representations of the situation data are learned automatically on the cloud. Dempster-Shafer (D-S) evidence theory is used to analyze the uncertainty of the situation data, so that multi-source heterogeneous situation data is fused effectively.
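The fusion of evidence from several sources under D-S evidence theory is usually done with Dempster's rule of combination. The sketch below shows that rule for two sources; it is an illustration, not the patent's code. The function name `combine`, the two-hypothesis frame (`attack`, `normal`), and the mass values are hypothetical.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset of hypotheses to a mass in [0, 1];
    the masses of one function sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass falling on contradictory evidence
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    # Renormalize by the non-conflicting mass.
    return {h: w / (1.0 - conflict) for h, w in combined.items()}


# Hypothetical example: two sensors reporting on the same network segment.
ATTACK, NORMAL = frozenset({"attack"}), frozenset({"normal"})
EITHER = ATTACK | NORMAL  # mass left on the whole frame expresses ignorance
sensor_a = {ATTACK: 0.6, NORMAL: 0.1, EITHER: 0.3}
sensor_b = {ATTACK: 0.5, NORMAL: 0.2, EITHER: 0.3}
fused = combine(sensor_a, sensor_b)  # belief in ATTACK rises to about 0.76
```

Mass assigned to the whole frame lets a source express ignorance without committing to a prior probability, which is the property the claims rely on when no prior for the situation information is available.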

Further, training the decision model on the cloud by reinforcement learning specifically comprises:

The processed security situation information is used as the data set for training the reinforcement learning model. Guided by the logical security target, the agent learns by continual trial and error under a reward/penalty mechanism, yielding a security policy model applicable to ubiquitous network space.

Further, step three, in which the model trained on the cloud is migrated to the edge, specifically comprises:

The model weights trained on the cloud are taken as the initial weights of the training parameters of every edge model. The model parameters are then fine-tuned on the situation data of the current edge to train new model parameters, which constitutes the differentiated training of the model.

The advantages and beneficial effects of the invention are as follows:

1. The situation information produced by situation awareness is used as the input to intelligent decision making, which improves the effectiveness of the decisions;

2. Security policies are formed automatically using big data and artificial intelligence technology, which effectively overcomes the shortcomings of current network security risk handling;

3. The computing power and storage capacity of the cloud and the edge are fully used, and intelligent decision models with multi-role attributes are formed from situation information of different scales and coverage areas;

4. The decision training model obtained by cloud training over the whole network space is migrated to the edge and combined with the current situation information of the edge network for differentiated training, which spares the edge from learning from scratch and lets it make decisions quickly.

To improve the capacity of ubiquitous network space to cope with security risks, the situation data obtained by situation awareness is learned from and reasoned over to obtain security policies that can handle those risks. Considering the timeliness and global scope required of policy formation, and the resource cost of training the policy generation model, security policies are generated in three modes: edge, cloud, and cloud-edge collaboration. In generating the edge security policy, model training is based on small-sample data: the situation data training set is expanded by weak-label data conversion and training data synthesis, and metric learning and meta-learning reduce the amount of training data the model needs. In forming the cloud security policy, to further reduce the uncertainty of the situation data and improve the reliability of decisions, the situation data first undergoes association, fusion, and uncertainty analysis; the agent then learns by continual trial and error against the logical security target under a reward/penalty mechanism, yielding a security policy model applicable to ubiquitous network space. Because terminal network devices can connect flexibly, the terminal entities of the network change over time, and so does the situation information. Network attacks are also mobile and can act on different networks at different times. The security policy model formed on the cloud is trained on situation information from the whole ubiquitous network and therefore captures what these situations have in common.

To enable the edge to form security policies quickly and to reduce the edge's learning cost, the model trained on the cloud is migrated to the edge and then trained further on the situation information of the edge network; that is, the cloud and the edge cooperate to generate the security policy. The invention makes full use of the characteristics of situation data, artificial intelligence, and the cloud-edge architecture, and provides protection against the continually escalating network risks of ubiquitous network space.

Drawings

FIG. 1 is the edge policy forming model in accordance with a preferred embodiment of the present invention;

FIG. 2 is the cloud policy forming model;

FIG. 3 is the cloud-edge cooperative security policy forming model.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and in detail below with reference to the drawings. The described embodiments are only some of the embodiments of the present invention.

The technical scheme for solving the technical problems is as follows:

1. The edge security policy forming mechanism, as shown in FIG. 1, comprises situation data augmentation and optimization of training model parameters.

1-1: Situation data augmentation mainly comprises the following. Weak-label data conversion: based on prior knowledge of the existing training data, feature analysis and feature extraction are performed on the data, and a training data generator with good generalization performance is designed. Training data synthesis: the generator and discriminator of a generative adversarial network (GAN) are used to produce security situation data similar to real samples. The generator first learns the distribution of the existing samples and generates data according to that distribution; the discriminator then judges whether each sample is real or was produced by the generator; the process finally yields sample data closely resembling the training set;

1-2: Optimizing training model parameters. Metric learning: the data set is divided into multiple tasks for training. Each task randomly samples a C-way K-shot set from the training set, that is, C categories are selected with K samples each, and multiple tasks are constructed by repeated sampling. During training, the tasks are fed in sequence to the embedding module, the metric module then assigns similarity scores, and finally the classification results are output. Meta-learning: the initial parameters and the structure of the neural network are obtained from the tasks of already trained models, reducing the dependence on the parameter set during small-sample training.
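The embedding-then-metric pipeline of item 1-2 can be illustrated with a nearest-prototype classifier that scores queries by Euclidean distance, the metric named in the claims. This is a sketch in the style of prototypical networks, which the patent does not name; averaging the support samples into one prototype per class is an assumption. The `embed` function is a placeholder identity mapping standing in for a trained embedding network, and the labels and feature values are hypothetical.

```python
import math
from collections import defaultdict


def embed(x):
    """Placeholder embedding module (identity); a trained network would go here."""
    return tuple(float(v) for v in x)


def prototypes(support):
    """Average the embedded support samples of each class into one prototype."""
    grouped = defaultdict(list)
    for x, label in support:
        grouped[label].append(embed(x))
    return {
        label: tuple(sum(col) / len(vecs) for col in zip(*vecs))
        for label, vecs in grouped.items()
    }


def classify(x, protos):
    """Metric module: score each class by negative Euclidean distance."""
    z = embed(x)
    scores = {label: -math.dist(z, p) for label, p in protos.items()}
    return max(scores, key=scores.get), scores


# Hypothetical 2-way 2-shot support set of situation feature vectors.
support = [((0, 0), "benign"), ((0, 2), "benign"),
           ((10, 10), "scan"), ((10, 12), "scan")]
protos = prototypes(support)
label, scores = classify((1, 1), protos)  # closest prototype is "benign"
```

Because only the prototypes depend on the support set, a new category can be handled by sampling K examples of it and recomputing the prototypes, without retraining the embedding.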

2. The cloud security policy forming mechanism, as shown in FIG. 2, comprises security situation information processing and security policy forming.

2-1: Security situation information processing: the internal regularities of the situation data are learned in supervised and unsupervised modes. The situation data is processed and abstracted through multiple layers of nonlinear information processing: data features are extracted layer by layer between the input layer and the output layer, the feature space of the data is transformed, and hierarchical feature representations of the situation data are learned automatically on the cloud. Dempster-Shafer (D-S) evidence theory is used to analyze the uncertainty of the situation data, so that multi-source heterogeneous situation data is fused effectively.

2-2: Security policy forming: the processed security situation information is used as the data set for training the reinforcement learning model. A virtual ubiquitous network space environment is used so that the agent can learn by continual trial and error; a reward/penalty mechanism is applied according to the logical security target, and a security policy model applicable to ubiquitous network space is trained.
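The trial-and-error training in a virtual environment under a reward/penalty mechanism can be sketched with tabular Q-learning and epsilon-greedy exploration. The patent does not specify the reinforcement learning algorithm, so Q-learning is an assumption here. The situation classes, policy actions, the `TARGET` mapping standing in for the logical security target, and all hyperparameters are hypothetical.

```python
import random

STATES = ["normal", "probe", "flood"]        # hypothetical situation classes
ACTIONS = ["allow", "rate_limit", "block"]   # hypothetical policy actions
# Assumed logical security target: the desired response to each situation.
TARGET = {"normal": "allow", "probe": "rate_limit", "flood": "block"}


def step(state, action, rng):
    """Toy virtual environment: reward +1 or penalty -1, then a random next state."""
    reward = 1.0 if action == TARGET[state] else -1.0
    return reward, rng.choice(STATES)


def train(steps=5000, alpha=0.1, gamma=0.5, epsilon=0.2, seed=0):
    """Learn a state-to-action policy by trial and error; returns the greedy policy."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in ACTIONS} for s in STATES}
    state = rng.choice(STATES)
    for _ in range(steps):
        # Epsilon-greedy: mostly exploit the current estimate, sometimes explore.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[state][a])
        reward, next_state = step(state, action, rng)
        # Q-learning update toward reward plus discounted best next value.
        best_next = max(q[next_state].values())
        q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
        state = next_state
    return {s: max(ACTIONS, key=lambda a: q[s][a]) for s in STATES}


policy = train()
```

In a realistic setting the state would be derived from the fused situation data rather than three fixed labels, and a function approximator would replace the table.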

3. The cloud-edge cooperative security policy forming mechanism, as shown in FIG. 3, comprises obtaining the cloud-trained model parameters and differentiated generation of edge security policy models. The security policy generation model trained on the cloud is trained on situation information from the whole network and therefore captures what is common across it. To enable each edge to form security policies quickly and to reduce the edge's learning cost, the model trained on the cloud is migrated to the edge, and its parameters are then fine-tuned using the situation information of the edge network. This spares the edge from learning from scratch and shortens the response time of the security policy. The mechanism specifically comprises the following steps:

Step one: migrate the parameters of the decision model trained on the cloud to the edge, and use them as the initial weights of the edge model's training parameters;

Step two: retain the initial weights of the convolutional layers and fully connected layers of the edge training model, and assign the weights of the last layer randomly;

Step three: using the situation data of the current edge, train new model parameters at each edge, performing differentiated training of the model.
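Steps one and two above amount to initializing each edge model from the cloud weights while re-drawing only the final layer. The sketch below shows that initialization with plain Python lists standing in for weight tensors; it is an illustration under stated assumptions, not the patent's code. The layer names, the `init_edge_model` helper, and the uniform range used for random assignment are hypothetical, and the fine-tuning loop of step three is omitted.

```python
import copy
import random


def init_edge_model(cloud_weights, head_name, rng=None):
    """Copy cloud-trained weights to an edge model, re-initializing the last layer.

    cloud_weights maps a layer name to a 2-D list of weights.
    """
    rng = rng or random.Random()
    # Steps one and two: keep the cloud weights of the earlier layers unchanged.
    edge = copy.deepcopy(cloud_weights)
    # Last layer: random assignment with the same shape as the cloud layer.
    edge[head_name] = [
        [rng.uniform(-0.1, 0.1) for _ in row] for row in cloud_weights[head_name]
    ]
    return edge


# Hypothetical cloud model with one convolutional, one dense, and one output layer.
cloud = {
    "conv1": [[0.5, -0.2], [0.1, 0.9]],
    "fc1": [[0.3, 0.3], [-0.7, 0.4]],
    "output": [[1.0, 2.0], [3.0, 4.0]],
}
edge = init_edge_model(cloud, head_name="output", rng=random.Random(0))
```

Each edge then continues training from `edge` on its own situation data, so the edges share the features learned on the cloud but end up with different final parameters.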

It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

The above examples should be understood as illustrative only and not limiting the scope of the invention. Various changes and modifications to the present invention may be made by one skilled in the art after reading the teachings herein, and such equivalent changes and modifications are intended to fall within the scope of the invention as defined in the appended claims.

Claims

1. A method for autonomously forming security policies based on a cloud-edge architecture, characterized by comprising the following steps:

step three: migrating the model obtained by cloud training to each edge, and performing differentiated training of the model; wherein step one, in which a small-sample learning method is used to rapidly generate a security policy training model at each edge, specifically comprises:

situation data expansion: expanding the training data set by weak-label data conversion and training data synthesis;

optimizing training model parameters: optimizing the training model from multiple angles and at multiple levels, reducing the data set required for training, and training the model parameters by metric learning and meta-learning;

the weak-label data conversion and training data synthesis specifically comprise:

weak-label data conversion: based on prior knowledge of the existing training data, performing feature analysis and feature extraction on the data, and designing a training data generator; training data synthesis: generating sample data closely resembling the training set with a generative adversarial network;

optimizing the training model parameters specifically comprises:

metric learning: computing the similarity between samples in a metric space using Euclidean distance, and classifying accordingly; meta-learning: learning meta-knowledge from the tasks of already trained models, and using this prior knowledge to guide the new model to learn quickly from a small training set;

the situation information processing in step two specifically comprises:

learning the internal regularities of the situation data in supervised and unsupervised modes, processing and abstracting the situation data through multiple layers of nonlinear information processing, extracting the valuable information implicit in the situation information from multiple scattered, heterogeneous data sources, and improving the reliability of decisions; and analyzing the uncertainty of the situation information using an uncertainty-theory analysis method;

analyzing the uncertainty of the situation information using an uncertainty-theory analysis method specifically comprises:

using Dempster-Shafer (D-S) evidence theory to model and analyze the uncertainty of the situation information flexibly and effectively without prior probabilities for the situation information;

training the decision model on the cloud by reinforcement learning specifically comprises:

using the processed security situation information as the data set for training the reinforcement learning model; and, guided by the logical security target, having the agent learn by continual trial and error under a reward/penalty mechanism, to obtain a security policy model applicable to ubiquitous network space;

step three, in which the model obtained by cloud training is migrated to the edge, specifically comprises:

taking the model weights trained on the cloud as the initial weights of the training parameters of every edge model; and fine-tuning the model parameters on the situation data of the current edge, which constitutes the differentiated training of the model.