CN117056951A - Data security management method for digital platform - Google Patents

Data security management method for digital platform

Info

Publication number
CN117056951A
Authority
CN
China
Prior art keywords
data
model
learning
training
privacy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311004874.XA
Other languages
Chinese (zh)
Inventor
郝慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Haoxin Haoyi Intelligent Technology Co ltd
Original Assignee
Shanghai Haoxin Haoyi Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Haoxin Haoyi Intelligent Technology Co ltd filed Critical Shanghai Haoxin Haoyi Intelligent Technology Co ltd
Priority to CN202311004874.XA priority Critical patent/CN117056951A/en
Publication of CN117056951A publication Critical patent/CN117056951A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60 Protecting data
    • G06F 21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F 21/55 Detecting local intrusion or implementing counter-measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F 21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F 21/577 Assessing vulnerabilities and evaluating computer system security
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60 Protecting data
    • G06F 21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F 21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F 21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/098 Distributed learning, e.g. federated learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/0985 Hyperparameter optimisation; Meta-learning; Learning-to-learn

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Bioethics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a data security management method of a digital platform, which relates to the technical field of data security.

Description

Data security management method for digital platform
Technical Field
The application relates to the technical field of data security, in particular to a data security management method of a digital platform.
Background
Data security management methods refer to protecting data from the risk of unauthorized access, corruption, leakage, or tampering by employing a range of measures and policies. The goal of data security management is to ensure confidentiality, integrity, and availability of data, and to comply with applicable legal regulations and industry standards.
The data security management method of a digital platform refers to a series of measures and strategies applied to data on the digital platform to ensure its confidentiality, integrity, and availability, and to prevent unauthorized access, tampering, leakage, or damage. Digital platforms include online services, application programs, cloud services, and web and mobile applications. Common measures include identity authentication and access control, data encryption, data backup and disaster recovery, security audit and monitoring, staff training and awareness, network security protection, updates and vulnerability repair, and data classification and access-permission control, so that the platform's data resources are better protected, data security is ensured, and more reliable and secure services are provided to users. As technology continues to develop, data security management methods for digital platforms are also continuously optimized and updated.
However, conventional data security management methods rely on rule engines and signature detection to identify known attack patterns, and therefore cannot effectively identify unknown external attacks; meanwhile, traditional data sharing methods may share original data directly, which can cause leakage of key data. A data security management method for digital platforms is therefore needed to solve these problems.
Disclosure of Invention
(I) Technical problems solved
Aiming at the defects of the prior art, the application provides a data security management method for a digital platform, which solves the problems that the prior art, using rule engines and signature detection to identify known attack patterns, cannot effectively identify unknown external attacks, and that traditional data sharing may directly share original data and cause key data leakage.
(II) Technical solution
In order to achieve the above object, the present application provides a data security management method for a digital platform, which includes:
building a deep learning model, training it with historical data, continuously monitoring network traffic and user behavior in real time, identifying abnormal activities, automatically detecting and raising alarms for anomalies and potential threats, and providing high-precision intrusion detection;
differential privacy data sharing, which adopts differential privacy techniques to encrypt and add noise to sensitive data;
generative adversarial network defense, which introduces a generative adversarial network (GAN) and generates adversarial samples to test and strengthen the security of a traditional machine learning model;
federated learning, which adopts a federated learning method so that multiple data sources train a model locally and share only model parameters instead of the original data;
safety reinforcement learning, which adopts safety reinforcement learning techniques so that the system interacts with the environment and autonomously learns and adjusts its defense strategy;
edge intelligence, namely deploying edge intelligence technology on terminal devices to realize real-time security monitoring and processing;
interpretable AI, which uses an interpretable artificial intelligence model to interpret and visualize the model's decision process;
automatic vulnerability repair, which uses machine learning techniques to automatically detect vulnerabilities in the system and generate repair strategies in real time.
The application is further arranged to: the specific steps of building the deep learning model and training it are as follows:
collecting historical data of network traffic and user behaviors as a training data set, and performing data cleaning, feature extraction and label marking;
in the intrusion detection task, a deep learning algorithm is selected from among convolutional neural networks (CNN), recurrent neural networks (RNN), long short-term memory networks (LSTM), and Transformers; a model architecture comprising an input layer, hidden layers, and an output layer is constructed, and an activation function, a loss function, and an optimization algorithm are set;
dividing the data set into a training set, a verification set and a test set, wherein the training set is used for model training, the verification set is used for adjusting super parameters and avoiding overfitting, and the test set is used for evaluating model performance;
training the deep learning model with the training set, iteratively optimizing the model parameters to minimize the loss function, and selecting gradient descent or one of its variants, such as Adam or RMSprop, as the optimization algorithm;
according to performance on the verification set, the model's hyperparameters, including the learning rate, regularization coefficient, and number of hidden-layer nodes, are adjusted to optimize the model's performance and generalization capability;
evaluating the trained deep learning model with the test set and calculating performance indicators including accuracy, recall, and F1 score;
deploying the trained deep learning model on a digital platform, continuously monitoring network flow and user behavior, inputting data samples into the model in real time for prediction, identifying abnormal activities and triggering corresponding response measures;
the application is further arranged to: the deep learning model is built by a convolutional neural network:
input: x is X
Hidden layer: h = f(W·X + b)
Output layer: y = g(V·h + c)
where X is the feature vector of a data sample, W and V are weight matrices, b and c are bias vectors, and f and g are activation functions;
Loss function definition:
L(y, y')
where y is the actual label and y' is the predicted label;
Algorithm optimization and parameter update rule:
θ ← θ − α·∇θL(y, y')
where θ denotes the model parameters (W, b, V, c), α is the learning rate, and ∇θL is the gradient vector;
the application is further arranged to: the step of sharing the differential privacy data specifically comprises the following steps:
adding noise to the preprocessed data, selecting Laplacian noise for the noise-adding processing, wherein the specific noise-adding formula is:
noisy_data = data + Lap(sensitivity/ε)
wherein Lap(sensitivity/ε) denotes noise drawn from the Laplace distribution, ε is the privacy budget, and sensitivity is the sensitivity of the query;
setting privacy budget epsilon of differential privacy, and sharing the encrypted and noisy data to authorized data users;
when the inquiry of the data user is received, decrypting and processing the encrypted and noisy data, and then returning a response result;
performing privacy protection analysis, evaluating the effect of the differential privacy technology, and ensuring that the shared data meets the privacy protection requirement;
the application is further arranged to: the privacy preserving analysis step further includes:
for shared sensitive data, calculating the sensitivity thereof;
determining the size of noise according to the setting of privacy budget epsilon;
the mathematical definition of differential privacy is used to evaluate the privacy preserving effect of shared data, specifically:
for any adjacent data sets D and D', and any query Q, the following conditions are satisfied for all possible query results S:
Pr[Q(D)∈S] ≤ exp(ε)·Pr[Q(D')∈S]
where ε represents the privacy budget, Q(D) represents the result of query Q on dataset D, and exp(ε) is the exponential of the privacy budget;
evaluating privacy preserving effects of the shared data using the differential privacy distortion;
after privacy protection processing, performance evaluation is carried out on the shared data, wherein the performance evaluation comprises model accuracy, data availability and query response time;
according to the result of privacy protection analysis, adjusting parameters in the differential privacy technology;
the application is further arranged to: the step of introducing the generated challenge network to test and strengthen the security of the traditional machine learning model specifically comprises the following steps:
adopting a generator network and a discriminator network, and preparing a data set for training the GAN, the data set comprising real data and noise data; training the GAN using the real data and the noise data, with the generator network attempting to generate samples that approximate the real data and the discriminator network attempting to distinguish the real data from the data generated by the generator;
generating adversarial samples using the trained generator network;
testing the traditional machine learning model with the generated adversarial samples: feeding the adversarial samples into the traditional model as input and observing the model's output;
according to the test results of the traditional model, selectively improving the adversarial defense method:
adversarial training: mixing the generated adversarial samples with the original training data, and retraining the traditional model;
the application is further arranged to: the local training steps by adopting the federal learning method specifically comprise:
respectively collecting a plurality of data sources which need to participate in federal learning;
randomly initializing parameters of a federal learning model before federal learning is started;
in each federal learning iteration, the data source sequence is specifically:
each data source locally trains a model using local data;
after the local training is finished, each data source uploads model parameters obtained by the local training to a central server;
the central server aggregates the collected model parameters, and sends the aggregated model parameters back to each data source by the central server to update the respective local model parameters;
repeating the federated learning iterations until the model converges;
the parameter aggregation formula in the federated learning process is:
ω_avg = (1/N)·Σ_{i=1}^{N} ω_i
where ω_avg is the average of the parameters, N is the number of data sources, and ω_i is the local model parameter of the i-th data source;
the application is further arranged to: the safety reinforcement learning step specifically includes:
in safety reinforcement learning, the specific steps by which the system interacts with the environment, autonomously learns, and adjusts the defense strategy are as follows:
modeling a system environment, including abstracting the system operating environment, a network structure and an attacker behavior into a mathematical model;
defining a reward function for evaluating the performance of the system in different states;
adopting a Q-learning algorithm, and connecting the built reinforcement learning model with the defense module of the system, so that the system can interact with the environment;
learning and optimizing according to the reinforcement learning algorithm, and continuously interacting with the environment and learning;
the updating rule formula of the reinforcement learning algorithm in reinforcement learning is as follows:
Q(s,a) = Q(s,a) + α·(r + γ·max_{a'} Q(s',a') - Q(s,a))
where Q(s,a) represents the expected return for performing action a in state s, α is the learning rate, r is the reward obtained after performing action a in state s, γ is the discount factor, s' is the new state after performing action a, and a' is the optimal action selected in the new state s'.
The application also provides a terminal device, comprising a memory, a processor, and a control program of the data security management method of the digital platform stored in the memory and executable on the processor; when executed by the processor, the control program implements the data security management method of the digital platform described above;
the application also provides a storage medium which is applied to a computer, wherein the storage medium is stored with a control program of the data security management method of the digital platform, and the control program of the data security management method of the digital platform realizes the data security management method of the digital platform when being executed by the processor.
(III) Beneficial effects
The application provides a data security management method for a digital platform, with the following beneficial effects:
the data security management method of the digital platform provided by the application uses a deep learning model to perform intrusion detection, gathers historical data of network traffic and user behaviors as a training data set, builds a model framework in an intrusion detection task, comprises an input layer, a hidden layer and an output layer, sets an activation function, a loss function and an optimization algorithm, divides the data set into a training set, a verification set and a test set for training, super-parameter adjustment and evaluation of the model, uses the training set to train the deep learning model, optimizes model parameters through iteration, minimizes the loss function, optimizes by adopting a gradient descent method, and adjusts super-parameters of the model including learning rate, regularization coefficient and hidden layer node number according to the performance of the verification set so as to improve the performance and generalization capability of the model.
For real-time intrusion, the trained deep learning model is deployed on the digital platform, network traffic and user behavior are continuously monitored, data samples are fed into the model in real time for prediction, abnormal activities are identified, and corresponding response measures are triggered.
For private data, differential privacy techniques are used to encrypt and add noise to sensitive data, protecting data privacy while allowing authorized data users to obtain limited, irreversible insights; the encrypted and noise-added data is shared with authorized data users according to the privacy budget.
A generative adversarial network is introduced to generate adversarial samples that test and strengthen the security of a traditional machine learning model, enhancing the model's security through adversarial training and improved defense strategies. A federated learning method performs model training on local devices and shares only model parameters instead of original data, reducing the risk of data leakage and improving data security. Meanwhile, safety reinforcement learning techniques enable the system to interact with the environment and autonomously learn and adjust its defense strategy to adapt to constantly changing security threats.
The method solves the problems that the prior art, which identifies known attack patterns using rule engines and signature detection, cannot effectively identify unknown external attacks, and that traditional data sharing may directly share original data and cause key data leakage.
Drawings
Fig. 1 is a flowchart of a data security management method of a digital platform according to the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Examples
Referring to fig. 1, the present application provides a data security management method for a digital platform, which includes the following steps:
s1, constructing a deep learning model, training by using historical data, continuously monitoring network flow and user behaviors, identifying abnormal activities, monitoring the network flow and the user behaviors in real time, automatically detecting and alarming abnormality and potential threat, and providing high-precision intrusion detection;
the specific steps of building the deep learning model and training it are as follows:
collecting historical data of network traffic and user behaviors as a training data set, and performing data cleaning, feature extraction and label marking;
in the intrusion detection task, a deep learning algorithm is selected from among convolutional neural networks (CNN), recurrent neural networks (RNN), long short-term memory networks (LSTM), and Transformers; a model architecture comprising an input layer, hidden layers, and an output layer is constructed, and an activation function, a loss function, and an optimization algorithm are set;
dividing the data set into a training set, a verification set and a test set, wherein the training set is used for model training, the verification set is used for adjusting super parameters and avoiding overfitting, and the test set is used for evaluating model performance;
training the deep learning model with the training set, iteratively optimizing the model parameters to minimize the loss function, and selecting gradient descent or one of its variants, such as Adam or RMSprop, as the optimization algorithm;
according to performance on the verification set, the model's hyperparameters, including the learning rate, regularization coefficient, and number of hidden-layer nodes, are adjusted to optimize the model's performance and generalization capability;
evaluating the trained deep learning model with the test set and calculating performance indicators including accuracy, recall, and F1 score;
deploying the trained deep learning model on a digital platform, continuously monitoring network flow and user behavior, inputting data samples into the model in real time for prediction, identifying abnormal activities and triggering corresponding response measures;
the specific implementation process is as follows:
the deep learning model is built by a convolutional neural network:
input: x is X
Hidden layer: h = f(W·X + b)
Output layer: y = g(V·h + c)
where X is the feature vector of a data sample, W and V are weight matrices, b and c are bias vectors, and f and g are activation functions;
Loss function definition:
L(y, y')
where y is the actual label and y' is the predicted label;
Algorithm optimization and parameter update rule:
θ ← θ − α·∇θL(y, y')
where θ denotes the model parameters (W, b, V, c), α is the learning rate, and ∇θL is the gradient vector;
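The model, loss, and update rule above can be sketched end to end as follows. This is a minimal illustration only: the toy "traffic feature" data, the layer sizes, the choice of sigmoid for both activation functions f and g, and binary cross-entropy as L(y, y') are all assumptions for the example, not choices fixed by the application.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy stand-in for the training data set: 8 four-dimensional "traffic feature"
# vectors with binary normal/abnormal labels (illustrative, not real traffic).
X = rng.normal(size=(8, 4))
y_true = (X[:, 0] + X[:, 1] > 0).astype(float).reshape(-1, 1)

# Parameters of h = f(W·X + b) and y = g(V·h + c); f = g = sigmoid here.
W = rng.normal(scale=0.5, size=(4, 3)); b = np.zeros(3)
V = rng.normal(scale=0.5, size=(3, 1)); c = np.zeros(1)
alpha = 0.5  # learning rate α

for _ in range(500):
    # Forward pass.
    h = sigmoid(X @ W + b)
    y_pred = sigmoid(h @ V + c)
    # Gradients of the binary cross-entropy loss L(y, y') via backpropagation.
    grad_out = (y_pred - y_true) / len(X)
    grad_V, grad_c = h.T @ grad_out, grad_out.sum(axis=0)
    grad_h = grad_out @ V.T * h * (1.0 - h)
    grad_W, grad_b = X.T @ grad_h, grad_h.sum(axis=0)
    # Gradient-descent update: each parameter moves against its gradient, scaled by α.
    W -= alpha * grad_W; b -= alpha * grad_b
    V -= alpha * grad_V; c -= alpha * grad_c

train_accuracy = ((y_pred > 0.5) == y_true.astype(bool)).mean()
```

In a deployment as described above, the same forward pass would score live traffic samples, with an alarm raised when the predicted probability crosses a threshold.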
s2, differential privacy data sharing, namely encrypting and denoising sensitive data by adopting a differential privacy technology;
the step of sharing the differential privacy data specifically comprises the following steps:
adding noise to the preprocessed data, selecting Laplacian noise for the noise-adding processing, wherein the specific noise-adding formula is:
noisy_data = data + Lap(sensitivity/ε)
wherein Lap(sensitivity/ε) denotes noise drawn from the Laplace distribution, ε is the privacy budget, and sensitivity is the sensitivity of the query;
setting privacy budget epsilon of differential privacy, and sharing the encrypted and noisy data to authorized data users;
when the inquiry of the data user is received, decrypting and processing the encrypted and noisy data, and then returning a response result;
performing privacy protection analysis, evaluating the effect of the differential privacy technology, and ensuring that the shared data meets the privacy protection requirement;
the privacy protection analysis step specifically includes:
for shared sensitive data, calculating the sensitivity thereof;
determining the size of noise according to the setting of privacy budget epsilon;
the mathematical definition of differential privacy is used to evaluate the privacy preserving effect of shared data, specifically:
for any adjacent data sets D and D', and any query Q, the following conditions are satisfied for all possible query results S:
Pr[Q(D)∈S] ≤ exp(ε)·Pr[Q(D')∈S]
where ε represents the privacy budget, Q(D) represents the result of query Q on dataset D, and exp(ε) is the exponential of the privacy budget;
evaluating privacy preserving effects of the shared data using the differential privacy distortion;
after privacy protection processing, performance evaluation is carried out on the shared data, wherein the performance evaluation comprises model accuracy, data availability and query response time;
according to the results of the privacy protection analysis, adjusting the parameters of the differential privacy technique, namely the privacy budget ε and the noise magnitude, to balance privacy protection and data accuracy;
After differential privacy processing, the shared data remains usable and effective while privacy is protected. By setting the privacy budget and noise magnitude and performing performance evaluation and parameter optimization, the shared data achieves a better privacy-protection effect under the differential privacy technique;
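The Laplace noise-adding step described in this section can be sketched as follows. The query (a count), its sensitivity of 1, and the privacy budget ε = 1.0 are illustrative assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

def laplace_mechanism(true_value, sensitivity, epsilon):
    # noisy = value + Lap(sensitivity / ε); a larger ε (weaker privacy
    # guarantee) means a smaller noise scale and a more accurate release.
    return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Illustrative query: a count whose sensitivity is 1, since adding or
# removing one individual changes the count by at most 1.
true_count = 120
noisy_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=1.0)

# The noise is zero-mean: many hypothetical releases average near the truth,
# while any single released value still hides individual contributions.
mean_of_releases = float(np.mean(
    [laplace_mechanism(true_count, 1.0, 1.0) for _ in range(10_000)]
))
```

Tightening ε (e.g. 0.1 instead of 1.0) increases the noise scale tenfold, which is exactly the privacy/accuracy trade-off that the parameter-adjustment step above tunes.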
s3, generating countermeasure network defense, introducing a generated countermeasure network, and generating a countermeasure sample to test and strengthen the safety of the traditional machine learning model; the method has the advantages that the resistance attack is effectively detected and defended, and the robustness and safety of the model are improved;
the step of introducing the generative adversarial network to test and strengthen the security of the traditional machine learning model specifically comprises:
adopting a generator network and a discriminator network, and preparing a data set for training the GAN, the data set comprising real data and noise data; training the GAN using the real data and the noise data, with the generator network attempting to generate samples that approximate the real data and the discriminator network attempting to distinguish the real data from the data generated by the generator;
generating adversarial samples using the trained generator network; an adversarial sample is a sample obtained by applying a small perturbation to an original input sample;
testing the traditional machine learning model with the generated adversarial samples: feeding the adversarial samples into the traditional model as input and observing the model's output; if the model performs poorly on the adversarial samples, this may indicate that the model is not robust against adversarial attacks;
according to the test results of the traditional model, selectively improving the adversarial defense method:
adversarial training: mixing the generated adversarial samples with the original training data, and retraining the traditional model;
through repeated iterative training, the generator network gradually learns to generate samples close to the real data, and the discriminator network gradually improves its ability to distinguish real data from generated data; by generating adversarial samples with the generative adversarial network, the robustness of the traditional machine learning model is evaluated and enhanced, improving the model's performance against unknown attacks and adversarial samples;
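The adversarial-sample test in this step can be sketched as follows. Training a full GAN is out of scope for a short example, so as a stand-in the perturbed samples come from a small FGSM-style sign perturbation rather than a trained generator network; the linear "traditional model", the synthetic data, and the perturbation budget eps are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "traditional model": a fixed linear classifier predicting 1 iff w·x > 0.
w = np.array([1.0, -1.0, 0.5])

def predict(samples):
    return (samples @ w > 0).astype(int)

# Clean evaluation data, labeled by the same rule, so clean accuracy is 1.0.
X = rng.normal(size=(200, 3))
y = predict(X)
clean_acc = (predict(X) == y).mean()

# Adversarial samples: each input is shifted by a small perturbation
# eps·sign(w) in the direction that lowers the model's score for class-1
# inputs and raises it for class-0 inputs (a sign-gradient attack).
eps = 0.3
direction = np.where(y[:, None] == 1, -1.0, 1.0)
X_adv = X + eps * direction * np.sign(w)
adv_acc = (predict(X_adv) == y).mean()
```

The accuracy drop from `clean_acc` to `adv_acc` is the robustness signal the step describes: a large drop indicates the model needs the adversarial-training remedy above.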
s4, federation learning, namely locally training a model for a plurality of data sources by adopting a federation learning method, wherein only model parameters are shared instead of original data;
model training is carried out under the condition of not centralizing data, so that the risk of data leakage is reduced, and better model generalization capability is realized;
the local training steps of the federated learning method specifically comprise:
collecting the multiple data sources that need to participate in federated learning; each data source locally performs preprocessing, feature extraction, and label marking on its own data, ensuring data consistency and usability;
randomly initializing the parameters of the federated learning model before federated learning starts;
in each federated learning iteration, each data source proceeds as follows:
each data source locally trains a model using local data;
after the local training is finished, each data source uploads model parameters obtained by the local training to a central server;
the central server aggregates the collected model parameters, and sends the aggregated model parameters back to each data source by the central server to update the respective local model parameters;
repeating the federated learning iterations until the model converges;
the parameter aggregation formula in the federated learning process is:
ω_avg = (1/N)·Σ_{i=1}^{N} ω_i
where ω_avg is the average of the parameters, N is the number of data sources, and ω_i is the local model parameter of the i-th data source;
Federated learning allows multiple data sources to train a model locally, sharing only model parameters and not the original data, thereby protecting user privacy and data security. Through federated learning, different data sources can jointly train a global model without concentrating the data in one place.
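The local-training-then-averaging round described above can be sketched as follows. The synthetic per-source data and the closed-form least-squares fit standing in for each source's local training are illustrative assumptions; only the aggregation step ω_avg = (1/N)·Σ ω_i is taken directly from the text.

```python
import numpy as np

rng = np.random.default_rng(7)
true_w = np.array([2.0, -1.0])  # ground truth shared across sources

def local_train(n_samples):
    # One data source: fit parameters on its own private data with least
    # squares; only the fitted parameters ω_i ever leave the device.
    X = rng.normal(size=(n_samples, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=n_samples)
    w_i, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w_i

# N = 5 sources train locally; the central server sees only the parameters.
local_params = [local_train(50) for _ in range(5)]

# Server-side aggregation: ω_avg = (1/N)·Σ ω_i, then ω_avg is sent back
# to every source to update its local model for the next iteration.
w_avg = np.mean(local_params, axis=0)
```

Repeating this round until the parameters stop changing is the convergence loop described in the steps above.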
s5, safety reinforcement learning, namely enabling the system to interact with the environment, and autonomously learning and adjusting a defense strategy by adopting a safety reinforcement learning technology; the adaptability of the system to constantly-changing security threats is improved, and the security defense effect is enhanced;
in safety reinforcement learning, the specific steps by which the system interacts with the environment, autonomously learns, and adjusts the defense strategy are as follows:
modeling a system environment, including abstracting the system operating environment, a network structure and an attacker behavior into a mathematical model;
defining a reward function for evaluating the performance of the system in different states;
adopting a Q-learning algorithm to connect the built reinforcement learning model with a defending part of the system, so that the system can interact with the environment;
learning and optimizing according to the reinforcement learning algorithm, and continuously interacting with the environment and learning;
the updating rule formula of the Q-learning algorithm in reinforcement learning is:
Q(s,a) = Q(s,a) + α*(r + γ*max_{a'} Q(s',a') - Q(s,a))
wherein Q(s,a) represents the expected return of executing action a in state s, α is the learning rate, r is the reward obtained after executing action a in state s, γ is the discount factor, s' is the new state after executing action a, and a' is the action with the highest Q value in the new state s'; this updating rule enables the system to continuously optimize the Q value according to feedback from the environment, thereby finding the optimal defense strategy; safety reinforcement learning autonomously learns and adjusts the defense strategy while constantly interacting with the environment, improving the adaptability of the system to security threats and enhancing system security;
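The update rule above can be exercised on a toy defense problem; the states, actions, rewards and transition model below are illustrative placeholders, not part of the method.

```python
import random

# Toy defense problem: states are threat levels, actions are defense moves.
STATES = ["normal", "under_attack"]
ACTIONS = ["monitor", "block"]
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}

def q_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))"""
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def reward(s, a):
    """Illustrative reward: blocking pays off during an attack, costs otherwise."""
    if s == "under_attack":
        return 1.0 if a == "block" else -1.0
    return 0.1 if a == "monitor" else -0.1

random.seed(0)
s = "normal"
for _ in range(500):
    a = random.choice(ACTIONS)       # pure exploration, for brevity
    r = reward(s, a)
    s_next = random.choice(STATES)   # toy environment transition
    q_update(Q, s, a, r, s_next)
    s = s_next
```

After enough interaction, the Q table prefers "block" in the attacked state, which is exactly the sense in which the rule "finds the optimal defense strategy".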
s6, edge intelligence, namely deploying an edge intelligence technology on the terminal equipment to realize real-time safety monitoring and processing;
the dependence on a central server is reduced, and the data security and the instant response are enhanced;
the specific steps of deploying the edge intelligent technology on the terminal equipment comprise:
adopting a lightweight edge intelligence technique based on a deep learning model, collecting the data required for security monitoring on the terminal device, including sensor data and log data, and transmitting the collected data to an edge node;
deploying a lightweight intelligent algorithm based on a deep learning model on the edge node to detect and analyze security events, the algorithm comprising real-time data processing and a security monitoring model;
real-time security monitoring and processing are carried out on the edge nodes, collected data are analyzed and processed, possible security threats are detected, and corresponding response measures are triggered;
when the security threat is detected, the edge node triggers a security response mechanism, sends an alarm and blocks attack flow;
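The edge-node flow above (collect readings, detect a threat locally, trigger a response without contacting the central server) can be sketched minimally; the z-score threshold detector stands in for the lightweight model, and all values are illustrative.

```python
from statistics import mean, stdev

def detect_threat(readings, new_value, k=3.0):
    """Flag new_value as anomalous if it lies more than k sample standard
    deviations from the mean of recent baseline readings."""
    mu, sigma = mean(readings), stdev(readings)
    return sigma > 0 and abs(new_value - mu) > k * sigma

alerts = []
def respond(value):
    """Edge-local response: raise an alert (a real deployment would also
    block the offending traffic), with no round-trip to a central server."""
    alerts.append(f"ALERT: anomalous reading {value}")

baseline = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.3, 10.1]
for sample in [10.0, 10.2, 55.0]:       # 55.0 simulates attack traffic
    if detect_threat(baseline, sample):
        respond(sample)
```

Keeping both detection and response on the edge node is what reduces the dependence on the central server and gives the instant response the text describes.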
s7, interpretable AI, wherein an interpretable artificial intelligence model is used to interpret and visualize the model decision process, helping the security team understand the behavior of the model and rapidly identify abnormal conditions and security events;
the method for interpreting and visualizing the model decision process by using the interpretable artificial intelligence model comprises the following specific steps:
training an interpretability model by using the preprocessed data;
interpreting the decision process of the interpretable model through feature importance analysis, local interpretation and global interpretation;
in the local interpretation, a LIME method is used for constructing a local linear model to interpret the prediction result of the model on a specific sample;
in global interpretation, calculating contribution of features to model prediction results by adopting a SHAP method;
visualizing the interpreted result;
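The local-interpretation (LIME) step above can be illustrated in miniature: perturb the sample, weight the perturbations by proximity, and fit a local linear surrogate whose coefficients serve as feature attributions. The black-box model, kernel width and sample count below are illustrative stand-ins, not the patent's models.

```python
import numpy as np

def black_box(X):
    """Stand-in model to be explained: depends mostly on feature 0."""
    return 3.0 * X[:, 0] + 0.2 * X[:, 1]

def lime_explain(model, x, n_samples=500, width=1.0, seed=0):
    """LIME-style local explanation: fit a proximity-weighted linear
    surrogate around x and return its coefficients as attributions."""
    rng = np.random.default_rng(seed)
    Z = x + rng.normal(scale=0.5, size=(n_samples, x.size))  # perturbations
    y = model(Z)
    d = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(d ** 2) / width ** 2)                       # proximity kernel
    A = np.hstack([Z, np.ones((n_samples, 1))])              # add intercept
    W = np.diag(w)
    coef, *_ = np.linalg.lstsq(W @ A, w * y, rcond=None)     # weighted system
    return coef[:-1]                                         # drop intercept

attrib = lime_explain(black_box, np.array([1.0, 1.0]))
```

Because the stand-in model is linear, the surrogate recovers its coefficients exactly; for a real model the coefficients describe behavior only near the explained sample.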
s8, automatic vulnerability repair, automatically detecting vulnerabilities in the system by using a machine learning technology and generating a repair strategy in real time, thereby improving system security, automatically repairing vulnerabilities and reducing security risk;
adopting a logistic regression model, the specific steps of automatically detecting vulnerabilities in the system by using a machine learning technology and generating a repair strategy in real time are as follows:
dividing the data set into a training set and a testing set for training and evaluating the model;
training the selected machine learning model using the training set;
evaluating the trained model by using a test set, and evaluating the accuracy and performance of the model;
when the system is running, detecting vulnerabilities in the system in real time: predicting on the data collected in real time using the trained machine learning model to judge whether a vulnerability exists, and if a vulnerability is detected, generating a corresponding repair strategy according to the prediction result of the model and the characteristics of the vulnerability;
applying the generated repair strategy to the system to repair the detected vulnerability;
the predictive formula of the logistic regression machine learning model is:
y = 1 / (1 + e^(-z))
where y is the predicted output of the model and z = w·x + b is the linear combination of the input data, with w the weight vector and b the bias learned during training.
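A minimal sketch of the prediction step above; the feature names, weights and decision threshold are illustrative assumptions, since the text does not fix them.

```python
import math

def predict_vulnerability(x, w, b, threshold=0.5):
    """Logistic regression prediction: y = 1 / (1 + e^(-z)), z = w.x + b.
    Returns (probability, flagged) where flagged means y exceeds threshold."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    y = 1.0 / (1.0 + math.exp(-z))
    return y, y > threshold

# Illustrative features: [unpatched_services, failed_logins, open_ports]
w = [1.2, 0.8, 0.5]   # assumed to come from training on the labelled set
b = -2.0
prob, flagged = predict_vulnerability([2.0, 1.0, 1.0], w, b)
```

In the described pipeline, a `flagged` result would trigger generation of the corresponding repair strategy.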
In the present application, in combination with the above:
the data security management method of the digital platform provided by the application uses a deep learning model to perform intrusion detection, gathers historical data of network traffic and user behaviors as a training data set, builds a model framework in an intrusion detection task, comprises an input layer, a hidden layer and an output layer, sets an activation function, a loss function and an optimization algorithm, divides the data set into a training set, a verification set and a test set for training, super-parameter adjustment and evaluation of the model, uses the training set to train the deep learning model, optimizes model parameters through iteration, minimizes the loss function, optimizes by adopting a gradient descent method, and adjusts super-parameters of the model including learning rate, regularization coefficient and hidden layer node number according to the performance of the verification set so as to improve the performance and generalization capability of the model.
For real-time intrusion, the trained deep learning model is deployed on the digital platform; network traffic and user behaviors are continuously monitored, data samples are input into the model in real time for prediction, abnormal activities are identified, and corresponding response measures are triggered.
For private data, the differential privacy technique is used to encrypt and add noise to sensitive data, protecting data privacy while allowing authorized data users to obtain limited, irreversible insight; the encrypted and noise-added data is shared with authorized data users according to the privacy budget.
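The noise-adding step above is the Laplace mechanism: noise with scale sensitivity/ε is added before sharing, so a tighter privacy budget (smaller ε) yields more noise. A minimal sketch, with illustrative sensitivity and budget values:

```python
import numpy as np

def privatize(values, sensitivity, epsilon, seed=None):
    """Laplace mechanism: add Lap(0, sensitivity/epsilon) noise to each
    value; smaller epsilon (tighter privacy budget) means more noise."""
    rng = np.random.default_rng(seed)
    scale = sensitivity / epsilon
    return np.asarray(values, dtype=float) + rng.laplace(0.0, scale, size=len(values))

# Share noisy counts with privacy budget epsilon = 0.5; a counting query
# changes by at most 1 when one record changes, so sensitivity = 1.
shared = privatize([120.0, 87.0, 954.0], sensitivity=1.0, epsilon=0.5, seed=42)
```

The shared values remain useful in aggregate but no longer reveal the exact underlying counts, which is the "limited, irreversible insight" the text refers to.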
The method further introduces a generative adversarial network to generate adversarial samples for testing and strengthening the security of the traditional machine learning model, enhancing model security through adversarial training and improved defense strategies; model training is performed on local devices using the federated learning method, sharing only model parameters rather than the original data so as to reduce the risk of data leakage and improve data security; and a safety reinforcement learning technique is adopted so that the system interacts with the environment, autonomously learning and adjusting the defense strategy to adapt to constantly-changing security threats.
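The adversarial-testing idea above (craft inputs that fool a trained model, then retrain on them) can be illustrated compactly. The patent generates such samples with a GAN; the sketch below substitutes a simpler gradient-sign (FGSM-style) perturbation against a logistic model, so this is an illustration of adversarial sample testing, not the patented GAN construction, and all names and values are assumptions.

```python
import numpy as np

def predict(x, w, b):
    """Probability from a logistic model (the 'traditional' model under test)."""
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

def adversarial_sample(x, w, b, y_true, eps=0.6):
    """FGSM-style attack: step each feature in the direction that increases
    the loss, x' = x + eps * sign(dL/dx).  For logistic loss,
    dL/dx = (p - y) * w, so only its sign is needed."""
    p = predict(x, w, b)
    return x + eps * np.sign((p - y_true) * w)

w, b = np.array([2.0, -1.0]), 0.0
x = np.array([1.0, 0.5])            # originally classified positive
x_adv = adversarial_sample(x, w, b, y_true=1.0)
```

Mixing such `x_adv` samples back into the training data is the adversarial-training step the text describes for hardening the model.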
In the description of the embodiments of the present application, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the application, the scope of which is defined in the appended claims and their equivalents.

Claims (10)

1. A method for data security management of a digital platform, the method comprising:
building a deep learning model, training it with historical data, continuously monitoring network traffic and user behaviors in real time, identifying abnormal activities, automatically detecting and alarming on anomalies and potential threats, and providing high-precision intrusion detection;
differential privacy data sharing, which adopts differential privacy technology to encrypt and noise sensitive data;
generative adversarial network defense, introducing a generative adversarial network and generating adversarial samples to test and strengthen the security of a traditional machine learning model;
federated learning, which adopts a federated learning method so that a plurality of data sources train a model locally and share only model parameters instead of the original data;
safety reinforcement learning, which adopts a safety reinforcement learning technology to enable a system to interact with the environment, autonomously learn and adjust a defense strategy;
edge intelligence, namely deploying an edge intelligence technology on terminal equipment to realize real-time safety monitoring and processing;
an interpretable AI, which interprets and visualizes the model decision process using an interpretable artificial intelligence model;
automatic bug repair, automatically detecting bugs in the system by using a machine learning technology, and generating a repair strategy in real time.
2. The method for data security management of a digital platform according to claim 1, wherein the specific steps of constructing the deep learning model and training are as follows:
collecting historical data of network traffic and user behaviors as a training data set, and performing data cleaning, feature extraction and label marking;
in the intrusion detection task, the selected deep learning algorithms comprise a convolutional neural network (CNN), a recurrent neural network (RNN), a long short-term memory network (LSTM) and a transformer; a model architecture comprising an input layer, a hidden layer and an output layer is constructed, and an activation function, a loss function and an optimization algorithm are set;
dividing the data set into a training set, a verification set and a test set, wherein the training set is used for model training, the verification set is used for adjusting hyperparameters and avoiding overfitting, and the test set is used for evaluating model performance;
training the deep learning model using the training set, iterating and optimizing the model parameters to minimize the loss function, wherein the optimization algorithm uses gradient descent and variants thereof, including Adam and RMSprop;
adjusting the hyperparameters of the model according to performance on the verification set, including the learning rate, regularization coefficient and number of hidden layer nodes, to optimize the performance and generalization capability of the model;
evaluating the trained deep learning model by using a test set, and calculating performance indexes including accuracy, recall and F1 value;
and deploying the trained deep learning model on a digital platform, continuously monitoring network flow and user behaviors, inputting data samples into the model in real time for prediction, identifying abnormal activities and triggering corresponding response measures.
3. The method for data security management of a digital platform according to claim 2, wherein the deep learning model is built by a convolutional neural network:
input: X
hidden layer: h = f(W·X + b)
output layer: y = g(V·h + c)
wherein X is the feature vector of a data sample, W and V are weight matrices, b and c are bias vectors, f is the activation function of the hidden layer, and g is the activation function of the output layer;
loss function definition:
loss function L(y, y')
wherein y is the actual label and y' is the predicted label;
algorithm optimization and parameter updating rule:
θ = θ - α·∇L(θ)
where θ denotes the model parameters, α is the learning rate, and ∇L(θ) is the gradient vector.
4. The method for data security management of a digital platform according to claim 1, wherein the step of sharing the differential privacy data specifically comprises:
adding noise to the preprocessed data, selecting Laplacian noise for the noise-adding processing, wherein the specific noise-adding formula is:
noisy_data = data + Lap(sensitivity/ε)
wherein Lap(sensitivity/ε) denotes noise drawn from the Laplace distribution with scale sensitivity/ε, ε is the privacy budget, and sensitivity is the sensitivity;
setting privacy budget epsilon of differential privacy, and sharing the encrypted and noisy data to authorized data users;
when the inquiry of the data user is received, decrypting and processing the encrypted and noisy data, and then returning a response result;
and carrying out privacy protection analysis, evaluating the effect of the differential privacy technology, and ensuring that the shared data meets the privacy protection requirement.
5. The method for data security management of a digital platform according to claim 1, wherein the privacy preserving analyzing step further comprises:
for shared sensitive data, calculating the sensitivity thereof;
determining the size of noise according to the setting of privacy budget epsilon;
the mathematical definition of differential privacy is used to evaluate the privacy preserving effect of shared data, specifically:
for any adjacent data sets D and D' and any query Q, the following condition is satisfied for all possible query results S:
Pr[Q(D) ∈ S] ≤ exp(ε) * Pr[Q(D') ∈ S]
where ε represents the privacy budget, Q(D) represents the result of query Q on dataset D, and exp(ε) denotes e raised to the privacy budget;
evaluating privacy preserving effects of the shared data using the differential privacy distortion;
after privacy protection processing, performance evaluation is carried out on the shared data, wherein the performance evaluation comprises model accuracy, data availability and query response time;
and adjusting parameters in the differential privacy technology according to the result of the privacy protection analysis.
6. The method for data security management of a digital platform according to claim 1, wherein the step of introducing a generative adversarial network to test and strengthen the security of a traditional machine learning model specifically comprises:
adopting a generator network and a discriminator network, and preparing a data set for training the GAN, the data set comprising real data and noise data; training the GAN using the real data and the noise data, the generator network attempting to generate samples that approximate the real data, and the discriminator network attempting to distinguish the real data from the data generated by the generator;
generating adversarial samples using the trained generator network;
testing the traditional machine learning model with the generated adversarial samples, inputting the adversarial samples into the traditional model and observing the output results of the model;
according to the test results of the traditional model, selectively improving the adversarial defense method:
adversarial training: mixing the generated adversarial samples with the original training data to retrain the traditional model.
7. The method for data security management of a digital platform according to claim 1, wherein the step of local training using the federated learning method specifically comprises:
respectively collecting a plurality of data sources which need to participate in federated learning;
randomly initializing the parameters of the federated learning model before federated learning starts;
each federated learning iteration specifically comprises:
each data source locally training a model using its local data;
after local training is finished, each data source uploading the model parameters obtained by local training to a central server;
the central server aggregating the collected model parameters and sending the aggregated parameters back to each data source to update the respective local model parameters;
repeating the federated learning iteration until the model converges;
wherein the parameter aggregation formula in the federated learning process is:
ω_avg = (1/N) * Σ_{i=1}^{N} ω_i
where ω_avg is the average of the parameters, N is the number of data sources, and ω_i is the local model parameter of the ith data source.
8. The method for data security management of a digital platform according to claim 1, wherein the security reinforcement learning step specifically comprises:
in security reinforcement learning, the system interacts with the environment, and the specific steps of autonomously learning and adjusting the defense strategy are as follows:
modeling a system environment, including abstracting the system operating environment, a network structure and an attacker behavior into a mathematical model;
defining a reward function for evaluating the performance of the system in different states;
adopting a Q-learning algorithm to connect the built reinforcement learning model with a defending part of the system, so that the system can interact with the environment;
learning and optimizing according to the reinforcement learning algorithm, and continuously interacting with the environment and learning;
the updating rule formula of the Q-learning algorithm in reinforcement learning is:
Q(s,a) = Q(s,a) + α*(r + γ*max_{a'} Q(s',a') - Q(s,a))
where Q(s,a) represents the expected return for performing action a in state s, α is the learning rate, r is the reward obtained after performing action a in state s, γ is the discount factor, s' is the new state after performing action a, and a' is the action with the highest Q value in the new state s'.
9. A terminal device, characterized in that the device comprises: a memory, a processor, and a control program for a data security management method of a digital platform stored on the memory and executable on the processor, the control program for the data security management method of the digital platform implementing the data security management method of the digital platform according to any one of claims 1 to 8 when executed by the processor.
10. A storage medium, characterized in that the medium is applied to a computer, the storage medium storing thereon a control program of a data security management method of a digital platform, the control program of the data security management method of the digital platform implementing the data security management method of the digital platform according to any one of claims 1 to 8 when executed by a processor.
CN202311004874.XA 2023-08-09 2023-08-09 Data security management method for digital platform Pending CN117056951A (en)

Publications (1)

Publication Number Publication Date
CN117056951A true CN117056951A (en) 2023-11-14
