CN116894113A

CN116894113A - Data security classification method and data security management system based on deep learning

Info

Publication number: CN116894113A
Application number: CN202310875777.1A
Authority: CN
Inventors: 吉欣晨; 曹孙佳; 程威威; 谢俊杰; 仇成群
Original assignee: Yancheng Teachers University
Current assignee: Yancheng Teachers University
Priority date: 2023-07-17
Filing date: 2023-07-17
Publication date: 2023-10-17

Abstract

The invention discloses a data security classification method and a data security management system based on deep learning. The method comprises the following steps: S1, acquiring the corresponding service data from a service information system through a big data configuration center; S2, preprocessing the service data and storing the processed service data; S3, constructing a novel convolutional neural network HIDCNN combination model; S4, training the HIDCNN combination model to obtain a trained HIDCNN combination model; S5, classifying the data with the trained HIDCNN combination model. The invention converts the acquired raw data into the required target information; after acquisition is completed, the data is cleaned and converted, which improves the safety of the data and avoids data loss.

Description

Data security classification method and data security management system based on deep learning

Technical Field

The invention relates to the technical field of data security classification, in particular to a data security classification method and a data security management system based on deep learning.

Background

Deep learning is a branch of machine learning that learns the features and patterns of data by simulating the structure and function of the neural networks of the human brain, with the aim of achieving artificial intelligence. The learned features are then used for tasks such as classification, regression, and clustering. By constructing deep neural networks that learn the internal regularities and levels of representation of sample data, deep learning enables machines to imitate human activities such as seeing, hearing, and thinking, which has solved many complex pattern recognition problems and greatly advanced artificial intelligence technology.

Machine learning parses data, learns from it with a corresponding algorithmic model, and then makes decisions and predictions about real-world events. Unlike conventional hard-coded software written for a specific task, a machine learning system is "trained" on a large amount of data, from which various algorithms learn how to accomplish the task. As a recent branch of machine learning, deep learning also uses supervised and unsupervised methods to train deep neural networks. The field has developed rapidly in recent years, and specialized architectures (such as convolutional networks, residual networks, and adversarial networks) have been proposed in succession, so that more and more researchers use deep neural networks for feature representation learning in specific domains. A deep neural network contains multiple hidden layers; after multi-layer processing gradually converts the initial low-level feature representation into a high-level one, learning tasks such as complex classification can be completed with a simple model.

In the field of data management, a data manager often needs to classify data according to application scenario and data content. Manual classification is labor-intensive and inefficient, and it cannot be applied to real-time classification of massive data with a large number of categories. In addition, the safety of the data is not guaranteed during classification, so data information is easily leaked. A method that improves data security classification is therefore needed.

For the problems in the related art, no effective solution has been proposed at present.

Disclosure of Invention

Aiming at the problems in the related art, the invention provides a data security classification method based on deep learning, so as to overcome the technical problems existing in the prior related art.

For this purpose, the invention adopts the following specific technical scheme:

A data security classification method based on deep learning, the method comprising the following steps:

S1, acquiring the corresponding service data from a service information system through a big data configuration center;

S2, preprocessing the service data and storing the processed service data;

S3, constructing a novel convolutional neural network HIDCNN combination model;

S4, training the novel convolutional neural network HIDCNN combination model to obtain a trained novel convolutional neural network HIDCNN combination model;

S5, classifying the data according to the trained novel convolutional neural network HIDCNN combination model.

Further, the preprocessing the service data and storing the processed service data includes the following steps:

S21, integrating the service data to form a unified data set;

S22, cleaning the data in the data set and converting data in different formats into a unified format;

S23, clustering the data in the data set;

S24, verifying whether the processed data is accurate, and storing the processed data at the corresponding location to obtain the required target data;

S25, backing up the obtained target data.

Further, the clustering processing of the data in the data set includes the following steps:

S231, computing the similarity between every pair of data points in the original data to obtain a similarity matrix A;

S232, calculating a matrix D whose diagonal elements are the sums of the corresponding columns of the similarity matrix A, letting the matrix B=D-A, solving for the first K eigenvalues and eigenvectors of the matrix B, and projecting the data points into a K-dimensional space;

S233, clustering the data in the K-dimensional space according to the K-dimensional space coordinates of each data point.

Further, the clustering of the data in the K-dimensional space according to the K-dimensional space coordinates of each data point includes the following steps:

S2331, randomly selecting a plurality of center positions, and assigning each data point to its nearest center;

S2332, the data points are thereby divided into a group of clusters; the center point of each cluster is found, and each center is moved to the mean position of the data points inside its cluster by minimizing an objective function.
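
Steps S2331 and S2332 describe what is usually called k-means (Lloyd's algorithm). The following is a minimal sketch of that reading, not the patent's own implementation; the function name `kmeans`, the fixed random seed, and the use of squared Euclidean distance are assumptions made for illustration.

```python
import random

def kmeans(points, k, iterations=100, seed=0):
    """Cluster points (lists of floats) into k groups with Lloyd's algorithm."""
    rng = random.Random(seed)
    # S2331: pick k random data points as the initial centers.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assign each point to its nearest center (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # S2332: move each center to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

Moving each center to the cluster mean is the step that minimizes the within-cluster sum of squared distances for fixed assignments, which is one plausible reading of the "minimization function" mentioned in S2332.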

Further, the step of backing up the obtained target data includes the following steps:

S251, copying the target data into a backup directory, and placing the tablespace to be backed up into backup mode;

S252, copying the tablespace, and then taking the tablespace out of backup mode (end backup);

S253, executing S251 and S252 for each tablespace in the database;

S254, obtaining the current data sequence number by executing a command in svrmgrl, and executing a command that forces a data switch, so that all data can be archived.
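
The reference to svrmgrl suggests an Oracle-style online (hot) backup. Under that assumption, the backup steps can be sketched as an ordered command plan. The helper name `hot_backup_plan` is hypothetical, the file copy is left as a placeholder comment, and reading the "sequence number" and "forced switch" as the redo log sequence number and a log switch is an interpretation rather than something the text states. The function only builds the list of statements; it does not connect to a database.

```python
def hot_backup_plan(tablespaces, backup_dir):
    """Return the ordered list of steps for an online (hot) backup."""
    steps = []
    for name in tablespaces:  # S253: repeat S251 and S252 for every tablespace
        # S251: put the tablespace into backup mode before copying its files.
        steps.append(f"ALTER TABLESPACE {name} BEGIN BACKUP;")
        # S252: copy the data files (placeholder for an OS-level copy) ...
        steps.append(f"-- copy data files of {name} to {backup_dir}")
        # ... and then take the tablespace out of backup mode.
        steps.append(f"ALTER TABLESPACE {name} END BACKUP;")
    # S254 (assumed reading): show the current log sequence number,
    # then force a log switch so the changes made during the backup are archived.
    steps.append("ARCHIVE LOG LIST")
    steps.append("ALTER SYSTEM SWITCH LOGFILE;")
    return steps

for step in hot_backup_plan(["USERS", "TOOLS"], "/backup"):
    print(step)
```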

Further, the construction of the novel convolution neural network HIDCNN combination model comprises the following steps:

S31, dividing the model framework into a hybrid feature input layer and a model main framework layer;

S32, the hybrid feature input layer uses classification target vectors together with randomly initialized space vectors, and converts the data into continuous space vectors that serve as the input vectors of the model;

S33, the model main framework layer selects a text classification model as the model type and introduces an iterated dilated convolutional neural network (IDCNN);

S34, combining the Highway network with the IDCNN to construct the novel convolutional neural network HIDCNN combination model;

S35, stacking dilated convolutional neural network (DCNN) blocks iteratively to form the iterated dilated convolutional neural network;

S36, using the Highway network as the connecting layer between the dilated convolutional neural network and the Softmax classification layer to form an HIDCNN-based hierarchical classification model, while optimizing the features extracted by the convolutional layers;

S37, connecting the HIDCNN hierarchical classification model to a Dropout layer and the Softmax classification layer to form the complete classification model.
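
To make the stacking in S35 concrete, the following sketch implements a one-dimensional dilated convolution (dilated or atrous convolution is sometimes translated as "cavity" or "hole" convolution) and applies the same block repeatedly. It is an illustration under assumptions: the kernel width, the dilation schedule (1, 2, 4), the ReLU activation, and the function names are not given in the source.

```python
def dilated_conv1d(x, kernel, dilation):
    """1-D dilated convolution with zero padding; output has the same length as x."""
    half = len(kernel) // 2
    out = []
    for t in range(len(x)):
        total = 0.0
        for j, w in enumerate(kernel):
            idx = t + (j - half) * dilation  # taps are spaced `dilation` steps apart
            if 0 <= idx < len(x):
                total += w * x[idx]
        out.append(total)
    return out

def dilated_block(x, kernel, dilations=(1, 2, 4)):
    """One block: stacked dilated convolutions, each followed by ReLU."""
    for d in dilations:
        x = [max(0.0, v) for v in dilated_conv1d(x, kernel, d)]
    return x

def iterated_dilated_cnn(x, kernel, dilations=(1, 2, 4), iterations=2):
    """Apply the same block (shared kernel) repeatedly, in the spirit of S35."""
    for _ in range(iterations):
        x = dilated_block(x, kernel, dilations)
    return x
```

With a width-3 kernel and dilations 1, 2, and 4, one block already sees 1 + 2 × (1 + 2 + 4) = 15 input positions, which is why dilation is attractive for covering long contexts with few layers.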

Further, the calculation formula of the HIDCNN is as follows:

$$Y_i = H(h_{i-1}, W_H) \cdot T(h_{i-1}, W_T) + h_{i-1} \cdot C(h_{i-1}, W_C)$$

where $h_i$ is the output vector of the i-th Highway layer;

H is a nonlinear affine transformation function;

T is the transform gate;

C is the carry gate;

T and C are hyperbolic tangent functions, and C = 1 - T.
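
The formula above can be sketched for a single Highway layer as follows. This is a minimal illustration, not the patent's implementation: the helper names, the bias vectors `b_H` and `b_T`, and the use of tanh inside H are assumptions. The original Highway network formulation uses a sigmoid for the gate T so that it stays between 0 and 1; since the text above describes T and C as hyperbolic tangent functions, the `gate` argument is left configurable.

```python
import math

def affine(W, b, x):
    """Compute W x + b for a list-of-lists matrix W and vectors b, x."""
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def highway_layer(h_prev, W_H, b_H, W_T, b_T, gate=sigmoid):
    """Y = H(h) * T(h) + h * C(h), with the carry gate C = 1 - T."""
    H = [math.tanh(z) for z in affine(W_H, b_H, h_prev)]  # nonlinear transform H
    T = [gate(z) for z in affine(W_T, b_T, h_prev)]       # transform gate T
    return [hv * t + x * (1.0 - t) for hv, t, x in zip(H, T, h_prev)]
```

When T is close to 0 the layer passes its input through unchanged, and when T is close to 1 it outputs the transformed features; this is what lets the layer select between the raw and transformed convolutional features.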

Further, the training of the novel convolutional neural network HIDCNN combination model to obtain the trained novel convolutional neural network HIDCNN combination model comprises the following steps:

S41, defining the targets of data security hierarchical classification;

S42, training the novel convolutional neural network HIDCNN combination model using the targets;

S43, evaluating and adjusting the trained model.

Further, the defining the data security hierarchical classification target includes the following steps:

S411, taking m samples from the labeled data set to form the hierarchical classification target samples, and denoting the set formed by the remaining data samples as N;

S412, solving the optimization problem on the target samples by a quadratic programming (QP) method to obtain the support vectors, which form a group of classification targets;

S413, testing the samples in the set N with the classification targets; ending if the set N is empty, and continuing otherwise;

S414, moving the samples in the set N that do not satisfy the optimization conditions into the target samples, and at the same time moving the same number of samples from the target samples into the set N;

S415, repeating step S412 to define a plurality of groups of classification targets.
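
Steps S411 to S415 resemble the working-set ("chunking") scheme used to train support vector machines. The source does not state which optimization problem the QP method solves; assuming it is the standard soft-margin SVM dual on the m target samples, the problem and the optimality conditions used to test the samples in N would take the following form. Here $K$ is a kernel function and $C$ is the penalty parameter (unrelated to the carry gate C of the Highway layer); both are assumptions for illustration.

```latex
\max_{\alpha}\ \sum_{i=1}^{m}\alpha_i
  - \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j\, y_i y_j\, K(x_i, x_j)
\quad\text{s.t.}\quad 0 \le \alpha_i \le C,\qquad \sum_{i=1}^{m}\alpha_i y_i = 0

f(x) = \sum_{j=1}^{m}\alpha_j y_j K(x_j, x) + b

\alpha_i = 0 \Rightarrow y_i f(x_i) \ge 1,\qquad
0 < \alpha_i < C \Rightarrow y_i f(x_i) = 1,\qquad
\alpha_i = C \Rightarrow y_i f(x_i) \le 1
```

Under this reading, a sample in N (which implicitly has $\alpha_i = 0$) fails the optimization conditions when $y_i f(x_i) < 1$, and the samples with $\alpha_i > 0$ are the support vectors.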

Further, the training of the novel convolution neural network HIDCNN combination model by using the target comprises the following steps:

S421, using the data set composed of the classification targets as training data to obtain an initial training model;

S422, evaluating the initial training model to obtain the abnormal data set generated during the evaluation;

S423, grouping the obtained abnormal data set to obtain a plurality of abnormal data groups;

S424, determining the model training information according to the obtained abnormal data groups;

S425, continuously adjusting the parameters of the detection model according to the training results until the training accuracy and the loss of the detection model are optimal, at which point the training of the detection model is complete.
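
The source does not define what makes a data item "abnormal" in S422 or how the groups in S423 are formed. One plausible reading, sketched below, treats misclassified evaluation samples as the abnormal data and groups them by the pair (true label, predicted label); the function name and this grouping key are assumptions.

```python
from collections import defaultdict

def group_abnormal_samples(samples, true_labels, predicted_labels):
    """Group misclassified samples by (true label, predicted label)."""
    groups = defaultdict(list)
    for sample, truth, pred in zip(samples, true_labels, predicted_labels):
        if truth != pred:  # "abnormal" is taken to mean "misclassified" here
            groups[(truth, pred)].append(sample)
    return dict(groups)
```

Grouping by confusion pair shows which security levels the model mixes up most often, which is the kind of information S424 could use to decide what to retrain on.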

The beneficial effects of the invention are as follows:

1. The invention combines the original business information system data source as the initial data with the deep learning method, dynamically adjusts and classifies the data according to the need, realizes the automatic classification of the business information system data, can perform classification marking on the data of different business information systems in real time, and improves the working efficiency of a data manager.

2. The invention adjusts the discriminant standard of different levels of data by using the novel convolution neural network HIDCNN combination model, can output different service classification data marked according to different data security requirements of different service information systems, can make the data easier to understand and analyze by clustering the data, reduces the complexity of data processing, and can improve the accuracy of the classification model.

3. The invention converts the collected original data into the required target information, and after the collection is completed, the data is cleaned and converted, so that valuable information can be conveniently extracted, the data is more accurate, complete and consistent, the data can be better utilized and analyzed, the quality and the efficiency of data analysis are improved, the processed data is backed up, the safety of the data is improved, and the data loss is avoided.

4. According to the invention, a novel convolution neural network HIDCNN combined model is used, after model construction is completed, the model is trained by using data, data are verified to adjust model parameters and prevent over fitting, meanwhile, test data are used to evaluate the performance of the model, the model is continuously monitored according to the requirement, the accuracy of classification and the performance of the model are ensured, and repair and update can be carried out when required.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flowchart of a data security classification method based on deep learning according to an embodiment of the invention.

Detailed Description

To further illustrate the embodiments, the invention provides accompanying drawings. The drawings form part of the disclosure of the invention; they mainly illustrate the embodiments and, together with the description, explain their operating principles. With reference to them, a person of ordinary skill in the art will understand other possible embodiments and the advantages of the invention.

According to an embodiment of the invention, a data security classification method based on deep learning is provided.

The invention will be further described with reference to the accompanying drawings and detailed description, as shown in fig. 1, a data security classification method based on deep learning according to an embodiment of the invention, the method includes the following steps:

S1, acquiring the corresponding service data from a service information system through a big data configuration center.

S2, preprocessing the service data and storing the processed service data.

Specifically, the preprocessing the service data and storing the processed service data includes the following steps:

S21, integrating the service data to form a unified data set;

Data integration refers to merging the data of multiple data sets into one data set, usually in order to combine data from different sources or in different formats for more comprehensive analysis or processing.
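
As a minimal sketch of the integration described in S21, the following function merges records from several sources into one list with unified field names. The source names, the field names, and the `field_map` mechanism are hypothetical; the text does not specify how fields from different systems are reconciled.

```python
def integrate(sources, field_map):
    """Merge records from several sources into one list with unified field names."""
    unified = []
    for source_name, records in sources.items():
        for record in records:
            # Rename source-specific fields to the unified name where a mapping exists.
            row = {field_map.get(key, key): value for key, value in record.items()}
            row["source"] = source_name  # keep provenance for later verification
            unified.append(row)
    return unified

sources = {
    "crm": [{"cust_id": 1, "name": "A"}],
    "billing": [{"customer_id": 2, "name": "B"}],
}
print(integrate(sources, {"cust_id": "customer_id"}))
```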

S22, data cleaning is carried out on the data in the data set, and the data in different formats are subjected to unified conversion.

In particular, data conversion refers to the process of converting one data format or type to another data format or type. In computer science, data conversion typically involves converting data from one programming language, file format, database type, network protocol, etc. to another.

The data conversion may be unidirectional, i.e. converting data from one format to another, or bidirectional, i.e. converting data to each other between two formats. For example, converting a character string into numbers, converting a JSON object into XML format, and the like are all examples of data conversion.

Data conversion is typically accomplished with specialized tools or libraries. Common choices include the type conversion functions built into programming languages, third-party libraries such as Pandas and NumPy, and ETL tools such as Talend and Informatica.
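
The two conversions given as examples above (a character string to a number, and a JSON object to XML) can be sketched with the Python standard library alone. The function names and the `record` root tag are illustrative assumptions, and the JSON-to-XML helper only handles flat objects.

```python
import json
import xml.etree.ElementTree as ET

def json_to_xml(json_text, root_tag="record"):
    """Convert a flat JSON object into an XML element string."""
    data = json.loads(json_text)
    root = ET.Element(root_tag)
    for key, value in data.items():
        child = ET.SubElement(root, key)  # one child element per JSON key
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")

def to_number(text):
    """Convert a string to int when possible, otherwise to float."""
    try:
        return int(text)
    except ValueError:
        return float(text)

print(json_to_xml('{"id": 7, "level": "secret"}'))
print(to_number("42"), to_number("3.5"))
```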

S23, clustering the data in the data set;

Specifically, clustering is the process of grouping similar items together and assigning dissimilar items to different categories; it is a very important technique in data analysis.

The clustering processing of the data in the data set comprises the following steps:

S232, calculating a matrix D whose diagonal elements are the sums of the corresponding columns of the similarity matrix A, letting the matrix B=D-A, solving for the first K eigenvalues and eigenvectors of the matrix B, and projecting the data points into a K-dimensional space;

Specifically,

$$D(i,i) = \sum_j A(i,j)$$

where i and j index the data points.

Specifically, when clustering the data in the data set, the pairwise similarities between the N data points are given first, that is, an N×N similarity matrix A, where A(i,j) represents the similarity between points i and j; the larger the value, the more similar the points. Note that

$$A(i,j) = A(j,i), \qquad A(i,i) = 0.$$

The matrix D is then calculated so that its diagonal elements are the sums of the values of the corresponding column (or row) of the matrix A and all other elements are 0, that is,

$$D(i,i) = \sum_j A(i,j)$$

Let B = D - A and compute the first k eigenvalues and eigenvectors of the matrix B to project the data points into a k-dimensional space. The j-th value of the i-th eigenvector is the projection of the j-th data point onto the i-th dimension of the k-dimensional space; that is, if the k eigenvectors are combined into an N×k matrix, each row gives the coordinates of one data point in the k-dimensional space. The clustering algorithm then clusters the data in the k-dimensional space according to these coordinates.
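
The projection described above can be sketched with NumPy. The sketch assumes that the "first k" eigenvalues are the k smallest eigenvalues of B, which is the usual convention in spectral clustering but is not stated explicitly in the text; the function name `spectral_embedding` is also an assumption.

```python
import numpy as np

def spectral_embedding(A, k):
    """Project N data points into k dimensions using the matrix B = D - A."""
    A = np.asarray(A, dtype=float)
    D = np.diag(A.sum(axis=1))  # D(i, i) = sum_j A(i, j), all other entries 0
    B = D - A                   # unnormalized graph Laplacian
    # eigh is used because B is symmetric; it returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(B)
    # Row i of the returned N x k matrix holds the coordinates of data point i.
    return eigenvalues[:k], eigenvectors[:, :k]

# Two pairs of mutually similar points with no similarity across the pairs.
A = [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]]
values, coords = spectral_embedding(A, 2)
print(coords)
```

In this small example the points of each pair receive identical coordinates, so a clustering algorithm run on the rows of `coords` separates the two pairs directly.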

Specifically, the clustering of the data in the K-dimensional space according to the K-dimensional space coordinates of each data point includes the following steps:

By clustering the data, the data can be easier to understand and analyze, the complexity of data processing is reduced, and the accuracy of the hierarchical classification model can be improved.

S25, backing up the obtained target data.

The step of backing up the obtained target data comprises the following steps:

S253, executing S251 and S252 for each tablespace in the database;

The collected original data is converted into the required target information, and after the collection is completed, the data is cleaned and converted, so that valuable information can be conveniently extracted, the data is more accurate, complete and consistent, the data can be better utilized and analyzed, and the quality and efficiency of data analysis are improved. And the processed data is backed up, so that the safety of the data is improved, and the data loss is avoided.

S3, constructing a novel convolution neural network HIDCNN combination model.

Specifically, the construction of the novel convolution neural network HIDCNN combination model comprises the following steps:

The calculation formula of the HIDCNN is as follows:

$$Y_i = H(h_{i-1}, W_H) \cdot T(h_{i-1}, W_T) + h_{i-1} \cdot C(h_{i-1}, W_C)$$

where $h_i$ is the output vector of the i-th Highway layer;

H is a nonlinear affine transformation function;

T is the transform gate;

C is the carry gate;

T and C are hyperbolic tangent functions, and C = 1 - T.

And S4, training the novel convolution neural network HIDCNN combination model to obtain the trained novel convolution neural network HIDCNN combination model.

Specifically, the training of the novel convolutional neural network HIDCNN combination model to obtain the trained novel convolutional neural network HIDCNN combination model comprises the following steps:

S41, defining the targets of data security hierarchical classification.

Wherein the defining the targets of the data security hierarchical classification comprises the following steps:

In particular, the primary goal of data security hierarchical classification is to ensure confidentiality, integrity, and availability of data, and to simplify the data management process.

By dividing the data into different levels, the data can be managed separately and higher levels of security protection can be applied, which protects the confidentiality of the data. Hierarchical classification also helps ensure the integrity of important data and prevents it from being tampered with or damaged. Finally, classified data is easier to manage, which simplifies the data management process.

S42, training the novel convolution neural network HIDCNN combination model by using the target.

The training of the novel convolution neural network HIDCNN combination model by using the target comprises the following steps:

S43, evaluating and adjusting the trained model.

The performance of the model is evaluated with the test data, and the model is adjusted as required. Finally, the model is continuously monitored to ensure classification accuracy and model performance, and it is repaired and updated when necessary.

Specifically, the model is monitored in use together with the classification results. Model monitoring means periodically monitoring, analyzing, and evaluating the classification model to ensure that it performs well and correctly in the production environment. Monitoring allows problems with the model to be found and handled in time, which improves its performance and reliability and ensures its effectiveness in the production environment.

Monitoring may cover the following aspects:

Data quality monitoring: checking the quality of the input data, for example whether it contains missing values, outliers, or duplicate values.

Model performance monitoring: monitoring the performance of the model in the production environment, using indicators such as accuracy, precision, and recall.

Real-time prediction monitoring: monitoring the real-time predictions of the model to detect abnormal behavior or deviation.

Interpretability monitoring: monitoring the interpretability of the model to ensure the accuracy and reliability of its predictions.

Security monitoring: monitoring whether the model is subject to attack or abuse, such as adversarial attacks or data leakage.

Adaptive monitoring: monitoring the model adaptively and feeding the results back, so that the model is updated in time when new data distributions or concept drift occur.
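
The performance indicators named under model performance monitoring can be computed as follows. This is a minimal sketch: the function name is hypothetical, and precision and recall are computed for one chosen class (here called `positive`), since the source does not say how the indicators are aggregated across security levels.

```python
def classification_metrics(y_true, y_pred, positive):
    """Accuracy over all classes, plus precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}
```

Tracking these values on recent production data over time is one simple way to notice when the classifier has degraded and needs retraining.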

In summary, by means of the above technical scheme of the invention, the invention combines the original service information system data source as the initial data with the deep learning method, dynamically adjusts and classifies the data according to the need, realizes the automatic classification of the service information system data, can perform classification marking on the data of different service information systems in real time, and improves the work efficiency of the data manager. The invention adjusts the discriminant standard of different levels of data by using the novel convolution neural network HIDCNN combination model, can output different service classification data marked according to different data security requirements of different service information systems, can make the data easier to understand and analyze by clustering the data, reduces the complexity of data processing, and can improve the accuracy of the classification model. The invention converts the collected original data into the required target information, and after the collection is completed, the data is cleaned and converted, so that valuable information can be conveniently extracted, the data is more accurate, complete and consistent, the data can be better utilized and analyzed, the quality and the efficiency of data analysis are improved, the processed data is backed up, the safety of the data is improved, and the data loss is avoided. According to the invention, a novel convolution neural network HIDCNN combined model is used, after model construction is completed, the model is trained by using data, data are verified to adjust model parameters and prevent over fitting, meanwhile, test data are used to evaluate the performance of the model, the model is continuously monitored according to the requirement, the accuracy of classification and the performance of the model are ensured, and repair and update can be carried out when required.

The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims

1. A data security classification method based on deep learning, characterized by comprising the following steps:

S2, preprocessing the service data and storing the processed service data;

S3, constructing a novel convolutional neural network HIDCNN combination model;

2. The deep learning-based data security classification method as claimed in claim 1, wherein the preprocessing of the service data and the storage of the processed service data comprises the steps of:

S21, integrating the service data to form a unified data set;

S23, clustering the data in the data set;

S25, backing up the obtained target data.

3. The data security classification method based on deep learning as claimed in claim 2, wherein the clustering process of the data in the data set comprises the following steps:

4. A data security classification method based on deep learning according to claim 3, wherein said clustering data in K-dimensional space according to K-dimensional space coordinates of each data point comprises the steps of:

5. The deep learning-based data security classification method as claimed in claim 2, wherein the backing up the obtained target data comprises the steps of:

S253, executing S251 and S252 for each tablespace in the database;

6. The data security classification method based on deep learning as claimed in claim 1, wherein the construction of the novel convolutional neural network HIDCNN combination model comprises the following steps:

7. The data security classification method based on deep learning of claim 1, wherein training the novel convolutional neural network HIDCNN combination model to obtain the trained novel convolutional neural network HIDCNN combination model comprises the following steps:

S41, defining the targets of data security hierarchical classification;

S43, evaluating and adjusting the trained model.

8. The deep learning based data security classification method of claim 7, wherein the defining the data security classification targets comprises the steps of:

9. The deep learning-based data security classification method of claim 8, wherein the training of the novel convolutional neural network HIDCNN combination model using the target comprises the steps of: