CN114511019A - Sensitive data classification and grading identification method and system - Google Patents

Publication number
CN114511019A
Authority
CN
China
Prior art keywords
sensitive data
data classification
classification
model
classification model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210087453.7A
Other languages
Chinese (zh)
Inventor
管小娟
马媛媛
周诚
李伟伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
Global Energy Interconnection Research Institute
Original Assignee
State Grid Corp of China SGCC
Global Energy Interconnection Research Institute
Application filed by State Grid Corp of China SGCC, Global Energy Interconnection Research Institute
Priority to CN202210087453.7A
Publication of CN114511019A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Abstract

The invention provides a sensitive data classification and grading identification method and system. The method comprises: acquiring sensitive data characteristics in each service system of the internal and external networks of the power information; constructing a sensitive data classification model; inputting the sensitive data characteristics into the sensitive data classification model to obtain a sensitive data classification result; constructing a sensitive data grading model; inputting the sensitive data classification result, the sensitive data characteristics and a preset data security level into the sensitive data grading model to obtain a sensitive data grading result; and grading and identifying the sensitive data according to the sensitive data classification result, the sensitive data characteristics and the sensitive data grading result. An improved neural network algorithm classifies the power structured sensitive data and dynamically adjusts the classification model and its parameters, so that classification and grading of the power structured sensitive data can be completed in a short time. This avoids problems that may occur with traditional machine learning algorithms, such as failure of the trained network model, high requirements on training samples, and slow convergence during training.

Description

Sensitive data classification and grading identification method and system
Technical Field
The invention relates to the technical field of data security management, and in particular to a sensitive data classification and grading identification method and system.
Background
Identifying sensitive data is the most fundamental problem in data security management. Many identification methods exist, each with its own advantages and disadvantages. Current methods fall into three main categories. The first is manual identification, in which a data analyst judges subjectively whether sensitive data is present. Its efficiency is low: for large volumes of data, manual review takes far longer than machine identification and places high demands on the professional competence of the staff. Its judgment criteria are also inconsistent: because the process depends mainly on subjective human judgment, different people may apply different criteria to the same data, and even the same person may reach different results at different times, so identification results vary. The second is dictionary matching, which identifies sensitive data mainly by pattern matching. Its accuracy is low, and sensitive data is easily misidentified when the data dictionary is incomplete or incorrectly built. The third is intelligent learning, which identifies sensitive data through machine learning, deep semantic analysis, automatic keyword extraction, and similar techniques. However, current machine learning algorithms have problems such as high requirements on training samples and slow convergence during training.
Therefore, the field of data security management currently lacks a suitable method for identifying sensitive data.
Disclosure of Invention
Therefore, the technical problem to be solved by the present invention is to overcome the defect that there is no suitable method for identifying sensitive data in the prior art, and to provide a method and a system for classifying and grading sensitive data.
In a first aspect, an embodiment of the present invention provides a sensitive data classification and grading identification method, including:
acquiring sensitive data characteristics in each service system of the internal and external networks of the power information;
constructing a sensitive data classification model;
inputting the sensitive data characteristics into the sensitive data classification model to obtain a sensitive data classification result;
constructing a sensitive data grading model;
inputting the sensitive data classification result, the sensitive data characteristics and a preset data security level into the sensitive data grading model to obtain a sensitive data grading result;
grading and marking the sensitive data according to the sensitive data classification result, the sensitive data characteristics and the sensitive data grading result.
Optionally, the constructing a sensitive data classification model includes:
preprocessing the sensitive data;
equally dividing the preprocessed sensitive data into a plurality of subsets, cyclically selecting one subset as a first test set, and using the remaining subsets as a first training set;
setting a neural network model and parameters, and constructing an initial sensitive data classification model;
and training and testing the sensitive data classification model according to the first training set and the first testing set, executing incremental neural network learning, and continuously adjusting the sensitive data classification model and parameters until an output result meets a preset target.
Optionally, the setting a neural network model and parameters, and constructing an initial sensitive data classification model, includes:
setting the number of nodes of an input layer, the number of nodes of an output layer, the number of hidden layers and the number of nodes of each hidden layer of the neural network;
setting initial connection weights from the neural network input layer to the hidden layer and from the hidden layer to the output layer, and constructing an initial sensitive data classification model.
Optionally, the training and testing a sensitive data classification model according to the first training set and the first test set, performing incremental neural network learning, and continuously adjusting the sensitive data classification model and parameters until an output result meets a preset target includes:
inputting the first training set into the sensitive data classification model, and performing multiple training on the sensitive data classification model;
inputting the first test set into a sensitive data classification model after multiple times of training, and determining whether the output result has classification errors or whether a new class needs to be added according to the output result of the test set;
if the classification is wrong, executing a steepest descent method to adjust the weight of the sensitive data classification model until the classification result conforms to the expectation;
if a new class is added, an output node is added on the basis of the original sensitive data classification model, a newly added connection weight from the hidden layer to the output layer is initialized randomly, the number of the newly added nodes of the hidden layer is confirmed by adopting a mode of gradually increasing the nodes of the hidden layer, and the sensitive data classification model and parameters are adjusted according to an actual result and an expected result.
Optionally, the constructing a sensitive data grading model includes:
taking the sensitive data classification result, the sensitive data characteristics and the preset data security level as a sample set, equally dividing the sample set into a plurality of subsets, cyclically selecting one subset as a second test set, and using the remaining subsets as a second training set;
setting parameters of a neural network model, and constructing an initial sensitive data grading model;
and training and testing the sensitive data grading model according to the second training set and the second test set, and continuously adjusting the parameters of the sensitive data grading model until the output result meets a preset target.
Optionally, the training and testing the sensitive data grading model according to the second training set and the second test set, and continuously adjusting parameters of the sensitive data grading model until an output result meets a preset target includes:
inputting the second training set into the sensitive data grading model, and training the sensitive data grading model multiple times;
inputting the second test set into the sensitive data grading model after the multiple rounds of training, and determining from the output result of the test set whether grading errors exist;
if the grading is wrong, executing the steepest descent method to adjust the weights of the sensitive data grading model until the grading result meets expectations.
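The three inputs of the grading model named above (classification result, sensitive data characteristics, preset data security level) have to be combined into one input vector. The following is a minimal sketch of one possible encoding, not the method of the source: the one-hot class encoding, the scaling of the level to [0, 1], and the name `grading_input` are all assumptions made for illustration.

```python
def grading_input(class_index, n_classes, features, preset_level, n_levels):
    """Build one grading-model input vector (hypothetical encoding).

    class_index  -- index of the class predicted by the classification model
    features     -- numeric sensitive data characteristics of the field
    preset_level -- preset data security level, 0 .. n_levels - 1
    """
    # One-hot encode the classification result (assumed encoding).
    one_hot = [1.0 if k == class_index else 0.0 for k in range(n_classes)]
    # Scale the preset level to [0, 1] so it is comparable to other inputs.
    level_scaled = preset_level / (n_levels - 1) if n_levels > 1 else 0.0
    return one_hot + list(features) + [level_scaled]
```

With 3 classes, two features and 5 levels, the resulting vector has 3 + 2 + 1 = 6 entries, which would fix the number of input-layer nodes of the grading network under this encoding.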
In a second aspect, an embodiment of the present invention provides a sensitive data classification and grading identification system, including:
the acquisition module is used for acquiring sensitive data characteristics in each service system of the internal and external networks of the power information;
the first construction module is used for constructing a sensitive data classification model;
the classification module is used for inputting the sensitive data characteristics into the sensitive data classification model to obtain a sensitive data classification result;
the second construction module is used for constructing a sensitive data grading model;
the grading module is used for inputting the sensitive data classification result, the sensitive data characteristics and the preset data security level into the sensitive data grading model to obtain a sensitive data grading result;
and the identification module is used for grading and identifying the sensitive data according to the sensitive data classification result, the sensitive data characteristics and the sensitive data grading result.
In a third aspect, an embodiment of the present invention provides a computer-readable storage medium, where the computer-readable storage medium stores computer instructions, and the computer instructions are configured to cause a computer to execute the sensitive data classification and grading identification method according to the first aspect of the embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention provides a computer device, including a memory and a processor that are communicatively connected to each other, where the memory stores computer instructions, and the processor executes the computer instructions so as to perform the sensitive data classification and grading identification method according to the first aspect of the embodiment of the present invention.
The technical scheme of the invention has the following advantages:
The sensitive data classification and grading identification method provided by the invention comprises the following steps: acquiring sensitive data characteristics in each service system of the internal and external networks of the power information; constructing a sensitive data classification model; inputting the sensitive data characteristics into the sensitive data classification model to obtain a sensitive data classification result; constructing a sensitive data grading model; inputting the sensitive data classification result, the sensitive data characteristics and a preset data security level into the sensitive data grading model to obtain a sensitive data grading result; and grading and identifying the sensitive data according to the sensitive data classification result, the sensitive data characteristics and the sensitive data grading result. An improved neural network algorithm classifies the power structured sensitive data and dynamically adjusts the classification model and its parameters, so that classification and grading of the power structured sensitive data can be completed in a short time. The improved neural network algorithm avoids problems that may occur with traditional machine learning algorithms, such as failure of the trained network model, high requirements on training samples, and slow convergence during training. It also avoids the low efficiency of manual identification and the inaccurate location of sensitive data content, and improves the efficiency of maintaining the sensitive field library.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flowchart of a specific example of a sensitive data classification and grading identification method according to an embodiment of the present invention;
FIG. 2 is a flowchart of sensitive data classification according to an embodiment of the present invention;
FIG. 3 is a flowchart of sensitive data grading according to an embodiment of the present invention;
FIG. 4 is a schematic block diagram of a specific example of a sensitive data classification and grading identification system according to an embodiment of the present invention;
FIG. 5 is a block diagram of a specific example of a computer device according to an embodiment of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, e.g., as meaning either a fixed connection, a removable connection, or an integral connection; can be mechanically or electrically connected; the two elements may be directly connected or indirectly connected through an intermediate medium, or may be connected through the inside of the two elements, or may be connected wirelessly or through a wire. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
In addition, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The embodiment of the invention provides a sensitive data classification and grading identification method, as shown in fig. 1, comprising the following steps:
Step S1: acquiring sensitive data characteristics in each service system of the internal and external networks of the power information.
In a specific embodiment, public sensitive data and industry sensitive data are first surveyed and sorted out. Each database related to each service system of the information internal and external networks is then scanned, its metadata is collected, and all fields of the non-system tables are sampled and analyzed to form possible sensitive data characteristics.
In particular, sensitive data is generally divided into public sensitive data (from a legal perspective), industry sensitive data (from an industry regulation perspective), and enterprise sensitive data (from an internal regulation perspective). Public and industry sensitive data are generally defined in regulatory documents, whereas sorting out enterprise sensitive data depends on people's understanding of the business systems. Sensitive data that a service system may generate is identified by surveying the service systems accessed across the information internal and external networks and the related data storage points, and by working through the business processes together with the business departments in interviews and working groups. The business systems accessed across the information internal and external networks mainly include retirement management, business travel, the e-commerce platform, Yingda Trust, the safety supervision and control system, the electric power smart cloud, electrical equipment, the staff library, the electricity market trading system, the 95598 system, electricity consumption information acquisition, the ERP system, the safety supervision system, the infrastructure management and control system, the vehicle management system, and the like. The data of these business systems can be divided into personnel, finance and materials; planning; construction; operation; maintenance; marketing; and so on. The sensitive and private information mainly comprises electric power marketing data, electric power scheduling data, customer archive information, customer authentication information, customer electricity consumption information, and the like. The customer information comprises sensitive data such as customer profile information, customer authentication information, and customer electricity consumption information.
The electric power marketing system involves the private information of many users, such as name, identification number, phone number, and email address.
Further, each database related to each service system of the information internal and external networks is inventoried, and metadata is collected, including table names, field names, types, and comments. For business data on the internal and external networks whose content is relatively fixed, the service system databases are scanned to sample all fields of the non-system tables. The sampled data of each field is preprocessed and checked against regular expressions to form possible sensitive data characteristics, and the columns and databases that hold sensitive data are identified. The general structure and length of such data do not change, and most of it consists of numeric, fixed-length strings, for example identification columns such as unit code, account number, account name, and electricity service address.
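The regular-expression check on sampled field values can be sketched as follows. This is a minimal sketch under stated assumptions: the source does not list the actual rules, so the patterns in `PATTERNS`, the feature names, and the function `field_features` are hypothetical and chosen only to illustrate how length, numeric content, and pattern hits might be turned into field-level characteristics.

```python
import re

# Hypothetical patterns; the source does not specify the actual rules.
PATTERNS = {
    "id_number": re.compile(r"\d{17}[\dXx]"),         # 18-character ID number
    "mobile_phone": re.compile(r"1\d{10}"),           # 11-digit mobile number
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
}

def field_features(samples):
    """Summarize the sampled values of one field as candidate characteristics."""
    # Preprocess: drop NULLs and blank values, strip surrounding whitespace.
    values = [str(v).strip() for v in samples if v is not None and str(v).strip()]
    if not values:
        return {"count": 0}
    lengths = [len(v) for v in values]
    features = {
        "count": len(values),
        "fixed_length": min(lengths) == max(lengths),
        "mean_length": sum(lengths) / len(lengths),
        "numeric_ratio": sum(v.isdigit() for v in values) / len(values),
    }
    # Fraction of sampled values that fully match each pattern.
    for name, pattern in PATTERNS.items():
        hits = sum(1 for v in values if pattern.fullmatch(v))
        features[f"{name}_ratio"] = hits / len(values)
    return features
```

A field whose samples are almost all fixed-length digit strings matching one pattern would then be a strong candidate for a sensitive identification column.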
In the embodiment of the present invention, because the data tables hold data in various formats, such as numeric and character types, the data must be converted to a common form. For example, identification numbers and names stored in the database as character data can be converted into numeric form.
Step S2: constructing a sensitive data classification model.
In a specific embodiment, the known sensitive data characteristics are classified with an improved neural network incremental learning algorithm to form a preliminary classification scheme, and a power structured sensitive data classification model is constructed. The data can be roughly divided into 6 categories: name, address, contact, certificate, asset, and financial. The specific data items and sensitive data characteristics in each category are thereby obtained.
In the embodiment of the invention, the sensitive data classification model construction comprises the following steps:
Step S21: preprocessing the sensitive data.
Step S22: equally dividing the preprocessed sensitive data into a plurality of subsets, cyclically selecting one subset as a first test set, and using the remaining subsets as a first training set.
Step S23: setting a neural network model and parameters, and constructing an initial sensitive data classification model.
Step S24: training and testing the sensitive data classification model according to the first training set and the first test set, executing incremental neural network learning, and continuously adjusting the sensitive data classification model and parameters until the output result meets a preset target.
Specifically, step S23 comprises the following steps:
Step S231: setting the number of input-layer nodes, the number of output-layer nodes, the number of hidden layers, and the number of nodes in each hidden layer of the neural network.
Step S232: setting the initial connection weights from the input layer to the hidden layer and from the hidden layer to the output layer of the neural network, and constructing an initial sensitive data classification model.
Specifically, step S24 comprises the following steps:
Step S241: inputting the first training set into the sensitive data classification model, and training the sensitive data classification model multiple times.
Step S242: inputting the first test set into the sensitive data classification model after the multiple rounds of training, and determining from the output result of the test set whether classification errors exist or whether a new class needs to be added.
Step S243: if the classification is wrong, executing the steepest descent method to adjust the weights of the sensitive data classification model until the classification result meets expectations.
Step S244: if a new class is added, adding an output node to the original sensitive data classification model, randomly initializing the newly added connection weights from the hidden layer to the output layer, determining the number of newly added hidden-layer nodes by gradually increasing the hidden-layer nodes, and adjusting the sensitive data classification model and parameters according to the actual result and the expected result.
Specifically, the power structured sensitive data classification model is constructed with the improved neural network incremental learning algorithm through the following steps:
1. Standardize and normalize the training data.
The data is standardized with the z-score, so that each processed feature has a mean of 0 and a standard deviation of 1 over all samples in the data set. The formula is:
$$x^{*} = \frac{x - \mu}{\delta}$$
where x is the actual value, μ is the mean of the overall data, and δ is the standard deviation of the overall data.
Standardizing the input data makes the distributions of the individual features similar, which tends to make it easier to train an effective model.
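The z-score step can be illustrated with a short sketch that uses only the Python standard library. The function name `z_score` and the handling of a constant column (returning zeros to avoid division by zero) are assumptions; the source does not say how that case is treated.

```python
import statistics

def z_score(values):
    """Standardize one feature column to mean 0 and standard deviation 1."""
    mu = statistics.fmean(values)
    delta = statistics.pstdev(values)  # population standard deviation
    if delta == 0:
        # Constant column: every value equals the mean (assumed handling).
        return [0.0 for _ in values]
    return [(x - mu) / delta for x in values]
```

Applied column by column, this gives every feature the same scale before it reaches the input layer.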
In a shallow model, as training progresses and the parameters of each layer are updated, the outputs near the output layer rarely change drastically. In a deep network, however, even if the input data is standardized, parameter updates during training can still easily cause drastic changes in the outputs near the output layer. This numerical instability often makes it difficult to train an effective deep model, so normalization is introduced.
A common normalization method maps the data into [0, 1] by applying a linear transformation to the original data. The transformation function is:
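$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x_{\min}$ and $x_{\max}$ are the minimum and maximum of the original data.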
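A linear mapping of a feature into [0, 1] is commonly implemented as min-max scaling. The sketch below assumes that form; the function name `min_max` and the treatment of a constant column (mapped to 0.0) are illustrative assumptions.

```python
def min_max(values):
    """Linearly map one feature column into the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: no spread to scale (assumed handling).
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]
```

The smallest value of the column maps to 0 and the largest to 1, with all other values placed proportionally in between.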
2. Partition the training data: the data set is divided equally into N subsets. In each training round, N-1 subsets serve as the training set and the remaining subset serves as the test set; the test set is selected cyclically.
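Dividing the data into N subsets and rotating the held-out subset corresponds to N-fold cross-validation. A minimal sketch follows; the shuffling with a fixed seed and the name `n_fold_splits` are assumptions, since the source does not say whether the data is shuffled before partitioning.

```python
import random

def n_fold_splits(samples, n, seed=0):
    """Yield (training_set, test_set) pairs, rotating the held-out subset."""
    if not 2 <= n <= len(samples):
        raise ValueError("n must be between 2 and the number of samples")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)       # assumed: shuffle before splitting
    folds = [shuffled[i::n] for i in range(n)]  # subset sizes differ by at most one
    for k in range(n):
        test_set = folds[k]
        training_set = [s for j, fold in enumerate(folds) if j != k for s in fold]
        yield training_set, test_set
```

Each sample appears in exactly one test set, so every subset serves as the test set once over the N rounds.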
3. Define the initial weights: the connection weights from the input layer to the hidden layer and from the hidden layer to the output layer of the neural network are initialized with a mean of 0 and a standard deviation of
Figure BDA0003487558130000121
4. Design the neural network model: define the number of input-layer nodes, the number of output-layer nodes, the number of hidden layers, and the number of nodes in each hidden layer. The numbers of input-layer and output-layer nodes can be determined from the attribute dimension of the training samples and the target output.
(1) The number of hidden-layer nodes is determined by the following formula:
$$N_h = \frac{N}{\alpha \, (N_i + N_o)}$$
where $N_i$ is the number of input-layer nodes, $N_o$ is the number of output-layer nodes, $N$ is the number of samples in the training set, and $\alpha$ is a freely chosen scaling factor, usually in the range 2 to 10. Letting $\alpha$ range from 2 to 10 gives the range of the number of hidden nodes $N_h$. The network is then trained on the same samples with each candidate number of hidden nodes in turn, from low to high, and the number that gives the minimum error is selected.
(2) After the number of hidden-layer nodes is determined, the network is trained on the same samples with 1 to 3 hidden layers, and the number of hidden layers that gives the minimum error is selected.
At this point, the initial neural network model is determined.
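The search over hidden-node counts can be sketched as below. This assumes the common rule of thumb N_h = N / (α (N_i + N_o)) with α from 2 to 10, rounds down to an integer, and takes the error measurement as a caller-supplied function; the names `hidden_node_candidates`, `select_hidden_nodes`, and `evaluate_error` are hypothetical.

```python
def hidden_node_candidates(n_samples, n_inputs, n_outputs, alphas=range(2, 11)):
    """Candidate hidden-node counts for alpha = 2..10, sorted from low to high."""
    candidates = set()
    for alpha in alphas:
        # Integer division rounds down (assumed); keep at least one node.
        n_hidden = n_samples // (alpha * (n_inputs + n_outputs))
        candidates.add(max(1, n_hidden))
    return sorted(candidates)

def select_hidden_nodes(candidates, evaluate_error):
    """Return the candidate with the smallest error on the same samples.

    evaluate_error is a hypothetical callable that trains a network with the
    given number of hidden nodes and returns its error.
    """
    return min(candidates, key=evaluate_error)
```

For example, with 1000 training samples, 10 input nodes and 6 output nodes, the candidates range from 6 to 31 hidden nodes. The same selection loop could be repeated over 1 to 3 hidden layers.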
5. Train and test the neural network multiple times according to the partitioned training data, execute incremental neural network learning, and continuously adjust the power structured sensitive data classification model.
Incremental learning neural network algorithm:
(1) The sample data is divided into N subsets, and the power structured sensitive data classification model is trained N times.
(2) Based on the output result of the test set, it is determined whether classification errors exist or whether a new class needs to be added.
First, if the classification is wrong, the steepest descent method is executed to adjust the weights of the power structured sensitive data classification model until the classification result meets expectations.
Second, if a new class needs to be added, an output node is added to the original power structured sensitive data classification model, the newly added connection weights from the hidden layer to the output layer are randomly initialized, the number of newly added hidden-layer nodes is determined by gradually increasing the hidden-layer nodes, and the power structured sensitive data classification model and related parameters are adjusted according to the actual result and the expected result.
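The two adjustments described above can be sketched on a plain weight matrix. This is a minimal sketch, not the source's implementation: the list-of-rows weight layout, the Gaussian initialization with `scale=0.1`, the learning rate, and the function names are all assumptions, and the gradients are taken as given rather than computed by backpropagation.

```python
import random

def steepest_descent_step(weights, gradients, learning_rate=0.1):
    """Move every weight one step against its error gradient."""
    return [[w - learning_rate * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(weights, gradients)]

def add_output_node(w_hidden_output, rng=None, scale=0.1):
    """Return a copy of the hidden-to-output weights with one extra output row.

    w_hidden_output[k][j] is the weight from hidden node j to output node k.
    Existing rows are preserved, so the classes already learned are kept;
    only the row for the new class is randomly initialized.
    """
    rng = rng or random.Random()
    n_hidden = len(w_hidden_output[0])
    new_row = [rng.gauss(0.0, scale) for _ in range(n_hidden)]  # assumed init
    return [row[:] for row in w_hidden_output] + [new_row]
```

Keeping the existing rows untouched is what makes the update incremental: the network does not have to be retrained from scratch when a new class appears.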
With the improved BP neural network incremental learning algorithm, every parameter is adjusted each time training samples enter learning; when a new training sample is added, the sensitive data classification model can be adjusted dynamically in real time. This avoids problems that can arise with traditional machine learning algorithms, such as failure of the trained network model, demanding requirements on training samples and slow convergence during training, and makes the classification result more accurate.
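As a minimal sketch of the new-class branch described above, the following Python fragment appends one output node to a hidden-to-output weight matrix while leaving the already trained weights untouched. The function name `add_output_node`, the nested-list weight layout and the uniform initialization range `scale` are assumptions made for illustration; the embodiment states only that the new connection weights are initialized randomly.

```python
import random


def add_output_node(w_hidden_to_output, rng=None, scale=0.1):
    """Return a new weight matrix with one extra output node (new class).

    w_hidden_to_output[j][k] is the weight from hidden node j to output node k.
    Existing weights are copied unchanged; only the new column is random.
    ASSUMPTION: uniform init in [-scale, scale]; the text only says
    "randomly initialize".
    """
    rng = rng or random.Random()
    return [row + [rng.uniform(-scale, scale)] for row in w_hidden_to_output]


# Example: 3 hidden nodes, 2 existing classes -> 3 classes.
old = [[0.2, -0.1], [0.05, 0.4], [-0.3, 0.1]]
new = add_output_node(old, rng=random.Random(0))
```

Keeping the old columns intact is what makes the update incremental: only the connections feeding the new output node start from scratch before the subsequent weight adjustment.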
Step S3: Input the sensitive data characteristics into the sensitive data classification model to obtain a sensitive data classification result.
In a specific embodiment, after the sensitive data classification model is constructed, the sensitive data characteristics are input into it, and the improved neural network algorithm screens the different types of sensitive data in the database to produce the power service sensitive data classification result, giving the specific data items and sensitive data characteristics in each category.
In the embodiment of the invention, the power structured sensitive data classification flow based on the improved incremental neural network learning algorithm is shown in fig. 2:
1. Read the information of each data table of each user in the database. Database accounts are divided into sys users and common users; the data tables read here are those belonging to common users. Each user has N data tables, each data table contains M fields, and each field has Q rows of information. Each field holds one type of information; for example, a field may be a customer number, an identification number, etc.
2. Extract the data characteristics of each field to obtain the field name and the characteristic items of each field; these form the samples for the improved neural network algorithm.
3. Following the improved neural network algorithm, the input layer takes the name of each field and the data of the field, and the output layer is the classification. Train the model.
4. Each field assigned to a category by the trained model also carries its characteristic items.
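The sample-building step of this flow can be sketched as follows. The embodiment does not list the characteristic items extracted for each field, so the statistics below (row count, average length, digit ratio) and the function name `field_features` are purely hypothetical placeholders for whatever characteristics an implementation actually uses.

```python
def field_features(field_name, values):
    """Build one sample from a field: its name plus simple value statistics.

    ASSUMPTION: these statistics are illustrative only; the text does not
    specify which characteristic items are extracted.
    """
    texts = [str(v) for v in values if v is not None]
    total_chars = sum(len(t) for t in texts)
    digit_chars = sum(ch.isdigit() for t in texts for ch in t)
    return {
        "field_name": field_name,
        "row_count": len(texts),
        "avg_length": total_chars / len(texts) if texts else 0.0,
        "digit_ratio": digit_chars / total_chars if total_chars else 0.0,
    }


# Hypothetical field "customer_no" with one missing row.
sample = field_features("customer_no", ["10023", "10024", None, "10025"])
```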
Step S4: Construct a sensitive data grading model.
In a specific embodiment, the data is graded using the improved neural network algorithm, and a power structured sensitive data grading model is constructed. Constructing the sensitive data grading model comprises the following steps:
Step S41: Take the sensitive data classification result, the sensitive data characteristics and the preset data security level as a sample set, divide the sample set equally into a plurality of subsets, select each subset in turn as the second test set, and use the remaining subsets as the second training set.
Step S42: Set the parameters of the neural network model and construct an initial sensitive data grading model.
Step S43: Train and test the sensitive data grading model with the second training set and the second test set, continuously adjusting the parameters of the sensitive data grading model until the output result meets the preset target.
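The equal split with a rotating test set in step S41 corresponds to ordinary k-fold rotation and might be sketched as below. The function name `rotate_folds` is hypothetical, and dropping any remainder samples is an assumption, since the text presumes the sample set divides equally.

```python
def rotate_folds(samples, n_folds):
    """Yield (train, test) pairs, using each equal-sized fold once as the test set.

    ASSUMPTION: samples beyond n_folds * fold_size are dropped so that all
    folds have equal size.
    """
    fold_size = len(samples) // n_folds
    folds = [samples[i * fold_size:(i + 1) * fold_size] for i in range(n_folds)]
    for i, test in enumerate(folds):
        # All folds except the i-th form the training set for this round.
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, test


# 12 samples in 4 folds: 4 rounds, each with 9 training and 3 test samples.
splits = list(rotate_folds(list(range(12)), n_folds=4))
```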
In the embodiment of the present invention, step S43 includes the following steps:
Step S431: Input the second training set into the sensitive data grading model and train the model multiple times.
Step S432: Input the second test set into the sensitive data grading model after the multiple rounds of training, and determine from the test-set output whether there are grading errors.
Step S433: If there is a grading error, apply the steepest descent method to adjust the weights of the sensitive data grading model until the grading result meets expectations.
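To illustrate the kind of steepest descent weight adjustment named in step S433, the sketch below applies gradient steps to a single sigmoid output unit under a squared-error loss. The single-unit setup, the learning rate `lr` and all function names are assumptions chosen to keep the example small; the embodiment does not specify the loss function or the learning rate.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def squared_error(weights, inputs, target):
    """E = 0.5 * (y - target)^2 with y = sigmoid(w . x)."""
    y = sigmoid(sum(w * x for w, x in zip(weights, inputs)))
    return 0.5 * (y - target) ** 2


def steepest_descent_step(weights, inputs, target, lr=0.5):
    """Move the weights one step along the negative gradient of E.

    ASSUMPTION: squared-error loss and a fixed learning rate lr.
    """
    y = sigmoid(sum(w * x for w, x in zip(weights, inputs)))
    delta = (y - target) * y * (1.0 - y)  # dE/dz by the chain rule
    return [w - lr * delta * x for w, x in zip(weights, inputs)]


w = [0.0, 0.0]
x, t = [1.0, 0.5], 1.0
before = squared_error(w, x, t)
for _ in range(50):
    w = steepest_descent_step(w, x, t)
after = squared_error(w, x, t)  # smaller than `before`
```

In a full BP network the same update is applied layer by layer, with `delta` propagated backwards from the output layer to the hidden layers.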
Specifically, the power structured sensitive data grading model is constructed with the improved neural network algorithm as follows:
1. Partition the training data: obtain all training data on the basis of the power structured sensitive data classification. Divide all training data equally into N small data sets; in each training round, use N-1 of them as the training set and the remaining one as the test set, rotating the choice of test set.
3. Define the initial weights: each weight is initialized with a mean of 0 and a standard deviation of
Figure BDA0003487558130000151
4. Design the neural network model: define the number of input-layer nodes, the number of output-layer nodes, the number of hidden layers and the number of nodes in each hidden layer. The numbers of input-layer and output-layer nodes can be determined from the attribute dimension of the training samples and the target output. The output nodes are highly sensitive, sensitive and general.
(1) The number of hidden-layer nodes is determined according to the following formula:
Figure BDA0003487558130000152
N_i is the number of input-layer nodes, N_o is the number of output-layer nodes, and N_s is the number of samples in the training set; α is a freely chosen variable, usually in the range 2 to 10. Letting α range from 2 to 10 determines the range of the hidden-node count N_h. Then, going from the lowest to the highest candidate hidden-node count, train with the same samples one candidate at a time and select the hidden-node count that gives the minimum error.
(2) After the number of hidden-layer nodes is determined, train with 1 to 3 hidden layers in turn, using the same samples, and select the number of hidden layers that gives the minimum error.
At this point, the initial neural network model has been determined.
5. Train and test the neural network multiple times over the training-data subsets, continuously adjusting the power structured sensitive data grading model.
(1) If the training data is partitioned into N small data sets, train the power structured sensitive data grading model N times;
(2) Determine from the test-set output whether there are grading errors. If there is a grading error, apply the steepest descent method to adjust the weights of the power structured sensitive data grading model until the grading result meets expectations.
With the improved BP neural network algorithm, the relevant parameters of the sensitive data grading model can be adjusted dynamically in real time. For example, when a grading error occurs, the weights of the power structured sensitive data grading model are adjusted promptly so that the grading result meets expectations. This avoids problems that can arise with traditional machine learning algorithms, such as failure of the trained network model, demanding requirements on training samples and slow convergence during training, and makes the grading result more accurate.
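The hidden-node sizing in item 4 (1) above can be sketched as follows. The formula itself appears only as a figure and is not reproduced in the text, so this sketch assumes the widely used rule of thumb N_h = N_s / (α (N_i + N_o)), which matches the symbols defined there; it should be read as an assumption rather than as the formula of the embodiment. The names `hidden_node_candidates`, `pick_best` and `error_fn` are hypothetical.

```python
def hidden_node_candidates(n_in, n_out, n_samples, alphas=range(2, 11)):
    """Candidate hidden-node counts, sorted low to high, for alpha in 2..10.

    ASSUMPTION: N_h = N_s / (alpha * (N_i + N_o)); the source formula is in a
    figure that is not reproduced in the text.
    """
    counts = {max(1, round(n_samples / (a * (n_in + n_out)))) for a in alphas}
    return sorted(counts)


def pick_best(candidates, error_fn):
    """Return the candidate with the smallest error.

    error_fn is a hypothetical callable that trains a network with the given
    hidden-node count on the same samples and returns its test error.
    """
    return min(candidates, key=error_fn)


# 8 input nodes, 3 output nodes, 1100 training samples.
candidates = hidden_node_candidates(n_in=8, n_out=3, n_samples=1100)
```

The same `pick_best` helper could be reused for the 1 to 3 hidden-layer comparison by passing layer counts as candidates.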
Step S5: Input the sensitive data classification result, the sensitive data characteristics and the preset data security level into the sensitive data grading model to obtain a sensitive data grading result.
In a specific embodiment, on the basis of the sensitive data classification, security grading is performed according to the data category and data characteristics together with the scope, objects and degree of impact that a data leak would cause given the characteristics of power data. Data is mainly divided into business secret data, important enterprise data and general data, with the sensitivity levels highly sensitive, sensitive and general. In addition, because users have different access rights, the data tables, data columns and data rows they can access differ. For example, a field may be sensitive for one user but not for another, or users with different access rights may be able to access only the data they are authorized for within the same table. User levels can be divided into common users, administrators and super administrators, and data access permissions of different levels are set for these three roles. Different data security levels are likewise set for row-level data according to the different permissions. In the embodiment of the invention, the preset data security levels are highly sensitive, sensitive and general.
The data is graded using the improved neural network algorithm, and a power structured sensitive data grading model is constructed. The power structured sensitive data grading flow based on the improved incremental neural network learning algorithm is shown in fig. 3: the data category and data characteristics, the scope, objects and degree of impact of a data leak, and the user permissions under which the data is accessible serve as input-layer elements, and the output result is normalized to the range 0 to 1. The learned model treats the data as highly sensitive when the output result is 0.7-1, as sensitive when the output result is 0.4-0.6, and as general when the output result is 0-0.3.
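The mapping from the normalized output to a sensitivity level can be written directly from the intervals given above. The text does not say how outputs between 0.3 and 0.4 or between 0.6 and 0.7 are handled, so this sketch returns `None` for them instead of guessing; the function name `sensitivity_level` is hypothetical.

```python
def sensitivity_level(score):
    """Map a normalized model output in [0, 1] to a sensitivity level.

    The intervals are those stated in the text. Scores in the unspecified
    gaps (0.3-0.4 and 0.6-0.7) return None rather than a guessed level.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be normalized to [0, 1]")
    if score >= 0.7:
        return "highly sensitive"
    if 0.4 <= score <= 0.6:
        return "sensitive"
    if score <= 0.3:
        return "general"
    return None
```

An implementation would need to decide how to treat the gap values, for example by widening the intervals or by flagging such fields for manual review.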
Step S6: Grade and identify the sensitive data according to the sensitive data classification result, the sensitive data characteristics and the sensitive data grading result.
In a specific embodiment, a sensitive field library is constructed that covers the specific classification category, data characteristics and corresponding sensitivity-level standard of every sensitive field, so as to implement grading identification of structured sensitive data. The sensitive field library mainly comprises: sensitive information source, data item, classification category, sensitive data characteristics, user role and grading identification.
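One possible in-memory shape for an entry of the sensitive field library is sketched below, with one attribute per item listed above. The class name, attribute names, example values and the choice of `(source, data_item, user_role)` as the lookup key are all assumptions for illustration; the key includes the user role because the same field may be graded differently for different users.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SensitiveFieldEntry:
    source: str             # sensitive information source, e.g. system and table
    data_item: str          # field (data item) name
    category: str           # classification category
    characteristics: tuple  # sensitive data characteristics
    user_role: str          # role the grading applies to
    grading: str            # grading identification


library = {}


def register(entry):
    """Store an entry under (source, data_item, user_role). ASSUMED key choice."""
    library[(entry.source, entry.data_item, entry.user_role)] = entry


# Hypothetical example values.
register(SensitiveFieldEntry(
    source="marketing_db.customer",
    data_item="id_number",
    category="customer information",
    characteristics=("18 characters", "mostly digits"),
    user_role="common user",
    grading="highly sensitive",
))
```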
The sensitive data classification and grading identification method provided by the invention comprises: acquiring sensitive data characteristics in each service system of the internal and external networks of the power information; constructing a sensitive data classification model; inputting the sensitive data characteristics into the sensitive data classification model to obtain a sensitive data classification result; constructing a sensitive data grading model; inputting the sensitive data classification result, the sensitive data characteristics and the preset data security level into the sensitive data grading model to obtain a sensitive data grading result; and grading and identifying the sensitive data according to the sensitive data classification result, the sensitive data characteristics and the sensitive data grading result. The improved neural network algorithm classifies and grades the power structured sensitive data and dynamically adjusts the classification and grading models, their parameters and so on, so that classification and grading of the power structured sensitive data can be completed in a short time. The improved neural network algorithm avoids problems that can arise with traditional machine learning algorithms, such as failure of the trained network model, demanding requirements on training samples and slow convergence during training; it also avoids the low efficiency of manual identification and the inaccurate locating of sensitive data content, and improves the efficiency of maintaining the data in the sensitive field library.
The embodiment of the present invention provides a sensitive data classification and grading identification system, as shown in fig. 4, including:
The acquisition module 1, used for acquiring sensitive data characteristics in each service system of the internal and external networks of the power information. For details, refer to the description of step S1 in the above method embodiment; they are not repeated here.
The first construction module 2, used for constructing a sensitive data classification model. For details, refer to the description of step S2 in the above method embodiment; they are not repeated here.
The classification module 3, used for inputting the sensitive data characteristics into the sensitive data classification model to obtain a sensitive data classification result. For details, refer to the description of step S3 in the above method embodiment; they are not repeated here.
The second construction module 4, used for constructing a sensitive data grading model. For details, refer to the description of step S4 in the above method embodiment; they are not repeated here.
The grading module 5, used for inputting the sensitive data classification result, the sensitive data characteristics and the preset data security level into the sensitive data grading model to obtain a sensitive data grading result. For details, refer to the description of step S5 in the above method embodiment; they are not repeated here.
The identification module 6, used for grading and identifying the sensitive data according to the sensitive data classification result, the sensitive data characteristics and the sensitive data grading result. For details, refer to the description of step S6 in the above method embodiment; they are not repeated here.
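The data flow between these modules can be sketched as a simple composition of callables. This is only an illustration of how the outputs of one module feed the next: the function name `run_pipeline`, its parameters and the stub lambdas are hypothetical, and the two construction modules are represented here by the already built `classify` and `grade` callables they would produce.

```python
def run_pipeline(acquire, classify, grade, identify, preset_levels):
    """Chain the acquisition, classification, grading and identification steps."""
    features = acquire()                                  # acquisition module
    categories = classify(features)                       # classification module
    levels = grade(categories, features, preset_levels)   # grading module
    return identify(categories, features, levels)         # identification module


# Stub callables standing in for trained models (hypothetical values).
result = run_pipeline(
    acquire=lambda: {"customer_no": {"digit_ratio": 1.0}},
    classify=lambda feats: {name: "customer information" for name in feats},
    grade=lambda cats, feats, levels: {name: levels[1] for name in cats},
    identify=lambda cats, feats, lv: {n: (cats[n], lv[n]) for n in cats},
    preset_levels=("highly sensitive", "sensitive", "general"),
)
```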
An embodiment of the present invention provides a computer device, as shown in fig. 5, the device may include a processor 81 and a memory 82, where the processor 81 and the memory 82 may be connected by a bus or by other means, and fig. 5 takes the connection by the bus as an example.
Processor 81 may be a Central Processing Unit (CPU). The Processor 81 may also be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, or combinations thereof.
The memory 82, which is a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the corresponding program instructions/modules in the embodiments of the present invention. The processor 81 runs the non-transitory software programs, instructions and modules stored in the memory 82 to execute its various functional applications and data processing, that is, to implement the sensitive data classification and grading identification method of the above method embodiment.
The memory 82 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created by the processor 81, and the like. Further, the memory 82 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 82 may optionally include memory located remotely from the processor 81, which may be connected to the processor 81 via a network. Examples of such networks include, but are not limited to, the internet, intranets, mobile communication networks, and combinations thereof.
One or more modules are stored in the memory 82 and, when executed by the processor 81, perform the sensitive data classification hierarchical identification method as in the embodiment shown in fig. 1-3.
The details of the computer device can be understood by referring to the corresponding descriptions and effects in the embodiments shown in fig. 1 to fig. 3, and are not described herein again.
Those skilled in the art will appreciate that all or part of the processes of the methods of the embodiments described above can be implemented by instructing relevant hardware by a computer program, and the program can be stored in a computer-readable storage medium, and when executed, the program can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic Disk, an optical Disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a Flash Memory (Flash Memory), a Hard Disk (Hard Disk Drive, abbreviated as HDD) or a Solid State Drive (SSD), etc.; the storage medium may also comprise a combination of memories of the kind described above.
It should be understood that the above examples are given only for clarity of illustration and do not limit the embodiments. Other variations and modifications in different forms will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to list all embodiments exhaustively here, and obvious variations or modifications derived from them remain within the scope of protection of the invention.

Claims (9)

1. A sensitive data classification and grading identification method, characterized by comprising the following steps:
acquiring sensitive data characteristics in each service system of the internal and external networks of the power information;
constructing a sensitive data classification model;
inputting the sensitive data characteristics into the sensitive data classification model to obtain a sensitive data classification result;
constructing a sensitive data grading model;
inputting the sensitive data classification result, the sensitive data characteristics and a preset data security level into the sensitive data grading model to obtain a sensitive data grading result;
grading and identifying the sensitive data according to the sensitive data classification result, the sensitive data characteristics and the sensitive data grading result.
2. The sensitive data classification and grading identification method according to claim 1, wherein the constructing a sensitive data classification model comprises:
preprocessing the sensitive data;
dividing the preprocessed sensitive data equally into a plurality of subsets, selecting each subset in turn as a first test set, and using the remaining subsets as a first training set;
setting a neural network model and parameters, and constructing an initial sensitive data classification model;
training and testing the sensitive data classification model according to the first training set and the first test set, executing incremental neural network learning, and continuously adjusting the sensitive data classification model and parameters until an output result meets a preset target.
3. The sensitive data classification and grading identification method according to claim 2, wherein the setting a neural network model and parameters, and constructing an initial sensitive data classification model comprises:
setting the number of input-layer nodes, the number of output-layer nodes, the number of hidden layers and the number of nodes of each hidden layer of the neural network;
setting the initial connection weights from the input layer to the hidden layer and from the hidden layer to the output layer of the neural network, and constructing the initial sensitive data classification model.
4. The sensitive data classification and grading identification method according to claim 2, wherein the training and testing the sensitive data classification model according to the first training set and the first test set, executing incremental neural network learning, and continuously adjusting the sensitive data classification model and parameters until an output result meets a preset target comprises:
inputting the first training set into the sensitive data classification model, and training the sensitive data classification model multiple times;
inputting the first test set into the sensitive data classification model after the multiple rounds of training, and determining from the output result of the test set whether there are classification errors or whether a new class needs to be added;
if there is a classification error, executing a steepest descent method to adjust the weights of the sensitive data classification model until the classification result meets expectations;
if a new class needs to be added, adding an output node to the original sensitive data classification model, randomly initializing the newly added connection weights from the hidden layer to the output layer, determining the number of newly added hidden-layer nodes by increasing the hidden-layer nodes step by step, and adjusting the sensitive data classification model and the parameters according to an actual result and an expected result.
5. The sensitive data classification and grading identification method according to claim 1, wherein the constructing a sensitive data grading model comprises:
taking the sensitive data classification result, the sensitive data characteristics and the preset data security level as a sample set, dividing the sample set equally into a plurality of subsets, selecting each subset in turn as a second test set, and using the remaining subsets as a second training set;
setting parameters of a neural network model, and constructing an initial sensitive data grading model;
training and testing the sensitive data grading model according to the second training set and the second test set, and continuously adjusting the parameters of the sensitive data grading model until the output result meets the preset target.
6. The sensitive data classification and grading identification method according to claim 5, wherein the training and testing the sensitive data grading model according to the second training set and the second test set, and continuously adjusting the parameters of the sensitive data grading model until the output result meets the preset target comprises:
inputting the second training set into the sensitive data grading model, and training the sensitive data grading model multiple times;
inputting the second test set into the sensitive data grading model after the multiple rounds of training, and determining from the output result of the test set whether there are grading errors;
if there is a grading error, executing the steepest descent method to adjust the weights of the sensitive data grading model until the grading result meets expectations.
7. A sensitive data classification and grading identification system, comprising:
the acquisition module is used for acquiring sensitive data characteristics in each service system of the internal and external networks of the power information;
the first construction module is used for constructing a sensitive data classification model;
the classification module is used for inputting the sensitive data characteristics into the sensitive data classification model to obtain a sensitive data classification result;
the second construction module is used for constructing a sensitive data hierarchical model;
the grading module is used for inputting the sensitive data classification result, the sensitive data characteristics and the preset data security level into the sensitive data grading model to obtain a sensitive data grading result;
and the identification module is used for grading and identifying the sensitive data according to the sensitive data classification result, the sensitive data characteristics and the sensitive data grading result.
8. A computer-readable storage medium storing computer instructions for causing a computer to perform the sensitive data classification and grading identification method of any one of claims 1-6.
9. A computer device, comprising: a memory and a processor, the memory and the processor being communicatively connected to each other, the memory storing computer instructions, and the processor executing the computer instructions to perform the sensitive data classification and grading identification method according to any one of claims 1 to 6.
CN202210087453.7A 2022-01-25 2022-01-25 Sensitive data classification and grading identification method and system Pending CN114511019A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210087453.7A CN114511019A (en) 2022-01-25 2022-01-25 Sensitive data classification and grading identification method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210087453.7A CN114511019A (en) 2022-01-25 2022-01-25 Sensitive data classification and grading identification method and system

Publications (1)

Publication Number Publication Date
CN114511019A true CN114511019A (en) 2022-05-17

Family

ID=81548911

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210087453.7A Pending CN114511019A (en) 2022-01-25 2022-01-25 Sensitive data classification and grading identification method and system

Country Status (1)

Country Link
CN (1) CN114511019A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115174140A (en) * 2022-05-26 2022-10-11 中国电信股份有限公司 Data identification method and device, electronic equipment and nonvolatile storage medium
CN116108393A (en) * 2023-04-12 2023-05-12 国网智能电网研究院有限公司 Power sensitive data classification and classification method and device, storage medium and electronic equipment
CN116127400A (en) * 2023-04-19 2023-05-16 国网智能电网研究院有限公司 Sensitive data identification system, method and storage medium based on heterogeneous computation
CN116127400B (en) * 2023-04-19 2023-06-27 国网智能电网研究院有限公司 Sensitive data identification system, method and storage medium based on heterogeneous computation
CN116436711A (en) * 2023-06-15 2023-07-14 深圳开鸿数字产业发展有限公司 Data security processing method, device, system and storage medium
CN116436711B (en) * 2023-06-15 2023-09-08 深圳开鸿数字产业发展有限公司 Data security processing method, device, system and storage medium
CN116776237A (en) * 2023-08-23 2023-09-19 深圳前海环融联易信息科技服务有限公司 Metadata classification and classification method, device, equipment and medium
CN117216668A (en) * 2023-11-09 2023-12-12 北京安华金和科技有限公司 Data classification hierarchical processing method and system based on machine learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination