CN113392111A

CN113392111A - Self-learning management system based on sensitive database

Info

Publication number: CN113392111A
Application number: CN202110672561.6A
Authority: CN
Inventors: 林德威; 高董英; 方志坚; 黄芳芳; 潘建笠; 刘积娟; 黄鹏; 陈强; 谢妙红; 李建平; 曾驰
Original assignee: State Grid Information and Telecommunication Co Ltd; Information and Telecommunication Branch of State Grid Fujian Electric Power Co Ltd; Great Power Science and Technology Co of State Grid Information and Telecommunication Co Ltd
Current assignee: State Grid Information and Telecommunication Co Ltd; Information and Telecommunication Branch of State Grid Fujian Electric Power Co Ltd; Great Power Science and Technology Co of State Grid Information and Telecommunication Co Ltd
Priority date: 2021-06-17
Filing date: 2021-06-17
Publication date: 2021-09-14
Anticipated expiration: 2041-06-17
Also published as: CN113392111B

Abstract

The invention provides a self-learning management system based on a sensitive database. The system comprises a database updating module, a storage module, a self-learning module and a processing module. An initial sensitive database is stored in the storage module; the self-learning module constructs sensitive data features from the initial sensitive database; the processing module classifies received data; and the database updating module stores the classified sensitive data in the storage module. Because the system can relearn from newly generated sensitive data and reclassify it, the accuracy of sensitive data classification is improved, and the rigid processing, low processing efficiency and low security of conventional sensitive data handling are addressed.

Description

Self-learning management system based on sensitive database

Technical Field

The invention relates to the technical field of data processing, in particular to a self-learning management system based on a sensitive database.

Background

Sensitive data is data whose leakage may seriously harm society or individuals. It includes personal privacy data such as names, identification numbers, addresses, telephone numbers, bank accounts, e-mail addresses, passwords, medical information and educational background, as well as data that an enterprise or social organization should not publish, such as the enterprise's business situation, network structure and IP address lists. In particular, the spread of smart grid systems has made information collection finer-grained and has thereby increased the risk that electricity consumption information is leaked.

In the prior art, sensitive data is generally identified according to judgment criteria set manually in advance and is then classified. This management method is poorly suited to the current era of rapidly growing data volumes: data is updated quickly, and the ways in which different kinds of data are combined keep changing. An existing sensitive data management system therefore cannot cover current identification scenarios and is likely to miss new kinds of sensitive data, which reduces the security and efficiency of data processing.

Disclosure of Invention

In view of the defects of the prior art, the invention aims to provide a self-learning management system based on a sensitive database. The system can relearn from newly generated sensitive data and reclassify it, which improves the accuracy of sensitive data classification and addresses the rigid processing, low processing efficiency and low security of existing sensitive data handling.

In order to achieve the purpose, the invention is realized by the following technical scheme: a self-learning management system based on a sensitive database comprises a database updating module, a storage module, a self-learning module and a processing module, wherein an initial sensitive database is stored in the storage module, the self-learning module is used for constructing sensitive data characteristics according to the initial sensitive database, the processing module is used for classifying received data, and the database updating module is used for storing the classified sensitive data into the storage module;

the self-learning module comprises a first learning unit and a second learning unit; the first learning unit is used for constructing the sensitive data characteristics according to the initial sensitive database, and the second learning unit is used for constructing the sensitive data characteristics according to the updated sensitive database;

the first learning unit comprises a sensitive data classification subunit and a first feature construction subunit; the sensitive data classification subunit is configured with a sensitive data classification strategy, which includes: classifying the sensitive data in the initial sensitive database into three classification levels, namely highly sensitive data, moderately sensitive data and mildly sensitive data;

then performing data label classification on the highly sensitive data, the moderately sensitive data and the mildly sensitive data, wherein the data labels comprise data source area, digital data, combined data, physical sign data, payment record data and login record data;

the first feature construction subunit is configured with a first feature construction strategy, which includes: extracting the data source area from the highly sensitive data, and marking the data source area as a high-sensitivity area feature;

extracting the digital data and the payment record data from the highly sensitive data, and marking a combination in which digital data and payment record data are labeled together as a high-sensitivity payment password feature;

extracting the combined data and the login record data from the highly sensitive data, and marking a combination in which combined data and login record data are labeled together as a high-sensitivity login password feature;

extracting the digital data and the login record data from the highly sensitive data, and marking a combination in which digital data and login record data are labeled together as a high-sensitivity login account feature;

extracting the physical sign data and the payment record data from the highly sensitive data, and marking a combination in which physical sign data and payment record data are labeled together as a high-sensitivity payment physical sign feature;

extracting the physical sign data and the login record data from the highly sensitive data, and marking a combination in which physical sign data and login record data are labeled together as a high-sensitivity login physical sign feature;

the processing module comprises a sensitive data dividing unit, the sensitive data dividing unit is configured with a comparison strategy, and the comparison strategy comprises: performing data label classification on the received data, comparing the labeled data with the high-sensitivity area feature, the high-sensitivity payment password feature, the high-sensitivity login account feature, the high-sensitivity payment physical sign feature and the high-sensitivity login physical sign feature respectively, classifying the data as first-level highly sensitive data when the comparison finds a matching feature, and attaching the labels of the matched features to the data;

the database updating module comprises a cache unit, and the cache unit is used for storing newly classified first-level highly sensitive data during a first time period;

the storage module comprises a highly sensitive data storage unit, the highly sensitive data storage unit is configured with a relocation strategy, and the relocation strategy comprises: transferring the data stored in the cache unit into the highly sensitive data storage unit once every first time period.

Further, the second learning unit includes a second feature construction subunit configured with a second feature construction strategy, which includes: extracting the data that has both the high-sensitivity area feature and the high-sensitivity payment password feature, and marking it with a high-sensitivity concentrated payment area feature.

Further, the second feature construction strategy further includes: extracting the data that has both the high-sensitivity login account feature and the high-sensitivity area feature, and marking it with a high-sensitivity concentrated login area feature.

Further, the second feature construction strategy further includes: extracting the data that has both the high-sensitivity payment password feature and the high-sensitivity login password feature, and marking it with a high-sensitivity password usage feature.

Further, the second feature construction strategy further includes: extracting the data that has both the high-sensitivity payment physical sign feature and the high-sensitivity login physical sign feature, and marking it with a high-sensitivity physical sign feature.

Further, the comparison strategy further comprises: performing data label classification on the received data, comparing the labeled data with the high-sensitivity concentrated payment area feature, the high-sensitivity concentrated login area feature, the high-sensitivity password usage feature and the high-sensitivity physical sign feature respectively, classifying the data as second-level highly sensitive data when the comparison finds a matching feature, and attaching the labels of the matched features to the data.

Further, the second learning unit further includes a feature subdivision subunit configured with a feature subdivision strategy, the feature subdivision strategy including: splitting the high-sensitivity password usage feature, recording the number of digits of each password and the number of symbol types used in its combination, grouping the symbol type counts by number of digits, selecting, for each number of digits, the symbol type count that occurs most frequently as the matching combination, and marking that combination as the type-count feature corresponding to that number of digits.

Further, the comparison strategy further comprises: performing data label classification on the received data, comparing the labeled data with the type-count feature corresponding to its number of digits, classifying the data as subdivided sensitive data when the comparison finds a match, and attaching the label of the matched feature to the data.

Further, the data labels further comprise video data, picture data and mobile phone shooting source data;

the first feature construction strategy further comprises: extracting the video data and the mobile phone shooting source data from the highly sensitive data, and marking a combination in which video data and mobile phone shooting source data are labeled together as a high-sensitivity video feature;

and extracting the picture data and the mobile phone shooting source data from the highly sensitive data, and marking a combination in which picture data and mobile phone shooting source data are labeled together as a high-sensitivity picture feature.

Further, the comparison strategy further comprises: performing data label classification on the received data, comparing the labeled data with the high-sensitivity video feature and the high-sensitivity picture feature, classifying the data as highly sensitive data when the comparison finds a matching feature, and attaching the label of the matched feature to the data.

The beneficial effects of the invention are as follows. The sensitive data classification strategy classifies the sensitive data in the initial sensitive database into highly sensitive data, moderately sensitive data and mildly sensitive data. Data label classification is then performed on the data at each level, the data labels comprising data source area, digital data, combined data, physical sign data, payment record data and login record data. Learning is carried out on these labels, and the first feature construction strategy constructs the high-sensitivity area feature, the high-sensitivity payment password feature, the high-sensitivity login account feature, the high-sensitivity payment physical sign feature and the high-sensitivity login physical sign feature, so that received data can be quickly classified by sensitivity and the efficiency of self-learning processing of sensitive data is improved.

By providing the second learning unit, the invention can construct sensitive data features from the updated sensitive database and can further construct the high-sensitivity concentrated payment area feature, the high-sensitivity concentrated login area feature, the high-sensitivity password usage feature and the high-sensitivity physical sign feature, which refines the sensitivity grading of the sensitive data and improves the classification accuracy for highly sensitive data. In addition, the feature subdivision strategy subdivides features according to the number of digits of the high-sensitivity password usage feature and the number of symbol types used in combination, which improves the identification accuracy for password data. Finally, video data, picture data and mobile phone shooting source data are added to the data labels, so that the high-sensitivity video feature and the high-sensitivity picture feature can be obtained and sensitive data classification becomes more comprehensive.

Drawings

Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:

FIG. 1 is a schematic block diagram of a first embodiment of the present invention;

FIG. 2 is a schematic block diagram of a second embodiment of the present invention.

In the figures: 1. a self-learning management system; 11. a self-learning module; 111. a first learning unit; 1111. a sensitive data classification subunit; 1112. a first feature construction subunit; 112. a second learning unit; 1121. a second feature construction subunit; 1122. a feature subdivision subunit; 12. a processing module; 121. a sensitive data dividing unit; 13. a database updating module; 131. a cache unit; 14. a storage module; 141. a highly sensitive data storage unit.

Detailed Description

In order to make the technical means, inventive features, objectives and effects of the invention easy to understand, the invention is further described below with reference to specific embodiments.

In a first embodiment, referring to FIG. 1, a self-learning management system based on a sensitive database includes a database updating module 13, a storage module 14, a self-learning module 11 and a processing module 12. An initial sensitive database is stored in the storage module 14, the self-learning module 11 is configured to construct sensitive data features according to the initial sensitive database, the processing module 12 is configured to classify received data, and the database updating module 13 is configured to store the classified sensitive data in the storage module 14.

The self-learning module 11 comprises a first learning unit 111 and a second learning unit 112; the first learning unit 111 is configured to construct sensitive data features according to an initial sensitive database, and the second learning unit 112 is configured to construct sensitive data features according to an updated sensitive database.

The first learning unit 111 comprises a sensitive data classification subunit 1111 and a first feature construction subunit 1112. The sensitive data classification subunit 1111 is configured with a sensitive data classification strategy, which includes: classifying the sensitive data in the initial sensitive database into three classification levels, namely highly sensitive data, moderately sensitive data and mildly sensitive data;

and then performing data label classification on the highly sensitive data, the moderately sensitive data and the mildly sensitive data, wherein the data labels comprise data source area, digital data, combined data, physical sign data, payment record data and login record data.
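As an illustration only, the three classification levels and six data labels described above could be represented as follows. This is a minimal Python sketch under stated assumptions: the names `Level`, `Label`, `Record` and `group_by_level` are hypothetical, and the description does not prescribe any particular data structure.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    """The three classification levels named in the description."""
    HIGH = "highly sensitive"
    MODERATE = "moderately sensitive"
    MILD = "mildly sensitive"


class Label(Enum):
    """The six data labels named in the description."""
    SOURCE_AREA = "data source area"
    DIGITAL = "digital data"
    COMBINED = "combined data"
    PHYSICAL_SIGN = "physical sign data"
    PAYMENT_RECORD = "payment record data"
    LOGIN_RECORD = "login record data"


@dataclass(frozen=True)
class Record:
    """Hypothetical record: a payload plus its level and its labels."""
    payload: str
    level: Level
    labels: frozenset


def group_by_level(records):
    """Group records by classification level before labels are inspected."""
    groups = {level: [] for level in Level}
    for record in records:
        groups[record.level].append(record)
    return groups


# Made-up sample records, for illustration only.
records = [
    Record("123456", Level.HIGH, frozenset({Label.DIGITAL, Label.PAYMENT_RECORD})),
    Record("public notice", Level.MILD, frozenset({Label.COMBINED})),
]
groups = group_by_level(records)
```

Keeping the labels as a set on each record makes the later "labeled together" checks a simple subset test.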

The first feature construction subunit 1112 is configured with a first feature construction strategy comprising: extracting the data source area from the highly sensitive data and marking it as a high-sensitivity area feature. High-sensitivity areas are generally places where data security must be ensured, such as national research institutes, scientific research institutes and banks, and data originating from these areas needs to be classified as highly sensitive data.

The digital data and the payment record data in the highly sensitive data are extracted, and a combination in which digital data and payment record data are labeled together is marked as a high-sensitivity payment password feature. When digital data and payment record data occur together, the digital data is very likely a payment password, so it needs to be classified as highly sensitive data.

The combined data and the login record data in the highly sensitive data are extracted, and a combination in which combined data and login record data are labeled together is marked as a high-sensitivity login password feature. When combined data and login record data occur together, the combined data is very likely a login password, so it needs to be classified as highly sensitive data.

The digital data and the login record data in the highly sensitive data are extracted, and a combination in which digital data and login record data are labeled together is marked as a high-sensitivity login account feature. When digital data and login record data occur together, the digital data is very likely a login account or a mobile phone number, so it needs to be classified as highly sensitive data.

The physical sign data and the payment record data in the highly sensitive data are extracted, and a combination in which physical sign data and payment record data are labeled together is marked as a high-sensitivity payment physical sign feature. When physical sign data and payment record data occur together, the physical sign data is very likely a physical sign password used for payment, such as a fingerprint password.

The physical sign data and the login record data in the highly sensitive data are extracted, and a combination in which physical sign data and login record data are labeled together is marked as a high-sensitivity login physical sign feature. When physical sign data and login record data occur together, the physical sign data is very likely a physical sign password used for login, such as a fingerprint password.
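One way to read the first feature construction strategy is as a set of label co-occurrence rules applied to the highly sensitive records. The sketch below is a hedged illustration, not the patented implementation: the string encodings of the labels, the dictionary-based record layout, and the names `COMBINATION_RULES` and `build_first_features` are all assumptions.

```python
# Label names are illustrative strings; the description names the labels
# but does not define how they are encoded.
COMBINATION_RULES = {
    "payment_password": frozenset({"digital", "payment_record"}),
    "login_password": frozenset({"combined", "login_record"}),
    "login_account": frozenset({"digital", "login_record"}),
    "payment_physical_sign": frozenset({"physical_sign", "payment_record"}),
    "login_physical_sign": frozenset({"physical_sign", "login_record"}),
}


def build_first_features(high_records):
    """Derive first-level features from highly sensitive records.

    Returns (areas, combos): the set of source areas seen in the records,
    and the label combinations that occur together in at least one record.
    """
    areas = {r["area"] for r in high_records if r.get("area") is not None}
    combos = {
        name: required
        for name, required in COMBINATION_RULES.items()
        if any(required <= r["labels"] for r in high_records)
    }
    return areas, combos


# Made-up highly sensitive records, for illustration only.
high_records = [
    {"labels": {"digital", "payment_record"}, "area": "bank"},
    {"labels": {"physical_sign", "login_record"}, "area": None},
]
areas, combos = build_first_features(high_records)
```

Here the area feature is modeled as a set of area names and each combination feature as the set of labels that must appear together on one record.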

The processing module 12 includes a sensitive data dividing unit 121 configured with a comparison strategy, which includes: performing data label classification on the received data, comparing the labeled data with the high-sensitivity area feature, the high-sensitivity payment password feature, the high-sensitivity login account feature, the high-sensitivity payment physical sign feature and the high-sensitivity login physical sign feature respectively, classifying the data as first-level highly sensitive data when the comparison finds a matching feature, and attaching the labels of the matched features to the data.
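The comparison step can be sketched as a lookup of the received record's labels against the learned features. This is an illustrative Python sketch only; it assumes that a single matching feature is enough to classify the record, which the description does not state explicitly, and the function name `compare` and the feature encodings are hypothetical.

```python
def compare(labels, area, area_feature, combo_features):
    """Return (level, matched_feature_names) for one labeled record.

    labels: set of label names attached to the received record.
    area: the record's source area, or None if unknown.
    area_feature: set of areas learned as high-sensitivity areas.
    combo_features: mapping of feature name to the labels it requires.
    """
    matched = []
    if area is not None and area in area_feature:
        matched.append("area")
    matched += [
        name for name, required in combo_features.items() if required <= labels
    ]
    # Assumption: any single match classifies the record as first-level.
    level = "first-level highly sensitive" if matched else None
    return level, matched


# Hypothetical learned features.
area_feature = {"bank"}
combo_features = {
    "payment_password": {"digital", "payment_record"},
    "login_account": {"digital", "login_record"},
}
result = compare({"digital", "payment_record"}, "bank", area_feature, combo_features)
unmatched = compare({"combined"}, "park", area_feature, combo_features)
```

The matched feature names are returned alongside the level so that they can be attached to the record as additional labels.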

The data labels further comprise video data, picture data and mobile phone shooting source data.

The comparison strategy further comprises: performing data label classification on the received data, comparing the labeled data with the high-sensitivity video feature and the high-sensitivity picture feature, classifying the data as highly sensitive data when the comparison finds a matching feature, and attaching the label of the matched feature to the data.

The database updating module 13 includes a cache unit 131, which is configured to store newly classified first-level highly sensitive data during a first time period.

The storage module 14 includes a highly sensitive data storage unit 141 configured with a relocation strategy, which includes: transferring the data stored in the cache unit 131 into the highly sensitive data storage unit 141 once every first time period.
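The cache-then-relocate behavior can be sketched as a buffer that is flushed once the first time period has elapsed. The class below is a minimal illustration under assumptions: the names `CacheUnit` and `relocate_if_due`, the use of a plain list as the storage unit, and the injectable `clock` are hypothetical, and the description does not say how the first time period is measured.

```python
import time


class CacheUnit:
    """Hypothetical cache that holds newly classified records temporarily."""

    def __init__(self, first_time, clock=time.monotonic):
        self.first_time = first_time  # length of the first time period, in seconds
        self._clock = clock           # injectable so the sketch can be tested
        self._items = []
        self._last_move = clock()

    def add(self, record):
        """Store a newly classified first-level highly sensitive record."""
        self._items.append(record)

    def relocate_if_due(self, store):
        """Move all cached records into `store` if the period has elapsed.

        Returns the number of records moved (0 if the period is not over).
        """
        now = self._clock()
        if now - self._last_move < self.first_time:
            return 0
        moved = len(self._items)
        store.extend(self._items)
        self._items.clear()
        self._last_move = now
        return moved


# Usage with a fake clock so the period can be advanced by hand.
fake_now = [0.0]
cache = CacheUnit(first_time=10.0, clock=lambda: fake_now[0])
storage_unit = []
cache.add("record-a")
early = cache.relocate_if_due(storage_unit)
fake_now[0] = 10.0
due = cache.relocate_if_due(storage_unit)
```

Batching the transfer in this way keeps writes to the highly sensitive data storage unit infrequent, which appears to be the intent of the relocation strategy.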

In the second embodiment, referring to FIG. 2, a second learning unit 112 is added on the basis of the first embodiment. The second learning unit 112 can perform feature extraction on the updated sensitive database, which further refines the subdivision of highly sensitive data and improves the accuracy of sensitive data classification. The second learning unit 112 includes a second feature construction subunit 1121 configured with a second feature construction strategy, which includes: extracting the data that has both the high-sensitivity area feature and the high-sensitivity payment password feature, and marking it with a high-sensitivity concentrated payment area feature. This feature is common in places where payments are concentrated, such as banks or shopping malls, so data processed in such areas is given a higher security priority.

The second feature construction strategy further includes: extracting the data that has both the high-sensitivity login account feature and the high-sensitivity area feature, and marking it with a high-sensitivity concentrated login area feature. This feature is common in entertainment venues with many user terminals, such as internet cafes, where users frequently log in to accounts.

The second feature construction strategy further includes: extracting the data that has both the high-sensitivity payment password feature and the high-sensitivity login password feature, and marking it with a high-sensitivity password usage feature. Extracting this feature makes it possible to identify password-related data so that passwords can be encrypted with particular emphasis.

The second feature construction strategy further includes: extracting the data that has both the high-sensitivity payment physical sign feature and the high-sensitivity login physical sign feature, and marking it with a high-sensitivity physical sign feature. Physical sign data of the human body is of many types; if such data is used for both payment and login, it is very likely a physical sign password, such as a fingerprint password or a face recognition password.

The comparison strategy further comprises: performing data label classification on the received data, comparing the labeled data with the high-sensitivity concentrated payment area feature, the high-sensitivity concentrated login area feature, the high-sensitivity password usage feature and the high-sensitivity physical sign feature respectively, classifying the data as second-level highly sensitive data when the comparison finds a matching feature, and attaching the labels of the matched features to the data.
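The second-level features described above are each built from a pair of first-level features, so they can be sketched as rules over the feature tags already attached to a record. The following Python sketch is an assumption-laden illustration: the tag strings and the names `SECOND_LEVEL_RULES` and `classify_second_level` are hypothetical.

```python
# Each second-level feature requires two first-level feature tags,
# following the four pairings given in the description.
SECOND_LEVEL_RULES = {
    "concentrated_payment_area": frozenset({"area", "payment_password"}),
    "concentrated_login_area": frozenset({"login_account", "area"}),
    "password_usage": frozenset({"payment_password", "login_password"}),
    "physical_sign": frozenset({"payment_physical_sign", "login_physical_sign"}),
}


def classify_second_level(first_level_tags):
    """Return (level, matched) given the first-level tags on one record."""
    matched = [
        name
        for name, required in SECOND_LEVEL_RULES.items()
        if required <= first_level_tags
    ]
    level = "second-level highly sensitive" if matched else None
    return level, matched


bank_payment = classify_second_level({"area", "payment_password"})
both_passwords = classify_second_level({"payment_password", "login_password"})
only_area = classify_second_level({"area"})
```

Because the rules operate on tags rather than on raw data, they can be re-run whenever the updated sensitive database gains newly tagged records.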

The second learning unit 112 further includes a feature subdivision subunit 1122 configured with a feature subdivision strategy, which includes: splitting the high-sensitivity password usage feature, recording the number of digits of each password and the number of symbol types used in its combination, grouping the symbol type counts by number of digits, selecting, for each number of digits, the symbol type count that occurs most frequently as the matching combination, and marking that combination as the type-count feature corresponding to that number of digits.
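The feature subdivision strategy amounts to a frequency count: for each password length, find the most common number of symbol types. The sketch below illustrates this reading in Python. The four character classes (digits, lowercase letters, uppercase letters, other symbols) are an assumption, since the description does not define what counts as a symbol type, and the function names are hypothetical.

```python
import string
from collections import Counter, defaultdict


def symbol_type_count(text):
    """Count how many character classes appear in the text (assumed classes)."""
    classes = [
        any(ch in string.digits for ch in text),
        any(ch in string.ascii_lowercase for ch in text),
        any(ch in string.ascii_uppercase for ch in text),
        any(ch not in string.ascii_letters + string.digits for ch in text),
    ]
    return sum(classes)


def subdivide(passwords):
    """Map each password length to its most frequent symbol type count."""
    counts_by_length = defaultdict(Counter)
    for password in passwords:
        counts_by_length[len(password)][symbol_type_count(password)] += 1
    return {
        length: counts.most_common(1)[0][0]
        for length, counts in counts_by_length.items()
    }


# Made-up sample passwords, for illustration only.
sample = ["123456", "654321", "12ab56", "Abc123!x"]
type_count_by_length = subdivide(sample)
```

For the made-up sample, six-character passwords most often use a single symbol type, and the one eight-character password uses all four.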

The comparison strategy further comprises: performing data label classification on the received data, comparing the labeled data with the type-count feature corresponding to its number of digits, classifying the data as subdivided sensitive data when the comparison finds a match, and attaching the label of the matched feature to the data.
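Comparison against the type-count features can then be sketched as a lookup keyed by length. This is a hedged illustration only: the learned mapping shown is hypothetical, the character classes are an assumption, and the description does not say what happens when no feature exists for a given length (the sketch treats that case as no match).

```python
import string


def symbol_type_count(text):
    """Count how many character classes appear in the text (assumed classes)."""
    classes = [
        any(ch in string.digits for ch in text),
        any(ch in string.ascii_lowercase for ch in text),
        any(ch in string.ascii_uppercase for ch in text),
        any(ch not in string.ascii_letters + string.digits for ch in text),
    ]
    return sum(classes)


def is_subdivided_sensitive(text, type_count_by_length):
    """True when the text's type count equals the learned count for its length."""
    expected = type_count_by_length.get(len(text))
    return expected is not None and expected == symbol_type_count(text)


# Hypothetical learned features: length -> most frequent symbol type count.
learned = {6: 1, 8: 4}
all_digits = is_subdivided_sensitive("000000", learned)
mixed_six = is_subdivided_sensitive("abc123", learned)
unknown_length = is_subdivided_sensitive("hello", learned)
```

A match on length and type count alone is a coarse signal, so in practice it would presumably be combined with the label-based comparisons described earlier.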

The working principle is as follows. During data processing, the self-learning module 11 extracts features from the initial sensitive database stored in the storage module 14 and classifies the highly sensitive data by label and feature, and the processing module 12 classifies newly received data against these features, which improves the efficiency of self-learning classification of sensitive data. Data marked as highly sensitive is first cached in the database updating module 13 and, after a certain time, is transferred in a batch to the storage module 14. With the second learning unit 112 added to the self-learning module 11, the sensitive data newly added to the storage module 14 can be relearned and reclassified, which further improves the accuracy and granularity of sensitive data classification and the overall efficiency of self-learning management of sensitive data.

Finally, it should be noted that the above embodiments are only specific embodiments of the invention, intended to illustrate its technical solutions rather than to limit them, and the protection scope of the invention is not limited to them. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone skilled in the art may, within the technical scope disclosed by the invention, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes or substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the embodiments of the invention, and they shall be covered by the protection scope of the invention. Therefore, the protection scope of the invention shall be subject to the protection scope of the claims.

Claims

1. The self-learning management system based on the sensitive database is characterized by comprising a database updating module (13), a storage module (14), a self-learning module (11) and a processing module (12), wherein an initial sensitive database is stored in the storage module (14), the self-learning module (11) is used for constructing sensitive data characteristics according to the initial sensitive database, the processing module (12) is used for classifying received data, and the database updating module (13) is used for storing the classified sensitive data into the storage module (14);

the self-learning module (11) comprises a first learning unit (111) and a second learning unit (112); the first learning unit (111) is used for constructing the sensitive data features according to an initial sensitive database, and the second learning unit (112) is used for constructing the sensitive data features according to an updated sensitive database;

the first learning unit (111) comprises a sensitive data classification subunit (1111) and a first feature construction subunit (1112); the sensitive data classification subunit (1111) is configured with a sensitive data classification strategy comprising: classifying the sensitive data in the initial sensitive database into three classification levels, namely highly sensitive data, moderately sensitive data and mildly sensitive data;

the first feature construction subunit (1112) is configured with a first feature construction strategy comprising: extracting the data source area from the highly sensitive data, and marking the data source area as a high-sensitivity area feature;

the processing module (12) comprises a sensitive data dividing unit (121), wherein the sensitive data dividing unit (121) is configured with a comparison strategy, and the comparison strategy comprises: performing data label classification on the received data, comparing the labeled data with the high-sensitivity area feature, a high-sensitivity payment password feature, a high-sensitivity login account feature, a high-sensitivity payment physical sign feature and a high-sensitivity login physical sign feature respectively, classifying the data as first-level highly sensitive data when the comparison finds a matching feature, and attaching the labels of the matched features to the data;

the database updating module (13) comprises a cache unit (131), wherein the cache unit (131) is used for storing newly classified first-level highly sensitive data during a first time period;

the storage module (14) comprises a highly sensitive data storage unit (141), the highly sensitive data storage unit (141) is configured with a relocation strategy, and the relocation strategy comprises: transferring the data stored in the cache unit (131) into the highly sensitive data storage unit (141) once every first time period.

2. The self-learning management system based on a sensitive database according to claim 1, wherein the second learning unit (112) comprises a second feature construction subunit (1121), the second feature construction subunit (1121) is configured with a second feature construction strategy, and the second feature construction strategy comprises: extracting the data that has both the high-sensitivity area feature and the high-sensitivity payment password feature, and marking it with a high-sensitivity concentrated payment area feature.

3. The self-learning management system based on a sensitive database according to claim 2, wherein the second feature construction strategy further comprises: extracting the data that has both the high-sensitivity login account feature and the high-sensitivity area feature, and marking it with a high-sensitivity concentrated login area feature.

4. The self-learning management system based on a sensitive database according to claim 3, wherein the second feature construction strategy further comprises: extracting the data that has both the high-sensitivity payment password feature and a high-sensitivity login password feature, and marking it with a high-sensitivity password usage feature.

5. The self-learning management system based on a sensitive database according to claim 4, wherein the second feature construction strategy further comprises: extracting the data that has both the high-sensitivity payment physical sign feature and the high-sensitivity login physical sign feature, and marking it with a high-sensitivity physical sign feature.

6. The self-learning management system based on a sensitive database according to claim 5, wherein the comparison strategy further comprises: performing data label classification on the received data, comparing the labeled data with the high-sensitivity concentrated payment area feature, the high-sensitivity concentrated login area feature, the high-sensitivity password usage feature and the high-sensitivity physical sign feature respectively, classifying the data as second-level highly sensitive data when the comparison finds a matching feature, and attaching the labels of the matched features to the data.

7. The self-learning management system based on a sensitive database according to claim 6, wherein the second learning unit (112) further comprises a feature subdivision subunit (1122), the feature subdivision subunit (1122) being configured with a feature subdivision strategy comprising: splitting the high-sensitivity password usage feature, recording the number of digits of each password and the number of symbol types used in its combination, grouping the symbol type counts by number of digits, selecting, for each number of digits, the symbol type count that occurs most frequently as the matching combination, and marking that combination as the type-count feature corresponding to that number of digits.

8. The self-learning management system based on a sensitive database according to claim 7, wherein the comparison strategy further comprises: performing data label classification on the received data, comparing the labeled data with the type-count feature corresponding to its number of digits, classifying the data as subdivided sensitive data when the comparison finds a match, and attaching the label of the matched feature to the data.

9. The self-learning management system based on a sensitive database according to claim 8, wherein the data labels further comprise video data, picture data and mobile phone shooting source data.

10. The self-learning management system based on a sensitive database according to claim 9, wherein the comparison strategy further comprises: performing data label classification on the received data, comparing the labeled data with a high-sensitivity video feature and a high-sensitivity picture feature, classifying the data as highly sensitive data when the comparison finds a matching feature, and attaching the label of the matched feature to the data.