CN114329581A - Data protection method, device and equipment - Google Patents


Info

Publication number
CN114329581A
Authority
CN
China
Prior art keywords
data source
data
determining
sensitive
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111452394.0A
Other languages
Chinese (zh)
Inventor
王辉
徐志威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN202111452394.0A priority Critical patent/CN114329581A/en
Publication of CN114329581A publication Critical patent/CN114329581A/en
Pending legal-status Critical Current

Landscapes

  • Storage Device Security (AREA)

Abstract

The application provides a data protection method, a device and equipment, wherein the method comprises the following steps: aiming at a target data source in a plurality of data sources, acquiring a first data source and a second data source from the plurality of data sources; wherein first data in the first data source flows from the first data source to the target data source, and second data in the target data source flows from the target data source to the second data source; determining an inflow influence value corresponding to the target data source based on the sensitive value of the first data; determining an outflow influence value corresponding to the target data source based on the sensitive value of the second data; determining a target influence value corresponding to the target data source based on the inflow influence value and the outflow influence value, and determining a protection level corresponding to the target data source based on the target influence value; and performing data protection on the data in the target data source based on the protection level. Through the technical scheme, the sensitive data can be found from a large number of data sources, then data protection is carried out on the sensitive data, and the safety of the data can be greatly improved.

Description

Data protection method, device and equipment
Technical Field
The present application relates to the field of data security technologies, and in particular, to a data protection method, apparatus, and device.
Background
The HBase database is a column-oriented (column-family-oriented) distributed database developed based on HDFS (Hadoop Distributed File System). It is mainly used for storing ultra-large-scale data sets and supports real-time random access to ultra-large-scale data.
The HBase database involves the following concepts. Data table: data in the HBase database is organized in data tables consisting of rows and columns; the columns are divided into several column families, and the intersection of a row and a column determines a cell. Row: a data table is composed of a plurality of rows, and each row has a row key as its unique identifier. Rows in a data table can be accessed in three ways: by a single row key, by a range of row keys, or by a full table scan. Column family: a data table is grouped into a collection of "column families", which are the basic unit of access control. Column qualifier: data in a column family is located by a column qualifier (or column). Cell: a cell is determined by a row, a column family and a column qualifier; the data stored in a cell has no data type and is always treated as a byte array. Timestamp: each cell holds multiple versions of the same piece of data, and these versions are indexed by timestamp.
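The data model described above can be illustrated with a small in-memory sketch. This is a toy model for explanation only and is not the HBase client API; the class name `TinyTable` and its methods are hypothetical.

```python
from collections import defaultdict


class TinyTable:
    """Toy in-memory model of an HBase-style table (not the HBase API)."""

    def __init__(self):
        # row key -> (column family, column qualifier) -> {timestamp: bytes}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, qualifier, value: bytes, ts: int):
        # Each write adds a new version of the cell, indexed by timestamp.
        self._rows[row_key][(family, qualifier)][ts] = value

    def get(self, row_key, family, qualifier):
        """Access by a single row key: return the latest version of one cell."""
        versions = self._rows.get(row_key, {}).get((family, qualifier), {})
        return versions[max(versions)] if versions else None

    def scan(self, start=None, stop=None):
        """Yield row keys in [start, stop); with no bounds this is a full table scan."""
        for key in sorted(self._rows):
            if (start is None or key >= start) and (stop is None or key < stop):
                yield key
```

A cell is addressed by row key, column family and column qualifier, its value is raw bytes, and the three access paths (single key, key range, full scan) map to `get` and `scan`.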
With the rapid development of Internet technology, the amount of data in the HBase database keeps growing, and how to protect sensitive data in the HBase database from being leaked is a current research focus. However, in a big data scenario the HBase database contains many data tables, and there is no effective way to find the sensitive data among this large number of data tables and then protect it.
Disclosure of Invention
The application provides a data protection method, which is applied to a database server, wherein the database server comprises a plurality of data sources, and each data source is used for storing data, and the method comprises the following steps:
aiming at a target data source in a plurality of data sources, acquiring a first data source and a second data source from the plurality of data sources; wherein first data in a first data source flows from the first data source to the target data source, and second data in the target data source flows from the target data source to the second data source;
determining an inflow influence value corresponding to the target data source based on the sensitive value of the first data;
determining an outflow influence value corresponding to the target data source based on the sensitive value of the second data;
determining a target influence value corresponding to the target data source based on the inflow influence value and the outflow influence value, and determining a protection level corresponding to the target data source based on the target influence value;
and performing data protection on the data in the target data source based on the protection level.
The application provides a data protection device, which is applied to a database server, wherein the database server comprises a plurality of data sources, and each data source is used for storing data, and the device comprises:
an acquisition module, configured to acquire, for a target data source among the plurality of data sources, a first data source and a second data source from the plurality of data sources; wherein first data in the first data source flows from the first data source to the target data source, and second data in the target data source flows from the target data source to the second data source;
a determining module, configured to determine an inflow influence value corresponding to the target data source based on the sensitive value of the first data; determine an outflow influence value corresponding to the target data source based on the sensitive value of the second data; determine a target influence value corresponding to the target data source based on the inflow influence value and the outflow influence value, and determine a protection level corresponding to the target data source based on the target influence value;
and a processing module, configured to perform data protection on the data in the target data source based on the protection level.
The present application provides a database server comprising: a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor; the processor is configured to execute machine executable instructions to implement the data protection methods disclosed in the above examples of the present application.
As can be seen from the above technical solutions, in the embodiments of the present application, for a target data source (i.e., a data source having sensitive data) among a plurality of data sources (e.g., data tables), an inflow influence value and an outflow influence value corresponding to the target data source may be determined, and a target influence value corresponding to the target data source is determined based on the inflow influence value and the outflow influence value. The target influence value reflects the sensitivity level of the sensitive data in the target data source, so a protection level corresponding to the target data source is determined based on the target influence value, and the data in the target data source is then protected based on the protection level. In this way, sensitive data can be found among a large number of data sources and then protected, which prevents the sensitive data from being leaked and greatly improves the security of the data. The method quantitatively evaluates sensitive data, measures the sensitivity of the sensitive data in each data source, and allows data sources that may contain sensitive data to be discovered in time and protected with appropriate measures.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments of the present application or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments described in the present application, and other drawings can be obtained by those skilled in the art according to the drawings of the embodiments of the present application.
FIG. 1 is a schematic flow chart diagram illustrating a data protection method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart diagram illustrating a data protection method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of sensitive operations in one embodiment of the present application;
FIGS. 4A-4D are schematic diagrams of an analysis of a data source in one embodiment of the present application;
FIG. 5 is a schematic diagram of data flow between data sources in one embodiment of the present application;
FIG. 6 is a schematic diagram of a data protection device according to an embodiment of the present application;
FIG. 7 is a hardware configuration diagram of a database server according to an embodiment of the present application.
Detailed Description
The terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein is meant to encompass any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in the embodiments of the present application to describe various information, the information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. Moreover, depending on the context, the word "if" as used herein may be interpreted as "upon", "when" or "in response to determining".
The embodiment of the present application provides a data protection method, which may be applied to a database server, where the database server includes a plurality of data sources (each data source may be a data table), and each data source is used to store data. FIG. 1 is a flowchart of the method, and the method may include:
Step 101: for a target data source among the plurality of data sources, acquiring a first data source and a second data source from the plurality of data sources. Illustratively, first data in the first data source flows from the first data source to the target data source, and second data in the target data source flows from the target data source to the second data source.
Illustratively, obtaining the first data source and the second data source from a plurality of data sources may include, but is not limited to: receiving a first operation command, wherein the first operation command comprises data source information; determining an operation type corresponding to the first operation command, if the operation type is determined to belong to the sensitive operation based on the configured sensitive operation, determining a data source corresponding to the data source information as a first data source, and sending first data in the first data source to a target data source based on the first operation command. Receiving a second operation command, wherein the second operation command comprises data source information; and determining an operation type corresponding to the second operation command, if the operation type is determined to belong to the sensitive operation based on the configured sensitive operation, determining the data source corresponding to the data source information as a second data source, and sending second data in the target data source to the second data source based on the second operation command.
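The classification of operation commands described above can be sketched as follows. This is a minimal illustration under assumptions: the command structure `OperationCommand`, its field names, and the operation names in `SENSITIVE_OPERATIONS` are hypothetical and do not come from the application.

```python
from dataclasses import dataclass

# Hypothetical set of configured sensitive operation types.
SENSITIVE_OPERATIONS = {"copy_column", "export_table"}


@dataclass(frozen=True)
class OperationCommand:
    op_type: str      # operation type of the command
    source: str       # data source the data is read from
    destination: str  # data source the data is written to


def classify(commands, target):
    """Split the data sources touched by sensitive commands into
    first data sources (data flows into target) and second data sources
    (data flows out of target)."""
    first_sources, second_sources = set(), set()
    for cmd in commands:
        if cmd.op_type not in SENSITIVE_OPERATIONS:
            continue  # non-sensitive operations are not analyzed
        if cmd.destination == target:
            first_sources.add(cmd.source)
        elif cmd.source == target:
            second_sources.add(cmd.destination)
    return first_sources, second_sources
```

Commands whose operation type is not configured as sensitive are skipped, so they contribute neither a first nor a second data source.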
Step 102: determining an inflow influence value corresponding to the target data source based on the sensitive value of the first data.
For example, if the first data includes K columns of data in the first data source, where K is a positive integer, a column accumulated sensitive value corresponding to each column of data is determined, a table sensitive value corresponding to the first data source is determined, and an attenuation coefficient value corresponding to the operation type of the first data is determined; the inflow influence value corresponding to the target data source is then determined based on the column accumulated sensitive values respectively corresponding to the K columns of data, the table sensitive value and the attenuation coefficient value.
For example, the process of determining the column accumulated sensitive value corresponding to each column of data may include, but is not limited to: for each column of data, determining an amplification factor corresponding to the column of data based on the matching relationship between the column name corresponding to the column of data and the configured sensitive words; for each cell in the column of data, determining an initial sensitive value corresponding to the cell based on the matching relationship between the words in the cell and the configured sensitive words, and determining a target sensitive value corresponding to the cell based on the initial sensitive value and the amplification factor; and determining the column accumulated sensitive value based on the target sensitive value corresponding to each cell in the column of data.
For example, the process of determining the table sensitive value corresponding to the first data source may include, but is not limited to: for each column of the data source (namely, the first data source), determining an amplification factor corresponding to the column based on the matching relationship between the column name and the configured sensitive words; for each cell in each row of the data source, determining an initial sensitive value corresponding to the cell based on the matching relationship between the words in the cell and the configured sensitive words, and determining a target sensitive value corresponding to the cell based on the initial sensitive value and the amplification factor corresponding to the column where the cell is located; determining a row accumulated sensitive value corresponding to a row based on the target sensitive value corresponding to each cell in the row; and determining the table sensitive value based on the row accumulated sensitive values corresponding to all the rows of the data source.
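The column accumulated sensitive value and the table sensitive value can be sketched as below. The text does not fix the arithmetic, so several choices here are assumptions: exact lowercase word matching stands in for the sensitive-word matching, the amplification factor is 2.0 when a column name matches and 1.0 otherwise, the target sensitive value is the initial value multiplied by the factor, and accumulation is a plain sum. All names and numbers are hypothetical.

```python
# Assumed configuration: sensitive word -> sensitive value.
SENSITIVE_WORDS = {"password": 1.5, "phone": 1.0}
NAME_MATCH_FACTOR = 2.0  # assumed amplification factor for a matching column name


def amplification(column_name):
    """Amplification factor from the column name (assumed: 2.0 on match, else 1.0)."""
    return NAME_MATCH_FACTOR if column_name.lower() in SENSITIVE_WORDS else 1.0


def cell_target_value(text, factor):
    """Initial value = sum of values of matched words; target = initial * factor."""
    initial = sum(SENSITIVE_WORDS.get(word, 0.0) for word in text.lower().split())
    return initial * factor


def column_accumulated(table, column_name):
    """Sum of target sensitive values over every cell of one column."""
    factor = amplification(column_name)
    return sum(cell_target_value(row[column_name], factor) for row in table)


def table_sensitive(table):
    """Sum of row accumulated values, each row summing its cells' target values."""
    row_totals = [
        sum(cell_target_value(text, amplification(col)) for col, text in row.items())
        for row in table
    ]
    return sum(row_totals)
```

Here a table is modeled as a list of rows, each row being a dictionary from column name to cell text.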
Step 103: determining an outflow influence value corresponding to the target data source based on the sensitive value of the second data.
For example, if the second data includes P columns of data in the target data source, where P is a positive integer, a column accumulated sensitive value corresponding to each column of data is determined. A first table sensitive value corresponding to the target data source is determined, and a first attenuation coefficient value corresponding to the operation type used when the second data is sent from the target data source to the second data source is determined; a first influence value is then determined based on the column accumulated sensitive values respectively corresponding to the P columns of data, the first table sensitive value and the first attenuation coefficient value. A second table sensitive value corresponding to the second data source is determined, and a second attenuation coefficient value corresponding to the operation type used when the second data is sent from the second data source to a third data source is determined; a second influence value is then determined based on the column accumulated sensitive values respectively corresponding to the P columns of data, the second table sensitive value and the second attenuation coefficient value. The outflow influence value corresponding to the target data source is determined based on the first influence value and the second influence value.
For example, the process of determining the column accumulated sensitive value corresponding to each column of data may include, but is not limited to: for each column of data, determining an amplification factor corresponding to the column of data based on the matching relationship between the column name corresponding to the column of data and the configured sensitive words; for each cell in the column of data, determining an initial sensitive value corresponding to the cell based on the matching relationship between the words in the cell and the configured sensitive words, and determining a target sensitive value corresponding to the cell based on the initial sensitive value and the amplification factor; and determining the column accumulated sensitive value based on the target sensitive value corresponding to each cell in the column of data.
For example, the process of determining the first table sensitive value corresponding to the target data source, or the second table sensitive value corresponding to the second data source, may include, but is not limited to: for each column of the data source (namely, the target data source or the second data source), determining an amplification factor corresponding to the column based on the matching relationship between the column name and the configured sensitive words; for each cell in each row of the data source, determining an initial sensitive value corresponding to the cell based on the matching relationship between the words in the cell and the configured sensitive words, and determining a target sensitive value corresponding to the cell based on the initial sensitive value and the amplification factor corresponding to the column where the cell is located; determining a row accumulated sensitive value corresponding to a row based on the target sensitive value corresponding to each cell in the row; and determining the table sensitive value based on the row accumulated sensitive values corresponding to all the rows of the data source.
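The two-hop structure of the outflow influence value can be sketched as follows. The text only states which quantities each influence value is "based on", so the combining formulas here are assumptions: each hop multiplies the summed column accumulated sensitive values by the table sensitive value and the attenuation coefficient value, and the outflow influence value is the sum of the two hop values. The function names are hypothetical.

```python
def hop_influence(column_values, table_value, attenuation):
    """Influence carried by one hop.

    Assumed form: (sum of column accumulated sensitive values)
    * table sensitive value * attenuation coefficient value.
    """
    return sum(column_values) * table_value * attenuation


def outflow_influence(column_values, first_hop, second_hop):
    """Combine the first and second influence values (assumed: by addition).

    first_hop:  (table sensitive value of the target data source,
                 attenuation for target -> second data source)
    second_hop: (table sensitive value of the second data source,
                 attenuation for second -> third data source)
    """
    first = hop_influence(column_values, *first_hop)
    second = hop_influence(column_values, *second_hop)
    return first + second
```

Under the same assumptions, `hop_influence` could also serve for the inflow influence value of step 102, which uses the same three kinds of input for a single hop.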
Step 104: determining a target influence value corresponding to the target data source based on the inflow influence value and the outflow influence value, and determining a protection level corresponding to the target data source based on the target influence value.
For example, before determining the protection level corresponding to the target data source based on the target influence value, the table-sensitive value corresponding to the target data source may also be determined. On this basis, the protection level corresponding to the target data source is determined based on the target influence value, which may include but is not limited to: and determining the protection level corresponding to the target data source based on the target influence value corresponding to the target data source and the table sensitive value corresponding to the target data source.
For example, with respect to the table sensitive value corresponding to the target data source, the determination process of the table sensitive value may include, but is not limited to: aiming at each column of the target data source, determining an amplification factor corresponding to the column based on the matching relation between the column name and the configured sensitive word; aiming at each cell in each row of a target data source, determining an initial sensitive value corresponding to the cell based on the matching relation between the words in the cell and the configured sensitive words, and determining a target sensitive value corresponding to the cell based on the initial sensitive value and the amplification factor corresponding to the column where the cell is located; determining a row accumulated sensitive value corresponding to a row based on the target sensitive value corresponding to each cell in the row; and determining a table sensitivity value based on row accumulation sensitivity values corresponding to all rows of the target data source.
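One way to turn these quantities into a protection level is shown below. The text does not specify how the values are combined or how levels are delimited, so this sketch assumes that the target influence value is the sum of the inflow and outflow influence values, that it is added to the table sensitive value to form a score, and that fixed thresholds split the score into levels. The threshold numbers are hypothetical.

```python
from bisect import bisect_right

# Hypothetical score boundaries between protection levels 0, 1, 2 and 3.
LEVEL_THRESHOLDS = [10.0, 50.0, 200.0]


def protection_level(inflow, outflow, table_value):
    """Map influence and table sensitive values to a protection level.

    Assumptions: target influence = inflow + outflow;
    score = target influence + table sensitive value.
    """
    target_influence = inflow + outflow
    score = target_influence + table_value
    # Number of thresholds that the score has reached or exceeded.
    return bisect_right(LEVEL_THRESHOLDS, score)
```

A higher score yields a higher level, so a data source that both receives and propagates highly sensitive data ends up with stricter protection.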
Step 105: performing data protection on the data in the target data source based on the protection level.
For example, performing data protection on the data in the target data source based on the protection level may include, but is not limited to: if a data operation request for the target data source is received, where the data operation request includes operating user information, determining an access level corresponding to the operating user information; if it is determined, based on the access level and the protection level, that the operating user has the access right to the target data source, operating on the data in the target data source based on the data operation request; and if it is determined, based on the access level and the protection level, that the operating user does not have the access right to the target data source, prohibiting operations on the data in the target data source.
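The access check in step 105 can be sketched as follows. The comparison rule is an assumption: a user is treated as having the access right when the user's access level is at least the protection level of the target data source. The names `AccessDenied` and `handle_request` are hypothetical.

```python
class AccessDenied(Exception):
    """Raised when the operating user lacks the access right."""


def handle_request(user, target, access_levels, protection_levels, operation):
    """Run `operation` on the target data source only if the user may access it.

    Assumption: access is granted when access level >= protection level.
    Unknown users default to access level 0.
    """
    access = access_levels.get(user, 0)
    required = protection_levels[target]
    if access < required:
        raise AccessDenied(f"{user} needs level {required} for {target}")
    return operation()
```

For example, with `access_levels = {"alice": 3}` and `protection_levels = {"orders": 2}`, a request from "alice" on "orders" is executed, while a request from an unknown user raises `AccessDenied`.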
As can be seen from the above technical solutions, in the embodiments of the present application, for a target data source (i.e., a data source having sensitive data) among a plurality of data sources (e.g., data tables), an inflow influence value and an outflow influence value corresponding to the target data source may be determined, and a target influence value corresponding to the target data source is determined based on the inflow influence value and the outflow influence value. The target influence value reflects the sensitivity level of the sensitive data in the target data source, so a protection level corresponding to the target data source is determined based on the target influence value, and the data in the target data source is then protected based on the protection level. In this way, sensitive data can be found among a large number of data sources and then protected, which prevents the sensitive data from being leaked and greatly improves the security of the data. The method quantitatively evaluates sensitive data, measures the sensitivity of the sensitive data in each data source, and allows data sources that may contain sensitive data to be discovered in time and protected with appropriate measures.
The following describes a data protection method according to an embodiment of the present application with reference to a specific application scenario.
Before describing the technical solutions of the embodiments of the present application, technical terms related to the present application are described.
HBase database: the HBase database is a column-oriented (column-family-oriented) distributed database developed based on HDFS, and involves the following concepts. Data table: data in the HBase database is organized in data tables consisting of rows and columns; the columns are divided into several column families, and the intersection of a row and a column determines a cell. Row: a data table is composed of a plurality of rows, and each row has a row key as its unique identifier. Rows in a data table can be accessed in three ways: by a single row key, by a range of row keys, or by a full table scan. Column family: a data table is grouped into a collection of "column families", which are the basic unit of access control. Column qualifier: data in a column family is located by a column qualifier (or column). Cell: a cell is determined by a row, a column family and a column qualifier.
Sensitive data: among all the data in the database, some data must be prevented from being viewed by external systems; such data is sensitive data. At the data structure layer, sensitive data may be a field.
Lineage (blood relationship): information is transferred across data tables, which forms the lineage relationship among the data tables.
Decay value: for a data operation between data tables, the percentage of information remaining between the source end and the target end is defined as the decay value, and the value range of the decay value may be 0 to 1.
Influence: influence is bidirectional and includes received influence (influenced) and propagated influence (influence). Received influence answers the question "influenced by whom"; its value, calculated by means of the connected graph and the decay value, is the inflow influence value. Propagated influence answers the question "who is influenced"; its value, calculated by means of the connected graph and the decay value, is the outflow influence value. The inflow influence value and the outflow influence value are both influence values.
Fingerprint: during a log audit, the fingerprint is used to uniquely identify the identity of the visitor.
The embodiment of the application provides a data protection method, which can be applied to a database server, wherein the database server can comprise a plurality of data sources, and each data source is used for storing data. The plurality of data sources may be a plurality of data sources of the HBase database, or a plurality of data sources of other database types, and the type of the database is not limited as long as a large amount of data can be stored by the plurality of data sources.
For each of the plurality of data sources, the data source may store data using a data table, e.g., one data source corresponds to one data table. For each table, the table may be composed of rows and columns, the columns being divided into several column families, the coordinate crossings of rows and columns defining a cell.
In the embodiment of the application, a user can construct initial input values for sensitive data discovery (such as sensitive words, sensitive operations and sensitive relationships). A range to be evaluated is determined based on the initial input values, and the data within that range is evaluated qualitatively and quantitatively. Finally, the sensitive value, influence value and the like of each data source in the whole scanning range are calculated and used as the basis of risk evaluation for setting corresponding protection measures. In addition, a fingerprint comparison capability is provided for after-the-fact handling, so that abnormal behavior can be quickly identified and an alarm raised when it is found.
Referring to fig. 2, an embodiment of the present application provides a data protection method, where the method may include:
Step 201: configuring sensitive information. The sensitive information may be used as an initial input value and may include, but is not limited to, at least one of the following: sensitive words, sensitive operations, sensitive relationships.
For example, sensitive words (there may be a plurality of sensitive words) may be configured in advance, for example by a user. For each configured sensitive word, stemming and lemmatization may be performed on the sensitive word to obtain the stem corresponding to the sensitive word, for example through an NLP (natural language processing) library; this process is not limited. After the stem corresponding to each sensitive word is obtained, the stem may be stored. Subsequently, when a data table is processed, the word-vector similarity between the stem of the data in the data table and the stem corresponding to a sensitive word may be calculated; if the similarity is greater than a preset similarity threshold, the data in the data table matches the sensitive word.
Each configured sensitive word may correspond to one sensitive value. The sensitive values of different sensitive words may be the same or different; the sensitive value is not limited and may be configured by the user according to experience. For example, all sensitive words may share the same sensitive value, such as a default sensitive value of 1. As another example, the sensitive values may differ: sensitive words 1 to 10 may have a sensitive value of 1, sensitive words 11 to 20 a sensitive value of 1.2, sensitive words 21 to 30 a sensitive value of 0.8, and so on.
When the data in the data table is matched with the sensitive word, the sensitive value corresponding to the sensitive word is used as the sensitive value corresponding to the data, for example, when the sensitive value corresponding to the sensitive word is 0.8, the sensitive value corresponding to the data is 0.8.
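The matching of table data against configured sensitive words can be sketched as below. To stay dependency-free, this sketch substitutes simple stand-ins for the techniques named in the text: a crude suffix-stripping function replaces NLP-library stemming and lemmatization, and cosine similarity over character bigram counts replaces word-vector similarity. The word list, sensitive values and threshold are hypothetical.

```python
from collections import Counter
from math import sqrt

# Assumed configuration: sensitive word -> sensitive value.
SENSITIVE_VALUES = {"password": 1.5, "salary": 1.0}
SIMILARITY_THRESHOLD = 0.8  # assumed preset similarity threshold


def crude_stem(word):
    """Very rough stand-in for stemming: strip a few common English suffixes."""
    word = word.lower()
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def bigram_vector(stem):
    """Character bigram counts, used here in place of a learned word vector."""
    return Counter(stem[i:i + 2] for i in range(len(stem) - 1))


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def sensitive_value_of(word):
    """Sensitive value of the best-valued matching sensitive word, or 0.0 if none match."""
    vec = bigram_vector(crude_stem(word))
    best = 0.0
    for sensitive, value in SENSITIVE_VALUES.items():
        similarity = cosine(vec, bigram_vector(crude_stem(sensitive)))
        if similarity > SIMILARITY_THRESHOLD:
            best = max(best, value)
    return best
```

When a word in the table matches a sensitive word, it takes that word's sensitive value, as described above; a word that matches nothing contributes 0.0.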
For example, sensitive operations may be configured in advance, such as by a user (there may be a plurality of sensitive operations, and a sensitive operation is a user behavior). Many kinds of data operations may be directed at a data source; some of them may be selected as sensitive operations and configured as such.
Based on the configured sensitive operations, if a data operation for the data source is a sensitive operation, the data needs to be analyzed so as to protect the sensitive data; if the data operation for the data source is not a sensitive operation, the data does not need to be analyzed. The specific process is described in the following embodiments.
Fig. 3 shows examples of sensitive operations; the type of the sensitive operation is not limited and may be configured arbitrarily according to experience. For example, the sensitive operation may be a sensitive operation for a data table, such as querying, modifying, or deleting a data table. The sensitive operation may be a sensitive operation for a column (one or more columns) in the data table, such as querying, modifying, or deleting a column in the data table. The sensitive operation may be a sensitive operation for a row (one or more rows) in the data table, such as querying, modifying, or deleting a row in the data table.
For example, sensitive relationships may be configured in advance, such as by a user (there may be a plurality of sensitive relationships, and a sensitive relationship is a data conversion relationship). Many kinds of data conversion relationships may exist for a data source; some of them may be selected as sensitive relationships and configured as such. Based on the configured sensitive relationships, if a data conversion relationship for the data source is a sensitive relationship, the data needs to be analyzed so as to protect the sensitive data; if the data conversion relationship for the data source is not a sensitive relationship, the data does not need to be analyzed. The specific process is described in the following embodiments.
A sensitive relationship is a data conversion relationship between two data sources and is a key factor that directly determines whether sensitive data leaks. In this embodiment, therefore, a corresponding attenuation coefficient value (also referred to as a decay value) may be set for each sensitive relationship. The attenuation coefficient values corresponding to different sensitive relationships may be the same or different, and no limitation is imposed on the attenuation coefficient values.
Table 1 shows a few examples of sensitive relationships; the type of the sensitive relationship is not limited and may be configured arbitrarily according to experience. For example, sensitive relationships may include, but are not limited to, data transparent transmission (passthrough), data desensitization, data encryption, and the like. "Data transparent transmission" indicates that, when data is transmitted between two data sources, it is transmitted in a transparent transmission manner. "Data desensitization" indicates that, when data is transmitted between two data sources, it is transmitted in a desensitized manner. "Data encryption" indicates that, when data is transmitted between two data sources, it is transmitted in an encrypted manner.
TABLE 1
Sensitive relationship | Attenuation coefficient value (decay value)
Data transparent transmission | 0.85
Data desensitization | 0.5
Data encryption | 0.1
Referring to table 1, an attenuation coefficient value may be set for each sensitive relationship, and the attenuation coefficient value is not limited. For example, since data transmitted by "data transparent transmission" has low security, the corresponding attenuation coefficient value is large, such as 0.85. Since data subjected to "data desensitization" has moderate security, the corresponding attenuation coefficient value is moderate, such as 0.5. Since data subjected to "data encryption" has high security, the corresponding attenuation coefficient value is small, such as 0.1. Of course, the above values are merely examples of the attenuation coefficient value, and no limitation is made thereto.
Step 202, performing a sensitive scan. In the sensitive scanning process, a target data source (the target data source serves as a sensitive data source) needs to be determined, and a connectivity graph in a scanning range of the target data source is constructed.
For example, all data sources managed by the database server are determined, and the data sources may be located in one database or multiple databases, which is not limited to this. Referring to fig. 4A, it is assumed that all data sources managed by the database server include a data source a, a data source B, a data source C, a data source D, a data source E, a data source F, a data source G, a data source H, and a data source I. In an initial state, the data sources are all data sources to be scanned, and whether the data sources to be scanned are ordinary data sources or sensitive data sources needs to be analyzed.
For each data source, the data source may correspond to sensitive values of the following types:
1. The initial sensitive value corresponding to a cell can be recorded as $ColCellSens_{ID(i)}^{col(name)}$, i.e., the initial sensitive value (the cell sensitivity) corresponding to the cell whose primary key ID is i and whose column name is name.
For example, for each cell in the data source, if the data in the cell matches a sensitive word (that is, the similarity between the word vector of the data in the cell and the word vector of the sensitive word is greater than a preset similarity threshold), the sensitive value corresponding to the sensitive word may be used as the initial sensitive value corresponding to the cell, such as 1, 0.8, and the like, which is not limited thereto. If the data in the cell is not matched with all the sensitive words, the initial sensitive value corresponding to the cell may be 0, which indicates that the data in the cell is not a sensitive word.
2. The target sensitive value corresponding to a cell can be recorded as $ColSens_{ID(i)}^{col(name)}$, i.e., the target sensitive value corresponding to the cell whose primary key ID is i and whose column name is name. For example, the target sensitive value may be determined by: $ColSens_{ID(i)}^{col(name)} = zoom \times ColCellSens_{ID(i)}^{col(name)}$, where zoom represents the amplification factor corresponding to the column in which the cell is located. That is, the target sensitive value corresponding to the cell can be determined based on the amplification factor corresponding to the column in which the cell is located and the initial sensitive value corresponding to the cell.
For example, for each column in the data source, if the column name of the column matches a sensitive word (that is, the similarity between the word vector of the column name and the word vector of the sensitive word is greater than a preset similarity threshold), the amplification factor corresponding to the column may be a first value, and if the column name of the column does not match all the sensitive words, the amplification factor corresponding to the column may be a second value. The first value and the second value may be configured empirically, and the first value may be greater than the second value, for example, the second value may be 1, and the first value may be any value greater than the second value, such as 1.1, 1.2, 1.3, and the like, without limitation.
For each cell in the data source, the amplification factor corresponding to the column in which the cell is located and the initial sensitive value corresponding to the cell may be determined, and the target sensitive value corresponding to the cell may then be determined from these two values.
In this manner, the initial sensitive value corresponding to the cell is amplified by the amplification factor corresponding to the column in which the cell is located, which emphasizes the case in which the column name and the cell data both match sensitive words.
3. The column accumulated sensitive value can be recorded as $TotalColSens_{col(name)}$. For each column in the data source, a column accumulated sensitive value may be determined based on the target sensitive value corresponding to each cell in the column. For example, the sum of the target sensitive values corresponding to all cells in the column is used as the column accumulated sensitive value: $TotalColSens_{col(name)} = \sum ColSens_{ID(i)}^{col(name)}$
4. The row accumulated sensitive value can be recorded as $RowSens_{ID(i)}$. For each row in the data source, a row accumulated sensitive value may be determined based on the target sensitive value corresponding to each cell in the row. For example, the sum of the target sensitive values corresponding to all cells in the row is used as the row accumulated sensitive value: $RowSens_{ID(i)} = \sum_{all\_cols} ColSens_{ID(i)}^{col(name)}$
5. The table sensitive value can be recorded as $TableSens_{ID(tablename)}$. The table sensitive value may be determined based on the row accumulated sensitive values corresponding to all rows of the data source (i.e., the data table). For example, the sum of the row accumulated sensitive values corresponding to all rows of the data source is used as the table sensitive value: $TableSens_{ID(tablename)} = \sum_{all\_rows} RowSens_{ID(i)}$
6. The library sensitive value can be recorded as $DBSens_{ID(DBname)}$. For example, when the database includes a plurality of data tables, the sum of the table sensitive values of all the data tables may be used as the library sensitive value: $DBSens_{ID(DBname)} = \sum_{all\_tables} TableSens_{ID(tablename)}$
In summary, for each data source, an initial sensitive value and a target sensitive value corresponding to each cell in the data source may be determined, a column accumulated sensitive value corresponding to each column in the data source may be determined, a row accumulated sensitive value corresponding to each row in the data source may be determined, and a table sensitive value corresponding to the data source may be determined.
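The cell, column, row, and table values summarized above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the amplification factors 1.2 and 1.0 are example values taken from the ranges mentioned in the text, and a table is represented as a list of rows that map column names to target sensitive values.

```python
def target_sensitive_value(initial, column_name_is_sensitive,
                           first_value=1.2, second_value=1.0):
    """Target value = amplification factor (zoom) of the column * initial cell value."""
    zoom = first_value if column_name_is_sensitive else second_value
    return zoom * initial

def aggregate(table):
    """Return (column sums, row sums, table sum) of the target sensitive values.

    table: list of rows; each row maps a column name to the cell's target value.
    """
    column_sums = {}
    row_sums = []
    for row in table:
        row_sums.append(sum(row.values()))
        for name, value in row.items():
            column_sums[name] = column_sums.get(name, 0.0) + value
    return column_sums, row_sums, sum(row_sums)

# Two rows: "name" is a sensitive column name (zoom 1.2), "city" is not (zoom 1.0).
TABLE = [
    {"name": target_sensitive_value(1.0, True), "city": target_sensitive_value(0.0, False)},
    {"name": target_sensitive_value(1.0, True), "city": target_sensitive_value(0.8, False)},
]
COLUMN_SUMS, ROW_SUMS, TABLE_SUM = aggregate(TABLE)
```

A library sensitive value would then be the sum of the table sums over all data tables in the database.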
For example, since data in the data source may change frequently, the above process may be performed periodically, and in each period, the initial sensitive value and the target sensitive value corresponding to each cell in the data source, the column accumulated sensitive value corresponding to each column, the row accumulated sensitive value corresponding to each row, and the table sensitive value corresponding to the data source are determined.
For example, for each data source, if the table sensitivity value corresponding to the data source is 0, the data source is regarded as a normal data source (i.e., is not a sensitive data source), and if the table sensitivity value corresponding to the data source is not 0, the data source is regarded as a target data source (i.e., the target data source is regarded as a sensitive data source). Referring to fig. 4B, assuming that the table sensitive value corresponding to the data source a is 0, the table sensitive value corresponding to the data source B is 0, and the table sensitive value corresponding to the data source G is 0, the data source a, the data source B, and the data source G serve as common data sources. Assuming that the table sensitive value corresponding to the data source C is not 0, the table sensitive value corresponding to the data source D is not 0, the table sensitive value corresponding to the data source E is not 0, the table sensitive value corresponding to the data source F is not 0, the table sensitive value corresponding to the data source H is not 0, and the table sensitive value corresponding to the data source I is not 0, the data source C, the data source D, the data source E, the data source F, the data source H, and the data source I are used as target data sources (i.e., sensitive data sources).
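As a small sketch of this classification, the following splits data sources by whether their table sensitive value is 0. The function name is hypothetical, and the non-zero numbers are made-up placeholders; only the zero values for data sources A, B, and G come from the example above.

```python
def split_sources(table_sensitive_values):
    """Split data sources into (normal, target) lists by table sensitive value."""
    normal = sorted(n for n, v in table_sensitive_values.items() if v == 0)
    target = sorted(n for n, v in table_sensitive_values.items() if v != 0)
    return normal, target

# A, B, G are 0 as in the example; the other values are illustrative only.
VALUES = {"A": 0, "B": 0, "C": 3.2, "D": 1.6, "E": 0.8,
          "F": 2.0, "G": 0, "H": 1.2, "I": 0.4}
NORMAL, TARGET = split_sources(VALUES)
```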
For example, there may be data operation behavior between two data sources, i.e., data interaction operations involving the two data sources. If a data operation between the two data sources matches a configured sensitive operation (see fig. 3), the relationship between the two data sources is a sensitive operation; if none of the data operations between the two data sources matches a configured sensitive operation, the relationship between the two data sources is a normal operation. FIG. 4C illustrates the sensitive and normal operations between pairs of data sources. As can be seen from fig. 4C, a normal operation is performed between the data source A and the data source B, a normal operation is performed between the data source A and the data source C, a sensitive operation is performed between the data source C and the data source D, a normal operation is performed between the data source C and the data source H, a normal operation is performed between the data source C and the data source I, a sensitive operation is performed between the data source D and the data source E, a sensitive operation is performed between the data source D and the data source H, and a sensitive operation is performed between the data source H and the data source I.
For each target data source, a connectivity graph with the target data source as a starting point may also be constructed, as shown in fig. 4D, for the data source C, a sensitive operation exists between the data source C and the data source D, a sensitive operation exists between the data source D and the data source E, a sensitive operation exists between the data source D and the data source H, and a sensitive operation exists between the data source H and the data source I, so that the connectivity graph with the data source C as a starting point may include the data source D, the data source E, the data source H, and the data source I. Similarly, a connected graph with the data source D as a starting point, a connected graph with the data source E as a starting point, a connected graph with the data source F as a starting point, a connected graph with the data source H as a starting point, and a connected graph with the data source I as a starting point can be obtained.
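Constructing such a connectivity graph can be sketched as a breadth-first traversal over the sensitive operations. This is an illustration only: it assumes that the traversal follows the direction of data flow shown by the arrows in fig. 4C (the text does not state whether the graph is directed), and the function and variable names are hypothetical.

```python
from collections import deque

# Sensitive operations from fig. 4C as (source, destination) pairs.
SENSITIVE_EDGES = [("C", "D"), ("D", "E"), ("D", "H"), ("H", "I")]

def connected_graph(start, edges):
    """Return the data sources reachable from start via sensitive operations."""
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)
    reached, queue = [], deque([start])
    while queue:
        node = queue.popleft()
        for nxt in successors.get(node, []):
            if nxt != start and nxt not in reached:
                reached.append(nxt)
                queue.append(nxt)
    return reached
```

Under this assumption, starting from data source C the traversal reaches data sources D, E, H, and I, which agrees with the example above.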
Illustratively, when there is a sensitive operation between two data sources, it indicates that there is a mutual influence between the two data sources for the sensitive behavior, i.e. there is a need to calculate an influence value (e.g. an inflow influence value and an outflow influence value) between the two data sources, and when there is a normal operation between the two data sources, it indicates that there is no mutual influence between the two data sources for the sensitive behavior, i.e. there is no need to calculate an influence value between the two data sources.
Referring to FIG. 4C, the arrow from data source C to data source D indicates that there is a sensitive operation between data source C and data source D directed from data source C to data source D (and not the reverse), i.e., data flows from data source C to data source D. On this basis, when the data source C is used as a target data source, the data source D is a second data source corresponding to the target data source, and second data in the target data source flows out from the target data source to the second data source. When the data source D is used as a target data source, the data source C is a first data source corresponding to the target data source, and first data in the first data source flows from the first data source to the target data source.
Referring to FIG. 4C, an arrow from data source D to data source E indicates that data is flowing from data source D to data source E. When the data source D is used as a target data source, the data source E is a second data source, when the data source E is used as a target data source, the data source D is a first data source, and so on.
In summary, for each target data source, a first data source and a second data source corresponding to the target data source may be obtained from all the data sources, where first data in the first data source flows from the first data source to the target data source, and second data in the target data source flows from the target data source to the second data source. For example, when the target data source is data source C, the second data source corresponding to data source C is data source D, and data source C does not correspond to the first data source. When the target data source is the data source D, the second data source corresponding to the data source D is the data source E and the data source H, and the first data source corresponding to the data source D is the data source C. When the target data source is the data source E, the first data source corresponding to the data source E is the data source D, and the data source E does not correspond to the second data source, and so on, the first data source and the second data source corresponding to the target data source may be determined.
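Given the directed sensitive operations of fig. 4C, the first and second data sources of a target data source can be read off as its incoming and outgoing neighbors. The sketch below uses hypothetical names and represents each sensitive operation as a (source, destination) pair.

```python
# Sensitive operations from fig. 4C as (source, destination) pairs.
SENSITIVE_EDGES = [("C", "D"), ("D", "E"), ("D", "H"), ("H", "I")]

def first_and_second_sources(target, edges):
    """Return (first data sources, second data sources) of the target data source.

    First data sources send data into the target; second data sources receive
    data flowing out of the target.
    """
    first = [src for src, dst in edges if dst == target]
    second = [dst for src, dst in edges if src == target]
    return first, second
```

For data source D this yields data source C as the first data source and data sources E and H as the second data sources, as in the example above.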
In one possible implementation, for each target data source, the relationship between the target data source, the first data source, and the second data source may be learned as follows. A first operation command is received, wherein the first operation command comprises data source information. An operation type corresponding to the first operation command is determined; if the operation type is determined to belong to the sensitive operations based on the configured sensitive operations, the data source corresponding to the data source information is determined as a first data source, and first data in the first data source is sent to the target data source based on the first operation command. Similarly, a second operation command is received, wherein the second operation command comprises data source information. An operation type corresponding to the second operation command is determined; if the operation type is determined to belong to the sensitive operations based on the configured sensitive operations, the data source corresponding to the data source information is determined as a second data source, and second data in the target data source is sent to the second data source based on the second operation command.
For example, referring to fig. 4C, assuming that the target data source is the data source D, when a first operation command for the data source C is received (the first operation command is used to indicate that data in the data source C is sent to the data source D), and the first operation command includes data source information (e.g., information of the data source C), an operation type corresponding to the first operation command (i.e., an operation type of a data operation for the data source C) is determined.
If it is determined that the operation type is a sensitive operation based on a configured sensitive operation (see fig. 3), that is, a data operation for the data source C is a sensitive operation, determining the data source C corresponding to the data source information as a first data source corresponding to the data source D, and sending data (i.e., the first data, which may be at least one column of data of the data source C) in the data source C to the data source D based on the first operation command, which is not limited in this regard.
Based on the operation information, when the data source connection relationship shown in fig. 4C is generated, the connection relationship between the data source C and the data source D can be obtained, and the connection relationship can be a sensitive operation.
If it is determined that the operation type does not belong to the sensitive operation based on the configured sensitive operation, that is, the data operation for the data source C is not the sensitive operation, the data source C is not the first data source corresponding to the data source D, and the data in the data source C is sent to the data source D based on the first operation command. However, in generating the data source connection relationship shown in fig. 4C, the connection relationship between the data source C and the data source D is a normal operation.
For another example, referring to fig. 4C, assuming that the target data source is the data source D, when a second operation command for the data source D is received (the second operation command is used to indicate that data in the data source D is sent to the data source E), and the second operation command includes data source information (such as information of the data source E), an operation type corresponding to the second operation command (i.e., an operation type of a data operation for the data source D) is determined.
If the operation type is determined to be a sensitive operation, that is, the data operation for the data source D is a sensitive operation, determining the data source E corresponding to the data source information as a second data source corresponding to the data source D, and sending data (that is, the second data, which may be at least one column of data of the data source D) in the data source D to the data source E based on a second operation command. Based on the operation information, a connection relationship between the data source D and the data source E can be obtained, and the connection relationship can be a sensitive operation. Or, if it is determined that the operation type does not belong to a sensitive operation, that is, the data operation for the data source D is not a sensitive operation, the data source E is not a second data source corresponding to the data source D, and the data in the data source D is sent to the data source E based on the second operation command, but the connection relationship between the data source D and the data source E is a normal operation.
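The handling of an operation command can be sketched as follows. The command layout (a dictionary with "source", "destination", and "operation" keys), the operation names, and the function name are all hypothetical; the text only states that a command carries data source information and has an operation type that is checked against the configured sensitive operations.

```python
# Hypothetical configured sensitive operations (compare the examples of fig. 3).
SENSITIVE_OPERATIONS = {"query_column", "modify_column", "delete_row"}

def classify_command(command, sensitive_operations):
    """Return (source, destination, kind) for one operation command.

    kind is "sensitive" when the operation type is a configured sensitive
    operation, otherwise "normal". In both cases the data is still sent; only
    the recorded connection relationship differs.
    """
    kind = "sensitive" if command["operation"] in sensitive_operations else "normal"
    return command["source"], command["destination"], kind

FIRST_COMMAND = {"source": "C", "destination": "D", "operation": "query_column"}
OTHER_COMMAND = {"source": "A", "destination": "B", "operation": "count_rows"}
```

A command classified as sensitive makes its source a first data source of the destination, and its destination a second data source of the source.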
Step 203, determining a target influence value corresponding to each target data source (i.e. sensitive data source). For example, if the target data source corresponds to the first data source but does not correspond to the second data source, the inflow impact value corresponding to the target data source may be determined, and the target impact value corresponding to the target data source may be determined based on the inflow impact value, for example, the target impact value is the inflow impact value. Alternatively, if the target data source corresponds to the second data source but does not correspond to the first data source, the outflow impact value corresponding to the target data source may be determined, and the target impact value corresponding to the target data source may be determined based on the outflow impact value, for example, the target impact value is the outflow impact value. Alternatively, if the target data source corresponds to the first data source and the second data source, the inflow impact value corresponding to the target data source may be determined, the outflow impact value corresponding to the target data source may be determined, and the target impact value corresponding to the target data source may be determined based on the inflow impact value and the outflow impact value, for example, the target impact value is the sum of the inflow impact value and the outflow impact value.
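The three cases of step 203 can be sketched as follows. The function name is hypothetical, a missing first or second data source is represented by None, and returning None when neither exists is an assumption, since the text does not describe that case.

```python
from typing import Optional

def target_influence(inflow: Optional[float], outflow: Optional[float]) -> Optional[float]:
    """Combine the inflow and outflow influence values of a target data source.

    Only inflow -> inflow; only outflow -> outflow; both -> their sum.
    """
    if inflow is None and outflow is None:
        return None  # case not described in the text (assumption)
    return (inflow or 0.0) + (outflow or 0.0)
```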
Referring to fig. 4C, take the data source D as an example of a target data source: the data source C is a first data source, and the data source E and the data source H are second data sources. The first data in the data source C (hereinafter referred to as data m1) is sent from the data source C to the data source D, second data in the data source D (hereinafter referred to as data m2) is sent from the data source D to the data source E, and second data in the data source D (hereinafter referred to as data m3) is sent from the data source D to the data source H. On this basis:
1. If the data m1 includes K columns of data, where K is a positive integer, a column accumulated sensitive value corresponding to each column of data is determined, a table sensitive value corresponding to the data source C is determined, and an attenuation coefficient value corresponding to the operation type corresponding to the data m1 is determined. An inflow influence value corresponding to the data source D is then determined based on the column accumulated sensitive values corresponding to the K columns of data, the table sensitive value, and the attenuation coefficient value, and is recorded as the inflow influence value n1.
For each column of data m1, a column cumulative sensitivity value corresponding to the column of data may be determined, and for data source C, a table sensitivity value corresponding to data source C may be determined. For the determination of the column accumulated sensitivity value and the table sensitivity value, see step 202, and will not be repeated herein.
With respect to the attenuation coefficient value corresponding to the operation type corresponding to the data m1, when the data m1 is sent from the data source C to the data source D, the operation type corresponding to the data m1, for example, data transparent transmission, data desensitization, data encryption, or the like, can be determined, and then, the attenuation coefficient value (decay value) can be obtained by referring to table 1 based on the operation type, for example, the attenuation coefficient value corresponding to the data transparent transmission is 0.85, the attenuation coefficient value corresponding to the data desensitization is 0.5, and the attenuation coefficient value corresponding to the data encryption is 0.1.
The inflow influence value n1 can be determined based on the column accumulated sensitivity value, the table sensitivity value and the attenuation coefficient value corresponding to each of the K columns of data, and the above relationship can be expressed by the following formula:
Figure BDA0003386680870000161
In the above formula, the left-hand side represents the influence of the data source C on the data source D, that is, the inflow influence value n1; 0.85 represents the attenuation coefficient value; $sensCell\_num_T$ represents the sum of the column accumulated sensitive values corresponding to each column of data in the data m1; and $totalCell\_num_T$ represents the table sensitive value.
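The formula itself is an image that is not reproduced in this text, so its exact form cannot be confirmed here. The sketch below shows one plausible reading and should be treated purely as an assumption: the influence value is taken to be the attenuation coefficient value multiplied by the ratio of the summed column accumulated sensitive values to the table sensitive value. The function name, the zero-guard, and the sample numbers are hypothetical; the decay values are those of table 1.

```python
# Attenuation coefficient values from table 1.
DECAY = {
    "data transparent transmission": 0.85,
    "data desensitization": 0.5,
    "data encryption": 0.1,
}

def influence_value(column_accumulated_values, table_sensitive_value, relationship):
    """ASSUMED form: decay * (sum of column accumulated values / table sensitive value)."""
    if table_sensitive_value == 0:
        return 0.0  # guard added for the sketch; not described in the text
    ratio = sum(column_accumulated_values) / table_sensitive_value
    return DECAY[relationship] * ratio

# Made-up numbers: two transferred columns out of a table whose sensitive value is 6.4.
N1_EXAMPLE = influence_value([2.4, 0.8], 6.4, "data desensitization")
```

Under this assumed form, data sent with a stronger protection (a smaller decay value) contributes a smaller influence value, which is consistent with the ordering of the values in table 1.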
In one possible embodiment, it is assumed that the table structure of the data table in the data source C is shown in table 2:
TABLE 2
Figure BDA0003386680870000171
In table 2, the cell sensitive values and the column accumulated sensitive values can be obtained from the initially imported information and the data statistics, the sensitive value of the data table (the table sensitive value) can also be calculated, and the sensitive value of a single row can be calculated from the data and the sensitive values; for example, a row sensitive value can be calculated for the demo data.
In order to calculate the influence of an external data source (i.e., the first data source) on the information of the sensitive source (i.e., the target data source), the sensitive operations between the data sources are analyzed. The influence of the external data source on the sensitive source ultimately falls on certain columns of data and is included in the sensitive value calculation at a given moment. The external influence is evaluated in order to depict the outflow path, outflow amount, and outflow range of the sensitive information; therefore, a measured external influence refers to the influence within a certain time range.
Referring to FIG. 5, as an example of the data m1 sent from the data source C to the data source D, the data m1 may include columns such as "departure place", "departure time", and "arrival time". When the influence of the data source C on the data source D, i.e., the inflow influence value n1, needs to be measured, the inflow of sensitive data in a time period T may be analyzed. For example, the inflow data in the time range T is obtained; the inflow data is an instant table, see table 3. Based on the column accumulated sensitive value corresponding to each column in the inflow data (i.e., the data m1), the table sensitive value, and the attenuation coefficient value, the inflow influence value n1 may be determined, which is not described herein again.
TABLE 3
Figure BDA0003386680870000172
Figure BDA0003386680870000181
2. If the data m2 includes P1 columns of data, where P1 is a positive integer, a column accumulated sensitive value corresponding to each column of data is determined, a table sensitive value corresponding to the data source D is determined, and an attenuation coefficient value corresponding to the operation type corresponding to the data m2 is determined. An outflow influence value corresponding to the data source D is then determined based on the column accumulated sensitive values corresponding to the P1 columns of data, the table sensitive value, and the attenuation coefficient value, and is recorded as the outflow influence value n2. The outflow influence value n2 is the outflow influence value corresponding to the data m2 that flows from the data source D to the data source E.
For each column of data of the data m2, a column accumulated sensitive value corresponding to the column of data may be determined, and for the data source D, a table sensitive value corresponding to the data source D may be determined, and the determining manner is referred to in step 202, which is not repeated herein. Regarding the attenuation coefficient value corresponding to the operation type corresponding to the data m2, the attenuation coefficient value (decay value) can be obtained by looking up the table 1 according to the operation type, and will not be described again.
Illustratively, the outflow impact value n2 may be expressed by the following formula,
Figure BDA0003386680870000182
In the above formula, the left-hand side represents the outflow influence value n2 of the data source D on the data source E; 0.85 represents the attenuation coefficient value; $sensCell\_num_T$ represents the sum of the column accumulated sensitive values corresponding to each column of data in the data m2; and $totalCell\_num_T$ represents the table sensitive value.
Figure BDA0003386680870000183
3. If the data m3 includes P2 columns of data, where P2 is a positive integer, a column accumulated sensitive value corresponding to each column of data is determined, a first table sensitive value corresponding to the data source D is determined, and a first attenuation coefficient value corresponding to the operation type used when the data m3 is transmitted from the data source D to the data source H is determined. A first influence value corresponding to the data source D is then determined based on the column accumulated sensitive values corresponding to the P2 columns of data, the first table sensitive value, and the first attenuation coefficient value, and is recorded as the first influence value n3. The first influence value n3 is the outflow influence value corresponding to the data m3 that flows out from the data source D to the data source H. For each column of the data m3, the column accumulated sensitive value corresponding to the column is determined; for the data source D, the first table sensitive value corresponding to the data source D is determined; and for the operation type corresponding to the data m3, the attenuation coefficient value is obtained by querying table 1 according to the operation type. Illustratively, the first influence value n3 is represented by the following formula:
[Formula image BDA0003386680870000184 not reproduced in the text.]
where the result of the formula is the first influence value n3 of data source D on data source H, 0.85 represents the first attenuation coefficient value, sensCell_num_T represents the sum of the column accumulated sensitive values corresponding to each column of data in the data m3, and totalCell_num_T represents the first table sensitive value.
Referring to fig. 4C, assume that the data source H forwards the data m3 received from the data source D (i.e., the target data source) to the data source I, that is, the data m3 is sent from the data source H to the data source I. The data source I then serves as a third data source corresponding to the data source D. If the data source I in turn sends the data m3 to other data sources, those data sources also serve as third data sources corresponding to the data source D; the processing between the data source I and those data sources follows the processing between the data source H and the data source I and is not described in detail in this embodiment.
Illustratively, when the data m3 is sent from the data source H to the data source I, if the data m3 includes P2 columns of data, where P2 is a positive integer, a column accumulated sensitive value corresponding to each column of data is determined, a second table sensitive value corresponding to the data source H is determined, and a second attenuation coefficient value corresponding to the operation type used when the data m3 is sent from the data source H to the data source I is determined. A second influence value corresponding to the data source H, denoted as second influence value n4, is then determined based on the column accumulated sensitive values respectively corresponding to the P2 columns of data, the second table sensitive value and the second attenuation coefficient value. The second influence value n4 is the outflow influence value corresponding to the data m3 flowing out from the data source H to the data source I. For each column of the data m3, the column accumulated sensitive value corresponding to that column is determined; for the data source H, the second table sensitive value is determined; and for the operation type corresponding to the data m3, the attenuation coefficient value is obtained by querying Table 1 according to the operation type.
Illustratively, the second influence value n4 is represented by the following formula:
[Formula image BDA0003386680870000192 not reproduced in the text.]
where the result of the formula is the second influence value n4 of data source H on data source I, 0.85 represents the second attenuation coefficient value, sensCell_num_T represents the sum of the column accumulated sensitive values corresponding to each column of data in the data m3, and totalCell_num_T represents the second table sensitive value.
For example, after the first influence value n3 and the second influence value n4 are obtained, an outflow influence value corresponding to the data source D (i.e., the target data source), denoted as outflow influence value n5, may be determined based on them; the outflow influence value n5 may be the sum of the first influence value n3 and the second influence value n4.
In summary, for the data source D serving as the target data source, the inflow influence value n1, the outflow influence value n2 and the outflow influence value n5 can be obtained, and the target influence value corresponding to the data source D can be obtained based on these three values. For example, the target influence value may be the sum of the inflow influence value n1, the outflow influence value n2 and the outflow influence value n5, although it is not limited thereto.
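As a minimal sketch of how the values in this example combine, the following Python fragment may help. Because the formula images are not reproduced in this text, the form used in `impact_value` (attenuation coefficient multiplied by the ratio of the summed column accumulated sensitive values to the table sensitive value) is an assumption, as are the function names. Only the two summations (n5 = n3 + n4, and target influence value = n1 + n2 + n5) are taken directly from the description.

```python
def impact_value(column_sens_values, table_sens_value, attenuation=0.85):
    """Assumed form: attenuation * sum(column values) / table value.

    The exact formula is not recoverable from the text; this ratio is
    an illustrative assumption.
    """
    if table_sens_value <= 0:
        return 0.0  # assumption: a table with no sensitivity contributes nothing
    return attenuation * sum(column_sens_values) / table_sens_value


def target_impact_value(n1, n2, n3, n4):
    """Combine the impact values as described for data source D."""
    n5 = n3 + n4          # outflow via H and then I (multi-hop)
    return n1 + n2 + n5   # inflow + direct outflow + multi-hop outflow
```

Other ways of combining the values are possible, since the description states that the sum is only one option.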
Step 204, for each target data source (i.e. sensitive data source), determining a protection level corresponding to the target data source. For example, the protection level may be determined based on a target impact value corresponding to the target data source; alternatively, the protection level may be determined based on a table sensitivity value corresponding to the target data source; alternatively, the protection level may be determined based on a target impact value corresponding to the target data source and a table-sensitive value corresponding to the target data source.
The protection levels may be divided into at least two levels, for example 3, 4 or 5 levels, without limitation. Taking 3 levels as an example, the protection levels are denoted as protection level 1, protection level 2 and protection level 3, where protection level 1 corresponds to the numerical range [0, w1), protection level 2 corresponds to the numerical range [w1, w2), and protection level 3 corresponds to the numerical range [w2, +∞).
On this basis, if the protection level corresponding to the target data source is determined based on the target influence value corresponding to the target data source, if the target influence value is located in the numerical range [0, w1), it may be determined that the protection level corresponding to the target data source is the protection level 1, if the target influence value is located in the numerical range [ w1, w2), it may be determined that the protection level corresponding to the target data source is the protection level 2, and if the target influence value is located in the numerical range [ w2, + ∞), it may be determined that the protection level corresponding to the target data source is the protection level 3.
For another example, if the protection level corresponding to the target data source is determined based on the table sensitivity value corresponding to the target data source, if the table sensitivity value is located in the numerical range [0, w1), the protection level corresponding to the target data source may be determined to be the protection level 1, if the table sensitivity value is located in the numerical range [ w1, w2), the protection level corresponding to the target data source may be determined to be the protection level 2, and if the table sensitivity value is located in the numerical range [ w2, + ∞), the protection level corresponding to the target data source may be determined to be the protection level 3.
For another example, if the protection level corresponding to the target data source is determined based on both the target influence value and the table sensitive value corresponding to the target data source, a protection value corresponding to the target data source is first determined based on the target influence value and the table sensitive value, for example by the following formula: protection value = target influence value × w3 + table sensitive value × w4, where w3 is the weight coefficient corresponding to the target influence value, w4 is the weight coefficient corresponding to the table sensitive value, w3 and w4 are configured empirically, and the sum of w3 and w4 is 1. If the protection level should lean toward the target influence value, w3 is greater than w4; if it should lean toward the table sensitive value, w4 is greater than w3. On this basis, if the protection value is located in the numerical range [0, w1), the protection level corresponding to the target data source is determined to be protection level 1; if the protection value is located in the numerical range [w1, w2), the protection level is determined to be protection level 2; and if the protection value is located in the numerical range [w2, +∞), the protection level is determined to be protection level 3.
Of course, the above are only a few examples of determining the protection level, and the determination manner is not limited.
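The three-level mapping described for step 204 can be sketched as follows. The function names and the default weights are hypothetical; the weighted sum and the ranges [0, w1), [w1, w2) and [w2, +∞) follow the description above.

```python
def protection_value(target_impact, table_sens, w3=0.6, w4=0.4):
    """Weighted combination; w3 + w4 must equal 1 (default weights are illustrative)."""
    if abs(w3 + w4 - 1.0) > 1e-9:
        raise ValueError("w3 and w4 must sum to 1")
    return target_impact * w3 + table_sens * w4


def protection_level(value, w1, w2):
    """Map a non-negative value to protection level 1, 2 or 3."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if value < w1:
        return 1   # [0, w1)
    if value < w2:
        return 2   # [w1, w2)
    return 3       # [w2, +inf)
```

The same `protection_level` function can be applied to the target influence value alone, to the table sensitive value alone, or to the weighted protection value, matching the three alternatives described.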
Step 205, performing data protection on the data in the target data source based on the protection level.
In a possible implementation manner, if a data operation request for the target data source is received, where the data operation request includes operation user information, an access level corresponding to the operation user information is determined (for example, a correspondence between operation user information and access levels is configured in advance, and the access level may be access level 1, access level 2, access level 3, or the like). If it is determined, based on the access level and the protection level, that the operating user has the access right to the target data source (that is, the access level is greater than or equal to the protection level; for example, when the protection level is protection level 2, the access level is access level 2 or access level 3), the data in the target data source is operated based on the data operation request, that is, the operating user is allowed to access the data in the target data source. If it is determined, based on the access level and the protection level, that the operating user does not have the access right to the target data source (that is, the access level is less than the protection level; for example, when the protection level is protection level 2, the access level is access level 1), operation on the data in the target data source is prohibited, that is, the operating user is not allowed to access the data in the target data source, so that the data in the target data source is prevented from being leaked.
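A minimal sketch of this access check is given below. The comparison rule (access level greater than or equal to protection level) comes from the description; the function names, the dictionary of pre-configured access levels, and the treatment of unknown users as level 0 are illustrative assumptions.

```python
from typing import Dict


def has_access(access_level: int, protection_level: int) -> bool:
    """The user may operate on the data source only if access level >= protection level."""
    return access_level >= protection_level


def handle_request(user: str, protection_level: int,
                   access_levels: Dict[str, int]) -> str:
    """Look up the pre-configured access level and allow or deny the operation."""
    # Assumption: users without a configured level get level 0 (no access).
    access_level = access_levels.get(user, 0)
    return "allow" if has_access(access_level, protection_level) else "deny"
```

For a data source at protection level 2, a user configured with access level 2 or 3 is allowed, and a user with access level 1 is denied.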
In a possible implementation manner, a corresponding protection measure may be defined for each protection level, such as protection level 1, protection level 2 and protection level 3, so as to limit sensitive operations by external users; the protection measures are not limited and may be configured according to experience.
In one possible embodiment, since each target data source has an indicia of a level of protection, the user's data operations need to pass a level protection check to gain access to the legitimately accessed data.
In a possible implementation manner, a connected graph may be combined with the target influence value of each data source: when the outflow of sensitive information needs to be cut off, the highly sensitive point, namely the data source with the largest target influence value, can be found as quickly as possible and its data flow cut off in time, effectively preventing damage from information leakage. Alternatively, when the outflow of sensitive information needs to be cut off, the data source with the highest protection level can be found as quickly as possible and its data flow cut off in time, effectively preventing damage from information leakage.
In one possible implementation, sensitive tracing may also be performed. For example, based on the above-mentioned sensitive discovery and protection measures, it is also necessary to detect possible information leakage. When an external user accesses, accurate audit is carried out on each sensitive operation, fingerprint information of the user is collected, operation comparison is facilitated, a credible interval is established by means of long-term access habits of the user, abnormal operation is rapidly identified, and an alarm prompt is sent out.
Description of variables: range: the name of the accessed table and the accessed table data range; timestamp: the access time; IP: the IP of the access terminal; port: the access port; opera: the sensitive operation. On this basis, the fingerprint information can be represented as: fingerprint = hash(range, timestamp, IP, port, opera).
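The fingerprint expression above can be sketched as follows. The description does not name a hash function or a field encoding, so SHA-256 and the unit-separator join used here are assumptions chosen for illustration, and `fingerprint` is a hypothetical function name.

```python
import hashlib


def fingerprint(range_, timestamp, ip, port, opera):
    """Hash the five access attributes into one hex digest.

    SHA-256 and the separator are assumptions; the source only
    specifies hash(range, timestamp, IP, port, opera).
    """
    fields = [str(range_), str(timestamp), str(ip), str(port), str(opera)]
    # A separator avoids ambiguity between e.g. ("ab", "c") and ("a", "bc").
    payload = "\x1f".join(fields).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

Because the digest is deterministic, either party can recompute it from the recorded attributes and compare the result with the fingerprint returned in the response.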
After the user performs a sensitive operation, the response information contains the fingerprint information, which is a unique identifier corresponding to the operation. When the administrator or the user needs to check the authenticity of the operation, the fingerprint can be compared with the other party. The distribution of operations performed by all users can be regarded as obeying a certain probability distribution; assuming that user operations are random and obey a normal distribution, the abnormal behavior discovery steps are as follows:
1. Set an initial confidence level confidence_level (0-100%) and a behavior index behaviorIndex (which can also be user-defined) built from IP, port, timestamp, range and opera. These fields are concatenated, converted to binary ASCII code, and then converted to a decimal value, which is regarded as the user behavior index.
behaviorIndex=ToInt(ASCII(IP,port,timestamp,range,opera))
2. Based on the confidence level, the mean of the historical data, and the variance of the historical data, a confidence interval confidence_interval may be calculated. For example, the Z value corresponding to the confidence level can be found from the Z value table, and the confidence interval is then calculated from that value:
[Formula images BDA0003386680870000221, BDA0003386680870000222 and BDA0003386680870000223 not reproduced in the text.]
3. When a new access arrives, it is checked against the confidence interval; if it falls outside the interval, it is identified as abnormal and an alarm is raised.
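The three steps above can be sketched as follows. The formula images for the confidence interval are not reproduced in this text, so the interval used here (mean ± z × standard deviation of the historical behavior indices, with z taken from the standard normal distribution) is an assumption, as are all function names. The `behavior_index` helper follows the ToInt(ASCII(...)) expression by interpreting the concatenated ASCII bytes as one integer.

```python
from statistics import NormalDist, mean, pstdev


def behavior_index(ip, port, timestamp, range_, opera):
    """Concatenate the fields and read their ASCII bytes as one integer."""
    text = "".join(str(f) for f in (ip, port, timestamp, range_, opera))
    return int.from_bytes(text.encode("ascii"), "big")  # raises on non-ASCII input


def confidence_interval(history, confidence_level):
    """Assumed interval: mean +/- z * sigma for a two-sided confidence level."""
    if not 0.0 < confidence_level < 1.0:
        raise ValueError("confidence_level must be in (0, 1)")
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2.0)  # replaces the Z table lookup
    mu = mean(history)
    sigma = pstdev(history)
    return mu - z * sigma, mu + z * sigma


def is_abnormal(value, history, confidence_level=0.95):
    """Flag a new access whose index falls outside the confidence interval."""
    low, high = confidence_interval(history, confidence_level)
    return not (low <= value <= high)
```

Real behavior indices built from full strings are very large integers, so a practical system would likely need to scale or otherwise normalize them before computing statistics; that step is outside what the description specifies.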
According to the above technical solution, the protection level corresponding to the target data source can be determined based on the target influence value, and data protection is then performed on the data in the target data source based on the protection level, so that sensitive data can be found among a large number of data sources and protected, leakage of the sensitive data can be prevented, and data security is greatly improved. The method can quantitatively evaluate sensitive data, measure the sensitivity of the sensitive data in each data source, discover potentially sensitive data sources in time, and protect them with appropriate protective measures. The fingerprint of each external data operation is recorded and accurately audited, so that the server side can quickly investigate information leakage, and the user can use the returned fingerprint information to verify the authenticity of a service operation without the sensitive information of the operation being leaked.
Based on the same application concept as the method, an embodiment of the present application provides a data protection apparatus, which is applied to a database server, where the database server includes a plurality of data sources, and each data source is used to store data, as shown in fig. 6, which is a schematic structural diagram of the apparatus, and the apparatus may include:
an obtaining module 61, configured to obtain, for a target data source of a plurality of data sources, a first data source and a second data source from the plurality of data sources; wherein first data in the first data source flows from the first data source to the target data source, and second data in the target data source flows from the target data source to the second data source;
a determining module 62, configured to determine an inflow influence value corresponding to the target data source based on the sensitive value of the first data; determining an outflow influence value corresponding to the target data source based on the sensitive value of the second data; determining a target influence value corresponding to the target data source based on the inflow influence value and the outflow influence value, and determining a protection level corresponding to the target data source based on the target influence value;
and the processing module 63 is configured to perform data protection on the data in the target data source based on the protection level.
For example, the obtaining module 61 is specifically configured to, when obtaining the first data source and the second data source from a plurality of data sources: receiving a first operation command, wherein the first operation command comprises data source information; determining an operation type corresponding to the first operation command, if the operation type is determined to belong to the sensitive operation based on the configured sensitive operation, determining a data source corresponding to the data source information as a first data source, and sending first data in the first data source to the target data source based on the first operation command; receiving a second operation command, wherein the second operation command comprises data source information; and determining an operation type corresponding to the second operation command, if the operation type is determined to belong to the sensitive operation based on the configured sensitive operation, determining the data source corresponding to the data source information as a second data source, and sending second data in the target data source to the second data source based on the second operation command.
For example, the determining module 62 is specifically configured to, when determining the inflow influence value corresponding to the target data source based on the sensitive value of the first data: if the first data comprises K columns of data in the first data source, where K is a positive integer, determine a column accumulated sensitive value corresponding to each column of data in the K columns of data, determine a table sensitive value corresponding to the first data source, and determine an attenuation coefficient value corresponding to the operation type corresponding to the first data; and determine the inflow influence value corresponding to the target data source based on the column accumulated sensitive values respectively corresponding to the K columns of data, the table sensitive value and the attenuation coefficient value.
For example, the determining module 62 is specifically configured to, when determining the outflow influence value corresponding to the target data source based on the sensitive value of the second data: if the second data comprises P columns of data in the target data source, determine a column accumulated sensitive value corresponding to each column of data; determine a first table sensitive value corresponding to the target data source, and determine a first attenuation coefficient value corresponding to the operation type when the second data is sent from the target data source to the second data source; determine a first influence value based on the column accumulated sensitive values respectively corresponding to the P columns of data, the first table sensitive value and the first attenuation coefficient value; determine a second table sensitive value corresponding to the second data source, and determine a second attenuation coefficient value corresponding to the operation type when the second data is sent from the second data source to a third data source; determine a second influence value based on the column accumulated sensitive values respectively corresponding to the P columns of data, the second table sensitive value and the second attenuation coefficient value; and determine the outflow influence value corresponding to the target data source based on the first influence value and the second influence value.
Illustratively, the determining module 62 is further configured to determine a table-sensitive value corresponding to the target data source; the determining module 62 is specifically configured to, when determining the protection level corresponding to the target data source based on the target influence value: and determining a protection level corresponding to the target data source based on the target influence value and the table sensitivity value.
For example, when determining the column accumulated sensitive value, the determining module 62 is specifically configured to: determine an amplification factor corresponding to the column data based on the matching relationship between the column name corresponding to the column data and the configured sensitive words; for each cell in the column data, determine an initial sensitive value corresponding to the cell based on the matching relationship between the words in the cell and the configured sensitive words, and determine a target sensitive value corresponding to the cell based on the initial sensitive value and the amplification factor; and determine the column accumulated sensitive value based on the target sensitive value corresponding to each cell in the column data. When determining the table sensitive value, the determining module 62 is specifically configured to: for each column of the data source, determine an amplification factor corresponding to the column based on the matching relationship between the column name and the configured sensitive words; for each cell in each row of the data source, determine an initial sensitive value corresponding to the cell based on the matching relationship between the words in the cell and the configured sensitive words, and determine a target sensitive value corresponding to the cell based on the initial sensitive value and the amplification factor corresponding to the column where the cell is located; determine a row accumulated sensitive value corresponding to a row based on the target sensitive value corresponding to each cell in that row of the data source; and determine the table sensitive value based on the row accumulated sensitive values corresponding to all the rows of the data source.
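The determination of the column accumulated sensitive value and the table sensitive value can be sketched as follows. The description fixes only the order of the steps; the concrete choices here (substring matching, one point per matched sensitive word, an amplification factor of 2.0 for matching column names, multiplication and summation as the combining operations) and all names are illustrative assumptions.

```python
def amplification_factor(column_name, sensitive_words, boosted=2.0):
    """Columns whose name matches a sensitive word get a larger factor (assumed 2.0)."""
    name = column_name.lower()
    return boosted if any(w in name for w in sensitive_words) else 1.0


def initial_sensitive_value(cell, sensitive_words):
    """Assumed scoring: one point per configured sensitive word found in the cell."""
    text = str(cell).lower()
    return float(sum(1 for w in sensitive_words if w in text))


def column_accumulated_value(column_name, cells, sensitive_words):
    """Sum of target sensitive values (initial value * factor) over one column."""
    factor = amplification_factor(column_name, sensitive_words)
    return sum(initial_sensitive_value(c, sensitive_words) * factor for c in cells)


def table_sensitive_value(table, sensitive_words):
    """Sum of row accumulated values; table maps column name -> list of cells."""
    n_rows = len(next(iter(table.values()), []))
    total = 0.0
    for r in range(n_rows):
        # Row accumulated value: target sensitive values of all cells in row r.
        total += sum(
            initial_sensitive_value(cells[r], sensitive_words)
            * amplification_factor(name, sensitive_words)
            for name, cells in table.items()
        )
    return total
```

With these particular choices the table sensitive value equals the sum of the column accumulated values over all columns; a different scoring or combining rule would not necessarily preserve that relationship.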
For example, when the processing module 63 performs data protection on the data in the target data source based on the protection level, specifically, the processing module is configured to: if a data operation request aiming at the target data source is received, wherein the data operation request comprises operation user information, determining an access level corresponding to the operation user information; if it is determined that the operating user has the access right of the target data source based on the access level and the protection level, operating the data in the target data source based on the data operation request; or if it is determined that the operating user does not have the access right of the target data source based on the access level and the protection level, the data in the target data source is prohibited from being operated.
Based on the same application concept as the method described above, an embodiment of the present application provides a database server, and as shown in fig. 7, the database server may include: a processor 71 and a machine-readable storage medium 72, the machine-readable storage medium 72 storing machine-executable instructions executable by the processor 71; the processor 71 is configured to execute machine-executable instructions to implement the data protection methods disclosed in the above examples of the present application.
Based on the same application concept as the method, embodiments of the present application further provide a machine-readable storage medium, where several computer instructions are stored on the machine-readable storage medium, and when the computer instructions are executed by a processor, the data protection method disclosed in the above example of the present application can be implemented.
The machine-readable storage medium may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions, data, and the like. For example, the machine-readable storage medium may be: a RAM (Random Access Memory), a volatile memory, a non-volatile memory, a flash memory, a storage drive (e.g., a hard drive), a solid state drive, any type of storage disk (e.g., an optical disk, a DVD, etc.), or a similar storage medium, or a combination thereof.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided into various units by function, and the units are described separately. Of course, when implementing the present application, the functionality of the units may be implemented in one or more pieces of software and/or hardware.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Furthermore, these computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. A data protection method, applied to a database server, wherein the database server comprises a plurality of data sources and each data source is used for storing data, the method comprising:
for a target data source in the plurality of data sources, acquiring a first data source and a second data source from the plurality of data sources; wherein first data in the first data source flows from the first data source to the target data source, and second data in the target data source flows from the target data source to the second data source;
determining an inflow influence value corresponding to the target data source based on the sensitive value of the first data;
determining an outflow influence value corresponding to the target data source based on the sensitive value of the second data;
determining a target influence value corresponding to the target data source based on the inflow influence value and the outflow influence value, and determining a protection level corresponding to the target data source based on the target influence value;
and performing data protection on the data in the target data source based on the protection level.
2. The method of claim 1,
the acquiring a first data source and a second data source from a plurality of data sources comprises:
receiving a first operation command, wherein the first operation command comprises data source information; determining an operation type corresponding to the first operation command, if the operation type is determined to belong to the sensitive operation based on the configured sensitive operation, determining a data source corresponding to the data source information as a first data source, and sending first data in the first data source to the target data source based on the first operation command;
receiving a second operation command, wherein the second operation command comprises data source information; and determining an operation type corresponding to the second operation command, if the operation type is determined to belong to the sensitive operation based on the configured sensitive operation, determining the data source corresponding to the data source information as a second data source, and sending second data in the target data source to the second data source based on the second operation command.
3. The method of claim 1, wherein determining the inflow impact value corresponding to the target data source based on the sensitive value of the first data comprises:
if the first data comprises K columns of data in the first data source, wherein K is a positive integer, determining a column accumulated sensitive value corresponding to each column of data, determining a table sensitive value corresponding to the first data source, and determining an attenuation coefficient value corresponding to an operation type corresponding to the first data;
and determining inflow influence values corresponding to the target data source based on the column accumulated sensitive values, the table sensitive values and the attenuation coefficient values respectively corresponding to the K columns of data.
4. The method of claim 1, wherein determining the corresponding outflow impact value for the target data source based on the sensitive value of the second data comprises:
if the second data comprises P columns of data in the target data source, determining a column accumulated sensitive value corresponding to each column of data; determining a first table sensitive value corresponding to the target data source, and determining a first attenuation coefficient value corresponding to the operation type when the second data is sent from the target data source to the second data source; determining a first influence value based on the column accumulated sensitive values respectively corresponding to the P columns of data, the first table sensitive value and the first attenuation coefficient value;
determining a second table sensitive value corresponding to the second data source, and determining a second attenuation coefficient value corresponding to the operation type when the second data is sent from the second data source to a third data source; determining a second influence value based on the column accumulated sensitive values respectively corresponding to the P columns of data, the second table sensitive value and the second attenuation coefficient value;
and determining an outflow influence value corresponding to the target data source based on the first influence value and the second influence value.
5. The method of claim 1,
before determining the protection level corresponding to the target data source based on the target influence value, the method further comprises: determining a table sensitive value corresponding to the target data source;
and determining the protection level corresponding to the target data source based on the target influence value comprises: determining the protection level corresponding to the target data source based on the target influence value and the table sensitive value.
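One possible way to turn the two values of this claim into a protection level is sketched below. The additive score, the threshold values in `LEVEL_THRESHOLDS`, and the three-level scale are assumptions made for illustration; the claim does not specify how the level is derived.

```python
from bisect import bisect_right

# ASSUMPTION: hypothetical score boundaries between protection levels.
LEVEL_THRESHOLDS = (10.0, 100.0)


def protection_level(target_influence: float, table_value: float) -> int:
    """Map the target influence value and the table sensitive value of the
    target data source to a protection level (1 = lowest, 3 = highest)."""
    # ASSUMPTION: the two values are combined by addition.
    score = target_influence + table_value
    # Count how many thresholds the score has reached.
    return bisect_right(LEVEL_THRESHOLDS, score) + 1
```

Under these assumed thresholds, a data source that is only mildly sensitive itself can still receive a high level if sensitive data flows through it.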
6. The method according to any one of claims 3 to 5,
the process of determining the column accumulated sensitive value comprises: determining an amplification factor corresponding to the column data based on the matching relationship between the column name of the column data and the configured sensitive words; for each cell in the column data, determining an initial sensitive value corresponding to the cell based on the matching relationship between the words in the cell and the configured sensitive words, and determining a target sensitive value corresponding to the cell based on the initial sensitive value and the amplification factor; and determining the column accumulated sensitive value based on the target sensitive values corresponding to the cells in the column data;
the process of determining the table sensitive value comprises: for each column of the data source, determining an amplification factor corresponding to the column based on the matching relationship between the column name and the configured sensitive words; for each cell in each row of the data source, determining an initial sensitive value corresponding to the cell based on the matching relationship between the words in the cell and the configured sensitive words, and determining a target sensitive value corresponding to the cell based on the initial sensitive value and the amplification factor corresponding to the column in which the cell is located; determining a row accumulated sensitive value corresponding to each row based on the target sensitive values corresponding to the cells in the row; and determining the table sensitive value based on the row accumulated sensitive values corresponding to all rows of the data source.
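The two scoring procedures of this claim can be illustrated with the sketch below. The word list `SENSITIVE_WORDS`, the factor `NAME_AMPLIFICATION`, substring matching, multiplication of the initial value by the amplification factor, and plain summation are all assumptions; the claim leaves the matching rule and the arithmetic open.

```python
from typing import Sequence

# ASSUMPTION: hypothetical configured sensitive words with initial values.
SENSITIVE_WORDS = {"password": 5.0, "phone": 2.0}
# ASSUMPTION: hypothetical factor applied when a column name matches.
NAME_AMPLIFICATION = 2.0


def amplification(column_name: str) -> float:
    """Amplification factor from matching the column name against the words."""
    name = column_name.lower()
    return NAME_AMPLIFICATION if any(w in name for w in SENSITIVE_WORDS) else 1.0


def initial_value(cell: object) -> float:
    """Initial sensitive value of a cell from the words it contains."""
    text = str(cell).lower()
    return sum(v for w, v in SENSITIVE_WORDS.items() if w in text)


def column_accumulated(column_name: str, cells: Sequence[object]) -> float:
    """Sum of the target sensitive values of all cells in one column."""
    factor = amplification(column_name)
    return sum(initial_value(c) * factor for c in cells)


def table_sensitive_value(columns: Sequence[str],
                          rows: Sequence[Sequence[object]]) -> float:
    """Sum of the row accumulated sensitive values over all rows."""
    factors = [amplification(c) for c in columns]
    row_totals = [
        sum(initial_value(cell) * f for cell, f in zip(row, factors))
        for row in rows
    ]
    return sum(row_totals)
```

For example, with columns `["user", "password_hint"]`, only the second column name matches a configured word, so every cell in that column has its initial value doubled.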
7. The method of claim 1,
the data protection of the data in the target data source based on the protection level includes:
if a data operation request for the target data source is received, the data operation request comprising operating user information, determining an access level corresponding to the operating user information;
if it is determined, based on the access level and the protection level, that the operating user has access rights to the target data source, operating on the data in the target data source based on the data operation request;
and if it is determined, based on the access level and the protection level, that the operating user does not have access rights to the target data source, prohibiting operations on the data in the target data source.
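The access check of this claim might look like the following sketch. The user table `USER_ACCESS_LEVELS`, the rule that access is granted when the access level is at least the protection level, and the names `handle_request` and `AccessDenied` are assumptions for illustration.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

# ASSUMPTION: hypothetical mapping from operating user to access level.
USER_ACCESS_LEVELS = {"alice": 3, "bob": 1}


class AccessDenied(Exception):
    """Raised when the operating user may not operate on the data source."""


def handle_request(user: str, protection_level: int,
                   operation: Callable[[], T]) -> T:
    """Run the requested operation only if the user's access level permits it."""
    access_level = USER_ACCESS_LEVELS.get(user, 0)  # unknown users get level 0
    # ASSUMPTION: access is allowed when access level >= protection level.
    if access_level >= protection_level:
        return operation()
    raise AccessDenied(f"user {user!r} may not access this data source")
```

Here `handle_request("alice", 2, query)` would run `query`, while the same call for `"bob"` would raise `AccessDenied`.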
8. A data protection apparatus, for use in a database server, the database server including a plurality of data sources, each data source for storing data, the apparatus comprising:
an acquisition module, configured to acquire, for a target data source among the plurality of data sources, a first data source and a second data source from the plurality of data sources; wherein first data in the first data source flows from the first data source to the target data source, and second data in the target data source flows from the target data source to the second data source;
a determining module, configured to determine an inflow influence value corresponding to the target data source based on the sensitive value of the first data; determine an outflow influence value corresponding to the target data source based on the sensitive value of the second data; determine a target influence value corresponding to the target data source based on the inflow influence value and the outflow influence value; and determine a protection level corresponding to the target data source based on the target influence value;
and a processing module, configured to perform data protection on the data in the target data source based on the protection level.
9. The apparatus of claim 8,
the acquisition module is specifically configured to, when acquiring the first data source and the second data source from the plurality of data sources: receiving a first operation command, wherein the first operation command comprises data source information; determining an operation type corresponding to the first operation command, and if the operation type is determined, based on the configured sensitive operations, to be a sensitive operation, determining the data source corresponding to the data source information as the first data source, and sending first data in the first data source to the target data source based on the first operation command; receiving a second operation command, wherein the second operation command comprises data source information; determining an operation type corresponding to the second operation command, and if the operation type is determined, based on the configured sensitive operations, to be a sensitive operation, determining the data source corresponding to the data source information as the second data source, and sending second data in the target data source to the second data source based on the second operation command;
the determining module is specifically configured to, when determining the inflow influence value corresponding to the target data source based on the sensitive value of the first data: if the first data comprises K columns of data in the first data source, where K is a positive integer, determining a column accumulated sensitive value corresponding to each of the K columns of data, determining a table sensitive value corresponding to the first data source, and determining an attenuation coefficient value corresponding to the operation type of the first data; determining the inflow influence value corresponding to the target data source based on the column accumulated sensitive values respectively corresponding to the K columns of data, the table sensitive value, and the attenuation coefficient value;
the determining module is specifically configured to, when determining the outflow influence value corresponding to the target data source based on the sensitive value of the second data: if the second data comprises P columns of data in the target data source, determining a column accumulated sensitive value corresponding to each of the P columns of data; determining a first table sensitive value corresponding to the target data source, and determining a first attenuation coefficient value corresponding to the operation type used when the second data is sent from the target data source to the second data source; determining a first influence value based on the column accumulated sensitive values respectively corresponding to the P columns of data, the first table sensitive value, and the first attenuation coefficient value; determining a second table sensitive value corresponding to the second data source, and determining a second attenuation coefficient value corresponding to the operation type used when the second data is sent from the second data source to a third data source; determining a second influence value based on the column accumulated sensitive values respectively corresponding to the P columns of data, the second table sensitive value, and the second attenuation coefficient value; determining the outflow influence value corresponding to the target data source based on the first influence value and the second influence value;
the determining module is further configured to determine a table sensitive value corresponding to the target data source; the determining module is specifically configured to, when determining the protection level corresponding to the target data source based on the target influence value: determining the protection level corresponding to the target data source based on the target influence value and the table sensitive value;
the determining module is specifically configured to, in the process of determining the column accumulated sensitive value: determining an amplification factor corresponding to the column data based on the matching relationship between the column name of the column data and the configured sensitive words; for each cell in the column data, determining an initial sensitive value corresponding to the cell based on the matching relationship between the words in the cell and the configured sensitive words, and determining a target sensitive value corresponding to the cell based on the initial sensitive value and the amplification factor; determining the column accumulated sensitive value based on the target sensitive values corresponding to the cells in the column data; wherein the determining module is specifically configured to, in the process of determining the table sensitive value: for each column of the data source, determining an amplification factor corresponding to the column based on the matching relationship between the column name and the configured sensitive words; for each cell in each row of the data source, determining an initial sensitive value corresponding to the cell based on the matching relationship between the words in the cell and the configured sensitive words, and determining a target sensitive value corresponding to the cell based on the initial sensitive value and the amplification factor corresponding to the column in which the cell is located; determining a row accumulated sensitive value corresponding to each row based on the target sensitive values corresponding to the cells in the row; determining the table sensitive value based on the row accumulated sensitive values corresponding to all rows of the data source;
the processing module is specifically configured to, when performing data protection on the data in the target data source based on the protection level: if a data operation request for the target data source is received, the data operation request comprising operating user information, determining an access level corresponding to the operating user information; if it is determined, based on the access level and the protection level, that the operating user has access rights to the target data source, operating on the data in the target data source based on the data operation request; or if it is determined, based on the access level and the protection level, that the operating user does not have access rights to the target data source, prohibiting operations on the data in the target data source.
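The acquisition step recited in this claim, in which a data source is recorded as a flow endpoint only when the operation type is a configured sensitive operation, could be sketched as follows. The set `SENSITIVE_OPERATIONS`, the `OperationCommand` structure, and the function `flow_endpoint` are hypothetical; the claim does not define the command format or list the sensitive operations.

```python
from dataclasses import dataclass
from typing import Optional

# ASSUMPTION: hypothetical configured set of sensitive operation types.
SENSITIVE_OPERATIONS = {"copy", "export", "merge"}


@dataclass
class OperationCommand:
    operation_type: str  # e.g. "copy"
    data_source: str     # the data source information carried by the command


def flow_endpoint(command: OperationCommand) -> Optional[str]:
    """Return the data source named in the command if its operation type is
    configured as sensitive; otherwise return None (no flow is recorded)."""
    if command.operation_type.lower() in SENSITIVE_OPERATIONS:
        return command.data_source
    return None
```

Applied to a first operation command, a non-`None` result would identify the first data source; applied to a second operation command, it would identify the second data source.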
10. A database server, comprising: a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor; the processor is configured to execute machine executable instructions to implement the method steps of any of claims 1-7.
CN202111452394.0A 2021-12-01 2021-12-01 Data protection method, device and equipment Pending CN114329581A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111452394.0A CN114329581A (en) 2021-12-01 2021-12-01 Data protection method, device and equipment

Publications (1)

Publication Number Publication Date
CN114329581A 2022-04-12

Family

ID=81048610

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111452394.0A Pending CN114329581A (en) 2021-12-01 2021-12-01 Data protection method, device and equipment

Country Status (1)

Country Link
CN (1) CN114329581A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination