CN109308264B - Method for evaluating data desensitization effect, corresponding device and storage medium - Google Patents
Method for evaluating data desensitization effect, corresponding device and storage medium
- Publication number
- CN109308264B (application CN201811229680.9A)
- Authority
- CN
- China
- Prior art keywords
- data
- dimension
- vector space
- data set
- space model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3668—Software testing
- G06F11/3672—Test management
- G06F11/3688—Test management for test execution, e.g. scheduling of test suites
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
Abstract
The invention discloses a method for evaluating a data desensitization effect, a corresponding device and a storage medium. The evaluation method comprises the following steps: dividing each piece of data in a target data set into a plurality of dimensions according to the attribute domain of the target data set; mapping each piece of data to a space point of a pre-constructed vector space model according to the dimension value of each dimension; determining the regularity of the space points in the vector space model; and evaluating the sensitivity level of the target data set according to the regularity. The invention thereby evaluates the data desensitization effect effectively: every piece of data is examined during the evaluation, no manual inspection of the desensitization effect is needed, and the sensitivity level of the target data set is evaluated by a determined rule, so that the measurement is unified, the judgment rules are consistent, and potentially sensitive data can be effectively identified.
Description
Technical Field
The invention relates to the technical field of data security, in particular to a method for evaluating a data desensitization effect, corresponding equipment and a storage medium.
Background
Sensitive data requires special protection because it contains sensitive information. However, if sensitive data cannot be fully utilized, it cannot deliver its intrinsic value. Protecting the sensitive information in such data while still fully mining the value of the data has therefore become a new requirement. Data desensitization removes the sensitive information while preserving the availability of the data to the greatest possible extent.
In the prior art, the desensitization effect of desensitized data is usually evaluated by spot-checking the data. Spot checks perform poorly, however, and have, for example, the following defects:
1. Not every piece of desensitized data is examined, so sensitive data may remain undetected in the desensitized data.
2. False judgment is easily generated by manually detecting sensitive data.
3. There is no unified measurement method, and the discrimination rules are not consistent.
4. Certain potentially sensitive data cannot be identified.
No effective solution to the above defects of the data spot-check approach has yet been proposed in the field.
Disclosure of Invention
In order to overcome the above-mentioned drawbacks, the technical problem to be solved by the present invention is to provide a method for evaluating a data desensitization effect, together with a corresponding device and storage medium, so as to at least enable evaluation of the data desensitization effect.
In order to solve the above technical problem, an evaluation method for a data desensitization effect in an embodiment of the present invention includes:
according to the attribute domain of a target data set, cutting each piece of data in the target data set into a plurality of dimensions;
mapping each piece of data into a space point of a vector space model constructed in advance according to the dimension value of each dimension;
determining a regularity of the space points in the vector space model;
and evaluating the sensitivity level of the target data set according to the regularity.
Optionally, the determining the regularity of the spatial points in the vector space model includes:
and counting the mapping times of each space point of the vector space model.
Optionally, said assessing the sensitivity level of said target data set according to said regularity comprises:
taking the reciprocal of the mapping times of each space point as the sensitivity level of the corresponding data of each space point;
and determining the sensitivity level of the target data set according to the sensitivity levels of the data corresponding to all the spatial points.
Optionally, the sensitivity level of the data corresponding to each space point is positively correlated with the magnitude of the reciprocal.
Optionally, the determining the sensitivity level of the target data set according to the sensitivity levels of the data corresponding to all the spatial points includes:
and according to a preset weighted average mode, carrying out weighted average on the sensitivity levels of the data corresponding to all the spatial points to obtain the sensitivity level of the target data set.
Optionally, the mapping each piece of data to a space point of a pre-constructed vector space model according to the dimension value of each dimension includes:
determining the attribute value of the attribute domain corresponding to each dimension;
and taking the attribute value as a dimension value of each dimension.
Optionally, before the mapping each piece of data to a space point of a pre-constructed vector space model according to the dimension value of each dimension, the method includes:
constructing the vector space model in advance according to the plurality of dimensions.
Optionally, before the mapping each piece of data to a space point of a pre-constructed vector space model according to the dimension value of each dimension, the method includes:
taking the attribute domain as a model dimension;
and constructing the vector space model in advance according to the model dimension.
To solve the above technical problem, a computer device in an embodiment of the present invention includes a memory and a processor, wherein the memory stores an evaluation program of the data desensitization effect, and the processor executes the evaluation program to implement the steps of any one of the methods described above.
To solve the above technical problem, a computer-readable storage medium in an embodiment of the present invention stores an evaluation program of the data desensitization effect, and the evaluation program is executable by at least one processor to implement the steps of any one of the methods described above.
The embodiment of the invention has the following beneficial effects:
Each of the above embodiments divides each piece of data in the target data set into a plurality of dimensions according to the attribute domain of the target data set, so that each piece of data can be mapped to a space point of a pre-constructed vector space model according to the dimension value of each dimension, and the sensitivity level of the target data set can be evaluated from the regularity of the space points in the vector space model. The data desensitization effect is thereby evaluated effectively: every piece of data is examined during the evaluation, no manual inspection of the desensitization effect is needed, and because the sensitivity level of the target data set is evaluated by a determined rule, the measurement is unified, the judgment rules are consistent, and potentially sensitive data can be effectively identified.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
fig. 1 is a flowchart of a method for evaluating a data desensitization effect according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only to facilitate the explanation of the present invention and have no specific meaning in themselves. Thus, "module", "component", and "unit" may be used interchangeably.
Prefixes such as "first" and "second" are used to distinguish between elements merely to facilitate the description of the invention and have no particular meaning in themselves.
Example one
The embodiment of the invention provides a method for evaluating a data desensitization effect, which comprises the following steps:
S101, dividing each piece of data in a target data set into a plurality of dimensions according to an attribute domain of the target data set;
S102, mapping each piece of data to a space point of a pre-constructed vector space model according to the dimension value of each dimension;
S103, determining the regularity of the space points in the vector space model;
and S104, evaluating the sensitivity level of the target data set according to the regularity.
The target data set may be a data set that has been desensitized (that is, from which sensitive information has been removed). Each piece of data in the data set has a plurality of attribute domains, and an attribute domain represents the region of one attribute of a piece of data; for example, the attribute domains of a piece of data may include a name field, an age field, a title field, and the like.
The vector space model may adopt an existing space model. For example, if the data is divided into three dimensions, the vector space model may be a three-dimensional vector space model; similarly, if the data is divided into four dimensions, the vector space model may be a four-dimensional vector space model.
In some embodiments, the dimension value for each dimension may be determined in the following manner:
determining the attribute value of the attribute domain corresponding to each dimension;
and taking the attribute value as a dimension value of each dimension.
In the embodiment of the invention, each piece of data in a target data set is divided into a plurality of dimensions according to the attribute domain of the target data set, so that each piece of data can be mapped to a space point of a pre-constructed vector space model according to the dimension value of each dimension, and the sensitivity level of the target data set can be evaluated from the regularity of the space points in the vector space model. The data desensitization effect is thereby evaluated effectively: every piece of data is examined during the evaluation, no manual inspection of the desensitization effect is needed, and because the sensitivity level of the target data set is evaluated by a determined rule, the measurement is unified, the judgment rules are consistent, and potentially sensitive data can be effectively identified.
In some embodiments, the determining the regularity of the spatial points in the vector space model comprises:
and counting the mapping times of each space point of the vector space model.
Of course, in a specific implementation, the number of times each space point is mapped may be recorded, and the mapping times may then be determined from that record.
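By way of illustration only, the counting of mapping times may be sketched as follows. The sketch assumes that a space point is represented as a tuple of dimension values; the function name `count_mappings` and the sample points are hypothetical and are not taken from the embodiment.

```python
from collections import Counter
from typing import Hashable, Iterable


def count_mappings(points: Iterable[Hashable]) -> Counter:
    """Record how many pieces of data were mapped to each space point."""
    return Counter(points)


# Hypothetical space points: four pieces of data, two distinct points.
points = [(4, 34), (4, 34), (5, 41), (4, 34)]
counts = count_mappings(points)
print(counts[(4, 34)])  # 3
print(counts[(5, 41)])  # 1
```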
In some embodiments, said assessing the sensitivity level of said target data set according to said regularity comprises:
taking the reciprocal of the mapping times of each space point as the sensitivity level of the corresponding data of each space point;
and determining the sensitivity level of the target data set according to the sensitivity levels of the data corresponding to all the spatial points.
For example, the reciprocal of the mapping times is used as the sensitivity level. If a space point corresponds to only one piece of data, that data can be uniquely determined, so the original data corresponding to the desensitized data can easily be recovered by other correlation analysis methods; that is, the sensitivity level of the data corresponding to that space point is very high. If a space point corresponds to multiple pieces of data, the possibility of restoring the original data is greatly reduced; that is, the sensitivity level of the data corresponding to that space point is lower.
That is, the sensitivity level of the data corresponding to each space point is positively correlated with the magnitude of the reciprocal.
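As a minimal sketch of the reciprocal rule, and assuming the mapping times are already available as a dictionary keyed by space point, the per-point sensitivity level may be computed as shown below. The name `point_sensitivity` and the sample counts are hypothetical.

```python
from typing import Dict, Hashable


def point_sensitivity(counts: Dict[Hashable, int]) -> Dict[Hashable, float]:
    """Return 1 / (mapping times) for every space point.

    A point hit by a single piece of data gets 1.0, the highest level.
    """
    return {point: 1.0 / n for point, n in counts.items()}


# Hypothetical mapping times for two space points.
levels = point_sensitivity({(4, 34): 4, (5, 41): 1})
print(levels[(5, 41)])  # 1.0
print(levels[(4, 34)])  # 0.25
```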
Optionally, the determining the sensitivity level of the target data set according to the sensitivity levels of the data corresponding to all the spatial points includes:
and according to a preset weighted average mode, carrying out weighted average on the sensitivity levels of the data corresponding to all the spatial points to obtain the sensitivity level of the target data set.
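The embodiment does not fix the preset weighted average mode. The following sketch therefore takes the weights as a parameter and shows two assumed choices: equal weights per space point, and weights equal to the mapping times. The function `dataset_sensitivity` and the sample values are hypothetical.

```python
from typing import Dict, Hashable, Optional


def dataset_sensitivity(levels: Dict[Hashable, float],
                        weights: Optional[Dict[Hashable, float]] = None) -> float:
    """Weighted average of the per-point sensitivity levels."""
    if not levels:
        raise ValueError("no space points to average")
    if weights is None:
        # Assumed default: every space point carries the same weight.
        weights = {point: 1.0 for point in levels}
    total = sum(weights[point] for point in levels)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(levels[p] * weights[p] for p in levels) / total


levels = {"A": 1.0, "B": 0.25}   # reciprocals of the mapping times
counts = {"A": 1, "B": 4}        # mapping times used as weights
print(dataset_sensitivity(levels))          # 0.625
print(dataset_sensitivity(levels, counts))  # 0.4
```

With the mapping times as weights, the result reduces to the number of distinct space points divided by the number of pieces of data, which may be a convenient summary; whether that is the intended preset mode is not stated in the text.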
In some embodiments, the mapping each piece of data to a space point of a pre-constructed vector space model according to the dimension value of each dimension includes:
determining the attribute value of the attribute domain corresponding to each dimension;
and taking the attribute value as a dimension value of each dimension.
The attribute value may be determined according to the actual target data set; for example, the attribute value of the name field may be the character length of the name, and the attribute value of the age field may be the numerical value of the age.
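The determination of dimension values may be sketched as follows. The name-length and age rules follow the examples given above; the rule for the title field, the table `DIMENSION_EXTRACTORS`, and the function `to_space_point` are assumptions introduced only for illustration.

```python
from typing import Callable, Dict, Tuple

# One extractor per attribute domain. Name length and age follow the
# examples in the text; the title rule is an assumption.
DIMENSION_EXTRACTORS: Dict[str, Callable[[str], int]] = {
    "name": lambda value: len(value),
    "age": lambda value: int(value),
    "title": lambda value: len(value),
}


def to_space_point(record: Dict[str, str]) -> Tuple[int, ...]:
    """Divide one piece of data into dimensions and return its coordinates."""
    return tuple(extract(record[domain])
                 for domain, extract in DIMENSION_EXTRACTORS.items())


point = to_space_point({"name": "Li**", "age": "34", "title": "Engineer"})
print(point)  # (4, 34, 8)
```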
In some embodiments, in order to improve evaluation efficiency, the vector space model may be constructed from the dimensions obtained when the first piece of data is divided; that is, before mapping each piece of data to a space point of a pre-constructed vector space model according to the dimension value of each dimension, the method includes:
constructing the vector space model in advance according to the plurality of dimensions.
Of course, in some practical implementations, each attribute domain can also be used directly as a model dimension to construct the vector space model; that is, before mapping each piece of data to a space point of a pre-constructed vector space model according to the dimension value of each dimension, the method includes:
taking the attribute domain as a model dimension;
and constructing the vector space model in advance according to the model dimension.
The following specific example describes the evaluation method for the data desensitization effect according to the embodiment of the present invention. The example may include the following steps:
Step 1, reading a desensitized target data set piece by piece;
Step 2, dividing each piece of data into different dimensions according to the data attribute domain;
Step 3, constructing a vector space model using the dimensions from Step 2;
Step 4, mapping each piece of data to a space point in the vector space model until the entire target data set has been processed;
Step 5, counting the number of occurrences (namely the mapping times) of each space point in the vector space model;
Step 6, evaluating the sensitivity level according to the statistical regularity of the space points in the vector space model, for example by taking the reciprocal of the number of occurrences as the sensitivity level. If a point corresponds to only one piece of data, that data can be uniquely determined, which makes it easy to obtain the original data of the desensitized data by other correlation analysis methods; if a point corresponds to multiple pieces of data, the possibility of restoring the original data is greatly reduced.
In this specific example, manual checking of the desensitization effect is replaced by an automatic desensitization evaluation result, and the sensitivity level of each piece of desensitized data is given, so that a safety evaluation suggestion can be provided.
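Steps 1 to 6 of the specific example may be sketched end to end as follows. The sketch assumes that each attribute domain is one dimension, that the desensitized value itself serves as the dimension value, and that the data-set level is the plain mean over all pieces of data; the function `evaluate` and the sample records are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]


def evaluate(records: List[Record],
             domains: Sequence[str]) -> Tuple[List[float], float]:
    """Return the per-record sensitivity levels and the data-set level."""
    if not records:
        raise ValueError("the target data set is empty")
    # Steps 2 to 4: one dimension per attribute domain; each record
    # becomes a tuple of dimension values, i.e. a space point.
    points = [tuple(record[d] for d in domains) for record in records]
    # Step 5: mapping times of each space point.
    counts = Counter(points)
    # Step 6: reciprocal of the mapping times as the sensitivity level.
    per_record = [1.0 / counts[p] for p in points]
    # Assumed aggregation: unweighted mean over all records.
    overall = sum(per_record) / len(per_record)
    return per_record, overall


# Step 1: a hypothetical desensitized data set, read record by record.
records = [
    {"name": "Zhang*", "age": "30-39"},
    {"name": "Zhang*", "age": "30-39"},
    {"name": "Wang*", "age": "40-49"},
]
per_record, overall = evaluate(records, ["name", "age"])
print(per_record)  # [0.5, 0.5, 1.0]
print(overall)
```

In this sample the third record maps to a space point of its own and therefore receives the highest level, which matches the reasoning in Step 6.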
Example two
An embodiment of the present invention provides a computer device, which includes a memory and a processor, wherein the memory stores an evaluation program of the data desensitization effect, and the processor executes the evaluation program to implement the steps of the method according to any one of the implementations of the first embodiment.
Example three
An embodiment of the invention provides a computer-readable storage medium, wherein the storage medium stores an evaluation program of the data desensitization effect, and the evaluation program is executable by at least one processor to implement the steps of the method according to any one of the implementations of the first embodiment.
For the specific implementation of the second embodiment and the third embodiment, reference may be made to the first embodiment, and corresponding technical effects are achieved.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (6)
1. A method for evaluating the desensitization effect of data, the method comprising:
according to the attribute domain of a target data set, cutting each piece of data in the target data set into a plurality of dimensions;
mapping each piece of data into a space point of a vector space model constructed in advance according to the dimension value of each dimension;
determining a regularity of the space points in the vector space model;
evaluating the sensitivity level of the target data set according to the regularity;
the determining the regularity of the space points in the vector space model includes: counting the mapping times of each space point of the vector space model;
the evaluating the sensitivity level of the target data set according to the regularity comprises:
taking the reciprocal of the mapping times of each space point as the sensitivity level of the corresponding data of each space point;
determining the sensitivity level of the target data set according to the sensitivity levels of the data corresponding to all the spatial points;
determining the sensitivity level of the target data set according to the sensitivity levels of the data corresponding to all the spatial points, including: according to a preset weighted average mode, carrying out weighted average on the sensitivity levels of the data corresponding to all the space points to obtain the sensitivity level of the target data set;
according to the dimension value of each dimension, before mapping each piece of data to a space point of a vector space model constructed in advance, the method comprises the following steps:
determining the attribute value of the attribute domain corresponding to each dimension;
and taking the attribute value as a dimension value of each dimension.
2. The method of claim 1, wherein the sensitivity level of the data corresponding to each space point is positively correlated with the magnitude of the reciprocal.
3. The method of claim 1 or 2, wherein the mapping each piece of data to a space point of a pre-constructed vector space model according to the dimension value of each dimension comprises:
constructing the vector space model in advance according to the plurality of dimensions.
4. The method of claim 1 or 2, wherein the mapping each piece of data to a space point of a pre-constructed vector space model according to the dimension value of each dimension comprises:
taking the attribute domain as a model dimension;
and according to the model dimension, constructing the vector space model in advance.
5. A computer device, characterized in that the device comprises a memory storing an evaluation program of the desensitization effect of data and a processor executing the evaluation program to carry out the steps of the method according to any one of claims 1 to 4.
6. A computer-readable storage medium having stored thereon an evaluation program for desensitization of data, the evaluation program being executable by at least one processor to perform the steps of the method according to any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811229680.9A CN109308264B (en) | 2018-10-22 | 2018-10-22 | Method for evaluating data desensitization effect, corresponding device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811229680.9A CN109308264B (en) | 2018-10-22 | 2018-10-22 | Method for evaluating data desensitization effect, corresponding device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109308264A CN109308264A (en) | 2019-02-05 |
CN109308264B CN109308264B (en) | 2021-11-16 |
Family
ID=65225408
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811229680.9A Active CN109308264B (en) | 2018-10-22 | 2018-10-22 | Method for evaluating data desensitization effect, corresponding device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109308264B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110941956A (en) * | 2019-10-26 | 2020-03-31 | 华为技术有限公司 | Data classification method, device and related equipment |
CN112395645A (en) * | 2020-11-30 | 2021-02-23 | 中国民航信息网络股份有限公司 | Data desensitization processing method and device |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102012985A (en) * | 2010-11-19 | 2011-04-13 | 国网电力科学研究院 | Sensitive data dynamic identification method based on data mining |
CN103761221A (en) * | 2013-12-31 | 2014-04-30 | 北京京东尚科信息技术有限公司 | System and method for identifying sensitive text messages |
CN105205408A (en) * | 2015-09-07 | 2015-12-30 | 中国科学院深圳先进技术研究院 | Spatial aggregation based trajectory data privacy protection method and system |
CN105205163A (en) * | 2015-06-29 | 2015-12-30 | 淮阴工学院 | Incremental learning multi-level binary-classification method of scientific news |
US9356961B1 (en) * | 2013-03-11 | 2016-05-31 | Emc Corporation | Privacy scoring for cloud services |
CN106845265A (en) * | 2016-12-01 | 2017-06-13 | 北京计算机技术及应用研究所 | A kind of document security level automatic identifying method |
CN106951411A (en) * | 2017-03-24 | 2017-07-14 | 福州大学 | The quick multi-key word Semantic Ranking searching method of data-privacy is protected in a kind of cloud computing |
CN107368542A (en) * | 2017-06-27 | 2017-11-21 | 山东华软金盾软件股份有限公司 | A kind of concerning security matters Classified Protection of confidential data |
CN108268785A (en) * | 2016-12-30 | 2018-07-10 | 广东精点数据科技股份有限公司 | A kind of sensitive data identification and the device and method of desensitization |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10970404B2 (en) * | 2016-05-23 | 2021-04-06 | Informatica Llc | Method, apparatus, and computer-readable medium for automated construction of data masks |
Non-Patent Citations (2)
Title |
---|
"Bleach: A Distributed Stream Data Cleaning System";Yongchao Tian等;《2017 IEEE 6th International Congress on Big Data》;20170911;第113-120页 * |
"目标网站访客舆情信息获取方法研究";张昊;《中国优秀硕士学位论文全文数据库 信息科技辑》;20180215(第02期);I139-347 * |
Also Published As
Publication number | Publication date |
---|---|
CN109308264A (en) | 2019-02-05 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CB03 | Change of inventor or designer information | Inventor after: Song Pengju; Inventor after: Li Xueying; Inventor before: Song Pengju |