CN111752969A - Algorithm for keeping statistical characteristics - Google Patents
- Publication number
- CN111752969A (application number CN202010582944.XA)
- Authority
- CN
- China
- Prior art keywords
- data
- algorithm
- statistical
- row
- statistical analysis
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2462—Approximate or statistical queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
Abstract
The invention discloses an algorithm for preserving statistical characteristics, comprising the following steps: A. adding a source database; B. adding a desensitization task; C. configuring the statistical-characteristic-preserving algorithm; D. executing the desensitization task; E. performing statistical analysis on the processed data. After sensitive data are processed by the algorithm, they retain their original statistical characteristics (for example, the mean and the variance are unchanged), so the data can still be analyzed statistically and do not lose their analytical value after desensitization.
Description
Technical Field
The invention relates to the field of data security services, in particular to an algorithm for keeping statistical characteristics.
Background
As business support systems develop, protecting sensitive data is becoming increasingly important, and protecting such data effectively has become a focus of current security work.
At present, the main way to protect sensitive data is to desensitize the data directly. So that the data remain usable, a high-fidelity simulation algorithm is generally chosen as the desensitization algorithm.
Most desensitization algorithms currently used for ongoing desensitization of sensitive data are high-fidelity simulation algorithms, so ordinary use of the data is unaffected. However, when the data must be analyzed statistically, for example by computing the mean or the variance, data processed by these algorithms lose their analytical value.
Disclosure of Invention
The object of the present invention is to provide an algorithm for preserving statistical characteristics that solves the problems set forth in the Background.
To achieve this object, the invention provides the following technical solution:
An algorithm for preserving statistical characteristics, comprising the following steps:
A. adding a source database;
B. adding a desensitization task;
C. configuring the statistical-characteristic-preserving algorithm;
D. executing the desensitization task;
E. performing statistical analysis on the processed data.
As a further technical solution of the invention, step A is specifically: a database containing the sensitive data to be statistically analyzed is added.
As a further technical solution of the invention, step B is specifically: a desensitization task that desensitizes the sensitive data to be statistically analyzed is added.
As a further technical solution of the invention, step C is specifically: the statistical-characteristic-preserving algorithm is configured for the sensitive data to be statistically analyzed, so that statistical analysis remains possible after the data are desensitized.
As a further technical solution of the invention, step D is specifically: the sensitive data to be statistically analyzed are pulled from the source database and processed with the statistical-characteristic-preserving algorithm.
As a further technical solution of the invention, the specific processing method is reordering the rows: for example, the data in the first row are moved to the second row, the data in the second row to the third row, the data in the third row to the first row, and so on until all the data have been processed.
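The row-shifting example can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: the function name `rotate_rows`, the `shift` parameter, and the sample values are assumptions, and the text only gives a three-row example without fixing how longer tables are reordered.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def rotate_rows(values: Sequence[T], shift: int = 1) -> List[T]:
    """Return a copy in which the value at row i moves to row (i + shift) % n."""
    n = len(values)
    if n == 0:
        return []
    shift %= n
    # The last `shift` values wrap around to the front.
    return list(values[n - shift:]) + list(values[:n - shift])


# Hypothetical sensitive column with three rows.
salaries = [5200, 6100, 4800]
masked = rotate_rows(salaries)
print(masked)  # [4800, 5200, 6100]
```

Row 1's value now sits in row 2, row 2's in row 3, and row 3's in row 1, while the set of values in the column is unchanged.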
As a further technical solution of the invention, step E is specifically: the desensitized data are retrieved and analyzed statistically. If the data are stored in a target database, they are summed with the SQL SUM function, the original data are summed in the same way, and the two sums are compared; if the sums are equal, the algorithm is verified to be effective.
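A sketch of this sum comparison, using Python's built-in `sqlite3` module as a stand-in for the source and target databases, might look as follows. The table names `source_salary` and `target_salary`, the column `amount`, and the values are hypothetical; the text does not name a database product or schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source_salary (id INTEGER PRIMARY KEY, amount INTEGER);
    CREATE TABLE target_salary (id INTEGER PRIMARY KEY, amount INTEGER);
    INSERT INTO source_salary (id, amount) VALUES (1, 5200), (2, 6100), (3, 4800);
    -- Same values, each moved down one row (the last row wraps to the first).
    INSERT INTO target_salary (id, amount) VALUES (1, 4800), (2, 5200), (3, 6100);
    """
)

# Sum the original and the desensitized column with SQL SUM, then compare.
source_sum = conn.execute("SELECT SUM(amount) FROM source_salary").fetchone()[0]
target_sum = conn.execute("SELECT SUM(amount) FROM target_salary").fetchone()[0]
sums_match = source_sum == target_sum
print(source_sum, target_sum, sums_match)  # 16100 16100 True
conn.close()
```

Equal sums are a necessary check rather than a complete one: two different sets of values can share a sum, so a stricter test could also compare counts or sorted values.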
Compared with the prior art, the invention has the following beneficial effect: after sensitive data are processed by the algorithm, they retain their original statistical characteristics (for example, the mean and the variance are unchanged), so the data can still be analyzed statistically and do not lose their analytical value after desensitization.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
An algorithm for preserving statistical characteristics, comprising the following steps:
A. adding a source database: a database containing the sensitive data to be statistically analyzed is added;
B. adding a desensitization task: a task that desensitizes the sensitive data to be statistically analyzed is added;
C. configuring the statistical-characteristic-preserving algorithm: the algorithm is configured for the sensitive data to be statistically analyzed, so that statistical analysis remains possible after desensitization;
D. executing the desensitization task: the sensitive data to be statistically analyzed are pulled from the source database and processed with the algorithm. The specific processing method is reordering the rows: for example, the data in the first row are moved to the second row, the data in the second row to the third row, the data in the third row to the first row, and so on until all the data have been processed;
E. performing statistical analysis on the processed data: the desensitized data are retrieved and analyzed statistically. If the data are stored in a target database, they are summed with the SQL SUM function, the original data are summed in the same way, and the two sums are compared; if the sums are equal, the algorithm is verified to be effective.
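Steps A to E can be read as a small pipeline. The sketch below is one possible arrangement under stated assumptions: the `DesensitizationTask` class, its field names, the `run_task` function, and the `patients` table are all hypothetical, SQLite stands in for the source and target databases, and only the configured sensitive column is shifted while the key column stays in place.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class DesensitizationTask:
    # All field names are hypothetical; the text does not define a task schema.
    source_table: str
    target_table: str
    key_column: str
    sensitive_column: str


def run_task(conn: sqlite3.Connection, task: DesensitizationTask) -> None:
    """Step D: pull rows, shift the sensitive column down one row, write the result."""
    # Identifiers come from trusted configuration in this sketch.
    rows = conn.execute(
        f"SELECT {task.key_column}, {task.sensitive_column} "
        f"FROM {task.source_table} ORDER BY {task.key_column}"
    ).fetchall()
    keys = [r[0] for r in rows]
    values = [r[1] for r in rows]
    shifted = values[-1:] + values[:-1]  # row 1 -> row 2, ..., last row -> row 1
    conn.execute(
        f"CREATE TABLE {task.target_table} "
        f"({task.key_column} INTEGER PRIMARY KEY, {task.sensitive_column} INTEGER)"
    )
    conn.executemany(
        f"INSERT INTO {task.target_table} VALUES (?, ?)", list(zip(keys, shifted))
    )


# Step A: the source database (an in-memory stand-in here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, age INTEGER)")
conn.executemany(
    "INSERT INTO patients VALUES (?, ?)", [(1, 34), (2, 51), (3, 47), (4, 29)]
)

# Steps B and C: define the task and the column the algorithm applies to.
task = DesensitizationTask("patients", "patients_masked", "id", "age")

# Step D: execute the task.
run_task(conn, task)

# Step E: read back the processed column for analysis.
masked_ages = [r[0] for r in conn.execute("SELECT age FROM patients_masked ORDER BY id")]
print(masked_ages)  # [29, 34, 51, 47]
```

Keeping the task description separate from the code that executes it mirrors the split between configuring (steps B and C) and executing (step D) in the text.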
With this method, sensitive data processed by the algorithm retain their original statistical characteristics, such as the mean and the variance, so the data can still be analyzed statistically and do not lose their analytical value after desensitization.
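The claim about the mean and the variance follows from the fact that reordering does not change which values a column contains. A short check with Python's standard `statistics` module illustrates this; the sample values are made up, and a seeded shuffle is used here only as a stand-in for the row-shifting described in the text.

```python
import random
import statistics

# Hypothetical sensitive column.
original = [5200, 6100, 4800, 7300, 5550]

# Any reordering works for this check; a seeded shuffle keeps it repeatable.
shuffled = original[:]
random.Random(0).shuffle(shuffled)

same_mean = statistics.mean(original) == statistics.mean(shuffled)
same_variance = statistics.pvariance(original) == statistics.pvariance(shuffled)
same_multiset = sorted(original) == sorted(shuffled)
print(same_mean, same_variance, same_multiset)  # True True True
```

The invariance holds for statistics of the reordered column taken on its own. Statistics that relate it to other columns, such as a correlation, generally change when only one column is reordered.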
In the invention, the original data are summed, the data processed by the proposed algorithm are summed as well, and the two sums are compared. The comparison shows the two sums to be equal, which indicates that the algorithm preserves statistical characteristics and that the processed data retain their analytical value.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.
Furthermore, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only a single independent technical solution. The description is written this way only for clarity; those skilled in the art should take the specification as a whole, and the technical solutions of the embodiments may be combined as appropriate to form other embodiments that those skilled in the art can understand.
Claims (7)
1. An algorithm for preserving statistical characteristics, comprising the steps of:
A. adding a source database;
B. adding a desensitization task;
C. configuring the statistical-characteristic-preserving algorithm;
D. executing the desensitization task;
E. performing statistical analysis on the processed data.
2. The algorithm for preserving statistical characteristics according to claim 1, wherein step A is specifically: a database containing the sensitive data to be statistically analyzed is added.
3. The algorithm for preserving statistical characteristics according to claim 1, wherein step B is specifically: a desensitization task that desensitizes the sensitive data to be statistically analyzed is added.
4. The algorithm for preserving statistical characteristics according to claim 1, wherein step C is specifically: the statistical-characteristic-preserving algorithm is configured for the sensitive data to be statistically analyzed, so that statistical analysis remains possible after the data are desensitized.
5. The algorithm for preserving statistical characteristics according to claim 1, wherein step D is specifically: the sensitive data to be statistically analyzed are pulled from the source database and processed with the statistical-characteristic-preserving algorithm.
6. The algorithm for preserving statistical characteristics according to claim 5, wherein the specific processing method is reordering the rows, for example moving the data in the first row to the second row, the data in the second row to the third row, the data in the third row to the first row, and so on until all the data have been processed.
7. The algorithm for preserving statistical characteristics according to claim 1, wherein step E is specifically: the desensitized data are retrieved and analyzed statistically; if the data are stored in a target database, they are summed with the SQL SUM function, the original data are summed in the same way, and the two sums are compared; if the sums are equal, the algorithm is verified to be effective.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010582944.XA CN111752969A (en) | 2020-06-23 | 2020-06-23 | Algorithm for keeping statistical characteristics |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010582944.XA CN111752969A (en) | 2020-06-23 | 2020-06-23 | Algorithm for keeping statistical characteristics |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111752969A (en) | 2020-10-09 |
Family
ID=72676881
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010582944.XA (Pending), published as CN111752969A (en) | Algorithm for keeping statistical characteristics | 2020-06-23 | 2020-06-23 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111752969A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112765673A (en) * | 2021-03-16 | 2021-05-07 | 杭州数梦工场科技有限公司 | Sensitive data statistical method and related device |
WO2022233236A1 (en) * | 2021-05-04 | 2022-11-10 | International Business Machines Corporation | Secure data analytics |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106529329A (en) * | 2016-10-11 | 2017-03-22 | 中国电子科技网络信息安全有限公司 | Desensitization system and desensitization method used for big data |
CN110175468A (en) * | 2019-05-05 | 2019-08-27 | 浙江工业大学 | A kind of name desensitization method retaining distribution characteristics |
CN110598442A (en) * | 2019-09-11 | 2019-12-20 | 国网浙江省电力有限公司信息通信分公司 | Sensitive data self-adaptive desensitization method and system |
- 2020-06-23: application CN202010582944.XA (CN), published as CN111752969A, status: active (Pending)
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20201009 |