CN111752969A - Algorithm for keeping statistical characteristics - Google Patents



Publication number
CN111752969A
Authority
CN
China
Prior art keywords
data
algorithm
statistical
row
statistical analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010582944.XA
Other languages
Chinese (zh)
Inventor
缪钱勇
刘金新
陈俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information and Data Security Solutions Co Ltd
Original Assignee
Information and Data Security Solutions Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-06-23
Filing date
2020-06-23
Publication date
2020-10-09
Application filed by Information and Data Security Solutions Co Ltd
Priority to CN202010582944.XA
Publication of CN111752969A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes

Abstract

The invention discloses an algorithm for preserving statistical characteristics, comprising the following steps: A. adding a source database; B. adding a desensitization task; C. configuring a statistical-characteristic-preserving algorithm; D. executing the desensitization task; E. performing statistical analysis on the processed data. After sensitive data has been processed by the algorithm, it retains its original statistical characteristics (for example, the mean and the variance are unchanged), so the data can still be analyzed statistically and does not lose its analytical value through desensitization.

Description

Algorithm for keeping statistical characteristics
Technical Field
The invention relates to the field of data security services, in particular to an algorithm for keeping statistical characteristics.
Background
As business support systems develop, protecting sensitive data is becoming increasingly important, and how to protect it effectively has become a central concern of current security work.
At present, the main way to protect sensitive data is to desensitize it directly. So that the data remains usable, a high-simulation algorithm is generally chosen as the desensitization algorithm.
Most desensitization algorithms currently applied to sensitive data are high-simulation algorithms, so ordinary use of the data is not affected. However, when the data must be analyzed statistically, for example by computing its mean or variance, data processed by current desensitization algorithms loses its analytical value.
Disclosure of Invention
The object of the present invention is to provide an algorithm for preserving statistical characteristics, so as to solve the problems set forth in the Background.
To achieve this object, the invention provides the following technical solution:
An algorithm for preserving statistical characteristics, comprising the following steps:
A. adding a source database;
B. adding a desensitization task;
C. configuring a statistical-characteristic-preserving algorithm;
D. executing the desensitization task;
E. performing statistical analysis on the processed data.
As a further technical solution of the invention, step A is specifically: adding a database that contains the sensitive data to be statistically analyzed.
As a further technical solution of the invention, step B is specifically: adding a desensitization task that desensitizes the sensitive data to be statistically analyzed.
As a further technical solution of the invention, step C is specifically: configuring, for the sensitive data to be statistically analyzed, an algorithm that preserves statistical characteristics, so that statistical analysis can still be performed on the data after desensitization.
As a further technical solution of the invention, step D is specifically: pulling the sensitive data to be statistically analyzed from the source database and processing it with the statistical-characteristic-preserving algorithm.
As a further technical solution of the invention, the specific processing method is out-of-order processing: for example, the data in the first row is moved to the second row, the data in the second row to the third row, the data in the third row to the first row, and so on until all the data has been processed.
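The out-of-order processing described above can be sketched as a cyclic shift of one column. This is a minimal illustration, not the applicant's implementation: the function name `rotate_down`, the `shift` parameter, and the sample salary values are assumptions introduced here.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def rotate_down(values: Sequence[T], shift: int = 1) -> List[T]:
    """Return a copy in which the value at row i moves to row (i + shift) % n."""
    n = len(values)
    if n == 0:
        return []
    k = shift % n
    # The last k values wrap around to the top; the rest slide down by k rows.
    return list(values[n - k:]) + list(values[:n - k])


# Hypothetical sensitive column with three rows.
salaries = [5200, 6100, 4800]
shifted = rotate_down(salaries)
print(shifted)  # [4800, 5200, 6100]
```

Row 1's value now sits on row 2, row 2's value on row 3, and row 3's value on row 1, so each value is detached from its original row while the set of values in the column is unchanged.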
As a further technical solution of the invention, step E is specifically: retrieving the desensitized data and performing statistical analysis on it. For example, if the data is stored in a target database, it is summed with the SUM function of an SQL statement, the original data is summed in the same way, and the two sums are compared; if they are equal, the algorithm is verified to be effective.
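The sum comparison in step E might look like the following sketch, which uses Python's standard `sqlite3` module with an in-memory database. The table names `source_salary` and `target_salary`, the column name `salary`, and the sample rows are assumptions made for illustration; the patent does not name a database product or schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source_salary (id INTEGER PRIMARY KEY, salary INTEGER);
    CREATE TABLE target_salary (id INTEGER PRIMARY KEY, salary INTEGER);
    INSERT INTO source_salary VALUES (1, 5200), (2, 6100), (3, 4800);
    -- Same values, each moved down one row with wrap-around.
    INSERT INTO target_salary VALUES (1, 4800), (2, 5200), (3, 6100);
    """
)


def column_sum(connection: sqlite3.Connection, table: str) -> int:
    # Table names are fixed constants here; never interpolate untrusted input.
    (total,) = connection.execute(f"SELECT SUM(salary) FROM {table}").fetchone()
    return total


source_total = column_sum(conn, "source_salary")
target_total = column_sum(conn, "target_salary")
sums_match = source_total == target_total
print(source_total, target_total, sums_match)  # 16100 16100 True
```

The same comparison could be repeated with other aggregates such as `AVG` or `COUNT`.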
Compared with the prior art, the invention has the following beneficial effect: after sensitive data has been processed by the algorithm, it retains its original statistical characteristics (for example, the mean and the variance are unchanged), so the data can still be analyzed statistically and does not lose its analytical value through desensitization.
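Why reordering leaves the sum, mean, and variance unchanged can be written out briefly. The notation is introduced here and does not appear in the application: $x_1, \dots, x_n$ are the original values of one column, $\pi$ is the permutation of row indices applied by the out-of-order processing, and $y_i = x_{\pi(i)}$ is the value stored in row $i$ afterwards. Population variance is assumed.

```latex
\begin{aligned}
\sum_{i=1}^{n} y_i &= \sum_{i=1}^{n} x_{\pi(i)} = \sum_{i=1}^{n} x_i, \\
\mu_y &= \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{1}{n}\sum_{i=1}^{n} x_i = \mu_x, \\
\sigma_y^2 &= \frac{1}{n}\sum_{i=1}^{n} (y_i - \mu_y)^2
            = \frac{1}{n}\sum_{i=1}^{n} (x_{\pi(i)} - \mu_x)^2
            = \sigma_x^2 .
\end{aligned}
```

Each equality holds because a permutation only changes the order of the terms in a finite sum. For the row shift given as the example, $\pi(1) = n$ and $\pi(i) = i - 1$ for $i > 1$.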
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments that a person skilled in the art can derive from these embodiments without creative effort fall within the protection scope of the present invention.
An algorithm for preserving statistical characteristics, comprising the following steps:
A. adding a source database: adding a database that contains the sensitive data to be statistically analyzed;
B. adding a desensitization task: adding a desensitization task that desensitizes the sensitive data to be statistically analyzed;
C. configuring a statistical-characteristic-preserving algorithm: configuring, for the sensitive data to be statistically analyzed, an algorithm that preserves statistical characteristics, so that statistical analysis can still be performed on the data after desensitization;
D. executing the desensitization task: pulling the sensitive data to be statistically analyzed from the source database and processing it with the statistical-characteristic-preserving algorithm; the specific processing method is out-of-order processing, for example, the data in the first row is moved to the second row, the data in the second row to the third row, the data in the third row to the first row, and so on until all the data has been processed;
E. performing statistical analysis on the processed data: retrieving the desensitized data and performing statistical analysis on it; for example, if the data is stored in a target database, it is summed with the SUM function of an SQL statement, the original data is summed in the same way, and the two sums are compared; if they are equal, the algorithm is verified to be effective.
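Steps A through E can be sketched end to end as follows. This is an illustrative outline under stated assumptions, not the applicant's system: the class name `DesensitizationTask`, the helper `shift_rows_down`, the in-memory list standing in for the source database, and the sample rows are all hypothetical. Step E here compares mean and population variance with the standard `statistics` module instead of an SQL sum.

```python
from dataclasses import dataclass
from statistics import mean, pvariance
from typing import Callable, Dict, List

Row = Dict[str, object]


def shift_rows_down(values: List[object]) -> List[object]:
    """Out-of-order processing: row 1 -> row 2, row 2 -> row 3, last row -> row 1."""
    return values[-1:] + values[:-1]


@dataclass
class DesensitizationTask:  # hypothetical name, not taken from the patent
    column: str                                        # sensitive column to process
    algorithm: Callable[[List[object]], List[object]]  # step C: configured algorithm

    def run(self, source_rows: List[Row]) -> List[Row]:
        """Step D: pull the column, reorder it, and write it back row by row."""
        original = [row[self.column] for row in source_rows]
        processed = self.algorithm(original)
        # New dicts are built, so the source rows are left untouched.
        return [{**row, self.column: value}
                for row, value in zip(source_rows, processed)]


# Step A: an in-memory list stands in for the source database (illustrative data).
source_db: List[Row] = [
    {"name": "A", "salary": 5000},
    {"name": "B", "salary": 6000},
    {"name": "C", "salary": 4000},
    {"name": "D", "salary": 7000},
]

# Steps B and C: add the task and configure its algorithm.
task = DesensitizationTask(column="salary", algorithm=shift_rows_down)

# Step D: execute the task to produce the desensitized (target) rows.
target_db = task.run(source_db)

# Step E: compare statistics of the original and processed column.
before = [row["salary"] for row in source_db]
after = [row["salary"] for row in target_db]
stats_preserved = (mean(before) == mean(after)
                   and pvariance(before) == pvariance(after))
print(after, stats_preserved)  # [7000, 5000, 6000, 4000] True
```

Only the configured column is reordered; the other columns stay on their original rows, which is what separates each sensitive value from the record it belonged to.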
With this method, sensitive data retains its original statistical characteristics, such as the mean and the variance, after being processed by the algorithm, so the data can still be analyzed statistically and does not lose its analytical value through desensitization.
In the invention, the original data and the data processed by the proposed algorithm are each summed, and the two sums are compared. The sums are found to be equal, which shows that the algorithm preserves statistical characteristics and that the processed data retains its analytical value.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.
Furthermore, although this description is organized by embodiments, not every embodiment contains only one independent technical solution. The description is written this way only for clarity; those skilled in the art should treat the description as a whole, and the technical solutions in the embodiments may be combined as appropriate to form other embodiments that those skilled in the art can understand.

Claims (7)

1. An algorithm for preserving statistical characteristics, comprising the following steps:
A. adding a source database;
B. adding a desensitization task;
C. configuring a statistical-characteristic-preserving algorithm;
D. executing the desensitization task;
E. performing statistical analysis on the processed data.
2. The algorithm for preserving statistical characteristics according to claim 1, wherein step A is specifically: adding a database that contains the sensitive data to be statistically analyzed.
3. The algorithm for preserving statistical characteristics according to claim 1, wherein step B is specifically: adding a desensitization task that desensitizes the sensitive data to be statistically analyzed.
4. The algorithm for preserving statistical characteristics according to claim 1, wherein step C is specifically: configuring, for the sensitive data to be statistically analyzed, an algorithm that preserves statistical characteristics, so that statistical analysis can still be performed on the data after desensitization.
5. The algorithm for preserving statistical characteristics according to claim 1, wherein step D is specifically: pulling the sensitive data to be statistically analyzed from the source database and processing it with the statistical-characteristic-preserving algorithm.
6. The algorithm for preserving statistical characteristics according to claim 5, wherein the specific processing method is out-of-order processing, for example, moving the data in the first row to the second row, the data in the second row to the third row, the data in the third row to the first row, and so on until all the data has been processed.
7. The algorithm for preserving statistical characteristics according to claim 1, wherein step E is specifically: retrieving the desensitized data and performing statistical analysis on it; for example, if the data is stored in a target database, summing it with the SUM function of an SQL statement, summing the original data in the same way, and comparing the two sums; if they are equal, the algorithm is verified to be effective.
CN202010582944.XA 2020-06-23 2020-06-23 Algorithm for keeping statistical characteristics Pending CN111752969A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010582944.XA CN111752969A (en) 2020-06-23 2020-06-23 Algorithm for keeping statistical characteristics


Publications (1)

Publication Number Publication Date
CN111752969A true CN111752969A (en) 2020-10-09

Family

ID=72676881

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010582944.XA Pending CN111752969A (en) 2020-06-23 2020-06-23 Algorithm for keeping statistical characteristics

Country Status (1)

Country Link
CN (1) CN111752969A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112765673A (en) * 2021-03-16 2021-05-07 杭州数梦工场科技有限公司 Sensitive data statistical method and related device
WO2022233236A1 (en) * 2021-05-04 2022-11-10 International Business Machines Corporation Secure data analytics

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106529329A (en) * 2016-10-11 2017-03-22 中国电子科技网络信息安全有限公司 Desensitization system and desensitization method used for big data
CN110175468A (en) * 2019-05-05 2019-08-27 浙江工业大学 A kind of name desensitization method retaining distribution characteristics
CN110598442A (en) * 2019-09-11 2019-12-20 国网浙江省电力有限公司信息通信分公司 Sensitive data self-adaptive desensitization method and system



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201009