CN109408534A - Method based on character string uniqueness and repeatability displacement output - Google Patents
- Publication number
- CN109408534A (application number CN201811301201.XA)
- Authority
- CN
- China
- Prior art keywords
- data
- value
- key
- dictionary
- library
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method for replacement output based on character string uniqueness and repeatability, comprising: S1: a Java program caches the data of a dictionary library into the dictionary cache of a Redis cache store; S2: the Java program sends the sensitive data of a source database to a Redis program in batches; S3: the Redis program receives the source data sent in step S2, looks up data in the Redis cache store, and replaces the sensitive data with target values; S4: the Redis program returns the set of target values to the Java program; S5: the Java program writes the set of target values to a target database; S6: steps S2-S5 are repeated until all source data has been replaced and output. The invention hides the sensitive information in the data while retaining its original business meaning, supports data sharing in a cluster environment, improves processing efficiency, and guarantees that the output is both unique and repeatable.
Description
Technical field
The present invention relates to data desensitization methods, and more particularly to a method for replacement output based on character string uniqueness and repeatability.
Background technique
In the current era of big data, data is widely regarded as one of an enterprise's most valuable assets. By analyzing its accumulated data, an enterprise can track market trends in real time and respond quickly, obtain decision support for precise and effective marketing strategies, and provide consumers with more timely and personalized services. Once data leaks, however, the enterprise's reputation is put at risk and consumers are exposed to hidden dangers through the leakage of their personal information. Leaked personal information is the main source of online fraud and harassment such as phishing websites, fraudulent websites, Trojan viruses, fake base stations, spam text messages, and harassing calls.
To prevent data containing sensitive information from leaking, the data can be masked according to defined desensitization rules; this is known as data desensitization. In general, sensitive information can be hidden by reordering the data, encrypting it, or replacing it with generated random values. For certain data that carries business meaning, however, these means destroy that meaning. To preserve it, the prior art enumerates a set of data with similar business meaning (hereinafter, the dictionary) and uses it, sequentially or at random, to replace the data containing sensitive information (hereinafter, the source data), thereby obscuring the sensitive information while retaining the original business meaning. The prior art works in the following ways:
1) Transforming the source data directly: the processing program loads the source data from the source database, generates target values by encrypting or reordering the source data or by generating random replacements according to defined rules, and finally writes the target values to the target database.
2) Replacing sensitive data using dictionary data stored in local files: the set of dictionary values is stored in a local file in advance (for example an Excel, CSV, or plain-text file). The processing program loads the sensitive source data to be desensitized from the source database into its memory, looks up a valid substitute value in the dictionary file for each sensitive source value according to defined rules, and finally writes the substitute values to the target database.
The prior art has the following disadvantages:
1. Uniqueness and repeatable output cannot both be guaranteed: the prior art computes a position number from the features of the source data and uses it to select the dictionary value at that position in the dictionary table. This makes the output repeatable (identical source data is desensitized to the same target value), but the dictionary table holds a finite number of values while the amount of source data is theoretically unbounded. Representing an unbounded set with a finite set inevitably produces duplicates (different source data is desensitized to the same target value), so the uniqueness of the data cannot be guaranteed.
2. Data cannot be shared: processing desensitization data in a local cache exploits fast in-memory reads and writes and avoids network latency, which maximizes desensitization efficiency. With the arrival of the big data era, however, a single machine can no longer cope with large data volumes, and supercomputers are scarce and extremely expensive. To handle such workloads, the industry uses clustering: multiple independent computers interconnected by a high-speed network act as a single server, providing high-performance data processing at comparatively low cost. But each computer in a cluster exists on its own. If the computers use their local caches to process the same desensitization task, they cannot share data with one another. Local-cache processing is therefore suitable only for a single-machine environment and cannot share data in a cluster.
3. Inefficiency: processing desensitization data in a database solves the cluster data-sharing problem described in point 2. However, a database must persist data to files, so the I/O bottleneck severely degrades desensitization performance. In addition, the program reaches the database over a network connection, so every access incurs network latency, and the more frequent the accesses, the greater the total delay.
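Disadvantage 1 can be illustrated with a minimal sketch. It assumes the position number is a hash of the source value taken modulo the dictionary size; the text does not give the actual formula, so `position_of`, the use of SHA-256, and the sample values are all hypothetical.

```python
import hashlib

def position_of(source, dictionary_size):
    """Derive a dictionary position from the source value (assumed formula)."""
    digest = hashlib.sha256(source.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % dictionary_size

def desensitize(source, dictionary):
    """Position-based lookup: the same source always selects the same entry."""
    return dictionary[position_of(source, len(dictionary))]

dictionary = ["Alice", "Bob", "Carol"]            # finite dictionary table
sources = [f"customer-{i}" for i in range(10)]    # more sources than entries

targets = [desensitize(s, dictionary) for s in sources]

# Repeatable: a second run yields exactly the same targets.
repeatable = targets == [desensitize(s, dictionary) for s in sources]
# Not unique: 10 sources cannot map to 10 distinct values out of only 3.
has_collisions = len(set(targets)) < len(sources)
```

With more distinct sources than dictionary entries, some targets must be shared regardless of the position formula chosen.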
Therefore, an efficient data desensitization method is needed that achieves repeatable output and uniqueness of the data at the same time.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method for replacement output based on character string uniqueness and repeatability that satisfies both the uniqueness and the repeatable-output properties of the data, enables data sharing in a cluster environment, and improves desensitization efficiency.
To solve the above technical problem, the present invention adopts the following technical solution: a method for replacement output based on character string uniqueness and repeatability, comprising the following steps:
S1: a Java program caches the data of the dictionary library into the dictionary cache of a Redis cache store; the dictionary library is the set of data used to replace sensitive data, and its entries are called dictionary values;
S2: the Java program sends the sensitive data of the source database to a Redis program in batches; the source database is the set of sensitive data before desensitization, and its entries are called source data;
S3: the Redis program receives the source data sent in step S2, looks up data in the Redis cache store, and replaces the sensitive data with target values;
S4: the Redis program returns the set of target values to the Java program;
S5: the Java program writes the set of target values to the target database, which is the set of sensitive data after desensitization;
S6: steps S2-S5 are repeated until all source data has been replaced and output.
Further, the Redis cache store also includes a key-value pair cache that stores key-value pairs. A key-value pair is a data structure for storage with the format <key, value>, in which the source data is the key and the dictionary value is the value.
Further, the dictionary value of each key-value pair in the key-value pair cache is a dictionary value that was taken from the dictionary cache when the source data was previously processed. The dictionary value taken is removed from the dictionary cache, and a key-value pair is then formed with the processed source data as the key and that dictionary value as the value.
Further, step S3 specifically comprises: S31: the Redis program receives the source data set sent by the Java program; S32: one source value is taken from the source data set and removed from it; S33: with the source value as the key, the key-value pair cache is searched for a pair with the same key, and if one is found, its value is assigned to the target value; S34: the target value is stored in the target value set; S35: steps S32-S34 are repeated until all source data in the source data set has been replaced with target values and stored in the target value set.
Further, if no key-value pair with the same key is found in step S33, a dictionary value is taken at random from the dictionary cache and removed from it. That dictionary value is assigned to the target value, and a key-value pair with the source data as the key and the dictionary value as the value is stored in the key-value pair cache.
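Steps S31-S35 and the fallback for a missing key can be sketched as follows. This is a minimal in-memory illustration, not the patented Redis program: a plain `dict` stands in for the key-value pair cache, a plain `list` for the dictionary cache, and the function name `replace_batch` and the sample names are hypothetical. The text does not say what happens when the dictionary cache runs out, so raising `LookupError` here is an assumption.

```python
import random

def replace_batch(sources, pair_cache, dictionary_cache, rng=random):
    """Replace every source value in a batch with a target value."""
    targets = []
    for source in sources:                       # S32: take the next source value
        target = pair_cache.get(source)          # S33: look up the pair by key
        if target is None:
            if not dictionary_cache:
                # Assumption: the text does not define this case.
                raise LookupError("dictionary cache is exhausted")
            # Fallback: take a random dictionary value and remove it,
            # so that no other source value can ever receive it.
            index = rng.randrange(len(dictionary_cache))
            target = dictionary_cache.pop(index)
            pair_cache[source] = target          # remember <source, dictionary value>
        targets.append(target)                   # S34: store the target value
    return targets                               # S35: all sources replaced

pair_cache = {}
dictionary_cache = ["Alice", "Bob", "Carol", "Dave"]

first = replace_batch(["Zhang San", "Li Si", "Zhang San"], pair_cache, dictionary_cache)
second = replace_batch(["Li Si", "Wang Wu"], pair_cache, dictionary_cache)
```

On a real Redis server, one possible mapping (not stated in the text) would keep the pairs in a hash read and written with `HGET`/`HSET` and the dictionary in a set drawn from with `SPOP`, which removes and returns a random member.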
Further, each batch sent to the Redis program in step S2 contains 10,000 records.
Compared with the prior art, the present invention has the following beneficial effects. The method for replacement output based on character string uniqueness and repeatability provided by the invention: 1. hides the sensitive information in the data while retaining its original business meaning; 2. supports data sharing in a cluster environment: a Redis in-memory server serves as the data-sharing platform between the computers of the cluster, so data can be processed at in-memory speed, as with dictionary data stored in a local file, and can also be shared across the cluster; 3. improves processing efficiency: sending data in batches minimizes the number of connections between computers and so shortens the extra processing time caused by network latency, and replacing a traditional relational database with a Redis in-memory server as the shared data exchange platform avoids the performance problems that the I/O bottleneck causes in a traditional relational database; 4. guarantees the uniqueness and repeatable output of the data: each source value is associated with a dictionary value, the association is cached in a cache pool, and the associated dictionary value is removed from the dictionary. Whenever a previously processed source value is handled again, its dictionary value is taken directly from the associated data in the cache pool, which achieves both repeatable output and uniqueness.
Detailed description of the invention
Fig. 1 is an architecture diagram of the method for replacement output based on character string uniqueness and repeatability in an embodiment of the present invention;
Fig. 2 is a logic flowchart of the Java program in an embodiment of the present invention;
Fig. 3 is a logic flowchart of the Redis program in an embodiment of the present invention.
Specific embodiment
The invention is further described below with reference to the accompanying drawings and embodiments.
Fig. 1 is an architecture diagram of the method for replacement output based on character string uniqueness and repeatability in an embodiment of the present invention.
Referring to Fig. 1, the method for replacement output based on character string uniqueness and repeatability provided by the invention comprises the following steps:
S1: the Java program 3 caches the data of dictionary library 1 into the dictionary cache of the Redis cache store 4; dictionary library 1 is the set of data used to replace sensitive data, and its entries are called dictionary values;
S2: the Java program 3 sends the sensitive data of source database 2 to the Redis program 5 in batches; source database 2 is the set of sensitive data before desensitization, and its entries are called source data;
S3: the Redis program 5 receives the source data sent in step S2, looks up data in the Redis cache store 4, and replaces the sensitive data with target values;
S4: the Redis program 5 returns the set of target values to the Java program 3;
S5: the Java program 3 writes the set of target values to target database 6, which is the set of sensitive data after desensitization;
S6: steps S2-S5 are repeated until all source data has been replaced and output.
Specifically, the Redis cache store 4 also includes a key-value pair cache that stores key-value pairs. A key-value pair is a data structure for storage with the format <key, value>, in which the source data is the key and the dictionary value is the value. The dictionary value of each key-value pair in the key-value pair cache is a dictionary value that was taken from the dictionary cache when the source data was previously processed. The dictionary value taken is removed from the dictionary cache, and a key-value pair is then formed with the processed source data as the key and that dictionary value as the value.
Fig. 2 is a logic flowchart of the Java program 3 in an embodiment of the present invention; Fig. 3 is a logic flowchart of the Redis program 5 in an embodiment of the present invention.
In a specific implementation, the method for replacement output based on character string uniqueness and repeatability provided by the invention is executed by the Java program 3 and the Redis program 5:
Referring to Fig. 2, the Java program 3 performs the following steps:
1) Determine whether the data of dictionary library 1 already exists in the Redis cache store 4; if it does not, cache the data of dictionary library 1 into the Redis cache store 4;
2) Load 10,000 source records from source database 2 at a time;
3) Package the loaded source data into a data stream and send it to the Redis program 5 for desensitization;
4) Receive the returned desensitization result and write it to target database 6;
5) Determine whether all data in source database 2 has been loaded; if not, continue with steps 2)-4);
6) End the process.
In this process, dictionary library 1 needs to be cached into the Redis cache store only once. When other desensitization tasks need dictionary library 1, they can use it directly without spending time synchronizing data again. Loading 10,000 records per batch reduces the number of loads without occupying excessive memory. First, memory is expensive hardware and cannot be enlarged without limit just to make a program run efficiently, so all the data cannot be loaded into program memory at once. Second, every load from the source database incurs network latency, and the more loads there are, the more latency accumulates, so the number of loads must be kept down.
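The driver loop and the batching trade-off described above can be sketched as follows. This is a hedged illustration only: the helper names (`ensure_dictionary_cached`, `batches`, `run_job`), the `dict` standing in for the shared cache, and the upper-casing stand-in for the desensitization call are all hypothetical; only the batch size of 10,000 comes from the text.

```python
from itertools import islice

BATCH_SIZE = 10_000  # records per batch, as stated in the text

def ensure_dictionary_cached(cache, load_dictionary):
    """Step 1): load the dictionary into the shared cache only if it is absent."""
    if "dictionary" not in cache:
        cache["dictionary"] = list(load_dictionary())

def batches(rows, size=BATCH_SIZE):
    """Yield successive lists of at most `size` rows, never the whole input."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch

def run_job(source_rows, desensitize_batch, write_targets, size=BATCH_SIZE):
    """Steps 2)-5): load a batch, send it for desensitization, write the result."""
    round_trips = 0
    for batch in batches(source_rows, size):
        write_targets(desensitize_batch(batch))
        round_trips += 1
    return round_trips

# Stand-ins for the real systems (all hypothetical).
shared_cache = {}
target_db = []
ensure_dictionary_cached(shared_cache, lambda: ["Alice", "Bob"])
ensure_dictionary_cached(shared_cache, lambda: ["never loaded"])  # already cached

source_rows = (f"row-{i}" for i in range(25_000))
round_trips = run_job(
    source_rows,
    desensitize_batch=lambda batch: [row.upper() for row in batch],
    write_targets=target_db.extend,
)
```

In this sketch, 25,000 rows are processed in 3 round trips rather than 25,000, while at most 10,000 rows are held in memory at any time.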
Referring to Fig. 3, the Redis program 5 performs the following steps:
1) Receive the source data set from the Java program 3;
2) Take one source value from the source data set and remove it from the set;
3) With the source value as the key, search the key-value pair cache for a pair with the same key; if one is found, assign its value to the target value; if none is found, perform the following steps:
A) Take a dictionary value at random from the dictionary cache and remove it from the dictionary cache;
B) Assign the dictionary value to the target value, then form a key-value pair with the source value as the key and the dictionary value as the value, and store it in the key-value pair cache;
4) Store the target value in the target value set;
5) Determine whether all data in the source data set has been desensitized; if undesensitized data remains, continue with steps 2)-4);
6) Return the target value set to the Java program 3;
7) End the process.
In this process, desensitized source data is cached in the key-value pair cache in key-value format, so that the next time the same source value is processed, the same target value is retrieved by its key (the source value); this guarantees repeatable output. Each dictionary value is removed as soon as it is taken, so every distinct source value corresponds to a different dictionary value; this guarantees uniqueness.
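The two guarantees can be stated as checks over observed (source, target) pairs. The following is a small verification sketch, not part of the described method; `check_replacement` and the sample pairs are hypothetical.

```python
def check_replacement(pairs):
    """Return (repeatable, unique) for a sequence of (source, target) pairs."""
    source_to_target = {}
    target_to_source = {}
    repeatable = True
    unique = True
    for source, target in pairs:
        # Repeatable output: a source value must always map to the same target.
        if source_to_target.setdefault(source, target) != target:
            repeatable = False
        # Uniqueness: a target value must never be shared by two source values.
        if target_to_source.setdefault(target, source) != source:
            unique = False
    return repeatable, unique

good = [("Zhang San", "Alice"), ("Li Si", "Bob"), ("Zhang San", "Alice")]
collision = [("Zhang San", "Alice"), ("Li Si", "Alice")]
unstable = [("Zhang San", "Alice"), ("Zhang San", "Bob")]

good_result = check_replacement(good)
collision_result = check_replacement(collision)
unstable_result = check_replacement(unstable)
```

Together the two conditions say that the mapping from source values to target values is a one-to-one function, which is what the key-value pair cache and the removal of used dictionary values are meant to ensure.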
In summary, the method for replacement output based on character string uniqueness and repeatability provided by the invention has the following advantages: 1. it hides the sensitive information in the data while retaining its original business meaning; 2. it supports data sharing in a cluster environment: a Redis in-memory server serves as the data-sharing platform between the computers of the cluster, so data can be processed at in-memory speed, as with dictionary data stored in a local file, and can also be shared across the cluster; 3. it improves processing efficiency: sending data in batches minimizes the number of connections between computers and so shortens the extra processing time caused by network latency, and replacing a traditional relational database with a Redis in-memory server as the shared data exchange platform avoids the performance problems that the I/O bottleneck causes in a traditional relational database; 4. it guarantees the uniqueness and repeatable output of the data: each source value is associated with a dictionary value, the association is cached in a cache pool, and the associated dictionary value is removed from the dictionary. Whenever a previously processed source value is handled again, its dictionary value is taken directly from the associated data in the cache pool, which achieves both repeatable output and uniqueness.
Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit the invention. Any person skilled in the art may make minor modifications and improvements without departing from the spirit and scope of the invention, and the scope of protection of the invention shall therefore be defined by the claims.
Claims (6)
1. A method for replacement output based on character string uniqueness and repeatability, characterized by comprising the following steps:
S1: a Java program caches the data of the dictionary library into the dictionary cache of a Redis cache store; the dictionary library is the set of data used to replace sensitive data, and its entries are called dictionary values;
S2: the Java program sends the sensitive data of the source database to a Redis program in batches; the source database is the set of sensitive data before desensitization, and its entries are called source data;
S3: the Redis program receives the source data sent in step S2, looks up data in the Redis cache store, and replaces the sensitive data with target values;
S4: the Redis program returns the set of target values to the Java program;
S5: the Java program writes the set of target values to the target database, which is the set of sensitive data after desensitization;
S6: steps S2-S5 are repeated until all source data has been replaced and output.
2. The method for replacement output based on character string uniqueness and repeatability according to claim 1, characterized in that the Redis cache store further includes a key-value pair cache that stores key-value pairs, a key-value pair being a data structure for storage with the format <key, value>, in which the source data is the key and the dictionary value is the value.
3. The method for replacement output based on character string uniqueness and repeatability according to claim 2, characterized in that the dictionary value of each key-value pair in the key-value pair cache is a dictionary value that was taken from the dictionary cache when the source data was previously processed; the dictionary value taken is removed from the dictionary cache, and a key-value pair is then formed with the processed source data as the key and that dictionary value as the value.
4. The method for replacement output based on character string uniqueness and repeatability according to claim 2, characterized in that step S3 specifically comprises:
S31: the Redis program receives the source data set sent by the Java program;
S32: one source value is taken from the source data set and removed from it;
S33: with the source value as the key, the key-value pair cache is searched for a pair with the same key, and if one is found, its value is assigned to the target value;
S34: the target value is stored in the target value set;
S35: steps S32-S34 are repeated until all source data in the source data set has been replaced with target values and stored in the target value set.
5. The method for replacement output based on character string uniqueness and repeatability according to claim 4, characterized in that, if no key-value pair with the same key is found in step S33, a dictionary value is taken at random from the dictionary cache and removed from it; that dictionary value is assigned to the target value, and a key-value pair with the source data as the key and the dictionary value as the value is stored in the key-value pair cache.
6. The method for replacement output based on character string uniqueness and repeatability according to claim 1, characterized in that each batch sent to the Redis program in step S2 contains 10,000 records.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811301201.XA CN109408534A (en) | 2018-11-02 | 2018-11-02 | Method based on character string uniqueness and repeatability displacement output |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109408534A (en) | 2019-03-01 |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190301 |