CN109408534A

CN109408534A - Method for string replacement output with uniqueness and repeatability

Info

Publication number: CN109408534A
Application number: CN201811301201.XA
Authority: CN
Inventors: 程永新; 谢涛; 孙钊雄; 郭振宇
Original assignee: Shanghai New Torch Network Information Technology Ltd By Share Ltd
Current assignee: Shanghai New Torch Network Information Technology Ltd By Share Ltd
Priority date: 2018-11-02
Filing date: 2018-11-02
Publication date: 2019-03-01

Abstract

The invention discloses a method for string replacement output with uniqueness and repeatability, comprising: S1: a Java program caches the data of a dictionary library into the dictionary cache of a Redis cache; S2: the Java program sends the sensitive data of a source database to a Redis program in batches; S3: the Redis program receives the source data sent in step S2, looks up data in the Redis cache, and replaces the sensitive data with target values; S4: the Redis program returns the target value data set to the Java program; S5: the Java program writes the target value data set to a target library; S6: steps S2-S5 are repeated until all source data has been replaced and output. The invention hides the sensitive information in the data while retaining the data's original business meaning; it supports data sharing in a cluster environment and improves processing efficiency; and it guarantees that the output is both unique and repeatable.

Description

Method for string replacement output with uniqueness and repeatability

Technical field

The present invention relates to a data desensitization method, and more particularly to a method for string replacement output with uniqueness and repeatability.

Background art

In the current era of big data, data is regarded by industry as one of an enterprise's most valuable assets. By analyzing its accumulated data, an enterprise can follow market trends in real time and respond quickly, obtain decision support for formulating precise and effective marketing strategies, and provide consumers with more timely and personalized services. Once data leaks, however, it not only puts the enterprise's reputation at risk but also exposes consumers to unknown dangers arising from the leak of their personal information. Phishing websites, fraudulent websites, Trojan viruses, pseudo base stations, spam text messages, harassing calls and other forms of network fraud and harassment mostly originate from leaked personal information.

To ensure that data containing sensitive information is not leaked, the data can be bleached according to defined desensitization rules, a process also called data desensitization. In general, sensitive information can be hidden by sorting or encrypting the data or by replacing it with generated random values. For some special data that carries business meaning, however, these means destroy that meaning. To retain the business meaning, the prior art exhaustively builds a data set with similar business meaning (hereinafter, the dictionary) and uses it to replace, in order or at random, the data containing sensitive information (hereinafter, the source data), thereby obscuring the sensitive information while keeping the original business meaning. The prior art does this in the following ways:

1) Transforming the source data directly: a processing program loads the source data from the source database, generates target values by encrypting or sorting the source data or by generating random replacements according to defined rules, and finally outputs the target values to the target library.

2) Replacing sensitive data using dictionary data stored in local files: the dictionary value set is stored in advance in a local file (for example an Excel, CSV or plain text file). The processing program loads the sensitive source data to be desensitized from the source database into its memory, looks up a conforming replacement value in the dictionary file for each piece of sensitive source data according to the defined rules, and finally outputs the replacement values to the target library.

The prior art has the following disadvantages:

1. Uniqueness and repeatable output of the data cannot both be ensured: the prior art calculates a position number from a feature of the source data and uses it to select the dictionary value at that position in the dictionary table. This ensures repeatable output (identical source data always desensitizes to the same target value), but the number of values in the dictionary table is finite while the amount of source data is theoretically unbounded. Representing an unbounded set with a finite one inevitably produces duplicates (different source data desensitizes to the same target value), so the uniqueness of the data cannot be guaranteed.

2. Data cannot be shared: processing desensitization data in a local cache exploits fast memory reads and writes and incurs no network latency, so it maximizes desensitization efficiency. With the arrival of the big data era, however, a single ordinary machine can no longer cope with large-volume processing, and supercomputers are scarce and extremely expensive. To handle such workloads the industry uses clustering, in which multiple independent computers interconnected by a high-speed network act as a single server, achieving high-performance data processing at lower cost. Because every computer in a cluster exists independently, computers that use their own local caches to process the same desensitization task cannot share data with one another. Local-cache desensitization is therefore suitable only for single-machine environments and cannot share data in a cluster environment.

3. Low efficiency: processing desensitization data in a database solves the data sharing problem described in item 2, but because the database must persist data to files on disk, the I/O bottleneck seriously degrades desensitization performance and efficiency cannot be guaranteed. In addition, the program accesses the database over a network connection, so every access incurs network latency, and the more frequent the accesses, the longer the total latency.
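The first disadvantage can be illustrated with a minimal Java sketch. The patent does not say which feature of the source data the prior art uses to compute the position, so the use of `hashCode()` here, along with the class name `HashIndexReplacer`, is an assumption made purely for illustration.

```java
import java.util.List;

// Hypothetical illustration of the prior-art position calculation; not taken from the patent.
final class HashIndexReplacer {
    private final List<String> dictionary;

    HashIndexReplacer(List<String> dictionary) {
        this.dictionary = dictionary;
    }

    // Derive a dictionary position from a feature of the source value
    // (assumed here to be its hash code) and return the value at that position.
    String replace(String source) {
        int index = Math.floorMod(source.hashCode(), dictionary.size());
        return dictionary.get(index);
    }
}
```

Calling `replace` twice with the same source value always yields the same dictionary value, so the output is repeatable. With a dictionary of two values and three distinct source values, however, at least two source values must map to the same dictionary value, so uniqueness is lost.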

Therefore, an efficient data desensitization method is needed that achieves repeatable output and uniqueness of data at the same time.

Summary of the invention

The technical problem to be solved by the present invention is to provide a method for string replacement output with uniqueness and repeatability that satisfies both the uniqueness and the repeatable output of data, realizes data sharing in a cluster environment, and improves desensitization efficiency.

To solve the above technical problem, the present invention adopts the following technical solution: a method for string replacement output with uniqueness and repeatability, comprising the following steps:

S1: a Java program caches the data of the dictionary library into the dictionary cache of the Redis cache; the dictionary library is the data set used to replace sensitive data, and the data in the dictionary library are called dictionary values;

S2: the Java program sends the sensitive data of the source database to the Redis program in batches; the source database is the set of sensitive data before desensitization, and the data in the source database are called source data;

S3: the Redis program receives the source data sent in step S2, looks up data in the Redis cache, and replaces the sensitive data with target values;

S4: the Redis program returns the target value data set to the Java program;

S5: the Java program writes the target value data set to the target library; the target library is the set of sensitive data after desensitization;

S6: steps S2-S5 are repeated until all source data has been replaced and output.

Further, the Redis cache also includes a key-value pair cache, in which key-value pairs are stored. A key-value pair is a data structure for storage with the format <key, value>, where the source data is the key and the dictionary value is the value.

Further, the dictionary value of a key-value pair in the key-value pair cache is the dictionary value that was taken out of the dictionary cache when the source data was previously processed. The value taken out is removed from the dictionary cache, and a key-value pair is then formed with the processed source data as the key and the dictionary value taken out as the value.

Further, step S3 specifically comprises: S31: the Redis program receives the source data set sent by the Java program; S32: one piece of source data is taken out of the source data set and removed from it; S33: with the source data as the key, the key-value pair cache is searched for a key-value pair with the same key, and when one is found its value is assigned to the target value; S34: the target value is stored in the target value set; S35: steps S32-S34 are repeated until all source data in the source data set has been replaced with target values and stored in the target value set.

Further, if no key-value pair with the same key is found in step S33, a dictionary value is taken at random from the dictionary cache and removed from the dictionary cache. The dictionary value is assigned to the target value, and a key-value pair with the source data as the key and the dictionary value as the value is formed and stored in the key-value pair cache.
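The lookup in step S33 and its fallback can be sketched in Java as follows. This is a minimal in-memory stand-in and not the patent's implementation: a `HashMap` plays the key-value pair cache and an `ArrayList` plays the dictionary cache. In a real deployment a Redis hash (HGET/HSET) and a Redis set (SPOP) might fill these roles, but the patent does not say which Redis structures are used. The class name `ReplacementCache` is hypothetical, and throwing an exception when the dictionary cache runs out is an assumption, since the patent does not describe that case.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// In-memory stand-in for the two Redis caches; names are hypothetical.
final class ReplacementCache {
    private final List<String> dictionaryCache;                    // dictionary values not yet used
    private final Map<String, String> pairCache = new HashMap<>(); // source data -> dictionary value
    private final Random random = new Random();

    ReplacementCache(List<String> dictionaryValues) {
        // Assumes the dictionary values are distinct.
        this.dictionaryCache = new ArrayList<>(dictionaryValues);
    }

    String replace(String source) {
        // S33: look for an existing key-value pair whose key is the source data.
        String target = pairCache.get(source);
        if (target != null) {
            return target;
        }
        // Assumption: the patent does not say what happens when no values are left.
        if (dictionaryCache.isEmpty()) {
            throw new IllegalStateException("dictionary cache exhausted");
        }
        // Fallback: take a random dictionary value and remove it from the dictionary cache.
        int index = random.nextInt(dictionaryCache.size());
        target = dictionaryCache.remove(index);
        // Store the new pair so the same source data maps to the same value next time.
        pairCache.put(source, target);
        return target;
    }
}
```

Because each dictionary value is removed when it is first assigned, two different source values can never receive the same target value, and because the pair is cached, a repeated source value always receives the value it was given before.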

Further, each batch sent to the Redis program in step S2 contains 10000 records.

Compared with the prior art, the present invention has the following beneficial effects. The method for string replacement output with uniqueness and repeatability provided by the invention has the following advantages: 1. It hides the sensitive information in the data while retaining the data's original business meaning. 2. It supports data sharing in a cluster environment: a Redis in-memory server serves as the data sharing platform between the computers of the cluster, so data can be processed in memory at high speed, as with dictionary data stored in local files, and can also be shared across the cluster. 3. It improves processing efficiency: sending data in batches minimizes the number of connections between computers and thus shortens the extra processing time caused by network latency; and using a Redis in-memory server instead of a traditional relational database as the shared data exchange platform avoids the performance problems that the I/O bottleneck causes in a traditional relational database. 4. It guarantees the uniqueness and repeatable output of data: each piece of source data is associated with a dictionary value, the association is cached in a cache pool, and the associated dictionary value is removed from the dictionary. Whenever source data that has already been processed is handled again, its dictionary value is read directly from the associated data in the cache pool, so the output is both repeatable and unique.

Brief description of the drawings

Fig. 1 is an architecture diagram of the method for string replacement output with uniqueness and repeatability in an embodiment of the present invention;

Fig. 2 is a logic flow chart of the Java program in an embodiment of the present invention;

Fig. 3 is a logic flow chart of the Redis program in an embodiment of the present invention.

Detailed description of the embodiments

The invention is further described below with reference to the accompanying drawings and embodiments.

Fig. 1 is an architecture diagram of the method for string replacement output with uniqueness and repeatability in an embodiment of the present invention.

Referring to Fig. 1, the method for string replacement output with uniqueness and repeatability provided by the invention comprises the following steps:

S1: Java program 3 caches the data of dictionary library 1 into the dictionary cache of Redis cache 4; dictionary library 1 is the data set used to replace sensitive data, and the data in dictionary library 1 are called dictionary values;

S2: Java program 3 sends the sensitive data of source database 2 to Redis program 5 in batches; source database 2 is the set of sensitive data before desensitization, and the data in source database 2 are called source data;

S3: Redis program 5 looks up data in Redis cache 4 for the source data received in step S2 and replaces the sensitive data with target values;

S4: Redis program 5 returns the target value data set to Java program 3;

S5: Java program 3 writes the target value data set to target library 6; target library 6 is the set of sensitive data after desensitization;

Specifically, Redis cache 4 also includes a key-value pair cache, in which key-value pairs are stored. A key-value pair is a data structure for storage with the format <key, value>, where the source data is the key and the dictionary value is the value. The dictionary value of a key-value pair in the key-value pair cache is the dictionary value that was taken out of the dictionary cache when the source data was previously processed; the value taken out is removed from the dictionary cache, and a key-value pair is then formed with the processed source data as the key and the dictionary value taken out as the value.

Fig. 2 is a logic flow chart of Java program 3 in an embodiment of the present invention; Fig. 3 is a logic flow chart of Redis program 5 in an embodiment of the present invention.

In a specific implementation, the method for string replacement output with uniqueness and repeatability provided by the invention is executed by Java program 3 and Redis program 5:

Referring to Fig. 2, Java program 3 performs the following steps:

1) Check whether the data of dictionary library 1 already exists in Redis cache 4; if it does not, cache the data of dictionary library 1 into Redis cache 4;

2) Load 10000 pieces of source data from source database 2 each time;

3) Package the loaded source data into a data stream and send it to Redis program 5 for desensitization;

4) Receive the returned desensitization result and write it to target library 6;

5) Check whether all data in source database 2 has been loaded; if not, continue with steps 2)-4);

6) End the process.

In this process, dictionary library 1 needs to be cached into the Redis cache only once; other desensitization tasks that need dictionary library 1 can use it directly without spending time synchronizing the data again. Loading 10000 records per batch reduces the number of loads without occupying too much memory. First, memory is expensive hardware and cannot be increased without limit just so that a program runs efficiently, so all the data cannot be loaded into program memory at once. Second, every load from the source database incurs network latency, and the more loads there are, the more latency accumulates, so the number of loads needs to be reduced.

Referring to Fig. 3, Redis program 5 performs the following steps:

1) Receive the source data set from Java program 3;

2) Take one piece of source data out of the source data set and remove it from the set;

3) With the source data as the key, search the key-value pair cache for a key-value pair with the same key; if one is found, assign its value to the target value; if none is found, perform the following steps:

A) Take a dictionary value at random from the dictionary cache and remove it from the dictionary cache;

B) Assign the dictionary value to the target value, then form a key-value pair with the source data as the key and the dictionary value as the value, and store it in the key-value pair cache;

4) Store the target value in the target value set;

5) Check whether all data in the source data set has been desensitized; if the source data set still contains data that has not been desensitized, continue with steps 2)-4);

6) Return the target value set to Java program 3;

7) End the process.

In this process, source data that has been desensitized is cached in the key-value pair cache in key-value pair format, so that the next time the same source data is processed, the same target value is retrieved by its key (the source data), which guarantees repeatable output. Each dictionary value is removed as soon as it is taken, so that every distinct piece of source data corresponds to a different dictionary value, which guarantees uniqueness.
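The following Java sketch illustrates why a single shared cache keeps these two guarantees when several cluster nodes process the same data. It is a simulation under stated assumptions, not the patent's implementation: threads stand in for cluster nodes, an in-memory object stands in for the Redis server, and the lookup-or-assign step is assumed to run atomically on the shared server, which is modeled here with a `synchronized` method. The names `SharedReplacementStore` and `runWorkers` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical in-memory stand-in for the shared Redis server.
final class SharedReplacementStore {
    private final List<String> dictionaryCache;
    private final Map<String, String> pairCache = new HashMap<>();
    private final Random random = new Random();

    SharedReplacementStore(List<String> dictionaryValues) {
        this.dictionaryCache = new ArrayList<>(dictionaryValues);
    }

    // Assumption: lookup-or-assign is atomic on the shared server.
    synchronized String replace(String source) {
        String target = pairCache.get(source);
        if (target == null) {
            if (dictionaryCache.isEmpty()) {
                throw new IllegalStateException("dictionary cache exhausted");
            }
            target = dictionaryCache.remove(random.nextInt(dictionaryCache.size()));
            pairCache.put(source, target);
        }
        return target;
    }

    // Each thread plays one cluster node that desensitizes the same source data.
    static List<Map<String, String>> runWorkers(SharedReplacementStore store,
                                                List<String> sources, int workerCount) {
        List<Map<String, String>> results = new ArrayList<>();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < workerCount; i++) {
            Map<String, String> seen = new HashMap<>();
            results.add(seen);
            Thread worker = new Thread(() -> {
                for (String source : sources) {
                    seen.put(source, store.replace(source));
                }
            });
            threads.add(worker);
            worker.start();
        }
        for (Thread worker : threads) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return results;
    }
}
```

Every worker ends up with the same source-to-target mapping, and the target values within that mapping are all different. If each worker kept its own local cache instead, the mappings could differ from worker to worker, which is the sharing problem described in the background section.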

In conclusion it is provided by the invention based on character string uniqueness and repeatability displacement output method, have with Lower advantage: 1, hiding data sensitive information while, retain the original business meaning of data；2, the data under cluster environment are supported It is shared: be used as in cluster environment using Redis inner server, the data sharing platform between each computer, both can Ground file store dictionary data utilize memory process data at high speeds, and can in cluster environment shared data；3, raising is handled Efficiency: by handling data mode with Batch sending, the connection number between each computer is reduced to greatest extent, to shorten The additional process time generated by network delay；And traditional Relational DataBase is substituted with Redis inner server, as shared Data interchange platform avoids traditional Relational DataBase because of performance issue brought by I/O bottleneck；4, guarantee the uniqueness of data With repeatable output characteristics: by the way that source data to be associated with dictionary value, and by associated data buffer storage in a cache pool In, while the dictionary value being associated being removed.Every time when handling processed source data, directly from the pass in cache pool Join data and take out dictionary value, to realize the repeatable output and uniqueness of data.

Although the present invention is disclosed as above with preferred embodiment, however, it is not to limit the invention, any this field skill Art personnel, without departing from the spirit and scope of the present invention, when can make a little modification and perfect therefore of the invention protection model It encloses to work as and subject to the definition of the claims.

Claims

1. A method for string replacement output with uniqueness and repeatability, characterized by comprising the following steps:

S1: a Java program caches the data of a dictionary library into the dictionary cache of a Redis cache; the dictionary library is the data set used to replace sensitive data, and the data in the dictionary library are called dictionary values;

S2: the Java program sends the sensitive data of a source database to a Redis program in batches; the source database is the set of sensitive data before desensitization, and the data in the source database are called source data;

S3: the Redis program receives the source data sent in step S2, looks up data in the Redis cache, and replaces the sensitive data with target values;

S4: the Redis program returns the target value data set to the Java program;

S5: the Java program writes the target value data set to a target library; the target library is the set of sensitive data after desensitization;

2. The method for string replacement output with uniqueness and repeatability according to claim 1, characterized in that the Redis cache further includes a key-value pair cache in which key-value pairs are stored, a key-value pair being a data structure for storage with the format <key, value>, where the source data is the key and the dictionary value is the value.

3. The method for string replacement output with uniqueness and repeatability according to claim 2, characterized in that the dictionary value of a key-value pair in the key-value pair cache is the dictionary value taken out of the dictionary cache when the source data was previously processed, the dictionary value taken out is removed from the dictionary cache, and a key-value pair is then formed with the processed source data as the key and the dictionary value taken out as the value.

4. The method for string replacement output with uniqueness and repeatability according to claim 2, characterized in that step S3 specifically comprises:

S31: the Redis program receives the source data set sent by the Java program;

S32: one piece of source data is taken out of the source data set and removed from the source data set;

S33: with the source data as the key, the key-value pair cache is searched for a key-value pair with the same key, and when one is found, its value is assigned to the target value;

S34: the target value is stored in the target value set;

S35: steps S32-S34 are repeated until all source data in the source data set has been replaced with target values and stored in the target value set.

5. The method for string replacement output with uniqueness and repeatability according to claim 4, characterized in that if no key-value pair with the same key is found in step S33, a dictionary value is taken at random from the dictionary cache and removed from the dictionary cache, the dictionary value is assigned to the target value, and a key-value pair with the source data as the key and the dictionary value as the value is formed and stored in the key-value pair cache.

6. The method for string replacement output with uniqueness and repeatability according to claim 1, characterized in that each batch sent to the Redis program in step S2 contains 10000 records.