CN102460404B - Generate obfuscated data - Google Patents

Info

Publication number
CN102460404B
Authority
CN
China
Prior art keywords
value
obscure
record
key
original
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201080032309.9A
Other languages
Chinese (zh)
Other versions
CN102460404A (en)
Inventor
P. Neergaard
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ab Initio Technology LLC
Original Assignee
Ab Initio Technology LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ab Initio Technology LLC
Publication of CN102460404A
Application granted
Publication of CN102460404B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254 Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Storage Device Security (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method of obfuscating data includes: reading (210) values occurring in one or more fields of multiple records from a data source; storing (220) a key value; for each of the multiple records, generating (230) an obfuscated value, using the key value, to replace an original value in a given field of the record, such that the obfuscated value depends on the key value and is deterministically related to the original value; and storing (240), in a data storage system, a collection of obfuscated data including records that include the obfuscated values.

Description

Generating Obfuscated Data
Cross-Reference to Related Applications
This application claims priority to U.S. Application Serial No. 61/183,054, filed on June 1, 2009, and U.S. Application Serial No. 12/497,354, filed on July 2, 2009, each of which is incorporated herein by reference.
Technical Field
This specification relates to generating obfuscated data.
Background
In many companies, software developers work outside the production environment (e.g., the environment in which actual customer data is processed) and, for security reasons, cannot access production data. To ensure that their applications will run correctly on production data, however, they may need realistic test data during development and testing: data that exhibits certain characteristics of the production data. To provide such realistic test data, an input production data set can be obfuscated to ensure that no sensitive information remains, and the obfuscated data can be stored for use as test data. The requirements for obfuscated data can vary widely depending on the needs of the project and the developers, the privacy policy of the organization, or even the laws of the country in which the obfuscated data is used. For example, obfuscating data may involve replacing or altering personal information such as names, addresses, birth dates, social security numbers, and credit card and bank account numbers.
Summary of the Invention
In one aspect, in general, a method of obfuscating data includes: reading values occurring in one or more fields of multiple records from a data source; storing a key value; for each of the multiple records, generating an obfuscated value, using the key value, to replace an original value in a given field of the record, such that the obfuscated value depends on the key value and is deterministically related to the original value; and storing, in a data storage system, a collection of obfuscated data including records that include the obfuscated values.
Aspects can include one or more of the following features.
The method further includes storing profile information that includes statistics characterizing the values of at least one field.
The obfuscated value is generated using the key value and the profile information stored for the given field.
The obfuscated value occurs in the given field of the collection of obfuscated data with a frequency that is determined based on the statistics in the profile information stored for the given field.
The obfuscated value is generated by using the original value and the key value as inputs to a function that generates an index value, and using the index value to look up the obfuscated value in a predetermined set of obfuscated values.
The predetermined set of obfuscated values is stored as a lookup table in which each obfuscated value corresponds to one or more index values.
Multiple index values in a range correspond to the same obfuscated value in the predetermined set of obfuscated values.
The size of the range is based on the statistics in the profile information stored for the given field.
Generating an obfuscated value, using the key value, to replace an original value in a given field of the record includes combining the original value and the key value using a deterministic function to produce a selection value that is used to select the obfuscated value.
The selection value is mapped to the obfuscated value using a deterministic mapping.
The domain of values from which the obfuscated value is selected includes multiple of the original values in the given field of the records of the data source.
One or more of the original values are not included in the domain.
One or more of the values in the domain are not included in the original values.
The deterministic function uses the key value cryptographically to prevent recovery of the original value from the obfuscated value.
The deterministic function provides, for different values of the key, different sequences of selection values for consecutive original values.
A first sequence of selection values for consecutive original values, for a first value of the key, cannot be predicted from a second sequence of selection values for consecutive original values, for a second value of the key.
Generating an obfuscated value, using the key value, to replace an original value in a given field of the record includes determining whether the selection value corresponds to a valid obfuscated value and, if not, repeatedly combining the selection value and the key value using the deterministic function to produce an additional selection value, until the additional selection value corresponds to a valid obfuscated value.
A valid obfuscated value consists of a predetermined number of digits.
The method further includes partitioning the records from the data source into multiple sets of records, and replacing original values in the given field with generated obfuscated values in the records of different sets in parallel, using different computing resources.
At least a first record in the collection of obfuscated data that includes an obfuscated value also includes at least one original value that has not been replaced by an obfuscated value.
The method further includes determining whether to use the key value to replace the original value in the first record with an obfuscated value, based on whether the original value is to be replaced consistently by the same obfuscated value across the multiple records in which it occurs.
In another aspect, in general, a system for obfuscating data includes: a data source providing records having values in one or more fields; a data storage system; and one or more processors coupled to the data storage system. The one or more processors provide an execution environment to: read values occurring in one or more fields of multiple records from the data source; store a key value; for each of the multiple records, generate an obfuscated value, using the key value, to replace an original value in a given field of the record, such that the obfuscated value depends on the key value and is deterministically related to the original value; and store, in the data storage system, a collection of obfuscated data including records that include the obfuscated values.
In another aspect, in general, a system for obfuscating data includes: a data source providing records having values in one or more fields; a data storage system; means for reading values occurring in one or more fields of multiple records from the data source; means for generating, for each of the multiple records, an obfuscated value, using a key value, to replace an original value in a given field of the record, such that the obfuscated value depends on the key value and is deterministically related to the original value; and means for storing, in the data storage system, a collection of obfuscated data including records that include the obfuscated values.
In another aspect, in general, a computer-readable medium stores a computer program for obfuscating data. The computer program includes instructions for causing a computer to: read values occurring in one or more fields of multiple records from a data source; store a key value; for each of the multiple records, generate an obfuscated value, using the key value, to replace an original value in a given field of the record, such that the obfuscated value depends on the key value and is deterministically related to the original value; and store, in a data storage system, a collection of obfuscated data including records that include the obfuscated values.
Aspects can have one or more of the following advantages.
Because there is a deterministic relationship between an obfuscated value and the original actual value, referential integrity can be maintained during obfuscation, so that the obfuscated data satisfies the same referential integrity constraints as the production data. The obfuscation process can also ensure that certain operations performed on the obfuscated data preserve certain characteristics, such as the number of values for each key in a "join" operation. Because the deterministic relationship between a given obfuscated value and the corresponding original value is a function of the stored key value and does not depend on other obfuscated values, the obfuscation can be performed in parallel on different portions of a data set while still maintaining the relationships between those portions. The obfuscation process can prevent an unauthorized party from reverse engineering the obfuscated data and can prevent the original values of the production data from being retrieved. Characteristics such as record format, ranges of possible values, statistical properties, and the general profile of the obfuscated data can match the original data as closely as possible. For example, because credit card numbers use a check digit, the obfuscated data can also have a correctly computed check digit. If the original data contains misspellings and inconsistencies, the obfuscated data can contain the same or similar kinds of irregularities so that error handling can be tested. For values such as names (e.g., first and last names) and addresses, the frequency of a particular value in the obfuscated data can reflect its frequency in the production data.
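The referential-integrity advantage can be illustrated with a minimal Python sketch. It assumes HMAC-SHA256 as the keyed deterministic function, which is an illustrative choice and not something the specification prescribes; the key, the table contents, and the names `obfuscate_id` and `join_counts` are hypothetical. Reducing a hash to eight digits is not collision-free, so this only approximates a one-to-one mapping.

```python
import hashlib
import hmac


def obfuscate_id(key: bytes, original: str, digits: int = 8) -> str:
    """Deterministic stand-in for an ID; depends only on the key and the original value."""
    digest = hmac.new(key, original.encode("utf-8"), hashlib.sha256).digest()
    return str(int.from_bytes(digest, "big") % 10**digits).zfill(digits)


key = b"example-secret-key"  # hypothetical; a real key would be stored securely

customers = [{"custid": "C001", "name": "Ann"}, {"custid": "C002", "name": "Bob"}]
transactions = [
    {"custid": "C001", "amount": 10},
    {"custid": "C002", "amount": 25},
    {"custid": "C001", "amount": 7},
]

# Each table is obfuscated on its own; no shared mapping table is consulted.
obf_customers = [{**r, "custid": obfuscate_id(key, r["custid"])} for r in customers]
obf_transactions = [{**r, "custid": obfuscate_id(key, r["custid"])} for r in transactions]


def join_counts(left, right):
    """Number of matching right-hand rows for each left-hand row, in left-hand order."""
    return [sum(1 for t in right if t["custid"] == c["custid"]) for c in left]
```

Because both tables use the same key, a join on `custid` finds the same number of transactions per customer before and after obfuscation, even though the two tables could have been processed on different machines.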
The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the invention will be apparent from the description, the drawings, and the claims.
Description of Drawings
Fig. 1 is a block diagram of a system for executing graph-based computations.
Fig. 2 is a flowchart of an exemplary data obfuscation process.
Fig. 3 is a schematic diagram of a deterministic mapping for a data obfuscation process.
Fig. 4 is an exemplary dataflow graph for data obfuscation.
Fig. 5 is an exemplary lookup table.
Fig. 6 is a table showing an example of a pseudo-random permutation.
Fig. 7 is a table showing an example of a process for generating valid obfuscated values.
Detailed Description
Referring to Fig. 1, a system 100 for developing programs using obfuscated data includes a data source 102, which may include one or more sources of data such as storage devices or connections to online data streams, each of which may store data in any of a variety of storage formats (e.g., database tables, spreadsheet files, flat text files, or a native format used by a mainframe). An execution environment 104 for generating obfuscated data includes a data profiling module 106 and a data obfuscation module 112. The execution environment 104 may be hosted on one or more general-purpose computers under the control of a suitable operating system, such as the UNIX operating system. For example, the execution environment 104 can include a multiple-node parallel computing environment including a configuration of computer systems using multiple central processing units (CPUs), either local (e.g., multiprocessor systems such as SMP computers), or locally distributed (e.g., multiple processors coupled as clusters or MPPs), or remotely distributed (e.g., multiple processors coupled via a LAN or WAN network), or any combination thereof.
The data profiling module 106 reads data from the data source 102 and stores profile information describing various characteristics of the data values that occur in the data source 102. Storage devices providing the data source 102 may be local to the execution environment 104, for example, stored on a storage medium connected to a computer running the execution environment 104; or they may be remote from the execution environment 104, for example, hosted on a remote system (e.g., mainframe 110) that communicates with a computer running the execution environment 104 over a local or wide area data network.
The data obfuscation module 112 uses the profile information generated by the data profiling module 106 to generate a collection of obfuscated data 114, which is stored in a data storage system 116 accessible to the execution environment 104. The data storage system 116 is also accessible to a development environment 118, in which a developer 120 can develop and test programs using the obfuscated data 114. The original production data in the data source 102 can be kept secure, however, by keeping it inaccessible to the developer 120. In some implementations, the development environment 118 is a system for developing applications as dataflow graphs that include vertices (components or data sets) connected by directed links (representing flows of work elements) between the vertices. Such an environment is described in more detail, for example, in U.S. Publication No. 2007/0011668, entitled "Managing Parameters for Graph-Based Applications," incorporated herein by reference.
The data profiling module 106 can profile data from a variety of types of systems, including different forms of database systems. The data may be organized as records having values for respective fields (also called "attributes" or "columns"), including possibly null values. The profile information can be organized to provide separate profiles for different fields, called "field profiles," describing the values that occur in those fields. When first reading data from a data source, the data profiling module 106 typically starts with some initial format information about the records in that data source. (Note that in some circumstances, even the record structure of the data source may not be known initially and may instead be determined after analysis of the data source.) The initial information about records can include the number of bits that represent a distinct value, the order of fields within a record, and the type of value (e.g., string, signed/unsigned integer) represented by the bits. As the data profiling module 106 reads records from a data source, it computes statistics and other descriptive information (e.g., the frequency of particular values) that reflect the values in a given field. The data profiling module 106 then stores those statistics and descriptive information in the form of field profiles for access by the data obfuscation module 112. The profile information can also include information associated with multiple fields of the records in the data source 102, such as the total number of records and the total number of valid or invalid records. A description of a process for profiling the fields of a data source is given, for example, in U.S. Publication No. 2005/0114369, entitled "Data Profiling," incorporated herein by reference.
Fig. 2 shows a flowchart of an exemplary data obfuscation process 200. The process 200 includes reading (210) values occurring in one or more fields of multiple records from a data source. Optionally, profile information including statistics characterizing the values of at least one field is stored (e.g., in the form of ranges of index values that correspond to the statistics in the profile information and that determine the obfuscated data, as described in more detail below). The process 200 includes storing (220) a key value, which is used to provide security through cryptographic techniques that ensure the obfuscation cannot easily be reversed. For each of the multiple records, the process 200 generates (230) an obfuscated value, using the key value, to replace an original value in a given field of the record, such that the obfuscated value depends on the key value and is deterministically related to the original value. If stored profile information is used, the obfuscated values occur in the collection of obfuscated data with a frequency determined based on the stored profile information. The process 200 includes storing (240), in a data storage system, a collection of obfuscated data including records that include the obfuscated values.
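The four steps of process 200 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: records are plain dictionaries, HMAC-SHA256 stands in for the keyed deterministic function, and `SURNAMES`, `read_records`, `store_key`, `generate_obfuscated`, and `run_process` are hypothetical names. Profile information is omitted.

```python
import hashlib
import hmac
import secrets

SURNAMES = ["Adams", "Baker", "Clark", "Davis", "Evans"]  # hypothetical set of obfuscated values


def read_records(source):
    """Step 210: read the records (copied, so the source is left untouched)."""
    return [dict(record) for record in source]


def store_key() -> bytes:
    """Step 220: create a key; a real system would persist it in secure storage."""
    return secrets.token_bytes(32)


def generate_obfuscated(key: bytes, original: str, table) -> str:
    """Step 230: the result depends only on the key and the original value."""
    digest = hmac.new(key, original.encode("utf-8"), hashlib.sha256).digest()
    return table[int.from_bytes(digest, "big") % len(table)]


def run_process(source, field: str, key: bytes, table):
    records = read_records(source)
    for record in records:
        record[field] = generate_obfuscated(key, record[field], table)
    return records  # Step 240: the caller writes these to the data storage system
```

Calling `run_process(rows, "surname", store_key(), SURNAMES)` returns a new list in which every occurrence of the same surname has been replaced by the same entry from the table.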
In some implementations, the data obfuscation process 200 is repeated whenever a new data source becomes available or new records are received in an existing source. The process can be invoked by a user, or invoked automatically at repeated intervals or in response to certain events.
In some obfuscation approaches, the ability to obfuscate actual production data may be sufficient; in other approaches, it may also be useful to be able to reverse the obfuscation and match obfuscated values back to actual values. In some approaches, such as the process 200 described above, it is useful to ensure, for example by using a stored key and cryptographic techniques, that the obfuscation cannot be reversed to obtain the actual values, as described in more detail below.
In some cases, it may be useful for obfuscated values to be assigned consistently over time. For example, transaction data including records corresponding to different transactions associated with a particular customer may need to match previously obfuscated customer IDs, so that all transactions for a given actual customer ID are assigned the same obfuscated customer ID. As another example, customers from the same household may share the same address in different databases. It may be desirable to ensure that the obfuscated records for these customers share the same obfuscated address. If the obfuscated data needs to be read and understood by people, it may be desirable to replace actual values with values selected from a predetermined set of recognizable values, rather than simply replacing them with arbitrarily generated values. There are various ways to ensure a consistent assignment between a given value and its corresponding obfuscated value.
In one approach, the first time a given value is encountered, an obfuscated value is randomly selected from a predetermined set and mapped to that value. The two values are then stored in association with each other, for example in a mapping data structure. For every subsequent occurrence of a value that has already been stored in the mapping data structure, the same corresponding obfuscated value is retrieved from the data structure.
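A minimal Python sketch of this stored-mapping approach is shown here. The class name `StoredMapping` and its parameters are hypothetical, and a dictionary is assumed as the mapping data structure.

```python
import random


class StoredMapping:
    """Assigns a random obfuscated value on first sight and remembers the assignment."""

    def __init__(self, candidates, seed=None):
        self._candidates = list(candidates)  # predetermined set of obfuscated values
        self._rng = random.Random(seed)      # seed is only for reproducible demonstrations
        self._mapping = {}                   # original value -> obfuscated value

    def obfuscate(self, original):
        if original not in self._mapping:
            # First occurrence: pick at random and store the pair.
            self._mapping[original] = self._rng.choice(self._candidates)
        # Every later occurrence is served from the stored mapping.
        return self._mapping[original]

    def stored_pairs(self) -> int:
        return len(self._mapping)
```

The mapping grows with the number of distinct original values and must be shared by every process that obfuscates the same field, which is the storage and coordination cost that a key-based approach avoids.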
In another approach, such as the process 200 described above, a key is used to provide a deterministic mapping that appears random, without requiring previously mapped actual and obfuscated values to be stored in a mapping data structure. This key-based approach can therefore save storage space in some cases. For example, a key and a cryptographically strong hash function can be used to retrieve an obfuscated value from a predetermined set (e.g., a lookup table). Alternatively, a key and a pseudo-random permutation algorithm can be used to compute an obfuscated value. In both cases, as described in more detail below, use of the key ensures that a given actual value always corresponds to the same obfuscated value, while making the correspondence appear random.
Fig. 3 shows an example of a deterministic mapping 300 between a domain 310 of original values from an input data set and a domain 320 of obfuscated values that replace those original values. A key k is stored in a key storage 330 and is used consistently to map all original values to their respective obfuscated values within a given obfuscation session in which referential integrity is to be maintained. A different key can be used in a different obfuscation session for which referential integrity with the previous session does not need to be maintained.
From the original value v in region 310 1combine with key k composite function 340, to produce selective value x from selected zone 350.Any combined value v can be used 1with the determination technology of key k, such as with value v 1with key k as the mathematical function inputted or expression formula.Composite function 340 is determined, makes identical v 1identical x value is always produced with k value.
The selection value x is then mapped to an obfuscated value v2 from the domain 320 using a mapping function 360 (e.g., a deterministic mapping that uses a lookup table). The mapping function 360 is also deterministic, so that a given value of x always produces the same obfuscated value v2. The domain 320 of obfuscated values may include some of the same values as the domain 310 of original values, but the two need not overlap completely: some values in the domain 310 may not be included as possible obfuscated values in the domain 320, and some values in the domain 320 may not be included in the domain 310. For example, it may be desirable to allow many original values to be possible obfuscated values (e.g., common city or country names in an address field, or common names in a name field) while filtering out certain sensitive information (e.g., credit card numbers, social security numbers, or telephone numbers) as possible obfuscated values. In some cases, it may be desirable to have valid obfuscated social security numbers (e.g., to support validity tests on the obfuscated data), and in other cases it may be desirable to have invalid obfuscated social security numbers (e.g., to ensure that the obfuscated data cannot reveal anyone's personal information).
Either or both of the combining function 340 and the mapping function 360 can include cryptographic techniques that make it difficult to reverse the obfuscation and recover the original value v1 from the corresponding obfuscated value v2. In the keyed cryptographic hash technique and the keyed pseudo-random permutation technique described below, the combining function 340 includes a cryptographic technique to produce a selection value x that is then used as an index into a table to select the obfuscated value v2. In other implementations, however, the combining function 340 can be a non-cryptographic technique (e.g., simple concatenation) that produces a selection value x, which is then used either as the input to a cryptographic function (e.g., a hash function that provides the obfuscated value v2) or as an index for looking up the obfuscated value v2. Other deterministic mappings can generate the obfuscated value v2 directly from a given original value v1, without computing an intermediate selection value x.
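The two-stage mapping of Fig. 3 can be sketched in Python with the combining function and the mapping function kept separate. HMAC-SHA256 is assumed here as one possible cryptographic combining function; the table `CITIES` and the function names are hypothetical.

```python
import hashlib
import hmac

CITIES = ["Springfield", "Riverton", "Lakeside", "Fairview"]  # hypothetical domain 320


def combine(key: bytes, original: str) -> int:
    """Combining function (340): keyed hash of v1, giving the selection value x."""
    digest = hmac.new(key, original.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big")  # x is a 64-bit integer


def map_to_obfuscated(x: int, table) -> str:
    """Mapping function (360): deterministic table lookup giving v2."""
    return table[x % len(table)]


def obfuscate(key: bytes, original: str, table=CITIES) -> str:
    return map_to_obfuscated(combine(key, original), table)
```

With the same key, `obfuscate(key, "Boston")` always returns the same table entry. A different key yields an unrelated selection value, so the correspondence between original and obfuscated values changes from one obfuscation session to the next.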
In some implementations, the technique used to obfuscate a particular value can depend on characteristics of that value. For example, the data values to be obfuscated that occur in a given field of an input data set may be classified as having a "limited" or "unlimited" domain of values, and as having a "uniform" or "non-uniform" distribution of values. For key-based obfuscation, these characteristics can be used to determine whether to retrieve the obfuscated value from a lookup table or to compute it using a pseudo-random permutation. Even when a key is not used, these characteristics can be used to determine whether the frequencies of particular values in the obfuscated data should reflect their frequencies in the actual production data.
For " limited area data ", the number that can occur in the probable value in given field is restricted to the finite population of the value concentrated at predetermined effective value (such as, the character string of numeral or regular length).Between the confused stage of limited area data, validity check may be used for determining whether obscure value concentrates at predetermined effective value." infinite region data " are without the need to having predetermined probable value collection (such as, the value of random length).The example with the field of limited area data comprises social insurance number (SSN), credit card number (CCN), Customer ID (Custid), US phone number and U.S. Zip.The example with the field of infinite region data comprises name, surname and street address.
For " Uniform-distributed Data ", suppose that different pieces of information value may approximately equal, and to be generally expected to for everyone expression in database be unique.For " non-uniformly distributed data ", different value may appear at data centralization with different frequency, and repeats in the record of the different people that may represent in a database.Between the confused stage of non-uniformly distributed data, its frequency in actual production data of frequency matching of particular value in obfuscated data can be guaranteed with " frequency search " function, as described in more detail below.To field listed above, social insurance number, credit card number, Customer ID and US phone number are the examples of the field having Uniform-distributed Data, expect that it is unique for given client; And name, surname and U.S. Zip are the examples of the field having non-uniformly distributed data, it may repeat different client.
For unlimited domain data, or for some non-uniformly distributed data, a validity check may be impossible or impractical to perform. In these cases, a lookup table can be used if plausible values cannot be computed. For example, lookup tables of plausible names and addresses can be stored for obfuscating those fields. For non-uniformly distributed data, a frequency lookup function can be used to ensure that the obfuscated values have a realistic distribution; for data that is uniformly distributed but has an unlimited domain, the obfuscation process can ensure that values are selected uniformly from the lookup table.
Key-based obfuscation uses cryptographic techniques to construct a function whose results appear random but are in fact repeatable and predictable. A key is selected for obfuscating a given collection of actual data. If the obfuscated data is ever compromised, the actual values cannot be recovered from it without the key, so the key should be kept secret and stored securely. A given key can be stored and used in multiple executions of the obfuscation process, ensuring that the same obfuscated value is generated for any occurrence of a given actual value across those executions. A key-based obfuscation process can be executed in parallel on multiple data sets or on multiple portions of a single data set, because key-based obfuscation does not necessarily require maintaining a mapping data structure of previously used actual-to-obfuscated values. For example, the records of a data set can be partitioned (e.g., based on a given field such as customer ID) into multiple sets of records, and the generation and replacement of obfuscated values can be performed on the different sets of records in parallel using different computing resources (e.g., different processors or different computers). The particular technique used to perform key-based obfuscation for a given field depends on the characteristics of the data values of that field:
For limited domain data that is uniformly distributed, the value is computed using the key and a pseudo-random permutation algorithm. The same key is stored for use in multiple executions. The validity of obfuscated values can be ensured using one or more validity functions.
For unlimited domain data or non-uniformly distributed data, the value is looked up in a lookup table using the key and a cryptographic hash function. The same key and lookup table are stored for use in multiple executions. The validity of obfuscated values can be ensured by ensuring that the values in the lookup table are valid.
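For the first case above (limited domain, uniformly distributed), one way to obtain a keyed pseudo-random permutation is sketched here in Python. The construction is a swapped-in technique, namely a small Feistel network with an HMAC-SHA256 round function plus re-application until the result is valid; it is not the algorithm of Fig. 6 and has not been vetted as a production cipher. The nine-digit domain, the validity rule in `is_valid`, and all function names are hypothetical.

```python
import hashlib
import hmac

HALF_BITS = 15                      # two 15-bit halves cover every nine-digit number (10**9 < 2**30)
HALF_MASK = (1 << HALF_BITS) - 1
DOMAIN_SIZE = 10**9


def round_value(key: bytes, round_no: int, half: int) -> int:
    """Keyed round function: 15 pseudo-random bits derived from one half."""
    message = bytes([round_no]) + half.to_bytes(2, "big")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return int.from_bytes(digest[:2], "big") & HALF_MASK


def feistel(key: bytes, value: int, rounds: int = 6) -> int:
    """Keyed permutation of the 30-bit integers (each round is invertible)."""
    left, right = value >> HALF_BITS, value & HALF_MASK
    for round_no in range(rounds):
        left, right = right, left ^ round_value(key, round_no, right)
    return (left << HALF_BITS) | right


def is_valid(value: int) -> bool:
    """Hypothetical validity function: nine digits whose first three are not 000."""
    return 10**6 <= value < DOMAIN_SIZE


def obfuscate_number(key: bytes, original: str) -> str:
    value = int(original)
    if len(original) != 9 or not is_valid(value):
        raise ValueError("expected a valid nine-digit value")
    candidate = feistel(key, value)
    while not is_valid(candidate):
        # Not a valid obfuscated value: apply the keyed function again.
        candidate = feistel(key, candidate)
    return f"{candidate:09d}"
```

Because the underlying function is a permutation and the loop starts from a valid value, distinct valid inputs always produce distinct valid outputs, so uniqueness of fields such as SSN or Custid is preserved without storing any mapping.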
Referring to Fig. 4, an exemplary dataflow graph 400 performs an obfuscation process on a Customers data set 402 provided as input. Records in the data set 402 are read and provided as a flow of records to the components in the graph. By performing obfuscation using a dataflow graph, the system 100 can combine the data obfuscation process with any of a variety of additional dataflow processing, and any component in the graph can be executed using parallel processing techniques. The graph 400 includes a series of "Reformat" components, each of which reformats a given record received at its input port by replacing the actual value in a given field of the record with an obfuscated value, and outputs the reformatted record at its output port. There is one Reformat component for each of the fields of the Customers data set 402 that is to be obfuscated (e.g., all of the fields in a record, or a selected subset of the fields in a record). In this example, six fields are obfuscated: Last Name, First Name, Address, SSN, CCN, and Custid. Component 404 handles obfuscation of the Last Name field, component 406 the First Name field, component 408 the Address field, component 410 the SSN field, component 412 the CCN field, and component 414 the Custid field. The flow of obfuscated records output from component 414 is stored as the output of the graph 400 in an obfuscated Customers data set 416. The graph 400 is also associated with data sets 418 that store information characterizing certain properties of the input data set 402, as described in more detail below. All of the Reformat components can use a common key value, which is stored as a parameter of the graph 400. The security of the obfuscated data set 416 depends on keeping the key parameter secure. The key can be made sufficiently long (e.g., 12 or 60 digits, or longer) to strengthen security.
Before or while processing the first record from the dataset 402, each component determines whether to use an unkeyed technique, the keyed table lookup technique, or the keyed pseudo-random permutation technique to determine obfuscated values for the field that the component handles. If a field has values that do not need to be assigned consistently across different records associated with a given customer (e.g., a transaction amount) and that are not particularly sensitive, the values in that field can be obfuscated by a technique that does not rely on the stored key value. For example, the component can use a random value generating function. If a field has values that are to be assigned consistently across different records associated with a given customer, and/or that should match a particular distribution, domain, or validity test, then the stored key can be used to perform either the keyed table lookup technique or the keyed pseudo-random permutation technique.
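The selection rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the enum, the function name choose_technique, and the four boolean field attributes are hypothetical, and the finite-domain and uniform-distribution tests are borrowed from the descriptions of the two keyed techniques.

```python
from enum import Enum


class Technique(Enum):
    UNKEYED_RANDOM = "unkeyed random value"
    KEYED_TABLE_LOOKUP = "keyed table lookup"
    KEYED_PERMUTATION = "keyed pseudo-random permutation"


def choose_technique(needs_consistency: bool, sensitive: bool,
                     finite_domain: bool, uniform: bool) -> Technique:
    """Pick an obfuscation technique for one field (hypothetical helper)."""
    # Values that need not be consistent and are not sensitive
    # can be replaced without using the stored key.
    if not needs_consistency and not sensitive:
        return Technique.UNKEYED_RANDOM
    # Finite domain with uniformly distributed values: keyed permutation.
    if finite_domain and uniform:
        return Technique.KEYED_PERMUTATION
    # Infinite domain or non-uniform distribution: keyed table lookup.
    return Technique.KEYED_TABLE_LOOKUP
```

Under these assumptions, a transaction amount would map to the unkeyed technique, a surname to the keyed table lookup, and an SSN to the keyed permutation.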
If a field has values with an infinite domain or a non-uniform distribution, the component uses the keyed table lookup technique, which is based on keyed hashing. A cryptographic hash function uses the stored key value to compute an index value, and the index value is used to look up a value in a table of possible obfuscated values. Because the keyed hash produces values that appear random, the index (and hence the obfuscated value) appears to be selected at random. However, if the key value is known, the index is in fact predictable and repeatable. If the field values have a non-uniform distribution, the component uses a "frequency lookup" operation that draws on frequency profile information for the field from one of the datasets 418.
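A minimal Python sketch of a keyed index computation is given below. The text does not name a specific hash function, so the use of HMAC-SHA-256 is an assumption, as are the function names and the contents of the table.

```python
import hashlib
import hmac

# Hypothetical table of possible obfuscated values (not taken from the patent).
OBFUSCATED_VALUES = ["Norton", "Lee", "Butler", "Garcia", "Okafor"]


def keyed_index(key: bytes, original: str, table_size: int) -> int:
    """Derive a repeatable, random-looking index from the key and the original value."""
    digest = hmac.new(key, original.encode("utf-8"), hashlib.sha256).digest()
    # Reduce the 256-bit digest to an index into the table.
    return int.from_bytes(digest, "big") % table_size


def obfuscate(key: bytes, original: str) -> str:
    """Look up the obfuscated value selected by the keyed index."""
    return OBFUSCATED_VALUES[keyed_index(key, original, len(OBFUSCATED_VALUES))]
```

With the same key, the same original value always selects the same table entry, which is what makes the obfuscation repeatable across executions. Without the key, the index cannot be recomputed from the original value.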
For example, for each of fields such as First Name, Last Name, Address, and U.S. Zip code, the datasets 418 include a "Frequency" dataset and a "Frequency Max" dataset. The Frequency Max dataset contains the total count of all values that occur in the given field of the actual data and allows the frequency lookup operation to look up the total count for the given field. Each Frequency Max dataset therefore contains a single total count value. Each Frequency dataset contains a lookup table indexed by non-overlapping ranges and allows the frequency lookup operation to use an "interval lookup" function to find the value of the given field for a given index value. As different index values are selected, field values are chosen with appropriate frequencies, based on how often they occur in the actual data.
For example, Fig. 5 shows an example of a lookup table of a Frequency dataset for the name field. The name "Norton" is selected for index values in the range 0-2, the name "Lee" for index values in the range 3-10, and the name "Butler" for the index value 11. The size of each range is proportional to the frequency with which the corresponding value occurs in the actual data, according to the statistics in the profile information. Consequently, if the index values occur with equal probability, each name occurs with the same frequency with which it occurs in the actual data.
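The interval lookup over the ranges given for Fig. 5 can be sketched in Python with a binary search. The ranges and names come from the text; the variable names, the function name, and the total count of 12 (inferred from the index range 0-11) are assumptions.

```python
import bisect
from collections import Counter

# Inclusive upper bound of each non-overlapping range, with the value it selects.
UPPER_BOUNDS = [2, 10, 11]
NAMES = ["Norton", "Lee", "Butler"]
FREQUENCY_MAX = 12  # assumed total count, so valid indexes are 0..11


def interval_lookup(index: int) -> str:
    """Return the name whose range contains the given index."""
    if not 0 <= index < FREQUENCY_MAX:
        raise ValueError("index outside the lookup table")
    # bisect_left finds the first range whose upper bound is >= index.
    return NAMES[bisect.bisect_left(UPPER_BOUNDS, index)]


# If every index is equally likely, names occur in proportion to their range sizes.
counts = Counter(interval_lookup(i) for i in range(FREQUENCY_MAX))
```

Here "Norton" is selected by 3 of the 12 indexes, "Lee" by 8, and "Butler" by 1, so a uniformly distributed index reproduces the profiled frequencies.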
If a field has values with a finite domain and a uniform distribution, the component uses the keyed pseudo-random permutation technique, which is based on pseudo-random number generation (e.g., a Luby-Rackoff pseudo-random permutation generator). In some implementations, for any given key and an input value in the range 1 to N (e.g., for original values such as social security numbers or credit card numbers, a range corresponding to the size of the finite domain), a permutation generator function f(k, n) that depends on the actual value is used to produce an obfuscated value in a way that appears random. For example, different values of n yield different values of f(k, n), where f(k, n) is an integer between 1 and N. The relationship between n and f(k, n) is deterministic but appears random (e.g., successive values of n yield values of f(k, n) that appear randomly distributed). The value k is the key value, and different values of k give different sequences of n to f(k, n). For a given value of the key k, the successive values of n determine the resulting sequence of f(k, n) values; however, the sequence of f(k, n) values for successive values of n under one value of k cannot be predicted merely from the sequence of f(k, n) values for successive values of n under another value of k.
The table shown in Fig. 6 illustrates an example of the possible values of f(k, n) between 1 and 20 into which the permutation generator can "shuffle" the sequential values of n between 1 and 20 for a single key value k. In this example, one shuffled value f(k, n) is mapped to each input value of n. Because the combination of input value and key in each row is unique, no two shuffled values are the same. Since the obfuscated values are selected according to the shuffled values f(k, n), no two obfuscated values are the same either. For simplicity, the example in Fig. 6 shows 20 shuffled values, but larger sequences can be generated.
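One way to obtain such a keyed shuffle is a balanced Feistel network, the construction analyzed by Luby and Rackoff. The Python sketch below is an illustration under several assumptions that are not in the text: HMAC-SHA-256 serves as the round function, four rounds are used, and results that fall outside 1..N are fed back through the network until they land inside the range. The names permute and _round are hypothetical.

```python
import hashlib
import hmac


def _round(key: bytes, round_no: int, half: int, half_bits: int) -> int:
    """Keyed round function: HMAC of the round number and one half of the block."""
    msg = bytes([round_no]) + half.to_bytes((half_bits + 7) // 8, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") & ((1 << half_bits) - 1)


def permute(key: bytes, n: int, size: int, rounds: int = 4) -> int:
    """Map n in 1..size to a distinct f(k, n) in 1..size."""
    if not 1 <= n <= size:
        raise ValueError("n must be in 1..size")
    # Use an even number of bits large enough to hold size - 1.
    half_bits = max(1, ((size - 1).bit_length() + 1) // 2)
    mask = (1 << half_bits) - 1
    x = n - 1
    while True:
        left, right = x >> half_bits, x & mask
        for r in range(rounds):
            left, right = right, left ^ _round(key, r, right, half_bits)
        x = (left << half_bits) | right
        # The network permutes a power-of-two range; repeat until back in 0..size-1.
        if x < size:
            return x + 1
```

For a fixed key, applying permute to the values 1 to 20 with size 20 yields each number from 1 to 20 exactly once, which is the property shown by the table of Fig. 6. A different key yields a different ordering.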
The following examples describe implementations of each of the Reformat components in the dataflow graph of Fig. 4.
Component 404, which obfuscates values of the Last Name field, can use a keyed_pick function to create a seemingly random index into an interval lookup table of last names. To ensure that different customers receive different obfuscated last names even if their actual last names are the same, the Custid field can be used in computing the key value passed to keyed_pick. By performing this operation in combination with the interval lookup, the distribution statistics of last names can be preserved. In this example, family members who share a last name in the actual data may be assigned different last names in the obfuscated data.
Component 406, which obfuscates values of the First Name field, can be implemented in a manner similar to component 404. If a field identifying the customer's gender is present in the actual data, the keyed_pick function can distinguish between male and female first names. Alternatively, the function can make a "good guess", for example by using a further lookup table.
Component 408, which obfuscates values of the Address field, uses the keyed_pick function to create seemingly random indexes into two interval lookup tables: one containing zip code, city, and country; the other containing house number and street name. If the key is known, the indexes can be predicted. To reveal less sensitive information, the component can select the zip code and the street name independently and may thereby construct an address that does not exist, such as 1600 Pennsylvania Avenue, Lexington, MA 02421. Alternatively, for applications that validate addresses, the component can be configured to select the street name and the zip code together. To ensure that house numbers are not unrealistically high for a given street, the component can limit the values that can be selected.
Component 410, which obfuscates values of the SSN field, uses the pseudo-random permutation technique to select pseudo-random 9-digit numbers until one is found that corresponds to a valid SSN. Using the technique shown in Fig. 7, component 410 can also ensure that each obfuscated value is unique. For simplicity, assume that the even numbers in Fig. 7 represent valid SSNs and that the odd numbers represent 9-digit numbers that are not valid SSNs. As described above, the pseudo-random permutation technique can use a permutation generator function to "shuffle" the possible values of a given field. The first two columns of the table in Fig. 7 show this shuffling and illustrate how SSNs might be shuffled. The third column shows the result of calling a function that validates the SSN as many times as necessary to ensure that a valid SSN is output.
The arrows in the table show the order of the steps:
a. For each input SSN (shown in column 1), the encode_ssn function assigns the shuffled value in the same row of column 2.
b. If the number selected in column 2 is even (valid), it is written as the obfuscated value to the validated output variable (shown in column 3). If the number selected in column 2 is odd (invalid), the function goes back to column 1, finds the selected number there, and checks whether the value in column 2 of that row is valid.
c. This step is repeated until a valid number is found. Because each number in column 2 can be reached from only one number in column 1 (i.e., the mapping from column 1 to column 2 is one-to-one), each validated obfuscated value in column 3 is unique. For example, for an input field containing 2 and 4, component 410 finds valid output values by traversing, respectively, the sequences shown at the top of the table of Fig. 7. The first sequence is illustrated with arrows in the table of Fig. 7.
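Steps a to c can be sketched in Python as follows. The shuffle table is a hypothetical one-to-one mapping of the numbers 1 to 10 and does not reproduce the values of Fig. 7; as in the text, even numbers stand in for valid SSNs. The function names are assumptions, and the sketch assumes that every input is itself a valid value, which guarantees that the loop terminates.

```python
# Hypothetical one-to-one shuffle of 1..10 (column 1 -> column 2).
SHUFFLE = {1: 4, 2: 7, 3: 10, 4: 9, 5: 2, 6: 1, 7: 6, 8: 3, 9: 8, 10: 5}


def is_valid(value: int) -> bool:
    """Stand-in validity test: even numbers represent valid SSNs."""
    return value % 2 == 0


def encode_valid(original: int) -> int:
    """Follow the shuffle from a valid input until a valid value is reached."""
    if not is_valid(original):
        raise ValueError("this sketch assumes valid inputs")
    value = SHUFFLE[original]       # step a: take the shuffled value
    while not is_valid(value):      # steps b and c: re-enter the table until valid
        value = SHUFFLE[value]
    return value


encoded = {n: encode_valid(n) for n in (2, 4, 6, 8, 10)}
```

With this table the input 2 passes through 7 and ends at 6, and the input 4 passes through 9 and ends at 8. Because the shuffle is one-to-one, the five valid inputs receive five distinct valid outputs.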
Component 412, which obfuscates values of the CCN field, is based on the validity criterion that a CCN has 16 digits and begins with 4, although any other number of digits or digit sequence can be used. The first 6 digits may be sufficient to identify the issuer. The last digit is a control digit (e.g., computed with the Luhn algorithm) that detects errors in the preceding digits. Component 412 uses the pseudo-random permutation technique to select pseudo-random 15-digit numbers until a valid one is found, and then computes the control digit. Component 412 provides a validity-check function that confirms that a number is a valid CCN by checking its length and control digit.
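The control digit computation and the validity check can be sketched in Python using the Luhn algorithm. The function names are hypothetical, and the check encodes only the criteria stated in the text (16 digits, leading 4, correct control digit).

```python
def luhn_check_digit(payload: str) -> int:
    """Compute the Luhn control digit for a string of digits."""
    total = 0
    # Walk the payload from right to left, doubling every second digit,
    # starting with the rightmost one.
    for position, char in enumerate(reversed(payload)):
        digit = int(char)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return (10 - total % 10) % 10


def is_valid_ccn(number: str) -> bool:
    """Check length, leading digit, and control digit."""
    return (
        len(number) == 16
        and number.isdigit()
        and number[0] == "4"
        and luhn_check_digit(number[:-1]) == int(number[-1])
    )
```

For the 15-digit payload 411111111111111 the control digit is 1, so the widely used test number 4111111111111111 passes the check.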
Component 414, which obfuscates values of the Custid field, is based on the assumption that a Custid is a 10-digit number between 1000000000 and 9999999999. As with SSN and CCN, this component can define an encoding function that selects pseudo-random values using the pseudo-random permutation technique. The obfuscation may differ from the methods used for SSN and CCN in that a validity check may not be necessary.
After obfuscating the data, the data obfuscation module 112 can test the effectiveness of the obfuscation. In some implementations, module 112 confirms that no actual data remains in the obfuscated fields by performing a join operation using a key, which may be a compound key made up of multiple field values (e.g., the value of the First Name field combined with the value of the Last Name field). By comparing the values in the fields of the obfuscated records with the values of the corresponding fields in the actual records, module 112 can confirm that, for any given first name and last name, the obfuscated data contains values different from the real data.
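The join-based test can be sketched in Python by intersecting the compound keys of the two datasets. The record layout, field names, sample data, and function name below are hypothetical; an empty result corresponds to a join that returns no rows.

```python
def leaked_compound_keys(actual, obfuscated, key_fields):
    """Return compound keys that occur in both the actual and the obfuscated records."""
    def compound_key(record):
        return tuple(record[field] for field in key_fields)

    actual_keys = {compound_key(record) for record in actual}
    obfuscated_keys = {compound_key(record) for record in obfuscated}
    # An inner join on the compound key keeps exactly the keys present on both sides.
    return sorted(actual_keys & obfuscated_keys)


# Hypothetical sample records.
actual = [{"first": "Ann", "last": "Lee"}, {"first": "Bob", "last": "Norton"}]
clean = [{"first": "Eve", "last": "Butler"}, {"first": "Ann", "last": "Norton"}]
leaky = [{"first": "Ann", "last": "Lee"}, {"first": "Eve", "last": "Butler"}]
```

Joining on the compound key rather than on single fields means that an obfuscated record reusing a real first name with a different last name, as in the clean sample, is not reported as a leak.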
The obfuscation techniques described above can be implemented in software for execution on a computer. For example, the software forms procedures in one or more computer programs that execute on one or more programmed or programmable computer systems (which may be of various architectures, such as distributed, client/server, or grid), each including at least one processor, at least one data storage system (including volatile and non-volatile memory and/or storage elements), at least one input device or port, and at least one output device or port. The software may form one or more modules of a larger program, for example, one that provides other services related to the design and configuration of computation graphs. The nodes and elements of the graph can be implemented as data structures stored in a computer-readable medium or as other organized data conforming to a data model stored in a data repository.
The software may be provided on a storage medium, such as a CD-ROM, readable by a general or special purpose programmable computer, or delivered (encoded in a propagated signal) over a communication medium of a network to the computer where it is executed. All of the functions may be performed on a special purpose computer or using special-purpose hardware, such as coprocessors. The software may be implemented in a distributed manner in which different parts of the computation specified by the software are performed by different computers. Each such computer program is preferably stored on or downloaded to a storage medium or device (e.g., solid state memory or media, or magnetic or optical media) readable by a general or special purpose programmable computer, for configuring and operating the computer when the storage medium or device is read by the computer system to perform the procedures described herein. The inventive system may also be considered to be implemented as a computer-readable storage medium configured with a computer program, where the storage medium so configured causes a computer system to operate in a specific and predefined manner to perform the functions described herein.
A number of embodiments of the invention have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. For example, some of the steps described above may be order independent and thus can be performed in an order different from that described. It is to be understood that the foregoing description is intended to illustrate and not to limit the scope of the invention, which is defined by the scope of the appended claims. For example, a number of the functional steps described above may be performed in a different order without substantially affecting overall processing. Other embodiments are within the scope of the claims.

Claims (69)

1. A method for obfuscating data, the method comprising:
reading, from a data source, values that occur in one or more fields of multiple records;
storing a key value;
for each of the multiple records, generating, using the key value, an obfuscated value to replace an original value in a given field of the record, such that the obfuscated value depends on the key value and is deterministically related to the original value, wherein the same key is used to map the original values of the multiple records to corresponding obfuscated values; and
storing, in a data storage system, an obfuscated data set that includes records containing the obfuscated values.
2. The method of claim 1, further comprising storing profile information that includes statistics characterizing values of at least one of the fields.
3. The method of claim 2, wherein, for the given field, the obfuscated value is generated using the key value and the stored profile information.
4. The method of claim 3, wherein the obfuscated values occur in the given field of the obfuscated data set with a frequency determined based on the statistics, in the stored profile information, characterizing values of the given field.
5. The method of claim 4, wherein the obfuscated value is generated by using the original value and the key as inputs to a function that generates an index value, and using the index value to look up the obfuscated value in a predetermined set of obfuscated values.
6. The method of claim 5, wherein the predetermined set of obfuscated values is stored as a lookup table in which each obfuscated value corresponds to one or more index values.
7. The method of claim 5, wherein multiple index values within a range correspond to the same obfuscated value in the predetermined set of obfuscated values.
8. The method of claim 7, wherein the size of the range is based on the statistics, in the stored profile information, characterizing values of the given field.
9. The method of claim 1, wherein generating, using the key value, the obfuscated value to replace the original value in the given field of the record comprises: combining the original value and the key using a deterministic function to produce a selection value for selecting the obfuscated value.
10. The method of claim 9, wherein the selection value is mapped to the obfuscated value using a deterministic mapping.
11. The method of claim 9, wherein a domain of values from which the obfuscated value is selected includes multiple of the original values in the given field of the records from the data source.
12. The method of claim 11, wherein one or more of the original values are not included in the domain.
13. The method of claim 12, wherein one or more values in the domain are not included among the original values.
14. The method of claim 9, wherein the deterministic function uses the key to cryptographically prevent recovery of the original value from the obfuscated value.
15. The method of claim 9, wherein the deterministic function provides, for different values of the key, different sequences of selection values for consecutive original values.
16. The method of claim 15, wherein a first sequence of selection values for consecutive original values for a first value of the key cannot be predicted from a second sequence of selection values for the consecutive original values for a second value of the key.
17. The method of claim 9, wherein generating, using the key value, the obfuscated value to replace the original value in the given field of the record comprises: determining whether the selection value corresponds to a valid obfuscated value and, if not, repeatedly combining the selection value and the key using the deterministic function to produce an additional selection value until the additional selection value corresponds to a valid obfuscated value.
18. The method of claim 17, wherein a valid obfuscated value consists of a predetermined number of digits.
19. The method of claim 1, further comprising dividing the records from the data source into multiple sets of records, and replacing, in parallel using different computing resources, the original values in the given field with the generated obfuscated values in the records of different sets of records.
20. The method of claim 1, wherein at least a first record containing an obfuscated value in the obfuscated data set includes at least one original value that has not been replaced by an obfuscated value.
21. The method of claim 20, further comprising: determining whether to replace the original value in the first record with an obfuscated value using the key value based on whether the original value is to be replaced consistently by the same obfuscated value across multiple records in which the original value occurs.
22. A system for obfuscating data, the system comprising:
a data source that provides records having values in one or more fields;
a data storage system; and
one or more processors coupled to the data storage system and providing an execution environment to:
read, from the data source, values that occur in one or more fields of multiple records;
store a key value;
for each of the multiple records, generate, using the key value, an obfuscated value to replace an original value in a given field of the record, such that the obfuscated value depends on the key value and is deterministically related to the original value, wherein the same key is used to map the original values of the multiple records to corresponding obfuscated values; and
store, in the data storage system, an obfuscated data set that includes records containing the obfuscated values.
23. A system for obfuscating data, the system comprising:
a data source that provides records having values in one or more fields;
a data storage system; and
means for reading, from the data source, values that occur in one or more fields of multiple records;
means for generating, for each of the multiple records and using a key value, an obfuscated value to replace an original value in a given field of the record, such that the obfuscated value depends on the key value and is deterministically related to the original value, wherein the same key is used to map the original values of the multiple records to corresponding obfuscated values; and
means for storing, in the data storage system, an obfuscated data set that includes records containing the obfuscated values.
24. The method of claim 9, wherein the deterministic function always produces the same selection value for the same values of the original value and the key value.
25. The method of claim 1, wherein, in a given obfuscation session of multiple obfuscation sessions that each store a different obfuscated data set, the stored key value is used consistently to replace all of the original values with respective obfuscated values.
26. The system of claim 22, wherein the execution environment is configured to store profile information that includes statistics characterizing values of at least one of the fields.
27. The system of claim 26, wherein, for the given field, the obfuscated value is generated using the key value and the stored profile information.
28. The system of claim 27, wherein the obfuscated values occur in the given field of the obfuscated data set with a frequency determined based on the statistics, in the stored profile information, characterizing values of the given field.
29. The system of claim 28, wherein the obfuscated value is generated by using the original value and the key as inputs to a function that generates an index value, and using the index value to look up the obfuscated value in a predetermined set of obfuscated values.
30. The system of claim 29, wherein the predetermined set of obfuscated values is stored as a lookup table in which each obfuscated value corresponds to one or more index values.
31. The system of claim 30, wherein multiple index values within a range correspond to the same obfuscated value in the predetermined set of obfuscated values.
32. The system of claim 31, wherein the size of the range is based on the statistics, in the stored profile information, characterizing values of the given field.
33. The system of claim 26, wherein generating, using the key value, the obfuscated value to replace the original value in the given field of the record comprises: combining the original value and the key using a deterministic function to produce a selection value for selecting the obfuscated value.
34. The system of claim 33, wherein the selection value is mapped to the obfuscated value using a deterministic mapping.
35. The system of claim 33, wherein a domain of values from which the obfuscated value is selected includes multiple of the original values in the given field of the records from the data source.
36. The system of claim 35, wherein one or more of the original values are not included in the domain.
37. The system of claim 36, wherein one or more values in the domain are not included among the original values.
38. The system of claim 33, wherein the deterministic function uses the key to cryptographically prevent recovery of the original value from the obfuscated value.
39. The system of claim 33, wherein the deterministic function provides, for different values of the key, different sequences of selection values for consecutive original values.
40. The system of claim 39, wherein a first sequence of selection values for consecutive original values for a first value of the key cannot be predicted from a second sequence of selection values for the consecutive original values for a second value of the key.
41. The system of claim 33, wherein generating, using the key value, the obfuscated value to replace the original value in the given field of the record comprises: determining whether the selection value corresponds to a valid obfuscated value and, if not, repeatedly combining the selection value and the key using the deterministic function to produce an additional selection value until the additional selection value corresponds to a valid obfuscated value.
42. The system of claim 41, wherein a valid obfuscated value consists of a predetermined number of digits.
43. The system of claim 22, wherein the execution environment is configured to divide the records from the data source into multiple sets of records and to replace, in parallel using different computing resources, the original values in the given field with the generated obfuscated values in the records of different sets of records.
44. The system of claim 22, wherein at least a first record containing an obfuscated value in the obfuscated data set includes at least one original value that has not been replaced by an obfuscated value.
45. The system of claim 44, wherein the execution environment is configured to determine whether to replace the original value in the first record with an obfuscated value using the key value based on whether the original value is to be replaced consistently by the same obfuscated value across multiple records in which the original value occurs.
46. The system of claim 33, wherein the deterministic function always produces the same selection value for the same values of the original value and the key value.
47. The system of claim 22, wherein, in a given obfuscation session of multiple obfuscation sessions that each store a different obfuscated data set, the stored key value is used consistently to replace all of the original values with respective obfuscated values.
48. The system of claim 23, further comprising means for storing profile information that includes statistics characterizing values of at least one of the fields.
49. The system of claim 48, wherein, for the given field, the obfuscated value is generated using the key value and the stored profile information.
50. The system of claim 49, wherein the obfuscated values occur in the given field of the obfuscated data set with a frequency determined based on the statistics, in the stored profile information, characterizing values of the given field.
51. The system of claim 50, wherein the obfuscated value is generated by using the original value and the key as inputs to a function that generates an index value, and using the index value to look up the obfuscated value in a predetermined set of obfuscated values.
52. The system of claim 51, wherein the predetermined set of obfuscated values is stored as a lookup table in which each obfuscated value corresponds to one or more index values.
53. The system of claim 51, wherein multiple index values within a range correspond to the same obfuscated value in the predetermined set of obfuscated values.
54. The system of claim 53, wherein the size of the range is based on the statistics, in the stored profile information, characterizing values of the given field.
55. The system of claim 33, wherein generating, using the key value, the obfuscated value to replace the original value in the given field of the record comprises: combining the original value and the key using a deterministic function to produce a selection value for selecting the obfuscated value.
56. The system of claim 55, wherein the selection value is mapped to the obfuscated value using a deterministic mapping.
57. The system of claim 55, wherein a domain of values from which the obfuscated value is selected includes multiple of the original values in the given field of the records from the data source.
58. The system of claim 57, wherein one or more of the original values are not included in the domain.
59. The system of claim 58, wherein one or more values in the domain are not included among the original values.
60. The system of claim 55, wherein the deterministic function uses the key to cryptographically prevent recovery of the original value from the obfuscated value.
61. The system of claim 55, wherein the deterministic function provides, for different values of the key, different sequences of selection values for consecutive original values.
62. The system of claim 61, wherein a first sequence of selection values for consecutive original values for a first value of the key cannot be predicted from a second sequence of selection values for the consecutive original values for a second value of the key.
63. The system of claim 55, wherein generating, using the key value, the obfuscated value to replace the original value in the given field of the record comprises: determining whether the selection value corresponds to a valid obfuscated value and, if not, repeatedly combining the selection value and the key using the deterministic function to produce an additional selection value until the additional selection value corresponds to a valid obfuscated value.
64. The system of claim 63, wherein a valid obfuscated value consists of a predetermined number of digits.
65. The system of claim 23, further comprising means for dividing the records from the data source into multiple sets of records and for replacing, in parallel using different computing resources, the original values in the given field with the generated obfuscated values in the records of different sets of records.
66. The system of claim 23, wherein at least a first record containing an obfuscated value in the obfuscated data set includes at least one original value that has not been replaced by an obfuscated value.
67. The system of claim 66, further comprising means for determining whether to replace the original value in the first record with an obfuscated value using the key value based on whether the original value is to be replaced consistently by the same obfuscated value across multiple records in which the original value occurs.
68. The system of claim 55, wherein the deterministic function always produces the same selection value for the same values of the original value and the key value.
69. The system of claim 23, wherein, in a given obfuscation session of multiple obfuscation sessions that each store a different obfuscated data set, the stored key value is used consistently to replace all of the original values with respective obfuscated values.
CN201080032309.9A 2009-06-01 2010-06-01 Generate obfuscated data Active CN102460404B (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US18305409P 2009-06-01 2009-06-01
US61/183,054 2009-06-01
US12/497,354 US10102398B2 (en) 2009-06-01 2009-07-02 Generating obfuscated data
US12/497,354 2009-07-02
PCT/US2010/036812 WO2010141410A1 (en) 2009-06-01 2010-06-01 Generating obfuscated data

Publications (2)

Publication Number Publication Date
CN102460404A CN102460404A (en) 2012-05-16
CN102460404B true CN102460404B (en) 2015-09-09

Family

ID=43221811

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201080032309.9A Active CN102460404B (en) 2009-06-01 2010-06-01 Generate obfuscated data

Country Status (8)

Country Link
US (1) US10102398B2 (en)
EP (1) EP2438519B1 (en)
JP (1) JP5878462B2 (en)
KR (1) KR101873946B1 (en)
CN (1) CN102460404B (en)
AU (1) AU2010256869B2 (en)
CA (1) CA2763232C (en)
WO (1) WO2010141410A1 (en)

Families Citing this family (95)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110264631A1 (en) * 2010-04-21 2011-10-27 Dataguise Inc. Method and system for de-identification of data
US8566910B2 (en) * 2010-05-18 2013-10-22 Nokia Corporation Method and apparatus to bind a key to a namespace
US8661257B2 (en) 2010-05-18 2014-02-25 Nokia Corporation Generic bootstrapping architecture usage with Web applications and Web pages
US20120005720A1 (en) * 2010-07-01 2012-01-05 International Business Machines Corporation Categorization Of Privacy Data And Data Flow Detection With Rules Engine To Detect Privacy Breaches
GB201101805D0 (en) * 2011-02-02 2011-03-16 Oka Bi Ltd Computer system and method
US10534931B2 (en) 2011-03-17 2020-01-14 Attachmate Corporation Systems, devices and methods for automatic detection and masking of private data
WO2012127572A1 (en) * 2011-03-18 2012-09-27 富士通株式会社 Secret data processing method, program and device
KR101265099B1 (en) 2011-06-15 2013-05-20 주식회사 터보테크 A Method For Software Security Treatment And A Storage Medium
US8645763B2 (en) * 2011-09-12 2014-02-04 Microsoft Corporation Memory dump with expanded data and user privacy protection
JP2013061843A (en) * 2011-09-14 2013-04-04 Fujifilm Corp Computer software analysis system and client computer, and operation control method thereof and operation program thereof
US8539601B2 (en) 2012-01-17 2013-09-17 Lockheed Martin Corporation Secure data storage and retrieval
US9058813B1 (en) * 2012-09-21 2015-06-16 Rawles Llc Automated removal of personally identifiable information
US9298941B2 (en) 2012-11-12 2016-03-29 EPI-USE Systems, Ltd. Secure data copying
JP2014119486A (en) * 2012-12-13 2014-06-30 Hitachi Solutions Ltd Secret retrieval processing system, secret retrieval processing method, and secret retrieval processing program
US8954546B2 (en) 2013-01-25 2015-02-10 Concurix Corporation Tracing with a workload distributor
US9207969B2 (en) 2013-01-25 2015-12-08 Microsoft Technology Licensing, Llc Parallel tracing for performance and detail
US9021262B2 (en) 2013-01-25 2015-04-28 Concurix Corporation Obfuscating trace data
US9892026B2 (en) * 2013-02-01 2018-02-13 Ab Initio Technology Llc Data records selection
US20130283281A1 (en) 2013-02-12 2013-10-24 Concurix Corporation Deploying Trace Objectives using Cost Analyses
US8997063B2 (en) 2013-02-12 2015-03-31 Concurix Corporation Periodicity optimization in an automated tracing system
US8924941B2 (en) 2013-02-12 2014-12-30 Concurix Corporation Optimization analysis using similar frequencies
JP6206866B2 (en) 2013-02-19 2017-10-04 International Business Machines Corporation Apparatus and method for holding obfuscated data in server
US20130219372A1 (en) 2013-03-15 2013-08-22 Concurix Corporation Runtime Settings Derived from Relationships Identified in Tracer Data
US9575874B2 (en) 2013-04-20 2017-02-21 Microsoft Technology Licensing, Llc Error list and bug report analysis for configuring an application tracer
US9292415B2 (en) 2013-09-04 2016-03-22 Microsoft Technology Licensing, Llc Module specific tracing in a shared module environment
KR101490047B1 (en) * 2013-09-27 2015-02-04 숭실대학교산학협력단 Apparatus for tamper protection of application code based on self modification and method thereof
US10515231B2 (en) * 2013-11-08 2019-12-24 Symcor Inc. Method of obfuscating relationships between data in database tables
EP3069241B1 (en) 2013-11-13 2018-08-15 Microsoft Technology Licensing, LLC Application execution path tracing with configurable origin definition
US10278592B2 (en) 2013-12-09 2019-05-07 Samsung Electronics Co., Ltd. Modular sensor platform
US11366927B1 (en) 2013-12-11 2022-06-21 Allscripts Software, Llc Computing system for de-identifying patient data
US10403392B1 (en) * 2013-12-11 2019-09-03 Allscripts Software, Llc Data de-identification methodologies
AU2014364882B2 (en) 2013-12-18 2020-02-06 Ab Initio Technology Llc Data generation
US10075290B2 (en) * 2013-12-20 2018-09-11 Koninklijke Philips N.V. Operator lifting in cryptographic algorithm
KR20160105396A (en) 2013-12-31 2016-09-06 삼성전자주식회사 Battery charger related applications
WO2015157870A1 (en) * 2014-04-17 2015-10-22 Datex Inc. Method, device and software for securing web application data through tokenization
WO2015177594A2 (en) 2014-05-22 2015-11-26 Samsung Electronics Co., Ltd. Electrocardiogram watch clasp
US9592007B2 (en) 2014-05-23 2017-03-14 Samsung Electronics Co., Ltd. Adjustable wearable system having a modular sensor platform
US10136857B2 (en) 2014-05-23 2018-11-27 Samsung Electronics Co., Ltd. Adjustable wearable system having a modular sensor platform
US9390282B2 (en) 2014-09-03 2016-07-12 Microsoft Technology Licensing, Llc Outsourcing document-transformation tasks while protecting sensitive information
JP6723989B2 (en) 2014-09-08 2020-07-15 アビニシオ テクノロジー エルエルシー Data driven inspection framework
US10976907B2 (en) 2014-09-26 2021-04-13 Oracle International Corporation Declarative external data source importation, exportation, and metadata reflection utilizing http and HDFS protocols
US10210246B2 (en) 2014-09-26 2019-02-19 Oracle International Corporation Techniques for similarity analysis and data enrichment using knowledge sources
US10891272B2 (en) * 2014-09-26 2021-01-12 Oracle International Corporation Declarative language and visualization system for recommended data transformations and repairs
DE102014016548A1 (en) * 2014-11-10 2016-05-12 Giesecke & Devrient Gmbh Method for testing and hardening software applications
JP6387466B2 (en) * 2014-12-22 2018-09-05 Koninklijke Philips N.V. Electronic computing device
DE102015104159B4 (en) * 2015-03-19 2018-05-09 Forensik.It Gmbh Selection between a real and a virtual user-specific data record for a data communication
SG10201502401XA (en) 2015-03-26 2016-10-28 Huawei Internat Pte Ltd Method of obfuscating data
US10140437B2 (en) 2015-05-05 2018-11-27 Nxp B.V. Array indexing with modular encoded values
JP6506099B2 (en) * 2015-05-20 2019-04-24 株式会社野村総合研究所 DATA MASKING DEVICE, DATA MASKING METHOD, AND COMPUTER PROGRAM
US9958521B2 (en) 2015-07-07 2018-05-01 Q Bio, Inc. Field-invariant quantitative magnetic-resonance signatures
US10194829B2 (en) 2015-07-07 2019-02-05 Q Bio, Inc. Fast scanning based on magnetic resonance history
EP3125144B1 (en) * 2015-07-31 2019-11-13 Nxp B.V. Array indexing with modular encoded values
US9665734B2 (en) 2015-09-12 2017-05-30 Q Bio, Inc. Uniform-frequency records with obscured context
US10964412B2 (en) 2015-10-20 2021-03-30 Q Bio, Inc. Population-based medical rules via anonymous sharing
US10359486B2 (en) 2016-04-03 2019-07-23 Q Bio, Inc. Rapid determination of a relaxation time
US10222441B2 (en) 2016-04-03 2019-03-05 Q Bio, Inc. Tensor field mapping
US11843597B2 (en) * 2016-05-18 2023-12-12 Vercrio, Inc. Automated scalable identity-proofing and authentication process
US20180035285A1 (en) * 2016-07-29 2018-02-01 International Business Machines Corporation Semantic Privacy Enforcement
US10740418B2 (en) * 2016-11-03 2020-08-11 International Business Machines Corporation System and method for monitoring user searches to obfuscate web searches by using emulated user profiles
US10915661B2 (en) 2016-11-03 2021-02-09 International Business Machines Corporation System and method for cognitive agent-based web search obfuscation
US10929481B2 (en) 2016-11-03 2021-02-23 International Business Machines Corporation System and method for cognitive agent-based user search behavior modeling
US10885132B2 (en) * 2016-11-03 2021-01-05 International Business Machines Corporation System and method for web search obfuscation using emulated user profiles
WO2018102286A1 (en) * 2016-12-02 2018-06-07 Equifax, Inc. Generating and processing obfuscated sensitive information
CN106936814B (en) * 2017-01-20 2018-07-06 北京海泰方圆科技股份有限公司 A kind of network protection methods, devices and systems
US11650195B2 (en) 2017-02-03 2023-05-16 Q Bio, Inc. Iterative medical testing of biological samples
US10614236B2 (en) * 2017-03-01 2020-04-07 International Business Machines Corporation Self-contained consistent data masking
US10936180B2 (en) 2017-03-16 2021-03-02 Q Bio, Inc. User interface for medical information
US20180285591A1 (en) * 2017-03-29 2018-10-04 Ca, Inc. Document redaction with data isolation
US9934287B1 (en) 2017-07-25 2018-04-03 Capital One Services, Llc Systems and methods for expedited large file processing
US11157563B2 (en) * 2018-07-13 2021-10-26 Bank Of America Corporation System for monitoring lower level environment for unsanitized data
KR102663589B1 (en) * 2018-10-26 2024-05-09 삼성전자주식회사 Server and controlling method thereof
US11176268B1 (en) * 2018-11-28 2021-11-16 NortonLifeLock Inc. Systems and methods for generating user profiles
US11360166B2 (en) 2019-02-15 2022-06-14 Q Bio, Inc Tensor field mapping with magnetostatic constraint
US11354586B2 (en) 2019-02-15 2022-06-07 Q Bio, Inc. Model parameter determination using a predictive model
CN110049035B (en) * 2019-04-10 2022-11-08 深圳市腾讯信息技术有限公司 Network attack protection method and device, electronic equipment and medium
CN111865869B (en) * 2019-04-24 2023-08-08 北京沃东天骏信息技术有限公司 Registration and authentication method and device based on random mapping, medium and electronic equipment
US11250169B2 (en) 2019-05-02 2022-02-15 Bank Of America Corporation System for real-time authenticated obfuscation of electronic data
CN110378083B (en) * 2019-06-12 2021-03-12 北京奇艺世纪科技有限公司 Boolean value confusion method and device and computer readable storage medium
US20210004485A1 (en) * 2019-07-01 2021-01-07 International Business Machines Corporation Cognitive Iterative Minimization of Personally Identifiable Information in Electronic Documents
US11429734B2 (en) * 2019-07-22 2022-08-30 Microsoft Technology Licensing, Llc Protection of sensitive data fields in webpages
US11614509B2 (en) 2019-09-27 2023-03-28 Q Bio, Inc. Maxwell parallel imaging
KR102622283B1 (en) 2019-09-27 2024-01-08 큐 바이오, 인코퍼레이티드 Maxwell Parallel Imaging
CN113010364B (en) * 2019-12-20 2023-08-01 北京奇艺世纪科技有限公司 Service data acquisition method and device and electronic equipment
US11664998B2 (en) 2020-05-27 2023-05-30 International Business Machines Corporation Intelligent hashing of sensitive information
US20210409196A1 (en) * 2020-06-30 2021-12-30 Sectigo, Inc. Secure Key Storage Systems Methods And Devices
US11604740B2 (en) * 2020-12-01 2023-03-14 Capital One Services, Llc Obfuscating cryptographic material in memory
US11580249B2 (en) 2021-02-10 2023-02-14 Bank Of America Corporation System for implementing multi-dimensional data obfuscation
US20220253541A1 (en) * 2021-02-10 2022-08-11 Bank Of America Corporation System for electronic data obfuscation through alteration of data format
US11907268B2 (en) 2021-02-10 2024-02-20 Bank Of America Corporation System for identification of obfuscated electronic data through placeholder indicators
CN113032791B (en) * 2021-04-01 2024-05-31 深圳市纽创信安科技开发有限公司 IP core, IP core management method and chip
WO2023037301A1 (en) * 2021-09-09 2023-03-16 Biosense Webster (Israel) Ltd. Method for securely storing and retrieving medical data
US11614508B1 (en) 2021-10-25 2023-03-28 Q Bio, Inc. Sparse representation of measurements
CN113987556B (en) * 2021-12-24 2022-05-10 杭州趣链科技有限公司 Data processing method and device, electronic equipment and storage medium
CN117278986B (en) * 2023-11-23 2024-03-15 浙江小遛信息科技有限公司 Data processing method and data processing equipment for sharing travel
CN118051892A (en) * 2024-04-15 2024-05-17 山东捷瑞数字科技股份有限公司 Integer unique identification confusion protection method, device, equipment and medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1831820A (en) * 2005-02-07 2006-09-13 微软公司 Method and system for obfuscating data structures by deterministic natural data substitution

Family Cites Families (50)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5664187A (en) 1994-10-26 1997-09-02 Hewlett-Packard Company Method and system for selecting data for migration in a hierarchic data storage system using frequency distribution tables
JPH1030943A (en) 1996-07-15 1998-02-03 Ckd Corp Sensor device, display device, and data writing device
US6728699B1 (en) 1997-09-23 2004-04-27 Unisys Corporation Method and apparatus for using prior results when processing successive database requests
US6581058B1 (en) 1998-05-22 2003-06-17 Microsoft Corporation Scalable system for clustering of large databases having mixed data attributes
DE19911176A1 (en) 1999-03-12 2000-09-21 Lok Lombardkasse Ag Anonymization process
WO2001001260A2 (en) * 1999-06-30 2001-01-04 Raf Technology, Inc. Secure, limited-access database system and method
US6546389B1 (en) 2000-01-19 2003-04-08 International Business Machines Corporation Method and system for building a decision-tree classifier from privacy-preserving data
US6567936B1 (en) 2000-02-08 2003-05-20 Microsoft Corporation Data clustering using error-tolerant frequent item sets
JP2001256076A (en) 2000-03-08 2001-09-21 Ricoh Co Ltd Device and method for generating test data and recording medium
US7237123B2 (en) 2000-09-22 2007-06-26 Ecd Systems, Inc. Systems and methods for preventing unauthorized use of digital content
JP4582939B2 (en) * 2001-03-07 2010-11-17 ソニー株式会社 Information management system, information management method, information processing apparatus, information processing method, and program
US20020138492A1 (en) 2001-03-07 2002-09-26 David Kil Data mining application with improved data mining algorithm selection
WO2002084531A2 (en) 2001-04-10 2002-10-24 Univ Carnegie Mellon Systems and methods for deidentifying entries in a data source
US7266699B2 (en) * 2001-08-30 2007-09-04 Application Security, Inc. Cryptographic infrastructure for encrypting a database
US7136787B2 (en) 2001-12-19 2006-11-14 Archimedes, Inc. Generation of continuous mathematical model for common features of a subject group
US7080063B2 (en) 2002-05-10 2006-07-18 Oracle International Corporation Probabilistic model generation
US7194317B2 (en) 2002-08-22 2007-03-20 Air Products And Chemicals, Inc. Fast plant test for model-based control
US20040107189A1 (en) 2002-12-03 2004-06-03 Lockheed Martin Corporation System for identifying similarities in record fields
WO2004084483A1 (en) * 2003-03-20 2004-09-30 Japan Medical Data Center Co., Ltd. Information management system
US7324109B2 (en) 2003-04-24 2008-01-29 Palmer James R Method for superimposing statistical information on tubular data
US7085981B2 (en) 2003-06-09 2006-08-01 International Business Machines Corporation Method and apparatus for generating test data sets in accordance with user feedback
CN102982065B (en) 2003-09-15 2016-09-21 起元科技有限公司 Data processing method, data processing equipment and computer-readable recording medium
US6957161B2 (en) 2003-09-25 2005-10-18 Dell Products L.P. Information handling system including power supply self diagnostics
US7797342B2 (en) * 2004-09-03 2010-09-14 Sybase, Inc. Database system providing encrypted column support for applications
JP2006163831A (en) 2004-12-07 2006-06-22 Nippon Telegr & Teleph Corp <Ntt> Device, method, and program for managing information, information invalidating device, and information collating device
US7334466B1 (en) 2005-01-04 2008-02-26 The United States Of America As Represented By The Secretary Of The Army Method and apparatus for predicting and evaluating projectile performance
JP2006236220A (en) 2005-02-28 2006-09-07 Ntt Data Technology Corp Device, method, program and storage medium for forming test data file
US7684963B2 (en) 2005-03-29 2010-03-23 International Business Machines Corporation Systems and methods of data traffic generation via density estimation using SVD
JP2007102540A (en) * 2005-10-05 2007-04-19 Hitachi Software Eng Co Ltd Character string conversion device and character string conversion program
JP2007108356A (en) * 2005-10-12 2007-04-26 Fujitsu Ltd Personal information concealing device and program for same
US7565349B2 (en) 2005-11-10 2009-07-21 International Business Machines Corporation Method for computing frequency distribution for many fields in one pass in parallel
KR100735012B1 (en) 2006-01-23 2007-07-03 삼성전자주식회사 Methodology for estimating statistical distribution characteristics of product parameters
US7937693B2 (en) 2006-04-26 2011-05-03 9Rays.Net, Inc. System and method for obfuscation of reverse compiled computer code
JP4878527B2 (en) * 2006-09-08 2012-02-15 富士通株式会社 Test data creation device
US8209549B1 (en) 2006-10-19 2012-06-26 United Services Automobile Association (Usaa) Systems and methods for cryptographic masking of private data
US7724918B2 (en) 2006-11-22 2010-05-25 International Business Machines Corporation Data obfuscation of text data using entity detection and replacement
JP5083218B2 (en) * 2006-12-04 2012-11-28 日本電気株式会社 Information management system, anonymization method, and storage medium
US8069129B2 (en) 2007-04-10 2011-11-29 Ab Initio Technology Llc Editing and compiling business rules
JP4575416B2 (en) 2007-10-29 2010-11-04 みずほ情報総研株式会社 Test data generation system, test data generation method, and test data generation program
JP4986817B2 (en) 2007-11-13 2012-07-25 株式会社ソニーDadc Evaluation device, evaluation method, program
US7877398B2 (en) 2007-11-19 2011-01-25 International Business Machines Corporation Masking related sensitive data in groups
US7953727B2 (en) * 2008-04-04 2011-05-31 International Business Machines Corporation Handling requests for data stored in database tables
US9305180B2 (en) * 2008-05-12 2016-04-05 New BIS Luxco S.à r.l Data obfuscation system, method, and computer implementation of data obfuscation for secret databases
EP2189925A3 (en) * 2008-11-25 2015-10-14 SafeNet, Inc. Database obfuscation system and method
KR20150040384A (en) 2009-06-10 2015-04-14 아브 이니티오 테크놀로지 엘엘시 Generating test data
US8862557B2 (en) 2009-12-23 2014-10-14 Adi, Llc System and method for rule-driven constraint-based generation of domain-specific data sets
US9298878B2 (en) * 2010-07-29 2016-03-29 Oracle International Corporation System and method for real-time transactional data obfuscation
JP6066927B2 (en) 2011-01-28 2017-01-25 アビニシオ テクノロジー エルエルシー Generation of data pattern information
US9892026B2 (en) 2013-02-01 2018-02-13 Ab Initio Technology Llc Data records selection
AU2014364882B2 (en) 2013-12-18 2020-02-06 Ab Initio Technology Llc Data generation

Also Published As

Publication number Publication date
WO2010141410A1 (en) 2010-12-09
KR101873946B1 (en) 2018-07-03
JP2012529114A (en) 2012-11-15
CA2763232A1 (en) 2010-12-09
US20100306854A1 (en) 2010-12-02
KR20120037423A (en) 2012-04-19
EP2438519A4 (en) 2013-01-09
CN102460404A (en) 2012-05-16
JP5878462B2 (en) 2016-03-08
AU2010256869A1 (en) 2011-12-15
AU2010256869B2 (en) 2016-06-16
EP2438519A1 (en) 2012-04-11
US10102398B2 (en) 2018-10-16
EP2438519B1 (en) 2017-10-11
CA2763232C (en) 2019-02-12

Similar Documents

Publication Publication Date Title
CN102460404B (en) Generate obfuscated data
Wang et al. Searchable encryption over feature-rich data
US9720943B2 (en) Columnar table data protection
CN109791594B (en) Method and readable medium for performing write and store operations on a relational database
CN102460076A (en) Generating test data
Xu et al. Authenticating aggregate queries over set-valued data with confidentiality
Vatsalan et al. Efficient two-party private blocking based on sorted nearest neighborhood clustering
US20170337386A1 (en) Method, apparatus, and computer-readable medium for automated construction of data masks
US11222131B2 (en) Method for a secure storage of data records
CN115659417A (en) Audit log storage method, audit log verification method, audit log storage device, audit log verification device and computer equipment
Vaiwsri et al. Accurate and efficient privacy-preserving string matching
Millham et al. Pattern mining algorithms
Loporchio et al. Authenticating spatial queries on blockchain systems
CN113934729A (en) Data management method based on knowledge graph, related equipment and medium
Xu et al. Dynamic proofs of retrievability with square-root oblivious RAM
He et al. FMSM: A fuzzy multi-keyword search scheme for encrypted cloud data based on multi-chain network
Vijayalakshmi et al. Revamp perception of bitcoin using cognizant Merkle
US20240119178A1 (en) Anonymizing personal information for use in assessing fraud risk
CN114116715B (en) Storage construction and retrieval method for secret state knowledge graph for protecting confidentiality of data
CN116756779B (en) Electronic form data objectification storage system and method
US20230163970A1 (en) Generating cryptographic proof of a series of transactions
Lin et al. A Noise Generation Scheme Based on Huffman Coding for Preserving Privacy
Ahmed A Novel Framework to Secure Schema for Data Warehouse in Cloud Computing (Force Encryption Schema Solution)
Geng Enhancing Relation Database Security With Shuffling
CN115687535A (en) Management method and device of relational database

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant