CN107480549B - Sensitive information desensitization method and system for data sharing
- Publication number
- CN107480549B CN201710506066.1A
- Authority
- CN
- China
- Prior art keywords
- data
- desensitization
- sensitive
- sensitive information
- attributes
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
- G06F21/6254—Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Bioethics (AREA)
- Medical Informatics (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to a data-sharing-oriented sensitive information desensitization method and system. Using statistics, natural language processing, and machine learning techniques, it protects sensitive data across the whole process from data publication to data use. It proposes automatic identification of sensitive information, including named-entity and address information, based on constructed sensitive information keyword libraries; calculates the association degree of sensitive attributes using the Sigmoid function; builds its desensitization strategy from a sensitive attribute generation rule library, named entity desensitization rules, and core desensitization algorithms; combines the desensitization depth calculations for numeric and categorical sensitive attributes to obtain the desensitization degree of the whole data set; and hashes the download link address to achieve controlled output of the data. The resulting sensitive information processing strategy keeps sensitive information secure while satisfying analysis and mining requirements as far as possible, and offers good desensitization effect and high reliability.
Description
Technical field
The present invention relates to the cross-disciplinary field of information technology and data security, and in particular to a data-sharing-oriented sensitive information desensitization method and system.
Background art
In recent years, the convergence of information technology with the economy and society has caused data to grow rapidly, and data has become an important strategic resource for development. In 2016 the government vigorously promoted the interconnection, opening, and sharing of information systems and public data, accelerated the integration of government information platforms, eliminated information silos, and promoted the opening of data resources to society, in order to guide social development and better serve the public. In the big data context, however, data opening and sharing also raises challenging problems. Data leakage incidents of all kinds occur frequently, such as the leak of information on nearly six thousand newborns in Anhui, the annual leaks of college entrance examination information, and targeted telephone fraud, so that society as a whole has shifted from focusing on data opening and sharing to paying more attention to data security protection. To this end, many countries have promulgated laws and regulations on information security, such as the "Privacy Act" and the "Regulations of the People's Republic of China on Open Government Information" in China. These require that data satisfy specific conditions when opened and shared: an open data set must not contain data that identifies individuals, so that users of the data set cannot easily infer private personal information, while the diverse needs of the public must still be reasonably met and data resources must still be able to generate new value. Achieving data security protection while maximizing the utility of data resources is therefore a challenging problem in current information security processing technology.
In recent years, a great deal of research has been done on sensitive data protection. Patent No. CN201511026582.1, from the perspective of a data desensitization system, describes the protection of sensitive data in a big data environment across every stage of circulation, exchange, sharing, and transaction, using a different protection method at each stage; it also proposes a sensitive data discovery method based on an expert system and natural language processing, and finally adds a measurement stage that verifies the correctness and authenticity of the desensitization result. Patent No. CN201610338383.2 proposes that, in a network environment, after data is encrypted, the encryption key and the encrypted desensitized data be stored physically separately, with strict access permissions set on both, to guarantee the security of data encryption and decryption. Patent No. CN201510303954.4 receives a Structured Query Language (SQL) instruction sent by a user, determines that the accessed data contains sensitive data, and converts the SQL instruction according to the user's access permissions and preset desensitization conversion rules, so that the converted instruction accesses desensitized data. Patent No. CN201510755773.5 discloses a format-preserving desensitization method for different types of private data that are stored as ciphertext; it avoids data loading errors caused by ciphertext longer than the defined length of the table field, and by a mismatch between the type of an encrypted numeric field and the source data type.
However, the above desensitization systems and methods all have limitations. The main reasons are: (1) most desensitization systems and methods target structured data in databases and do not address how to handle unstructured data (such as text data); (2) they do not consider the completeness of desensitization: if sensitive data is not desensitized deeply enough, it can still be reconstructed from non-sensitive data; (3) they cannot guarantee identifier uniqueness and format consistency after desensitization. Hospital data, for example, generally uses ID card numbers to identify and locate individuals; applying a desensitization or encryption algorithm causes the ID card information to lose its uniqueness as an identifier and the consistency of its format.
Summary of the invention
To overcome the above shortcomings, the object of the present invention is to provide a data-sharing-oriented sensitive information desensitization method and system. Using statistics, natural language processing, and machine learning techniques, the invention protects sensitive data across the whole process from data publication to data use. It proposes automatic identification of sensitive information, including named-entity and address information, based on constructed sensitive information keyword libraries; calculates the association degree of sensitive attributes using the Sigmoid function; builds its desensitization strategy from a sensitive attribute generation rule library, named entity desensitization rules, and core desensitization algorithms; combines the desensitization depth calculations for numeric and categorical sensitive attributes to obtain the desensitization degree of the whole data set; and hashes the download link address to achieve controlled output of the data. The resulting sensitive information processing strategy keeps sensitive information secure while satisfying analysis and mining requirements as far as possible.
The present invention achieves the above object through the following technical solution: a data-sharing-oriented sensitive information desensitization method, comprising the following steps:
(1) Preset the automatic sensitive information identification rules and the sensitive information processing rules. The automatic identification rules include constructing sensitive information keyword libraries of all kinds, automatically identifying sensitive information found in the keyword libraries, automatically identifying number and numeric sensitive information, automatically identifying named-entity sensitive information, and accurately identifying address sensitive information. The processing rules include sensitive attribute generation rules, the configured desensitization algorithms, named entity desensitization, and address information desensitization. A data user requests to view data published by a data provider;
(2) Preprocess the data, then perform word segmentation and part-of-speech tagging on the text data;
(3) Automatically identify sensitive information according to the preset automatic identification rules;
(4) Analyze the sensitive information by calculating the sensitive attribute association degree, and retain the sensitive information whose association degree is higher than a preset threshold;
(5) Desensitize the sensitive information according to the preset processing rules;
(6) Calculate the desensitization depth of the sensitive information and judge whether it meets the preset requirement. If not, return to step (5) and desensitize again; otherwise, output the desensitized data set for the data user to view.
Preferably, the preprocessing in step (2) is as follows: the published data is classified by data type, where the data types include structured table-type database data, tabular data, data warehouse data, and unstructured document data. During preprocessing, the integrity, consistency, and correctness of attribute values are checked, and unstructured document data is parsed into text data using a parsing tool.
Preferably, the automatic identification of named-entity sensitive information combines part-of-speech tagging based on the Viterbi algorithm of a hidden Markov model (HMM) with a constructed named entity knowledge base; the accurate identification of address sensitive information is achieved by examining the adjacent word sequences of address information.
Preferably, the sensitive attribute association degree is calculated as follows:
(a) The association degree of categorical sensitive attributes is normalized using the Sigmoid function, defined as follows:
[formula not reproduced in this text]
where the range of the function is $[0, 1]$, and the function is continuous, smooth, and monotonically increasing;
(b) Suppose each record in data set $T$ has $p$ attributes $\{u_1, u_2, \ldots, u_p\}$, and each attribute has a number of possible attribute values, denoted $\{q_1, q_2, \ldots, q_p\}$ respectively. Within a record, an attribute value of a sensitive attribute is marked 1 if it occurs and 0 if it does not, so the record can be expressed as a $(q_1 + q_2 + \ldots + q_p)$-dimensional row vector. When data set $T$ has $n$ records, denoted $\{t_1, t_2, \ldots, t_n\}$, there are $n$ such $(q_1 + q_2 + \ldots + q_p)$-dimensional row vectors;
(c) XNOR and XOR operations are performed on the values at corresponding positions of the $(q_1 + q_2 + \ldots + q_p)$-dimensional row vectors. One quantity records the XNOR case in which the attribute values at corresponding positions are both marked 1, and another records the XNOR case in which they are both marked 0 (the symbols are not reproduced in this text). The association degree $S(I_1, I_2)$ between two attributes is then calculated by the following formula:
[formula not reproduced in this text]
where the parameters $\lambda_1$, $\lambda_2$, $\lambda_3$ are set to 0.5, 0.25, and 0.25 respectively, and the range is $0 \le S(I_1, I_2) \le 1$.
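The 0/1 encoding in step (b) and the position-wise comparisons in step (c) can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names and sample attributes are hypothetical, the Sigmoid is assumed to be the standard logistic form 1/(1+e^(-x)), and the formula that combines the counts into S(I1, I2) with λ1, λ2, λ3 does not survive in this text, so it is deliberately not implemented.

```python
import math


def sigmoid(x):
    """Standard logistic function (assumed form of the patent's Sigmoid)."""
    return 1.0 / (1.0 + math.exp(-x))


def one_hot_records(records, domains):
    """Encode each record as a 0/1 row vector of length q_1 + ... + q_p.

    records: list of dicts mapping attribute name -> value
    domains: dict mapping attribute name -> ordered list of possible values
    """
    rows = []
    for record in records:
        row = []
        for attribute, values in domains.items():
            row.extend(1 if record[attribute] == value else 0 for value in values)
        rows.append(row)
    return rows


def agreement_counts(column_a, column_b):
    """Count positions that are both 1, both 0 (XNOR cases), or different (XOR)."""
    both_one = sum(1 for a, b in zip(column_a, column_b) if a == 1 and b == 1)
    both_zero = sum(1 for a, b in zip(column_a, column_b) if a == 0 and b == 0)
    differ = sum(1 for a, b in zip(column_a, column_b) if a != b)
    return both_one, both_zero, differ


# Hypothetical sample data: two attributes with two possible values each.
DOMAINS = {"disease": ["flu", "cold"], "income": ["low", "high"]}
RECORDS = [
    {"disease": "flu", "income": "low"},
    {"disease": "cold", "income": "low"},
]
ROWS = one_hot_records(RECORDS, DOMAINS)
```

Transposing `ROWS` with `zip(*ROWS)` gives one 0/1 column per attribute value, and any two columns can be passed to `agreement_counts`.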
Preferably, number and numeric sensitive information is desensitized as follows: rules for generating sensitive attributes are formulated and stored in the sensitive attribute generation rule library, and preset desensitization algorithms based on data distortion and encryption are invoked according to the desensitization task to transform the values into newly generated sensitive attribute values, forming the desensitized data.
Preferably, named-entity sensitive information is desensitized using a code table of common Chinese named entities that stores organization names and Chinese personal names on the scale of millions; the original named entity is hashed, looked up in the table, and replaced, completing the desensitization. Address sensitive information is desensitized according to the level of detail of the address: the address is first converted to longitude and latitude. If the original sensitive address cannot be resolved, it is a relatively vague address and does not need to be desensitized. If longitude and latitude can be resolved, they are transformed within the range of the district/county of the original address to generate a new address, and the address is blurred to the street/township level according to the user's access permissions.
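The hash-and-look-up replacement of named entities can be illustrated with a short Python sketch. It assumes SHA-256 as the hash and a tiny in-memory list as the code table; the patent names neither the hash function nor the table layout, and `pseudonym`, `salt`, and the sample names are hypothetical.

```python
import hashlib


def pseudonym(name, code_table, salt=""):
    """Map a named entity to a replacement drawn from a code table.

    The same input always maps to the same replacement, so records that
    refer to the same entity stay linked after desensitization.
    """
    digest = hashlib.sha256((salt + name).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(code_table)
    return code_table[index]


# Hypothetical three-entry table; the patent describes millions of entries.
CODE_TABLE = ["Wang Wu", "Zhao Liu", "Sun Qi"]
replacement = pseudonym("Zhang San", CODE_TABLE)
```

With a table this small, distinct names will often collide on the same replacement; a table on the scale the patent describes makes collisions much rarer but does not rule them out.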
Preferably, the desensitization depth measures the degree of difference between the desensitized data set and the original data set; the degree of difference is proportional to the desensitization depth. It is calculated as follows:
(I) Desensitization depth of numeric attributes:
Given the value range of a numeric attribute before desensitization and its attribute value after desensitization, the numeric attribute desensitization depth $D_{sz}(m, m^*)$ is calculated from them (the expressions are not reproduced in this text);
(II) Desensitization depth of categorical attributes:
The desensitization depth of a categorical attribute is obtained by constructing a generalization tree model, and $D_{fl}(r, r^*)$ is calculated using the following formula:
$$D_{fl}(r, r^*) = \frac{(N_h - 1) \times \mathrm{Step}(r, r^*)}{(N - 1) \times \mathrm{Step}(r, e)}$$
where $r$ and $r^*$ denote the attribute value before and after desensitization, $N_h$ denotes the number of child nodes made up of the pre-desensitization attribute value and the nodes sharing its parent, $N$ denotes the number of leaf nodes of the generalization tree, $e$ denotes the root node, and $\mathrm{Step}(x, y)$ denotes the number of steps from attribute value node $x$ to the desensitized attribute node $y$;
(III) Combining steps (I) and (II) gives the data set desensitization depth $D(T, T^*)$ (the formula is not reproduced in this text), where $n$ denotes the number of records in the data set, and $c_1$ and $c_2$ denote the number of numeric attributes and categorical attributes respectively.
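As a worked example of the categorical depth formula D_fl(r, r*) = ((N_h − 1) × Step(r, r*)) / ((N − 1) × Step(r, e)), the Python sketch below evaluates it on a small hypothetical generalization tree. The tree, the function names, and the reading of N_h as "the value plus its siblings" are assumptions made for illustration.

```python
# Hypothetical generalization tree stored as child -> parent; "disease" is the root e.
PARENT = {
    "flu": "respiratory", "cold": "respiratory", "asthma": "respiratory",
    "gastritis": "digestive", "ulcer": "digestive",
    "respiratory": "disease", "digestive": "disease",
}
ROOT = "disease"


def steps_up(node, ancestor, parent):
    """Step(x, y): number of edges walked upward from node to ancestor."""
    count = 0
    while node != ancestor:
        if node not in parent:
            raise ValueError("ancestor is not on the path to the root")
        node = parent[node]
        count += 1
    return count


def categorical_depth(r, r_star, parent, root):
    """Evaluate D_fl(r, r*) for a leaf value r generalized to r_star."""
    # N_h: children of r's parent, i.e. r together with its siblings.
    n_h = sum(1 for p in parent.values() if p == parent[r])
    # N: leaf nodes are nodes that never appear as a parent.
    internal = set(parent.values())
    n = sum(1 for node in parent if node not in internal)
    numerator = (n_h - 1) * steps_up(r, r_star, parent)
    denominator = (n - 1) * steps_up(r, root, parent)
    return numerator / denominator


depth = categorical_depth("flu", "respiratory", PARENT, ROOT)
```

Here N_h = 3, N = 5, Step(flu, respiratory) = 1, and Step(flu, disease) = 2, so the depth is (2 × 1) / (4 × 2) = 0.25; leaving the value unchanged gives 0.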
Preferably, the desensitized data set is output in a controlled manner by applying a hash transformation to the original storage link to generate a new download link address.
A data-sharing-oriented sensitive information desensitization system, comprising a system management unit, a data source management unit, a sensitive information identification unit, a sensitive information processing unit, and a data output unit. The system management unit builds the user accounts and access control of the desensitization system and identifies each user's roles and permissions, so that only legitimate, authorized users can operate on the corresponding data. The data source management unit stores data source information. The sensitive information identification unit automatically identifies sensitive information in all types of data sources and calculates the association between sensitive attributes in the data source. The sensitive information processing unit automatically creates desensitization tasks and matches desensitization strategies and algorithms. The data output unit safely and effectively controls the output of data in which sensitive data is used. The system management unit, data source management unit, sensitive information identification unit, sensitive information processing unit, and data output unit are connected in sequence.
Preferably, the data source management unit covers data source type, IP address, storage address, and the extraction and management of data source data structures. The sensitive information identification unit performs word segmentation on text data using natural language processing; on the basis of constructed sensitive information knowledge bases and manually labeled sensitivity levels, it identifies sensitive information automatically by rule-based and pattern-matching methods, and introduces the Sigmoid function to calculate the sensitive attribute association degree. The sensitive information processing unit uses natural language processing to review data usage applications and automatically create the corresponding desensitization tasks, and desensitizes each kind of sensitive information using the sensitive attribute generation rule library, hash table lookup, transformation of address information within a longitude/latitude range, and the various desensitization algorithms. The data output unit replaces the original sensitive attribute values with the desensitized values and outputs the data at a new storage address generated by applying a hash algorithm to the original storage address.
The beneficial effects of the present invention are: (1) the invention avoids duplicate values of unique identifier attributes in the desensitized data set; (2) by using the Sigmoid function to calculate the association degree of sensitive attributes, the invention groups closely associated attributes together, which not only prevents desensitized data from being reconstructed but also allows weakly correlated attributes to be deleted, improving computational efficiency; (3) the invention combines the desensitization depth of numeric and categorical attributes to calculate the desensitization degree of the entire data set, and controls the desensitization effect effectively by setting a threshold; (4) the invention applies a hash transformation to the original storage link to generate a new download link address, achieving controlled output of data and protecting sensitive information; (5) the invention is suitable for desensitizing sensitive information in both database-type structured data and document-type unstructured data, and offers good desensitization effect and high reliability.
Brief description of the drawings
Fig. 1 is an architecture diagram of the system of the present invention;
Fig. 2 is a flow diagram of the method of the present invention;
Fig. 3 is a schematic diagram of the input data source format in the embodiment of the present invention;
Fig. 4 is a block diagram of named entity recognition in the embodiment of the present invention.
Detailed description of the embodiments
The present invention is further described below with reference to a specific embodiment, but the scope of protection of the invention is not limited thereto:
Embodiment: As shown in Fig. 1, a data-sharing-oriented sensitive information desensitization system comprises a system management unit for setting up and managing system user account information and configuring roles and permissions; a data source management unit for storing data source information; a sensitive information identification unit that automatically identifies sensitive information in all types of data sources and calculates the association between sensitive attributes in the data source; a sensitive information processing unit that automatically creates desensitization tasks and matches desensitization strategies and algorithms; and a data output unit that safely and effectively controls the use of sensitive data. The system management unit builds the user accounts and access control of the desensitization system and identifies each user's roles and permissions, so that only legitimate, authorized users can operate on the corresponding data.
The data source management unit stores data source information, including original data source information and target data source information; a data source may be one or more of database data, document data, data warehouse data, and so on. Unified data source management enables global control of sensitive data sources, covering the data source IP address, storage address, name, data type, database type, username, and password. All types of data sources can also be preprocessed; a preprocessed data source is given a regenerated address link for use by the subsequent sensitive information identification and processing units.
The sensitive information identification unit automatically identifies sensitive information in all types of data sources using the constructed sensitive information knowledge bases, preset discovery rules, custom discovery rules, and so on. Using the a priori sensitivity level labels and a correlation analysis of sensitive attributes, it further determines the sensitive attributes at each level and the associations among them, preventing sensitive data from being reconstructed, and leaked a second time, because desensitization was not deep enough.
The sensitive information processing unit sets the corresponding desensitization strategy, rules, and algorithms for sensitive attributes at each level based on user permissions and access control, and also supports custom desensitization methods.
The data output unit protects data during use and download. The output protection method replaces the original sensitive attribute values with the desensitized values and generates a new storage address, without changing the storage address or content of the source data. The storage address of the desensitized data is generated by applying a hash algorithm to the original storage address. To avoid reducing the storage efficiency of the big data platform, desensitized data is destroyed promptly.
The data set in this embodiment comes from a portion of the people's mediation documents of a certain city. In each mediation agreement, the case details and the mediation agreement are document data, such as PDF and Word documents; the other attributes are stored as structured data in database tables.
As shown in Fig. 2, a data-sharing-oriented sensitive information desensitization method is implemented as follows:
Step 1: Data acquisition and preprocessing
Step 1.1: Data acquisition
The data provider publishes information using the account and permissions obtained in the system management unit, and the acquired data is stored in the data source management unit. Table 1 shows the field structure of a people's mediation case.
Table 1
The input data source format in the system is shown in Fig. 3 (because personal privacy is involved, the input data has already been desensitized, with letters replacing digits, but it is treated as real data for the purposes of this description). When a data user wants to obtain data, the user submits an application; after the application is approved, the system desensitizes the data according to the application.
Step 1.2: Structured data preprocessing and document data parsing
Preprocessing of structured data mainly marks attribute values that are noisy (erroneous values or outliers that deviate from what is expected), inconsistent (an attribute value is represented inconsistently within the data set, for example the date of birth disagrees with the birth date in the ID card number), duplicated where a unique identifier attribute is expected (for example a repeated ID card number), or missing. Values in a non-standard form are normalized; for example, a case occurrence time of 16-06-12 is transformed to 2016-06-12.
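Two of the preprocessing checks described above, normalizing a short date such as 16-06-12 and flagging repeated unique identifiers, can be sketched in Python using only the standard library. The function names and the accepted date formats are assumptions; note that Python's `%y` maps two-digit years 00 to 68 onto 2000 to 2068.

```python
from collections import Counter
from datetime import datetime


def normalize_date(text):
    """Return the date as YYYY-MM-DD, or None if it matches no known format."""
    for fmt in ("%Y-%m-%d", "%y-%m-%d"):
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None


def duplicated_values(values):
    """Return the set of values that occur more than once (e.g. ID card numbers)."""
    counts = Counter(values)
    return {value for value, count in counts.items() if count > 1}


normalized = normalize_date("16-06-12")
```

Values for which `normalize_date` returns `None` would be marked for review rather than silently dropped.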
Document data parsing uses the appropriate parsing tool to extract the text content of a document, for example POI for Word documents and PDFBox for PDF files; other document formats such as HTML, Word, XML, PDF, Excel, and TXT can also be parsed.
Step 1.3: Text data word segmentation and part-of-speech tagging
(1) The "case details" of a mediation case read as follows:
Party A and Party B are upstairs and downstairs neighbors living at Lane ABC, No. A, Zhongshan Road, Shanghai. A water tap in Room 203 was not turned off, causing water to seep into the cabinets of Zhang San's home in Room 103 below and soak the clothing. Zhang San went upstairs to resolve the matter but found nobody in. He then consulted the property management and learned that the owner's name is Li Si, with contact number 19821210912. He contacted Li Si immediately and asked for prompt handling, but after 3 days the problem upstairs had still not been dealt with, and the downstairs resident had suffered serious losses. He now applies to the Village No. 1 people's mediation committee for mediation, requesting compensation for the losses of the Room 103 owner.
(2) Introduce a dictionary and stop words for segmentation
Custom dictionaries are defined for organization-name suffixes, regions, new words, special terms, and so on, for example by adding "mediation committee", "upstairs and downstairs", "the Belt and Road", and "construction project". The segmenter gives priority to the dictionary, so "promote the Belt and Road construction project" is segmented as promote / the Belt and Road / construction project. The various stop word lists available online are collected, deduplicated, and supplemented to produce a fairly comprehensive list containing words such as "Party A", "Party B", "both parties", "carry out", and "even if", as well as punctuation marks.
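Dictionary-priority segmentation with stop word removal can be illustrated by a greedy longest-match pass, sketched below in Python. This is a simplified stand-in for a real Chinese segmenter: it works on space-separated English words rather than Chinese characters, and the function name, `max_len` parameter, and sample dictionary are hypothetical.

```python
def segment(text, dictionary, stop_words, max_len=4):
    """Greedy longest-match segmentation that prefers dictionary phrases."""
    words = text.split()
    tokens = []
    i = 0
    while i < len(words):
        # Try the longest candidate first; a single word always matches.
        for size in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + size])
            if size == 1 or candidate in dictionary:
                if candidate not in stop_words:
                    tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical custom dictionary taken from the examples in the text.
DICTIONARY = {"the Belt and Road", "construction project", "mediation committee"}
tokens = segment("promote the Belt and Road construction project", DICTIONARY, set())
```

Because dictionary phrases are tried before single words, "the Belt and Road" survives as one token instead of being split into four.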
(3) Segmentation and part-of-speech tagging result for the "case details"
upstairs-downstairs/n neighbors/n live/v Shanghai/ns Zhongshan Road/ns No. A/m 203/m room/n tap water/l faucet/n turn off/v downstairs/s Zhang San/nr 103/m room/n cabinet/n seep/v clothing/n soak/n go/v upstairs/n resolve/v find/v indoors/s have not/v person/n look for/v property management/n consult/n learn/v owner/n name/n Li Si/nr contact method/n 19821210912/m contact/n request/v prompt handling/i 3/m days/q upstairs/n handle/n downstairs/n resident/n serious loss/l Village No. 1/n people's mediation committee/n apply/v mediate/v 103/m room/n owner/n compensate losses/n.
Here, step 1.1 falls within the functions of the system management unit, and steps 1.2 and 1.3 fall within the functions of the data source management unit.
Step 2: Construct the sensitive information keyword libraries
Sensitive information keyword libraries of all kinds are constructed manually and labeled with their sensitivity levels. For example, within the number and numeric class, the keyword library for contact details includes expressions such as telephone number, contact method, mobile phone number, means of communication, home telephone, China Mobile number, China Unicom number, and China Telecom number. Sensitive information is also divided into four levels by sensitivity. The first level is identifiable attributes, which pinpoint a specific person, such as ID card number, name, and address. The second level is quasi-identifier attributes: a single column cannot locate a person, but several columns together can potentially identify one. The third level is sensitive attributes, such as disease, income, and education level. The fourth level is non-sensitive attributes, as shown in Table 2. The sensitive information discussed in the present invention is the sensitive attributes of the first three levels, as shown in Table 3.
Table 2
Table 3
Step 3: Automatic identification of number and numeric sensitive information
Number and numeric sensitive information includes ID card numbers, various card accounts and passwords, contact details, virtual accounts and passwords, license plate numbers, social security numbers, and similar information. Such information can be identified from its generation rules and discovered using regular expressions. Because each of these items can pinpoint a specific person, these attributes are labeled as identifiable sensitive attributes.
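Regular-expression discovery of this kind can be sketched in Python as follows. The two patterns are illustrative assumptions, not rules quoted from the patent: an 11-digit mobile number beginning with 1, and an 18-character ID card number whose last character is a digit or X. Neither pattern validates check digits.

```python
import re

# Hypothetical patterns; the lookarounds stop a match inside a longer digit run.
PATTERNS = {
    "mobile_number": re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"),
    "id_card_number": re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)"),
}


def find_number_sensitive(text):
    """Return (label, matched text) pairs for every pattern hit in the text."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group()))
    return hits


hits = find_number_sensitive("owner Li Si, contact number 19821210912")
```

Each hit would then be labeled as an identifiable sensitive attribute, as the step describes.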
Step 4: automatic identification of named entities
Person names and organization names are identified by combining part-of-speech tagging, performed with the Viterbi algorithm on a hidden Markov model (HMM) from natural language processing, with a purpose-built named-entity knowledge base.
Building the named-entity knowledge base includes constructing the sensitive information keyword database, the named-entity patterns, the prefix and suffix rules, and the context templates. The named-entity patterns, the prefix and suffix rules, and their positions are discovered from a training corpus, which yields a feature vocabulary for named entities, a prefix and suffix rule vocabulary, and the corresponding position vocabulary. These are then combined with the parts of speech tagged by the word segmentation tool to extract the entity from its part-of-speech context, as shown in Figure 4.
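To make the tagging step concrete, the following is a minimal sketch of Viterbi decoding on an HMM. The three tags (`nr` for person name, `v` for verb, `ns` for place name), the probabilities, and the small floor value for unseen words are all invented for illustration; a real tagger would estimate them from a training corpus.

```python
def viterbi(observations, states, start_p, trans_p, emit_p, floor=1e-8):
    """Return the most probable state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s].get(observations[0], floor), None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s].get(obs, floor), p)
                for p in states
            )
        best.append(column)
    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return list(reversed(path))

# Toy model (all numbers are assumptions).
STATES = ["nr", "v", "ns"]
START = {"nr": 0.6, "v": 0.1, "ns": 0.3}
TRANS = {
    "nr": {"nr": 0.1, "v": 0.8, "ns": 0.1},
    "v": {"nr": 0.3, "v": 0.1, "ns": 0.6},
    "ns": {"nr": 0.2, "v": 0.6, "ns": 0.2},
}
EMIT = {"nr": {"Zhang": 0.9}, "v": {"visits": 0.9}, "ns": {"Shanghai": 0.9}}
```

Tokens tagged `nr` or `ns` would then be checked against the knowledge base patterns before being accepted as named entities.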
Step 5: accurate identification of address information
More detailed address information is obtained by examining the adjacent word sequence of the address fragments produced by word segmentation. If an adjacent word sequence (2 to 3 consecutive words of context) is tagged as address information or satisfies an address matching rule, the words are combined and identified again, and the combined address is converted to longitude and latitude. If longitude and latitude can be computed, the address is an identifying attribute value. For example, in the segmented text "Shanghai City/ns Zhongshan Road/ns No. A/m", Shanghai City and Zhongshan Road are both detected as place-name sensitive attributes, and address pattern matching shows that the following "No. A" also belongs to the address. Combining these adjacent words yields the more detailed address "No. A, Zhongshan Road, Shanghai City", whose longitude and latitude are then computed.
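The combination of adjacent address fragments can be sketched as below. The house-number rule `ADDRESS_RULE`, the helper names, and the choice to join fragments with a space are assumptions for illustration; the geocoding step that follows in the text is left out because it depends on an external service.

```python
import re

# Hypothetical address matching rule: a house number such as "No. A" or "No. 12".
ADDRESS_RULE = re.compile(r"^No\.\s*\w+$")

def is_address_like(word, tag):
    """A token counts as an address fragment if tagged ns or matching the rule."""
    return tag == "ns" or bool(ADDRESS_RULE.match(word))

def merge_address_tokens(tagged_tokens):
    """Join runs of two or more adjacent address fragments into one candidate."""
    merged, run = [], []
    for word, tag in tagged_tokens:
        if is_address_like(word, tag):
            run.append(word)
            continue
        if len(run) >= 2:
            merged.append(" ".join(run))
        run = []
    if len(run) >= 2:
        merged.append(" ".join(run))
    return merged
```

Each merged candidate would then be passed to a geocoder; only candidates that resolve to coordinates are treated as identifying attribute values.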
Step 6: calculation of the degree of association between sensitive attributes
The attribute association calculation finds the correlations between sensitive attributes in a data set; the larger the degree of association, the stronger the correlation. Once the degree of association between sensitive attributes is known, closely associated sensitive attributes can be grouped together and very weakly associated attributes can be deleted. This reduces the size of the data set to be desensitized, lowers the computational cost of the desensitization process, and improves the efficiency of the corresponding algorithms. It also allows further sensitive attributes to be mined starting from the identifying and quasi-identifying sensitive attributes determined by prior knowledge, which further improves the desensitization effect and prevents leakage caused by recombining sensitive attributes.
In the present invention, the degree of association of categorical sensitive attributes is normalized with the Sigmoid function, defined as follows:
where the range of the function is [0,1], and the function is continuous, smooth, and monotonically increasing. When x = 0, the function value is 0.5.
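The formula itself is not reproduced in this text. As a point of reference only, the standard logistic function has the properties stated above (continuous, smooth, monotonically increasing, value 0.5 at x = 0); the sketch below assumes that standard form, and the patent's formula (1) may differ, for example by a scaling of x.

```python
import math

def sigmoid(x):
    """Standard logistic function 1 / (1 + exp(-x)), computed without overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x is negative here, so exp(x) cannot overflow
    return z / (1.0 + z)
```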
Assume each record in data set T has p attributes {u_1, u_2, ..., u_p}, and that each attribute corresponds to several attribute values, whose numbers are denoted {q_1, q_2, ..., q_p}. Within a record, an attribute value of a sensitive attribute that occurs is marked 1 and one that does not occur is marked 0, so the record can be expressed as a (q_1+q_2+...+q_p)-dimensional row vector. When data set T has n records, denoted in turn {t_1, t_2, ..., t_n}, there are n such (q_1+q_2+...+q_p)-dimensional row vectors.
XNOR and XOR operations are applied to the values at corresponding positions of the (q_1+q_2+...+q_p)-dimensional row vectors. One symbol denotes the case in which the XNOR operation finds the attribute values at a position both marked 1, and another denotes the case in which both are marked 0. The degree of association S(I_1, I_2) of two attributes is then calculated as:
where the parameters λ_1, λ_2, and λ_3 are set to 0.5, 0.25, and 0.25 respectively in the present invention, and the range is 0 ≤ S(I_1, I_2) ≤ 1.
In this embodiment, the degree of association between sensitive attributes is measured by constructing the Sigmoid function. Formula (1) and formula (2) are used to calculate the correlation coefficient of two sensitive attributes; the larger the coefficient, the higher the correlation.
For example, suppose the education level attribute takes the values {university, senior high school, junior high school, primary school} and the wage attribute takes the values {above 10K, 10K-8K, 8K-6K, 6K-4K, 2K-4K, below 2K}. Ordering all values as {university, senior high school, junior high school, primary school, above 10K, 10K-8K, 8K-6K, 6K-4K, 2K-4K, below 2K}, records 1, 2, and 3 give the row vectors
{1,0,0,0,1,0,0,0,0,0}, {0,0,1,0,0,0,1,0,0,0}, {1,0,0,0,1,0,0,0,0,0}.
XNOR and XOR are computed pairwise over these three records, giving θ(x) = 0.4; formula (1) then gives a correlation of 0.95.
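The encoding in this example can be reproduced with a short sketch. The value labels, the helper names, and the reading of the fourth wage band as 6K-4K are assumptions made here; the sketch stops at the position counts and does not compute S(I_1, I_2), because the formula is not reproduced in this text.

```python
EDUCATION = ["university", "senior high", "junior high", "primary"]
WAGE = ["10K+", "10K-8K", "8K-6K", "6K-4K", "2K-4K", "<2K"]

def encode(education, wage):
    """One-hot encode a record over the concatenated value lists."""
    return [int(v == education) for v in EDUCATION] + [int(v == wage) for v in WAGE]

def compare(a, b):
    """Count positions that are both 1, both 0 (the two XNOR cases), and different (XOR)."""
    both_one = sum(x & y for x, y in zip(a, b))
    both_zero = sum((1 - x) & (1 - y) for x, y in zip(a, b))
    differ = sum(x ^ y for x, y in zip(a, b))
    return both_one, both_zero, differ

records = [
    encode("university", "10K+"),
    encode("junior high", "8K-6K"),
    encode("university", "10K+"),
]
```

Records 1 and 3 agree at every position, while records 1 and 2 differ at four positions; these counts are the inputs to the association formula.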
In the present invention, other methods may also be used to calculate the degree of association between sensitive attributes, and these fall within the scope of protection of the invention. One example is the Apriori algorithm based on association-rule frequent itemsets, which iteratively finds the frequent itemsets of sensitive attributes that satisfy the conditions. Another is the mean-square contingency coefficient. Assume two sensitive attributes I_1 and I_2 with value domains {v_11, v_12, ..., v_1p} and {v_21, v_22, ..., v_2q} respectively. The mean-square contingency coefficient of I_1 and I_2 is then:
where f_i. and f_.j denote the numbers of occurrences of the sensitive attribute values v_1i and v_2j in the original data set, and f_ij denotes the number of records in which v_1i and v_2j occur together. f_i. and f_.j are therefore the marginal sums of f_ij, that is, f_i. = Σ_j f_ij and f_.j = Σ_i f_ij, and 0 ≤ Φ²(S_1, S_2) ≤ 1.
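The coefficient's formula is not reproduced in this text. The sketch below uses the textbook chi-square-based definition, divided by the number of records and by min(p, q) - 1 so that the result lies in [0, 1]; this normalization is an assumption and may not match the patent's expression exactly.

```python
from collections import Counter

def mean_square_contingency(pairs):
    """Normalized phi-squared for a list of (value_of_I1, value_of_I2) pairs."""
    n = len(pairs)
    joint = Counter(pairs)                  # f_ij: joint counts
    row = Counter(a for a, _ in pairs)      # f_i.: marginal counts of I1 values
    col = Counter(b for _, b in pairs)      # f_.j: marginal counts of I2 values
    chi2 = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    return chi2 / (n * k) if k > 0 else 0.0
```

A value near 1 indicates that one attribute almost determines the other; a value near 0 indicates that the two attributes are close to independent.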
Steps 2 to 6 belong to the functions of the sensitive information recognition unit.
Steps 7 to 11 below belong to the functions of the sensitive information processing unit. Using natural language processing, the system can automatically review the desensitization request submitted by a data provider, or the application filled in by a data consumer (including the application scenario, purpose, and so on). Once the request is approved, the system automatically creates the corresponding desensitization task, identifies the sensitive information in the requested data, and desensitizes it according to that task.
Step 7: setting the desensitization algorithms
Desensitization algorithms based on data distortion and encryption, such as random-number replacement, custom substitution, hashing, and encryption, are configured in the system to transform the original data. Depending on the actual requirements of the desensitization task, certain characters of the data can also be truncated, or the data can be generalized.
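A few of the algorithm families named in Step 7 can be sketched as small functions. The function names, the placeholder salt, and the parameter defaults are assumptions for illustration; reversible encryption is omitted because it would need a key-management design the text does not describe.

```python
import hashlib
import random

def mask(value, keep_head=3, keep_tail=4, fill="*"):
    """Truncation-style masking: hide the middle characters."""
    if len(value) <= keep_head + keep_tail:
        return fill * len(value)
    hidden = len(value) - keep_head - keep_tail
    return value[:keep_head] + fill * hidden + value[len(value) - keep_tail:]

def hash_value(value, salt="demo-salt"):
    """One-way transformation with salted SHA-256 (the salt is a placeholder)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()

def random_digits(value, rng=random):
    """Replace every digit with a random digit, keeping other characters."""
    return "".join(str(rng.randrange(10)) if ch.isdigit() else ch for ch in value)

def generalize_age(age, width=10):
    """Generalize an exact age into a range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"
```

A desensitization task would select one of these per attribute, for example `mask` for phone numbers and `generalize_age` for ages.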
Step 8: desensitization based on the sensitive attribute generation rule library
For number or numeric sensitive data, rules that generate the sensitive attribute are formulated and stored in the sensitive field generation rule library. A rule can be fully equivalent to the rule by which the sensitive field in the original data was generated. The desensitization algorithm preset in step 7 is then called to transform the newly generated sensitive attribute values according to the desensitization task, which produces the desensitized data. For example, following the generation rule of an ID card number or of a date, the characters at sensitive positions are replaced or obfuscated according to a given rule, while the characters with statistical meaning, such as administrative region, age bracket, and gender, are retained. This achieves high-fidelity simulation, preserves the uniqueness of the identification number, and supports statistical analysis, while making it impossible to tell whether a value is genuine.
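As an illustration of rule-based generation, the sketch below rewrites an 18-character resident ID number while keeping the region code, the birth year, and the gender parity of the 17th digit, and recomputes the check character with the standard weighted mod-11 rule. The choice of which positions to randomize is an assumption, and the sketch does not enforce uniqueness across a data set, which a full implementation would need to handle.

```python
import random

WEIGHTS = [7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2]
CHECK_CHARS = "10X98765432"

def check_char(first17):
    """Check character for the first 17 digits of a resident ID number."""
    total = sum(int(d) * w for d, w in zip(first17, WEIGHTS))
    return CHECK_CHARS[total % 11]

def desensitize_id(id_number, rng=random):
    """Keep region, birth year, and gender parity; randomize the rest."""
    region, year = id_number[:6], id_number[6:10]
    month = rng.randint(1, 12)
    day = rng.randint(1, 28)                      # 28 keeps every month valid
    gender_parity = int(id_number[16]) % 2        # odd = male, even = female
    seq = rng.randint(0, 99)
    last = rng.choice([d for d in range(10) if d % 2 == gender_parity])
    first17 = f"{region}{year}{month:02d}{day:02d}{seq:02d}{last}"
    return first17 + check_char(first17)
```

The output still passes format and checksum validation, so downstream statistics by region, birth year, or gender remain meaningful.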
Step 9: desensitization of named entities
Named entities such as organization names and person names are desensitized using a code table of common Chinese named entities, which stores organization names and Chinese personal names on the order of millions. The original named entity is hashed, and the hash is used to look up its replacement in the table.
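The hash lookup can be sketched as follows. The five-entry `SURROGATE_NAMES` list stands in for the million-entry code table, and the salt and function name are placeholders introduced here.

```python
import hashlib

# Stand-in for the code table of common named entities (assumed contents).
SURROGATE_NAMES = ["Li Wei", "Wang Fang", "Zhang Min", "Liu Yang", "Chen Jing"]

def replace_name(name, table=SURROGATE_NAMES, salt="demo-salt"):
    """Map a name to a table entry chosen by a salted SHA-256 hash."""
    digest = hashlib.sha256((salt + name).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(table)
    return table[index]
```

Because the mapping is deterministic, the same original name is always replaced by the same surrogate, which keeps joins across tables consistent.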
Step 10: desensitization of address information
Address sensitive data can be desensitized according to how detailed the address is. The address is first converted to longitude and latitude. If the original address cannot be resolved, it is a relatively vague address and needs no desensitization. If longitude and latitude can be resolved, they are transformed within the district or county of the original address to generate a new address, and the address is blurred to street or township level according to the user's access rights.
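The flow of Step 10 can be sketched with the geocoder passed in as a parameter. `geocode` and `reverse_geocode` are hypothetical callables, not a real service API, and the fixed offset of 0.05 degrees is an assumption: it does not by itself guarantee that the new point stays inside the original district or county, which a full implementation would check against boundary data.

```python
import random

def perturb_coordinates(lat, lon, max_offset_deg=0.05, rng=random):
    """Shift a point by a random offset of at most max_offset_deg on each axis."""
    new_lat = lat + rng.uniform(-max_offset_deg, max_offset_deg)
    new_lon = lon + rng.uniform(-max_offset_deg, max_offset_deg)
    # Clamp to valid latitude and longitude ranges.
    return max(-90.0, min(90.0, new_lat)), max(-180.0, min(180.0, new_lon))

def desensitize_address(address, geocode, reverse_geocode, rng=random):
    """Leave unresolvable addresses alone; otherwise return a nearby surrogate."""
    point = geocode(address)        # hypothetical: returns (lat, lon) or None
    if point is None:
        return address              # vague address: nothing to desensitize
    return reverse_geocode(*perturb_coordinates(*point, rng=rng))
```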
Step 11: calculation of the desensitization depth of sensitive information
The desensitization depth measures the degree of difference between the desensitized data set and the original data set. The larger the difference, the greater the desensitization depth and the higher the data security; conversely, the smaller the difference, the lower the security. The desensitization depth is calculated as follows:
11.1) Desensitization depth of numeric attributes
Let m be the value of a numeric attribute before desensitization, drawn from the attribute's value domain, and let m* be its value after desensitization. The desensitization depth of the numeric attribute is Dsz(m, m*):
11.2) Desensitization depth of categorical attributes
In the present invention, the desensitization depth of a categorical attribute is obtained by constructing a generalization tree model, and is calculated with the following formula as Dfl(r, r*):
Dfl(r, r*) = ((Nh - 1) × Step(r, r*)) / ((N - 1) × Step(r, e)) (5)
where r and r* denote the attribute value before and after desensitization, Nh denotes the number of child nodes that share a parent node with the categorical attribute value before desensitization, N denotes the number of leaf nodes of the generalization tree, e denotes the root node, and Step(x, y) denotes the number of steps from attribute value node x to the desensitized attribute node y.
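Formula (5) can be evaluated on a small generalization tree as sketched below. The tree, the node names, and the reading of Nh as the number of children of r's parent (r included) are assumptions; r is taken to be a leaf below the root.

```python
# Hypothetical generalization tree, stored as child -> parent.
PARENT = {
    "university": "higher", "senior high": "higher",
    "junior high": "basic", "primary": "basic",
    "higher": "any", "basic": "any",
}
ROOT = "any"

def steps(node, ancestor):
    """Number of edges from node up to ancestor."""
    count = 0
    while node != ancestor:
        node = PARENT[node]   # raises KeyError if ancestor is not above node
        count += 1
    return count

def categorical_depth(r, r_star):
    """Evaluate formula (5) for a leaf value r generalized to r_star."""
    children = {}
    for child, parent in PARENT.items():
        children.setdefault(parent, []).append(child)
    n_h = len(children[PARENT[r]])                               # nodes sharing r's parent
    n_leaves = sum(1 for node in PARENT if node not in children)
    return ((n_h - 1) * steps(r, r_star)) / ((n_leaves - 1) * steps(r, ROOT))
```

On this tree, generalizing "university" to "higher" gives (2 - 1) × 1 / ((4 - 1) × 2) = 1/6, and leaving the value unchanged gives 0.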
11.3) Desensitization depth of the data set
Combining 11.1) and 11.2) gives the formula for the data set desensitization depth D(T, T*):
where n denotes the number of records in the data set, and c_1 and c_2 denote the number of numeric attributes and the number of categorical attributes respectively.
In this embodiment, formula (4) and formula (5) are used to calculate the desensitization depth of the numeric and categorical sensitive attributes separately, and formula (6) is then used to calculate the desensitization depth of the whole data set.
In the present invention, the calculation of the data set desensitization depth is not limited to the method of step 11. Other methods can also be used, such as expressing the data desensitization depth as an entropy-based information loss, given by:
where R_m denotes the number of records in the data set that contain m, R_n denotes the number of records that contain n after desensitization, and H(R_n) and H(R_m) denote the information entropy of R_n and R_m.
The general expression for H(R_n) and H(R_m) is:
where freq(R_x, s) denotes the number of records in data set R_x that contain s.
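The expressions are not reproduced in this text. As a rough illustration of the idea, the sketch below computes Shannon entropy from value frequencies and takes the difference before and after desensitization; treating the information loss as a plain entropy difference is an assumption, not the patent's formula.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (base 2) of the empirical distribution of values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def entropy_loss(original, desensitized):
    """Entropy removed by desensitization; larger means more information lost."""
    return entropy(original) - entropy(desensitized)
```

For example, generalizing four distinct education levels into two groups of equal size reduces the entropy from 2 bits to 1 bit.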
Step 12: output of desensitized data
The data consumer obtains the desensitized data according to their user rights. The output protection method replaces the original sensitive attribute values with the desensitized values and generates a new storage address, without changing the storage address or content of the source data. The storage address of the desensitized data is generated by transforming the original storage address with a hash algorithm. To reduce the storage load on the big data platform, the desensitized data is destroyed in a timely manner.
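Deriving the new storage address can be sketched with a salted hash of the original address. The `/desensitized/` prefix, the salt, and the function name are placeholders; the text does not specify the hash algorithm or the address layout.

```python
import hashlib

def desensitized_address(original_address, salt="demo-salt", prefix="/desensitized/"):
    """Derive a new storage address from the original one with SHA-256."""
    digest = hashlib.sha256((salt + original_address).encode("utf-8")).hexdigest()
    return prefix + digest
```

The source data stays at its original address; only the desensitized copy is written to, and later deleted from, the derived address.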
The above describes specific embodiments of the present invention and the technical principles applied. Any change made under the concept of the present invention whose resulting function does not depart from the spirit covered by the specification and drawings shall fall within the protection scope of the present invention.
Claims (9)
1. A sensitive information desensitization method for data sharing, characterized by comprising the following steps:
(1) presetting sensitive information automatic identification rules and sensitive information processing rules, wherein the automatic identification rules comprise constructing keyword databases for each class of sensitive information, automatic identification of the sensitive information in the keyword databases, automatic identification of number and numeric sensitive information, automatic identification of named-entity sensitive information, and accurate identification of address sensitive information; the sensitive information processing rules comprise sensitive attribute generation rules, setting of desensitization algorithms, named-entity desensitization, and address information desensitization; a data consumer requests to view data published by a data provider;
(2) pre-processing the data, and then performing word segmentation and part-of-speech tagging on the text data;
(3) automatically identifying the sensitive information according to the preset sensitive information automatic identification rules;
(4) analyzing the sensitive information by calculating the degree of association between sensitive attributes, and retaining the sensitive information whose degree of association is higher than a preset threshold; wherein the degree of association between sensitive attributes is calculated as follows:
(a) degree of association of classifying type Sensitive Attributes is standardized using Sigmoid function, is such as given a definition:
Wherein, the codomain section of the function is [0,1], and continuous, smooth, monotonic increase;
(b) assume that every record has p attribute { u in data set T1,u2,...,up, and each attribute respectively corresponds several
Attribute value is divided into and is denoted as { q1,q2,...,qp};In one record, the corresponding attribute value of Sensitive Attributes occurs being denoted as 1, does not go out
It is now denoted as 0, then this record can be expressed as (a q1+q2+...+qp) dimension row vectorWhen data set T have n item record,
Successively it is denoted as { t1,t2,...,tn, then just there are n (q1+q2+...+qp) dimension row vector, it is expressed as
(c) by (q1+q2+...+qp) correspond in dimension row vector value on position carry out with or and XOR operation, useTable
The case where attribute value is collectively labeled as 1 on position is corresponded to when showing same or operation, is usedIndicate with or when operation correspond on position
Attribute value is collectively labeled as 0;The then degree of association S (I between two attributes1,I2) calculation formula is as follows:
Wherein, by parameter lambda in calculating1, λ2, λ3It is set to 0.5,0.25,0.25, and codomain is 0≤S (I1,I2)≤1;
(5) desensitizing the sensitive information according to the preset sensitive information processing rules;
(6) calculating the desensitization depth of the sensitive information and judging whether it meets the preset requirement; if it does not, returning to step (5) to desensitize again; otherwise, outputting the desensitized data set for the data consumer to view.
2. The sensitive information desensitization method for data sharing according to claim 1, characterized in that the pre-processing of step (2) is as follows: the published data is classified by data type, the data types comprising structured database data, tabular data, data warehouse data, and unstructured document data; during pre-processing, the completeness, consistency, and correctness of the attribute values are checked, and unstructured document data is parsed into text data using a parsing tool.
3. The sensitive information desensitization method for data sharing according to claim 1, characterized in that the automatic identification of named-entity sensitive information is realized by combining part-of-speech tagging, performed with the Viterbi algorithm on a hidden Markov model (HMM), with a purpose-built named-entity knowledge base; and the accurate identification of address sensitive information is realized by examining the adjacent word sequence of the address information.
4. The sensitive information desensitization method for data sharing according to claim 1, characterized in that the desensitization of number and numeric sensitive information is specifically: rules that generate the sensitive attribute are formulated and stored in the sensitive attribute generation rule library, and the preset desensitization algorithm based on data distortion and encryption is called to transform the newly generated sensitive attribute values according to the desensitization task, producing the desensitized data.
5. The sensitive information desensitization method for data sharing according to claim 1, characterized in that named-entity sensitive information is desensitized using a code table of common Chinese named entities, which stores organization names and Chinese personal names on the order of millions, the original named entity being hashed and replaced by table lookup; and address sensitive information is desensitized according to how detailed the address is: the address is converted to longitude and latitude; if the original sensitive address cannot be resolved, it is a relatively vague address and needs no desensitization; if longitude and latitude can be resolved, they are transformed within the district or county of the original address to generate a new address, and the address is blurred to street or township level according to the user's access rights.
6. The sensitive information desensitization method for data sharing according to claim 1, characterized in that the desensitization depth measures the degree of difference between the desensitized data set and the original data set, the degree of difference being proportional to the desensitization depth, and is calculated as follows:
(I) desensitization depth of numeric attributes:
let m be the value of a numeric attribute before desensitization, drawn from the attribute's value domain, and let m* be its value after desensitization; the desensitization depth of the numeric attribute is Dsz(m, m*):
(II) desensitization depth of categorical attributes:
the desensitization depth of a categorical attribute is obtained by constructing a generalization tree model, and is calculated with the following formula as Dfl(r, r*):
Dfl(r, r*) = ((Nh - 1) × Step(r, r*)) / ((N - 1) × Step(r, e))
where r and r* denote the attribute value before and after desensitization, Nh denotes the number of child nodes that share a parent node with the categorical attribute value before desensitization, N denotes the number of leaf nodes of the generalization tree, e denotes the root node, and Step(x, y) denotes the number of steps from attribute value node x to the desensitized attribute node y;
(III) combining step (I) and step (II) gives the formula for the data set desensitization depth D(T, T*), as follows:
where n denotes the number of records in the data set, and c_1 and c_2 denote the number of numeric attributes and the number of categorical attributes respectively.
7. The sensitive information desensitization method for data sharing according to claim 1, characterized in that the desensitized data set is output in a controlled manner by applying a hash transformation to the original storage link to generate a new download link address.
8. A sensitive information desensitization system for data sharing, characterized by comprising a system management unit, a data source management unit, a sensitive information recognition unit, a sensitive information processing unit, and a data output unit; the system management unit is used to build the user accounts and access control of the desensitization system and to identify the roles and permissions of users, so that only legitimate, authorized users can operate on the corresponding data; the data source management unit stores data source information; the sensitive information recognition unit is used to automatically identify the sensitive information in data sources of all types and to calculate the association between the sensitive attributes in the data source; wherein the degree of association between sensitive attributes is calculated as follows:
(a) degree of association of classifying type Sensitive Attributes is standardized using Sigmoid function, is such as given a definition:
Wherein, the codomain section of the function is [0,1], and continuous, smooth, monotonic increase;
(b) assume that every record has p attribute { u in data set T1,u2,...,up, and each attribute respectively corresponds several
Attribute value is divided into and is denoted as { q1,q2,...,qp};In one record, the corresponding attribute value of Sensitive Attributes occurs being denoted as 1, does not go out
It is now denoted as 0, then this record can be expressed as (a q1+q2+...+qp) dimension row vectorWhen data set T have n item record,
Successively it is denoted as { t1,t2,...,tn, then just there are n (q1+q2+...+qp) dimension row vector, it is expressed as
(c) by (q1+q2+...+qp) correspond in dimension row vector value on position carry out with or and XOR operation, useTable
The case where attribute value is collectively labeled as 1 on position is corresponded to when showing same or operation, is usedIndicate with or when operation correspond on position
Attribute value is collectively labeled as 0;The then degree of association S (I between two attributes1,I2) calculation formula is as follows:
Wherein, by parameter lambda in calculating1, λ2, λ3It is set to 0.5,0.25,0.25, and codomain is 0≤S (I1,I2)≤1;
the sensitive information processing unit is used to automatically create desensitization tasks and to match desensitization strategies and desensitization algorithms; the data output unit is used to control the output of sensitive data safely and effectively; the system management unit, data source management unit, sensitive information recognition unit, sensitive information processing unit, and data output unit are connected in sequence.
9. The sensitive information desensitization system for data sharing according to claim 8, characterized in that the data source management unit comprises extraction and management of the data source type, IP address, storage address, and data source data structure; the sensitive information recognition unit performs word segmentation on text data using natural language processing and, on the basis of manually constructed knowledge bases for each class of sensitive information with labeled sensitivity levels, automatically identifies sensitive information by rule-based and pattern-matching methods, while introducing the Sigmoid function method to calculate the degree of association between sensitive attributes; the sensitive information processing unit uses natural language processing to review data use applications and automatically create the corresponding desensitization tasks, and desensitizes each class of sensitive information using, respectively, the sensitive attribute generation rule library, hash table lookup, transformation of address longitude and latitude within a range, and the various desensitization algorithms; the data output unit replaces the original sensitive attribute values with the desensitized values and generates a new storage address for the output data by transforming the original storage address with a hash algorithm.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710506066.1A CN107480549B (en) | 2017-06-28 | 2017-06-28 | A kind of sensitive information desensitization method and system that data-oriented is shared |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710506066.1A CN107480549B (en) | 2017-06-28 | 2017-06-28 | A kind of sensitive information desensitization method and system that data-oriented is shared |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107480549A CN107480549A (en) | 2017-12-15 |
CN107480549B true CN107480549B (en) | 2019-08-02 |
Family
ID=60596029
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710506066.1A Active CN107480549B (en) | 2017-06-28 | 2017-06-28 | A kind of sensitive information desensitization method and system that data-oriented is shared |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107480549B (en) |
CN113626865A (en) * | 2021-08-11 | 2021-11-09 | 南京莱斯网信技术研究院有限公司 | Data sharing opening method and system for preventing sensitive information from being leaked |
CN113761576A (en) * | 2021-09-03 | 2021-12-07 | 国网山东省电力公司电力科学研究院 | Privacy protection method and device, storage medium and electronic equipment |
CN113792323A (en) * | 2021-11-15 | 2021-12-14 | 聊城高新生物技术有限公司 | Sensitive data encryption method and device based on agricultural products and electronic equipment |
CN116070205B (en) * | 2023-03-07 | 2023-06-13 | 北京和升达信息安全技术有限公司 | Data clearing method and device, electronic equipment and storage medium |
CN116205236B (en) * | 2023-05-06 | 2023-08-18 | 四川三合力通科技发展集团有限公司 | Data rapid desensitization system and method based on entity naming identification |
CN116776351A (en) * | 2023-06-21 | 2023-09-19 | 中国民用航空总局第二研究所 | Preserving format encryption method and system for personal information to resist statistical analysis attack |
CN117010019B (en) * | 2023-08-04 | 2024-04-16 | 北京泰策科技有限公司 | Data desensitization method and system based on NLP language model |
CN116910817B (en) * | 2023-09-13 | 2023-12-29 | 北京国药新创科技发展有限公司 | Desensitization processing method and device for medical data and electronic equipment |
CN117201206B (en) * | 2023-11-08 | 2024-01-09 | 河北翎贺计算机信息技术有限公司 | Network safety supervision system for preventing network data leakage |
CN117290888B (en) * | 2023-11-23 | 2024-02-09 | 江苏风云科技服务有限公司 | Information desensitization method for big data, storage medium and server |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102480481A (en) * | 2010-11-26 | 2012-05-30 | 腾讯科技(深圳)有限公司 | Method and device for improving security of product user data |
CN103778380A (en) * | 2013-12-31 | 2014-05-07 | 网秦(北京)科技有限公司 | Data desensitization method and device and data anti-desensitization method and device |
CN104123504A (en) * | 2014-06-27 | 2014-10-29 | 武汉理工大学 | Cloud platform privacy protection method based on frequent item retrieval |
CN104156668A (en) * | 2014-08-04 | 2014-11-19 | 江苏大学 | Privacy protection reissuing method for multiple sensitive attribute data |
CN106650487A (en) * | 2016-09-29 | 2017-05-10 | 广西师范大学 | Multi-partite graph privacy protection method published based on multi-dimension sensitive data |
- 2017-06-28 CN CN201710506066.1A patent/CN107480549B/en Active
Non-Patent Citations (1)
Title |
---|
Research on Privacy-Preserving Data Mining Algorithms; Wu Jue; China Doctoral Dissertations Full-text Database; 2013-02-15 (No. 2); full text |
Also Published As
Publication number | Publication date |
---|---|
CN107480549A (en) | 2017-12-15 |
Similar Documents
Publication | Title |
---|---|
CN107480549B (en) | A kind of sensitive information desensitization method and system that data-oriented is shared |
Gottschalk et al. | EventKG–the hub of event knowledge on the web–and biographical timeline generation |
Van Keulen et al. | A probabilistic XML approach to data integration |
US11132610B2 (en) | Extracting structured knowledge from unstructured text |
US8666928B2 (en) | Knowledge repository |
US9519681B2 (en) | Enhanced knowledge repository |
del Valle-Cano et al. | SocialHaterBERT: A dichotomous approach for automatically detecting hate speech on Twitter through textual analysis and user profiles |
CN109564578A (en) | It is inputted based on natural language user interface and generates natural language output |
Hamzei et al. | Translating place-related questions to GeoSPARQL queries |
EP1999692A2 (en) | Knowledge repository |
Valencia‐García et al. | A knowledge acquisition methodology to ontology construction for information retrieval from medical documents |
Braun et al. | Consumer protection in the digital era: The potential of customer-centered legaltech |
Prentice et al. | Tractor: A framework for soft information fusion |
Sathyendra et al. | Helping users understand privacy notices with automated query answering functionality: An exploratory study |
Zavarella et al. | An Ontology-Based Approach to Social Media Mining for Crisis Management. |
Pedersen et al. | Open‐endedness, Schemas and Ontological Commitment |
CN111339252B (en) | Searching method, searching device and storage medium |
Segev | Adaptive ontology use for crisis knowledge representation |
Purohit et al. | An information filtering and management model for twitter traffic to assist crises response coordination |
Purohit et al. | Transactional Knowledge Graph Generation To Model Adversarial Activities |
Cortis et al. | Techniques for the identification of semantically-equivalent online identities |
Delanaux | Privacy-Preserving Linked Data Integration |
Shrivastava | Understanding the Importance of Entities and Roles in Natural Language Inference: A Model and Datasets |
Xiaohan et al. | On Constructing a Knowledge Base of Chinese Criminal Cases |
Zhang et al. | Judicial Intelligent Assistant System: Extracting Events from Divorce Cases to Detect Disputes for the Judge |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CP01 | Change in the name or title of a patent holder | Address after: 310012 1st floor, building 1, 223 Yile Road, Hangzhou City, Zhejiang Province; Patentee after: Yinjiang Technology Co.,Ltd.; Address before: 310012 1st floor, building 1, 223 Yile Road, Hangzhou City, Zhejiang Province; Patentee before: ENJOYOR Co.,Ltd. |